[Reader-list] .Net / Hailstorm Initiative

Fri Jul 6 14:41:42 IST 2001

>It shouldn't get parsed by any other parser than mso 9 (Microsoft Office 9,
>aka MS Office 2000) since that's what the if statement in the beginning
>seems to say. If you take that if statement out you'll end up with:
>
><xml>
>  <o:DocumentProperties>
>   <o:Author>Menso Heus</o:Author>
>   <o:LastAuthor>Menso Heus</o:LastAuthor>
>   <o:Revision>1</o:Revision>
>   <o:TotalTime>0</o:TotalTime>
>   <o:Created>2001-07-06T00:23:00Z</o:Created>
>   <o:LastSaved>2001-07-06T00:23:00Z</o:LastSaved>
>   <o:Pages>1</o:Pages>
>   <o:Company>None</o:Company>
>   <o:Lines>1</o:Lines>
>   <o:Paragraphs>1</o:Paragraphs>
>   <o:Version>9.4119</o:Version>
>  </o:DocumentProperties>
></xml>
>
>Which is, as far as I know, perfectly good XML and should be parseable by any
>parser! The confusion however, kept alive by people who don't fully research
>the 'problem' or what MS wanted to do with XML, is totally different.

You say this should work and be parsed by any other parser. Perhaps. 
Theoretically.
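
One way to test the "any parser" claim is to feed the quoted block to a
namespace-aware parser. The sketch below is only an illustration, using
Python's standard xml.etree.ElementTree and a shortened copy of the quoted
snippet. It assumes that the o: prefix is meant to be bound to
urn:schemas-microsoft-com:office:office, which Word declares on the
enclosing html element rather than inside the block itself. The helper
names (parses_standalone, with_namespace, author) are made up for this
example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the quoted block, taken on its own (no enclosing <html>).
SNIPPET = """<xml>
  <o:DocumentProperties>
   <o:Author>Menso Heus</o:Author>
   <o:Revision>1</o:Revision>
  </o:DocumentProperties>
</xml>"""

# Assumption: the namespace the o: prefix is bound to in the full document.
OFFICE_NS = "urn:schemas-microsoft-com:office:office"


def parses_standalone(text):
    """Return True if a namespace-aware parser accepts the text as-is."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised here because the o: prefix is used but never declared.
        return False


def with_namespace(text):
    """Declare the o: prefix on the root element of the snippet."""
    return text.replace("<xml>", '<xml xmlns:o="%s">' % OFFICE_NS, 1)


def author(text):
    """Read the Author property from a snippet that declares the prefix."""
    root = ET.fromstring(text)
    node = root.find(".//{%s}Author" % OFFICE_NS)
    return node.text if node is not None else None


if __name__ == "__main__":
    print(parses_standalone(SNIPPET))
    print(parses_standalone(with_namespace(SNIPPET)))
    print(author(with_namespace(SNIPPET)))
```

So the block is parseable in principle, but only once the prefix
declaration that lives elsewhere in the file travels with it.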

But our experience has been quite the opposite. When a Word document 
is saved as HTML, it's not just the metadata that is encoded in this 
unparseable way. The entire document is replete with Office-specific 
code that cannot be removed without destroying the style of the 
document. So removing the top tag achieves nothing, and I was not able 
to open the document properly in any other browser (e.g. Mozilla).
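
To make the point concrete, here is a rough sketch in Python. The SAMPLE
fragment is hypothetical, written only in the style of Word's "Save as
HTML" output, and the function names are invented for this example. It
removes the conditional-comment block that wraps the metadata and then
lists the Office-specific markup still left in the body.

```python
import re

# Hypothetical fragment in the style of Word "Save as HTML" output.
SAMPLE = """<html>
<head>
<!--[if gte mso 9]><xml>
 <o:DocumentProperties><o:Author>Someone</o:Author></o:DocumentProperties>
</xml><![endif]-->
</head>
<body>
<p class=MsoNormal style='mso-pagination:widow-orphan'>Hello<o:p></o:p></p>
</body>
</html>"""

# Matches a whole conditional comment: <!--[if ...]> ... <![endif]-->
CONDITIONAL = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL)


def strip_conditional_blocks(html):
    """Remove every conditional-comment block, metadata included."""
    return CONDITIONAL.sub("", html)


def office_residue(html):
    """List Office-specific classes, style properties and o:p tags."""
    return re.findall(r"mso-[a-z-]+|</?o:p>|Mso[A-Za-z]+", html)


if __name__ == "__main__":
    stripped = strip_conditional_blocks(SAMPLE)
    # The metadata is gone, but the body markup is still Office-specific.
    print(office_residue(stripped))
```

Even in this tiny sample, the formatting lives in the leftover markup, so
stripping it out means redoing the styling by hand.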

Unfortunately, this once increased my workload about tenfold, as all 
the documents that I had to make browser-compatible had to be remade 
(and re-formatted!) in HTML.

-- 
Monica Narula
Sarai:The New Media Initiative
29 Rajpur Road, Delhi 110 054
www.sarai.net