[Reader-list] .Net / Hailstorm Initiative
Monica Narula
monica at sarai.net
Fri Jul 6 14:41:42 IST 2001
>It shouldn't get parsed by any other parser than mso 9 (Microsoft Office 9,
>aka MS Office 2000) since that's what the if statement in the beginning
>seems to say. If you take that if statement out you'll end up with:
>
><xml>
> <o:DocumentProperties>
> <o:Author>Menso Heus</o:Author>
> <o:LastAuthor>Menso Heus</o:LastAuthor>
> <o:Revision>1</o:Revision>
> <o:TotalTime>0</o:TotalTime>
> <o:Created>2001-07-06T00:23:00Z</o:Created>
> <o:LastSaved>2001-07-06T00:23:00Z</o:LastSaved>
> <o:Pages>1</o:Pages>
> <o:Company>None</o:Company>
> <o:Lines>1</o:Lines>
> <o:Paragraphs>1</o:Paragraphs>
> <o:Version>9.4119</o:Version>
> </o:DocumentProperties>
></xml>
>
>Which is, as far as I know, perfectly good XML and should be parseable by any
>parser! The confusion however, kept alive by people who don't fully research
>the 'problem' or what MS wanted to do with XML, is totally different.
You say this should work and be parsed by any other parser. Perhaps.
Theoretically.
But our experience has been quite the opposite. When a Word document
is saved as HTML, it's not just the metadata that is encoded in an
unparseable way. The entire document is replete with code
that cannot be removed without destroying the style of the document.
So removing the top tag does nothing, and I was not able to open the
document on any other platform (e.g. Mozilla).
Unfortunately, this once increased my workload about tenfold, as
all the documents that I had to make browser-compatible had to be
remade (and re-formatted!) in HTML!
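
For anyone who wants to test the "parseable by any parser" claim rather
than argue about it, here is a rough sketch using Python's standard
xml.etree.ElementTree module. It is only an illustration: the fragment is
shortened to two properties, the helper names are made up for this example,
and the namespace URI is an assumption (it is the one Office-generated HTML
usually binds to the o: prefix on the surrounding html element). Taken on
its own, the fragment never declares the o: prefix, so a namespace-aware
parser is expected to reject it until that declaration is put back.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the quoted fragment (only two properties kept).
FRAGMENT = """<xml>
 <o:DocumentProperties>
  <o:Author>Menso Heus</o:Author>
  <o:Pages>1</o:Pages>
 </o:DocumentProperties>
</xml>"""

# Assumption: the URI that Office HTML normally binds to the o: prefix.
O_NS = "urn:schemas-microsoft-com:office:office"


def parses(text):
    """Return True if a namespace-aware parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def with_namespace(text, uri=O_NS):
    """Declare the o: prefix on the outer <xml> element (hypothetical helper)."""
    return text.replace("<xml>", '<xml xmlns:o="%s">' % uri, 1)


def author(text):
    """Read the Author property once the prefix has been declared."""
    root = ET.fromstring(with_namespace(text))
    return root.find(".//{%s}Author" % O_NS).text


if __name__ == "__main__":
    print("fragment as quoted parses:", parses(FRAGMENT))
    print("with o: declared parses:  ", parses(with_namespace(FRAGMENT)))
    print("author:", author(FRAGMENT))
```

This covers only the metadata island. It says nothing about the styling
markup spread through the rest of a Word-exported page, which is where the
real trouble was.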
--
Monica Narula
Sarai:The New Media Initiative
29 Rajpur Road, Delhi 110 054
www.sarai.net