Mark Watson’s Artificial Intelligence Books and Blog

Complexity of Java code for reading OpenOffice.org documents vs. Microsoft documents

Mark Watson
Jun 5, 2004

I have spent more time than I would like to admit writing Java code to pull plain text out of Microsoft Word, PowerPoint, and other Microsoft Office files. This morning, I added support for reading OpenOffice.org documents to my Knowledge Management system: easy!

It took about 15 minutes of coding: I used the ZipFile API to open the top-level document file, found the ZIP entry named "content.xml", got an input stream for that entry, and fed it to a custom SAX parser class that simply aggregates the character data inside <text:p> tags.
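
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from my Knowledge Management system: the class name OpenOfficeTextExtractor, the handler name ParagraphHandler, and the method extractText are all made up for this example. It also assumes a parser that is not namespace-aware (the SAXParserFactory default), so that the element name arrives as the qualified name "text:p", and it assumes that a document's reference to an external DTD can safely be skipped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
public class OpenOfficeTextExtractor {

    /** SAX handler that aggregates character data found inside <text:p> elements. */
    static class ParagraphHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int depth = 0; // greater than 0 while inside at least one <text:p>

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            if ("text:p".equals(qName)) {
                depth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("text:p".equals(qName)) {
                depth--;
                text.append('\n'); // one output line per paragraph
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) {
                text.append(ch, start, length);
            }
        }

        // Assumption: if the file declares an external DTD, it is fine to
        // skip it by handing the parser an empty document instead.
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }

        String getText() {
            return text.toString();
        }
    }

    /** Opens the document as a ZIP archive and parses its content.xml entry. */
    public static String extractText(String path)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("No content.xml entry in " + path);
            }
            // The factory default is not namespace-aware, so qName is "text:p".
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ParagraphHandler handler = new ParagraphHandler();
            try (InputStream in = zip.getInputStream(entry)) {
                parser.parse(in, handler);
            }
            return handler.getText();
        }
    }
}
```

Calling OpenOfficeTextExtractor.extractText with the path to a document returns the paragraph text, one paragraph per line. Text outside <text:p> elements, such as headings, is ignored in this sketch because it only does what the description above says.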
