Why the ODF is better than Microsoft's document formats
It takes a few lines of Ruby code to process OpenOffice.org document files:
require 'rubygems'
require 'rexml/document'
require 'rexml/streamlistener'
require 'zip/zipfilesystem' # install with gem
include REXML
class OOXmlHandler
include StreamListener
attr_reader :plain_text
def initialize; @plain_text = ""; @last_tag_name =""; end
def tag_start name, attrs; @last_tag_name = name; end
def text s
@plain_text << s << "\n" if @last_tag_name.index('text')
end
end
class ReadOpenOffice
attr_reader :text
def initialize file_path
Zip::ZipFile.open(file_path) { |zipFile|
xml_handler = OOXmlHandler.new
Document.parse_stream((zipFile.read('content.xml')), xml_handler)
@text = xml_handler.plain_text
}
end
end
puts ReadOpenOffice.new('KBrecipes.odt').text
I have spent too much of my time over the last 10 years dealing "programatically" with Microsoft document formats. I am tired of wasting my time when open document formats are so much easier and less expensive to use.