Mark Watson’s Artificial Intelligence Books and Blog

Archiving data (semantic web, business, etc.) in XML

Mark Watson
Aug 18, 2004

The other night I needed some data that I had processed a few years ago. No problem: I have been archiving data in ad hoc XML documents for years. I say ad hoc because I usually don't use a DTD or Schema to define structure or to validate the XML. Instead, I write a program that collects and/or processes data and writes directly to well-formed XML files, with the format determined by the application.
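
As a rough illustration of writing directly to well-formed XML without a DTD or Schema, here is a minimal Python sketch. The function name `write_board_xml` and the element names (`companies`, `company`, `director`) are hypothetical choices made for this example, not the format actually used for the archived files. The only real requirement is that text and attribute values are escaped so the output stays well-formed.

```python
import io
from xml.sax.saxutils import escape, quoteattr


def write_board_xml(records, out):
    """Write (company, [director, ...]) pairs as well-formed XML.

    `records` is an iterable of (company_name, list_of_director_names);
    `out` is any text file-like object. Element names are hypothetical.
    """
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write("<companies>\n")
    for company, directors in records:
        # quoteattr() escapes the value and adds the surrounding quotes.
        out.write(f"  <company name={quoteattr(company)}>\n")
        for director in directors:
            # escape() handles &, < and > in element text.
            out.write(f"    <director>{escape(director)}</director>\n")
        out.write("  </company>\n")
    out.write("</companies>\n")


# Usage: write one record to an in-memory buffer (a real file works the same way).
buf = io.StringIO()
write_board_xml([("Acme & Co", ["Jane Doe", "John Roe"])], buf)
```

Because the structure is visible in the file itself, a format like this can be read back years later without any separate schema document.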

The important thing is that I can look at an old XML data file, see the format that I used, and in a minute or two have a little code that uses a SAX-style parser to get out what I need. I have used XML files for:

  • Data scraped from the web matching board of directors members with companies (used for an experiment to detect interlocking board members)

  • Data from the CIA World Factbook for countries

  • US State and city names

  • Categorization data from training on the 2-gigabyte Reuters news story corpus

  • etc.
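
The "little code that uses a SAX-style parser" might look something like the following Python sketch, which uses the standard library's `xml.sax` module. The sample document, the element names, and the `DirectorHandler` and `extract_pairs` names are all assumptions made for illustration; the actual archived formats are not shown in this post.

```python
import xml.sax

# Hypothetical sample in an ad hoc format; not the real archived data.
SAMPLE = b"""<companies>
  <company name="Acme &amp; Co">
    <director>Jane Doe</director>
    <director>John Roe</director>
  </company>
  <company name="Globex">
    <director>Jane Doe</director>
  </company>
</companies>"""


class DirectorHandler(xml.sax.ContentHandler):
    """Collect (company, director) pairs as the parser streams through the file."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._company = None
        self._text = None  # list of text chunks while inside <director>, else None

    def startElement(self, name, attrs):
        if name == "company":
            self._company = attrs.get("name")
        elif name == "director":
            self._text = []

    def characters(self, content):
        # SAX may deliver element text in several chunks, so buffer them.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "director":
            self.pairs.append((self._company, "".join(self._text).strip()))
            self._text = None
        elif name == "company":
            self._company = None


def extract_pairs(xml_bytes):
    """Parse XML bytes and return a list of (company, director) tuples."""
    handler = DirectorHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.pairs


pairs = extract_pairs(SAMPLE)
```

Since SAX streams the document rather than building a tree, the same approach should scale to large archive files; for small files a tree-based parser would work just as well.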

I used to keep data in a relational database, which is handy for ad hoc queries, etc., but now I favor simply archiving interesting data in XML files.

I have thought about setting up a repository of free, interesting data in XML. Hopefully, if I share with others, I will get some interesting stuff back in return. That is on my to-do list :-)

© 2023 Mark Watson