Saturday, June 23, 2007

Good video: Knowledge Representation and the Semantic Web

Peter Patel-Schneider gave this talk at Google in January 2006.

Peter Patel-Schneider has an interesting perspective: while he has been very involved with Semantic Web technologies like OWL and description logic reasoners (he is an author of the Classic system), he also appears to have a reasonable amount of skepticism. His talk is good because he starts off by clearly explaining RDF, RDFS, and the improved formalism and expressiveness of OWL. I especially enjoyed his summary of the different types of logic and their computational limitations and capabilities.

I have mixed feelings about trying to implement the Semantic Web using a formal, semantically rich language like OWL: I am concerned that encoding information in OWL simply takes too much effort. At the other end of the complexity spectrum, tagging is a very lightweight expression of the Semantic Web; it is inadequate, but easy enough that many people make the effort.

I have a short list of Semantic Web resources that I find useful.

Tuesday, June 12, 2007

N-GRAM analysis using Ruby

I dusted off some old code today to look at common word pairs in some customer data. N-gram analysis finds the most common bi-grams (two-word combinations), tri-grams (three-word combinations), etc. The code is simple, and I share it here in case you ever need to do the same thing:
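require 'zip/zipfilesystem'   # older rubyzip; newer releases use require 'zip' and Zip::File

# Split text into lowercase alphabetic words.
def words(text)
  text.downcase.scan(/[a-z]+/)
end

# The training data is in a ZIP file.
Zip::ZipFile.open('../text.txt.zip') { |zipFile|
  $words = words(zipFile.read('text.txt'))
}

bi_grams = Hash.new(0)
tri_grams = Hash.new(0)

num = $words.length - 2
num.times {|i|
  bi = $words[i] + ' ' + $words[i+1]
  tri = bi + ' ' + $words[i+2]
  bi_grams[bi] += 1
  tri_grams[tri] += 1
}
# The loop stops two words short of the end, so count the final bi-gram here.
bi_grams[$words[-2] + ' ' + $words[-1]] += 1 if $words.length >= 2

# Print about num / 10 of the most common entries; first() is safe when
# there are fewer distinct n-grams than that.
top = [num / 10, 0].max

puts "bi-grams:"
bb = bi_grams.sort{|a,b| b[1] <=> a[1]}
bb.first(top).each {|gram, count| puts "#{gram} : #{count}"}
puts "tri-grams:"
tt = tri_grams.sort{|a,b| b[1] <=> a[1]}
tt.first(top).each {|gram, count| puts "#{gram} : #{count}"}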
Output might look like this:
bi-grams:
in the : 561
in java : 213
...
tri-grams:
in the code : 119
java source code : 78
...
Cool stuff. Ruby is my favorite language for tool building.
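
The same counting generalizes to any n. Here is a minimal sketch, assuming the words are already in an array; the names ngram_counts and top_ngrams are made up for this illustration, and Enumerable#each_cons supplies the sliding window in place of the index arithmetic used in the script above:

```ruby
# Count every run of n consecutive words.
# ngram_counts and top_ngrams are hypothetical helper names for this sketch.
def ngram_counts(words, n)
  counts = Hash.new(0)
  # each_cons yields every window of n adjacent words, including the last one.
  words.each_cons(n) { |gram| counts[gram.join(' ')] += 1 }
  counts
end

# Return the k most frequent n-grams as [phrase, count] pairs,
# breaking ties alphabetically so the order is repeatable.
def top_ngrams(words, n, k)
  ngram_counts(words, n).sort_by { |phrase, count| [-count, phrase] }.first(k)
end

# A tiny inline sample stands in for the zipped customer data.
sample = 'the cat sat on the mat and the cat slept'.downcase.scan(/[a-z]+/)
top_ngrams(sample, 2, 3).each { |phrase, count| puts "#{phrase} : #{count}" }
```

Because each_cons yields nothing when the input has fewer than n words, very short inputs simply produce an empty result without any length checks.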

Sunday, June 10, 2007

I have a new page on Knowledge Management

I have a new page on Knowledge Management where I share some philosophy on why I believe that open source plays such an important role in building effective KM systems. I believe that, as much as possible, it is best to use available open source infrastructure software and to use available resources for the "value add" user modeling and AI components. I list on the right side of this web page some of the better free resources for doing KM work.

Sunday, June 03, 2007

The URL for this blog has changed

Please use the new URL: http://markwatson.com/aiblog/ (the old URL will redirect).

The new RSS feed is: http://artificial-intelligence-theory.blogspot.com/atom.xml
