Friday, June 20, 2008
Very nice: Elsevier IJCAI AI Journal articles now available for free as PDFs
This requires a free sign-up process. I just registered and found a dozen articles on topics of interest to me that are now in my reading queue.
New version of Numenta software is available
I prefer open source software - the license applied to an open source software project lets me know up front what my usage rights and obligations are.
The Numenta Platform for Intelligent Computing (NuPIC) is free for academic and non-commercial research use, but so far there is no definitive information on commercial licensing costs. As a result, even though I like the ideas behind NuPIC, I spend relatively little time playing with the examples. I very much enjoyed Jeff Hawkins's book On Intelligence, and I will probably devote a lot more time to experimenting with NuPIC when all of the licensing issues are nailed down.
Saturday, March 29, 2008
Protégé OWL Ontology Editor
I installed Protégé version 4 alpha last night and it has been solid for me so far. It has been over a year since I upgraded my local Protégé installation, and I like these (new?) features a lot:
- Saved XML for OWL ontologies is very readable, with good automatically generated comments and a nice layout
- Use of the Java OWL API
- Both Fact++ (using JNI) and Pellet 1.5 are smoothly integrated
- The Owlviz Plug-in seems to display graphs faster
- Drag and drop can be used to rearrange class hierarchies
Long term, I would like a semi-automatic tool for populating ontologies via custom scraper libraries. I say "semi-automatic" because it would be useful to integrate with Protégé for manual editing and browsing, while supporting external applications accessing data read-only (?) via the Java OWL APIs.
Labels: Java, OWL, semantic web
Saturday, February 23, 2008
Ruby API for accessing Freebase/Metaweb structured data
I had a good talk with some of the Metaweb developers last year and started playing with their Python APIs for accessing structured data. I wanted to be able to use this structured data source in a planned Ruby project and was very pleased to see Christopher Eppstein's new project that provides an ActiveRecord style API on top of Freebase. Here is the web page for Christopher's Freebase API project. Assuming that you do a "gem install freebase", using this API is easy; some examples:
require 'rubygems'
require "freebase"
require 'pp'
an_asteroid = Freebase::Types::Astronomy::Asteroid.find(:first)
#pp "an_asteroid:", an_asteroid
puts "name of asteroid=#{an_asteroid.name}"
puts "spectral type=#{an_asteroid.spectral_type[0].name}"
#all_asteroids = Freebase::Types::Astronomy::Asteroid.find(:all)
#pp "all_asteroids:", all_asteroids
a_company = Freebase::Types::Business::Company.find(:first)
#pp "a_company:", a_company
puts "name=#{a_company.name}"
puts "parent company name=#{a_company.parent_company[0].name}"
You will want to use this API interactively: use the Freebase web site to find type hierarchies that you are interested in, fetch the first object matching a type hierarchy (e.g., Types -> Astronomy -> Asteroid) and pretty print the fetched object to see what data fields are available.
Labels: knowledge representation, Ruby, semantic web
My OpenCalais Ruby client library
Reuters has a great attitude about openly sharing data and technology. About 8 years ago, I obtained a free license for their 1.2 gigabytes of semantically tagged news corpus text - very useful for automated training of my KBtextmaster system as well as other work.
Reuters has done it again, releasing free access to the OpenCalais semantic text processing web services. If you sign up for a free access key (good for 20,000 uses a day of their web services), then you can use my Ruby client library:
# Copyright Mark Watson 2008. All rights reserved.
# Can be used under either the Apache 2 or the LGPL licenses.
require 'cgi' # CGI.escape and CGI.unescapeHTML are used below
require 'simple_http'
require "rexml/document"
include REXML
require 'pp'
MY_KEY = ENV["OPEN_CALAIS_KEY"]
raise(StandardError,"Set Open Calais login key in ENV: 'OPEN_CALAIS_KEY'") if !MY_KEY
PARAMS = "&paramsXML=" + CGI.escape('<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><c:processingDirectives c:contentType="text/txt" c:outputFormat="xml/rdf"></c:processingDirectives><c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="17cabs901" c:submitter="ABC"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>')
class OpenCalaisTaggedText
  # Post the text to the OpenCalais web service and keep the unescaped response
  def initialize text=""
    data = "licenseID=#{MY_KEY}&content=" + CGI.escape(text)
    http = SimpleHttp.new "http://api.opencalais.com/enlighten/calais.asmx/Enlighten"
    @response = CGI.unescapeHTML(http.post(data+PARAMS))
  end
  # Parse the "Type: value, value" summary lines from the XML comment
  # that follows the terms of service comment in the response
  def get_tags
    h = {}
    index1 = @response.index('terms of service.-->')
    index1 = @response.index('<!--', index1)
    index2 = @response.index('-->', index1)
    txt = @response[index1+4..index2-1]
    lines = txt.split("\n")
    lines.each {|line|
      index = line.index(":")
      h[line[0...index]] = line[index+1..-1].split(',').collect {|x| x.strip} if index
    }
    h
  end
  def get_semantic_XML
    @response
  end
  def pp_semantic_XML
    Document.new(@response).write($stdout, 0)
  end
end
Notice that this code expects an environment variable to be set with your OpenCalais access key - you can just hardwire your key in this code if you want. Here is some sample use:
tt = OpenCalaisTaggedText.new("President George Bush and Tony Blair spoke to Congress")
pp "tags:", tt.get_tags
pp "Semantic XML:", tt.get_semantic_XML
puts "Semantic XML pretty printed:"
tt.pp_semantic_XML
The tags print as:
"tags:"
{"Organization"=>["Congress"],
"Person"=>["George Bush", "Tony Blair"],
"Relations"=>["PersonPolitical"]}
OpenCalais looks like a great service. I am planning on using their service for a technology demo, merging in some of my own semantic text processing tools. I might also use their service for training other machine learning based systems. Reuters will also offer a commercial version with guaranteed service, etc.
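The comment parsing done in get_tags can be tried offline, without an access key. The following is only a minimal sketch: sample_response is a made-up stand-in that I am assuming resembles the service output (a terms of service comment followed by a comment holding "Type: value, value" lines), and parse_tag_comment is a hypothetical helper name, not part of the client library above. Unlike get_tags, it returns an empty hash when the expected comments are missing instead of raising an error.

```ruby
# Hypothetical stand-in for a service response; NOT real OpenCalais output.
def sample_response
  <<~XML
    <!--Use of the Calais Web Service is governed by the terms of service.-->
    <!--Relations: PersonPolitical
    Organization: Congress
    Person: George Bush, Tony Blair-->
    <rdf:RDF></rdf:RDF>
  XML
end

# Extract the summary comment that follows the terms of service comment
# and turn each "Type: a, b" line into a hash entry: "Type" => ["a", "b"].
def parse_tag_comment(response)
  tags = {}
  marker = response.index('terms of service.-->')
  return tags unless marker
  start = response.index('<!--', marker)
  return tags unless start
  finish = response.index('-->', start)
  return tags unless finish
  response[(start + 4)...finish].each_line do |line|
    key, values = line.split(':', 2)
    next unless values # skip lines without a "Type:" prefix
    tags[key.strip] = values.split(',').map(&:strip)
  end
  tags
end

p parse_tag_comment(sample_response)
```

Guarding each index call is a design choice for experimenting: a response without the summary comment (for example, an error message from the service) then yields no tags and does not raise an exception.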
Labels: NLP, Ruby, semantic web
Monday, February 04, 2008
NLTK: The Natural Language Toolkit
I have a 22-year history of working with natural language processing, but for the most part this was a low level of effort (perhaps averaging 3 to 5 weeks a year). For learning (and perhaps for some production work if you extract the parts that you need) I can very much recommend NLTK.
NLTK developers (or aggregators since NLTK is an aggregate of smaller projects, with a lot of new work added) Steven Bird, Ewan Klein, and Edward Loper are writing a complete book on NLP using NLTK that looks good. I wish that I had a good resource like this 22 years ago!
Tuesday, January 22, 2008
texai.org
Stephen Reed is doing some interesting research at texai.org. This project is interesting to me because there is a lot of overlap with my own KBSportal.com project (which I am currently re-writing in Ruby and Ruby on Rails: the first two versions were in Common Lisp and Java): RDF, Sesame, knowledge representation, and NLP.
For me, working on KBSportal.com has been a learning process, and I think that texai also serves that purpose for Stephen.