Saturday, February 23, 2008
Ruby API for accessing Freebase/Metaweb structured data
I had a good talk with some of the Metaweb developers last year and started playing with their Python APIs for accessing structured data. I wanted to use this structured data source in a planned Ruby project, so I was very pleased to see Christopher Eppstein's new project, which provides an ActiveRecord-style API on top of Freebase. Here is the web page for Christopher's Freebase API project. Assuming that you do a "gem install freebase", using this API is easy. You will want to use it interactively: use the Freebase web site to find type hierarchies that you are interested in, fetch the first object matching a type hierarchy (e.g., Types -> Astronomy -> Asteroid), and pretty print the fetched object to see what data fields are available. Some examples:
require 'rubygems'
require "freebase"
require 'pp'
an_asteroid = Freebase::Types::Astronomy::Asteroid.find(:first)
#pp "an_asteroid:", an_asteroid
puts "name of asteroid=#{an_asteroid.name}"
puts "spectral type=#{an_asteroid.spectral_type[0].name}"
#all_asteroids = Freebase::Types::Astronomy::Asteroid.find(:all)
#pp "all_asteroids:", all_asteroids
a_company = Freebase::Types::Business::Company.find(:first)
#pp "a_company:", a_company
puts "name=#{a_company.name}"
puts "parent company name=#{a_company.parent_company[0].name}"
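The examples above index straight into multi-valued fields with [0], which raises an error when a fetched object has no linked values (a company with no parent company, for example). Here is a minimal sketch of a more defensive accessor. The Topic and Record structs are stand-ins with made-up values, not the objects the freebase gem returns, and first_name is a hypothetical helper, not part of that gem:

```ruby
# Stand-ins for fetched objects. These Structs and their values are
# invented for illustration; the freebase gem returns its own objects.
Topic  = Struct.new(:name)
Record = Struct.new(:name, :spectral_type, :parent_company)

# Hypothetical helper: name of the first linked object, or a default
# when the field is nil or empty.
def first_name(linked, default = "(none)")
  first = (linked || []).first
  first ? first.name : default
end

asteroid = Record.new("Sample Asteroid", [Topic.new("S")], nil)
company  = Record.new("Sample Company", nil, [])

puts "spectral type=#{first_name(asteroid.spectral_type)}"
puts "parent company name=#{first_name(company.parent_company)}"
```

With real fetched objects you would pass an_asteroid.spectral_type or a_company.parent_company to the helper, assuming those fields behave like arrays as the examples above suggest.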
Labels: knowledge representation, Ruby, semantic web
My OpenCalais Ruby client library
Reuters has a great attitude about openly sharing data and technology. About 8 years ago, I obtained a free license for their 1.2 gigabytes of semantically tagged news corpus text - very useful for automated training of my KBtextmaster system as well as other work.
Reuters has done it again, releasing free access to the OpenCalais semantic text processing web services. If you sign up for a free access key (good for 20,000 uses a day of their web services), then you can use my Ruby client library:
# Copyright Mark Watson 2008. All rights reserved.
# Can be used under either the Apache 2 or the LGPL licenses.
require 'cgi'
require 'simple_http'
require "rexml/document"
include REXML
require 'pp'
# The access key is read from the environment.
MY_KEY = ENV["OPEN_CALAIS_KEY"]
raise(StandardError,"Set Open Calais login key in ENV: 'OPEN_CALAIS_KEY'") if !MY_KEY
PARAMS = "&paramsXML=" + CGI.escape('<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><c:processingDirectives c:contentType="text/txt" c:outputFormat="xml/rdf"></c:processingDirectives><c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="17cabs901" c:submitter="ABC"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>')
class OpenCalaisTaggedText
  # Post the text to the web service and keep the unescaped response.
  def initialize text=""
    data = "licenseID=#{MY_KEY}&content=" + CGI.escape(text)
    http = SimpleHttp.new "http://api.opencalais.com/enlighten/calais.asmx/Enlighten"
    @response = CGI.unescapeHTML(http.post(data+PARAMS))
  end
  # Parse the "Type: value, value" lines from the comment that follows
  # the terms-of-service comment in the response.
  def get_tags
    h = {}
    index1 = @response.index('terms of service.-->')
    index1 = @response.index('<!--', index1)
    index2 = @response.index('-->', index1)
    txt = @response[index1+4..index2-1]
    lines = txt.split("\n")
    lines.each {|line|
      index = line.index(":")
      h[line[0...index]] = line[index+1..-1].split(',').collect {|x| x.strip} if index
    }
    h
  end
  def get_semantic_XML
    @response
  end
  def pp_semantic_XML
    Document.new(@response).write($stdout, 0)
  end
end
Notice that this code expects an environment variable to be set with your OpenCalais access key - you can just hardwire your key in this code if you want. Here is some sample use:
tt = OpenCalaisTaggedText.new("President George Bush and Tony Blair spoke to Congress")
pp "tags:", tt.get_tags
pp "Semantic XML:", tt.get_semantic_XML
puts "Semantic XML pretty printed:"
tt.pp_semantic_XML
The tags print as:
"tags:"
{"Organization"=>["Congress"],
"Person"=>["George Bush", "Tony Blair"],
"Relations"=>["PersonPolitical"]}
OpenCalais looks like a great service. I am planning on using their service for a technology demo, merging in some of my own semantic text processing tools. I might also use their service for training other machine learning based systems. Reuters will also offer a commercial version with guaranteed service, etc.
Labels: NLP, Ruby, semantic web
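The comment-parsing step in the get_tags method of the OpenCalais client above can be tried offline, without an access key. The following is a minimal sketch: SAMPLE_RESPONSE is a hypothetical fragment shaped only by what get_tags searches for (a terms-of-service comment followed by a comment of "Type: value, value" lines), not a captured response, and parse_tag_comment is an illustrative standalone function rather than part of the client library. Unlike get_tags, it returns an empty hash when the expected comments are missing:

```ruby
# Hypothetical response fragment; the real service output may differ.
SAMPLE_RESPONSE = <<~XML
  <!--Use of the service is governed by the terms of service.-->
  <!--Relations: PersonPolitical
  Person: George Bush, Tony Blair
  Organization: Congress-->
XML

# Illustrative standalone version of the get_tags parsing logic.
def parse_tag_comment(response)
  marker = response.index('terms of service.-->')
  return {} unless marker
  start = response.index('<!--', marker)
  return {} unless start
  finish = response.index('-->', start)
  return {} unless finish

  tags = {}
  # Skip the 4-character "<!--" opener and stop before the "-->" closer.
  response[start + 4...finish].each_line do |line|
    key, values = line.split(':', 2)
    next unless values   # ignore lines without a "Type:" prefix
    tags[key.strip] = values.split(',').map(&:strip)
  end
  tags
end

p parse_tag_comment(SAMPLE_RESPONSE)
```

Returning an empty hash is one possible design choice; raising an error with a descriptive message would make a changed response format easier to notice.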
Wednesday, October 04, 2006
Software environments for working on AI projects
In the new global economy of driving production and service costs towards zero, it makes a lot of sense for computer scientists to learn specialized skills to differentiate themselves in the marketplace. Since you are reading this blog, I assume that you are interested in learning more about AI, so I thought that I would list the AI development environments that I have found to be particularly useful - and a lot of them are free.
Classic AI Languages
Although not strictly required for work in AI, a few AI-oriented languages have proven especially useful in the past: Lisp, Scheme, and Prolog. Scheme is a great language but suffers from an "embarrassment of riches": there are almost too many fine implementations available to choose from. That said, I would recommend the excellent and free DrScheme and MzScheme as a very good place to start because they are supported by a repository of useful libraries that are very easy to install. If you want to mix logic programming with Scheme, then the following book (with examples that work with DrScheme) is recommended: The Reasoned Schemer
If you want to use Common Lisp (which is what I use for most of my AI development consulting), there are two commercial products that are very good and have free (non-commercial only!) versions: Franz's Allegro CL and LispWorks. There is no need, however, to stick just with commercial offerings: SBCL (mostly public domain, with some parts under BSD-style licenses) and CLISP (GPL license) are two good choices among many.
If you want to use Prolog, the open source (LGPL) SWI-Prolog and the commercial Amzi Prolog are both excellent choices and have lots of third-party libraries.
Scripting Languages
I have found two scripting languages to be particularly useful for AI projects: Ruby and Python. Python has more third-party libraries and projects for AI, but I personally enjoy developing in Ruby.
Pick an environment and stick with it
Believe it or not, I follow this advice myself: I tend to use one language for a year or so, and then switch (usually because of customer preference or the availability of a great library written in one specific language). It pays to take the time to master one language and environment, then use that environment a lot.
So my advice is to spend just a few hours each with a few of my suggestions in order to pick one to learn really well. Once you pick a language, stick with it until you master it.
Labels: Lisp, Prolog, Ruby, Scheme
