Saturday, February 23, 2008
My OpenCalais Ruby client library
Reuters has a great attitude about openly sharing data and technology. About 8 years ago, I obtained a free license for their 1.2 gigabyte corpus of semantically tagged news text, which has been very useful for automated training of my KBtextmaster system as well as for other work.
Reuters has done it again, providing free access to the OpenCalais semantic text processing web services. If you sign up for a free access key (good for 20,000 web service calls a day), then you can use my Ruby client library:
# Copyright Mark Watson 2008. All rights reserved.
# Can be used under either the Apache 2 or the LGPL licenses.
require 'cgi'            # CGI.escape and CGI.unescapeHTML
require 'simple_http'    # SimpleHttp client
require 'rexml/document'
include REXML
require 'pp'

# Read the OpenCalais license key from the environment.
MY_KEY = ENV["OPEN_CALAIS_KEY"]
raise(StandardError, "Set Open Calais login key in ENV: 'OPEN_CALAIS_KEY'") unless MY_KEY

# Processing directives sent with every request, as URL-encoded XML.
PARAMS = "&paramsXML=" + CGI.escape('<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><c:processingDirectives c:contentType="text/txt" c:outputFormat="xml/rdf"></c:processingDirectives><c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="17cabs901" c:submitter="ABC"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>')

class OpenCalaisTaggedText
  # POST the text to the Enlighten web service and keep the unescaped response.
  def initialize(text = "")
    data = "licenseID=#{MY_KEY}&content=" + CGI.escape(text)
    http = SimpleHttp.new "http://api.opencalais.com/enlighten/calais.asmx/Enlighten"
    @response = CGI.unescapeHTML(http.post(data + PARAMS))
  end

  # Parse the summary comment that follows the terms-of-service comment into
  # a hash such as {"Person" => ["George Bush", "Tony Blair"]}.
  # Returns an empty hash if the expected comments are missing.
  def get_tags
    h = {}
    index1 = @response.index('terms of service.-->')
    return h unless index1
    index1 = @response.index('<!--', index1)
    return h unless index1
    index2 = @response.index('-->', index1)
    return h unless index2
    txt = @response[index1+4..index2-1]
    txt.split("\n").each do |line|
      index = line.index(":")
      h[line[0...index]] = line[index+1..-1].split(',').collect { |x| x.strip } if index
    end
    h
  end

  # The raw RDF/XML response.
  def get_semantic_XML
    @response
  end

  # Pretty-print the RDF/XML response to standard output.
  def pp_semantic_XML
    Document.new(@response).write($stdout, 0)
  end
end
Notice that this code expects the environment variable OPEN_CALAIS_KEY to be set to your OpenCalais access key; you can just hardwire your key in this code if you want. Here is some sample use:
tt = OpenCalaisTaggedText.new("President George Bush and Tony Blair spoke to Congress")
pp "tags:", tt.get_tags
pp "Semantic XML:", tt.get_semantic_XML
puts "Semantic XML pretty printed:"
tt.pp_semantic_XML
The tags print as:
"tags:"
{"Organization"=>["Congress"],
 "Person"=>["George Bush", "Tony Blair"],
 "Relations"=>["PersonPolitical"]}
OpenCalais looks like a great service. I am planning to use it for a technology demo, merging in some of my own semantic text processing tools. I might also use it for training other machine-learning-based systems. Reuters will also offer a commercial version with guaranteed service, etc.
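The two string-handling steps in the client, building the form-encoded POST body and pulling the tag summary out of the comment that follows the terms-of-service notice, can be tried without an access key or a network connection. This is a minimal sketch under stated assumptions: `build_request_body` and `parse_tag_summary` are hypothetical helpers that are not part of the library, and the sample response is invented to match the markers that `get_tags` searches for. It is not a captured OpenCalais reply.

```ruby
require 'cgi'
require 'pp'

# Hypothetical helper (not in the library): builds the same kind of
# form-encoded POST body that OpenCalaisTaggedText#initialize sends.
# Unlike the original, it also escapes the key; that is harmless for
# plain alphanumeric keys.
def build_request_body(key, text, params_xml)
  "licenseID=#{CGI.escape(key)}" \
    "&content=#{CGI.escape(text)}" \
    "&paramsXML=#{CGI.escape(params_xml)}"
end

# Hypothetical helper: the search that get_tags performs, applied to any
# string. It finds the first comment after the terms-of-service comment and
# turns each "Type: a, b" line into {"Type" => ["a", "b"]}.
def parse_tag_summary(response)
  tags = {}
  marker = response.index('terms of service.-->')
  return tags unless marker
  comment_start = response.index('<!--', marker)
  return tags unless comment_start
  comment_end = response.index('-->', comment_start)
  return tags unless comment_end

  # Text between "<!--" and "-->", one "Type: values" entry per line.
  response[(comment_start + 4)...comment_end].each_line do |line|
    key, sep, values = line.partition(':')
    next if sep.empty? # skip lines without a "Type:" prefix
    tags[key.strip] = values.split(',').map(&:strip)
  end
  tags
end

# Invented response fragment, shaped only to match what get_tags looks for.
sample = <<~XML
  <!--Use of the Calais Web Service is governed by the terms of service.-->
  <!--Relations: PersonPolitical
  Person: George Bush, Tony Blair
  Organization: Congress-->
  <rdf:RDF/>
XML

puts build_request_body("my-key", "President George Bush and Tony Blair spoke to Congress", "<c:params/>")
pp parse_tag_summary(sample)
```

Returning an empty hash when a marker is missing is a deliberate choice in this sketch: if the service ever drops or rewords the terms-of-service comment, tag extraction degrades to "no tags" instead of failing on a nil index.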
Labels: NLP, Ruby, semantic web
Comments:
But it's not perfect:
This works:
Lehman Brothers analyst downgrades Amerigroup Corp.
These do not (at least the first time; it seems to learn "perfectly"?):
Lehman Brothers downgrades Amerigroup Corp.
Lehman Brothers slashes Amerigroup Corp. price target.
Do your tools help with this?
Hello Peter, to answer your question: no. My client library adds no extra functionality to OpenCalais - it is meant to allow Ruby clients to use OpenCalais as it is.
I did notice that this tool does not (thanks for it, by the way; it works very nicely), but I believe I read somewhere on your web site that you were planning to hook it up to your semantic tools, and I was referring to those. Thanks.

