Saturday, March 29, 2008

Protégé OWL Ontology Editor

I installed Protégé version 4 alpha last night and it has been solid for me so far. It has been over a year since I upgraded my local Protégé installation, and I like these (new ?) features a lot:I have started working again on an old KnowledgeBooks project: an OWL ontology for news stories and associated "Semantic Scrappers" to populate the ontology from plain texts of news stories. I am still in the experimentation stage for the semantic scrapper: I am trying to decide between pure Ruby code, Java using the available OWL APIs, or a combination of JRuby and the Java OWL APIs. I am currently playing with these three options - for now, I am in no hurry to choose a single option. Using a dynamic language like Ruby has a lot of advantages as far as generating "OWL classes" automatically from an ontology, etc. That said, the Java language has the best semantic web tools and libraries.

Long term, I would like a semi-automatic tool for populating ontologies via custom scrapper libraries. I say "semi-automatic" because it would be useful to integrate with Protégé for manual editing and browsing, while supporting external applications accessing data read-only (?) via the Java OWL APIs.

Labels: , ,


Saturday, February 23, 2008

Ruby API for accessing Freebase/Metaweb structured data

I had a good talk with some of the Metaweb developers last year and started playing with their Python APIs for accessing structured data. I wanted to be able to use this structured data source in a planned Ruby project and was very pleased to see Christopher Eppstein's new project that provides an ActiveRecord style API on top of Freebase. Here is the web page for Christopher's Freebase API project. Assuming that you do a "gem install freebase", using this API is easy; some examples:
require 'rubygems'
require "freebase"
require 'pp'

an_asteroid = Freebase::Types::Astronomy::Asteroid.find(:first)
#pp "an_asteroid:", an_asteroid
puts "name of asteroid=#{an_asteroid.name}"
puts "spectral type=#{an_asteroid.spectral_type[0].name}"

#all_asteroids = Freebase::Types::Astronomy::Asteroid.find(:all)
#pp "all_asteroids:", all_asteroids

a_company = Freebase::Types::Business::Company.find(:first)
#pp "a_company:", a_company
puts "name=#{a_company.name}"
puts "parent company name=#{a_company.parent_company[0].name}"
You will want to use this API interactively: use the Freebase web site to find type hierarchies that you are interested in, fetch the first object matching a type hierarchy (e.g., Types -> Astronomy -> Asteroid) and pretty print the fetched object to see what data fields are available.

Labels: , ,


My OpenCalais Ruby client library

Reuters has a great attitude about openly sharing data and technology. About 8 years ago, I obtained a free license for their 1.2 gigabytes of semantically tagged news corpus text - very useful for automated training of my KBtextmaster system as well as other work.

Reuters has done it again, releasing free access to OpenCalias semantic text processing web services. If you sign up for a free access key (good for 20,000 uses a day of their web services), then you can use my Ruby client library:
# Copyright Mark Watson 2008. All rights reserved.
# Can be used under either the Apache 2 or the LGPL licenses.

require 'simple_http'

require "rexml/document"
include REXML

require 'pp'

MY_KEY = ENV["OPEN_CALAIS_KEY"]
raise(StandardError,"Set Open Calais login key in ENV: 'OPEN_CALAIS_KEY'") if !MY_KEY

PARAMS = "¶msXML=" + CGI.escape('<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><c:processingDirectives c:contentType="text/txt" c:outputFormat="xml/rdf"></c:processingDirectives><c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="17cabs901" c:submitter="ABC"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>')

class OpenCalaisTaggedText
def initialize text=""
data = "licenseID=#{MY_KEY}&content=" + CGI.escape(text)
http = SimpleHttp.new "http://api.opencalais.com/enlighten/calais.asmx/Enlighten"
@response = CGI.unescapeHTML(http.post(data+PARAMS))
end
def get_tags
h = {}
index1 = @response.index('terms of service.-->')
index1 = @response.index('<!--', index1)
index2 = @response.index('-->', index1)
txt = @response[index1+4..index2-1]
lines = txt.split("\n")
lines.each {|line|
index = line.index(":")
h[line[0...index]] = line[index+1..-1].split(',').collect {|x| x.strip} if index
}
h
end
def get_semantic_XML
@response
end
def pp_semantic_XML
Document.new(@response).write($stdout, 0)
end
end
Notice that this code expects an environment variable to be set with your OpenCalais access key - you can just hardwire your key in this code if you want. Here is some sample use:
tt = OpenCalaisTaggedText.new("President George Bush and Tony Blair spoke to Congress")

pp "tags:", tt.get_tags
pp "Semantic XML:", tt.get_semantic_XML
puts "Semantic XML pretty printed:"
tt.pp_semantic_XML
The tags print as:
"tags:"
{"Organization"=>["Congress"],
"Person"=>["George Bush", "Tony Blair"],
"Relations"=>["PersonPolitical"]}
OpenCalais looks like a great service. I am planning on using their service for a technology demo, merging in some of my own semantic text processing tools. I might also use their service for training other machine learning based systems. Reuters will also offer a commercial version with guaranteed service, etc.

Labels: , ,


Tuesday, April 17, 2007

Something like Google page rank for semantic web URIs

A key idea of the semantic web is reusing other people's URIs - this reuse forms interconnections between different RDF data stores. The problem, as I see it, is that while URIs are unique, we will end up with a situation where (as an example) there might be thousands of URIs for NewsItem. The Swoogle semantic web search engine is useful for finding already existing URIs - a good first step toward reuse.

Swoogle will return results using their type of 'page ranking'.

So, we need two things to accelerate the adoption and utility of the semantic web: web sites must start to include RDF and this included RDF must reuse common URIs for concepts and instances.

Labels:


This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]