Note: on July 7, 2014, I switched my blog back to hosting on Blogspot: http://blog.markwatson.com

04 Aug 2013

Semantic web and linked data sources

Back in the 1980s I found useful information on the Internet by manually collecting and organizing FTP site file description lists that people would curate. Trying to find useful linked data sometimes feels like being back in the 1980s again before the web and before content was indexed for search.

While semantic search engines like sindice.com are available, I still find it useful to keep a ready list of linked data sources, and I still spend some time browsing manually. To use sindice.com to find subject URIs for an application, enter a search term and then manually explore the Inspect (cache) links on the results pages. If you have not used sindice.com before, a useful exercise is to search for the city you live in to find linked data about it.

I have spent time becoming familiar with commonly used linked data resources like:

While it is easier to browse manually using web interfaces like the faceted search browser for DBPedia or snorql web SPARQL query interfaces, ultimately you need to write code (preferably in a scripting language like Python or Ruby) to fetch useful data using SPARQL queries. I like to work "bottom up" by writing a collection of very short scripts that each get one type of data. When I feel confident that I can reliably retrieve linked data (assuming that the public SPARQL endpoints are running!), I use these scripts to build an application.
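As an illustration of one such very short script, here is a minimal Python sketch that fetches a single type of data (an English abstract) from a SPARQL endpoint. The endpoint URL, the Berlin resource, the dbo:abstract property, and the function names are assumptions chosen for the example, not something taken from a particular application. It relies only on the standard library and the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; substitute whichever SPARQL endpoint you use.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: the English abstract for one resource.
CITY_ABSTRACT_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""


def build_request(endpoint, query):
    """Build a GET request that asks the endpoint for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(json_text):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(json_text)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=10):
    """Send the query over the network and return the parsed rows."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling run_query(DBPEDIA_ENDPOINT, CITY_ABSTRACT_QUERY) should return a list of rows, each with an "abstract" key, provided the endpoint is up. Keeping the request building and the result parsing separate from the network call makes each piece easy to check on its own.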

While the long-term goal is to write information-gathering software agents that autonomously explore linked data on the web, I think that it is OK for now to do some of the work manually. As an example of manual effort: there will in general be many different URIs on the web representing the same thing. You may well need to spend some time using the sameas.org service to identify equivalent URIs, which you can then declare to be the same in your applications using the owl:sameAs property.
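Once you have a hand-collected group of equivalent URIs, declaring them equivalent can be sketched as follows. This is only one possible approach: it writes owl:sameAs statements as N-Triples lines and builds a lookup that maps every URI to one chosen representative. The function names and the example.org URI are hypothetical, and the sketch does not call sameas.org itself.

```python
# The owl:sameAs property URI from the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(uris):
    """Return N-Triples lines linking the first URI to each of the others."""
    canonical, *others = uris
    return [f"<{canonical}> <{OWL_SAME_AS}> <{other}> ." for other in others]


def canonical_lookup(groups):
    """Map every URI in each group to that group's first URI."""
    return {uri: group[0] for group in groups for uri in group}


# Hypothetical group of URIs that were found (manually) to name the same city.
berlin = [
    "http://dbpedia.org/resource/Berlin",
    "http://example.org/places/berlin",  # made-up URI for illustration
]

for line in same_as_triples(berlin):
    print(line)
```

The lookup from canonical_lookup([berlin]) lets application code normalize any incoming URI to the representative one before querying or merging data.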

While the Semantic Web and linked data initiatives have been slow getting started, there are now many public linked data sources and software tools, both for publishing linked data and for "spidering" it for use in applications. The idea is to be creative and imagine ways that your applications could be better if they had access to much more data than you maintain yourself on your own systems. Another avenue for expressing your creativity as a developer is planning for non-availability of third-party data, because this does happen.
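One simple way to plan for third-party data being unavailable is to keep the last good result on disk and fall back to it when a fetch fails. The sketch below is an assumption about how that might look, not a prescribed design: fetch_with_cache and its arguments are hypothetical names, fetch is any zero-argument function that returns JSON-serializable data, and network failures are assumed to surface as OSError (which covers urllib's URLError and socket timeouts).

```python
import json
import os


def fetch_with_cache(fetch, cache_path):
    """Return (data, is_stale), falling back to a cached copy on failure.

    On success the fresh data is saved to cache_path. If fetch raises
    OSError and no cached copy exists, the error is re-raised.
    """
    try:
        data = fetch()
    except OSError:
        if not os.path.exists(cache_path):
            raise
        # The remote source is down: serve the last good copy instead.
        with open(cache_path, encoding="utf-8") as cache_file:
            return json.load(cache_file), True
    # The fetch worked: refresh the cache for next time.
    with open(cache_path, "w", encoding="utf-8") as cache_file:
        json.dump(data, cache_file)
    return data, False
```

The is_stale flag lets the application decide what to do with old data, for example showing it with a warning rather than failing outright.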


Do you want to comment on this blog article? Then please email your comment. I will add your comment with your name (I will not show your email address when publishing your comment).