Semantic web and linked data sources
Back in the 1980s I found useful information on the Internet by manually collecting and organizing the file description lists that people curated for FTP sites. Trying to find useful linked data sometimes feels like being back in the 1980s again, before the web and before content was indexed for search.
Although semantic search engines such as sindice.com exist, I still find it useful to keep a ready list of linked data sources, and I still spend some time browsing manually. To use sindice.com to find subject URIs for an application, enter a search term and then manually explore the Inspect (cache) links on the results pages. If you have not used sindice.com before, a useful exercise is to search for the city you live in and see what linked data you find.
I have spent time becoming familiar with commonly used linked data resources like:
- dbpedia.org - contains structured data extracted from Wikipedia infoboxes. Start exploring with the SPARQL endpoint and the faceted search browser.
- datahub.io - a public repository of data sets, for example the linked data movie database SPARQL endpoint. (Visiting a SPARQL endpoint in your browser will not yield useful results; you need to write some code or use the snorql web interface.)
- The W3C (w3.org) maintains a list of public linked data sources.
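Since a SPARQL endpoint is meant to be called from code rather than browsed, here is a minimal Python sketch of what "writing some code" can look like. It uses only the standard library and follows the SPARQL 1.1 Protocol: the query goes in a `query` parameter and JSON results are requested through the `Accept` header. The endpoint URL, the example resource URI, and the helper names (`build_query_url`, `bindings_to_rows`, `run_query`) are assumptions made for illustration; substitute the endpoint you are actually exploring.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL for illustration; use whichever endpoint you are exploring.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Example query: a few properties of one resource (the URI is only an example).
EXAMPLE_QUERY = """
SELECT ?p ?o WHERE {
  <http://dbpedia.org/resource/Berlin> ?p ?o
} LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query over HTTP and return the flattened result rows."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bindings_to_rows(json.load(response))
```

Calling `run_query(DBPEDIA_ENDPOINT, EXAMPLE_QUERY)` should return one dictionary per result row, keyed by the variable names in the `SELECT` clause (here `p` and `o`), provided the endpoint is reachable and honors the `Accept` header.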
While the long-term goal is to write information-gathering software agents that autonomously explore linked data on the web, I think it is fine for now to do some of the work manually. As an example of manual effort, there will in general be many different URIs on the web representing the same thing. You may well need to spend some time using the sameas.org service to identify equivalent URIs, which you can then declare to be the same in your applications using the owl:sameAs property.
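Once you have a set of equivalent URIs in hand, declaring them equivalent amounts to emitting one owl:sameAs triple per pair. The following is a small sketch, assuming you have collected the URIs yourself (for example by hand from sameas.org) and want N-Triples lines to load into your own RDF store. The function name `same_as_triples` and the example.org URI are hypothetical.

```python
# Full URI of the owl:sameAs property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(uris):
    """Return N-Triples lines linking the first URI to each of the others.

    Expects at least one URI; the first is treated as the preferred one.
    """
    canonical, *others = uris
    return [f"<{canonical}> <{OWL_SAME_AS}> <{other}> ." for other in others]


# Hypothetical input: one DBpedia URI and one URI from your own data set.
equivalent = [
    "http://dbpedia.org/resource/Berlin",
    "http://example.org/places/berlin",
]
for line in same_as_triples(equivalent):
    print(line)
```

Linking everything to a single preferred URI keeps the number of triples linear in the number of URIs. A store that does owl:sameAs reasoning can infer the remaining pairwise equivalences, though whether yours does is something to check.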
While the Semantic Web and linked data initiatives were slow to get started, there are now many public linked data sources and software tools, both for publishing linked data and for "spidering" it for use in applications. The idea is to be creative and imagine ways your applications could be better if they had access to much more data than you maintain yourself on your own systems. Another avenue for expressing your creativity as a developer is planning for the non-availability of third-party data, because this does happen.
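One simple way to plan for a third-party source being unavailable is to keep the last successful result on disk and fall back to it when a live request fails. The sketch below illustrates that idea under stated assumptions: `fetch` is any zero-argument function you supply that returns JSON-serializable data and raises `OSError` (which covers `urllib` network errors) on failure. The function name `fetch_with_fallback` and the cache file layout are hypothetical.

```python
import json
import os


def fetch_with_fallback(fetch, cache_path):
    """Return (data, is_fresh): live data if possible, else the cached copy.

    fetch      -- zero-argument callable returning JSON-serializable data
    cache_path -- file where the last successful result is stored
    """
    try:
        data = fetch()
    except OSError:
        # The remote source is unavailable: reuse the previous result if any.
        if os.path.exists(cache_path):
            with open(cache_path, encoding="utf-8") as f:
                return json.load(f), False
        raise  # No cached copy either, so let the caller decide what to do.
    # Live request succeeded: refresh the cache for next time.
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data, True
```

The returned flag lets the application tell users when they are looking at stale data. A real deployment would probably also want cache expiry and a retry policy, which are left out here.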