Monday, February 15, 2010

Semantic Web: an alternative for RDFa

A few years ago I thought that XHTML would eventually be widely used, but the W3C's decision to standardize on HTML5 (which I love for non-Semantic Web reasons) may have been the beginning of the end for RDFa, because RDFa is an XML application.

I believe that a better alternative in an HTML5 world is to keep RDF separate from web pages but have a clear set of rules for finding the RDF data files that correspond to web pages (either static or generated). One rule might be to look for a file named index.rdf for a site's root URL; for example, see if http://markwatson.com/index.rdf exists for http://markwatson.com. For a URL like http://markwatson.com/hobbies, look for http://markwatson.com/hobbies.rdf or http://markwatson.com/hobbies/index.rdf.
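Here is a minimal Python sketch of these lookup rules. It assumes only the two conventions described above: a `.rdf` file next to the page path, or an `index.rdf` inside it. The function names `candidate_rdf_urls` and `find_rdf` are hypothetical. Probing candidates with HTTP HEAD requests is one reasonable choice, not part of any standard:

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def candidate_rdf_urls(page_url):
    """Return RDF file URLs to try for a page, in order of preference."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path:
        # Site root, e.g. http://markwatson.com -> /index.rdf
        paths = ["/index.rdf"]
    else:
        # e.g. /hobbies -> /hobbies.rdf, then /hobbies/index.rdf
        paths = [path + ".rdf", path + "/index.rdf"]
    # Query strings and fragments are dropped: these rules are path-based.
    return [urlunsplit((parts.scheme, parts.netloc, p, "", "")) for p in paths]


def url_exists(url, timeout=5.0):
    """True if a HEAD request for url succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):  # URLError, HTTPError and timeouts are OSErrors
        return False


def find_rdf(page_url, exists=url_exists):
    """Return the first candidate RDF URL that exists, or None."""
    for url in candidate_rdf_urls(page_url):
        if exists(url):
            return url
    return None
```

If you pass a different `exists` callable, such as one that checks a local cache, you can exercise the same rules without network access.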

Although CMS support (e.g., Drupal) for RDFa and helper libraries like the RDFa Rails plugin might make it fairly easy for some web sites to provide RDFa, I think that we need something simpler that might be adopted by more web sites.

I am writing an open source tool (that will be an example program in the Semantic Web book I am writing) that will generate RDF data from web pages. I'll post a link when the code is ready.



Thursday, January 21, 2010

The beauty of LaTeX: my AllegroGraph book becomes two books, one for JVM languages and one for Lisp

I have been working on and off for 16 months on a book about Semantic Web (or Linked Data) application programming using the AllegroGraph product. I have decided to substantially increase the scope of this applications/tutorial style book to also include support for Sesame. The figure on the left shows the software architecture road map for the book using JVM languages.

I am splitting the book into two volumes, one for JVM languages and one for Lisp. Using LaTeX makes it really easy to share small amounts of common material so that both books stand on their own. LaTeX also makes it easy to combine both books into one all-inclusive book, eliminating the duplicated parts.

Both AllegroGraph and Sesame are great development tools, but they fill different needs. On projects that can support a license fee of several thousand dollars per server per year, I would choose Common Lisp + AllegroGraph for development. AllegroGraph is very scalable and the Lisp APIs are really nice to work with. For Java (or other JVM language) applications, I would still choose AllegroGraph for the scalability and support if a project can support the license costs. The good thing is that for most small to medium size projects, either the free version of AllegroGraph or the open source Sesame project is a good choice, so as a developer you have some real flexibility. There are also other good RDF data store platforms like Jena, Joseki, Kowari, Redland, 4store, the Swi-Prolog Semantic Web library, Talis, Virtuoso, etc., but I have relatively little (or in some cases no) experience with these. I use AllegroGraph and Sesame, so that is what I write about.



Monday, July 20, 2009

Book project, Google Wave, and a kayaking video

Except for some consulting work, my big project is a new book on using AllegroGraph for writing Semantic Web applications. Lots of work, but also a lot of fun.

I received a Google Wave Sandbox invitation today. I am going to try to spend an hour or two a day with Wave to get up to speed. Fortunately, I am completely up to speed with the Java version of AppEngine (initially, Wave robots etc. are hosted on AppEngine, in either its Java or Python version), and I have some experience with GWT, so I should already be in good shape, but I need to write some code :-)

My wife took a short video of me kayaking yesterday.



Wednesday, July 08, 2009

Continuing to work on my AllegroGraph book

I started this book late last year, but set it aside to write my Apress Ruby book Scripting Intelligence: Web 3.0 Information Gathering and Processing.

I don't think that the market will be large for an AllegroGraph (AG) book, but after using AG on one customer project and experimenting (off and on) with it for several years, I decided that it was Semantic Web technology worth mastering. AG is a commercial product, but a free server version that supports Lisp, Ruby, Java, and Python clients is available; it is limited to 50 million RDF triples (a large limit, so many projects can simply use the free version).

AG supports Sesame's REST-style APIs (Sesame is an open source Java RDF data store), so if you stick with SPARQL and RDFS-only reasoning, your code is also portable to a BSD-licensed alternative. That said, my reason for using AG is all of its proprietary extra goodies!
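This small Python sketch illustrates the portability point. It talks to a store over HTTP and assumes the server exposes the Sesame-style `/repositories/<id>` endpoint with a `query` parameter. It also assumes the server can return the standard SPARQL JSON results format. The server URLs and the `books` repository name are hypothetical placeholders. `run_select` is my own helper, not part of either product's client library:

```python
import json
import urllib.parse
import urllib.request


def sparql_query_url(server_base, repository, query):
    """Build a GET URL of the form <server>/repositories/<id>?query=<SPARQL>."""
    base = server_base.rstrip("/")
    repo = urllib.parse.quote(repository, safe="")
    return f"{base}/repositories/{repo}?" + urllib.parse.urlencode({"query": query})


def run_select(server_base, repository, query, timeout=30.0):
    """Run a SPARQL SELECT and return a list of {variable: value} dicts."""
    request = urllib.request.Request(
        sparql_query_url(server_base, repository, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    # SPARQL JSON results: results -> bindings -> {var: {"type": ..., "value": ...}}
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


QUERY = "SELECT ?s ?label WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 10"

# Hypothetical servers: only the base URL differs between the two stores.
SESAME = "http://sesame.example.com/openrdf-sesame"
ALLEGROGRAPH = "http://allegrograph.example.com"
```

Calling `run_select(SESAME, "books", QUERY)` or `run_select(ALLEGROGRAPH, "books", QUERY)` is the kind of portability you keep by sticking to SPARQL. Code that relies on a store's proprietary extras gives that up.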

In addition to a few Lisp, Python, Ruby, and Java client examples, I am going to incorporate a lot of useful Common Lisp utilities for information processing that I have been working on for many years: this will motivate me to package up a great deal of my Common Lisp code and release it with an open source license. I plan to release the book for free as a PDF file, and as a printed book for people who want to purchase it. The book and the open source examples should be available before the end of this year.



Wednesday, February 04, 2009

Web 3.0: not just Semantic Web and Linked Data, also interop on languages and platforms

I am working on a 'Web 3.0' book, so I am having a lot of fun and extending my own knowledge of linked data and other Semantic Web technologies. However, when thinking about the evolution of the web, I don't think that distributed, semantically enabled data stores are anywhere near the whole story. The evolution of the web now coincides with a very large change in our worldwide society: a move to what I call the "great frugality" of value/production based societies and economic systems. While I look forward to a worldwide shift towards increased emphasis on local infrastructure (definitely food production and, when possible, light manufacturing), the evolving web is what can still keep us connected, both to friends and colleagues with the same interests and to potential business partners, no matter where we live.

A big part of a shift towards a value/production based Web 3.0 that combines material for human readers with linked business software systems is the reduction of cost through open source software. It is clear that when using and building highly distributed systems on the web platform, we need to take advantage of multiple platforms (Java, Ruby/Rails, PHP, Python/Django, etc.). I noticed that IBM is releasing a new version of Project Zero that provides an integrated Java and PHP deployment platform. My personal platforms of choice are Rails and server-side Java, so I prefer Sun's GlassFish/JRuby/Rails/Java bundle.

The point that I am making is that platform choice is often guided by what combination of major open source web application frameworks best fit our business needs. A secondary concern is how we merge and integrate applications like (for example) PHP based SugarCRM, Java Business Intelligence stacks, and custom Ruby on Rails applications.

The final piece of the "great frugality" is learning to live with and accept open source licenses like the GPL and AGPL, which to a large degree force the sharing of infrastructure software. Refusing to do so can be an expensive mistake: it means failing to take advantage of the cost reduction from open source infrastructure, while competitive advantages, or at least efficiency and profitability, can still come from a company's own business processes and knowledge.



Wednesday, December 03, 2008

Good article on adding security to Semantic Web applications

This article on the Sun BabelFish blog provides good design notes and some implementation notes on using SSL with (possibly self-signed) certificates to authenticate software agents that have write access to the URIs they are associated with. Useful stuff.



Monday, November 24, 2008

Something fun: new book project on the Semantic Web using AllegroGraph

The book is about 15% done (about 50 pages so far) and a rough draft PDF file is available. I realize that the market for this book will be small because AllegroGraph is a commercial product. However, Franz does make a non-commercial use version available for free, so my expectation is that when the book is done (between 2 and 6 months, depending on how busy my consulting schedule is) a fair number of people will enjoy the book with the non-commercial version of AllegroGraph. The finished book will be available for free as a PDF file and as a print book from lulu.com.

This book is fairly easy for me to write because I have existing coding experiments for just about all the Semantic Web application examples in the book. Also, since there are so many good Semantic Web references on the web and in existing books, I am only covering the SW technology that is used in the book examples. I want the book to be self-contained: just enough tutorial and reference material covering AllegroGraph and other SW technologies so that readers can completely understand the application examples.



Monday, October 06, 2008

Swi-Prolog and the Semantic Web

A long time ago, after trying other tools, I did my first useful experiments with RDF using Swi-Prolog's semantic web libraries. Since then, I have also been using other tools: mostly Sesame, some Jena, and some of Franz's commercial AllegroGraph product (which, BTW, I am planning to write a short 'applications' book about after I finish my Java AI book).

This morning I noticed (see the linked PDF paper) that the architecture of riese (RDFizing and Interlinking the EuroStat Data Set Effort) uses Swi-Prolog on the back end. Very cool. The riese web site itself is interesting: human-readable web pages with embedded RDFa for semantic web software agents. (Make sure you view the page source in your browser.)
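If you have not looked at RDFa markup, here is a deliberately tiny Python sketch (standard library only) of how a software agent might pull (subject, property, value) statements out of a page like that. It is my own simplification, not a conforming RDFa processor. It handles only `about`, `property`, and `content`, and leaves prefixed names such as `dc:title` unexpanded. It also assumes well-formed, XHTML-style markup. The class name and the example URLs are hypothetical:

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, property, value) triples from about/property/content."""

    def __init__(self, page_url):
        super().__init__()
        self.subjects = [page_url]  # current subject is the last entry
        self.open = []              # one frame per open non-void element
        self.triples = []

    def _start(self, attrs, is_void):
        attrs = dict(attrs)
        about = attrs.get("about")
        subject = about if about is not None else self.subjects[-1]
        prop = attrs.get("property")
        if prop is not None and "content" in attrs:
            # The value comes from @content, not from the element's text.
            self.triples.append((subject, prop, attrs["content"]))
            prop = None
        if is_void:
            return
        self.subjects.append(subject)
        self.open.append({"subject": subject, "prop": prop, "text": []})

    def handle_starttag(self, tag, attrs):
        self._start(attrs, tag in VOID_ELEMENTS)

    def handle_startendtag(self, tag, attrs):  # e.g. <meta ... />
        self._start(attrs, True)

    def handle_data(self, data):
        # Text counts toward every enclosing element that carries @property.
        for frame in self.open:
            if frame["prop"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.open:
            return
        frame = self.open.pop()
        self.subjects.pop()
        if frame["prop"] is not None:
            value = "".join(frame["text"]).strip()
            self.triples.append((frame["subject"], frame["prop"], value))


def extract_triples(html_text, page_url):
    parser = TinyRDFaExtractor(page_url)
    parser.feed(html_text)
    parser.close()
    return parser.triples
```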



Monday, September 15, 2008

Distributed robust system for provenance and trust in Semantic Web Applications and Tim Berners-Lee's new World Wide Web Foundation

With some reluctance, I am going to toss out what I think is a great business idea that is too large and resource-intensive for me to pursue myself: develop the infrastructure and business models for a network graph (not a hierarchy) of "trust providers", similar to SSL certificate issuers like Thawte, but for semantic web data.

First, I want to describe the problem to be solved: assuming the existence of RDF/RDFS/OWL data on the web, how do you know what is correct and what is faked for whatever nefarious reasons? What is the provenance of the data? Even human readers have a difficult time separating out real information from rumors, errors, and outright lies on the web.

Proposed solution: organizations "sign" data with a certificate, either for a fee or for some other motivation. Using current technology, RDF triples would be reified with one or more "trust tokens" (also implemented as RDF) from known signers who vouch for the provenance and accuracy of the data. For now, this rating would have to be performed by human analysts, but it could hopefully be done quickly and not too expensively with something like Amazon's Mechanical Turk system. I don't see this trust measurement as a Boolean trust/no-trust value, but rather as a value in a numeric range. Further, known signers can rate other signers, so signers would have their own trust scores. The accuracy and provenance of data could thus be assigned a trust score based on the trust ratings given by one or more signers and the trust scores of the signers themselves. The problem is to make this process of assignment a small fraction of the cost of producing RDF/RDFS/OWL knowledge sources while adding significant extra value.
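The proposal leaves the scoring rule open. Here is one possible scheme, sketched in Python under my own assumptions:

- ratings are numbers in [0, 1];
- a piece of data's trust is the trust-weighted average of its signers' ratings;
- a signer's trust is recomputed the same way from the ratings other signers give it, starting from a few fixed "seed" signers.

All names are hypothetical, and this is not an established standard:

```python
def data_trust(ratings, signer_trust):
    """Trust-weighted average of ratings.

    ratings: {signer: rating in [0, 1]} for one piece of data.
    signer_trust: {signer: trust in [0, 1]}.
    """
    total_weight = sum(signer_trust.get(s, 0.0) for s in ratings)
    if total_weight == 0:
        return 0.0  # no trusted signer vouches for this data
    weighted = sum(signer_trust.get(s, 0.0) * r for s, r in ratings.items())
    return weighted / total_weight


def signer_trust_scores(peer_ratings, seeds, iterations=20):
    """Propagate trust from seed signers through signer-to-signer ratings.

    peer_ratings: {rater: {ratee: rating}}.
    seeds: {signer: fixed trust}, e.g. well-known anchor organizations.
    """
    signers = set(seeds) | set(peer_ratings)
    for given in peer_ratings.values():
        signers |= set(given)
    trust = {s: seeds.get(s, 0.0) for s in signers}
    for _ in range(iterations):
        updated = {}
        for s in signers:
            if s in seeds:
                updated[s] = seeds[s]
                continue
            # Ratings s receives from others (self-ratings are ignored).
            received = {rater: given[s] for rater, given in peer_ratings.items()
                        if s in given and rater != s}
            updated[s] = data_trust(received, trust)
        trust = updated
    return trust
```

Each rating is weighted by the trust of the signer who gave it, so a rating from an untrusted signer has no effect. A PageRank-style damping factor is one alternative way to handle signers that nobody has rated yet.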

There is a lot of literature; try searching for "web of trust semantic web" and "provenance semantic web". When I read about Tim Berners-Lee's new World Wide Web Foundation this morning I started to hope that they might develop some open and free infrastructure software to support trust annotation of data. The high economic cost of quality trust-rated RDF/RDFS/OWL knowledge sources is definitely a problem, but it is difficult to even imagine the possible range of financial and social benefits. Having standard open source software to manage trust would help reduce costs for providing trust and provenance data through a network of cooperating trust providers.



Thursday, July 24, 2008

Dynamic language 'goodness': comparing JRuby and Java Semantic Web example programs

Although there are several Semantic Web libraries or frameworks that I like to use, I had to choose just one for a DevX article that I am finishing up. I chose Sesame. After covering what I think are some "big wins" of using RDF/RDFS/OWL (for some applications), I present some example programs that I hope readers will have lots of fun with. The "wrapper" library that I wrote for Sesame works fine with both Java (which Sesame is written in) and JRuby. I must say that for experimenting with Sesame, JRuby is a lot nicer because the example programs are much shorter, and with Ruby duck typing it is easier to write callback handlers, etc. for my wrapper library. Being able to work interactively in a JRuby jirb shell is also a big win for experimenting with code, different SPARQL queries, etc.
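This small Python sketch makes the callback-handler point concrete in another language with similar duck typing. `TinyTripleStore` is a hypothetical in-memory stand-in, not Sesame's API or my actual wrapper library. Its `query` method calls `handler.process(s, p, o)` for every matching triple. Any object with a `process` method therefore works as a handler, with no interface to declare:

```python
class TinyTripleStore:
    """Toy in-memory triple list standing in for a real RDF store."""

    def __init__(self):
        self.triples = []

    def add(self, s, p, o):
        self.triples.append((s, p, o))

    def query(self, pattern, handler):
        """pattern is (s, p, o) with None as a wildcard.

        Calls handler.process(s, p, o) for each match; the handler only
        needs a process method (duck typing), not a declared interface.
        """
        for triple in self.triples:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                handler.process(*triple)


class CollectSubjects:
    def __init__(self):
        self.subjects = []

    def process(self, s, p, o):
        self.subjects.append(s)


class CountMatches:
    def __init__(self):
        self.count = 0

    def process(self, s, p, o):
        self.count += 1
```

In Java, an equivalent wrapper would typically declare a handler interface that every callback class must implement. That extra ceremony is part of why the dynamic-language versions of such examples come out shorter.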



Saturday, May 17, 2008

Book review: "Semantic Web for the Working Ontologist"

Dean Allemang and Jim Hendler's book provides a good overview of data modeling for the Semantic Web. Amazon purchase link: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. As someone who has invested a lot of time in both open source tools (Jena, Redland, Sesame, OwlApi, Protege, and Swi-Prolog's Semantic Web libraries) and a commercial product (Franz AllegroGraph), I find it refreshing to read a good book that abstracts away details like specific tools and the RDF/XML serialization and covers concepts and modeling how-to issues. I enjoyed reading this book at a high level, occasionally pausing to experiment at the low level with OwlAPI, Redland, Protege, Sesame, and AllegroGraph. BTW, I wish that someone had told me years ago to never view the XML serialization of RDF :-) The authors' choice of showing the XML serialization once and then using N3 is very good.

There are a few tiny annoyances with this book, the primary one being small errors in the text that should have been caught in technical review. These do not, however, detract at all from the usefulness of the book; it is just too bad that such a well thought out book has easily fixed mistakes.

For me, one of the potential uses of this book is to lend it or recommend it to customers who might want or need to use Semantic Web technology. I make my living as a consultant, and it is important to have well-informed customers; this book will provide a good understanding and rationale for technically inclined customers, especially people with strong domain knowledge who want to (and can) directly participate in modeling efforts.



Saturday, May 05, 2007

Interesting technology: AllegroGraph

I am using Franz's AllegroGraph for two proof-of-concept projects for a customer: one using the Java APIs (free version) and one using the Lisp version, which is unlimited in the size of stored data. RDF storage and querying are not easy technologies to use (at least for me), but they look very promising.

The thing that I find interesting about using AllegroGraph is that you are dealing with disk-based persistent data, but not dealing with objects - not dealing with object relational mapping, etc. Instead, you work with graph data structures that are stored on disk, with parts cached in memory. Interesting stuff.
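Here is a rough illustration of the idea: disk-resident graph data with a small in-memory cache, and no object-relational mapping. This Python sketch is built on the standard library's `shelve` and `functools.lru_cache`. It is my own toy example and says nothing about how AllegroGraph is actually implemented. `DiskGraph` and its methods are hypothetical names:

```python
import os
import shelve
import tempfile
from functools import lru_cache


class DiskGraph:
    """Toy disk-backed graph: subject -> list of (predicate, object) edges."""

    def __init__(self, path, cache_size=1024):
        self.db = shelve.open(path)  # edges are pickled into a dbm file on disk
        # Per-instance LRU cache keeps recently read adjacency lists in memory.
        self._cached_edges = lru_cache(maxsize=cache_size)(self._load_edges)

    def _load_edges(self, subject):
        return tuple(self.db.get(subject, ()))

    def add(self, subject, predicate, obj):
        edges = list(self.db.get(subject, []))
        edges.append((predicate, obj))
        self.db[subject] = edges          # write through to disk
        self._cached_edges.cache_clear()  # simplest way to keep the cache consistent

    def edges(self, subject):
        return self._cached_edges(subject)

    def close(self):
        self.db.close()


def demo():
    """Write a graph, close it, reopen it, and read the edges back from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "graph")
        graph = DiskGraph(path)
        graph.add("mark", "knows", "carol")
        graph.add("mark", "likes", "kayaking")
        graph.close()

        reopened = DiskGraph(path)
        result = reopened.edges("mark")
        reopened.close()
        return result
```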

Still, dealing with RDF is not optimal compared to dealing with graphs in memory. As an example: I used to work a lot with Rete networks using Lisp (hacking Charles Forgy's Lisp code), and dealing with graph data structures built up with Lisp lists, cons cells, etc. is just easier to do. In-memory graphs, semantic networks, etc. are just easier for me to wrap my thoughts around. However, approaches like AllegroGraph have the advantage of scalability.



Tuesday, April 17, 2007

The Semantic Web, Parrots, and AI

Two different subjects today: the semantic web, which I just wrote about in a new entry on my AI blog, and our pet parrot. One (possible) route to understanding how to do AI is to appreciate problem-solving abilities in the natural world. Our young Meyer's parrot is a good problem solver, but it takes him a while. Earlier this morning, I was reading in bed and had fetched our parrot so he could run around like crazy on and under our bedspread (good for burning off energy). Our parrot wanted to get at some of my stuff on my night stand, but his way was blocked except for a space between two water bottles which, try as he might, he could not squeeze through, and he could not move the water bottles. He spent about 2 minutes walking back and forth, thinking about the sad situation he was confronted with, when he suddenly lowered one wing, raised the other, moved his shoulders close together, and then simply walked right through the "water bottle gap" :-)

Our small parrot must have some abstract world model of objects and his own body. Why and how he thought of raising one shoulder while lowering the other to compress the width of his shoulders is a mystery to me, but I believe that this was possibly an example of abstract thinking.


