Friday, February 09, 2007
Semantic Web: through the back door with HTML and CSS
Saturday, December 16, 2006
Public web applications and knowledge workers
The key thing is that all of these public web services allow you to export your data for archival backup, use in utility scripts and programs, etc. Most also support, in addition to manual data export, web service interfaces.
The only work that I perform "locally" is programming in Emacs+Common Lisp, programming in various languages in Eclipse, and writing large documents using either LaTeX or OpenOffice.org. Even for "local work", all of my working materials are stored on leased managed servers in subversion repositories.
Labels: knowledge management
Wednesday, October 18, 2006
More on personal information management
While there are very good systems like Piggy Bank for creating your own meta-data store (in RDF) for web sites that you visit, this requires extra work. The sweet spot for automating the collection and use of our own meta-data and data is being able to automatically use information that you may already have, such as your own del.icio.us tags, tags you have applied to RSS feed items in Google Reader, etc. Most of us use public web portals as part of our work flow, but there are definitely unexplored possibilities for customizing our own knowledge management environments.
Labels: knowledge management
Sunday, September 03, 2006
Disconnect between thinking about a problem and programming
While state of the art IDEs like IntelliJ for Java and VisualStudio for .Net languages provide a comfortable working environment, I must say that both Java and the .Net languages are poor choices for many programming tasks.
Scripting languages like Ruby and Python narrow this disconnect between thinking and programming in one important way: for small programming tasks, very short programs are sufficient, so we can keep both the problem and the code in mind at the same time.
What about large projects? There are two good alternatives in programming languages: Common Lisp and Smalltalk:
Common Lisp lends itself really well to growing your own application specific language (using macros if you like, and functions). Once you build up an application specific language, a lot of the complexity of even complex programs goes away. Even more importantly, domain specific languages should help close the gap between thinking about problem solutions and programming these solutions.
The downside of Common Lisp is that while Emacs based IDEs are effective environments, even with add on code browsers, I find exploring large Common Lisp software projects to be tedious.
Smalltalk implementations generally have great code browsers because the simplicity and regularity of the language make it easier to automatically process the structure and semantics of code. Smalltalk blocks and closures, like in Ruby, allow many concise coding tricks - shorter programs are easier to understand and modify.
Labels: IT, knowledge management, Lisp, Smalltalk
Thursday, August 10, 2006
OpenCyc 1.0, AI in general
To me, AI is all about writing software that makes decisions given uncertain and sometimes contradictory information. AI is about modeling problem domains and working both within that model and changing the model as new information becomes available. AI is about using problem domain models to provide human users with useful, interesting, and unexpected results by matching a model of a user's inquiry. AI is about solving the game of Go: the branching complexity of the game is so great that having perfect information is not enough.
So, a tool like OpenCyc is not really a match to my personal view of what AI development is: Cyc and OpenCyc try to define an ontology of real world common sense knowledge. I appreciate decades of hard work, and I have myself spent many hours experimenting with earlier versions of OpenCyc - so kudos for the 1.0 delivery.
Still, I tend to view "AI problems" as being problems restricted to narrow domains but still made very difficult or impossible by uncertainty, missing information, and time or memory constraints on algorithms.
Labels: AI, knowledge management
Sunday, August 06, 2006
Globally unique identifiers
It is possible to write software that detects duplicate feeds, but comparing two articles is not an inexpensive operation, and when comparing a very large number of feeds, the O(N^2) runtime is painful. I have experimented with a much less accurate algorithm: hash n-grams of articles and check for duplication using a hash lookup. I have found that this gives poor results - at least in my experiments. If you do partial matching of n-grams, you are back to O(N^2). (If anyone knows a good way to handle this, let me know :-)
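As an illustration only (this is not the code from my experiments), here is a minimal Python sketch of one common variant of the n-gram hashing idea: hash word n-grams, build a hash index from n-gram to article, and only score pairs of articles that share at least one n-gram. The function names, the n-gram size, and the threshold are all assumptions made for the example. Note that n-grams shared by many articles still produce many candidate pairs, so the worst case remains quadratic.

```python
import hashlib
import re
from collections import defaultdict


def shingles(text, n=3):
    """Return the set of hashed word n-grams for one article."""
    words = re.findall(r"\w+", text.lower())
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.md5(g.encode("utf-8")).hexdigest() for g in grams}


def find_near_duplicates(articles, n=3, threshold=0.5):
    """articles: dict of id -> text. Returns a sorted list of (id_a, id_b, jaccard)."""
    sets = {aid: shingles(text, n) for aid, text in articles.items()}

    # Hash lookup table: n-gram hash -> ids of the articles that contain it.
    index = defaultdict(set)
    for aid, grams in sets.items():
        for g in grams:
            index[g].add(aid)

    # Count shared n-grams, but only for pairs that share at least one.
    shared = defaultdict(int)
    for ids in index.values():
        ordered = sorted(ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                shared[(a, b)] += 1

    results = []
    for (a, b), count in shared.items():
        union = len(sets[a]) + len(sets[b]) - count
        score = count / union  # Jaccard similarity of the two n-gram sets
        if score >= threshold:
            results.append((a, b, score))
    return sorted(results)
```

Calling find_near_duplicates with a dictionary of article ids and texts returns only the pairs whose n-gram overlap meets the threshold; articles with nothing in common are never compared.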
Globally unique identifiers help solve many duplication problems and make it easier to implement container relations. In general, Atom just seems to be a better and more scalable platform than RSS 2.0 for complex new applications.
Labels: IT, knowledge management
Wednesday, July 19, 2006
Good point: disinformation and the Semantic Web
After Berners-Lee's talk, Peter Norvig in the question period posed the problem of people publishing fake data in much the same way they try to cheat to increase the page ranking on their web sites. I had not thought of that problem, and it is a tough problem to deal with: what happens to trust mechanisms when some people actively try to fake the meta data on their web sites? While I was walking to lunch with Norvig a few years ago, I brought up a related problem: assume that for narrow domains of discourse (e.g., political news, financial news, etc.) that you could largely automate the creation of RDF from natural language text on web sites. I personally believe that this is achievable right now, with a lot of effort. The problem that I posed at lunch was (besides the technical challenges of dealing with potentially trillions of RDF triples) the problem of dealing with lots of conflicting information while factoring in different levels of trust.
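To make the conflicting-information problem concrete, here is a small Python sketch under strong simplifying assumptions: each statement is a (source, subject, predicate, object) tuple, each source has a hand-assigned trust weight between 0 and 1, and every predicate is treated as single-valued so that two different objects count as a conflict. The function name, the data layout, and the weighted-vote rule are all hypothetical; a real system would need something far more careful.

```python
from collections import defaultdict


def resolve_conflicts(statements, trust):
    """statements: iterable of (source, subject, predicate, obj) tuples.
    trust: dict of source -> weight in [0, 1]; unknown sources get 0.
    Returns {(subject, predicate): (best_obj, share_of_total_trust)}."""
    # For every (subject, predicate), add up the trust behind each claimed object.
    votes = defaultdict(lambda: defaultdict(float))
    for source, subject, predicate, obj in statements:
        votes[(subject, predicate)][obj] += trust.get(source, 0.0)

    resolved = {}
    for key, candidates in votes.items():
        total = sum(candidates.values())
        best = max(candidates, key=candidates.get)
        share = candidates[best] / total if total else 0.0
        resolved[key] = (best, share)
    return resolved
```

The share value gives a rough indication of how contested a statement is: a value near 1.0 means the trusted sources agree, while a value near 0.5 means the evidence is split.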
Labels: knowledge management
Sunday, June 11, 2006
Different document types, different work flows
I also like my design artifacts to look good, even if I am the only person who sees them. Two highly recommended tools are AbiWord and OmniGraffle because their default file formats are plain XML text files.
I admit that disk space and network bandwidth are close to free now but I still like to keep a project directory small. By using design tools that have small file footprints, most projects (source files, build scripts, tests, and design artifacts) are small and tidy.
Labels: knowledge management
Thursday, May 25, 2006
New PowerLoom site
Labels: AI, knowledge management
Sunday, May 21, 2006
Dealing with Knowledge Artifacts that are still in paper form
Fortunately, most journals are also available online, and articles can be copied for personal use. Before throwing out old journals I take a quick look for articles that might be of use in the future and I do a web search including the journal name and the article name. Articles in the ACM Portal or AAAI Digital Library (for example) can be copied locally for personal use by members after logging in. I used to keep journals, in paper form, almost forever but now having just high (possible) value articles stored on my local file system and indexed for search is good enough. I usually just save plain text, but if figures look especially useful I save them also.
Books are more of a problem. When we moved 7 years ago, I reduced the size of my technical library from about 400 books to about 150. Now when I purchase new books, I try to get rid of an equal number, either giving them to my local library or selling them at a local used bookstore. A few times a year I go to reference a book that I have let go, but in general, I think that my technical library might be more useful with fewer books because I can find things very quickly.
Anyway, local storage works well for knowledge artifacts that other people create - usually storage and archival for personal use is allowed. For stuff that I produce (except for my published books that are owned by my publishers), I prefer public web storage.
I find that del.icio.us is a fantastic resource for organizing bookmarks for both knowledge artifacts on the web and for fun stuff that I might want to find again.
For fun stuff: I used to keep travel and family photographs on my KnowledgeBooks.com web site, but now I keep the best pictures on Flickr. I am tempted to start storing video clips (and I have some great stuff like dancers in India and Africa, etc.) on video.google.com when I have time.
Labels: knowledge management
Tuesday, May 09, 2006
Integrating a semantic network with a reasoning system
Labels: AI, knowledge management, Lisp
Sunday, April 30, 2006
Information: organization vs. overload
There is so much information to absorb and use for any type of knowledge worker that it takes time and effort to stay up to date with what we need to do our jobs. Much of my work involves writing custom software (usually layered on open source) for information management in specific industries/applications (large scale search, document categorization and repository maintenance, AI style data mining, agent technology to assist by bringing important things to users' attention, etc.) but I find it ironic that I cannot seem to set aside the time to write much custom code for my own information needs (take care of customers first!). And, so far, it always seems to take custom code to solve specific information management problems. From what I have seen, there is not yet any silver bullet.
I have some ideas for exactly what tools I want for my own work flow and how I might "productize" them, but for now I have an ad hoc system using subversion repositories, local directories organized by topic and augmented with local search, and del.icio.us to organize bookmarks for material on the web. If I can set aside the time, I would like to integrate more of what I use in my own work flow.
What about search? Well, search is not information management. If/when semantic web technologies become more widely used, then software agents will be able to treat the web as an information source and be able to do research either without human intervention, or at least be valued assistants. CEOs of companies have well trained staff to filter and organize information - what will the effects be on society and the economy in the future when most people will have free or inexpensive software agents that can compete with well trained human staff? A nice thought but there will always be selective advantages to better information management systems.
Labels: knowledge management, search
Monday, April 10, 2006
Working backwards on the Semantic Web
I have had a little free time today to work on a pet project that Obie inspired: write Ruby wrapper code for making it easier to deal with RDF/RDFS/OWL by loading files and automatically mirroring classes, etc. I would work in Protege, then write Ruby code to consume the RDF/RDFS/OWL files so that I could work in a decent language. OK, fine.
However, this all still seems more than a little wrong to me. Since the Semantic Web is largely about ontologies and knowledge representation, why turn our backs on decades of AI research? Why not work with knowledge representation systems written in Lisp (or Prolog, Ruby, etc.) and have a back end that serializes to XML/RDF/RDFS/OWL as required? Really, use the best notation possible for all of the human-intensive work.
While Protege is a terrific tool, I still think that using older technologies like KEE, Loom, PowerLoom, etc. with optimal programming environments makes a lot of sense. Any language with good introspection (like Ruby or Common Lisp) would work for supporting XML serialization when required.
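To illustrate the introspection point, here is a rough sketch in Python (used here only because it is compact; the same idea applies in Ruby or Common Lisp). It walks the fields of an object and emits RDF/XML on demand. The Person class, the example.org namespace, and the rule that a uri field becomes rdf:about are all assumptions made up for this example, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/kb#"  # hypothetical vocabulary namespace


@dataclass
class Person:  # hypothetical model class standing in for a KR concept
    uri: str
    name: str
    employer: str


def to_rdf_xml(obj):
    """Serialize one dataclass instance to an RDF/XML string via introspection."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # The class name becomes the typed node element; 'uri' becomes rdf:about.
    node = ET.SubElement(
        root,
        f"{{{EX_NS}}}{type(obj).__name__}",
        {f"{{{RDF_NS}}}about": obj.uri},
    )
    # Every other field becomes a literal-valued property element.
    for field in fields(obj):
        if field.name == "uri":
            continue
        prop = ET.SubElement(node, f"{{{EX_NS}}}{field.name}")
        prop.text = str(getattr(obj, field.name))
    return ET.tostring(root, encoding="unicode")
```

The modeling work stays in ordinary classes, and the XML is generated only at the boundary where another tool needs it.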
Labels: knowledge management
Wednesday, April 05, 2006
Interesting: Bill Gates's work flow; knowledge management
Knowledge management is something that I am keenly interested in because it ties together lots of technologies that I am interested in: ontologies, knowledge representation, data constraints, server side technologies, and natural language processing with text mining, etc.
Labels: knowledge management
Monday, March 27, 2006
We *really* need semantic attributes on web links
Sure, this is a problem, but the real solution is setting a standard (could be grass roots) for adding optional attributes to HTML links (I wrote about this in "More on link types and the Semantic Web" a few months ago).
Jon Udell has a great idea for combining CSS style ids with semantic information. The microformats people have good ideas for using the rel attribute to specify the relationship that a link expresses (license, help, type-of, friend, etc.).
What I am most interested in is having enough usable information on the web to enable me to write software tools that can automatically extract information on trust relationships between information sources on the web. However, the range of applications is just about infinite - given more metadata.
Labels: CSS, knowledge management
Tuesday, March 14, 2006
Useful tool for searching millions of RSS feeds; some of my own projects
I have been working on a customized information portal KBSportal.com for about a year. I took down the Java version a few months ago when I started on a newer version written in Ruby and Rails. I was hoping to have it back on line by now, but working on my new Ruby book and consulting for a few customers takes priority.
I created a 'knowledge based' recipes/cooking portal CJsKitchen.com last year and I also want to work on this technology more this year. I originally started to use the USDA nutrition database, then stopped when I realized how inaccurate estimating the nutritional content of recipes is given variations in cooking and quality of ingredients. Anyway, I want to take another shot at providing nutritional information along with displayed recipes. One cool thing about CJsKitchen.com is that you can keep a personal database on the web site of the ingredients you have on hand - and optionally only see recipes that you have the ingredients to make. I also have a first cut at an AI recipe agent to help you use up ingredients that you have.
Labels: knowledge management, nutrition
Wednesday, January 11, 2006
More on link types and the Semantic Web
One problem with the rel attribute is that it does not support name/value pairs. For example, I might want an <a href=...> tag to have an attribute for how much I agree with the linked page. Something like this: <a href=... agreewith="3" ...> where the numeric constant might be assumed to be in the range [-10,10]. Still, even rel attribute values like disagreewith, agreewith, etc. would be useful.
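As a sketch of how a software agent might consume such links, the following Python code uses the standard library HTML parser to collect each link's href, its rel values, and the agreewith score. Keep in mind that agreewith is only the hypothetical attribute proposed above, not part of any HTML standard, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class LinkSemanticsParser(HTMLParser):
    """Collect href, rel values, and the hypothetical agreewith score per link."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        if not attr.get("href"):
            return  # skip anchors that are not links
        score = None
        raw = attr.get("agreewith")
        if raw is not None:
            try:
                # Clamp to the [-10, 10] range suggested in the post.
                score = max(-10, min(10, int(raw)))
            except ValueError:
                score = None  # ignore malformed values
        self.links.append({
            "href": attr["href"],
            "rel": (attr.get("rel") or "").split(),
            "agreewith": score,
        })


def extract_links(html):
    """Return one dict per link found in the given HTML text."""
    parser = LinkSemanticsParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Links without the attribute get a score of None, so pages written without any of this metadata still parse cleanly.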
I think that eventually the Semantic Web will, in some form, catch on but I think its eventual success will come from small grass-roots efforts that are simple to implement and become de-facto standards if they become widely used. I believe that the most important semantics for linked web sites is what the level of trust or agreement is. For example, if a very large number of people link to a site that they believe to contain incorrect information, the dubious site's page rank will be high and the dubious site may be taken to contain accurate information. Widespread use of trust attributes on links would create new possibilities and opportunities for people writing software agents.
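The dubious-site example can be shown with a few lines of Python. This sketch assumes that links have already been reduced to (source site, target site, agreement score) tuples with scores in the range [-10, 10]; the function name and the choice of a simple mean are assumptions for illustration only.

```python
from collections import defaultdict


def site_scores(links):
    """links: iterable of (source_site, target_site, agreewith) tuples,
    with agreewith in [-10, 10].
    Returns {target_site: (inbound_link_count, mean_agreement)}."""
    totals = defaultdict(lambda: [0, 0])  # target -> [link count, score sum]
    for _source, target, agreewith in links:
        totals[target][0] += 1
        totals[target][1] += agreewith
    return {target: (n, s / n) for target, (n, s) in totals.items()}
```

A site with many inbound links still has a high link count, but a strongly negative mean agreement shows that most of the people linking to it disagree with it.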
Labels: knowledge management
