Sunday, September 06, 2009

Very much liking Amazon EC2

I remain very enthusiastic about Google's AppEngine (and I am also very much enjoying my Wave developer account). That said, Amazon's AWS services are having a much larger effect on both my work for customers and my own research. AppEngine is great for some types of projects, but EC2 can be used for anything.

I have a text mining experiment that I have been planning for a while, and today I have some free time to start setting it up. I have three old desktop computers (with a reasonable amount of memory and disk) that I usually haul out of my closet, run "headless," and set up for text mining and machine learning projects. Although I already own these boxes, leaving them running for several weeks in my home office has drawbacks: noise, heat, clutter in my work environment, etc. I did a quick calculation and estimated that if I instead use one EC2 instance, a reasonably large EBS disk volume, and Elastic MapReduce when I need it for Hadoop MapReduce runs, the cost over a few weeks is a small business expense. Anyway, I am setting up a work environment on an EC2 instance "as we speak."
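A quick calculation like this can be written down as a small function. This is a minimal sketch under stated assumptions: the pricing model (per-hour instance billing, EBS billed per GB-month and prorated over 30-day months, Elastic MapReduce billed per instance-hour) is simplified, and every rate in the usage example is a hypothetical placeholder, not an actual AWS price.

```python
"""Back-of-the-envelope estimate for a few weeks of EC2 work.

Simplified pricing model (an assumption): per-hour instance billing,
EBS billed per GB-month prorated over 30-day months, and Elastic
MapReduce billed per instance-hour. All rates are caller-supplied.
"""

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30.0


def estimate_cost(days, instance_rate, ebs_gb, ebs_rate_gb_month,
                  emr_runs, emr_instances, emr_hours, emr_rate):
    """Return the estimated total cost in dollars."""
    # One always-on EC2 instance for the whole period.
    instance_cost = days * HOURS_PER_DAY * instance_rate
    # EBS volume, prorated from a monthly per-GB rate.
    ebs_cost = ebs_gb * ebs_rate_gb_month * (days / DAYS_PER_MONTH)
    # Occasional Elastic MapReduce runs, billed per instance-hour.
    emr_cost = emr_runs * emr_instances * emr_hours * emr_rate
    return instance_cost + ebs_cost + emr_cost


if __name__ == "__main__":
    # Hypothetical placeholder rates, NOT actual AWS prices.
    total = estimate_cost(days=21, instance_rate=0.10,
                          ebs_gb=100, ebs_rate_gb_month=0.10,
                          emr_runs=4, emr_instances=8, emr_hours=1,
                          emr_rate=0.12)
    print(f"estimated cost for three weeks: ${total:.2f}")
```

Swap in current published prices before trusting the total; the point of the sketch is the structure of the estimate, not the numbers.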

My wife and I often evaluate what we really need in our home (we like to keep our small house tidy and elegant), and I would like to get rid of old computer hardware that I can do without. Also, in the USA our schools are poorly funded, so donating old hardware with the latest Ubuntu installed to our local high school could be a good thing.

Note: I would like to thank Amazon for providing a grant to cover my EC2, S3, and Elastic MapReduce expenses while I wrote the examples for my last Ruby book, and for permanently providing an AMI with all of the book's examples.


Saturday, September 05, 2009

Great video talk: "Innovation in Search and Artificial Intelligence"

Peter Norvig's recent talk at UC Berkeley discussed how large data sets and increasing computer resources make it possible to achieve increasingly better modeling and predictive results. It is well worth an hour of your time.

There were a lot of gems in this talk, but one that I may put to immediate use is using binary (non-text) data in MapReduce, specifically with the protocol buffer tools. I have been using Hadoop more frequently, and it is worth looking at the effects of using binary data for intermediate results. His comment that using MapReduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but there seem to be opportunities to use indices for access to global information in a data set while making a complete sweep through the input data during the map phase.
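The two ideas (binary intermediate values, and a global index consulted while sweeping the input in the map phase) can be sketched in a few lines of plain Python. This is a toy, in-memory sketch, not Hadoop code: Python's standard struct module stands in for protocol buffers, and TOPIC_INDEX is a hypothetical prebuilt index.

```python
"""In-memory MapReduce sketch with binary intermediate values.

Assumptions (not from the talk): the struct module stands in for
protocol buffers, and a small dict plays the role of a prebuilt
global index consulted during the map phase.
"""
import struct
from collections import defaultdict

# Each intermediate value: two little-endian unsigned 32-bit ints,
# (occurrences of the topic in one document, 1 document).
RECORD = struct.Struct("<II")

# Hypothetical global index: term -> topic label.
TOPIC_INDEX = {"hadoop": "systems", "mapreduce": "systems",
               "bayes": "ml", "clustering": "ml"}


def map_phase(text):
    """Yield (topic, packed_record) pairs for one document."""
    counts = defaultdict(int)
    for word in text.lower().split():
        topic = TOPIC_INDEX.get(word)  # lookup in the global index
        if topic is not None:
            counts[topic] += 1
    for topic, n in counts.items():
        yield topic, RECORD.pack(n, 1)


def reduce_phase(topic, packed_values):
    """Unpack and sum occurrences and document counts for one topic."""
    total, docs = 0, 0
    for blob in packed_values:
        n, d = RECORD.unpack(blob)
        total += n
        docs += d
    return topic, total, docs


def run(documents):
    """Map every document, shuffle by key, then reduce each group."""
    grouped = defaultdict(list)
    for text in documents:
        for key, value in map_phase(text):
            grouped[key].append(value)
    return sorted(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop and MapReduce at scale",
            "bayes clustering and more clustering",
            "hadoop hadoop"]
    for topic, total, ndocs in run(docs):
        print(topic, total, ndocs)
```

Each intermediate value here is a fixed 8 bytes regardless of the counts' magnitude, which is the kind of effect worth measuring against a text encoding on a real Hadoop job.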


Monday, August 31, 2009

Notes on using PowerLoom with SBCL Common Lisp

A while ago, I wrote Java wrappers for easily using PowerLoom from Java (see my Java AI book (free PDF download)).

I am evaluating the use of PowerLoom on a customer project and spent a while this morning experimenting with PowerLoom (version powerloom-3.2.50) using SBCL Common Lisp. Since it took me a while to figure out how to do in Lisp the things that I am used to doing in Java, I thought that I would make some notes on what I did:

Download and unpack the PowerLoom distribution. We will be using the example knowledge base file kbs/business.plm so you might want that open in a text editor to read through it. Start by running SBCL (lots of output removed for brevity):
$ cd powerloom-3.2.50
$ sbcl
This is SBCL 1.0.29, an implementation of ANSI Common Lisp.
* (load "load-powerloom.lisp")
* (STELLA::LOAD "kbs/business.plm")
NIL
* (PLI:S-ASSERT-PROPOSITION "(and (company c1) (company-name c1 \"Moms Grocery\"))" "BUSINESS" nil)

|i|/PLI/@PL-ITERATOR
* (PLI:S-ASSERT-PROPOSITION "(and (company c2) (company-name c1 \"Dads Grocery\"))" "BUSINESS" nil)

|i|/PLI/@PL-ITERATOR
* (let ((iter (pli:s-retrieve "all (company-name ?x ?y)" "BUSINESS" nil)))
    (loop while (stella::next? iter)
          do (print (pli::%pl-iterator.value iter))))

(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/C1 |L|"Dads Grocery")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/C1 |L|"Moms Grocery")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/MEGASOFT |L|"MegaSoft, Inc.")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/ACME-CLEANERS |L|"ACME Cleaners, LTD")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/MEGASOFT |L|"MegaSoft")
NIL
*
The argument "BUSINESS" is the knowledge base module name defined in kbs/business.plm. The last nil argument specifies a PowerLoom environment.

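The example file kbs/business.plm defines the concept company and some instances. I asserted two additional companies and then listed all company names. Note that the second assertion, as typed, gives the name "Dads Grocery" to c1 rather than c2, which is why C1 appears twice in the output; use (company-name c2 "Dads Grocery") to name the new company.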

It is easy to use PowerLoom from its interactive shell, but embedding PowerLoom in Java and Common Lisp applications is more difficult. The example above will at least get you started interfacing between your Lisp runtime environment and the embedded PowerLoom environment. API functions whose names start with "s-" are the most convenient because they take string input arguments.


Thursday, April 23, 2009

Apache Mahout Scalable Machine Learning first public release

The Mahout project has just made its first public release of scalable machine learning tools for the Hadoop platform. With Amazon's Elastic MapReduce, it is possible (for example) to make a 1-hour run on 8 server instances for about a dollar. Combined with Mahout, I think that this is really going to open the door for individuals and small organizations to use machine learning more effectively. Good stuff! I have started to take a quick look at the code, but I won't have time to try it out on Elastic MapReduce for a few weeks. (I am finishing the last chapter of my Intelligent Scripting for Web 3.0 book, and then I have some production work to do, so no free time for a while!)

It is interesting in life how things often come together just when you need them. I have a business idea that I want to pursue using EC2 and Mahout will probably help with a small part of the system.


Thursday, September 25, 2008

Looking for reviewers for my book "Practical Artificial Intelligence Programming With Java"

I am within a month or so of completing the third edition of my book. This book will always be available as a free PDF from my web site and as an instant-print book.

I would very much appreciate technical feedback on the manuscript which can be downloaded from my open content page: www.markwatson.com/opencontent/

A direct download link is: www.markwatson.com/opencontent/JavaAI3rd.pdf

Thanks in advance!


Wednesday, August 13, 2008

New version of my KBtextmaster NLP library is available

I just released a new version of my KBtextmaster Natural Language Processing (NLP) Java library. Free for non-commercial use, with a small fee for commercial use. Should also work fine with JRuby :-)


Wednesday, August 29, 2007

Good book: "Programming Collective Intelligence"

This book is a great introduction to the techniques that I use almost daily in my own research and in my work for customers, and I can recommend it without reservation. The choice of Python for the examples is not optimal for me, but that is OK, especially because the book's implementations of machine learning, categorization, clustering, filtering, optimization, support vector machines, etc. are mostly short and can be used as-is or converted to whatever programming language you need to use. The example data is mostly from collaborative web sites. The book relies heavily on existing Python libraries, and I like this approach since it mirrors rational software development practice: build custom code on top of existing libraries and software tools. Good book!
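To give a flavor of how short these techniques are, here is an independent sketch (not code from the book) of a Pearson correlation similarity score of the kind used in collaborative filtering. The ratings data in the usage example is made up.

```python
"""Pearson correlation similarity for collaborative filtering.

Independent sketch, not code from the book; ratings are made up.
"""
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson r over items both users rated (0.0 if undefined)."""
    shared = [item for item in ratings_a if item in ratings_b]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [ratings_a[i] for i in shared]
    ys = [ratings_b[i] for i in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater has no defined correlation
    return cov / sqrt(var_x * var_y)


def most_similar(target, all_ratings, k=2):
    """Rank the other users by Pearson similarity to `target`."""
    scores = [(pearson(all_ratings[target], r), user)
              for user, r in all_ratings.items() if user != target]
    return sorted(scores, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up ratings: user -> {item: rating}.
    ratings = {
        "ann": {"a": 1, "b": 2, "c": 3},
        "bob": {"a": 2, "b": 4, "c": 6},
        "cat": {"a": 3, "b": 2, "c": 1},
    }
    print(most_similar("ann", ratings))
```

Pearson correlation ignores a user's overall rating scale (bob rates everything twice as high as ann but still scores as a perfect match), which is one reason it is popular for this task.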


Tuesday, April 17, 2007

The Semantic Web, Parrots, and AI

Two different subjects today: I just added a blog entry on the semantic web to my AI blog, and here is a story about our pet parrot. One (possible) route to understanding how to do AI is to appreciate problem-solving abilities in the natural world. Our young Meyer's parrot is a good problem solver, but it takes him a while. Earlier this morning, I was reading in bed and had fetched our parrot so he could run around like crazy on and under our bedspread (good for burning off energy). Our parrot wanted to get at some of my stuff on my night stand, but his way was blocked except for a space between two water bottles which, try as he might, he could not squeeze through, and he could not move the water bottles. He spent about two minutes walking back and forth, thinking about the sad situation he was confronted with, when he suddenly lowered one wing, raised the other, moved his shoulders close together, and then simply walked right through the "water bottle gap" :-)

Our small parrot must have some abstract world model of objects and his own body. Why and how he thought of raising one shoulder while lowering the other to compress the width of his shoulders is a mystery to me, but I believe that this was possibly an example of abstract thinking.


Friday, March 16, 2007

metaweb.com and freebase.com

I am always on the lookout for freely available sources of data in useful formats. Metaweb was founded by Danny Hillis and their first public system is at www.freebase.com.

"Freebase is a vast, free, open online database of structured knowledge" - from their web site.

Besides the interesting technology for storing and querying structured data (a user can define her own categories as well as use system-wide categories), one notable thing is that the hosted content is freely licensed under Creative Commons or the GNU Free Documentation License, or is in the public domain.

You need to request an invitation, and then the documentation provides information on accessing Freebase. I experimented during lunch time with their Python client APIs - cool stuff.


Wednesday, March 14, 2007

I have released some NLP (natural language processing) tools with a LGPL license

Here is the download link. These tools come in a few 'flavors': Java, Ruby, C++, and C#. I expect to add two larger NLP projects in the next month.

BTW, I consider the LGPL to be "business friendly": you are allowed to mix my LGPL software with your own commercial products without open sourcing your products. You may also mix my LGPL software with open source projects that use Apache, BSD, MIT, or Mozilla style licenses. If you have any questions, ask me. If the LGPL prevents you from using this material, please let me know.


Saturday, March 03, 2007

Source code for FastTag released - free for non-commercial use

I have started to release my commercial products, with source code, under a license that is free for non-commercial use. I repackaged the FastTag part-of-speech tagger this morning and it is now available. A separate version of FastTag uses the MEDPOST medical term lexicon.
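For readers new to part-of-speech tagging, the general lexicon-lookup-plus-rules approach can be sketched as follows. This is a hypothetical toy, not FastTag's source: the tiny lexicon and the rules are made up for illustration, and the tags follow Penn Treebank names.

```python
"""Toy lexicon-plus-rules part-of-speech tagger.

Hypothetical sketch: the tiny lexicon and the rules are made up for
illustration and are not FastTag's actual lexicon or rule set.
Tags use Penn Treebank names (DT, NN, NNS, VB, VBG, VBZ, MD, RB).
"""

# Hypothetical lexicon: word -> most likely tag.
LEXICON = {
    "the": "DT", "a": "DT", "dog": "NN", "run": "NN",
    "runs": "VBZ", "would": "MD", "quickly": "RB",
}


def tag(words):
    """Return a list of (word, tag) pairs for a list of tokens."""
    # Step 1: lexicon lookup; unknown words default to common noun (NN).
    tags = [LEXICON.get(w.lower(), "NN") for w in words]
    # Step 2: left-to-right transformation rules refine the guesses.
    for i, word in enumerate(words):
        w = word.lower()
        if tags[i] == "NN" and w.endswith("ly"):
            tags[i] = "RB"    # adverb suffix
        elif tags[i] == "NN" and w.endswith("ing"):
            tags[i] = "VBG"   # gerund or present participle
        elif tags[i] == "NN" and w.endswith("s"):
            tags[i] = "NNS"   # plural common noun
        if i > 0 and tags[i - 1] == "MD" and tags[i].startswith("NN"):
            tags[i] = "VB"    # a noun right after a modal is a verb
    return list(zip(words, tags))


if __name__ == "__main__":
    print(tag("the dog would run".split()))
    print(tag("cats swimming slowly".split()))
```

The appeal of this design is speed: tagging is a dictionary lookup per word plus a single pass of cheap local rules, with no statistical model to evaluate.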


Tuesday, October 10, 2006

Human minds, programming, and the "caching problem"

When writing large software systems, rapid access to data is often important: data kept in memory is fastest to reach, while data held by other processes on the same local network, or on disk, is slower. In software development, we see the same effect: speed and efficiency are maximized if a single person can understand the architecture and comprehend the entire system. Moving from a single developer to a very small team adds a little overhead: design notes and pencil-and-paper drawings turn into casual but more explicit short documents and conversation. The trade-off is between the cost of two people talking and sharing information and the cost of maintaining and reading documentation. Talking is almost always better because communication is a two-way street, but with N developers the "talking overhead" grows as O(N^2), which is too expensive for large N, so we go back to the one-way street of documentation.
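The O(N^2) growth can be made concrete: with N developers there are N(N-1)/2 distinct pairs who might need to talk. The sketch below counts them; modeling documentation as one channel per developer is a simplifying assumption for comparison, not something from the post.

```python
"""Pairwise communication channels in a team of n developers.

Every pair can talk directly, giving n * (n - 1) / 2 channels, which is
O(n^2). As a simplifying assumption, shared documentation is modeled as
one channel per developer, which is O(n).
"""


def talking_channels(n: int) -> int:
    """Distinct two-person conversations among n people."""
    return n * (n - 1) // 2


def documentation_channels(n: int) -> int:
    """Each developer reads (or writes) one shared body of documentation."""
    return n


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(f"{n:3d} developers: {talking_channels(n):5d} talking vs "
              f"{documentation_channels(n):3d} documentation channels")
```

Going from 10 to 50 developers multiplies the documentation channels by 5 but the talking channels by more than 27 (45 to 1225).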

I like to view programming languages in terms of how they help me deal with complexity by letting me keep as much as possible in my head at once:


Friday, August 18, 2006

I started a new blog just on AI theory

Artificial Intelligence Theory will probably be a very low-volume blog; I am planning on using it more in an essay or white-paper writing mode. In addition to more practical topics like probabilistic networks and reasoning systems, I will probably write about a long-time interest in computer Go programs that started in 1976, when I bought Bertram Raphael's great book The Thinking Computer: Mind Inside Matter. I spent a lot of free time in the late 1970s writing what I am quite sure was the world's first commercial Go playing program, Honnibo Warrior. I am still very interested in trying to develop some cross between NGRAM-style hashes for local board positions and efficient storage mechanisms like AllegroCache to solve some tasks that, if not strictly required by a Go program, would at least be closer to the way human experts play Go: in other words, figuring out how to implement the temporal and spatial memory of the human neocortex in software, and efficiently.
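As a rough illustration of NGRAM-style keys for local board positions, the sketch below turns 3x3 neighborhoods of a Go board into canonical string keys and counts them in a dictionary. Everything here is a hypothetical design for illustration: the board encoding, the 3x3 window, and the in-memory Counter standing in for a persistent store such as AllegroCache. It is not code from Honnibo Warrior.

```python
"""Key local 3x3 Go board patterns into a dictionary "memory".

Hypothetical sketch: board encoding and window size are assumptions.
Each window is canonicalized over the 8 rotations/reflections of the
square, so equivalent local shapes share one key.
"""
from collections import Counter

EMPTY, BLACK, WHITE, EDGE = ".", "X", "O", "#"


def window(board, row, col):
    """3x3 neighborhood around (row, col) of a square board (list of str)."""
    size = len(board)
    return tuple(
        tuple(board[r][c] if 0 <= r < size and 0 <= c < size else EDGE
              for c in range(col - 1, col + 2))
        for r in range(row - 1, row + 2))


def rotate(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return tuple(zip(*grid[::-1]))


def canonical(grid):
    """Smallest 9-character string over all 8 symmetries of the grid."""
    forms = []
    g = grid
    for _ in range(4):
        forms.append(g)
        forms.append(tuple(row[::-1] for row in g))  # mirror image
        g = rotate(g)
    return min("".join("".join(row) for row in f) for f in forms)


def learn_patterns(board):
    """Count canonical 3x3 patterns centered on every empty point."""
    memory = Counter()
    for r, line in enumerate(board):
        for c, point in enumerate(line):
            if point == EMPTY:
                memory[canonical(window(board, r, c))] += 1
    return memory


if __name__ == "__main__":
    board = ["X.O",
             ".X.",
             "..."]
    for pattern, n in learn_patterns(board).most_common(3):
        print(pattern, n)
```

Canonicalizing over symmetries means a shape seen in one corner is recognized in any other orientation, which shrinks the table and pools the evidence for each local pattern.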


Monday, August 14, 2006

Indie Game Development, AI in games

Slashdot has a discussion on Microsoft's "free" PC and XBOX 360 game development kit. There are also other good low cost alternatives for Indie development like Torque.

I spent a few years doing AI game development at Angel Studios (2 Nintendo games, prototype networked PC hovercraft game, and a VR system for Disney) and although I have been working more on 'practical' AI applications since moving to Sedona 7 years ago, I still have a keen interest in gaming and AI for games. A few years ago I thought of setting up a cooperative game development community for fun and maybe some profit, but my consulting business keeps me too busy, at least for now. Another thing that keeps me from making a large investment in an independent game making co-op is thinking how much money was spent writing commercial games at Angel Studios: teams with dozens of professional artists, programmers, a few musicians, etc. are expensive. That said, game AI programming is great fun and surprisingly difficult.


Thursday, August 10, 2006

OpenCyc 1.0, AI in general

I noticed on Slashdot that OpenCyc 1.0 has been released. I spent a short while reading comments and realized how different my own views on AI are from many Slashdot commentators.

To me, AI is all about writing software that makes decisions given uncertain and sometimes contradictory information. AI is about modeling problem domains and working both within that model and changing the model as new information becomes available. AI is about using problem domain models to provide human users with useful, interesting, and unexpected results by matching a model of a user's inquiry. AI is about solving the game of Go: the branching complexity of the game is so great that having perfect information is not enough.

So a tool like OpenCyc does not really match my personal view of what AI development is: Cyc and OpenCyc try to capture real-world common-sense knowledge in an ontology. I appreciate the decades of hard work behind it, and I have myself spent many hours experimenting with earlier versions of OpenCyc, so kudos for the 1.0 delivery.

Still, I tend to view "AI problems" as being problems restricted to narrow domains but still made very difficult or impossible by uncertainty, missing information, and time or memory constraints on algorithms.


Saturday, August 05, 2006

Yes languages affect our thoughts, even in programming

The Sapir-Whorf hypothesis posits that our native language affects how we think. Computer programming languages also strongly affect how we think about, design, implement, and maintain code.

I was working on some tricky code this morning that builds on some Common Lisp CLOS class libraries. The new code is really orthogonal to the existing functionality, and it seemed like a poor idea to merge the new in with the old, especially since the old codebase will probably be used as-is for a while. I decided to start a new module (as defined by physical file organization) that adds the new functionality to the existing classes as generic methods. The new module stays small, and anyone who needs to use the original codebase is not confused by extra code for functionality that they do not need.

In Ruby, I like to do the same sort of thing: have different modules (as defined by physical file organization) where new orthogonal functionality is added by defining new methods on existing classes.
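For readers who use neither CLOS nor Ruby, a roughly similar pattern in Python is sketched below with the standard library's functools.singledispatch, which provides single-dispatch generic functions. It is a narrower analogue (CLOS generic functions can dispatch on all required arguments), it is not the author's code, and the Document and Email classes are hypothetical.

```python
"""Adding orthogonal behavior to existing classes without editing them.

Python analogue (an assumption; the post used CLOS and Ruby) built on
functools.singledispatch. In a real project the "original library" and
the "new module" would live in separate files; they are shown together
here so the example runs on its own.
"""
from functools import singledispatch


# --- original library (left untouched) ---------------------------------
class Document:
    def __init__(self, title: str):
        self.title = title


class Email(Document):
    def __init__(self, title: str, sender: str):
        super().__init__(title)
        self.sender = sender


# --- new module: orthogonal "summary" behavior ---------------------------
@singledispatch
def summarize(obj) -> str:
    """Fallback for types with no registered implementation."""
    raise NotImplementedError(f"no summary for {type(obj).__name__}")


@summarize.register
def _(doc: Document) -> str:
    return f"Document: {doc.title}"


@summarize.register
def _(mail: Email) -> str:
    # The most specific registered class wins, as with CLOS methods.
    return f"Email from {mail.sender}: {mail.title}"


if __name__ == "__main__":
    for item in (Document("Design notes"), Email("Lunch?", "ann")):
        print(summarize(item))
```

Code that never imports the new module sees the original classes unchanged, which is the property the post is after.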

In Java (and other languages), you can always use Aspect-oriented programming (AOP) to add new orthogonal behavior to class libraries, but, to be honest, I dislike AOP - this is not the fault of AOP per se, but because I have only used AOP with Java.


Thursday, May 25, 2006

New PowerLoom site

Thanks to reader Vinodh Das for pointing this out to me: the PowerLoom web site has been updated and as one of the developers told me, PowerLoom is now released under an open source license. PowerLoom is a great system - if you are interested in AI, logic, reasoning systems, etc., then check it out.


Tuesday, May 09, 2006

Integrating a semantic network with a reasoning system

For a long term AI project, I am using Common Lisp and CLOS to model customer application specific nodes in a semantic network. This morning I worked out the non-obvious (to me!) bits for integrating my stuff with the Loom reasoning system by deriving Loom concepts from my CLOS classes. Cool stuff!
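The post does not show how the concepts were derived; as a hedged illustration of the general idea only, the sketch below walks a class hierarchy and emits one concept definition per class, parents first. Python classes stand in for CLOS classes, the emitted (defconcept ...) forms follow PowerLoom-style syntax as an assumption (Loom's own syntax differs), and the class names are hypothetical.

```python
"""Derive concept definitions from a class hierarchy.

Hypothetical sketch: plain Python classes stand in for the post's CLOS
classes, and the output uses PowerLoom-style (defconcept ...) forms as
an assumed target syntax (the post used Loom, whose syntax differs).
Only single inheritance is handled.
"""


def concept_name(cls):
    """Map a class name to an upper-case concept name."""
    return cls.__name__.upper()


def derive_concepts(root):
    """Return defconcept forms for root and its subclasses, parents first."""
    forms = []

    def visit(cls, parent):
        if len(cls.__bases__) > 1:
            raise ValueError(f"{cls.__name__}: multiple inheritance not handled")
        name = concept_name(cls)
        if parent is None:
            forms.append(f"(defconcept {name})")
        else:
            forms.append(f"(defconcept {name} (?x {concept_name(parent)}))")
        for sub in cls.__subclasses__():
            visit(sub, cls)

    visit(root, None)
    return forms


if __name__ == "__main__":
    class Entity: ...
    class Person(Entity): ...
    class Employee(Person): ...
    class Company(Entity): ...

    for form in derive_concepts(Entity):
        print(form)
```

Generating the definitions from the class hierarchy, rather than maintaining them by hand, keeps the semantic network's classes and the reasoner's concepts from drifting apart.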


Friday, February 24, 2006

Wonderful mix of functional and logic programming

I received the book The Reasoned Schemer last week. The authors use the same Socratic teaching style (asking the reader questions) that they used in The Little Schemer, this time to introduce the implementation of logic programming using a functional programming style. I can just feel my brain twisting a little from new ways of thinking about old problems.

I mostly use object oriented programming (Java, Ruby, Smalltalk, and Common Lisp's CLOS) so it is healthy to switch to a functional programming style if only for research and learning projects. Anyway, this is an awesome book.
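To give a taste of logic programming built in a functional style, here is a hypothetical Python sketch loosely modeled on the minimal microKanren formulation of the miniKanren language used in The Reasoned Schemer; it is not the book's code. Goals are functions from a substitution to a stream of substitutions, and eq, disj, and conj roughly correspond to the book's unification operator, conde, and sequencing of goals.

```python
"""Tiny logic-programming core written in a functional style.

Hypothetical sketch loosely modeled on microKanren; not the book's code.
A goal is a function from a substitution (dict) to a stream (generator)
of substitutions. Simplifications: no occurs check, and disj searches
depth-first instead of interleaving as miniKanren does.
"""
from itertools import count


class Var:
    """A logic variable; compared and hashed by identity."""
    _ids = count()

    def __init__(self):
        self.id = next(Var._ids)

    def __repr__(self):
        return f"_{self.id}"


def walk(term, subst):
    """Follow variable bindings until reaching a value or a free variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return a substitution that makes a and b equal, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var) and a is b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


def eq(a, b):
    """Goal: succeed once if a and b unify."""
    def goal(subst):
        s = unify(a, b, subst)
        if s is not None:
            yield s
    return goal


def disj(*goals):
    """Goal: succeed for each way any of the goals succeeds."""
    def goal(subst):
        for g in goals:
            yield from g(subst)
    return goal


def conj(*goals):
    """Goal: succeed when all goals succeed, threading substitutions."""
    def goal(subst):
        def step(remaining, s):
            if not remaining:
                yield s
                return
            for s2 in remaining[0](s):
                yield from step(remaining[1:], s2)
        yield from step(goals, subst)
    return goal


def reify(term, subst):
    """Replace bound variables inside term with their values."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return tuple(reify(t, subst) for t in term)
    return term


def run(query, goal):
    """Return the value of query for every solution of goal."""
    return [reify(query, s) for s in goal({})]


if __name__ == "__main__":
    q, x, y = Var(), Var(), Var()
    # q is 1, 2, or 3, and also equal to 2.
    print(run(q, conj(disj(eq(q, 1), eq(q, 2), eq(q, 3)), eq(q, 2))))
    # q is a pair whose parts are bound by later goals.
    print(run(q, conj(eq(q, (x, y)), eq(x, "a"), eq(y, "b"))))
```

Nothing here mutates state: each goal returns new substitutions, which is what makes the functional framing of logic programming work.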
