Sunday, September 06, 2009
Very much liking Amazon EC2
I have a text mining experiment that I have been planning for a while, and today I have some free time to start setting it up. I have 3 old desktop computers (with a reasonable amount of memory and disk) that I usually haul out of my closet, run "headless," and set up for text mining and machine learning projects. Although I own these boxes, there are drawbacks to leaving them running for several weeks in my home office: noise, heat generation, messing up my work environment, etc. I did a quick calculation and estimated that if I instead use one EC2 instance, a reasonably large EBS disk volume, and Elastic MapReduce when I need it to make Hadoop MapReduce runs, the cost over a period of a few weeks is a small business expense. Anyway, I am setting up a work environment on an EC2 instance "as we speak."
My wife and I often evaluate what we really need in our home (we like to keep our small house tidy and elegant), and I would like to get rid of old computer hardware that I can do without. Also, in the USA our schools are poorly funded, so contributing old hardware with the latest Ubuntu installed to our local high school could be a good thing.
Note: I would like to thank Amazon for giving me a grant to cover my EC2, S3, and Elastic MapReduce expenses for writing my last Ruby book's examples and for permanently providing an AMI with all of the book's examples.
Labels: AI, EC2, Hadoop, textmining
Saturday, September 05, 2009
great video talk: "Innovation in Search and Artificial Intelligence"
There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently and it is worth looking at the effects of using binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.
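To get a feel for why binary intermediate results might matter, here is a minimal sketch that compares a tab-separated text record with a fixed-width binary record. It uses Python's standard struct module as a simple stand-in for protocol buffers (it is not the protocol buffer wire format), and the record layout (document id, term id, count) is a made-up example rather than anything from the talk.

```python
import struct

# Hypothetical intermediate record: (document id, term id, count).
# "<IIH" = little-endian, two unsigned 32-bit ints and one unsigned 16-bit int.
RECORD = struct.Struct("<IIH")


def encode_text(doc_id, term_id, count):
    """Encode a record as a tab-separated text line."""
    return f"{doc_id}\t{term_id}\t{count}\n".encode("utf-8")


def encode_binary(doc_id, term_id, count):
    """Encode a record as a fixed-width binary value."""
    return RECORD.pack(doc_id, term_id, count)


def decode_binary(data):
    """Decode a binary record back into a (doc_id, term_id, count) tuple."""
    return RECORD.unpack(data)


# Made-up sample records, only for comparing encoded sizes.
records = [(1234567, 7654321, 42), (1234568, 7654322, 7)]

text_bytes = sum(len(encode_text(*r)) for r in records)
binary_bytes = sum(len(encode_binary(*r)) for r in records)

print("text bytes:  ", text_bytes)
print("binary bytes:", binary_bytes)
```

The saving depends entirely on the data: small integers written as text can be shorter than a fixed-width field, which is one reason a variable-length encoding such as the one protocol buffers use is attractive for real jobs.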
Monday, August 31, 2009
Notes on using PowerLoom with SBCL Common Lisp
I am evaluating the use of PowerLoom on a customer project and spent a while this morning experimenting with PowerLoom (version powerloom-3.2.50) using SBCL Common Lisp. Since it took me a while to find how to do the things in Lisp that I am used to doing in Java, I thought that I would make some notes on what I did:
Download and unpack the PowerLoom distribution. We will be using the example knowledge base file kbs/business.plm so you might want that open in a text editor to read through it. In the calls below, the argument "BUSINESS" is the knowledge base module name defined in kbs/business.plm, and the last nil argument specifies a PowerLoom environment. Start by running SBCL (lots of output removed for brevity):
$ cd powerloom-3.2.50
$ sbcl
This is SBCL 1.0.29, an implementation of ANSI Common Lisp.
* (load "load-powerloom.lisp")
* (STELLA::LOAD "kbs/business.plm")
NIL
* (PLI:S-ASSERT-PROPOSITION "(and (company c1) (company-name c1 \"Moms Grocery\"))" "BUSINESS" nil)
|i|/PLI/@PL-ITERATOR
* (PLI:S-ASSERT-PROPOSITION "(and (company c2) (company-name c1 \"Dads Grocery\"))" "BUSINESS" nil)
|i|/PLI/@PL-ITERATOR
* (let ((iter (pli:s-retrieve "all (company-name ?x ?y)" "BUSINESS" nil)))
    (loop while (stella::next? iter)
          do (print (pli::%pl-iterator.value iter))))
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/C1 |L|"Dads Grocery")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/C1 |L|"Moms Grocery")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/MEGASOFT |L|"MegaSoft, Inc.")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/ACME-CLEANERS |L|"ACME Cleaners, LTD")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/MEGASOFT |L|"MegaSoft")
NIL
*
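The example file kbs/business.plm defines the concept company and several instances. I asserted two additional companies and listed the company names. Note that my second assertion puts the name "Dads Grocery" on c1 rather than c2, which is why both new names appear under C1 in the output; use c2 in the company-name clause to name the second company.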
It is easy enough to use PowerLoom through its interactive shell, but embedding PowerLoom in Java and Common Lisp applications is more difficult. The above example will at least get you started interfacing between your Lisp runtime environment and the embedded PowerLoom environment. API functions that start with "s-" are the most convenient because they take string input arguments.
Thursday, April 23, 2009
Apache Mahout Scalable Machine Learning first public release
It is interesting in life how things often come together just when you need them. I have a business idea that I want to pursue using EC2, and Mahout will probably help with a small part of the system.
Labels: AI, Java, machine learning
Thursday, September 25, 2008
Looking for reviewers for my book "Practical Artificial Intelligence Programming With Java"
I would very much appreciate technical feedback on the manuscript which can be downloaded from my open content page: www.markwatson.com/opencontent/
A direct download link is: www.markwatson.com/opencontent/JavaAI3rd.pdf
Thanks in advance!
Labels: AI, Java, technical writing
Wednesday, August 13, 2008
New version of my KBtextmaster NLP library is available
Labels: AI, commercial products, Java, NLP
Wednesday, August 29, 2007
Good book: "Programming Collective Intelligence"
Labels: AI, clustering, machine learning, Python, support vector machines
Tuesday, April 17, 2007
The Semantic Web, Parrots, and AI
Our small parrot must have some abstract world model of objects and his own body. Why and how he thought of raising one shoulder while lowering the other to compress the width of his shoulders is a mystery to me, but I believe that this was possibly an example of abstract thinking.
Labels: AI, semantic web
Friday, March 16, 2007
metaweb.com and freebase.com
"Freebase is a vast, free, open online database of structured knowledge" - from their web site.
One interesting thing, besides the technology for storing and querying structured data (where the user can both define her own categories and use system-wide categories), is that the hosted content is freely licensed under Creative Commons or the GNU Free Documentation License, or is in the public domain.
You need to request an invitation, and then the documentation provides information on accessing Freebase. I experimented during lunch time with their Python client APIs - cool stuff.
Labels: AI, data mining, Python
Wednesday, March 14, 2007
I have released some NLP (natural language processing) tools with an LGPL license
BTW, I consider the LGPL to be "business friendly". You are allowed to mix my LGPL software with your own commercial products without open sourcing your products. You may also mix my LGPL software with open source projects that use Apache, BSD, MIT, or Mozilla style licenses. If you have any questions, ask me. If the LGPL license prevents you from using this material, please let me know about it.
Labels: AI, C#, C++, Java, Ruby
Saturday, March 03, 2007
Source code for FastTag released - free for non-commercial use
Labels: AI
Tuesday, October 10, 2006
Human minds, programming, and the "caching problem"
I like to view programming languages in terms of how they allow me to deal with complexity, keeping as much stuff in my head at once:
- Lisp: great for building up the language from the bottom and extending towards an application domain. The new application "programming language" is higher level and the remaining part of a system is more concise code and easier to keep track of.
- Ruby: concise, so programs are much shorter and easier to understand.
- Java: the language does little for me as far as reducing complexity, but great IDEs like IntelliJ at least allow rapid code browsing, "who calls this" queries, etc.
Friday, August 18, 2006
I started a new blog just on AI theory
Labels: AI
Monday, August 14, 2006
Indie Game Development, AI in games
I spent a few years doing AI game development at Angel Studios (2 Nintendo games, prototype networked PC hovercraft game, and a VR system for Disney) and although I have been working more on 'practical' AI applications since moving to Sedona 7 years ago, I still have a keen interest in gaming and AI for games. A few years ago I thought of setting up a cooperative game development community for fun and maybe some profit, but my consulting business keeps me too busy, at least for now. Another thing that keeps me from making a large investment in an independent game making co-op is thinking how much money was spent writing commercial games at Angel Studios: teams with dozens of professional artists, programmers, a few musicians, etc. are expensive. That said, game AI programming is great fun and surprisingly difficult.
Thursday, August 10, 2006
OpenCyc 1.0, AI in general
To me, AI is all about writing software that makes decisions given uncertain and sometimes contradictory information. AI is about modeling problem domains and working both within that model and changing the model as new information becomes available. AI is about using problem domain models to provide human users with useful, interesting, and unexpected results by matching a model of a user's inquiry. AI is about solving the game of Go: the branching complexity of the game is so great that having perfect information is not enough.
So, a tool like OpenCyc is not really a match to my personal view of what AI development is: Cyc and OpenCyc try to define ontological knowledge of real-world common sense. I appreciate the decades of hard work, and I have myself spent many hours experimenting with earlier versions of OpenCyc - so kudos for the 1.0 delivery.
Still, I tend to view "AI problems" as being problems restricted to narrow domains but still made very difficult or impossible by uncertainty, missing information, and time or memory constraints on algorithms.
Labels: AI, knowledge management
Saturday, August 05, 2006
Yes, languages affect our thoughts, even in programming
I was working on some tricky code this morning that builds on some Common Lisp CLOS class libraries. The new code is really orthogonal to the existing functionality and it seemed like a poor idea to merge the new in with the old, especially since the old codebase will probably be used as-is for a while. I decided to start a new module (as defined by physical file organization) that added the new functionality to the existing classes as generic methods. The new module stays small, and anyone needing to use the original codebase is not confused with extra code for functionality that they do not need.
In Ruby, I like to do the same sort of thing: have different modules (as defined by physical file organization) where new orthogonal functionality is added by defining new methods to existing classes in new file modules.
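As a rough illustration of this pattern, here is a minimal sketch in Python (chosen only for illustration; the code discussed above is Common Lisp and Ruby). The Document class and the summary method are hypothetical names invented for this example. Everything is shown in one listing, but the two halves are meant to stand for two separate files.

```python
# --- "Existing codebase": imagine this class lives in its own file and is
# --- left untouched. Document is a hypothetical example class.
class Document:
    def __init__(self, text):
        self.text = text

    def word_count(self):
        """Original functionality that existing users rely on."""
        return len(self.text.split())


# --- "New module": imagine this lives in a separate file that imports
# --- Document and adds orthogonal behavior without editing the original.
def summary(self, max_words=5):
    """Return the first max_words words, with '...' if text was cut off."""
    words = self.text.split()
    suffix = "..." if len(words) > max_words else ""
    return " ".join(words[:max_words]) + suffix


# Attach the new function to the existing class as a method.
Document.summary = summary


doc = Document("the quick brown fox jumps over the lazy dog")
print(doc.word_count())
print(doc.summary())
```

Anyone who only loads the original file never sees summary, which is the point: the new behavior exists only for code that also loads the new module.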
In Java (and other languages), you can always use Aspect-oriented programming (AOP) to add new orthogonal behavior to class libraries, but, to be honest, I dislike AOP - this is not the fault of AOP per se, but because I have only used AOP with Java.
Thursday, May 25, 2006
New PowerLoom site
Labels: AI, knowledge management
Tuesday, May 09, 2006
Integrating a semantic network with a reasoning system
Labels: AI, knowledge management, Lisp
Friday, February 24, 2006
Wonderful mix of functional and logic programming
I mostly use object oriented programming (Java, Ruby, Smalltalk, and Common Lisp's CLOS) so it is healthy to switch to a functional programming style if only for research and learning projects. Anyway, this is an awesome book.
