Sunday, September 06, 2009
Very much liking Amazon EC2
I remain very enthusiastic about Google's AppEngine (and also I am very much enjoying my developer's Wave account). That said, Amazon's AWS services are having a much larger effect on my work for customers and my own work and research. AppEngine is great for some types of projects, but EC2 can be used for anything.
I have a text mining experiment that have been planning for a while, and today I have some free time to start setting it up. I have 3 old desktop computers (with a reasonable amount of memory and disk) that I usually haul out of my closet, run "headless," and set up for text mining and machine learning projects. Although I own these boxes, there is a drawback to leaving them running for several weeks in my home office: noise, heat generation, messing up my work environment, etc. I did a quick calculation and estimated that if I instead use one EC2 instance, a reasonably large ESB disk volume, and Elastic MapReduce when I need it to make Hadoop Map Reduce runs, the cost over a few week period is a small business expense. Anyway, I am setting up a work environment on an EC2 instance "as we speak."
My wife and I often evaluate what we really need in our home (we like to keep our small house tidy and elegant) and I would like to get rid of old computer hardware that I can do without - and, in the USA, our schools are poorly funded so contributing old hardware to our local high school with the latest Ubuntu installed could be a good thing.
Note: I would like to thank Amazon for providing a grant to me to cover my EC2, S3, and Electric MapReduce expenses for writing my last Ruby book's examples and permanently providing a MCI with all of the book's examples.
I have a text mining experiment that have been planning for a while, and today I have some free time to start setting it up. I have 3 old desktop computers (with a reasonable amount of memory and disk) that I usually haul out of my closet, run "headless," and set up for text mining and machine learning projects. Although I own these boxes, there is a drawback to leaving them running for several weeks in my home office: noise, heat generation, messing up my work environment, etc. I did a quick calculation and estimated that if I instead use one EC2 instance, a reasonably large ESB disk volume, and Elastic MapReduce when I need it to make Hadoop Map Reduce runs, the cost over a few week period is a small business expense. Anyway, I am setting up a work environment on an EC2 instance "as we speak."
My wife and I often evaluate what we really need in our home (we like to keep our small house tidy and elegant) and I would like to get rid of old computer hardware that I can do without - and, in the USA, our schools are poorly funded so contributing old hardware to our local high school with the latest Ubuntu installed could be a good thing.
Note: I would like to thank Amazon for providing a grant to me to cover my EC2, S3, and Electric MapReduce expenses for writing my last Ruby book's examples and permanently providing a MCI with all of the book's examples.
Labels: AI, EC2, Hadoop, textmining
Saturday, September 05, 2009
great video talk: "Innovation in Search and Artificial Intelligence"
Peter Norvig's recent talk at UC Berkeley discussed how the effects of large data sets and increasing computer resources make it possible to achieve increasingly better modeling and predictive results. Well worth an hour to listen to.
There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently and it is worth looking the effects of binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.
There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently and it is worth looking the effects of binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.
Subscribe to Posts [Atom]
