I have used Nutch for two contracting jobs and Lucene for many more. Until today, I viewed Nutch simply as:
  • A spider that is quick to configure for target websites and easy to administer
  • A search web application that is trivial to run
  • A web service provider (OpenSearch API)
Today, however, I started looking more closely at the underlying Hadoop architecture (a distributed file system modeled on the Google File System, plus a MapReduce library) and at both the available plugins and the plugin architecture. New opinion: Nutch is a platform for building more complex web applications and knowledge management applications.