26 Jan 2014

Java and Clojure examples for reading the new WARC Common Crawl files

I just added a Clojure example to my Common Crawl repo. It assumes that you have copied a crawl segment file to your laptop. In the next week I will add a second Clojure example that pulls segment files from S3.

The repo also contains two Java examples: one reads local segment files and the other reads segment files from S3.
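To give a rough idea of what reading a local segment file involves, here is a minimal sketch that uses only the JDK. It is not the code from the repo. The class and method names (WarcHeaderSketch, readRecordHeaders) are made up for illustration, and it assumes a gzip-compressed WARC/1.0 file in which every record carries a Content-Length header. It collects the headers of each record and skips the payloads.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical class name; a sketch, not the repo's implementation.
class WarcHeaderSketch {

    // Reads one line (CRLF or LF terminated) as UTF-8; returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') break;
            if (b != '\r') buf.write(b);
        }
        return sawAny ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    // Returns the header fields of every record, in file order.
    // Assumption: no folded (multi-line) header values.
    static List<Map<String, String>> readRecordHeaders(InputStream in) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        String line;
        while ((line = readLine(in)) != null) {
            // Skip the blank separator lines between records.
            if (!line.startsWith("WARC/")) continue;

            Map<String, String> headers = new LinkedHashMap<>();
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon).trim(),
                                line.substring(colon + 1).trim());
                }
            }

            // Skip the record payload without buffering it.
            long remaining = Long.parseLong(headers.getOrDefault("Content-Length", "0"));
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    // skip() may return 0; fall back to reading a single byte.
                    if (in.read() == -1) throw new EOFException("truncated WARC record");
                    skipped = 1;
                }
                remaining -= skipped;
            }
            records.add(headers);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a locally copied segment file (any name ending in .warc.gz).
        try (InputStream in = new BufferedInputStream(
                new GZIPInputStream(new FileInputStream(args[0])))) {
            for (Map<String, String> h : readRecordHeaders(in)) {
                System.out.println(h.get("WARC-Type") + " " + h.get("WARC-Target-URI"));
            }
        }
    }
}
```

A full segment file holds a very large number of records, so real code would likely process each record as it is read rather than collect all headers in a list. WARC files are typically compressed one record per gzip member; the sketch relies on GZIPInputStream continuing across concatenated members.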