Note: on July 7, 2014, I switched my blog back to hosting on Blogspot: http://blog.markwatson.com

26 Jan 2014

Java and Clojure examples for reading the new WARC Common Crawl files

I just added a Clojure example to my Common Crawl repo. This example assumes that you have already copied a crawl segment file to your local machine. In the next week I will add another Clojure example that pulls segment files from S3.

The repo also contains two Java examples, for reading segment files locally and from S3.
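
For readers who want a feel for what reading a local segment file involves, here is a minimal sketch. It is not the code from the repo. It uses only the Java standard library and assumes the segment file is a gzip-compressed WARC file in which each record has a `WARC/...` version line, `Name: value` header lines, a blank line, and then `Content-Length` bytes of content. The class name `WarcSketch`, the `WarcRecord` holder, and the command-line argument are hypothetical names chosen for illustration. It also collects all records in memory, which is fine for a small test file but probably not for a full segment.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class WarcSketch {

    /** One WARC record: its named header fields plus the raw content block. */
    public static final class WarcRecord {
        public final Map<String, String> headers;
        public final byte[] content;

        WarcRecord(Map<String, String> headers, byte[] content) {
            this.headers = headers;
            this.content = content;
        }
    }

    /** Reads one line (CR characters dropped, LF ends the line); returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') break;
            if (b != '\r') buf.write(b);
        }
        if (b == -1 && buf.size() == 0) return null;
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Parses every record from an already-decompressed WARC stream. */
    public static List<WarcRecord> readRecords(InputStream in) throws IOException {
        List<WarcRecord> records = new ArrayList<>();
        String line;
        while ((line = readLine(in)) != null) {
            // Skip the blank separator lines that follow each record's content.
            if (!line.startsWith("WARC/")) continue;

            // Header lines run until the first empty line.
            Map<String, String> headers = new LinkedHashMap<>();
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon).trim(),
                                line.substring(colon + 1).trim());
                }
            }

            // Read exactly Content-Length bytes so content is never parsed as headers.
            // Assumption: each record's content fits in an int-sized byte array.
            int length = Integer.parseInt(headers.getOrDefault("Content-Length", "0"));
            byte[] content = new byte[length];
            int off = 0;
            while (off < length) {
                int n = in.read(content, off, length - off);
                if (n < 0) throw new EOFException("truncated WARC record");
                off += n;
            }
            records.add(new WarcRecord(headers, content));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a locally copied, gzip-compressed segment file.
        try (InputStream in = new BufferedInputStream(
                new GZIPInputStream(new FileInputStream(args[0]), 1 << 16))) {
            for (WarcRecord r : readRecords(in)) {
                System.out.println(r.headers.get("WARC-Type") + " "
                        + r.headers.get("WARC-Target-URI") + " "
                        + r.content.length + " bytes");
            }
        }
    }
}
```

Run it with the path to a downloaded segment file as the only argument. For real work, a dedicated WARC parsing library and record-at-a-time processing would likely be a better fit than this hand-rolled reader.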


Do you want to comment on this blog article? Then please email your comment. I will add your comment with your name (I will not show your email address when publishing your comment).