Java and Clojure examples for reading the new WARC Common Crawl files
I just added a Clojure example to my Common Crawl repo. It assumes that you have already copied a crawl segment file to your laptop. Within the next week I will add another Clojure example that pulls segment files from S3.
There are also two Java examples in the repo, covering reading segment files from local disk and from S3.
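
If you have not looked inside a WARC file before, the following is a rough sketch of what reading a local segment file involves. It is not the code from the repo, which may well use a WARC library instead. The class name `WarcHeaderSketch` and the method `readRecordHeaders` are made up for illustration. The sketch assumes the standard WARC record layout (a `WARC/x.y` version line, `Name: value` header lines, a blank line, then `Content-Length` bytes of payload), and it assumes that `GZIPInputStream` reads through the concatenated gzip members of a `.warc.gz` file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of the repo: collects the header fields of
// the first few WARC records in a stream and discards the payloads.
class WarcHeaderSketch {

    static List<Map<String, String>> readRecordHeaders(InputStream in, int maxRecords)
            throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        String line;
        while (records.size() < maxRecords && (line = readLine(in)) != null) {
            if (line.isEmpty()) {
                continue; // blank lines separate one record from the next
            }
            if (!line.startsWith("WARC/")) {
                throw new IOException("Expected a WARC version line, got: " + line);
            }
            Map<String, String> headers = new LinkedHashMap<>();
            while ((line = readLine(in)) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon).trim(),
                                line.substring(colon + 1).trim());
                }
            }
            // Simplification: looks up the field by its usual spelling only.
            String lengthField = headers.get("Content-Length");
            long remaining = lengthField == null ? 0 : Long.parseLong(lengthField);
            byte[] scratch = new byte[8192];
            while (remaining > 0) { // read and discard the payload bytes
                int n = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
                if (n == -1) {
                    throw new EOFException("Record payload is truncated");
                }
                remaining -= n;
            }
            records.add(headers);
        }
        return records;
    }

    // Reads one line terminated by LF (CR is dropped); returns null at end of stream.
    private static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            if (b != '\r') {
                buf.write(b);
            }
            b = in.read();
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String path = args[0]; // path to a locally copied segment file
        try (InputStream file = new BufferedInputStream(new FileInputStream(path));
             InputStream in = path.endsWith(".gz") ? new GZIPInputStream(file) : file) {
            for (Map<String, String> headers : readRecordHeaders(in, 10)) {
                System.out.println(headers.get("WARC-Type") + " "
                        + headers.get("WARC-Target-URI"));
            }
        }
    }
}
```

Running it as `java WarcHeaderSketch <path-to-segment-file>` should print the record type and target URI of the first ten records. Reading from S3 would only change where the `InputStream` comes from; the record parsing stays the same.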