Nutch is a highly extensible, highly scalable, matured, production-ready Web crawler which enables fine grained configuration and accomodates a wide variety of data acquisition tasks. Scalable Relying on Apache Hadoop⢠data structures, Nutch is great for batch processing large data volumes but can also be tailored to smaller jobs. Pluggable Out of the box Nutch offer powerful plugins i.e., parsing
{{#tags}}- {{label}}
{{/tags}}