Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
Common Crawl is a 501(c)(3) non-profit founded in 2007. We make wholesale extraction, transformation, and analysis of open web data accessible to researchers.
Announcing our February 2025 Web Graph release, based on the crawls of December 2024 and January/February 2025. It consists of 267.4 million nodes and 2.7 billion edges at the host level, and 106.5 million nodes and 1.9 billion edges at the domain level.
Thom Vaughan
Thom is Principal Technologist at the Common Crawl Foundation.