Cloudflare data shows AI crawler traffic surging in May. [1] The numbers are concrete: bots hitting servers, consuming bandwidth, taking content without compensation. The surge is not a trend story. It is the infrastructure evidence for the extraction thesis.
Mainstream media covered this as a network traffic report. [2] TechCrunch emphasized volume metrics and platform impact. X debated copyright and fair use: whether scraping is theft, and whether content creators deserve payment. The paper follows the physical layer: which sites are hit, how much bandwidth is consumed, and whether any compensation framework exists. [1]
AI systems need data. The data lives on servers. The servers have owners. The owners have not agreed to provide the data. Crawlers are not abstractions; they are machines making millions of requests per day across the web. [1] Cloudflare's network sits between the crawlers and the servers, which gives it visibility into the scale of extraction.
The prior edition tracked compute costs: chips, power, cooling. [2] Today, the Cloudflare data adds the content extraction dimension. AI infrastructure has two cost layers: compute (the hardware) and data (what the hardware consumes). Cloudflare measures the second layer.
No compensation framework exists. The legal questions around fair use and scraping remain unresolved. The paper's position: every crawl is a data point on who takes and who pays. [1]
-- THEO KAPLAN, San Francisco