Saturday, March 29, 2025
Google search engine
HomeTechnologyUsOpen Source devs say AI crawlers dominate traffic, forcing blocks on entire...

Open Source devs say AI crawlers dominate traffic, forcing blocks on entire countries



Open Source devs say AI crawlers dominate traffic, forcing blocks on entire countries

“Any time one of these crawlers pulls from my tarpit, it’s resources they’ve consumed and will have to pay hard cash for,” Aaron explained to Ars. “It effectively raises their costs. And seeing how none of them have turned a profit yet, that’s a big problem for them.”

On Friday, Cloudflare announced “AI Labyrinth,” a similar but more commercially polished approach. Unlike Nepenthes, which is designed as an offensive weapon against AI companies, Cloudflare positions its tool as a legitimate security feature to protect website owners from unauthorized scraping, as we reported at the time.

“When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,” Cloudflare explained in its announcement. The company reported that AI crawlers generate over 50 billion requests to their network daily, accounting for nearly 1 percent of all web traffic they process.

The community is also developing collaborative tools to help protect against these crawlers. The “ai.robots.txt” project offers an open list of web crawlers associated with AI companies and provides premade robots.txt files that implement the Robots Exclusion Protocol, as well as .htaccess files that return error pages when detecting AI crawler requests.

As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.

Responsible data collection may be achievable if AI firms collaborate directly with the affected communities. However, prominent industry players have shown little incentive to adopt more cooperative practices. Without meaningful regulation or self-restraint by AI firms, the arms race between data-hungry bots and those attempting to defend open source infrastructure seems likely to escalate further, potentially deepening the crisis for the digital ecosystem that underpins the modern Internet.



Source link

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments