As bots continue to scour the web to help train AI models, Cloudflare has released a new tool that allows customers to block all AI bots at once.
The tool targets scraping, the process by which bots extract content and material from websites, a practice that has grown with the rise of generative AI. Cloudflare also highlights web crawling, in which bots roam the web to index content from various websites. Last year the company announced the ability for customers to block certain types of bots; the new tool allows them to block all AI bots at once.
The company also analyzes its traffic to monitor the popularity of scraping bots and claims that "the value of bulk original content has never been higher." In a blog post, Cloudflare said its analysis identified the most popular crawlers by request volume and by the number of internet properties they reach, and that many customers may not be aware of the most popular AI crawlers actively crawling their sites.
The IT giant also warned that not all companies are transparent about their data scraping practices. Cloudflare claims to have detected bot operators that try to "disguise themselves as real browsers using deceptive user agents."
Cloudflare said it will continue to add more bot blocks to its AI scraper and crawler rules and to develop its machine learning models to help keep the web a place where creators can thrive.
The biggest bots
On top of the new feature, Cloudflare also shared insight into some of the most prominent AI bots scraping its network. The company claims to be connected to approximately 20 percent of the internet. The most popular AI bots making requests to Cloudflare sites are Bytespider, Amazonbot, ClaudeBot and GPTBot, the company said. Cloudflare claims these bots are used to train artificial intelligence models from ByteDance, Amazon, Anthropic and OpenAI. According to Cloudflare data, Bytespider, GPTBot and ClaudeBot are the top three bots in terms of share of website visits.
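For readers curious how a site might recognize these crawlers by name, here is a minimal Python sketch. It assumes each bot announces itself with its name somewhere in the User-Agent header. The function name `match_ai_bot` and the token list are illustrative and are not Cloudflare's implementation.

```python
from typing import Optional

# Bot names as listed in the article. Assumption: each crawler includes
# its name in the User-Agent string it sends.
AI_BOT_TOKENS = ("Bytespider", "Amazonbot", "ClaudeBot", "GPTBot")


def match_ai_bot(user_agent: str) -> Optional[str]:
    """Return the matching bot name, or None if no known token appears."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        # Case-insensitive substring match on the declared user agent.
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    # Hypothetical example strings, not captured from real traffic.
    print(match_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(match_ai_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))
```

A check like this only catches bots that identify themselves honestly. Operators that disguise their crawlers with deceptive user agents would pass straight through it, which is why Cloudflare says it also relies on machine learning models.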
With the development of generative artificial intelligence, data scraping has recently become a problem in various fields. In May, Sony Music Group wrote to more than 700 tech companies asking them to refrain from using its content to train AI models.