AI Site Perplexity and Cloudflare's Allegations

Cloudflare, a network security firm, has alleged that the AI search engine Perplexity uses stealth bots and other tactics to bypass websites' no-crawl directives, a practice that violates longstanding Internet norms.
Cloudflare's researchers ran tests after customers complained that Perplexity was ignoring their sites' robots.txt files and firewall settings, both of which were meant to block its crawlers. Because Perplexity kept accessing site content despite these measures, Cloudflare investigated further.
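To make the robots.txt part of the complaint concrete, here is a minimal sketch of the check a compliant crawler is expected to run before fetching a page, using Python's standard urllib.robotparser module. The user agent name ExampleBot, the example.com URLs, and the rules themselves are hypothetical placeholders, not taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler everywhere,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for path in ("/article", "/private/report"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

Note that robots.txt is purely advisory: nothing in the file stops a crawler that skips this check or presents a different user agent, which is why the complaints also involved firewall settings.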
The research found that Perplexity uses undeclared crawlers that reach websites from multiple unlisted IP addresses. It rotates these addresses when it encounters restrictions, and its requests arrive from different autonomous systems (ASNs) to evade blocks. Cloudflare observed this behavior across thousands of domains and millions of requests per day.
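The idea of a "listed" versus "unlisted" address can be sketched with Python's standard ipaddress module: a site operator compares a request's source IP against the ranges a crawler operator has published. The ranges below are reserved documentation networks and the ExampleBot name is hypothetical; this is an illustration of the concept under those assumptions, not Cloudflare's detection method.

```python
import ipaddress

# Hypothetical published crawler ranges (reserved documentation networks).
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def classify_request(user_agent: str, source_ip: str,
                     declared_agent: str = "ExampleBot") -> str:
    """Label a request by whether its claimed identity matches its source IP."""
    ip = ipaddress.ip_address(source_ip)
    from_listed_ip = any(ip in network for network in DECLARED_RANGES)
    claims_declared_agent = declared_agent.lower() in user_agent.lower()

    if claims_declared_agent and from_listed_ip:
        return "declared crawler"
    if claims_declared_agent:
        return "declared agent from unlisted IP"
    if from_listed_ip:
        return "listed IP without declared agent"
    return "no declared identity"
```

A request in the last category looks like any ordinary visitor, so a check like this cannot by itself attribute traffic to a particular operator. Attribution of the kind described above requires additional signals beyond the user agent and source address.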
Cloudflare isn't the first to voice such concerns about Perplexity's practices, and the issue is growing. Reddit, for example, has previously voiced similar concerns about Perplexity, Microsoft, and Anthropic. The ongoing debate is whether these AI companies treat online content as free for the taking, in defiance of the spirit of established conventions such as the Robots Exclusion Protocol.
Perplexity has faced significant repercussions. Publishers have accused it of plagiarism: notably, Forbes accused Perplexity of content theft, and Wired reported similar claims of suspicious activity. Following its own findings, Cloudflare has removed Perplexity from its list of verified bots and now blocks it through its managed rules.
Cloudflare's actions underscore the importance of transparency in crawler operations: crawlers should serve a clear purpose, stick to the activity they declare, and respect website directives. Because Perplexity's methods fall short of these standards, Cloudflare has taken steps to limit its access through Cloudflare's services.