Cloudflare Clampdown on Perplexity's Stealth Tactics

AI search engine Perplexity is reportedly using covert bots to bypass websites' no-crawl rules. According to Cloudflare, a leading network security provider, the practice may violate long-established Internet standards.

Cloudflare said in a blog post that it had received several complaints from customers who had blocked Perplexity's crawlers through their robots.txt files and web application firewall (WAF) rules. Despite these measures, Perplexity allegedly continued to access content on those sites.

After investigating, Cloudflare found that Perplexity was using stealth bots to get around restrictions imposed by robots.txt files and firewall rules. These undeclared crawlers accessed site content from IP addresses that Perplexity does not officially list as its own.

The scale is substantial: the activity reportedly spans more than 10,000 domains and generates millions of requests each day. Such methods undermine the Robots Exclusion Protocol, a standard formally codified by the Internet Engineering Task Force in 2022 that lets site owners tell crawlers which parts of a site they should not access.
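To illustrate why robots.txt rules are easy to sidestep, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes a site that blocks a crawler identifying itself as `PerplexityBot` (the user-agent token is assumed here for illustration); the URL and the browser-style user-agent string are hypothetical. Because the protocol depends on crawlers identifying themselves honestly and checking the rules voluntarily, a bot presenting a generic browser user agent matches only the catch-all group and is allowed through.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a crawler that identifies itself as
# "PerplexityBot" (assumed token) site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network fetch

url = "https://example.com/articles/story.html"  # hypothetical page

# A crawler that declares itself is matched by its own group and refused.
print(parser.can_fetch("PerplexityBot", url))  # False

# A crawler using a generic browser-style user agent (hypothetical string)
# matches only the "*" group, so the same rules let it through.
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)
print(parser.can_fetch(browser_ua, url))  # True
```

The check hinges entirely on the self-reported user agent, which is why site owners who want to stop undeclared crawlers have to rely on enforcement outside robots.txt, such as firewall rules.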

This controversy echoes earlier allegations against Perplexity, notably from Reddit's CEO, who criticized the company's disregard for who owns online content. Media outlets such as Forbes and Wired have also accused Perplexity of plagiarizing content, citing suspicious traffic patterns that appeared to ignore robots.txt directives.

As Cloudflare moves to strengthen its defenses against such crawling, it stresses the importance of transparency and of respecting website directives. In response to its findings, Cloudflare has removed Perplexity from its list of verified bots.

At the time of writing, Perplexity had not responded to the allegations.
