Cloudflare's Stand Against Perplexity's Crawling Tactics

Cloudflare has accused the AI search engine Perplexity of using stealth tactics to bypass the no-crawl directives that websites set. Cloudflare, a network security and infrastructure company, says this behavior violates long-standing Internet norms.

According to Cloudflare, Perplexity's bots continued to reach content on websites that had disallowed them in their robots.txt files. When Cloudflare researchers investigated, they found that Perplexity had substituted its known bots with stealth crawlers designed to mask their activity.
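To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler would consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, and the helper name is_allowed are illustrative assumptions, not taken from Cloudflare's report; "PerplexityBot" is used here only as an example of a declared crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: one named crawler is
# disallowed everywhere, every other client is allowed.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    # The declared crawler name matches the Disallow rule.
    print(is_allowed(ROBOTS_TXT, "PerplexityBot", page))
    # A generic browser-like identity falls through to the wildcard rule.
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", page))
```

The sketch also shows why the protocol depends on honesty: the rules are matched against whatever name the client declares, so a crawler that presents a different identity is simply not covered by the rule written for it.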

The undeclared crawlers used IP addresses outside Perplexity's officially listed range and rotated them continually to evade blocks. Cloudflare also observed the crawlers switching between autonomous system numbers (ASNs) to get around site restrictions. The activity spanned numerous domains and amounted to millions of requests per day.
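The check implied above, whether a request comes from an operator's published address ranges, can be sketched with Python's standard ipaddress module. This is an illustration under stated assumptions, not Cloudflare's detection method: the ranges below are reserved documentation networks standing in for a real published list, and the names DECLARED_RANGES and is_declared are hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for
# a crawler operator's officially published IP list.
DECLARED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def is_declared(ip: str, ranges: list[str] = DECLARED_RANGES) -> bool:
    """Return True if ip falls inside any of the declared networks."""
    addr = ipaddress.ip_address(ip)
    # Membership is False when the address and network versions differ,
    # so IPv4 and IPv6 ranges can share one list.
    return any(addr in ipaddress.ip_network(r) for r in ranges)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.5", "2001:db8::1"):
        label = "declared" if is_declared(ip) else "undeclared"
        print(ip, label)
```

A list-based check like this is only as good as the list. Traffic that rotates through unlisted addresses and ASNs, as described above, passes as ordinary traffic, which is why such cases call for behavioral signals rather than IP matching alone.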

This behavior breaches norms that date back to 1994, when the Robots Exclusion Protocol was proposed as a convention that lets site owners tell crawlers which parts of a site they should not access. In response, Cloudflare has taken measures against such stealth activity, including updating its managed rules to block these undeclared crawlers.

Cloudflare has also stopped treating Perplexity as a verified bot and has stepped up its efforts to ensure transparency in crawler activity across its platform.

Perplexity has previously faced allegations of content theft and of manipulating its bot identification strings to get past website blocks. Those allegations include accusations from major publishers such as Forbes and Wired.

Cloudflare emphasizes that transparent crawling practices and adherence to each website's stated preferences are essential to a healthy digital ecosystem.
