Perplexity AI Accused of Evading Web Crawling Restrictions

Perplexity, an AI search engine, has come under scrutiny for allegedly using stealth bots to bypass crawling restrictions set by websites. According to Cloudflare, Perplexity’s methods may contravene Internet norms that have been in place for over three decades.
Cloudflare shared its findings in a detailed blog post describing complaints it had received from customers. These customers had configured their sites’ robots.txt files and deployed web application firewalls to block Perplexity’s crawling. Despite these measures, Perplexity reportedly continued to access site content.
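To make the robots.txt mechanism concrete, here is a minimal Python sketch of how a compliant crawler could check a site's rules before fetching a page. The robots.txt content, the example.com URL, and the "PerplexityBot" user-agent token are assumptions chosen for illustration; they are not taken from Cloudflare's post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish (illustrative only).
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is disallowed; any other agent falls through to the "*" rule.
blocked = is_allowed(ROBOTS_TXT, "PerplexityBot", "https://example.com/article")
generic = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")
print(blocked, generic)
```

The check only works if the crawler performs it and identifies itself honestly. robots.txt is a voluntary convention and enforces nothing by itself, which is why site owners in the report paired it with firewall rules.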
Cloudflare’s investigation found that when blocked by robots.txt or a firewall, Perplexity switched to a stealth bot. This bot masked its activity through a variety of tactics, including rotating among numerous IP addresses not listed in Perplexity’s official IP range.
This undeclared crawling was detected across more than 10,000 domains and accounted for millions of requests per day. In addition to rotating IP addresses, the crawler switched between different Autonomous System Numbers (ASNs) to further evade website restrictions.
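One building block for spotting traffic from outside a crawler's official range is a simple membership test against the operator's published IP list. The Python sketch below shows that idea only. It is not Cloudflare's detection method, and the ranges are reserved documentation networks used as placeholders, not Perplexity's real addresses.

```python
import ipaddress

# Placeholder "declared" ranges (reserved documentation networks, not real crawler IPs).
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def from_declared_range(ip: str) -> bool:
    """Return True if ip falls inside any of the declared networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DECLARED_RANGES)


# Hypothetical source addresses seen in request logs.
sample_ips = ["192.0.2.10", "203.0.113.7"]
undeclared = [ip for ip in sample_ips if not from_declared_range(ip)]
print(undeclared)
```

An address outside the declared list does not prove anything by itself. A crawler that rotates through unlisted IPs and ASNs, as described above, has to be identified by other signals.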
The behavior, which Cloudflare illustrated in a diagram, reportedly violates Internet norms that have been in effect since 1994. These norms are codified in the Robots Exclusion Protocol, devised by Martijn Koster, which formally became an Internet standard in 2022.
The allegations are not new. Reddit’s CEO, for one, has voiced frustration over Perplexity’s practices, and publications including Forbes and Wired have accused Perplexity of plagiarizing content by ignoring robots.txt exclusions.
Following its findings, Cloudflare has de-listed Perplexity as a verified bot and strengthened its managed rules to block this stealth crawling.
Perplexity has not yet commented on the allegations.