AI Site Perplexity Accused of Bypassing No-Crawl Directives Using Stealth Tactics

AI search platform Perplexity has been accused of using stealth bots to bypass websites' no-crawl directives. The practice violates long-standing Internet norms, according to Cloudflare, a network security and optimization service.
In a blog post, Cloudflare said it had received complaints from site owners who had used robots.txt files and Web application firewalls to keep out Perplexity's scraping bots. Despite these precautions, the bots allegedly continued to access the sites' content.
When Cloudflare researchers investigated, they found that whenever Perplexity's regular crawlers were blocked, the company switched to a stealth bot that used a range of tactics to disguise its activity.
Widespread Incidence
"This undisclosed crawler operated using multiple IP addresses not part of Perplexity's official list, adapting to blocking strategies deployed in robots.txt files and cloud services like Cloudflare," explained the researchers. "Requests originated from diverse Autonomous System Numbers (ASNs) to circumvent website restrictions, with activity detected across tens of thousands of domains generating millions of daily requests."
The alleged evasion flouts Internet norms that date back more than three decades. The Robots Exclusion Protocol, which defines how robots.txt works, was first proposed in 1994 and was formally standardized in 2022 as RFC 9309. Compliance is voluntary, and Perplexity appears to be skirting the practice.
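To illustrate how the protocol works, and why it relies on crawlers identifying themselves honestly, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the bot names ExampleBot and OtherBot, and the example.com URLs are hypothetical and chosen only for illustration; they are not taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/article"
    # A well-behaved crawler checks before fetching and honors the answer.
    print(may_fetch(ROBOTS_TXT, "ExampleBot", article))
    # The same request under a different user-agent string is not
    # matched by the ExampleBot rule.
    print(may_fetch(ROBOTS_TXT, "OtherBot", article))
```

In this sketch the rule for ExampleBot applies only to requests that announce themselves as ExampleBot. A crawler that presents a different user-agent string falls under the general rule instead, which is why site owners add network-level blocks on top of robots.txt.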
Cloudflare is not alone in calling out Perplexity's disregard for these norms. Last year, Reddit faced similar problems, and its CEO, Steve Huffman, called the actions of Perplexity and other AI platforms a significant nuisance, saying they treated web content as free to use without permission.
Several publishers, including Forbes and Wired, have also accused Perplexity of plagiarizing content. Those allegations cite irregular traffic patterns suggesting that Perplexity deliberately circumvented robots.txt exclusions, including by altering its bots' identification strings to evade restrictions.
In response to its findings, Cloudflare is de-listing Perplexity as a verified bot and strengthening its measures to block disguised crawling.
Perplexity has not yet responded to requests for comment on the allegations.