Stealth Tactics by Perplexity: Cloudflare's Concerns

AI search engine Perplexity has been accused of using stealth bots and other techniques to disregard websites’ no-crawl directives. According to Cloudflare, these actions breach the long-standing Internet norms established by the Robots Exclusion Protocol, created in 1994 by engineer Martijn Koster. The protocol, formalized as a standard by the Internet Engineering Task Force in 2022, lets a website publish a robots.txt file stating which paths crawlers may fetch. Compliance is voluntary, so the protocol works only when crawlers honor those directives.
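To make the mechanism concrete, here is a minimal sketch of how a compliant crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The crawler names (`ExampleBot`, `SomeOtherBot`), the rules, and the `example.com` URLs are hypothetical and chosen only for illustration; they are not taken from any site in Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is banned entirely,
# every other crawler is kept out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The banned crawler may not fetch anything on the site.
blocked = is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/article")
# Any other crawler falls under the wildcard rules.
generic = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")
private = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/private/page")

print(blocked, generic, private)
```

The check depends entirely on the crawler identifying itself truthfully and choosing to run it, which is why a crawler that hides its identity falls outside what robots.txt alone can control.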
Cloudflare said it had received complaints from customers who had explicitly blocked Perplexity's crawlers using robots.txt files and web application firewall rules. Despite these blocks, Perplexity allegedly continued to access site content. Upon investigation, Cloudflare researchers found that Perplexity used stealth bots to circumvent the blocks, drawing on a range of undisclosed IP addresses and other tactics to mask its activity. Cloudflare reported this deceptive crawling across more than 10,000 domains, amounting to millions of requests each day.
The investigation found that Perplexity's crawlers used undisclosed IP addresses and rotated among them to get around restrictive robots.txt policies and Cloudflare's firewall rules. Requests came from multiple autonomous system numbers (ASNs), indicating a systematic evasion method designed to sidestep website blocks altogether.
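The difference between disclosed and undisclosed IPs can be pictured with a short sketch: a site operator compares a request's source address against the ranges a crawler operator has published. This is not Cloudflare's detection method, which the article does not describe in detail. The ranges below are reserved documentation addresses standing in for a hypothetical published list, and `from_declared_range` is a name invented for this example.

```python
import ipaddress

# Hypothetical "published" crawler ranges. These are reserved
# documentation networks, not addresses used by any real crawler.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def from_declared_range(source_ip: str) -> bool:
    """Return True if the source address lies inside a declared range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in DECLARED_RANGES)


declared = from_declared_range("192.0.2.17")     # inside a declared range
undeclared = from_declared_range("203.0.113.5")  # outside every declared range

print(declared, undeclared)
```

A check like this can only confirm traffic that arrives from declared ranges. It says nothing about who operates an address outside them, so attributing undeclared traffic to a particular crawler requires additional signals.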
Cloudflare has responded to these findings by removing Perplexity as a verified bot and strengthening its protective heuristics to block such covert crawling. These efforts aim to ensure that crawlers are transparent, follow site directives, and have clear, legitimate purposes.
Perplexity, already under fire from several publishers for alleged content plagiarism, has not responded to requests for comment on these accusations. Notably, Forbes and Wired have accused Perplexity of reproducing content without authorization and manipulating crawler identification strings.
Cloudflare emphasizes that legitimate crawlers should perform helpful, specific tasks while respecting each website's directives. Its findings and subsequent actions reflect the company's commitment to preserving fair Internet practices and site integrity.
Dan Goodin, a senior security editor at Ars Technica, reports on these developments. He specializes in covering security topics including malware, computer espionage, and encryption. Find him actively discussing these and other issues in technology and independent music on platforms like Mastodon and Bluesky.