Wired discovered that Perplexity, an AI search engine, has been bypassing the Robots Exclusion Protocol (robots.txt) and scraping Wired's website along with other Condé Nast publications. The investigation traced the activity to a machine on an Amazon server, likely operated by Perplexity, that ignored the sites' crawling restrictions. Wired also observed inaccuracies in the summaries generated by Perplexity's chatbot, raising further concerns about the company's methods and practices.
AI companies bypass robots.txt either by relying on third-party web crawlers that ignore the protocol or by simply disregarding its instructions themselves. Companies such as OpenAI and Anthropic have been accused of this behavior despite publicly claiming to respect the "do not crawl" directives that websites place in their robots.txt files.
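To make the mechanism concrete, here is a minimal sketch of how a compliant crawler honors robots.txt, using Python's standard-library `urllib.robotparser`. The robots.txt content and bot names below are hypothetical examples, not the actual rules served by any Condé Nast site; note that the protocol is purely advisory, which is why a crawler can simply skip this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; real sites serve this file at /robots.txt.
# It blocks one named bot entirely and keeps all others out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler consults the parser before every fetch and skips
# any URL for which can_fetch() returns False.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))  # blocked
print(parser.can_fetch("GenericBot", "https://example.com/article"))    # allowed
print(parser.can_fetch("GenericBot", "https://example.com/private/x"))  # blocked
```

Nothing enforces these rules: a scraper that never calls `can_fetch()` (or sends a different User-agent string) retrieves the pages anyway, which is exactly the kind of non-compliance described above.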