Wired discovered that Perplexity, an AI search engine, has been bypassing the Robots Exclusion Protocol (robots.txt) and scraping Wired's website along with other Condé Nast publications. The investigation traced the activity to a machine on an Amazon server, likely operated by Perplexity, that ignored the sites' crawling restrictions. Wired also observed inaccuracies in the summaries generated by Perplexity's chatbot, raising further concerns about the company's methods and practices.
AI companies bypass robots.txt either by relying on third-party web crawlers that ignore the protocol or by simply disregarding its instructions themselves. Companies such as OpenAI and Anthropic have been accused of this behavior despite publicly claiming to respect the "do not crawl" directives that websites place in their robots.txt files.
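To make the mechanism concrete, here is a minimal sketch of how a compliant crawler honors robots.txt, using Python's standard-library `urllib.robotparser`. The robots.txt content and bot names below are hypothetical examples, not the actual rules served by any Condé Nast site; note that the protocol is purely advisory, which is why a crawler can simply skip this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; real sites serve this file at /robots.txt.
# It blocks one named bot entirely and keeps all others out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler consults the parser before every fetch and skips
# any URL for which can_fetch() returns False.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))  # blocked
print(parser.can_fetch("GenericBot", "https://example.com/article"))    # allowed
print(parser.can_fetch("GenericBot", "https://example.com/private/x"))  # blocked
```

Nothing enforces these rules: a scraper that never calls `can_fetch()` (or sends a different User-agent string) retrieves the pages anyway, which is exactly the kind of non-compliance described above.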