
Perplexity gives Apple new reason not to acquire the AI company


Perplexity has long been accused of deliberately bypassing anti-scraping measures to retrieve web content. While the company has historically dismissed these accusations as disingenuous or as misunderstandings, a new report shows that the practice is not only still happening but may actually be getting worse.

Perplexity’s main counter-argument: semantics

The issue with Perplexity’s web crawling practices first came to light in June 2024, when Wired and other media outlets accused the company of ignoring the Robots Exclusion Protocol (robots.txt) and pulling content from their websites.
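For readers unfamiliar with the protocol: a site publishes a plain-text robots.txt file listing which user agents may fetch which paths, and a compliant crawler checks that file before making a request. The sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt contents, the example.com URLs and the `may_fetch` helper are hypothetical, and `PerplexityBot` appears only as an illustrative user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: it shuts one named
# crawler out entirely and keeps every other bot out of /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story.html"
    print(may_fetch("PerplexityBot", story))  # False: blocked site-wide
    print(may_fetch("SomeOtherBot", story))   # True: only /private/ is off-limits
```

Nothing in the protocol enforces the answer: `can_fetch` is a check the crawler chooses to run, so compliance ultimately depends on the crawler's operator.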

At the time, Perplexity CEO Aravind Srinivas said the culprit was an unnamed third-party web crawling vendor, and that there was “a basic misunderstanding of the way this works.”

It wasn’t long before other publications accused Perplexity of plagiarism and unethical web scraping, with The New York Times and the BBC even issuing legal threats. In response, Perplexity said the BBC was being “manipulative and opportunistic” and had a “fundamental misunderstanding of technology, the internet and intellectual property law”.

Since then, Perplexity has repeatedly denied these accusations, disputing how “crawling” and “scraping” should be defined in specific use cases. As Wired reported:

In other words, if a user manually provides a URL to an AI, Perplexity says its AI isn’t acting as a web crawler but rather a tool to assist the user in retrieving and processing information they requested. But to Wired and many other publishers, that’s a distinction without a difference because visiting a URL and pulling the information from it to summarize the text sure looks a whole lot like scraping if it’s done thousands of times a day.

Srinivas has also promised in the past that the company would make it easier to reach the original sources of the content surfaced by its answer engine. However, that only changes how information is presented; it does not address how the information is sourced, which is where the problem lies.

Cloudflare says Perplexity is going out of its way to go after data it is explicitly being told not to crawl

Today, Cloudflare published a report claiming that even when a server explicitly denies all automated access and includes rules that specifically block Perplexity’s public crawlers, Perplexity crawls it anyway.
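Because robots.txt is only advisory, site operators often add server-side rules that refuse requests outright. The following is a minimal sketch, assuming the rule matches on the request's User-Agent header. The `is_blocked` and `handle` helpers and the blocked tokens are hypothetical, and real firewall and edge products (Cloudflare's included) use their own rule languages rather than Python.

```python
# Hypothetical user-agent tokens to refuse; the exact strings a crawler
# declares should be taken from its operator's own documentation.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "perplexity-user")


def is_blocked(headers: dict[str, str]) -> bool:
    """Return True if the request's User-Agent contains a blocked token.

    Header names are compared case-insensitively, as HTTP requires.
    """
    user_agent = ""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            user_agent = value.lower()
            break
    return any(token in user_agent for token in BLOCKED_AGENT_TOKENS)


def handle(headers: dict[str, str]) -> int:
    """Return the HTTP status this hypothetical rule would send."""
    return 403 if is_blocked(headers) else 200


if __name__ == "__main__":
    print(handle({"User-Agent": "Mozilla/5.0 (compatible; PerplexityBot/1.0)"}))  # 403
    print(handle({"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"}))  # 200
```

Note the second request: a client presenting an ordinary browser User-Agent passes this check, so header matching on its own only stops crawlers that identify themselves.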
