Unsurprisingly this article confuses the issue somewhat by also talking about training models on content. I understand why that's in there - it's a hot topic, especially in the UK right now - but I don't think it's directly relevant to this complaint.
The note about robots.txt is interesting: "The BBC said in its letter that while it disallowed two of Perplexity's crawlers, the company 'is clearly not respecting robots.txt'."
Perplexity describe their user-agents here: https://docs.perplexity.ai/guides/bots
I had a look at https://www.bbc.com/robots.txt and it does indeed block both PerplexityBot ("designed to surface and link websites in search results on Perplexity" - I think that's their search index crawler) and Perplexity-User ("When users ask Perplexity a question, it might visit a web page to help provide an accurate answer and include a link to the page in its response").
But... I checked the Internet Archive for a random earlier date - Feb 2025 - https://web.archive.org/web/20250208052005/https://www.bbc.c... - and back then the BBC were blocking PerplexityBot but not Perplexity-User.
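You can reproduce this kind of check yourself with Python's standard-library robots.txt parser. This is a minimal sketch using a hypothetical robots.txt body resembling the rules described above; the real file at https://www.bbc.com/robots.txt may differ and should be fetched directly if you want current results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt resembling the rules discussed above
# (the live file at https://www.bbc.com/robots.txt may differ).
robots_txt = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://www.bbc.com/news")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the rules above, both Perplexity agents come back blocked while an unlisted agent is allowed, which is the distinction the February 2025 snapshot turned on.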
> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.
[1]https://www.tomshardware.com/tech-industry/artificial-intell...
Normally the expectation is that the user-agent faithfully presents the content it fetched.
If I made a browser that fetched bbc.com, stripped away the ads, and presented the result to users, I would expect the BBC to not like it and to block that user-agent from accessing the site. It isn't a robots.txt thing. It is a user-agent thing.
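Server-side user-agent blocking of this kind is usually done in web-server or CDN config, but the logic is simple enough to sketch. This is a hypothetical illustration; the agent names come from the docs linked above, and the substring matching mirrors how robots.txt-style agent matching typically works:

```python
# Minimal sketch of server-side blocking by User-Agent header.
# Agent names taken from Perplexity's bot docs; real deployments
# would normally do this in nginx/CDN rules, not application code.
BLOCKED_AGENTS = ("PerplexityBot", "Perplexity-User")

def should_block(user_agent_header: str) -> bool:
    # Case-sensitive substring match against the full header value.
    return any(agent in user_agent_header for agent in BLOCKED_AGENTS)

print(should_block("Mozilla/5.0 (compatible; PerplexityBot/1.0)"))  # True
print(should_block("Mozilla/5.0 (X11; Linux x86_64)"))              # False
```

The key difference from robots.txt is enforcement: robots.txt is a request that a polite crawler honors voluntarily, while a User-Agent block is applied by the server and cannot be "ignored", only evaded by spoofing the header.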
> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.
...was added sometime between 30.01.2025 [0] and 07.02.2025 [1], and makes it sound like robots.txt was not respected by that bot anyway.
[0]: https://web.archive.org/web/20250130164401/https://docs.perp...
[1]: https://web.archive.org/web/20250207113929/https://docs.perp...
Unless Perplexity has a way to indirectly pay writers the way Google does, this is very rich.
> four popular AI chatbots - including Perplexity AI - were inaccurately summarising news stories, including some BBC content.
One of the interesting consequences of LLM failures is that original news sources now look more concise and more authoritative by comparison. Even Google gets facts wrong in its AI summaries, so one is compelled all the more to go read the website instead. And I'm not sure LLMs will ever be able to tell truth from lies.
For example I like to watch F1 and I like to know the times for all sessions in my timezone during the weekend.
It's surprisingly hard to find this information, because Google search is SEOed to hell and back by sites that hide it behind endless articles full of irrelevant AI slop and two million intrusive ads, and that's assuming they have it at all, or have it right.
Perplexity wades through all that shit, gives me a neatly formatted table and has never been wrong so far.
So I can see where the BBC is coming from but I also don't really want them to win.
I use it the same way as well, but every time I use it, I feel icky. A sense of impending doom.
Imagine a book-summaries service that helped users never buy a book again. What is the incentive for a writer to write one, knowing that within minutes a summary of the work will be available on a different site?
News sites are unique in that the value they provide, for the most part, is its real-time nature. BBC reporting on the latest in London is the work of so many journalists, and if Perplexity sidesteps that, the BBC has no incentive (and, in the future, no money) to do that work. It kills the BBC, and ultimately it kills Perplexity too.
So yes, Perplexity is playing a very dangerous short term game, and BBC is right in suing them.
> BBC is coming from but I also don't really want them to win.
If the BBC doesn't win, the BBC (and other sites that "produce" information) dies, which in turn kills Perplexity.
A very old argument: If you don't want people scraping or downloading your content don't put it on the (public) Internet!
Imagine we had LLM-like functionality in the 1980s: Sony announces a new VCR that can watch a recorded news show and print out a summary on a connected ImageWriter II. People start using it to summarize the publicly broadcast BBC news programs.
Today's scenario would be like the BBC sues Sony for providing that functionality.
1000000x'ing fair use... might no longer be fair use.
The balances between society and copyright need to change when scale changes drastically.
To address the elephant in the room: what happens when there are only leechers and no sources, because we've let them hijack first-party news revenue without creating a replacement?
esskay•4h ago
That's got to be the most delusional response they could've given. It's not the BBC's job, or any other news publisher's, to preserve Google's monopoly. The comparison would only work if Google were replacing a link to a BBC article in the search results with a direct copy of that article on the results page.