Google just cut off 90% of the internet from AI – no one's talking about it

https://www.reddit.com/r/ArtificialInteligence/s/spZ9qh0Ia1
13•alexgotoi•4mo ago

Comments

barbazoo•4mo ago
> You can no longer view 100 results at once. The new hard limit is 10.

Does Google not support lazy-loading more results or is that not supported via API or what's going on here?

asdff•4mo ago
Results are certainly thinner than in years past, when you could seemingly inspect the entire crawled corpus. You can search for a pretty broad topic and hit the wall pretty fast today. I think they are limiting the depth of queries these days, probably owing to search volume and the size of cache they can sustain, what with current webdev standards. It was a different story 20 years ago when websites were a few kB to a few MB, even though storage is "cheaper" today.
fooker•4mo ago
Writing a web crawler is not too complicated.

I predict every 'AI' company will have a homegrown search engine in a few months to account for this.

The way this would be publicly usable is through the new generation of 'AI' browsers.

toomuchtodo•4mo ago
https://commoncrawl.org/
aiauthoritydev•4mo ago
The problem is not crawling but indexing. Over the years, Google has learned the patterns and authority of different articles. That will be hard for others to replicate, but not impossible.

What Google should do is offer API-based access to these providers, but a lot of these providers might not adhere to contracts. So there is that.

fooker•4mo ago
Indexing is a fairly well understood technology.

You could hire one or more experts to make this doable with a pretty good amount of scalability.

afavour•4mo ago
Crawling isn't complicated. But ranking? That was Google's reason for existence for a very long time. Remains to be seen if AI companies will be able to replicate that.
fooker•4mo ago
Here's the interesting part - ranking matters when humans are looking at the results.

For a bot with a large context window though, not so much.

jdale27•4mo ago
Clickbait title. They only cut off the AIs that were using Google as their crawler, which was not a good idea in the first place. I’d love to ask the developers of these AIs: what exactly did you expect to happen here?
pityJuke•4mo ago
i thought most of the major ai vendors (excl google) used their own crawlers and indexes, or licensed from a non-google company
fooker•4mo ago
> used their own crawlers and indexes

Not yet, but will eventually for sure.

If you're an expert who has worked on Google search or something like that, this would be a great time to start a company for this.

aiauthoritydev•4mo ago
Clickbait article.
weinzierl•4mo ago
"Most large language models like OpenAI, Anthropic, and Perplexity rely directly or indirectly on Google's indexed results to feed their retrieval systems and crawlers."

Is this true?

I thought OpenAI was using Bing. Gemini obviously will use Google but to them the restriction does not apply. Claude says it uses Brave.

jug•4mo ago
Or OAI-SearchBot for the "web search" feature to augment queries, and GPTBot for training?

I swear I've even read how aggressive GPTBot is. Surely they aren't just googling stuff?

https://platform.openai.com/docs/bots

LargoLasskhyfv•4mo ago
AI too stupid to scroll, or what?