Waiting for dawn in search: Search index, Google rulings and impact on Kagi

https://blog.kagi.com/waiting-dawn-search

485•josephwegner•2w ago

Comments

whs•2w ago

>Google: Google does not offer a public search API. The only available path is an ad-syndication bundle with no changes to result presentation - the model Startpage uses. Ad syndication is a non-starter for Kagi’s ad-free subscription model.[^1]

>Because direct licensing isn’t available to us on compatible terms, we - like many others - use third-party API providers for SERP-style results (SERP meaning search engine results page). These providers serve major enterprises (according to their websites) including Nvidia, Adobe, Samsung, Stanford, DeepMind, Uber, and the United Nations.

The customer list matches what is listed on SerpAPI's page (interestingly, DeepMind is on Kagi's list even though it's a Google company...). I suppose Kagi needs to pen this because if SerpAPI shuts down they may lose access to Google, but they may already utilize multiple providers. In the past, Kagi employees have said that they have access to a Google API, but it seems that was not the case?

As a customer, the major implication of this is that even if Kagi's privacy policy says they try to not log your queries, it is sent to Google and still subject to Google's consumer privacy policy. Even if it is anonymized, your queries can still end up contributing to Google Trends.

xnx•2w ago

> Because direct licensing isn’t available to us on compatible terms, we - like many others - use third-party API providers for SERP-style results

Crazy for a company to admit: "Google won't let us whitelabel their core product so we steal it and resell it."

direwolf20•2w ago

Pretty standard business practice though. There's no ethics in making money.

Ar-Curunir•2w ago

Strange to pick on Kagi when there's much bigger companies on that list.

xnx•2w ago

Those companies allegedly have used SerpAPI (probably to check visibility), but not to resell a Google Search knock-off.

manquer•2w ago

> knock-off

Is it though? It feels so much better than Google results[1], while still being built partly with Google results.

In the last 3 years as a Kagi customer i have rarely if ever felt the need to use bangs !g and on few occasions i did use them, it was with instant regret.

In the previous decade or so using DDG, using bangs !g Google would be 30-50% of searches, i would have to consciously try the results first instead of starting with !g and then think to myself DDG was at least getting the query data to improve their results.

[1] While the de-cluttered UI is a relief, on just the results list comparison, Google search is so bad that the time saved by not constantly redrafting queries and filtering out the spam, the AI summaries, the sponsored content, all the "cards", and the recommended search listicles is worth more than the $10/month.

direwolf20•2w ago

DDG's results are primarily Bing results while Kagi's results are primarily Google results. It makes sense that you feel the need to escape from Bing to Google more often than from Google to Google.

manquer•2w ago

Perhaps, but each time I do go to Google the results are painfully bad. I don't think Kagi is just proxying Google; while they do use it as a core source, they rerank much better.

The blog post talks about that specifically: Bing, unlike Google, does have an index licensing program, but its terms forbid reordering. That is the key reason Kagi is not also using Bing in their index mix.

My point is that Kagi is similar to a car tuning company like Hennessey or Brabus: they take a base product and make it much better for a premium. They are not selling knock-offs.

shadowgovt•2w ago

But in this current climate, they can admit it and then dare Google to tell them to stop... After Google has just had an antitrust ruling against it for dominating the search market.

Google doesn't really have a leg to stand on and they know it.

techjamie•2w ago

What's the alternative? Building a competing search index as a relative nobody on the web is very difficult, from the outset, and is made more difficult from sites taking extra measures to stop bots in general now.

Google's crawler is given special privileges in this right and can bypass basically all bot checks. Anyone else has to just wade through the mud and accept they can't index much of the web.

eli•2w ago

Seems like an open question as to whether that violates any laws.

Another way to look at it is that if you publish a service on the web, you have limited rights to restrict what people do with it.

Isn't that the logic Google search relies on in the first place? I didn't give permission for Google to crawl and index and deep link to my site (let alone summarize and train LLMs on it). They just did it anyway, because it's on a public website.

malfist•2w ago

Google's stance is "I can copy you and you can't stop me" as well as "You can't copy me, I'll sue you"

GuB-42•2w ago

Maybe it has changed but Google doesn't look like it uses litigation as its primary weapon. It defends itself but rarely attacks.

They are, however, more than happy to use technical measures, like blocking accounts. And because of their position, blocking your Google account may be more damaging than a successful lawsuit.

ancillary•2w ago

Google at least claims that noindex will keep your site from getting crawled [1]. Do people think this is false?

[1] https://developers.google.com/search/docs/crawling-indexing/...

eli•2w ago

Strictly speaking no, that doesn’t prevent crawling - at the least Googlebot has to fetch the page to see the meta tag or the robots.txt to see what’s allowed, and it will periodically recheck for changes.

It doesn’t even prevent indexing. If a page is linked from elsewhere, Google will show it in search results even if noindex’d.

And why does Google get to set these rules on my site anyway? I didn’t agree to them.
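
To make the crawl-versus-index distinction above concrete, here is a minimal sketch using only the Python standard library. The `ExampleBot` user agent, the sample robots.txt, and the sample page are made up for illustration. The point it tries to show: robots.txt is consulted before a fetch, while a `noindex` meta tag can only be seen after the page has been fetched, so a crawler that is blocked by robots.txt never gets to read a `noindex` on that page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """robots.txt governs fetching only; it says nothing about indexing."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_index(html: str) -> bool:
    """noindex is only visible once the page has been fetched and parsed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


# Hypothetical inputs, for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

if __name__ == "__main__":
    print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/blog/post"))
    print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://example.com/private/x"))
    print(may_index(PAGE))
```

Both checks are voluntary on the crawler's side, which is the complaint in the comment above: the site owner publishes the rules, but the crawler decides whether to honor them.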

roywiggins•2w ago

Is it much different than what Google AI Summaries do?

timeon•2w ago

Even the article posted (and search itself) has Google IP address.

zhfanlqeo•2w ago

Crying to Big Daddy Government because those other mean companies won't give away their secret sauce is pretty lame and doesn't make me want to reinscribe.

postexitus•2w ago

It is basic antitrust practice. If a company starts to control a vertical so much so that they start to exclude others, they get broken up into components and ordered to offer the basic infrastructure service to others. This is how it worked for 100 years (read up on telecoms/fiber; train companies/railroads; heck, even roads used to belong to people in the UK). This is why we have net neutrality - I recommend Tubes by Andrew Blum to get to the heart of the matter. Imagine the Internet if Google were able to throttle other services when you are not using their own. Here the author is arguing the search index is like infra that needs to be shared for public good. The state will not confiscate it - Google will break it into an independent company, will start paying for it, and let others pay as well. It's not whitelabeling, stealing and reselling. Gosh - just read a bit people.

direwolf20•2w ago

I hope they cache search results to further reduce the number of calls to Google.

And Marginalia Search was not mentioned? Marginalia Search says they are licensing their index to Kagi. Perhaps it's counted under "Our own small-web index" which is highly misleading if true.

packetlost•2w ago

The index is not necessarily the code, but the dataset. IMO it would be better to be more open about the technical stack, but I don't think this feels dishonest to me.

xnx•2w ago

> "Our own small-web index"

Has Kagi ever said what this is? I wouldn't be at all surprised if it is just kagi.com pages or a download of Wikipedia.

z64•2w ago

https://github.com/kagisearch/smallweb

jrmg•2w ago

From that:

——

Criteria for posts to show on the website

If the blog is included in small web feed list (which means it has content in English, it is informational/educational by nature and it is not trying to sell anything) we check for these two things to show it on the site:

- Blog has recent posts (<7 days old)

- The website can appear in an iframe

——

Emphasis mine. Restricting visibility to blogs that post at least every week doesn’t feel very ‘small web’ to me.

marginalia_nu•2w ago

I believe it was formerly run under the name Teclis[1]. Reportedly they took it down for a while but now it's apparently back up. Has quite an extensive writeup on how it operates on the page.

[1] https://teclis.com/

z64•2w ago

There is a practical limit that we can't cache results for too long; search engine users are particularly sensitive to stale data, especially around current events. Without a holistic and reliable way to know when the cache ought to be invalidated, our caching is mostly focused on mitigating "abuse", e.g., someone / bunch of people spamming the same search in a short timespan; no sense in repeating all those upstream calls.

Most "cost saving engineering" is involved in finding cases/heuristics where we only need to use a subset of sources and omitting calls in the first place, without compromising quality. For example, we probably don't need to fire all of our sources to service a query like "youtube" or "facebook".

Marginalia data is physically consolidated into the same infra that we use for small web results in our SERP, but also among other small scale sources besides those two. That line is simply referring directly to https://kagi.com/smallweb (https://github.com/kagisearch/smallweb).
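
For readers who want a picture of the two ideas described above (a short-lived cache that mostly absorbs bursts of identical queries, and skipping upstream sources for obvious navigational queries), here is a rough sketch. Every name in it (`ShortTTLCache`, `pick_sources`, the `NAVIGATIONAL` set, the 30-second TTL in the usage note) is hypothetical; this is not Kagi's implementation, only one plausible shape for it.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical list of queries that rarely need every upstream source.
NAVIGATIONAL = {"youtube", "facebook"}


def pick_sources(query: str, all_sources: List[str]) -> List[str]:
    """Use only the first source for obvious navigational queries."""
    if query.strip().lower() in NAVIGATIONAL:
        return all_sources[:1]
    return all_sources


class ShortTTLCache:
    """Caches results briefly so repeated identical queries hit upstream once."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def get_or_fetch(self, query: str,
                     fetch: Callable[[str], List[str]]) -> List[str]:
        key = query.strip().lower()
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no upstream call
        results = fetch(query)
        self._store[key] = (now, results)
        return results
```

Usage would look like `ShortTTLCache(30).get_or_fetch("some query", call_upstream)`, where `call_upstream` is whatever function actually queries the providers. Keeping the TTL short is the trade-off described in the comment: it limits staleness at the cost of a lower hit rate.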

AlienRobot•2w ago

To me, a lot of problems with "building a search engine" don't seem to be problems with "building a search engine," they seem to be problems with "building a Google."

Nobody said a search engine needs to have fresh data, for example. Nor has anybody said a search engine needs to index the entire web. Yet these are two things every search engine tries to do, and then they usually fail to compare with Google.

To put it in another way, the reason why TikTok succeeded against Youtube is exactly because TikTok wasn't trying to be a Youtube.

Nextgrid•2w ago

I don't think TikTok "succeeded" compared to Youtube? TikTok succeeded in popularizing short-form video, but I'd argue that's a different product. YouTube is still king for longform video.

While there might be arguments for building a different product (and LLM-based search like Perplexity is trying it), there appears to be enough demand for a "good Google" that Kagi is trying to address.

direwolf20•2w ago

If the product is long-form video then sure... but if the product is user attention? Film camera companies still make the best film cameras, but is the product film cameras or is it taking photos?

terribleperson•2w ago

I'll say that a search engine needs to have fresh data. When I search for a phrase from a reddit thread I saw earlier, I want that exact thread to be in the results.

When I search for a brand new restaurant, I want to see a map entry for that restaurant and a link to a newspaper article, ad, or facebook post announcing the opening of that restaurant (though I probably won't click on the third).

OGEnthusiast•2w ago

Sounds like we need a nationalized search engine company then?

browningstreet•2w ago

I wouldn't trust a nationalized search engine company.

That said, there are projects like Common Crawl and in Europe, Ecosia + Qwant.

I personally would like to see a search engine PaaS and a music streaming library PaaS that would let others hook up and pay direct usage fees.

shadowgovt•2w ago

An interoperable search index access standard might work. We've done something similar for peering and the backbone of the IP-layer interconnects themselves.

direwolf20•2w ago

You have to make it economically preferable, and there's no known solution to this. Large networks are still using their positions to bully smaller ones off the IP-layer internet backbone.

NitpickLawyer•2w ago

> and in Europe, Ecosia

I tried. It's just not good enough. Quick example: yesterday I set up a workstation with Ubuntu, wanting to try out wayland. One of the things I wanted was to run an app (w/ gui) from another (unprivileged) user under my own user. Ecosia gave me bad old stuff. Tried for a few minutes, nothing useful. Switched to google, one of the first results was about waypipe. Searched waypipe on ecosia. 1 and a half pages of old content. Glaringly, not one of those results was the ubuntu.manpages entry on waypipe. shrug

g947o•2w ago

So the entire search result comes from Truth Social and Grokipedia. No thanks

ajdude•2w ago

Does anyone else use the phrase "I'm going to google XYZ" while referring to actually searching it up on Kagi, DDG, or another search engine?

jeremyjh•2w ago

Yes, it’s like Xerox or Kleenex except it’s actually still a monopoly. I'm a happy Kagi user but I know hardly anyone else is.

dijksterhuis•2w ago

nope, i say “i’m going to search for XYZ” or similar

eli•2w ago

Ironically this is a bad thing for Google from a legal standpoint. If a term becomes "genericized" then it can lose trademark protection.

"Aspirin" is a famous example. It used to be a brand name for acetylsalicylic acid medication, but became such a common way to refer to it that in the US any company can now use it.

1-more•2w ago

Apparently the "lost in the Treaty of Versailles" explanation is a bit of a just-so story: https://history.stackexchange.com/questions/55729/why-did-ba...

pixl97•2w ago

Yes, but more in the past than now, simply because almost everybody seems to use google itself.

For example I'd hear people say "I'll Google that", then use Yahoo when they were still a major search engine.

shervinafshar•2w ago

I've been using Kagi for the past few years, but I try to use a brand-agnostic language talking about web search; e.g. "I'm gonna search [the web] for it"; "Use your favorite search engine to look it up".

filterfish•2w ago

Likewise and if people say, "why don't you google that?" I usually reply (obviously to everyone's annoyance:-) "I don't use Google". The general response is a blank, uncomprehending look.

cormorant•2w ago

To a young enough audience, you will sound like you exclusively use ChatGPT instead.

kqr•2w ago

I used to. Even when I actually used DDG. Now that I use Kagi (and thus am on the second web search service after I stopped using Google) it started to feel silly so I say "search the web" these days.

dooglius•2w ago

Yeah, I don't feel the need to have conversations go on a tangent about explaining what Kagi is

bronson•2w ago

Now my family usually says "I'm going to ask AI."

matkoniecz•2w ago

yes, me

201984•2w ago

I do

Wilder7977•2w ago

In Italian verbs for foreign words are almost always generated from the first conjugation (-are), which means "to google" is actually "googl-are".

With kagi, one cannot miss the opportunity to generate a similar verb " kag-are", which sounds exactly like "going number 2" (in a relatively rude way), which is what I ironically use every time I decide not to use the generic "search" verb. I consider it one of the minor benefits of being a kagi user!

smsm42•2w ago

Nope. I stopped using google for search many years ago, and stopped using it as a verb about the same time.

I admit I've used something like "are you banned on Google or what?" a couple of times though.

hsuduebc2•2w ago

It is even worse that Google search has become shit in the last few years. So they gatekeep the relevant information for themselves and don't even use it to improve search quality. As always, if you have no competition your innovation goes only towards cost reduction, not product improvement.

warkdarrior•2w ago

If Google Search is shit, why does Kagi want access to it?

JaggedJax•2w ago

They want access to the index. They will perform their own sorting to determine the best results to show from that index.

b3kart•2w ago

…without having advertiser interests to cater to.

WhyNotHugo•2w ago

The statistics in this article sound like garbage to me.

Google used by 90% of the world?

~20% of the human population lives in countries where Google is blocked.

OTOH, Baidu is the #1 search engine in China, which has over 15% of the world’s population… but doesn’t reach 1%?

These stats are made measuring US-based traffic, rather than “worldwide” as they claim.
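
The arithmetic behind this objection can be written out. The sketch below takes the comment's "~20%" figure at face value and assumes shares are weighted by population, which is a simplification: traffic-based stats count page views rather than people, so the real weights would differ. The function name is made up for illustration.

```python
def max_global_share(blocked_fraction: float, share_elsewhere: float = 1.0) -> float:
    """Upper bound on a worldwide share if the engine has ~0% where it is blocked."""
    return (1.0 - blocked_fraction) * share_elsewhere


if __name__ == "__main__":
    # Even with 100% share everywhere else, a 20% blocked population caps the total.
    print(f"cap on worldwide share: {max_global_share(0.20):.0%}")
    # Share needed in the unblocked 80% to reach 90% overall (above 100% means impossible).
    print(f"share needed elsewhere: {0.90 / (1.0 - 0.20):.1%}")
```

Under those assumptions a 90% worldwide figure cannot be a population-weighted share, which is consistent with the comment's suspicion that the measured sample is not the whole world.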

0x1ch•2w ago

Google is only blocked in places where it would already be hard for a company with morals to work in, if not outright blocked as well. This probably represents traffic globally, excluding those places.

Instead of downvoting blindly, please state which countries are currently blocking Google that would willingly allow Kagi, an AI/privacy-focused search engine company, to exist in their domain? The results may surprise you!

direwolf20•2w ago

Google is not blocked in the USA.

0x1ch•2w ago

Interesting. I'm in the US and use Kagi everyday.

dylan604•2w ago

I read it more as "company having morals". Not many US companies have "morals".

0x1ch•2w ago

Google doesn't, Kagi seems to (hopefully). I meant this more as a jab at countries willing to block Google, as they're generally dictatorships / authoritarian in nature. Oh the irony, as an american saying this in 2026....

lostlogin•2w ago

They do, but do they align with your own?

It’s difficult, because Kagi results are so good and the alternatives are often business that behave worse.

https://news.ycombinator.com/item?id=42349797

direwolf20•2w ago

I've heard Yandex results are good, and they often avoid Google bias. I have no moral problem with Kagi integrating results from Yandex and Google.

0x1ch•2w ago

The yandex thing seems to be an argument made in bad faith at this point.

mrweasel•2w ago

So you use Google. Kagi still relies on Google's index.

PunchTornado•2w ago

there is nowhere in the article where they mention this (not even an * saying what they exclude).

they present numbers and say "world" like whole countries and groups of people don't matter. very arrogant.

elektronika•2w ago

Google and Facebook would be very happy to operate in China, but they're too closely tied to the US intelligence apparatus to agree to the terms that China requires.

g947o•2w ago

For years Facebook wanted to get into Chinese market, so much that Zuckerberg asked Xi Jinping to name his child: https://www.the-independent.com/news/people/china-s-presiden...

No I didn't make this up.

And there was reporting like this: https://www.msn.com/en-us/news/world/zuckerberg-s-meta-was-w...

Although a few years later they seemed to completely abandon the effort and started to criticize China, though I can't find the article.

You'll be amazed at how quickly Zuckerberg "adapts" to things. Which is why I never trust a single word that comes out of his mouth.

lolc•2w ago

I guess they'd argue that the people in China don't count, because people in China don't get to choose Google. But yeah, the stats they use from "StatCounter" are clearly not representative for what the world uses.

elAhmo•2w ago

You can argue that people outside of China don't get to choose something other than Google. Sure, there are recent pushes with default search engine choices and similar initiatives, but there is a reason why Google is paying hundreds of millions of dollars to be the default search engine.

tyre•2w ago

It’s reasonable to see a distinction between the great firewall and the default browser search engine

manquer•2w ago

Market share is based on factual consumption numbers however subsidized or regulated by a government not free will.

Choice/Free will is an arbitrary line in the sand, one could argue how much choice we have about consuming google search when it is "85-90"% monopolistic business with well documented anti-competitive practices.

Chinese consumers perhaps have more choice than we do; Baidu has only about 60% market share. They do get to choose. It's more that Google is not one of the options available to them; it's not as if the only alternative to Baidu is a phone book.

weisnobody•2w ago

Yes, the stats don't make sense. It appears to be an issue with StatCounter.

The Search Engine wikipedia article [1] has a section on Russia and East Asia market share, which confirms that the roll up used for world wide counts is off, unless the number of people using the Internet is drastically different in some of the countries.

Russia:

  * Yandex: 70.7%
  * Google: 23.3%

China:

  * Baidu: 59.3%
  * Other domestic engines: "smaller shares"
  * Bing: 13.6%

South Korea:

  * Naver: 59.8%
  * Google: 35.4%

Japan:

  * Google: 76.2%
  * Yahoo! Japan: 15.8%

[1] https://en.wikipedia.org/wiki/Search_engine#Market_share

dylan604•2w ago

Maybe it's the same logic that says you can lower the prices of things >100%

ivanjermakov•2w ago

To be fair, Kagi won't be used in China either.

faitswulff•2w ago

I have used it from China, actually. Not big enough to be blocked.

yomismoaqui•2w ago

One thing I have discovered after using AI chats that include a websearch tool is that I don't want to delve into different blogs, Medium posts, Stack Overflow threads with passive-aggressive mod comments, dismissing cookie banners... Sorry, I just want the info I'm looking for; I don't care for your personal expression or need to monetize your content.

There are other times (usually not work related) when I want to explore the web and discovering some nice little blog or special corner on the net. This is what my RSS feed reader is for.

kqr•2w ago

With Kagi you can opt in to an LLM summary of the search result by appending a question mark to the query. It's a neat mechanism when it works!

ghm2199•2w ago

> Building a comparable one from scratch is like building a parallel national railroad..

Not to be pedantic here but I do have a noob question or two here:

1. One is building the index, which is a lot harder without a google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it — like they did with LLM training for base models with the infamous "pile" dataset — because the upshot of offering this index for public good would break not just google's own monopoly but also other monopolies like android, which will introduce a breath of fresh air into a myriad of UX(mobile devices, browsers, maps, security). So, why don't they just do this already?

2. The other question is about "control", which the DoJ has provided guidance for but not yet enforced. IANAL, but why can't a state's attorney general enforce this?

hsuduebc2•2w ago

I don’t think it’s comparable to today’s AI race.

Google has a monopoly, an entrenched customer base, and stable revenue from a proven business model. Anyone trying to compete would have to pour massive money into infrastructure and then fight Google for users. In that game, Google already won.

The current AI landscape is different. Multiple players are competing in an emerging field with an uncertain business model. We’re still in the phase of building better products, where companies started from more similar footing and aren’t primarily battling for customers yet. In that context, investing heavily in the core technology can still make financial sense. A better comparison might be the early days of car makers, or the web browser wars before the market settled.

ghm2199•2w ago

> ... stable revenue from a proven business mode... In that game, Google already won.

But if they were to pour that money strategically to capture market share one of two things would happen if google was replaced/lost share:

1. it would be the start of the commoditization of search. i.e. search engine/index would become a commodity and more specialized and people could buy what they want and compete.

2. A new large tech company takes rein. In which case it would be as bad as this time.

What I don't get is that if other big tech companies actually broke apart the monopoly on search, several Google dominos in mobile devices, browser tech, and location capabilities would fall. It would be a massive injection of new competition into the economy: lots of people would spend more dollars across the space (and ad-driven buying too), and money would not accrue in an offshore tax haven in Ireland.

To play the devil's advocate, I think the only reason it's not happening is because Meta, Apple, and Microsoft have very different moats/business models to profit off. They have all been stung at one time or another, in small or big ways, for trying to build something that could compete but failed: MS with Bing, Meta with Facebook search, Foursquare (not big tech, but still) with Marauder's Map.

hamdingers•2w ago

> If other tech companies really wanted to break this monopoly, why can't they just do it

Google is a verb, nobody can compete with that level of mindshare.

wongarsu•2w ago

Xerox is a verb, but most copy machines I see are made by their competition

hamdingers•2w ago

Wonder why that could be?

https://www.nytimes.com/1975/07/31/archives/xerox-settlement...

eikenberry•2w ago

Kleenex isn't the only brand of tissues sold in stores.

Zyst•2w ago

So were AOL, and Skype

dylan604•2w ago

I don't ever recall anyone using AOL as a verb. How would you do that?

terespuwash•2w ago

Let me AOL this for you

dylan604•2w ago

said no one ever

mgiampapa•2w ago

You clearly did not live in the world of watching two teens on computers in the same room hold two entirely different conversations out-loud and over AIM.

observationist•2w ago

A big part of it is about the legal minefield if you presented any sort of real threat to Google. Nobody wants to wager billions in infrastructure and IP against Google or Apple or Microsoft, even if you could whip up a viable competing product in a weekend (for any given product.)

Part of it is also the ecosystem - don't threaten adtech, because the wrong lawsuits, the wrong consumer trend, the wrong innovation that undercuts the entire adtech ecosystem means they lose their goose with the golden eggs.

Even if Kagi or some other company achieves legitimate mindshare in search, they still don't have the infrastructure and ancillary products and cash reserves of Google, etc. The second they become a real "threat" in Google's eyes, they'd start seeing lawsuits over IP and hostile and aggressive resource acquisitions to freeze out their expansion, arbitrary deranking in search results, possible heightened government audits and regulatory interactions, and so on. They have access to a shit ton of legal levers, not to mention the whole endless flood of dirty tricks money can buy (not that Google would ever do that.)

They're institutional at this point; they're only going away if/when government decides to break it up and make things sane again.

cowsandmilk•2w ago

Licensing their index doesn’t change that.

iamacyborg•2w ago

How’s that working out for Hoover in the UK?

xnx•2w ago

> If other tech companies really wanted to break this monopoly, why can't they just do it

Companies would rather sue than try and compete by investing their own money.

walls•2w ago

A huge amount of the web is only crawlable with a googlebot user-agent and specific source IPs.
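
For context on the "specific source IPs" part: sites that allowlist Googlebot usually do not trust the user-agent string alone, since anyone can send it. One commonly described check is a reverse DNS lookup on the client IP, a check that the hostname belongs to a Google crawler domain, and then a forward lookup to confirm it resolves back to the same IP. The sketch below assumes that approach; the function name and the suffix list are illustrative rather than exhaustive, and the default lookups are IPv4-only.

```python
import socket
from typing import Callable, List

# Illustrative suffixes; a real deployment should follow Google's current guidance.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
    forward_lookup: Callable[[str], List[str]] = (
        lambda host: socket.gethostbyname_ex(host)[2]
    ),
) -> bool:
    """Reverse-then-forward DNS check; lookups are injectable for testing."""
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False  # PTR record is not under a Google crawler domain
        return ip in forward_lookup(host)  # hostname must resolve back to the IP
    except OSError:
        return False  # no PTR record or DNS failure: treat as unverified
```

This is why a new crawler cannot simply claim to be Googlebot: the user agent is easy to copy, but the DNS records for the source IP are not.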

Imustaskforhelp•2w ago

> And given you-know-what, the battle to establish a new search crawler will be harder than ever. Crawlers are now presumed guilty of scraping for AI services until proven innocent.

I have always wondered, though: how does the Wayback Machine work? Is there no way that we can use the Wayback archive and then run an index on top of every Wayback archive somehow?

ghm2199•2w ago

You can read https://hackernoon.com/the-long-now-of-the-web-inside-the-in... it was a nice look into their infrastructure. One could theoretically build it. A few things stand out:

1. IIUC it depends a lot on "Save Page Now" democratization, which could work, but it's not like a crawler.

2. In the absence of Alexa they depend quite heavily on Common Crawl, which is quite crazy because there literally is no other place to go. I don't think they can use Google's syndicated API, because they would then start storing ads in their database, which is garbage that would strain their tiny storage budget.

3. Minor from a software engineering perspective but important for survival of the company: since they are an archive of record, converting that to an index would need a good legal team to battle Google. They do have the DoJ's recent ruling in their favor.

deepsquirrelnet•2w ago

I do not know a lot about this subject, but couldn’t you make a pretty decent index off of common crawl? It seems to me the bar is so low you wouldn’t have to have everything. Especially if your goal was not monetization with ads.
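
As a toy illustration of what "an index off of Common Crawl" means at its smallest, here is a minimal inverted index with AND-style lookup. It assumes the page text has already been extracted (Common Crawl publishes extracted-text files, but parsing them is out of scope here), and the URLs and documents are made up. It has no ranking, freshness, or spam handling, which are the hard parts.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of URLs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return URLs containing every query term (no ranking, sorted for stability)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical documents, for illustration only.
DOCS = {
    "https://a.example/waypipe": "waypipe forwards Wayland applications",
    "https://b.example/x11": "X11 forwarding over SSH",
}

if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search(idx, "wayland waypipe"))
```

Exact-term matching like this only returns pages that contain every query term, so coverage and freshness of the underlying crawl bound what it can ever find.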

ghm2199•2w ago

I think someone had commented on another thread about SerpAPI the other day that common crawl is quite small. It would be a start, I think the key to a good index people will use is freshness of the results. You need good recall for a search engine, precision tuning/re-ranking is not going to help otherwise.

charcircuit•2w ago

If a crawler offered enough money they could be allowed too. It's not like Google has exclusive crawling rights.

Nextgrid•2w ago

There is a logistics problem here - even if you had enough money to pay, how would you get in touch with every single site to even let them know you're happy to pay? It's not like site operators routinely scan their error logs to see your failed crawling attempts and your offer in the user-agent.

Even if they see it, it's a classic chicken & egg problem: it's not worth the time of the site operator to engage with your offer until your search engine is popular enough to matter, but your search engine will never become popular enough to matter if it doesn't have a critical mass of sites to begin with.

charcircuit•2w ago

Realistically you don't need every single site on board before your index becomes valuable. You can get in touch with sites via social media, email, Discord, or even visiting them face to face.

stavros•2w ago

You really do need every single site, as search is a long tail problem. All the interesting stuff is in the fringes, if you only have a few big sites you'll have a search engine of spam.

charcircuit•2w ago

I think that is only needed for a small subset of queries. Seriously think of the last time you did a search and went to a fringe site as opposed to a well known brand or social media. Ranking quality is much more important than coverage over the whole internet.

stavros•2w ago

> Seriously think of the last time you did a search and went to a fringe site as opposed to a well known brand or social media.

Oh, almost never. That's exactly why search sucks now.

hattmall•2w ago

Are these websites not serving public content? If there's some legal concerns just create a separate scraping LLC that fakes user agent and uses residential IPs or VPN or something. I can't imagine that the companies would follow through with some sort of lawsuit against a scraper that's trying to index their site to get them more visitors, if they allow GoogleBot.

ddtaylor•2w ago

Isn't that what SerpAPI was doing?

paxys•2w ago

Apple had a chance to break Google's search monopoly, but they chose to take billions from them instead.

Microsoft had a chance (well another chance, after they gave up IE's lead) to break up Google's browser monopoly, but they decided to use Chromium for free instead.

Ultimately all these decisions come down to what's more profitable, not what's in the best interests of the public. We have learned this lesson x1000000. Stop relying on corporations to uphold freedoms (software or otherwise), because that simply isn't going to happen.

charcircuit•2w ago

>but they chose to take billions from them instead.

They chose to use Google with a revenue sharing agreement. Google is very well monetized. It would be very difficult for Apple to monetize their own search as good as Google can.

>they decided to use Chromium

Windows ships with Microsoft Edge as the browser which Microsoft has full control over.

KellyCriterion•2w ago

Scraping is hard. Very good scraping is even harder. And today, being a scraping business is veeery difficult; there are some "open"/public indices, but none of these other indices ever took off

ghm2199•2w ago

Well sure, I don't dispute that it's hard, but if the top tech companies put their heads together, I am sure that, for example, Meta, Apple, and MS have enough talent between them to make an open source index, if only to reap gains from the de-monopolization of it all.

Imustaskforhelp•2w ago

I mean, doesn't microsoft have bing?

ghm2199•2w ago

Yeah but no one uses it. I am not even sure people that are forced to use it like using it, because it was productized pretty poorly. After all, who wants another Google? They invested 100 billion dollars, which is a lot of wasted money TBH.

Search indexes are hard, surely, but if you were to strip it down to just a good index in the browser, made it free, and kept it fresh, it cannot cost 100 billion dollars to build. Then, if you use this DoJ decision to fight Google so that a free index is not denied equal rights on Chrome, you have a massive shot at a win for a LOT less money.

Imustaskforhelp•2w ago

> Yeah but no one uses it. I am not even sure people like using it because it was productized pretty poorly. They invested 100 Billion dollars, which is a lot of wasted money TBH.

I mean... Duckduckgo uses bing api iirc and I use duckduckgo and many people use duckduckgo.

I also used bing once because bing used to cache websites which weren't available in the wayback archive. I don't know how, but it was a pretty cool solution to a problem.

I hate bing too and I am kind of interested in ecosia/qwant's future as well (yes there's kagi too and good luck to kagi as well! but I am currently still staying on duckduckgo)

ghm2199•2w ago

Duck duck go is really cool. I am almost fully rooting for them and they are my default mobile and web browser.

The small distributed team grinding it out against the goliath. They are awesome and perhaps the right example of what a path like this would look like. Maybe someone from their team can chime in on the difficulties of building a search engine that works in the face of tremendous odds.

direwolf20•2w ago

DDG is mostly just an anonymizing proxy for Bing. Microsoft encourages it because it increases Bing's market share over Google.

dylan604•2w ago

I would imagine the users of DDG to be closer to a rounding error than an actual percentage of users. I'd imagine theGoog would love and hate to have 100%. They'd love it because all the data, and hate it as it would prove the monopoly. At the end of the day, the % that is not going to them probably doesn't cause theGoog to lose much sleep

Imustaskforhelp•2w ago

It's just so wild how great Duckduckgo is & how under-rated it is.

It's available in all major browsers (here in Zen browser, there isn't even a default search engine; rather, the start page asks you to pick between three options, Google, DuckDuckGo and Bing. Yes, if you just press next it starts from Google, but Zen can start from DDG too, so it's not such a big deal)

Duckduckgo is super amazing. I mean they are so amazing and their duck.ai or ai actually provides concise data instead of Google's AI

DDG is leaps ahead of Google in terms of everything. I found Kagi to be pleasant too, and given PPP its price might make sense in Europe and America, but privacy isn't/shouldn't be only for those who can pay. So DDG is great for me personally and I can't recommend it enough for most cases.

Brave/Startpage is a second but DDG is so good :)

It just works (for most cases, the only use case I use google is for uploading images to then get more images like this or use an image as a search query and I just do !gi and open images.google.com but I only use this function very rarely, bangs are amazing feature by ddg)

dylan604•2w ago

I use DDG myself. I just assumed that I'm not a very sophisticated user as I've never had it not serve my needs based on how other people here say it's not very good.

8bitsrule•2w ago

>I've never had it not serve my needs

Same here. It may be 'not very good' for highly specialized or complex technical questions ... but I do research across a broad range of (non-specialized) topics daily. I often need to find 2nd and 3rd points of view on a topic ... or detailed facts about singular events ... and I rarely need to go to the 2nd page. And all ad-free!

It's a remarkable education tool. A curious, explorative kid these days could easily sail WAY beyond their age group using DDG. I can only wish I'd had it.

Their recently added 'Search assistant' consistently provides a couple of CITATIONS to back up its (multi-leveled) responses (Ask for more, get more.) I've seen nothing like it elsewhere. It is even quite good at digging up useful ... and working ... example code for some languages. Also with citations.

direwolf20•2w ago

DDG is just an anonymizing front-end for Bing. Your DDG results are Bing results.

antiframe•2w ago

Then the anonymization is a key component of their goodness. When I compare searches between Bing and DDG I find the DDG ones superior every time.

Nextgrid•2w ago

All these companies have the exact same business model as Google (advertising) and have the same mismatched incentives: good search results are not something they want.

Google Search sucks not because Google is incapable of filtering out spam and SEO slop (though they very much love that people believe they can't), but because spam/slop makes the ads on the SERP more enticing, and some of the spam itself includes Google Ads/analytics and benefits them there too.

There is no incentive for these companies to build a good search engine by themselves to begin with, let alone provide data to allow others to build one.

alex1138•2w ago

I was on the Goog forums for years (before they even fucking ruined the FORMAT of the forums, possibly to 'be more mobile friendly') and it was people absolutely (justifiably) screaming at the product people

No, the customer isn't 'always' right, but these guys like to get big and once big, fuck you, we don't have to listen to you, we're big; what are you going to do, leave?

zanderz•2w ago

I learned on here that this has been happening to a degree with maps. Several big companies have been cooperating to improve OpenStreetMap data, a rare example of a beneficial commons. This is probably some unique accident of incentives and timing and history, but maybe it could happen in other domains.

t_mahmood•2w ago

They will prefer to band up with Google, and rip us off.

renegat0x0•2w ago

Scraping is hard, and at the same time not that hard. There are many scraping projects, so with a few lines you can implement a scraper using curl_cffi or Playwright.

People complain that the user-agent needs to be filled in. Boo-hoo, are we on Hacker News, or what? Can't we just provide cookies and a user-agent? Not a big deal, right?

I myself have implemented a simple solution that is able to go through many hoops, and provide JSON response. Simple and easy [0].

On the other hand it was always an arms race, and it always will be. Eventually all content will be protected via walled gardens; there is no going around it.

Search engines affect me less, and less every day. I have my own small "index" / "bookmarks" with many domains, github projects, youtube channels [1].

Since the database is so big, the places I use most are extracted into a simple and fast web page backed by an SQLite table [2]. Scraping done right is not a problem.

[0] https://github.com/rumca-js/crawler-buddy

[1] https://github.com/rumca-js/Internet-Places-Database

[2] https://rumca-js.github.io/search

SyneRyder•2w ago

+1 so much for this. I have been doing the same, an SQLite database of my "own personal internet" of the sites I actually need. I use it as a tiny supplementary index for a metasearch engine I built for myself - which I actually did to replace Kagi.

Building a metasearch engine is not hard to do (especially with AI now). It's so liberating when you control the ranking algorithm, and can supplement what the big engines provide as results with your own index of sites and pages that are important to you. I admit, my results & speed aren't as good as Kagi, but still good enough that my personal search engine has been my sole search engine for a year now.

If a site doesn't want me to crawl them, that's fine. I probably don't need them. In practice it hasn't gotten in the way as much as I might have thought it would. But I do still rely on Brave / Mojeek / Marginalia to do much of the heavy lifting for me.

I especially appreciate Marginalia for publicly documenting as much about building a search engine as they have: https://www.marginalia.nu/log/

carte_blanche•2w ago

Do you have any documentation/blog post for this? I would love to do something similar for my own use.

SyneRyder•1w ago

Unfortunately I still haven't had a chance to make a blog post about this, which I really must do. But I can give some quick hints. Anyone reading this, feel free to reach out & I can try to answer questions, and they might help my blog post too.

I started off with a meta-search calling out to Brave / Mojeek / Marginalia, and the basics of that are something that you can ask an AI to make for you as a one-file PHP script. I still think this is a good place to start, because you'll quickly find "okay, I can replace my everyday search engine with this". Once you're dogfooding your engine every day, you'll notice all the rough points you want to improve.

Once you've got an array of objects with Title, URL, Description, and splitting the URL into domain, TLD, subdomain, path, file extension... there's a lot of ranking you can apply just to those. Honestly, a lot of my "ranking" has just been applying increased rankings to domains that I visit most often. I have an array of about 600 domains that it applies ranking boosts to. You can try experimenting with your re-ranking there, before even starting to build your own index.
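
A minimal sketch of that kind of domain-boost re-ranking, written in Python rather than PHP for brevity. The boost table, the multipliers and the scoring formula are all hypothetical; the only assumption carried over from the description above is that each result has a title, URL and description and arrives in upstream order.

```python
from urllib.parse import urlsplit

# Hypothetical boost table: domains you visit often get a larger multiplier.
DOMAIN_BOOSTS = {
    "marginalia.nu": 3.0,
    "en.wikipedia.org": 1.5,
}


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def rerank(results: list[dict]) -> list[dict]:
    """Re-rank metasearch results by upstream position times a domain boost.

    Each result is a dict with at least a 'url' key, listed in the order
    the upstream engine returned it.
    """
    scored = []
    for position, result in enumerate(results):
        base = 1.0 / (position + 1)  # earlier upstream results start higher
        boost = DOMAIN_BOOSTS.get(domain_of(result["url"]), 1.0)
        scored.append((base * boost, position, result))
    # Highest score first; ties keep the upstream order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [result for _, _, result in scored]
```

With this table, a marginalia.nu result in second place overtakes an unboosted result in first place, because 0.5 × 3.0 beats 1.0 × 1.0.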

As for building your own (small, personal) index, the technical details are not as difficult as you'd think. An SQLite database file, that your PHP file reads, will take you a long way... especially if you enable FTS5 indexing. I only did that last week, and I should have done that at the beginning. Search times are 10ms, and not just on my personally curated index of 80,000 pages... I just added a 2nd database with 1.3 Million entries from DMOZ (the old Mozilla Directory), and it's still only about 10ms. My search engine now feels super fast when it gets results from my database. And when it finds zero results, it automatically falls back to the metasearch.
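
For anyone who wants to see how little is involved, here is a rough sketch of an FTS5-backed index using Python's built-in `sqlite3` module instead of PHP. The table name, the columns and the in-memory database are assumptions for illustration, and it only works if your SQLite build has FTS5 compiled in (most do).

```python
import sqlite3


def build_index(pages):
    """Create an in-memory FTS5 index from (url, title, body) tuples."""
    conn = sqlite3.connect(":memory:")
    # url is stored but not tokenized; title and body are full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE pages USING fts5(url UNINDEXED, title, body)"
    )
    conn.executemany(
        "INSERT INTO pages (url, title, body) VALUES (?, ?, ?)", pages
    )
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return (url, title) rows matching an FTS5 query, best match first."""
    # bm25() yields lower values for better matches, so ascending order
    # puts the most relevant rows first.
    rows = conn.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ? "
        "ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

An empty list from `search` is the natural point to fall back to the metasearch. Note that raw user input is not always a valid FTS5 query (punctuation can raise a syntax error), so a real version would need to quote or sanitize it.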

At 1.3 Million entries, the two databases are only about 550MB total. It's running on a shared hosting account and apparently they're not worried - but it's only available to me, so I'm only hitting it maybe 50 times a day maximum. I'll move it onto a VPS eventually, but every time I think "this must be using up too many resources", I find I'm thinking too small by at least a factor of 10x.

For getting started with PHP & SQLite, I found this blog post helpful - but at this point, your AI can vibe code the entire thing for you:

https://davidejones.com/blog/simple-site-search-engine-php-s...

It's amazing how far you can get with just SQLite and FTS5 and a little PHP. Read the Marginalia blog too, there's so much good information in there.

Don't hold yourself back, don't think it's impossible.

ghm2199•2w ago

When I saw the Internet-Places-Database I thought it was an index on some sort of PoI and I got curious. But the personal internet spiel is pretty cool. One good addition to this could be the Foursquare PoI dataset for places search: https://opensource.foursquare.com/os-places/

visarga•2w ago

> Search engines affect me less, and less every day. I have my own small "index" / "bookmarks" with many domains, github projects, youtube channels

Exactly, why can't we just hoard our bookmarks and a list of curated sources, say 1M or 10M small search stubs, and have an LLM direct the scraping operation?

The idea is to have starting points for a scraper, such as blogs, awesome lists, specialized search engines, news sites, docs, etc. On a given query the model only needs a few starting points to find fresh information. Hosting a few GB of compact search stubs could go a long way towards search independence.

This could mean replacing Google. You can even go fully local with local LLM + code sandbox + search stub index + scraper.

direwolf20•2w ago

Marginalia Search does something like this

oh_fiddlesticks•2w ago

> 1. One is building the index, which is a lot harder without a google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it?

FTA:

> Context matters: Google built its index by crawling the open web before robots.txt was a widespread norm, often over publishers’ objections. Today, publishers “consent” to Google’s crawling because the alternative - being invisible on a platform with 90% market share - is economically unacceptable. Google now enforces ToS and robots.txt against others from a position of monopoly power it accumulated without those constraints. The rules Google enforces today are not the rules it played by when building its dominance.

baggachipz•2w ago

A classic case of climbing the wall, and pulling the ladder up afterward. Others try to build their own ladder, and Google uses their deep pockets and political influence to knock the ladder over before it reaches the top.

dylan604•2w ago

Why does Google even need to know about your ladder? Build the bot, scale it up, save all the data, then release. You can now remove the ladder and obey robots.txt just like G. Just like G, once you have the data, you have the data.

Why would you tell G that you are doing something? Why tell a competitor your plans at all? Just launch your product when the product is ready. I know that's anathema to SV startup logic, but in this case it's good business

Nextgrid•2w ago

Running the bot nowadays is hard, because a lot of sites will now block you - not just by asking nicely via robots.txt, but by checking your actual source IP. Once they see it's not Google, they send you a 403.
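
One common way a site can make that check is forward-confirmed reverse DNS: look up the hostname for the source IP, check that it belongs to Google, then resolve that hostname back and confirm it returns the same IP. A sketch in Python, with the caveats that the suffix list here is an assumption and may be incomplete, that `gethostbyname_ex` only handles IPv4, and that many sites simply match against published IP ranges instead:

```python
import socket

# Assumed hostname suffixes for Google's crawlers; may be incomplete.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP.

    The resolver functions are parameters so the check can be exercised
    without network access.
    """
    try:
        hostname = reverse(ip)[0]          # reverse lookup: IP -> hostname
    except OSError:
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]   # forward lookup: hostname -> IPs
    except OSError:
        return False
    return ip in addresses
```

A crawler that only fakes the Googlebot user-agent fails the first lookup, and one that controls its own reverse DNS record fails the forward confirmation.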

eloisius•2w ago

Cloudflare’s ubiquity makes bootstrapping a search index via crawler virtually impossible, but what about data sources like Common Crawl?

monooso•2w ago

Cost, presumably. From the article:

> Microsoft spent roughly $100 billion over 20 years on Bing and still holds single-digit share. If Microsoft cannot close the gap, no startup can do it alone.

kavalg•2w ago

Wouldn't it be nice if Microsoft opened the bing index for all.

direwolf20•2w ago

Don't they? DDG and Kagi use it. I would think you have to pay money but it does seem like they're willing to get partners.

edit: this is wrong

monooso•2w ago

This is incorrect. Kagi does not use the Bing index, as detailed in the article:

> Bing: Their terms didn’t work for us from the start. Microsoft’s terms prohibited reordering results or merging them with other sources - restrictions incompatible with Kagi’s approach. In February 2023, they announced price increases of up to 10x on some API tiers. Then in May 2025, they retired the Bing Search APIs entirely, effective August 2025, directing customers toward AI-focused alternatives like Azure AI Agents.

specialist•2w ago

Now that you mention it...

It's odd that Microsoft hasn't aggressively pushed for "openness". That's in the usual playbook for attacking a market leader.

(And then pull up the ladder once you've become king of hill.)

Microsoft will probably never topple Google, absent anti-monopolistic enforcement. But they can certainly attack Google's profits.

ricardo81•2w ago

There's one great example of a company that did that and managed to go viral on their release, Cuil. They claimed to have a Google size of search index. Unfortunately for them their search results weren't good and so that visibility quickly disappeared.

Going further back, AlltheWeb was actually pretty decent but was eventually bought by Overture and then Yahoo and ended up in their graveyard.

For everyone else it's the longer grind trying to gain visibility.

baggachipz•2w ago

I forgot about Cuil! I really wanted to like it.

ghm2199•2w ago

True. But the thing is, if one says "We will make sure your site is in a worldwide, freely available index" that is kept fresh, Google's monopoly ship already begins to take on water. Here is an appropriate line from a completely different domain, rare earth metals, from The Economist on the Chinese govt's weaponization of rare earths[1]:

> Reducing its share from 90% to 80% may not sound like much, but it would imply a doubling in size of alternative sources of supply, giving China’s customers far more room for manoeuvre.

[1] https://archive.ph/POkHZ#selection-1233.117-1233.302

creato•2w ago

robots.txt was being enforced in court before google even existed, let alone before google got so huge:

> The robots.txt played a role in the 1999 legal case of eBay v. Bidder's Edge,[12] where eBay attempted to block a bot that did not comply with robots.txt, and in May 2000 a court ordered the company operating the bot to stop crawling eBay's servers using any automatic means, by legal injunction on the basis of trespassing.[13][14][12] Bidder's Edge appealed the ruling, but agreed in March 2001 to drop the appeal, pay an undisclosed amount to eBay, and stop accessing eBay's auction information.[15][16]

https://en.wikipedia.org/wiki/Robots.txt

throw-the-towel•2w ago

Nitpick: Google incorporated in 1998, so, before the Bidder's Edge case.

dragonwriter•2w ago

Not only was eBay v. Bidder's Edge technically after Google existed, not before; more critically, the slippery-slope interpretation of California trespass to chattels law the District Court relied on in it was considered and rejected by the California Supreme Court in Intel v. Hamidi (2003), and similar logic applied to other states' trespass to chattels laws has been rejected by other courts since. eBay v. Bidder's Edge was an early aberration in the application of the law, not something that established or reflected a lasting norm.

creato•2w ago

The point is, robots.txt was definitely a thing that people expected to be respected before and during google's early existence. This Kagi claim seems to be at least partially false:

> Google built its index by crawling the open web before robots.txt was a widespread norm, often over publishers’ objections.

hattmall•2w ago

Perhaps it wasn't a widespread norm though. But I don't really see why that matters as much. Is the issue that sites with robots.txt today only allow Googlebot and not other search engines? Or is Google somehow benefitting from having two-decade-old content that is now blocked by robots.txt because the website operators don't want it indexed?

ricardo81•2w ago

Agree. It was not standard in the late 90s or early 00s. Most sites were custom built and relied on the _webmaster_ knowing and understanding how robots.txt worked. I'd heard plenty of examples where people had inadvertently blocked crawlers from their site because they didn't know the syntax properly. CMSes (e.g. WordPress) probably helped with widespread adoption.
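
To illustrate how easy that mistake is, here is a small sketch using Python's standard `urllib.robotparser`. The three robots.txt bodies, the crawler name `MyCrawler` and the example.com URLs are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether a robots.txt body lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Intended: keep every crawler out of /private/ only.
INTENDED = """\
User-agent: *
Disallow: /private/
"""

# Mistake: a bare "/" blocks the whole site for every crawler.
MISTAKE = """\
User-agent: *
Disallow: /
"""

# The pattern discussed in this thread: Googlebot allowed, everyone else blocked.
GOOGLE_ONLY = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""
```

Under `INTENDED`, `allowed(INTENDED, "MyCrawler", "https://example.com/blog/post")` is `True`; under `MISTAKE` the same call is `False`. Under `GOOGLE_ONLY`, Googlebot gets `True` and any other crawler gets `False`.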

embedding-shape•2w ago

> robots.txt was definitely a thing that people expected to be respected before and during google's early existence

As someone who was a web developer at that time, robots.txt wasn't a "widespread norm" by a large margin, even if some individuals "expected it to be respected". Google's use of robots.txt + Google's own growth made robots.txt a "widespread norm" but I don't think many people who were active in the web-dev space at that time, would agree that it was a widespread norm before Google.

jeromechoo•2w ago

Building an index is easy. Building a fresh index is extremely hard.

Ranking an index is hard. It's not just BM25 or cosine similarity. How do you prioritize certain domains over others? How do you rank homepages that typically have no real content in them for navigational queries?
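
For reference, the BM25 baseline itself fits in a few lines, which is part of why it is only a starting point. This is a sketch of the usual Okapi BM25 formula with commonly used defaults (`k1=1.5`, `b=0.75`) and the smoothed IDF variant that never goes negative; it assumes pre-tokenized input and at least one document, and says nothing about domain priors or navigational queries.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    query is a list of terms; docs is a non-empty list of term lists.
    Returns one score per document, in the same order as docs.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Nothing in this formula knows that a homepage with almost no text is the right answer to a navigational query, which is the gap the questions above are pointing at.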

Changing the behavior of 90% of the non-Chinese internet is unraveling 25 years and billions of dollars spent on ensuring Google is the default and sometimes only option.

Historically, it takes a significant technological counter position or anti-trust breakup for a behemoth like Google to lose its footing. Unfortunately for us, Google is currently competing well in the only true technological threat to their existence to appear in decades.

AlienRobot•2w ago

Good news! Google doesn't know how to rank pages either!

pas•2w ago

yet ... it works "ok" most of the time.

not to mention that people mostly need wikipedia, the news, navigating the infuriating world of websites of big service providers (gov sites, or try to find anything on Microsoft's dark corner of the web), porn and brainrot

but it's awfully hard to make traction on a business that provides this.

citizenpaul•2w ago

>why can't they just do it

Money. Google controls 99% of the advertising market. That's why it's called a monopoly. No one else can compete because they can never make enough money to make it worth the costs of doing it themselves.

w10-1•2w ago

Other comments mention difficulty, cost, conditions, etc.

Also, competitive agreements: of the big players like Apple, Microsoft, Facebook/Meta, Amazon, etc., only Google is in the ad business. But it has credible threats of digging into their businesses - GCP, Android, (not to mention software licenses and competitive access to e.g., Samsung), etc. So they agree to cede the ad world to Google, to keep Google out of their businesses.

The injunctions cannot be effective. Google ads are essentially a tax at a fine scale that rational people chose when it didn't change site behavior. But then Google ads changed the nature of the web itself, converting every snippet of information into an opportunity to monetize. Neither would change with a public search.org, and injunctions to license ad-free indexes won't change site behavior or publishers' self-interest in selling access to their content to Google alone.

Google knows the injunctions are unworkable and ultimately ineffective. The only question is what price they have to pay to the Trump judiciary to counter them.

the_arun•2w ago

If google is serving 90% traffic & others are unable to enter - Doesn't that mean google is doing something right for the customer and others are unable to outcompete it? Isn't this how life works?

CGMthrowaway•2w ago

Google is allowed to be big, be better and win users. But happy customers is not the full test of monopolization. The real question is, "Could a meaningfully better search engine realistically displace Google today?” If the answer is no, then competition is broken

xnx•2w ago

> "Could a meaningfully better search engine realistically displace Google today?”

ChatGPT clearly demonstrated that displacing Google is possible. All previous monopoly arguments seemed even more flimsy after that.

b3kart•2w ago

I think you’re proving the monopoly argument yourself: if the only way to compete with Google is an innovation that generations of scientists have been working towards, it does paint a grim picture of competition in this space. Besides, are we ignoring Gemini?

charcircuit•2w ago

Google already used AI and language models before ChatGPT came out. If you wanted a state of the art search / recommendation engine you already needed those innovations from scientists.

b3kart•2w ago

That's what I am saying: if you had a better search/rec engine than Google, good luck making it useful without Google's search index, acquired to a large extent thanks to their dominant market position. This doesn't sound like healthy competition. ChatGPT had to change the whole game to be able to compete.

Nextgrid•2w ago

ChatGPT did not build a search engine though. They built something else (equally impressive) and then were able to use their weight to enter the web search business where most sites now have to allow them in.

While it's good that building other products is possible, it doesn't detract from the point that search engines are a de-facto monopoly.

gf000•2w ago

(Not quite the same as a search engine, but to create a base model in an LLM, they pretty much did "download the whole internet" for it)

rafterydj•2w ago

This is a woefully naive view on the nature of monopolies. You could have made the same argument for Standard Oil.

soiltype•2w ago

...No. Not at all. Not in the case of Google and generally that's not "how life works". If it was true, why would Google spend so much money to be the default search engine in so many devices/browsers?

hamdingers•2w ago

Is the user's choice to use google a meaningful one when they're effectively the only game in town?

giantrobot•2w ago

Google must be right for the customer because Google pays billions of dollars to be the default search engine for all the major browsers. And end users are notorious for changing application defaults.

gf000•2w ago

Competition only works when we have an even field.

Like shooting your opponents in the leg before a Marathon will surely improve your chances, but it doesn't mean you are the best out of them. This is like the very tenet of markets, reaching as far back as Adam Smith.

"Funnily" enough this requires some external system that upholds the rules of the competition, e.g. governments. That's why busting monopolies make sense.

jeffbee•2w ago

"We will simply access the index" has always struck me as wild hand-waving that would instantly crumble at first contact with technical reality. "At marginal cost" is doing a huge amount of work in this article.

nige123•2w ago

The user data (anonymised) and analytics also need to be shared.

user3939382•2w ago

For anyone not acquainted Kagi is excellent and the people who work there strike me as nice and competent. I’m a harsh critic usually. Highly recommended.

flkiwi•2w ago

I've gotten more value out of it than just about any ongoing subscription I have. It's clean, fast, deeply customizable (i.e., excluding "answers" websites or any other domain you never want to see again), and, for what it is, inexpensive. Honestly if Google (or Bing) worked like Kagi does, I'd trade some of the privacy for the utility.

ares623•2w ago

Kagi should start building an index of sites that are trying to escape the current slop internet. I know they have the Small Web thing. But I’d like to see an index of a “neo internet” that blocks Google et al.

z64•2w ago

I've been tossing around the very early idea of seeing what we can do to elevate alcoves of the web such as Gemini[1] through Kagi. I am slightly conscious that some people might not like us operating in that space; it's been on my TODO to poll people about it and take a quick pulse. I love the tech and think we could give it meaningful exposure.

Is this along the lines of what you have in mind - any other active efforts you're aware of that you think we should look into?

[1] https://en.wikipedia.org/wiki/Gemini_(protocol)

freediver•2w ago

Relevant https://github.com/kagisearch/smallweb/pull/425

ares623•2w ago

That's cool that you're looking into it. Are you saying that in any "official" manner as a Kagi employee? Or something more personal?

I've been meaning to write an RFC or open-letter of sorts to collect ideas for what a neo or parallel web could look like, but I'm just a nobody so shrug. It'll probably be something very fragmented and very very niche but nowadays I think that can be seen as a good thing.

z64•2w ago

I'm working on making an internal proposal to integrate with Gemini on several fronts, yes. Still hatching the idea, and much else to do - maybe this summer it will come to fruition if it pans out :)

ares623•2w ago

Well, from one dreamer to another, thank you for taking on that effort and I wish you luck. I will keep my eye out for it.

idiotsecant•2w ago

How would that work? Like Kagi caches the gemini content and delivers it as web content? I suppose that might annoy the kind of person who runs a Gemini server.

z64•2w ago

I don't think we would go to that end, not as a first step anyways; I'm thinking of some simpler ones. Sparing details as I'm just brainstorming for now and getting to know their communities.

But, there are already plenty of services that proxy Gemini pages so that you can read them in conventional browsers, as well as search engines for Gemini content.

direwolf20•2w ago

Add your site to Marginalia Search. They accept submissions by email or GitHub PR, and Kagi pulls from Marginalia Search.

WhereIsTheTruth•2w ago

Kagi's "waiting for dawn" is just waiting for Google to legitimize their reseller business

Meanwhile, users pay a premium to pretend they're not using Google

Fascinating delusion

b3kart•2w ago

> Meanwhile, users pay a premium to pretend they're not using Google

My searches can’t be tied to me by Google for their ad targeting: this is worth paying a premium for, and I am glad Kagi are providing this service.

You seem to have a very limited understanding of the value Kagi provides.

yuugha1838•2w ago

I have a limited understanding of the value Christianity provides. That neither means that Christianity provides no value, nor does it mean that God exists.

idiotsecant•2w ago

Uh oh you're eating your tail again

miloignis•2w ago

With Kagi being $55-$110 a year and Google making >$200 a year per US user, it's arguably a discount.

Nextgrid•2w ago

Users pay a premium to have Google's results cleaned out of spam/trash. It's effectively paying someone to cut out the newspaper ads for you and then give you the resulting ad-free paper.

BlackFly•2w ago

In addition to what others are telling you, Kagi also allows you to

- filter out results from specific websites that you can choose,
- show more results from specific websites that you can choose,
- show fewer results from specific websites that you can choose,

and so forth. When you find your results becoming contaminated by some new slop farm, you can just eliminate them from your results. Google could also do that, but their business model seems to rely more on showing slop results with their ads in those third party pages.

Just like mobile phone providers, third parties can provide lots of value add by reselling infrastructure. Business models can be different, feature sets can differ. This is not a delusion but the reality of reselling.

idiotsecant•2w ago

users pay a premium for superior UX and no tracking, actually. Kagi has wildly better filtering and customization.

stephen_cagle•2w ago

One interesting point was the original PageRank algorithm greatly benefited from the fact that we kinda only had "text matching" search before Google (my memory was AltaVista at the time).

Because text matching was so difficult to search with, whenever you went to a site, it would often have a "web of trust" at the bottom where an actual human being had curated a list of other sites that you might like if you liked this site.

So you would often search with keywords (often literals), then find the first site, then recursively explore the web of trust links to find the best site.

My suspicion has always been that Google (PageRank) benefited greatly from the human curated "web of trust" at the bottom of pages. But once Google came out, search was much better, and so human beings stopped creating "web of trust" type things on their site.

I am making the point that Google effectively benefited from the large amount of human labor put into connecting sites via WOT, while simultaneously (inadvertently) destroying the benefit of curating a WOT. This means that by succeeding at what they did, they made it much more difficult for a Google#2 to come around and run the exact same game plan with even the exact same algorithm.

tldr; Google harvested the links that were originally curated by human labor, the incentive to create those links is gone now, so the only remaining "links" between things are now in the Google Index.

Addendum: I asked claude to help me think of a metaphor, and I really liked this one as it is so similar.

```
"The railroad and the wagon trails"

Before railroads, collective human use created and maintained wagon trails through difficult terrain. The railroad company could survey these trails to find optimal routes. Once the railroad exists, the wagon trails fall into disuse and the pathfinding knowledge atrophies. A second railroad can't follow trails that are now overgrown.
```

keeda•2w ago

> I am making the point that Google effectively benefited from the large amount of human labor...

This is exactly right, but the thing most people miss is that Google has been using human intelligence at massive scale even to this day to improve their search results.

Basically, as people search and navigate the results, Google harvests their clicks, hovers, dwell-time and other browsing behavior to extract critical signals that help it "learn" which pages the users actually found useful for the given query. (Overly simplified: click on a link but click back within a minute to go to the next link -> downrank, but spend more time on that link -> uprank.)
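The oversimplified rule in the parenthesis can be sketched in a few lines. This is only an illustration of the idea, not how Google actually weighs interaction data; the 60-second cutoff mirrors "within a minute" and the log format is an assumption.

```python
from collections import defaultdict

# Assumed cutoff, mirroring "click back within a minute".
SHORT_DWELL_SECONDS = 60


def dwell_adjustments(click_log):
    """Turn (query, url, dwell_seconds) events into per-result adjustments.

    A quick bounce counts as -1 for that (query, url) pair,
    a longer visit counts as +1.
    """
    scores = defaultdict(int)
    for query, url, dwell in click_log:
        scores[(query, url)] += -1 if dwell < SHORT_DWELL_SECONDS else 1
    return dict(scores)


def rerank(query, urls, adjustments):
    """Re-order results for a query by their accumulated adjustment.

    sorted() is stable, so results with equal adjustments keep
    their original relative order.
    """
    return sorted(urls, key=lambda u: adjustments.get((query, u), 0), reverse=True)
```

Even this toy version only works if you can see the clicks, which is the whole point about who gets to collect them.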

This helps it rank results better and improve search overall, which keeps people coming back and shuts out competitors. It's like the web of trust again, except it's clicks of trust, and it's only visible to Google and is a never-ending self-reinforcing flywheel!

And if you look at the infrastructure Google has built to harvest this data, it is so much bigger than the massive index! They harvest data through Chrome, ad tracking, Android, Google Analytics, cookies (for which they built Gmail!), YouTube, Maps and so much more.

So to compete with Google Search, you don't need just a massive index, you also need the extensive web infra footprint to harvest user interactions at massive scale, which means the most popular and widely deployed browser, mobile OS, ad tracking, analytics script, email provider, maps, etc, etc.

This also explains why Google spent so many billions in "traffic acquisition costs" (i.e. payments for being the Search default) every year, because that was a direct driver to both, 1) ad revenue, and 2) maintaining its search quality.

This wasn't really a secret, but it (rightfully) turned out to be a major point in the recent Antitrust trial, which is why the proposed remedies (as TFA mentions) include the sharing of search index and "interaction data."

sabslikesobs•2w ago

I like that there's a list of primary sources at the bottom.

Kagi's AI assistant has been satisfying compared to Claude and ChatGPT, both of which insisted on having a personality no matter what my instructions said. Trying to do well-sourced research always pissed me off. With Kagi it gives me a summary of sources it's found and that's it!

weisnobody•2w ago

I think the crawled data should have to be shared, but I'm not convinced that Google should have to share their index.

It may be impracticable to share the crawled data, but from the standpoint of content providers, having a single entity collecting the information (rather than a bunch of people doing it) would seem to be better for everyone. We would likely need some form of robots.txt that lets the content provider indicate how their content could be used (i.e. research, web search, AI, etc.).
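The closest thing available today is giving each purpose its own crawler name and writing separate robots.txt groups for them. A minimal sketch with Python's standard `urllib.robotparser` follows; the crawler names `ExampleSearchBot` and `ExampleAIBot` and the site URL are invented for illustration, and a true per-purpose field is not part of robots.txt as it exists.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group per purpose-specific crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if the named crawler may fetch the URL under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)


# Example: the search crawler may fetch the article, the AI crawler may not.
article = "https://site.example/article"
print(allowed("ExampleSearchBot", article), allowed("ExampleAIBot", article))
```

This only expresses who may crawl, not what the data may later be used for, which is the gap a shared crawl would have to close.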

The people accessing the crawled data would end up paying (reasonable) fees to access the level of data they want, and some portion of that fee would go to the content provider (30% to the crawler and 70% to the content provider? :P maybe).

Maybe even go so far as to allow the paywalled content providers to set a price on accessing their data for the different purposes. Should they be allowed to pick and choose who within those types should be allowed (or have it be based on violations of the terms of access)?

It seems in part the content providers have the following complaints:

  * Too many crawlers (see note below re crawlers)
  * Crawlers not being friendly
  * Improper use of the crawled data
  * Not getting compensated for their content

Why not the index? The index, to me, is where a bunch of the "magic" happens and where individual companies could differentiate themselves from everyone else.

Why can't Microsoft retain Bing traffic when it's the default on stock Windows installs?

  * Do they not have enough crawled data?
  * Their index isn't very good?
  * Their searching of their index isn't good?
  * The way they present the data is bad?
  * Google is too entrenched?
  * Combination of the above?

There are several entities intending to crawl all / large portions of the Internet: Baidu, Bing, Brave, Google, DuckDuckGo, Gigablast, Mojeek, Sogou and Yandex [1]. That does not include any of the smaller entities, research projects, etc.

[1] https://en.wikipedia.org/wiki/Search_engine#2000s–present:_P... (2019)

sharpshadow•2w ago

If Google provides a Search Index it will be the censored version therefore still politically acceptable. The “Layer 1” idea will not happen.

direwolf20•2w ago

That's why Kagi combines results from multiple sources, just as it does with Yandex.

pfist•2w ago

I am rooting for Kagi here, and I applaud their transparency on such matters. It is quite enlightening for someone like me who understands technology but knows little about the inner workings of search.

It remains to be seen how or if the remedies will be enforced, and, of course, how Google will choose to comply with them. I am not optimistic, but at least there is some hope.

As an aside: The 1998 white paper by Brin and Page is remarkable to read knowing what Google has become.

m-schuetz•2w ago

I'm rooting for Kagi solely because of the block feature. It's amazing to be able to block undeservedly SEO'd garbage sites from future search results.

lostlogin•2w ago

Blocking, pinning and the general quality.

I’d pay more if I could opt out of Yandex, and if it integrated properly with iOS (Apple's fault).

fuzzy2•2w ago

fyi: DuckDuckGo has blocking now, too. I use it extensively to do away with all the clone sites of Stack Exchange, GitHub etc

All without using an account, saved locally in the browser.

m-schuetz•2w ago

Oh nice, that's good to know. Yes, those clone sites are also instantly on my block list, as well as Userbenchmark, sites with AI-generated "info" pages (if I want AI answers, I'll just ask ChatGPT), sites that won't work without third-party cookies, low-quality game guide sites that were evidently made for users to visit, but not actually to help them, etc.

ApolloFortyNine•2w ago

With Google's search engine making almost $200 billion a year in revenue, I'm not sure Kagi could afford what market rates would be here. They also spent billions developing the technology to crawl, index, and rank billions of pages, factoring that in, again I don't think a good price can be put on it.

What even is market rate? Kagi themselves admit there's no market, the one competitor quit providing the service.

Obviously Google doesn't want to become an index provider.

dangoor•2w ago

According to the article, the judge's memorandum said about index data access:

> Google must provide Web Search Index data (URLs, crawl metadata, spam scores) at marginal cost.

I'm guessing that the "marginal cost" of a search is small and it's not connected to how much ad revenue that search is worth.

senko•2w ago

A full up-to-date index of the searchable web should be a public commons good.

This would not only allow better competition in search, but fix the "AI scrapers" problem: No need to scrape if the data has already been scraped.

Crawling is technically a solved problem, as witnessed by everyone and their dog seemingly crawling everything. If pooled together, it would be cheaper and less resource intensive.

The secret sauce is in what happens afterwards, anyway.

Here's the idea in more detail: https://senkorasic.com/articles/ai-scraper-tragedy-commons

I'm under no illusion something like that will happen .. but it could.

moebrowne•2w ago

Isn't this what CommonCrawl are doing?

https://commoncrawl.org/

senko•2w ago

Yes. But they don't crawl everything (probably due to lack of funding), and, as the article and other commenters here note, people are incentivised to allow Google and only Google to crawl. In practice, the CommonCrawl dataset is too small for a realistic search engine competitor.

I'd love to see Google, Bing and others being incentivized (wink, wink) to contribute (technically, financially, etc) to CommonCrawl or Internet Archive since they already do this.

azornathogron•2w ago

Is crawling really solved?

Any naive crawler is going to run into the problem that servers can give different responses to different clients which means you can show the crawler something different to what you show real users. That turns crawling into an antagonistic problem where the crawler developers need to continually be on the lookout for new ways of servers doing malicious things that poison/mislead the index.

Otherwise you'll return junk spam results from spammers that lied to the crawler.
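One common-sense check for this kind of cloaking is to fetch the same URL twice, once identifying as a crawler and once looking like a regular browser, and compare what comes back. A rough sketch of just the comparison step, using Python's standard `difflib`; the 0.5 threshold and the function name are arbitrary assumptions, and real pages would need boilerplate and dynamic content stripped first.

```python
from difflib import SequenceMatcher


def looks_cloaked(body_for_crawler, body_for_browser, threshold=0.5):
    """Flag a page when the two fetched bodies are very dissimilar.

    ratio() is 1.0 for identical text and approaches 0.0 for
    text with nothing in common.
    """
    ratio = SequenceMatcher(None, body_for_crawler, body_for_browser).ratio()
    return ratio < threshold
```

The antagonistic part is that the server can also fingerprint the "browser" fetch (IP ranges, TLS, timing), so a check like this is a starting point rather than a solution.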

I've never done it so maybe it's easier than I imagine but I wouldn't be quick to assume that crawling is solved.

senko•2w ago

I don't mean to say it's trivial. I'm sure there are many hard problems such as the one you mention - though that particular one is more "cleaning the index" part which might work on top of the open common corpus.

But my impression is that it's more a question of scale and engineering time than having to invent something new.

(disclaimer: I also never worked on an internet-scale search system, maybe I'm way off base here as well).

azornathogron•2w ago

Oh, ok. I misunderstood - I think we agree.

keeda•2w ago

Google's advantage is not just in its index and algorithms, it is that it has built a self-reinforcing flywheel that data mines human attention at massive scale to improve their search results.

This comment (https://news.ycombinator.com/item?id=46709957) points out that Google got its start via PageRank, which essentially ranked sites based on links created by humans. As such, its primary heuristic was what humans thought was good content. Turns out, this is still how they operate.

Basically, as people search and navigate the results, Google harvests their clicks, hovers, dwell-time and other browsing behavior -- i.e. tracking what they pay attention to -- to extract critical signals to "learn" which pages the users actually found useful for the given query. This helps it rank results better and improve search overall, which keeps people coming back, which in turn gives them more queries and data, which improves their results... a never-ending flywheel.

And competitors have no hope of matching this, because if you look at the infrastructure Google has built to harvest this data, it is so much bigger than the massive index! They harvest data through Chrome, ad tracking, Android, Google Analytics, cookies (for which they built Gmail!), YouTube, Maps, and so much more. So to compete with Google Search, you don't need just a massive index, you also need the extensive web infra footprint to harvest user interactions at massive scale, meaning the most popular and widely deployed browser, mobile OS, ad footprint, analytics, email provider, maps...

This also explains why Google spends so many billions in "traffic acquisition costs" (i.e. payments for being the Search default) every year, because that is a direct driver to both, 1) ad revenue, and 2) maintaining its search quality.

This wasn't really a secret, but it turned out to be a major point in the recent Antitrust trial, which is why the proposed remedies (as TFA mentions) include the sharing of search index and "interaction data."

We all knew "if you're not paying for it, you're the product" but the fascinating thing with Google is:

- They charge advertisers to monetize our attention;

- They harvest our attention to better rank results;

- They provide better results, which keeps us coming back, and giving them even more of our attention!

Attention is all you need, indeed.

Nextgrid•2w ago

> "learn" which pages the users actually found useful for the given query

But due to their business model I'm not sure they are ranking "usefulness" as much as you think.

Useful results ultimately don't benefit Google because Google makes no money on them. Google makes money on ads - either ads on the search results page, ads on the destination pages or (indirectly) from steering users to pages which have Google Analytics.

It's likely the actual algorithm balances usefulness to the user with usefulness to Google. You don't want to serve up exclusively spam/slop as users might bounce, but you also don't want to serve up the best result because the user will prefer it over the ad on the SERP. So it has to be a mix of both - you'll eventually get a good result, after many attempts (during which you've been exposed to ads).

Google does enjoy the myth that they are unable to combat spam/slop while in reality they do profit off it.

keeda•2w ago

That is also the thesis of this piece: https://www.wheresyoured.at/the-men-who-killed-google/

It is plausible, but I'd guess Google would not risk that. I'm sure Google has pulled other shenanigans to get more clicks, like stuffing more and more ads, and making ads look like results (something even I personally have fallen for once), but I think they're too smart to mess with their sacred cash cow.

direwolf20•2w ago

> cookies (for which they built Gmail!)

Can you explain this one?

keeda•2w ago

There were blogs that explained this in detail (Facebook does something similar), but I can't find them, so here's what Google's AI overview says when I search for "How gmail cookies help google track users across the web":

Gmail cookies, such as SID and HSID, act as unique identifiers for a signed-in Google account, allowing Google to track user activity across its services and millions of third-party websites. These cookies, often lasting 2 years, link browsing behavior—like searches and site visits—to a specific user profile to personalize ads, measure campaign performance, and analyze site usage, even on non-Google sites that use tools like Google Analytics or AdSense.

jiehong•2w ago

I think one side problem is that part of the web is not even searchable with a search engine.

Here are some examples:

- Discord

- WeChat (is it the web?)

- Rednote

- TikTok (partially)

- X (partially)

- JSTOR (it finds daily, but you find more stuff on the website directly)

- any stuff with a login, obviously.

reddalo•2w ago

> Discord

Damn, I can't stand open-source projects that host their "forums" on Discord. It's a nightmare to use, it's heavy, slow, and it's completely unsearchable from the web.

I wonder what went wrong with our society.

cyberrock•2w ago

First of all not everyone wants spectators and gawkers on all of their conversations. As for open solutions, IRC didn't provide chat history for the common folk (no, most users are not able to host their own Pi Zero bouncer, especially back in 2017), and Matrix development was too slow (Elements implemented message pinning in 2022), so the rest was history. There was just no alternative to Slack or Discord.

Spivak•2w ago

> I wonder what went wrong with our society.

Predators.

https://maggieappleton.com/cozy-web

1vuio0pswjnm7•2w ago

Google has appealed and moved for a partial stay re: the remedies discussed in this blog post

https://storage.courtlistener.com/recap/gov.uscourts.dcd.223...

Will Kagi file an amicus brief in support of the plaintiffs?

Perhaps Google will fund amici in support of their position as they did in the Epic appeal

https://www.law.com/nationallawjournal/2025/01/10/fight-over...

adsharma•2w ago

Why didn't I see anything about common crawl?

Exa, Parallel and a whole bunch of companies doing information retrieval under the "agent memory" category belong to this discussion.

echelon•2w ago

If there are any Kagi folks here, I've come up with a new angle to attack Google's anti-competitive position that could be incredibly effective:

https://news.ycombinator.com/item?id=46681985

https://news.ycombinator.com/item?id=44546519

I'm going to send this idea to my legislators, the EU, Sam Altman, Tim Sweeney, and Elon Musk, et al., I just haven't had time to put this together yet.

Google is a monopolist scourge and needs to be knocked down a peg or two.

This should also apply to the iPhone and Android app stores.

stacktraceyo•2w ago

Is there a crowd-indexed style of search index? Like, instead of relying on crawling completely, you rely on maybe something like an extension in your browser that indexes as people are using their browser. Or maybe submitting your site to this index instead of waiting to be crawled.

gkbrk•2w ago

I think Brave Search does something similar with their Web Discovery Project, but I don't think it indexes full web pages from users.

https://support.brave.app/hc/en-us/articles/4409406835469-Wh...

jxmesth•2w ago

Honestly, would be very cool if someone could make a search engine of only human-produced content. I know it's going to be hard and compute intensive but I don't think it's impossible. In fact, Google could do it. A paid service for only human made content. Obviously there would be a margin of error as we can never be 100% sure if something really is AI written.

HellsMaddy•2w ago

Kagi is doing something similar to this, though it's not trying to remove absolutely all AI, just "slop": https://help.kagi.com/kagi/features/slopstop.html

direwolf20•2w ago

Marginalia Search is a small-web search engine with a curated list of sites and its own index. Sometimes I find it useful to find answers to technical problems because it only searches the kind of site where people write about the technical problems they solved.

thisislife2•2w ago

> Layer 3: Paid, subscription-based search

Should actually be - Layer 3: Paid, ad-free, subscription-based search. (It's a subtle omission that indicates the direction Kagi search will eventually take).

lostlogin•2w ago

It does say ‘without selling your attention.’

This isn’t quite the same thing though.

I hope you are wrong, if not… wow.

dspillett•2w ago

TBH that sounds more creepy. It (sort of) rules out ads but not the stalking that is inherent in current adtech methods. I'm more bothered by the latter than the former.

decimalenough•2w ago

Kagi has been pretty consistent about funding itself with paying users instead of ads. I, for one, am a paid user but would quit if there were ads injected, since not having them (and, more importantly, having result rankings corrupted by them) is the exact thing I'm paying for.

canpan•2w ago

Another paying user here. Very happy with Kagi, but would cancel asap if there were ads. Don't mind paying more. I just don't want ads. But I cannot really imagine it, they would lose half their competitive advantage. (The other half being having good results)

For me it would probably mean to build a search from scratch. For 90% of my search use cases it's pretty straightforward. I mostly visit the same sites..

dmje•2w ago

Paid user and early adopter here - same, I think. I'm delighted with Kagi, but the thought of it riddled with ads makes me sad. My understanding is same as yours - this is an attempt at an entirely different business model - moving to ads would be totally contrary to what they're trying (or at least - to date - have been trying) to build.

Really hope they don't go this way...!

shmeeed•2w ago

I don't see how it's conducive to the underlying big stakes discussion if we start condemning Kagi over this omission instead of assuming good faith. It's just distracting from the real issue at hand that is Google's monopoly and the details of the rulings and their enforcement.

Call me naive, but I imagine Kagi would be VERY hesitant to force ads on their users, given that such a step would risk alienating a major part of their customer base, as they're well aware.

If they were secretly planning to do it somewhere down the road, they could just as well do it the usual and proven way and lie about it until they've built a sufficient moat, which they don't have right now. IMO, they have precious little to gain from hinting at it like this now.

And even if it were the case that this omission was consciously about keeping a door open for such a change in business models, there's a whole lot of leeway in approaches that involve ads, apart from the 100% user-as-a-product way that Google went with.

Given the high customizability of their search, they e.g. could give users the option to turn ads on or off. Some people (don't ask me who, but I keep being told they exist) don't mind being shown ads or might even desire them.

I remember way back when the first Google ads were clearly labeled as such and stood visually apart from organic search results. I personally don't think it would mean the end of the world if Kagi did something similar - in a transparent way, and preferably as an opt-in.

But at this point it's all needless speculation in my view.

walt_grata•2w ago

I don't know dude. Every time I've assumed good faith on my paying for something means ad free, I've been screwed by some asshole with an MBA getting into a leadership position high enough to push ads through. I'd rather it be explicit

shmeeed•2w ago

I get the point, we've all been burnt. But if you're not trusting anybody anyway, why would explicitness in a non-binding blog post / press release soothe you?

We're in the middle of an AI bubble propping up the whole friggin US economy all by itself, driven mostly by a company that claimed to be a non-profit until a few years ago.

walt_grata•2w ago

Because I've been burned by every big tech company I can think of. As for why it would be soothing, well because it gives me hope that when I read any further legal docs they'll hold to the post.

What does ai have to do with this? The sooner that bubble bursts the better IMO.

shmeeed•2w ago

The (Open)AI example was to illustrate the fact that companies can lie about their long-term plans or just change them whenever. They routinely do. And everybody who believed yesteryear's mission statement will then wind up feeling pretty stupid.

I'm considering Kagi a strategic ally in the fight against big tech right now.

It isn't a big tech company (yet). They don't have much of a moat either. Therefore, for the foreseeable future, they will be absolutely dependent on aligning their behaviour with their customers' interests, lest they lose them and go out of business.

thisislife2•2w ago

Assuming good-faith requires reciprocal actions to reinforce it from the other side too, which I don't see happening. As I have pointed out elsewhere, they've stopped offering offline installers for their browser, and I suspect a major reason for that is to also collect telemetry / user data - a clever way to get around their advertised claim of "no telemetry" browser. After being burnt many times by Google and Apple ("trust us, we care about your privacy"), others and now streaming services ("no ads if you pay us, promise"), I just can't help being cynical as another for-profit company appears to be using the same tactics ... like I said, trust is earned, not demanded.

LoganDark•2w ago

Assuming good faith is a mistake now. Users need to be asking, demanding, proof, not just assuming anyone or anything else has their best interests in mind. I'm tired of seeing this; I'll assume good faith only once I'm not universally treated as an enemy to society.

With that said, Kagi has appeared friendly so far.

bayesnet•2w ago

It should actually be “Layer 3: Paid, ad-free, asbestos free, subscription-based search”. Come on. I don’t think it’s productive to make decisive and conspiratorial declarations on the future of Kagi search because they didn’t use the magic words you like. [ And since I know freediver is active here I want to state plainly that I would cancel immediately if there is so much as a hint of an ad in the Kagi results ;) ]

edit: h/t to https://xkcd.com/641

shmeeed•2w ago

From the comments, I get the feeling that Kagi's customers are just a particularly paranoid bunch. ;)

Anybody is free to cancel their subscription the moment Kagi turns out a malicious actor. And then... go back to Google, I guess?

zvqcMMV6Zcr•2w ago

Recently I encounter the "no results" screen so often when using Google that I am starting to suspect the problem will solve itself. And by solve I mean open parts of the internet will die off completely, and only owners of silos like Facebook will be able to provide data for search indexes.

zhfanlqeo•2w ago

I used kagi for a while but got lazy with updating the subscription when moving and needing to change credit cards so I went back to DDG/Google and having to go back to having to skip the first result or first few results shows you just how obnoxious this practice is. When I have a few moments I'll resubscribe to kagi...

Ronsenshi•2w ago

I've been trying to use DDG for the past 2-3 years, but way too often I have to add !g at the end to go to google where I can get better results. So I've been considering giving Kagi a try. Can you tell if in your experience Kagi has better results than DDG?

hoooooooooome•2w ago

I switched to DDG from Google some years ago, and then to Kagi around the start of 2025.

I find the Kagi results to be everything I need, and they often lead me to more niche personal blog posts specific to what I am looking for. Surfacing small blog posts is not something I remember getting much of in DDG and I'm really enjoying that.

mrweasel•2w ago

I've used DDG since 2012, and switched to Ecosia about three years ago. In my experience if DDG or Ecosia can't find something, then neither can Google. In some cases I still check with !g, but Google is now worse than both DDG and Ecosia (which is funny because Ecosia partially uses Google).

It may be related to which type of content you search for, field of work or even how you search, but you're certainly not the only one I've heard complain that they need !g way too often for alternatives to be viable.

Google is very good if you need to buy something though. Their ad system yields rather good results, most of the time. Lately I've noticed that they are more and more serving ads for questionable drop shippers and foreign webshops, rather than brands I trust, so they might also be declining in that department.

bilekas•2w ago

I've tried Kagi and while it is better than Google these days, to be fair that's not hard with the enshittification slop that's out there.

But Kagi funds Yandex, which funds the RU government, and I think it should be known to anyone looking to use it.

https://ounapuu.ee/posts/2025/07/17/kagi/

https://kagifeedback.org/d/5445-reconsider-yandex-integratio...

maelito•2w ago

We need a European Kagi.

amelius•2w ago

https://en.wikipedia.org/wiki/Quaero

1970-01-01•2w ago

>The problem: A search monopoly

...

>We tried to do it the right way

This sign-up to retrieve better information idea will never take-off the way they think it will. A white label search will get you nowhere. They are silently failing because they're just too stubborn to do it the hard way. Kagi needs to pivot and succeed on useful and interesting edge cases first. Build us out a subject-relevant search, such as displaying vetted content from forums when searching a product/service, and then tying it into Facebook Marketplace for local items or services and Amazon for new. That is called building a product for yourself that others will use. Now you have your very own cashflow for clicks; use that cashflow to buy more corporate access, thereby proving you can succeed without any other search business propping you up and into relevancy. You don't need to start with the giants either. Start with something that works on local hunting, fishing, shooting, and knitting forums. When grandmothers need high quality green yarn today, make their muscle memory point to Kagi local, not Google.

grayhatter•2w ago

Kagi uses Brave search index? huh, TIL... that's very disappointing. And it's the kinda thing that would prevent me from ever paying for Kagi. Brave's crawler is aggressive, dumb (it doesn't appear to back off if it hits a number of 503s), and critically, it ignores robots.txt. They even admit they choose to ignore it. To top that off their crawler doesn't identify itself, instead masquerading as a real browser. I've had to ban the entire Hetzner ASN from my site to get them to stop.
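For reference, "backing off" on 503s is not a hard thing to implement. A minimal sketch of what a polite fetch loop might look like, with exponential backoff; the function name, the retry limits, and the injectable `fetch` and `sleep` callables are all assumptions made so the example stays self-contained, and this says nothing about how any real crawler is written.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying on 503 with doubling delays.

    Returns the first non-503 response, or (503, None) once
    max_attempts have all come back as 503.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            sleep(delay)   # give the overloaded server room to recover
            delay *= 2     # wait twice as long before the next try
    return 503, None
```

A well-behaved crawler would also honor a `Retry-After` header when the server sends one, and would check robots.txt before fetching at all.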

On one hand, I really want Kagi to succeed. They very often do seem to care about the parts of the world and internet that I care about. But on the other... willingly associating with, and financing, a company that brags about ignoring consent is a non-starter for me.

cush•2w ago

The idea of a search index being a public utility is an interesting idea but I’m not sure what it would do for trust. Governance is the biggest question mark, and with the current administration I’d say let Google run it and have less restrictive access to the index. My Google search usage has dropped probably 99% over the last two years.

My hope is that the powers that be figure out how to monetize these products with dollars instead of attention. Google’s ad-driven business model ruined the internet - we don’t need that in our AI products too.

luk4•2w ago

I think it's worth mentioning the Open Web Search initiative [1] and the Open Web Index [2] specifically.

> 14 renowned European research and computing centers have joined forces to develop an open European infrastructure for web search. The initiative is contributing to Europe’s digital sovereignty as well as promoting an open human-centered search engine market. [1]

> The Open Web Index (OWI) is a European open source web index pilot that is currently in Beta testing phase. The idea: Collaboratively and transparently secure safe, sovereign and open access to the internet for European organisations and civil society. The index stores well structured open web data, making it available for search applications and LLMs. [3]

[1] https://openwebsearch.eu/

[2] https://openwebindex.eu/

[3] https://openwebsearch.eu/open-webindex/

Nevermark•2w ago

> A government-backed, ad-free, intermediary-free, taxpayer-funded search service providing baseline, non-discriminatory access to information. Imagine search.org.

There is no way the government provides a search engine that doesn’t become a political football or weapon.

Maybe in a different age.

I completely agree that monopoly remedies, such as fair open paid licensing, are needed. I prefer that to breakups, when this kind of cooperative/competitive leveling works.

embedding-shape•2w ago

> There is no way the government provides a search engine that doesn’t become a political football or weapon.

Maybe it doesn't have to be based in the US? Maybe we could make this a world effort, run by a coalition instead, across border lines, like a library for the modern age.


