Crazy for a company to admit: "Google won't let us whitelabel their core product so we steal it and resell it."
Google doesn't really have a leg to stand on and they know it.
Google's crawler is given special privileges in this regard and can bypass basically all bot checks. Anyone else has to just wade through the mud and accept they can't index much of the web.
Another way to look at it is that if you publish a service on the web, you have limited rights to restrict what people do with it.
Isn't that the logic Google search relies on in the first place? I didn't give permission for Google to crawl and index and deep link to my site (let alone summarize and train LLMs on it). They just did it anyway, because it's on a public website.
And Marginalia Search was not mentioned? Marginalia Search says they are licensing their index to Kagi. Perhaps it's counted under "Our own small-web index" which is highly misleading if true.
That said, there are projects like Common Crawl and in Europe, Ecosia + Qwant.
I personally would like to see a search engine PaaS and a music streaming library PaaS that would let others hook up and pay direct usage fees.
I tried. It's just not good enough. Quick example: yesterday I set up a workstation with Ubuntu, wanting to try out wayland. One of the things I wanted was to run an app (w/ gui) from another (unprivileged) user under my own user. Ecosia gave me bad old stuff. Tried for a few minutes, nothing useful. Switched to google, one of the first results was about waypipe. Searched waypipe on ecosia. 1 and a half pages of old content. Glaringly, not one of those results was the ubuntu.manpages entry on waypipe. shrug
Not me. I only use Google.
Never used Kagi or DDG. Don’t care enough.
"Aspirin" is a famous example. It used to be a brand name for acetylsalicylic acid medication, but became such a common way to refer to it that in the US any company can now use it.
For example I'd hear people say "I'll Google that", then use Yahoo when they were still a major search engine.
Google used by 90% of the world?
~20% of the human population lives in countries where Google is blocked.
OTOH, Baidu is the #1 search engine in China, which has over 15% of the world’s population… but doesn’t reach 1%?
These stats are made measuring US-based traffic, rather than “worldwide” as they claim.
There are other times (usually not work related) when I want to explore the web and discover some nice little blog or special corner of the net. This is what my RSS feed reader is for.
Not to be pedantic, but I do have a noob question or two:
1. One is building the index, which is a lot harder without Google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it, like they did with LLM training for base models with the infamous "Pile" dataset? The upshot of offering this index as a public good would be to break not just Google's search monopoly but also other monopolies like Android, which would introduce a breath of fresh air into a myriad of UX areas (mobile devices, browsers, maps, security). So why don't they just do this already?
2. The other question is about "control", which the DoJ has provided guidance for but not yet enforced. IANAL, but why can't a state's attorney general enforce this?
Google has a monopoly, an entrenched customer base, and stable revenue from a proven business model. Anyone trying to compete would have to pour massive money into infrastructure and then fight Google for users. In that game, Google already won.
The current AI landscape is different. Multiple players are competing in an emerging field with an uncertain business model. We’re still in the phase of building better products, where companies started from more similar footing and aren’t primarily battling for customers yet. In that context, investing heavily in the core technology can still make financial sense. A better comparison might be the early days of car makers, or the web browser wars before the market settled.
Meanwhile, users pay a premium to pretend they're not using Google
Fascinating delusion
My searches can’t be tied to me by Google for their ad targeting: this is worth paying a premium for, and I am glad Kagi are providing this service.
You seem to have a very limited understanding of the value Kagi provides.
Because text matching was so difficult to search with, whenever you went to a site, it would often have a "web of trust" at the bottom where an actual human being had curated a list of other sites that you might like if you liked this site.
So you would often search with keywords (often literals), then find the first site, then recursively explore the web of trust links to find the best site.
My suspicion has always been that Google (PageRank) benefited greatly from the human curated "web of trust" at the bottom of pages. But once Google came out, search was much better, and so human beings stopped creating "web of trust" type things on their site.
I am making the point that Google effectively benefited from the large amount of human labor put into connecting sites via WOT, while simultaneously (inadvertently) destroying the benefit of curating a WOT. This means that by succeeding at what they did, they made it much more difficult for a Google#2 to come around and run the exact same game plan with even the exact same algorithm.
tldr; Google harvested the links that were originally curated by human labor, the incentive to create those links is gone now, so the only remaining "links" between things are now in the Google Index.
Addendum: I asked Claude to help me think of a metaphor, and I really liked this one because it is so similar.
```
The railroad and the wagon trails: Before railroads, collective human use created and maintained wagon trails through difficult terrain. The railroad company could survey these trails to find optimal routes. Once the railroad exists, the wagon trails fall into disuse and the pathfinding knowledge atrophies. A second railroad can't follow trails that are now overgrown.
```
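For anyone who wants to see the mechanism behind the link-curation argument, here is a minimal sketch of the simplified textbook PageRank (power iteration with a damping factor). It is not Google's production ranking, and the page names and the `pagerank` helper are hypothetical, made up for illustration. It only shows that a page recommended by other sites' curated links ends up with the highest score, and that once nobody links to anybody the algorithm has nothing left to distinguish pages with.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score (scores sum to 1).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly over its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: hobby sites whose footers recommend "best-site".
curated = {
    "site-a": ["best-site", "site-b"],
    "site-b": ["best-site"],
    "site-c": ["best-site", "site-a"],
    "best-site": [],
}
# The same pages after the curated footers disappear: no links at all.
uncurated = {page: [] for page in curated}

with_links = pagerank(curated)
without_links = pagerank(uncurated)

for page in sorted(with_links):
    print(f"{page:10s} curated={with_links[page]:.3f} uncurated={without_links[page]:.3f}")
```

In the curated graph "best-site" comes out on top because three sites vouch for it. In the uncurated graph every page gets the same score, which is the "overgrown trails" situation a second PageRank-style engine would be facing.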
whs•1h ago
>Because direct licensing isn’t available to us on compatible terms, we - like many others - use third-party API providers for SERP-style results (SERP meaning search engine results page). These providers serve major enterprises (according to their websites) including Nvidia, Adobe, Samsung, Stanford, DeepMind, Uber, and the United Nations.
The customer list matches what is listed on SerpAPI's page (interestingly, DeepMind is on Kagi's list even though it's a Google company...). I suppose Kagi needs to pen this because if SerpAPI shuts down they may lose access to Google, though they may already utilize multiple providers. In the past, Kagi employees have said that they have access to the Google API, but it seems that was not the case?
As a customer, the major implication of this is that even if Kagi's privacy policy says they try not to log your queries, the queries are still sent to Google and are subject to Google's consumer privacy policy. Even if they are anonymized, your queries can still end up contributing to Google Trends.