"An error has occurred building the search results."
He can then exhaust the remaining server heat through the dryer vent stack.
- SearchaPage - Web Search Engine https://searcha.page/
- Seek Ninja - Stealthy Search Engine https://seek.ninja/
Google was invented many years ago by two guys in a dorm room. Since then there have been so many white papers and advancements in the public sphere, and the underlying problem has changed so little, that it seems like it could be done by a small group or an independent person.
Search candidates and rankings now require assessment by an LLM. Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results.
Crawling too requires innovative approaches to bypass server filters.
I doubt any independent person can afford to run a vector database or LLMs at immense scale.
The reason I pay for Kagi is that I specifically don't want this to occur.
Every person I have seen (outside the tiny tech bubble) google something has just read the AI overview without skipping a beat.
A search is just learning what you don't know and AI does a better job than search has ever done for me - and I'm in tech.
Yes, people want the answer directly. Google wants you to stay on their site to read some mishmash. I think the ideal would be to immediately go to the source’s site.
[EDIT] Incidentally, are there any sites that do actual web search any more, better than Yandex? I'd rather avoid a Russian site if I can, but there are whole topics where it's impossible to find anything useful on heavily "massaged" allegedly-Web-search-but-not-really sites like Google and DDG (Bing), but I can find what I want on page 1 or 2 of a Yandex search. Is Kagi as good as that, or is their index simply ignoring a whole bunch of the Web like so many others? I don't mind paying.
Citation needed
Also, a lot of site owners are reluctant to link out. So much so that 'nofollow' has been reduced to a hint rather than a directive.
This leads directly to another big change.
People used to submit their sites to search engines and now they might actively block search engines. So a search engine author might have to spend a lot of effort in adversarial games.
The parts that absolutely require JS can't be reliably linked to and nobody indexes that stuff. Most apparent SPAs serve an HTML alternative if you don't claim to be a web browser in the UA.
Cloudflare and the like are also fairly easy to deal with as long as your crawler is well behaved. You can register the fingerprint and mostly get access to Cloudflare-fronted websites.
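A minimal sketch of one part of what "well behaved" could mean in practice: identify the crawler honestly and consult robots.txt before fetching. The user agent string and the sample robots.txt below are made up for illustration, and nothing here covers the Cloudflare registration step mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity; a real crawler would publish its own contact URL.
USER_AGENT = "example-crawler/0.1 (+https://example.com/bot-info)"

# Made-up robots.txt content, standing in for a file fetched from a site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the parsed rules allow our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(SAMPLE_ROBOTS)
print(may_fetch(parser, "https://example.com/blog/post"))     # allowed path
print(may_fetch(parser, "https://example.com/private/data"))  # disallowed path
print(parser.crawl_delay(USER_AGENT))                         # seconds to wait between requests
```

A real crawler would also rate-limit per host and back off on errors; this only shows the permission check.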
Or, perhaps, "a better Google should just take you to these."
Something like that.
Second, the internet was different: when all nerds declared that Google is good, that was CNN-grade newsworthy (and CNN used to matter a lot more back then), simply because the internet seemed kinda important, but there was no other authority on the topic. Today, that's not the case. If you need someone to opine on the internet on air, you invite some political pundit or a business analyst.
So no, I don't think you can repeat the success of Google the same way. It was a product of its time.
I’m not following you.
It's like discovering that there's a better pair of shoes that's more comfortable. Everybody can use a slightly improved, more comfortable pair of shoes, so it comes up frequently.
I just don’t understand people who get so upset that someone might like something enough to talk about liking it. So upset that they won’t ever try the thing. Like … ok I guess? You do you. It’s just a strange way to make decisions.
At least this is just a consumer product. Worse is when people here say they make technical decisions using the same process. They’d blacklist certain tech because they’ve heard people talking about how it solved their problems. Also ok, but now I know I should avoid them professionally.
In all of these cases, a reasonable counterpoint is that if it were that applicable for all audiences, one wouldn't need to sing its praises; it would sing its own praises.
I signed up for a specialist forum not too long ago and posted an honest review of a product because I hadn't been able to find one anywhere on the internet. Immediately a bunch of people accused me of being a "shill" for a direct-to-consumer business that's been powered by a Yahoo storefront for the last 20 years, as though a business that's run by a guy with an AOL e-mail address is sophisticated enough to figure out Fiverr and astroturf their reputation on a phpBB forum.
Think about it for just a moment - do you really think that the Hacker News audience is large enough or full of enough tastemakers to sway an alternative search engine's market share? It isn't. If Kagi wanted to do that they'd hire TikTok influencers.
But full disclosure, sometimes I'm using DuckDuckGo and it's also good enough most of the time that I occasionally forget until I go down some rabbit hole and realize that I'm using the wrong search engine.
When I started using it (~2 years ago), it was necessary. Google was simply not solving any of my actual issues (software related).
Now it seems that Google might have improved a bit. I check from time to time and the gap isn't as huge as when Kagi started.
Oh sweet summer child
I hope this guy succeeds and becomes another reference in the community like the marginalia dude. This makes me want to give my project another go...
While the index is currently not open source, it should be at some point. Maybe when they get out of the beta stage (?); details are still unclear.
why do I never get deals like that when I am shopping for the homelab on eBay?
I see this for pretty much all hardware out on eBay, just go back 5 years and watch the price fall 10x.
I feel like there was a five year span where everyone I talked to said buying or selling electronics on eBay was a nightmare, so I'm a little curious if I need to re-evaluate my priors.
The real issue is being a seller and solving the "and then the customer claims I shipped them a box of rocks" problem.
I've personally never had that problem after over a decade and hundreds of purchases on eBay. I've had some defective parts, but never outright fraud. IME eBay favors buyers.
A 7532 CPU is now e-waste for all the datacenters out there, so 1/10 of the original price is reasonable, but the latest Nvidia GPU for 200 bucks is obviously a scam.
I have 1542766 domains. Might not be much, but it is an honest work.
It is available as a github repo, so anybody that wants to start crawling has some initial data to kick off.
Links
FYI there's a broken link in your readme:
https://rumca-js.github.io/internet full internet search
The bad thing about this is...read above.
Again, those orgs are likely too comfortable and less productive than people would like, but we're talking about many, many thousands and, depending upon how you define "the work" of search, upwards of 10k.
I didn't see any new secret sauce in the article, and Google has said that since 2015 (?) Google Brain has been involved in search.
This is not to say that Google couldn't be dislodged by search via LLM or similar, that is "new" research.
Building a state-of-the-art search engine is not shoelaces. But upwards of 10k workers is not impressive in the right direction.
One person starting out with anything at all can quickly grow into one person with one or two really innovative ideas. One or two good ideas can catch fire pretty quickly. Don't be too dismissive.
Some bits and pieces:
> his new search engine, the robust Search-a-Page <https://searcha.page>, which has a privacy-focused variant called Seek Ninja <https://seek.ninja>
> The secret to making it all happen? Large language models. “What I’m doing is actually very traditional search,” Pearce says. “It’s what Google did probably 20 years ago, except the only tweak is that I do use AI to do keyword expansion and assist with the context understanding
> Fellow ambitious hobbyist Wilson Lin, who on his personal blog <https://blog.wilsonl.in/search-engine/> recently described his efforts to create a search engine of his own, took the opposite approach from Pearce.
> And then there’s the concept of doing a small-site search, along the lines of the noncommercial search engine Marginalia <https://marginalia-search.com>, which favors small sites over Big Tech
And the obvious answer to the title: "Why the laundry room? Two reasons: Heat and noise." It runs on a 32-core AMD EPYC 7532, half a terabyte of RAM, and "all in, cost $5,000, with about $3,000 of that going toward storage."
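For anyone wondering what "traditional search plus keyword expansion" from the Pearce quote above might look like, here is a toy sketch: a plain inverted index, with the query widened before lookup. The article doesn't describe his actual implementation, so everything here is an assumption; in particular the hand-written `EXPANSIONS` table is only a stand-in for whatever the LLM does, and the documents are made up.

```python
from collections import defaultdict

# Toy corpus; the documents are invented for illustration.
DOCS = {
    1: "cheap laptop repair guide",
    2: "notebook computer battery replacement",
    3: "history of the typewriter",
}

# Stand-in for the LLM step: a hand-written table of related terms (hypothetical).
EXPANSIONS = {
    "laptop": ["notebook"],
    "fix": ["repair", "replacement"],
}


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def expand(query):
    """Return the query terms, each followed by its related terms from the table."""
    terms = []
    for term in query.lower().split():
        terms.append(term)
        terms.extend(EXPANSIONS.get(term, []))
    return terms


def search(index, query):
    """Rank documents by how many expanded query terms they contain."""
    scores = defaultdict(int)
    for term in expand(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
print(search(INDEX, "fix laptop"))
```

Without the expansion step, "fix laptop" would only match document 1 on the literal word "laptop"; with it, document 2 is found as well even though it shares no words with the query.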
I've daydreamed about how I'd create my own search engine so, so many times. But I always run into an impassable wall: The internet now isn't at all the same as the internet in 1999.
Discovery isn't really that useful. If you find someone's self-hosted blog about dinosaurs, it probably hasn't been updated since 2004, all the links and images are broken, and it's just thoroughly upstaged by Wikipedia and the Smithsonian. Sure, it's fun to find these quirky sites, but they aren't as valuable as they once were.
We've basically come full circle to the AOL model, where there are "hubs" of content that cater to specific categories. YouTube has ALL the long-form essays. TikTok has ALL the humorous videos. Medium has ALL the opinion pieces. Reddit has ALL the flame wars. Mayo Clinic has ALL the drug side-effects. Amazon has ALL the shopping. eBay has ALL the collectables.
None of these big companies want nasty little web crawlers poking and prodding their site. But they accept Google crawlers, because Google brings them users. Are they going to be that friendly to your crawler?
Of course, I still dream. Maybe a hub-based internet needs a hub-aware search engine?