AI search engine – How to prevent bots?

4•chaztaubelman•1mo ago

Hi, I'm launching an AI search engine (e.g. Perplexity-like). I don't want to force people to sign up to use it. I want free visitors to be able to discover it and use it. However, I've had issues in the past with bots spamming usage, which exploded my costs.

What are the best methods to prevent those bots while keeping the UX frictionless? I've heard of Cloudflare. Will that pop up for every user or only for those who are truly suspicious?

Thanks

Comments

reliefcrew•1mo ago

Depends on the "widget mode" you choose:

https://developers.cloudflare.com/turnstile/concepts/widget/...

timshell•1mo ago

Check out a demo of a similar tool we created (https://model-guessr.com/) that was bot-gated by Roundtable Proof of Human.

Happy to talk more details about PoH (disclaimer: I'm a cofounder and this is my YC S23 company)

reliefcrew•1mo ago

Can you comment on the notion that Turnstile's primary goal isn't to keep bots out 100% but instead to slow them down to "human" speeds?

Asking because as a dev I hate when sites don't allow bots... however, I can appreciate that automation should be rate-limited. IOW, isn't preventing bot access actually an anti-pattern, since rate-limiting is sufficient?
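To make the rate-limiting idea concrete, here is a minimal sketch of a per-client token bucket in Python. It is not how Turnstile or Cloudflare work internally; the class name `PerClientRateLimiter`, its parameters, and the choice of client identifier (an IP address, a cookie, etc.) are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PerClientRateLimiter:
    """Hypothetical in-memory token bucket, one bucket per client id."""

    def __init__(
        self,
        capacity: int = 5,
        refill_per_second: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity                    # maximum burst size
        self.refill_per_second = refill_per_second  # sustained request rate
        self.clock = clock                          # injectable for testing
        # client_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        """Return True if this request may proceed, False if it is throttled."""
        now = self.clock()
        tokens, last_seen = self._buckets.get(
            client_id, (float(self.capacity), now)
        )
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - last_seen
        tokens = min(self.capacity, tokens + elapsed * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[client_id] = (tokens - 1.0, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

With `capacity=5` and `refill_per_second=0.5`, a client can burst five queries and then sustain one query every two seconds, which keeps automation at roughly "human" speed without blocking it outright. A real deployment would also need shared storage across servers and eviction of idle buckets.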

I see a lot of marketing which bashes Turnstile [detection] rates and tries to leverage this misunderstood nuance. That seems like a dishonest point of contention, but I'm willing to hear opposing arguments.

Thanks.

timshell•1mo ago

Yup! It depends on your use case.

Cloudflare is really good at network bot detection. Rate-limiting is super helpful here, for example during DDoS attacks.

Our customers are a little different. They sometimes struggle with high-volume bot attacks (e.g. SMS toll fraud in ticketing marketplaces), but we specifically focus on online platforms that want to verify a human is on the other side of the screen. For example, survey pollsters and labor marketplaces want to stop a slow agent that can complete a traditional CAPTCHA, even if it's solving it at human speed.

reliefcrew•1mo ago

I see. I'll have to read the marketing more closely next time, lol. The cynic in me only notices the detection rate comparisons, which I'm sure the marketing folks don't mind much ;-)

timshell•1mo ago

https://research.roundtable.ai/bot-benchmarking/ :)

reliefcrew•1mo ago

> Finally, our evaluation did not involve active adversarial optimization.

Good luck!

n1xis10t•1mo ago

Another option to consider (which marginalia-search.com uses) is Anubis (anubis.techaro.lol). The operator of Marginalia told me that he was getting lots of people spamming the same queries over and over, which he thought might be them trying to influence suggested searches. He put Anubis in place and the query volume dropped to much more reasonable levels. It works by running some sort of complex calculation in JavaScript, so it won’t get rid of all bots, but it should slow them all down.
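As a rough illustration of the "complex calculation" idea, here is a generic hash-based proof-of-work sketch in Python. It is not Anubis's actual implementation: the function names, the SHA-256 leading-zeros rule, and the difficulty value are assumptions chosen for the example. The server hands out a random challenge, the client has to search for a nonce, and the server checks the answer with a single hash.

```python
import hashlib
import secrets
from itertools import count


def issue_challenge() -> str:
    """Server side: generate a random, single-use challenge string."""
    return secrets.token_hex(16)


def is_valid(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one cheap hash verifies the client's work.

    The work counts if the SHA-256 hex digest of challenge + nonce
    starts with `difficulty` zero characters.
    """
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one passes the check.

    On average this takes about 16 ** difficulty attempts.
    """
    for nonce in count():
        if is_valid(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    challenge = issue_challenge()
    nonce = solve(challenge, difficulty=4)
    print(challenge, nonce, is_valid(challenge, nonce, 4))
```

The asymmetry is the point: solving costs the client thousands of hashes per request, while verification costs the server one. That is negligible for a single human visitor but adds up for a bot sending queries in bulk, which is why it slows bots down rather than excluding them.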

The downside is that their silly anime girl mascot is displayed whenever the challenge is running, which I think some people might find off-putting.

Edit: Are you going to announce the search engine on Hacker News?

2nd edit: If you are making a search engine, this is probably a good article to read: https://archive.org/details/search-timeline. It talks about various search engines that have disappeared mysteriously over the years.

xena•1mo ago

There is an unbranded version: https://anubis.techaro.lol/docs/admin/botstopper

n1xis10t•1mo ago

I noticed that, but that page makes it sound like it can only be unbranded if you pay for the commercial version. It looks like Anubis is open source though, so I suppose you might be able to download the source and switch out images for your own, is that correct?

Also, since you are who you are, can I ask how you came across this post? Did you notice it because of the content of the original post or because I mentioned Anubis?

xena•1mo ago

> I noticed that, but that page makes it sound like it can only be unbranded if you pay for the commercial version. It looks like Anubis is open source though, so I suppose you might be able to download the source and switch out images for your own, is that correct?

You can do that, but I can prioritize support of the open source project based on if people play nice. I don't mean to be rude, but I can't pay rent with GitHub stars.

> Also, since you are who you are, can I ask how you came across this post? Did you notice it because of the content of the original post or because I mentioned Anubis?

I have a cronjob monitoring new Hacker News posts for mentions of Anubis.

n1xis10t•1mo ago

A cronjob, that’s cool. A little rudeness is fine; you could probably say I’ve been a little rude. Thanks for making a cool product!
