Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations

https://masonyarbrough.com/blog/ask-human

129•echollama•8mo ago

While building my startup i kept running into the issue where ai agents in cursor create endpoints or code that shouldn't exist, hallucinate strings, or just don't understand the code.

ask-human-mcp pauses your agent whenever it’s stuck, logs a question into ask_human.md in your root directory with answer: PENDING, and then resumes as soon as you fill in the correct answer.

the pain:

your agent screams out an endpoint that never existed, it makes confident assumptions, and you spend hours debugging false leads

the fix:

ask-human-mcp gives your agent an escape hatch. when it’s unsure, it calls ask_human(), writes a question into ask_human.md, and waits. you swap answer: PENDING for the real answer and it keeps going.

some features:

- zero config: pip install ask-human-mcp + one line in .cursor/mcp.json → boom, you’re live

- cross-platform: works on macOS, Linux, and Windows, no extra servers or webhooks

- markdown Q&A: agent calls await ask_human(), question lands in ask_human.md with answer: PENDING. you write the answer, agent picks back up

- file locking & rotation: prevents corrupt files, limits pending questions, auto-rotates when ask_human.md hits ~50 MB

the quickstart

    pip install ask-human-mcp
    ask-human-mcp --help

add to .cursor/mcp.json and restart:

    {
      "mcpServers": {
        "ask-human": {
          "command": "ask-human-mcp"
        }
      }
    }

now any call like:

    answer = await ask_human(
        "which auth endpoint do we use?",
        "building login form in auth.js"
    )

creates:

    ### Q8c4f1e2a
    ts: 2025-01-15 14:30
    q: which auth endpoint do we use?
    ctx: building login form in auth.js
    answer: PENDING

just replace answer: PENDING with the real endpoint (e.g., `POST /api/v2/auth/login`) and your agent continues.
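For readers wondering what that round trip looks like mechanically, here is a minimal sketch of the append-then-poll pattern. It is not the project's actual code: `append_question`, `read_answer`, and `wait_for_answer` are hypothetical helper names, the entry layout is copied from the example above, and file locking and rotation are left out.

```python
import re
import time
import uuid
from datetime import datetime
from pathlib import Path
from typing import Optional

PENDING = "PENDING"


def append_question(path: Path, question: str, context: str) -> str:
    """Append a question entry with `answer: PENDING` and return its id."""
    qid = "Q" + uuid.uuid4().hex[:8]
    ts = datetime.now().strftime("%Y-%m-%d %H:%M")
    entry = (
        f"### {qid}\n"
        f"ts: {ts}\n"
        f"q: {question}\n"
        f"ctx: {context}\n"
        f"answer: {PENDING}\n\n"
    )
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return qid


def read_answer(path: Path, qid: str) -> Optional[str]:
    """Return the answer for `qid`, or None while it is still PENDING."""
    text = path.read_text(encoding="utf-8")
    # Take the block from this id's heading up to the next heading (or end of file).
    block = re.search(
        rf"^### {re.escape(qid)}\n(.*?)(?=^### |\Z)", text, re.S | re.M
    )
    if block is None:
        return None
    answer = re.search(r"^answer:[ \t]*(.*)$", block.group(1), re.M)
    if answer is None or answer.group(1).strip() in ("", PENDING):
        return None
    return answer.group(1).strip()


def wait_for_answer(
    path: Path, qid: str, poll_seconds: float = 1.0, timeout: float = 600.0
) -> str:
    """Poll the file until a human replaces PENDING, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        answer = read_answer(path, qid)
        if answer is not None:
            return answer
        time.sleep(poll_seconds)
    raise TimeoutError(f"no answer for {qid} after {timeout} seconds")
```

Polling keeps the sketch portable across macOS, Linux, and Windows; a file-watching library would react faster but adds a dependency.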

link:

github -> https://github.com/Masony817/ask-human-mcp

feedback:

I'm Mason, a 19yo solo founder at Kallro. Happy to hear any bugs, feature requests, or weird edge cases you uncover - drop a comment or open an issue!

buy me a coffee -> coff.ee/masonyarbrough

Comments

throwaway314155•8mo ago

Not certain that your definition of hallucination matches mine precisely. Having said that, this is so simple yet kinda brilliant. Surprised it's not a more popular concept already.

loloquwowndueo•8mo ago

- someone sets up an “ask human as a service mcp”

- demand quickly outstrips offer of humans willing to help bots

- someone else hooks up AI to the “ask human saas”

- we now have a full loop of machines asking machines

TZubiri•8mo ago

This is pretty much already possible in any economy, but quite a waste.

Not much is stopping you from buying products from a retailer and selling them at a wholesaler, but you'd lose money in doing so.

4ndrewl•8mo ago

I mean, losing money is practically de rigueur in the startup community, right?

olalonde•8mo ago

I built this - but mostly as a joke / proof-of-concept: https://github.com/olalonde/mcp-human

aziaziazi•8mo ago

Cool project! Naive question: does Mechanical Turk use LLMs now?

lordmauve•8mo ago

Finally, the "AI" turns out to be 700 Indians. We now have the full loop of humans asking machines asking humans pretending to be machines. Civilisation collapses

franky47•8mo ago

AI stands for Actual Indians.

kajkojednojajko•8mo ago

please do the promptful

conception•8mo ago

What sort of prompt are you using for this?

kordlessagain•8mo ago

The prompt is (mostly) built from the tools loaded in the MCP server. In Python, the @mcp.tool() decorators provide the context of each tool to the prompt, which is then submitted (I believe) with each call to the LLM.
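To make that concrete, here is a toy sketch of the idea, not the real MCP SDK: a hypothetical `ToolRegistry` whose `tool()` decorator records each function, then renders the name, signature, and docstring as text a client could place in the model's context. The real SDK builds a structured schema rather than a plain string, so treat the names and output format here as assumptions.

```python
import inspect
from typing import Callable, Dict


class ToolRegistry:
    """Toy stand-in for an MCP server object; not the real SDK."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}

    def tool(self) -> Callable[[Callable], Callable]:
        """Decorator factory, used as `@registry.tool()`."""
        def register(fn: Callable) -> Callable:
            self.tools[fn.__name__] = fn
            return fn
        return register

    def describe(self) -> str:
        """Render every registered tool as one line of prompt context."""
        lines = []
        for name, fn in self.tools.items():
            doc = inspect.getdoc(fn) or "(no description)"
            lines.append(f"{name}{inspect.signature(fn)}: {doc}")
        return "\n".join(lines)


registry = ToolRegistry()


@registry.tool()
def ask_human(question: str, context: str = "") -> str:
    """Ask the human a question when unsure; blocks until answered."""
    raise NotImplementedError("illustration only")


# The docstring above is what nudges the model to call the tool.
print(registry.describe())
```

In this picture the docstring is effectively the prompt, so its wording decides how often the model reaches for the tool.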

rgbrenner•8mo ago

Sounds similar to `ask_followup_question` in Roo

kjhughes•8mo ago

Cool conceptually, but how exactly does the agent know when it's unsure or stuck?

Groxx•8mo ago

The same way it knows anything else.

So not at all, but that doesn't mean it's not useful.

TZubiri•8mo ago

So we are just pushing the issue to another, less debuggable layer. Cool.

kjhughes•8mo ago

I'll try to give you credit for more than dismissing my question off-hand...

Yes, it may not need to know with perfect certainty when it's unsure or stuck, but even to meet a lower bar of usefulness, it'll need at least an approximate means of determining that its knowledge is inadequate. To purport to help with the hallucination problem requires no less.

To make the issue a bit more clear, here are some candidate components to a stuck() predicate:

- possibilities considered

- time taken

- tokens consumed/generated (vs expected? vs static limit? vs dynamic limit?)

If the unsure/stuck determination is defined via more qualitative prompting, what's the prompt? How well has it worked?
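For illustration, the quantitative version of such a predicate might look like the sketch below. Every field name and threshold is made up, and the counters would have to come from whatever is hosting the model, since a tool server would not normally see them.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-task counters supplied by whatever hosts the model."""
    possibilities_considered: int
    seconds_elapsed: float
    tokens_generated: int


@dataclass(frozen=True)
class Limits:
    """Static limits; a dynamic version could derive these from past runs."""
    max_possibilities: int = 5
    max_seconds: float = 120.0
    max_tokens: int = 8000


def stuck(stats: RunStats, limits: Limits = Limits()) -> bool:
    """True when any counter exceeds its limit."""
    return (
        stats.possibilities_considered > limits.max_possibilities
        or stats.seconds_elapsed > limits.max_seconds
        or stats.tokens_generated > limits.max_tokens
    )
```

A predicate like this measures effort spent, not whether the output is correct, which is why the qualitative question about the prompt still matters.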

Groxx•8mo ago

I don't believe[1] any of those are part of the MCP protocol - it's essentially "the LLM decided to call it, with X arguments, and will interpret the results however it likes". It's an escape hatch for the LLM to use to do stuff like read a file, not a monitoring system that acts independently and has control over the LLM itself.

(But you could build one that does this, and ask the LLM to call it and give your MCP that data... when it feels like it)

So you'd be using this by telling the LLM to run it when it thinks it's stuck. Or needs human input.

1: I am not anything even approaching deeply knowledgeable about MCP, so please, someone correct me if I'm wrong! There do seem to be some bi-directional messaging abilities, e.g. notification, but to figure out thinking time / token use / etc you would need to have access to the infrastructure running the LLM, e.g. Cursor itself or something.

threeseed•8mo ago

You are trying to control a system that is inherently chaotic.

You can probably get somewhere by indeed running a task 1000 times and looking for outliers in the execution time or token count. But that is of minimal use and anything more advanced than that is akin to water divining.
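A rough sketch of that kind of outlier check, assuming you have already collected token counts (or execution times) from repeated runs. The modified z-score with a 3.5 cutoff is a common rule of thumb swapped in here, not something specified in this thread, and `find_outliers` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def find_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by the outliers themselves than the mean and standard deviation.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]
```

This only flags runs that were unusually long or short; it says nothing about whether a typical-looking run produced a correct answer.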

kordlessagain•8mo ago

The system is only nondeterministic (and a model of nondeterminism at that) when it's emitting tokens. It (the system) becomes completely deterministic when it calls a tool and a result is returned from the tool.

This is little different than how I wrote this. Now it is deterministic, when I hit reply.

echollama•8mo ago

the reasoning aspect of most llms these days knows when it's unsure or stuck, you can get that from its thinking tokens. It will see this mcp and call it when it's in that state. Though this could benefit from some rules file telling it to use it, although cursor doesn't quite follow ask-for-help rules, hence making this.

kjhughes•8mo ago

Does all thinking end up getting replaced by calls to Ask-human-mcp then? Or only thinking that exceeds some limit (and how do you express that limit)?

aziaziazi•8mo ago

I had the same question reading your post:

> (problem description) your agent […] makes confident assumptions

> (solution description) when it’s unsure

I read this as a contradiction: in one sentence you describe the problem as an agent being confident while hallucinating and in the next phrase the solution is that the agent can ask you if it’s unsure.

Your tool is interesting but you may consider rephrasing that part.

mgraczyk•8mo ago

If you are answering these questions yourself, why not just add something like this to your cursor rules?

"If you don't know the answer to a question and need the answer to continue, ask me before continuing"

Will you have some other person answer the question?

deadbabe•8mo ago

Having another person answer the question is pretty much the obvious route this will go.

mgraczyk•8mo ago

But then that means they are editing a markdown file on your computer? How is that meant to work?

I like the idea but would rather it use Slack or something if it's meant to ask anyone.

echollama•8mo ago

this is mainly meant as a way to conversate with the model while you are programming with it. This is not meant to pull questions to a team but more to pair program. a markdown file is best for syntax in an llm prompt and also just easiest to have open and answer questions with. If i had more time and could i would build an extension into cursor.

mgraczyk•8mo ago

Why not have the model ask in the chat? It's a lot easier to just talk to it than open a file. The article mentions cursor so it sounds like you're already using cursor?

echollama•8mo ago

would probably work better, this is just how i threw it together as an internal tool a long time ago. i just improved it and shipped it to opensource it.

krakrum•8mo ago

Because i only have 500 requests in my cursor usage plan so if there's a way for claude to ask me questions (e.g. missing context) without it taking an entire new request, i'll take it. Haven't tried it yet but looking forward to it

multjoy•8mo ago

Conversate is not a word.

echollama•8mo ago

yes it is

multjoy•8mo ago

It is not.

mikem170•8mo ago

The word "conversate" is in the dictionary [1], labelled as "non-standard". That doesn't mean it's not a word. Most people would be able to easily infer its meaning.

English is a living language, words come and go.

I don't understand the objection.

[1] https://www.merriam-webster.com/dictionary/conversate

multjoy•8mo ago

Because it is either laziness, ignorance or AI slop.

mikem170•7mo ago

I wouldn't agree that using an understandable word that's in the merriam-webster dictionary is being ignorant or lazy. Nor would I call something AI slop because of a single word, without otherwise engaging with the content.

I do genuinely wonder why some people can be so derailed by odd or unfamiliar words and grammar. Are they stressed? Not wanting to engage with a conversation? Trying to assert status or intellectual superiority? Being aggressive and domineering to assuage their self-worth? Perhaps they feel threatened by cultural change? I assume it has something to do with emotional regulation, given that I can't recall bumping into too many mature people who do such things.

Might you have any insight to offer?

bckr•8mo ago

I’ve tried putting “stop and ask for help” in prompts/rules and it seems like Cursor + Claude, up to 3.7, is highly aligned against asking for help.

ramesh31•8mo ago

>If you are answering these questions yourself, why not just add something like this to your cursor rules?

What you are asking for is AGI. We still need human in the loop for now.

mgraczyk•8mo ago

What I'm describing is a human in the loop. It's just a different UX, one that is easier to use and closer to what the model is trained to use.

ramesh31•8mo ago

Human in the loop means despite your best efforts at initial prompting (which is what rules are), there will always be the need to say "no, that's wrong, now do this instead". Expecting to be able to write enough rules for the model to work fully autonomously through your problem is indeed wishing for AGI.

mgraczyk•8mo ago

In my example, the human would be in the loop in exactly the same way as the technique in the article. The human can tell the model that it's wrong and what to do instead.

Tools like the one in the article are also "rules".

superb_dev•8mo ago

This site is impossible to read on my phone. Part of the left side of the screen is cut off and I can’t scroll it into view

lobsterthief•8mo ago

Same here

tyzoid•8mo ago

Completely blank for me on mobile (javascript disabled)

banner520•8mo ago

I also have this problem on my phone

rfl890•8mo ago

Switching to desktop mode fixed it for me

kbouck•8mo ago

Rotate phone to landscape

multjoy•8mo ago

lol, no

echollama•8mo ago

i fixed this

threeseed•8mo ago

> an mcp server that lets the agent raise its hand instead of hallucinating

a) It doesn't know when it's hallucinating.

b) It can't provide you with any accurate confidence score for any answer.

c) Your library is still useful but any claim that you can make solutions more robust is a lie. Probably good enough to get into YC / raise VC though.

echollama•8mo ago

reasoning models know when they are close to hallucinating because they are lacking context or understanding and know that they could solve this with a question.

this is a streamlined implementation of an internally scrapped together tool that i decided to open-source for people to either use or build off of.

geraneum•8mo ago

> reasoning models know when they are close to hallucinating because they are lacking context or understanding and know that they could solve this with a question.

I’m interested. Where can I read more about this?

threeseed•8mo ago

> reasoning models know when they are close to hallucinating because they are lacking context or understanding and know that they could solve this with a question

You've just described AGI.

If this were possible you could create an MCP server that has a continually updated list of FAQ of everything that the model doesn't know.

Over time it would learn everything.

xeonmc•8mo ago

Unless there is as yet insufficient data for meaningful answer.

marshall300791•8mo ago

https://arxiv.org/html/2407.14507v3

exclipy•8mo ago

Would be great if it pinged me on slack or whatsapp. I wouldn't notice if it simply paused waiting for the MCP call to return
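A minimal sketch of what such a ping could look like with only the standard library, assuming you have a Slack incoming webhook URL. ask-human-mcp as described in the post does not do this; `build_slack_ping` and `send` are hypothetical helpers, and the URL in the usage note is a placeholder.

```python
import json
import urllib.request


def build_slack_ping(webhook_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST for a Slack incoming webhook."""
    payload = json.dumps({"text": f"Agent is waiting on you: {question}"})
    return urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage would be something like `send(build_slack_ping("https://hooks.slack.com/services/<your-webhook-path>", "which auth endpoint do we use?"))`, called right after the question is written to ask_human.md.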

spacecadet•8mo ago

Easy enough to do with smolagents and fastmcp, it's 20 lines of code.

atoav•8mo ago

I am running an electronics/media lab at a university, and the number of fires that bad electronics advice from LLMs has already caused is probably non-zero.

It is amazing how bad LLMs are when it comes to reasoning about simple dynamics within trivial electronic circuits and how eager they are to insist the opposite of how things work in the real world is the secured truth.

spacecadet•8mo ago

If the model responds with an obviously incorrect answer or hallucination, start over. Rephrase your input. Consider what output you are actually after... Adding to the original shit output won't help you.

ddalex•8mo ago

Why wouldn't a rag-enabled ai be faster and better than humans at answering these documentation-grounded questions?

kordlessagain•8mo ago

The same technique can be had by creating a "universal MCP tool" for the LLM to use if it thinks the existing tools aren't up to the job. The MCP language calls these "proxies".

PSBigBig•8mo ago

Thanks for sharing this. Bookmarked!
