Show HN: I made a human-in-the-loop system for tuning LLMs in beta

https://www.joinoneshot.com/
2•gitpullups•9h ago
OneShot is an API that routes failed LLM outputs to trained humans, returns corrected outputs or prompt injections, and stores the edits as structured training data.

Privacy Note: This product is not built for privacy yet. The current use case is internal tools or beta features where users aren’t promised privacy. To be clear, this tool is NOT FOR PRODUCTION.

In the future, there will be a feature for anonymizing all private information automatically.

Problem: My project this year was a tool for pediatricians to do their insurance claims assisted by AI.

Niche industries like this require a ton of examples, fine-tuning and re-prompting to actually get them a product that works. Then it requires monitoring the output to some extent (with the hospital’s consent, of course) for at least the first couple of months, so that small model changes or edge cases don’t break outputs.

This monitoring takes months and distracts from building new features. Every new feature I wanted to ship required the same constant beta monitoring to get it to a reliable state, and so did the internal tools and automations I needed to work reliably. That is when I started wishing I had an AI engineer/architect monitoring outputs 24/7 for every new feature’s first month. In real-world software, programs need to break almost never, and current AI models often don’t get us quite there: the gap is going from 90% or 95% to 100%. We waste months before shipping new features trying to tweak them internally, without the model being able to improve live in the real world.

In niche agent environments, you sometimes need an actual human to jump in.

How it works: First, a beta deployment. You deploy your AI to do X business use case in beta or internally.

Each step of your pipeline queries our API, specifying which models you prefer and so on.

Then, a human who is in charge of a batch of outputs will see a flagged output when something goes wrong (we agree first on what that means). They can then use human judgement to tweak the prompt, prompt a different model, or provide added context over and over in multiple parallel threads until the correct output comes out.
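
As a rough sketch of the escalation loop described above (this is not OneShot's actual API, which the post does not show): every name here, including `run_step`, `ReviewTicket`, the stub model, and the failure check, is hypothetical. The idea is that a step only involves a human when the agreed failure check flags the output, and every retry the human makes is recorded.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Attempt:
    """One try at producing an output: the prompt, the model, and the result."""
    prompt: str
    model: str
    output: str


@dataclass
class ReviewTicket:
    """A flagged output plus every retry the human reviewer made on it."""
    original: Attempt
    retries: List[Attempt] = field(default_factory=list)
    accepted: Optional[Attempt] = None


def run_step(
    prompt: str,
    model: str,
    call_model: Callable[[str, str], str],
    is_failure: Callable[[str], bool],
    reviewer: Callable[[Attempt], Iterable[Tuple[str, str]]],
) -> Tuple[str, Optional[ReviewTicket]]:
    """Run one pipeline step; escalate to the reviewer only if the output is flagged."""
    first = Attempt(prompt, model, call_model(model, prompt))
    if not is_failure(first.output):
        return first.output, None  # happy path: no human involved
    ticket = ReviewTicket(original=first)
    # The reviewer proposes (prompt, model) pairs until one passes the agreed check.
    for new_prompt, new_model in reviewer(first):
        attempt = Attempt(new_prompt, new_model, call_model(new_model, new_prompt))
        ticket.retries.append(attempt)
        if not is_failure(attempt.output):
            ticket.accepted = attempt
            break
    final = ticket.accepted.output if ticket.accepted else first.output
    return final, ticket


# --- Stand-ins for illustration only (assumptions, not real services) ---
def fake_model(model: str, prompt: str) -> str:
    # Pretend LLM: only produces a usable answer when the prompt asks for a code.
    return "CODE-123" if "code" in prompt else "unknown"


def missing_code(output: str) -> bool:
    # The "agreed definition" of a failed output in this toy example.
    return not output.startswith("CODE-")


def human_reviewer(failed: Attempt) -> Iterable[Tuple[str, str]]:
    # A human might first try a different model, then add context to the prompt.
    yield failed.prompt, "model-b"
    yield failed.prompt + " Reply with a billing code.", failed.model


result, ticket = run_step("Classify this visit.", "model-a", fake_model, missing_code, human_reviewer)
```

In this toy run the first output is flagged, the switch to `model-b` still fails, and the added context produces an accepted output. The ticket keeps both retries, which is the raw material for the dataset discussed next.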

Second, fine tuning. You now own a dataset of which changes to your prompt and which changes to the output produced the correct output. Thousands of changes and tweaks, for each feature, sit in your db and can take your model to the next level internally. This data lets you ship faster, with better guarantees, and with much less manual testing that isn’t being rewarded or punished by the real world.
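
To make the "structured training data" idea concrete, here is a minimal sketch of how stored edits could be turned into fine-tuning examples. The `EditRecord` schema and the prompt/completion output shape are assumptions for illustration; the post does not describe the real storage format.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EditRecord:
    """Hypothetical schema for one human correction stored in your db."""
    feature: str
    original_prompt: str
    original_output: str
    corrected_prompt: str
    corrected_output: str


def to_finetune_jsonl(records: Iterable[EditRecord]) -> str:
    """Turn corrections into prompt/completion pairs, one JSON object per line."""
    lines = []
    for r in records:
        if r.corrected_output == r.original_output:
            continue  # nothing was fixed, so there is nothing to learn from
        # Pair the ORIGINAL prompt with the CORRECTED output, so a tuned model
        # could learn to answer correctly without the human's prompt tweak.
        lines.append(json.dumps({"prompt": r.original_prompt, "completion": r.corrected_output}))
    return "\n".join(lines)


records = [
    EditRecord("claims", "Classify this visit.", "unknown",
               "Classify this visit. Reply with a billing code.", "CODE-123"),
    EditRecord("claims", "Summarize the note.", "Short summary.",
               "Summarize the note.", "Short summary."),
]
jsonl = to_finetune_jsonl(records)
```

Pairing the original prompt with the corrected output is one design choice among several; keeping `corrected_prompt` in the record also lets you update the production prompt instead of fine-tuning.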

Who are the humans? For now, I’m a developer working the tickets manually together with technical friends I’m paying out of pocket (yes, it IS available 24/7!!!). This is intentionally manual during beta, with clear review guidelines, so we understand the process before trying to hire.

How slow is it? Most of the time no human will touch it, and sometimes a human will take a quick, unnoticeable automated action. In some edge cases you’ll feel a noticeable slowdown (10s+), but we’re looking to speed those up as well, and the alternative is fully broken output.
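
If a caller cannot always wait through a 10s+ review, one option is a latency budget on the caller's side. The sketch below is an assumption about how a client might handle this, not something the service provides: `with_review_budget` and `slow_review` are hypothetical names, and the sleep stands in for a human working a ticket.

```python
import asyncio
from typing import Awaitable, Tuple


async def with_review_budget(review: Awaitable[str], fallback: str, budget_s: float) -> str:
    """Wait for a human-reviewed result, but give up after budget_s seconds."""
    try:
        return await asyncio.wait_for(review, timeout=budget_s)
    except asyncio.TimeoutError:
        # The review took too long; return the caller's fallback instead.
        return fallback


async def slow_review(delay_s: float) -> str:
    await asyncio.sleep(delay_s)  # stand-in for a human working on a ticket
    return "corrected output"


async def demo() -> Tuple[str, str]:
    # Review finishes well inside the budget: the corrected output is returned.
    fast = await with_review_budget(slow_review(0.01), "raw output", budget_s=0.5)
    # Review exceeds the budget: the fallback is returned.
    slow = await with_review_budget(slow_review(0.5), "raw output", budget_s=0.05)
    return fast, slow


fast_result, slow_result = asyncio.run(demo())
```

Whether falling back is acceptable depends on the feature: where the unreviewed output is simply broken, waiting for the human is the better choice.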

Who is it not for? This is not meant for consumer apps, privacy-sensitive production systems, or teams expecting zero human involvement.

Comments

vmitro•7h ago
Don't laugh, but I think in the (near) future, more and more emphasis will be put on the HITL concept as private or self-hosted AI workflows gain interest; it's hard not to imagine (hope for?) the emergence of a movement similar to GNU in software itself, where freely available tooling allows for collaborative, federated, HITL-powered fine-tuning of ML models.

As I also work on a similar concept, where HITL is a first-class citizen, can you tell us a bit more about the underlying technology stack, whether users can host their own models for inference and fine-tuning, how pipelines are defined, and so on?

gitpullups•6h ago
1. Pipelines are defined on your end. I want to build another option, but for now it is still just queried as an API endpoint.
2. Same as 1, so yes, you can definitely use your own models. You can also just send outputs; you don't have to send prompts.
gitpullups•2h ago
I'm a bit curious what you're working on, and whether there might be some interesting connections there. Would you like to talk? You can book a slot in my calendar through the site.
