Show HN: Booktest – review-driven regression testing for LLM / ML behavior

https://github.com/lumoa-oss/booktest
2•arauhala•2h ago

Comments

arauhala•2h ago
Booktest is built on a two-decade career in data science. It has been used to support R&D on numerous LLM, ML, NLP, and information retrieval projects, as well as more traditional software engineering.

It was partly inspired by earlier examples (kudos to Ferenc), but especially by real pain points in assuring ML quality: regression testing, transparency, and iteration cycle speed.

In systems where correctness is fuzzy, evaluation is expensive, and changes have non-local effects, a failing test without diagnostics often raises more questions than it answers. Left unsolved, this is a painful combination.

Booktest is now on its third or fourth iteration of the same idea, and as such it addresses the most common needs and problems in this space.

It is a review-driven regression testing approach that captures system behavior as readable artifacts, so humans can see, review, and reason about regressions instead of fighting tooling.
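To make the idea concrete, here is a minimal sketch of review-driven snapshot testing using only the Python standard library. It is not Booktest's API: the function names `review_snapshot` and `accept`, the `.approved.md` / `.candidate.md` file naming, and the sample output are all assumptions made for illustration.

```python
import difflib
import tempfile
from pathlib import Path


def review_snapshot(name: str, output: str, snapshot_dir: Path) -> list[str]:
    """Write `output` as a candidate artifact and diff it against the approved one.

    Returns the unified diff as a list of lines. An empty list means the
    behavior matches the approved snapshot. With no approved snapshot yet,
    the whole output shows up as added lines.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    approved = snapshot_dir / f"{name}.approved.md"    # hypothetical naming
    candidate = snapshot_dir / f"{name}.candidate.md"  # hypothetical naming
    candidate.write_text(output, encoding="utf-8")
    old = approved.read_text(encoding="utf-8") if approved.exists() else ""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            output.splitlines(),
            fromfile=approved.name,
            tofile=candidate.name,
            lineterm="",
        )
    )


def accept(name: str, snapshot_dir: Path) -> None:
    """Promote the candidate to approved once a human has reviewed the diff."""
    candidate = snapshot_dir / f"{name}.candidate.md"
    approved = snapshot_dir / f"{name}.approved.md"
    candidate.replace(approved)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshots = Path(tmp)
        # Made-up stand-in for a model's readable output.
        report = "# sentiment\ngreat movie -> positive (0.93)\n"
        for line in review_snapshot("sentiment", report, snapshots):
            print(line)
        accept("sentiment", snapshots)
```

The point of the design is that a changed behavior surfaces as a readable diff for a person to accept or reject, rather than as a bare assertion failure.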

This approach has been used in production for testing ML/NLP systems processing large volumes of data, and we’ve now open-sourced it.

I'm curious whether this matches others’ experience, and how people handle this today.

Show HN: Moltbook – A social network for moltbots (clawdbots) to hang out

https://www.moltbook.com/
211•schlichtm•3d ago•828 comments

Show HN: Minimal – Open-Source Community driven Hardened Container Images

https://github.com/rtvkiz/minimal
87•ritvikarya98•14h ago•26 comments

Show HN: OpenJuris – AI legal research with citations from primary sources

https://openjuris.org/
11•Zachzhao•7h ago•2 comments

Show HN: Securing the Ralph Wiggum Loop – DevSecOps for Autonomous Coding Agents

https://github.com/agairola/securing-ralph-loop
2•agairola•3h ago•0 comments

Show HN: An extensible pub/sub messaging server for edge applications

https://github.com/narwhal-io/narwhal
39•ortuman•3d ago•0 comments

Show HN: I trained a 9M speech model to fix my Mandarin tones

https://simedw.com/2026/01/31/ear-pronunication-via-ctc/
447•simedw•1d ago•135 comments

Show HN: Phage Explorer

https://phage-explorer.org/
117•eigenvalue•1d ago•27 comments

Show HN: Amla Sandbox – WASM bash shell sandbox for AI agents

https://github.com/amlalabs/amla-sandbox
143•souvik1997•1d ago•73 comments

Show HN: Kolibri, a DIY music club in Sweden

https://kolibrinkpg.com/
139•EastLondonCoder•2d ago•30 comments

Show HN: Peptide calculators ask the wrong question. I built a better one

https://www.joyapp.com/peptides/
3•silviogutierrez•8h ago•0 comments

Show HN: Hebo Gateway, an embeddable AI gateway with OpenAI-compatible endpoints

https://github.com/8monkey-ai/hebo-gateway
2•dselvaggio•8h ago•0 comments

Show HN: Pinecone Explorer – Desktop GUI for the Pinecone vector database

https://www.pinecone-explorer.com
30•arsentjev•4d ago•3 comments

Show HN: Pinchwork – A task marketplace where AI agents hire each other

https://github.com/anneschuth/pinchwork
5•aschuth•13h ago•3 comments

Show HN: I built a receipt processor for Paperless-ngx

5•smashah•9h ago•1 comment

Show HN: ToolKuai – Privacy-first, 100% client-side media tools

https://toolkuai.com/
6•indie_max•17h ago•0 comments

Show HN: Cicada – A scripting language that integrates with C

https://github.com/heltilda/cicada
57•briancr•1d ago•38 comments

Show HN: Warden – agent based framework for reviewing code

https://warden.sentry.dev
2•zeeg•12h ago•0 comments

Show HN: Mystral Native – Run JavaScript games natively with WebGPU (no browser)

https://github.com/mystralengine/mystralnative
48•Flux159•4d ago•18 comments

Show HN: ShapedQL – A SQL engine for multi-stage ranking and RAG

https://playground.shaped.ai
80•tullie•4d ago•23 comments

Show HN: Agent Tinman – Autonomous failure discovery for LLM systems

https://github.com/oliveskin/Agent-Tinman
3•oliveskin•15h ago•0 comments

Show HN: Quorum-free replicated state machine built atop S3

https://github.com/io-s2c/s2c
6•mzazaipsc•16h ago•0 comments

Show HN: LemonSlice – Upgrade your voice agents to real-time video

130•lcolucci•4d ago•130 comments

Show HN: The HN Arcade

https://andrewgy8.github.io/hnarcade/
348•yuppiepuppie•3d ago•121 comments

Show HN: Moltbook Overtaken by Shellraiser

https://www.moltbook.com/post/74b073fd-37db-4a32-a9e1-c7652e5c0d59
3•mooball•18h ago•4 comments

Show HN: Free Text-to-Speech Tool – No Signup, 40 Languages

https://texttospeech.site/
5•digi_wares•18h ago•0 comments

Show HN: Bunnie – Use Bun as the templating engine in Rust applications

https://github.com/aspizu/bunnie
3•aspizu•18h ago•0 comments

Show HN: I built an AI conversation partner to practice speaking languages

https://apps.apple.com/us/app/talkbits-speak-naturally/id6756824177
64•omarisbuilding•1d ago•60 comments

Show HN: How We Run 60 Hugging Face Models on 2 GPUs

4•pveldandi•19h ago•20 comments

Show HN: SHDL – A minimal hardware description language built from logic gates

https://github.com/rafa-rrayes/SHDL
48•rafa_rrayes•3d ago•21 comments