Show HN: Booktest – review-driven regression testing for LLM / ML behavior

https://github.com/lumoa-oss/booktest

2•arauhala•1h ago

Comments

arauhala•1h ago

Booktest is built on a two-decade career in data science. It has been used to support R&D on numerous LLM, ML, NLP, and information retrieval projects, as well as more traditional software engineering.

It was partly inspired by earlier examples (kudos to Ferenc), but especially by the real pains of doing ML QA: regression testing, transparency, and iteration cycle speed.

In systems where correctness is fuzzy, evaluation is expensive, and changes have non-local effects, a failing test without diagnostics often raises more questions than it answers. Left unsolved, this is a painful combination.

Booktest is now on its 3rd or 4th iteration of the same idea, and as such it addresses the most common needs and problems in this space.

It is a review-driven regression testing approach that captures system behavior as readable artifacts, so humans can see, review, and reason about regressions instead of fighting tooling.
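
To make the idea concrete, here is a rough sketch of the general pattern in plain Python. It does not use Booktest's actual API; the helpers `render_report`, `review`, and `accept`, and the toy models `model_v1` and `model_v2`, are hypothetical names invented for illustration. The point is only that behavior is written out as a readable report, and a change shows up as a diff a human can review and then accept or reject.

```python
import difflib
import tempfile
from pathlib import Path


def render_report(cases, predict):
    """Render system behavior as a readable, line-oriented text report."""
    lines = ["# sentiment classifier review", ""]
    for text in cases:
        lines.append(f"input:  {text}")
        lines.append(f"output: {predict(text)}")
        lines.append("")
    return "\n".join(lines)


def review(report, approved_path):
    """Compare a fresh report with the approved snapshot.

    Returns (ok, details). If nothing has been approved yet, the full
    report is returned so a human can do the first review.
    """
    if not approved_path.exists():
        return False, report
    approved = approved_path.read_text(encoding="utf-8")
    diff = list(difflib.unified_diff(
        approved.splitlines(), report.splitlines(),
        fromfile="approved", tofile="current", lineterm=""))
    return (not diff), "\n".join(diff)


def accept(report, approved_path):
    """Record the reviewer's decision: this report is the new baseline."""
    approved_path.write_text(report, encoding="utf-8")


# Hypothetical stand-ins for a real model before and after a change.
def model_v1(text):
    return "positive" if "good" in text else "negative"


def model_v2(text):
    return "positive" if "good" in text or "fine" in text else "negative"


cases = ["good service", "it was fine", "terrible"]

with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "sentiment.md"

    # First run: nothing approved yet, so a human reads and accepts it.
    first = render_report(cases, model_v1)
    ok_first, _ = review(first, snapshot)
    accept(first, snapshot)

    # Same behavior: no diff, nothing to review.
    ok_same, _ = review(render_report(cases, model_v1), snapshot)

    # Changed behavior: the diff shows exactly which outputs moved.
    ok_changed, diff = review(render_report(cases, model_v2), snapshot)
    print(diff)
```

In this sketch a behavior change is not automatically a failure. The reviewer sees which outputs changed and decides whether the new report becomes the baseline.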

This approach has been used in production for testing ML/NLP systems processing large volumes of data, and we’ve now open-sourced it.

I'm curious whether this matches others’ experience, and how people handle this today.
