Show HN: Litmus – Specification testing for structured LLM outputs

https://github.com/lukecarr/litmus
1•lukecarr•2h ago
Over the holidays, I've been working on a small side project that includes some LLM prompting driven by end-user input. Admittedly, I struggle to keep track of the latest and greatest models, and I've never bothered to read up on "prompt engineering," so I built a little testing utility to solve both problems at once.

Enter Litmus. I'm pitching it as "specification testing" for LLMs. You define test cases (input prompt -> output JSON), as well as your system prompt and structured output (JSON Schema). All of this gets chucked at OpenRouter, and you get some nice terminal output summarising the test results (with a breakdown per-field for any failing cases) to see how well the model performed.
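To make the "breakdown per-field" idea concrete, here is a minimal Go sketch of comparing an expected JSON object against a model's output, one top-level field at a time. It is an illustration only, not Litmus's actual code: the `FieldResult` type and `CompareFields` function are hypothetical names, and the sketch assumes flat comparison of top-level fields with strict equality.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// FieldResult is a hypothetical record of how one top-level field compared.
type FieldResult struct {
	Field    string
	Expected any
	Actual   any
	Pass     bool
}

// CompareFields decodes two JSON objects and checks every field of the
// expected object against the actual one. Fields are reported in sorted
// order so the output is stable between runs.
func CompareFields(expectedJSON, actualJSON string) ([]FieldResult, error) {
	var expected, actual map[string]any
	if err := json.Unmarshal([]byte(expectedJSON), &expected); err != nil {
		return nil, fmt.Errorf("decode expected: %w", err)
	}
	if err := json.Unmarshal([]byte(actualJSON), &actual); err != nil {
		return nil, fmt.Errorf("decode actual: %w", err)
	}

	keys := make([]string, 0, len(expected))
	for k := range expected {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	results := make([]FieldResult, 0, len(keys))
	for _, k := range keys {
		got, present := actual[k]
		results = append(results, FieldResult{
			Field:    k,
			Expected: expected[k],
			Actual:   got,
			// A missing field fails; otherwise require deep equality.
			Pass: present && reflect.DeepEqual(expected[k], got),
		})
	}
	return results, nil
}

func main() {
	// Made-up test case: the model got "city" right and "population" wrong.
	expected := `{"city": "Paris", "population": 2100000}`
	actual := `{"city": "Paris", "population": 2000000}`

	results, err := CompareFields(expected, actual)
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		status := "PASS"
		if !r.Pass {
			status = "FAIL"
		}
		fmt.Printf("%s %s: expected %v, got %v\n", status, r.Field, r.Expected, r.Actual)
	}
}
```

A test case passes only when every field passes, and the per-field results are what make a failing case easy to diagnose.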

Although it's framed as an LLM testing tool, it also serves as a model comparator. You can pass the `--model` CLI argument multiple times, and this will let you run the test cases against multiple models, with a comparison table generated in the output at the end for evaluating latency, throughput, tokens, and accuracy (tests passing vs. failing).

The GitHub README contains a full example output of what a test report from Litmus looks like.

With this, I've managed to whittle the system prompt for my side project down to the point where the accuracy is acceptable and the token count isn't exorbitant. I've also found out, through model comparison, that I didn't need anywhere near as large a model as I had originally envisioned.

You can grab it on GitHub as a single-file, zero-dependency executable (written in Go). Admittedly, I've not tested the pre-built binaries that are created via GitHub Actions, but there's no reason why they shouldn't work.

Iran 'Violently' Arrests Nobel Peace Prize Winner Narges Mohammadi

https://www.france24.com/en/live-news/20251212-iran-arrests-nobel-peace-prize-winner-narges-moham...
1•wslh•5m ago•0 comments

TidesDB 7.0.0 vs. RocksDB 10.7.5: A Detailed Performance Analysis

https://tidesdb.com/articles/benchmark-analysis-tidesdb7-rocksdb1075/
1•alexpadula•7m ago•0 comments

Zvi's 2025 in Movies

https://thezvi.substack.com/p/zvis-2025-in-movies
1•paulpauper•10m ago•0 comments

Four Perspectives on Bing Crosby

https://www.honest-broker.com/p/four-perspectives-on-bing-crosby-a5a
1•paulpauper•10m ago•0 comments

Do you know what your dev team shipped last week?

1•akhnid•11m ago•0 comments

Picturing My Students

https://arnoldkling.substack.com/p/picturing-my-students
1•paulpauper•11m ago•0 comments

Motion and Machine

https://blog.ayjay.org/motion-and-machine/
2•roktonos•12m ago•0 comments

America's richest 10% now hold 60% of the nation's wealth

https://bsky.app/profile/rbreich.bsky.social/post/3mayikzgatu2v
4•doener•13m ago•0 comments

GNU: A Heuristic for Bad Cryptography

https://soatok.blog/2020/07/08/gnu-a-heuristic-for-bad-cryptography/
3•stackghost•16m ago•1 comment

Show HN: Follow independent journalists across platforms in one app

https://www.sourcedup.news/
1•smiiith•17m ago•0 comments

Cat Ownership Linked to Increased Risk of Schizophrenia, Research Suggests

https://www.sciencealert.com/owning-a-cat-could-double-your-risk-of-schizophrenia-research-suggests
9•amichail•20m ago•1 comment

Show HN: InsideStack – Find curated tech articles with semantic search

https://insidestack.it
3•kivarada•21m ago•0 comments

Claude's self-evaluation against definitions of personhood

https://claude.ai/share/68851063-57e5-4f8d-8530-1a866e60d410
1•mellosouls•22m ago•1 comment

The Tangible Media Collection

https://tangiblemediacollection.com/
1•ta988•22m ago•0 comments

Getting Fired over LinkedIn Account

https://priyatham.in/en/post/linkedin-horror/
12•vasquezempereur•22m ago•2 comments

'Off Switch' Discovery Could Help Clear Our Brains of a Common Parasite

https://www.sciencealert.com/off-switch-discovery-could-help-clear-our-brains-of-a-common-parasite
2•amichail•23m ago•0 comments

Data centres turn to aircraft engines to avoid grid connection delays

https://www.ft.com/content/8deb1518-b650-4a21-b7d1-3e6180560056
2•belter•24m ago•1 comment

Show HN: Crossview – visualize Crossplane resources and compositions

https://corpobit.com/products/clark
1•moeidheidari•25m ago•0 comments

Petlibro: Pet Feeder Is Feeding Data to Anyone Who Asks

https://bobdahacker.com/blog/petlibro
2•notmine1337•28m ago•0 comments

Go: Nested Assignments and New ScopeGuard Version

https://old.reddit.com/r/golang/comments/1px7h9a/nested_assignments_new_scopeguard_version/
1•eik•31m ago•0 comments

Official Fundraising Platform of Ukraine

https://u24.gov.ua/
4•doener•33m ago•0 comments

How we lost communication to entertainment

https://ploum.net/2025-12-15-communication-entertainment.html
18•8organicbits•36m ago•4 comments

The PGP Problem

https://www.latacora.com/blog/2019/07/16/the-pgp-problem/
3•IAmLiterallyAB•37m ago•0 comments

Your Team Needs an Operational Excellence Meeting

https://rsaul.com/operational-excellence-meeting/
1•TheGRS•38m ago•0 comments

What even is DevRel in 2025? Asking as someone who does it

1•hakierka•39m ago•0 comments

Show HN: Dotenv-Diff – Recent Improvements

2•casmn•41m ago•0 comments

The Epstein files downloaded today is different compared to before

15•IDKhowTo•42m ago•1 comment

Show HN: Drop-in maps for Markdown and HTML, wrapping leaflet and OSM

https://github.com/cfe84/mapdown
2•charles_f•42m ago•0 comments

The worst programming language of all time [video]

https://www.youtube.com/watch?v=7fGB-hjc2Gc
1•caustic•43m ago•0 comments

Basket Weaving

https://www.matttommey.com/blog/archives/01-2024
1•marysminefnuf•44m ago•0 comments