frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Ask HN: How do you integration-test AI / LLMs?

3•tom1337•1mo ago
We’re currently debating how to properly test integrations with external LLMs in our software. At the moment, all LLM calls are mocked in unit and end-to-end tests. Recently, we updated a model version and it started returning responses that no longer matched our expected schema (we validate outputs strictly) and thus returned in errors. Because all tests used mocked responses, this only surfaced in production.

In hindsight, a simple integration test that sends a real (non-mocked) request to the LLM provider would probably have caught this.

One idea is to have a test suite which sends each prompt to the LLM provider and checks whether the response matches the expected schema. This has its own issues since LLMs are inherently nondeterministic and these tests might be flaky, but I’m currently lacking better ideas.

Curious to hear how others approach this.

Proving Laderman's 3x3 Matrix Multiplication Is Locally Optimal via SMT Solvers

https://zenodo.org/records/18514533
1•DarenWatson•2m ago•0 comments

Fire may have altered human DNA

https://www.popsci.com/science/fire-alter-human-dna/
1•wjb3•3m ago•0 comments

"Compiled" Specs

https://deepclause.substack.com/p/compiled-specs
1•schmuhblaster•8m ago•0 comments

The Next Big Language (2007) by Steve Yegge

https://steve-yegge.blogspot.com/2007/02/next-big-language.html?2026
1•cryptoz•9m ago•0 comments

Open-Weight Models Are Getting Serious: GLM 4.7 vs. MiniMax M2.1

https://blog.kilo.ai/p/open-weight-models-are-getting-serious
3•ms7892•19m ago•0 comments

Using AI for Code Reviews: What Works, What Doesn't, and Why

https://entelligence.ai/blogs/entelligence-ai-in-cli
3•Arindam1729•19m ago•0 comments

Show HN: Solnix – an early-stage experimental programming language

https://www.solnix-lang.org/
2•maheshbhatiya•19m ago•0 comments

DoNotNotify is now Open Source

https://donotnotify.com/opensource.html
5•awaaz•21m ago•2 comments

The British Empire's Brothels

https://www.historytoday.com/archive/feature/british-empires-brothels
2•pepys•21m ago•0 comments

What rare disease AI teaches us about longitudinal health

https://myaether.live/blog/what-rare-disease-ai-teaches-us-about-longitudinal-health
2•takmak007•26m ago•0 comments

The Brand Savior Complex and the New Age of Self Censorship

https://thesocialjuice.substack.com/p/the-brand-savior-complex-and-the
2•jaskaransainiz•28m ago•0 comments

Show HN: A Prompting Framework for Non-Vibe-Coders

https://github.com/No3371/projex
2•3371•28m ago•0 comments

Kilroy is a local-first "software factory" CLI

https://github.com/danshapiro/kilroy
2•ukuina•38m ago•0 comments

Mathscapes – Jan 2026 [pdf]

https://momath.org/wp-content/uploads/2026/02/1.-Mathscapes-January-2026-with-Solution.pdf
1•vismit2000•41m ago•0 comments

80386 Barrel Shifter

https://nand2mario.github.io/posts/2026/80386_barrel_shifter/
2•jamesbowman•41m ago•0 comments

Training Foundation Models Directly on Human Brain Data

https://arxiv.org/abs/2601.12053
1•helloplanets•42m ago•0 comments

Web Speech API on HN Threads

https://toulas.ch/projects/hn-readaloud/
1•etoulas•44m ago•0 comments

ArtisanForge: Learn Laravel through a gamified RPG adventure – 100% free

https://artisanforge.online/
2•grazulex•45m ago•1 comments

Your phone edits all your photos with AI – is it changing your view of reality?

https://www.bbc.com/future/article/20260203-the-ai-that-quietly-edits-all-of-your-photos
1•breve•46m ago•0 comments

DStack, a small Bash tool for managing Docker Compose projects

https://github.com/KyanJeuring/dstack
2•kppjeuring•46m ago•1 comments

Hop – Fast SSH connection manager with TUI dashboard

https://github.com/danmartuszewski/hop
1•danmartuszewski•47m ago•1 comments

Turning books to courses using AI

https://www.book2course.org/
6•syukursyakir•49m ago•4 comments

Top #1 AI Video Agent: Free All in One AI Video and Image Agent by Vidzoo AI

https://vidzoo.ai
2•Evan233•49m ago•1 comments

Ask HN: How would you design an LLM-unfriendly language?

1•sph•51m ago•0 comments

Show HN: MuxPod – A mobile tmux client for monitoring AI agents on the go

https://github.com/moezakura/mux-pod
1•moezakura•51m ago•0 comments

March for Billionaires

https://marchforbillionaires.org/
1•gscott•51m ago•0 comments

Turn Claude Code/OpenClaw into Your Local Lovart – AI Design MCP Server

https://github.com/jau123/MeiGen-Art
1•jaujaujau•52m ago•0 comments

An Nginx Engineer Took over AI's Benchmark Tool

https://github.com/hongzhidao/jsbench/tree/main/docs
1•zhidao9•54m ago•0 comments

Use fn-keys as fn-keys for chosen apps in OS X

https://www.balanci.ng/tools/karabiner-function-key-generator.html
1•thelollies•55m ago•1 comments

Sir/SIEN: A communication protocol for production outages

https://getsimul.com/blog/communicate-outage-to-ceo
1•pingananth•56m ago•1 comments