Ask HN: How do you integration-test AI / LLMs?

3•tom1337•1d ago
We’re currently debating how to properly test integrations with external LLMs in our software. At the moment, all LLM calls are mocked in unit and end-to-end tests. Recently, we updated a model version and it started returning responses that no longer matched our expected schema (we validate outputs strictly), which resulted in errors. Because all tests used mocked responses, this only surfaced in production.

In hindsight, a simple integration test that sends a real (non-mocked) request to the LLM provider would probably have caught this.

One idea is to have a test suite which sends each prompt to the LLM provider and checks whether the response matches the expected schema. This has its own issues since LLMs are inherently nondeterministic and these tests might be flaky, but I’m currently lacking better ideas.
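A minimal sketch of that idea, under some assumptions: the response is expected to be a JSON object, `EXPECTED_FIELDS` is a made-up schema, and `call_llm` is a hypothetical placeholder for whatever wrapper sends a prompt to the real provider and returns the raw text. To reduce flakiness, the check looks only at structure (not content) and passes if a majority of attempts validate.

```python
import json
from typing import Callable

# Hypothetical schema for illustration: required field name -> accepted type(s).
EXPECTED_FIELDS = {"summary": str, "score": (int, float)}


def schema_errors(raw: str, expected: dict = EXPECTED_FIELDS) -> list[str]:
    """Return a list of schema problems; an empty list means the response is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    errors = []
    for name, types in expected.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], types):
            errors.append(f"wrong type for field: {name}")
    return errors


def passes_schema_check(
    call_llm: Callable[[str], str],  # placeholder for the real provider call
    prompt: str,
    attempts: int = 3,
    min_passes: int = 2,
) -> bool:
    """Send the prompt several times and require that enough responses validate."""
    passes = 0
    for _ in range(attempts):
        if not schema_errors(call_llm(prompt)):
            passes += 1
    return passes >= min_passes
```

In a real suite, each production prompt would go through `passes_schema_check` with the non-mocked client, probably in a separate, scheduled job rather than on every commit, since the calls cost money and can fail for reasons unrelated to the code.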

Curious to hear how others approach this.
