frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

"There must be something like the opposite of suicide "

https://post.substack.com/p/there-must-be-something-like-the
1•rbanffy•1m ago•0 comments

Ask HN: Why doesn't Netflix add a “Theater Mode” that recreates the worst parts?

1•amichail•2m ago•0 comments

Show HN: Engineering Perception with Combinatorial Memetics

1•alan_sass•8m ago•1 comments

Show HN: Steam Daily – A Wordle-like daily puzzle game for Steam fans

https://steamdaily.xyz
1•itshellboy•10m ago•0 comments

The Anthropic Hive Mind

https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b
1•spenvo•10m ago•0 comments

Just Started Using AmpCode

https://intelligenttools.co/blog/ampcode-multi-agent-production
1•BojanTomic•11m ago•0 comments

LLM as an Engineer vs. a Founder?

1•dm03514•12m ago•0 comments

Crosstalk inside cells helps pathogens evade drugs, study finds

https://phys.org/news/2026-01-crosstalk-cells-pathogens-evade-drugs.html
2•PaulHoule•13m ago•0 comments

Show HN: Design system generator (mood to CSS in <1 second)

https://huesly.app
1•egeuysall•13m ago•1 comments

Show HN: 26/02/26 – 5 songs in a day

https://playingwith.variousbits.net/saturday
1•dmje•14m ago•0 comments

Toroidal Logit Bias – Reduce LLM hallucinations 40% with no fine-tuning

https://github.com/Paraxiom/topological-coherence
1•slye514•16m ago•1 comments

Top AI models fail at >96% of tasks

https://www.zdnet.com/article/ai-failed-test-on-remote-freelance-jobs/
4•codexon•16m ago•2 comments

The Science of the Perfect Second (2023)

https://harpers.org/archive/2023/04/the-science-of-the-perfect-second/
1•NaOH•17m ago•0 comments

Bob Beck (OpenBSD) on why vi should stay vi (2006)

https://marc.info/?l=openbsd-misc&m=115820462402673&w=2
2•birdculture•21m ago•0 comments

Show HN: a glimpse into the future of eye tracking for multi-agent use

https://github.com/dchrty/glimpsh
1•dochrty•22m ago•0 comments

The Optima-l Situation: A deep dive into the classic humanist sans-serif

https://micahblachman.beehiiv.com/p/the-optima-l-situation
2•subdomain•22m ago•1 comments

Barn Owls Know When to Wait

https://blog.typeobject.com/posts/2026-barn-owls-know-when-to-wait/
1•fintler•22m ago•0 comments

Implementing TCP Echo Server in Rust [video]

https://www.youtube.com/watch?v=qjOBZ_Xzuio
1•sheerluck•23m ago•0 comments

LicGen – Offline License Generator (CLI and Web UI)

1•tejavvo•26m ago•0 comments

Service Degradation in West US Region

https://azure.status.microsoft/en-gb/status?gsid=5616bb85-f380-4a04-85ed-95674eec3d87&utm_source=...
2•_____k•26m ago•0 comments

The Janitor on Mars

https://www.newyorker.com/magazine/1998/10/26/the-janitor-on-mars
1•evo_9•28m ago•0 comments

Bringing Polars to .NET

https://github.com/ErrorLSC/Polars.NET
3•CurtHagenlocher•30m ago•0 comments

Adventures in Guix Packaging

https://nemin.hu/guix-packaging.html
1•todsacerdoti•31m ago•0 comments

Show HN: We had 20 Claude terminals open, so we built Orcha

1•buildingwdavid•31m ago•0 comments

Your Best Thinking Is Wasted on the Wrong Decisions

https://www.iankduncan.com/engineering/2026-02-07-your-best-thinking-is-wasted-on-the-wrong-decis...
1•iand675•31m ago•0 comments

Warcraftcn/UI – UI component library inspired by classic Warcraft III aesthetics

https://www.warcraftcn.com/
1•vyrotek•32m ago•0 comments

Trump Vodka Becomes Available for Pre-Orders

https://www.forbes.com/sites/kirkogunrinde/2025/12/01/trump-vodka-becomes-available-for-pre-order...
1•stopbulying•34m ago•0 comments

Velocity of Money

https://en.wikipedia.org/wiki/Velocity_of_money
1•gurjeet•36m ago•0 comments

Stop building automations. Start running your business

https://www.fluxtopus.com/automate-your-business
1•valboa•40m ago•1 comments

You can't QA your way to the frontier

https://www.scorecard.io/blog/you-cant-qa-your-way-to-the-frontier
1•gk1•41m ago•0 comments
Open in hackernews

Ask HN: How do you integration-test AI / LLMs?

1•tom1337•1mo ago
We’re currently debating how to properly test integrations with external LLMs in our software.

At the moment, all LLM calls are mocked in unit and end-to-end tests. Recently, we updated a model version and it started returning responses that no longer matched our expected schema (we validate outputs strictly) and thus returned in errors. Because all tests used mocked responses, this only surfaced in production.

In hindsight, a simple integration test that sends a real (non-mocked) request to the LLM provider would probably have caught this.

One idea is to have a test suite which sends each prompt to the LLM provider and checks whether the response matches the expected schema. This has its own issues since LLMs are inherently nondeterministic and these tests might be flaky, but I’m currently lacking better ideas.

Curious to hear how others approach this.