Unit Tests for LLMs?

3•simantakDabhade•1h ago
Is there any package that helps with vitest-style quick sanity checks on the output of an LLM, something I can automate to see if I have regressed on something while changing my prompt?

For example, this agent for a realtor kept offering virtual viewings (even though that isn't a thing) instead of doing a handoff (I modified the prompt to fix this). So I want a package where I can write a test that says: hey, for this input, do not mention this, or never mention those things. Or, for certain inputs, always call this tool.
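To make the two kinds of checks above concrete, here is a minimal sketch of what such assertions might look like. Everything in it is an assumption for illustration: the `AgentTurn` shape, the helper names `expectNoMention` and `expectToolCall`, the tool name `handoff_to_human`, and the `fakeRealtorAgent` stub that stands in for a real model call. It is not the API of any existing package.

```typescript
// Hypothetical shape of one agent turn; adapt it to whatever your LLM client returns.
interface AgentTurn {
  text: string;
  toolCalls: { name: string }[];
}

// Throws if the reply mentions any forbidden phrase (case-insensitive substring match).
function expectNoMention(turn: AgentTurn, forbidden: string[]): void {
  const lower = turn.text.toLowerCase();
  const hits = forbidden.filter((phrase) => lower.includes(phrase.toLowerCase()));
  if (hits.length > 0) {
    throw new Error(`Reply mentions forbidden phrase(s): ${hits.join(", ")}`);
  }
}

// Throws unless the turn called the named tool at least once.
function expectToolCall(turn: AgentTurn, toolName: string): void {
  if (!turn.toolCalls.some((call) => call.name === toolName)) {
    throw new Error(`Expected a call to tool "${toolName}"`);
  }
}

// Stand-in for a real model call so the sketch runs offline (assumption, not a real agent).
function fakeRealtorAgent(_input: string): AgentTurn {
  return {
    text: "I'll connect you with an agent who can arrange a visit.",
    toolCalls: [{ name: "handoff_to_human" }],
  };
}

// "For this input, never mention X" and "for this input, always call tool Y".
const turn = fakeRealtorAgent("Can I see the house remotely?");
expectNoMention(turn, ["virtual viewing", "virtual tour"]);
expectToolCall(turn, "handoff_to_human");
```

Because the helpers only throw on failure, they could probably be dropped into a vitest `test(...)` body as-is. With a real model the output is not deterministic, so it may be worth running each case several times before trusting a pass.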

I started engineering my own little utility for this, but before I dive deep and build my own package, I wanted to see if something like this already exists or if I'm heading down the wrong path here!

P.S. Not sure if this should be called evals. It's kind of overlapping, but yeah, what should this even be called?

Comments

gberger•1h ago
You want to do evals, yeah.
