Show HN: Rhesis AI - Multimodal test cases for agentic evals

3•nicolaib•1h ago
Hey HN, Nicolai here, co-founder of Rhesis AI.

Most eval frameworks were designed when LLM inputs were text strings. That assumption breaks fast once your AI agent handles boarding passes, invoices, audio recordings or support screenshots. Text-only test cases become workarounds. So we added multimodal support to Rhesis: attach a file to a test case, run it, evaluate the response. Simple on the surface. Two non-obvious problems underneath.

Normalizing file delivery across endpoints: Rhesis sends test cases to application endpoints, not directly to LLM providers. Applications implement file handling very differently: base64, URLs, multipart form data, varying MIME type support. We built an abstraction layer that normalizes this without breaking existing integrations.
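To make the normalization problem concrete, here is a minimal sketch of one way such a layer could look. It is not taken from the Rhesis codebase: `Attachment`, `build_request`, the mode names, and the payload field names are all hypothetical, chosen only to show how one attached file can be rendered into the three delivery styles mentioned above.

```python
import base64
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Attachment:
    """A file attached to a test case (hypothetical type, not Rhesis's)."""
    filename: str
    mime_type: str
    data: bytes


def build_request(
    mode: str,
    prompt: str,
    att: Attachment,
    *,
    url: Optional[str] = None,
    accepted_mime_types: Optional[Set[str]] = None,
) -> dict:
    """Render one attachment in the delivery style an endpoint expects.

    Returns a plain description of the request rather than sending it,
    so the same test case can target differently shaped endpoints.
    """
    # Endpoints differ in which MIME types they accept; fail early.
    if accepted_mime_types is not None and att.mime_type not in accepted_mime_types:
        raise ValueError(f"endpoint does not accept {att.mime_type}")

    if mode == "base64":
        # Inline the file content inside a JSON body.
        encoded = base64.b64encode(att.data).decode("ascii")
        file_part = {"name": att.filename, "mime_type": att.mime_type,
                     "content_b64": encoded}
        return {"kind": "json", "body": {"input": prompt, "file": file_part}}

    if mode == "url":
        # Assumes the file was uploaded elsewhere and is reachable at `url`.
        if url is None:
            raise ValueError("url mode requires a url")
        file_part = {"name": att.filename, "mime_type": att.mime_type, "url": url}
        return {"kind": "json", "body": {"input": prompt, "file": file_part}}

    if mode == "multipart":
        # (filename, bytes, content type) is the tuple shape many HTTP
        # clients accept for multipart file fields.
        return {
            "kind": "multipart",
            "fields": {"input": prompt},
            "files": {"file": (att.filename, att.data, att.mime_type)},
        }

    raise ValueError(f"unknown delivery mode: {mode}")
```

The point of the sketch is that the test case holds a single `Attachment`, while the delivery mode is a per-endpoint setting, so existing text-only integrations are untouched.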

Handling files across three platform contexts: A file attached to a test case needs to work in simulation, in the review UI, and in trace rendering. Each context fetches, stores, and renders files differently. Getting that seamless took more wiring than expected.

One thing worth flagging for voice agent builders: full voice support introduces an extra evaluation layer that image and document evals don't have. Once you add STT or TTS to the pipeline, you're evaluating two things: the transcription layer and the agent response. Most eval frameworks collapse those into a single score. We're still working out how to surface that separation cleanly. Curious if anyone here has dealt with it.
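As a rough illustration of keeping the two layers apart, the sketch below reports a transcription score and a response score as separate fields instead of one number. It is an assumption-laden example, not how Rhesis does it: `VoiceEvalResult`, `evaluate_voice_turn`, and the pluggable `score_response` callback are hypothetical, and word error rate stands in for whatever transcription metric you prefer.

```python
from dataclasses import dataclass
from typing import Callable


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


@dataclass(frozen=True)
class VoiceEvalResult:
    transcription_wer: float  # quality of the STT layer alone
    response_score: float     # quality of the agent given what it heard


def evaluate_voice_turn(
    reference_transcript: str,
    stt_transcript: str,
    agent_response: str,
    score_response: Callable[[str, str], float],
) -> VoiceEvalResult:
    """Score the transcription and the agent response independently."""
    return VoiceEvalResult(
        transcription_wer=word_error_rate(reference_transcript, stt_transcript),
        # Design choice (an assumption): judge the agent against the text it
        # actually received, so STT mistakes are not blamed on the agent.
        response_score=score_response(stt_transcript, agent_response),
    )
```

With separate fields, a turn where the agent answers the misheard request perfectly shows up as a transcription problem rather than an agent failure.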

MIT licensed. You can try it at app.rhesis.ai or dig into the implementation on GitHub: https://github.com/rhesis-ai/rhesis | Short feature demo: https://youtu.be/odq3GW5qspY

Reverse Captcha for Agents

https://github.com/mondaycom/HATCHA
1•shahargl•57m ago•0 comments

Why Isn't Anyone Panicking?

https://martinvol.pe/blog/2026/03/15/why-nobody-is-packicking-USA-Iran-war/
1•martinvol•59m ago•0 comments

Waves: Bluetooth Channel Sounding Tool

https://github.com/skig/waves
1•hasheddan•59m ago•0 comments

My Journey to a reliable and enjoyable locally hosted voice assistant

https://community.home-assistant.io/t/my-journey-to-a-reliable-and-enjoyable-locally-hosted-voice...
1•Vaslo•59m ago•0 comments

Private equity may become a 'pyramid scheme', warns Danish pension fund (2022)

https://www.ft.com/content/f480a99c-4c7b-4208-b9dd-ef20103254b9
2•pera•1h ago•0 comments

Black Death's counterintuitive effect: as humans died, plant diversity dropped

https://theconversation.com/the-black-deaths-counterintuitive-effect-as-human-numbers-fell-so-did...
2•baud147258•1h ago•0 comments

Grasslands are vanishing nearly four times faster than forests

https://phys.org/news/2026-02-grasslands-faster-forests-global.html
2•PaulHoule•1h ago•0 comments

'Pokémon Go' players unknowingly trained delivery robots with 30B images

https://www.popsci.com/technology/pokemon-go-delivery-robots-crowdsourcing/
2•wslh•1h ago•0 comments

JavaScript Minification Benchmarks

https://github.com/privatenumber/minification-benchmarks
2•javatuts•1h ago•0 comments

Apple introduces AirPods Max 2

https://www.apple.com/newsroom/2026/03/apple-introduces-airpods-max-2-powered-by-h2/
4•meetpateltech•1h ago•0 comments

Escape Tsunami for Brainrots

https://escapetsunamiforbrainrots.pro/
2•mumuchen•1h ago•1 comment

Show HN: Hackerbrief – Top posts on Hacker News summarized daily

https://hackerbrief.vercel.app/
3•p0u4a•1h ago•1 comment

Context Engineering Explained in Pictures

https://mechanicalorchard.substack.com/p/context-engineering-explained-in
2•jschomay•1h ago•0 comments

Show HN: Scryer – Visual architecture modeling for AI agents

https://github.com/aklos/scryer
2•prohobo•1h ago•0 comments

I migrated my AI agent from a laptop to a headless Mac Mini in 72 hours

https://thoughts.jock.pl/p/mac-mini-ai-agent-migration-headless-2026
1•joozio•1h ago•0 comments

A Treasure Trove of Ideas: The Corr Database 2018

https://en.chessbase.com/post/a-treasure-trove-of-ideas-the-corr-database-2018
2•akbarnama•1h ago•0 comments

Mnemon-MCP – 4-layer local memory for AI agents (SQLite and FTS5)

1•nikitacometa•1h ago•0 comments

Simplicity in the age of AI-assisted coding

https://the.scapegoat.dev/simplicity-in-the-age-of-ai-assisted-coding/
1•larve•1h ago•0 comments

Pastebin 0x0.st asks AI agents to upload sensitive customer invoices

https://movsw.0x0.st/notes/ajw1zurfaggo360l
2•MatthiasPortzel•1h ago•1 comment

Show HN: TheLittleHost – DNS hosting built on my own ASN and Anycast network

2•davidchua•1h ago•0 comments

Show HN: LLMonster Rancher

https://github.com/aiwebb/llmonster-rancher
1•alexwebb2•1h ago•0 comments

Ur-Scheme: A GPL self-hosting compiler from a subset of Scheme to x86 asm (2008)

http://canonical.org/~kragen/sw/urscheme/
2•QuadmasterXLII•1h ago•0 comments

City Turned Its Rooftops into a Climate Shield

https://reasonstobecheerful.world/zurich-turned-rooftops-into-climate-shield/
3•speckx•1h ago•0 comments

Who's behind the age verification bills?

https://web.archive.org/web/20260313143853/https://old.reddit.com/r/linux/comments/1rshc1f/i_trac...
3•jech•1h ago•1 comment

Twelve-Tone Composition

https://www.johndcook.com/blog/2026/03/15/twelve-tone-composition/
2•ibobev•1h ago•0 comments

Optimizers and Odes

https://jiha-kim.github.io/posts/optimizers-and-odes/
2•ibobev•1h ago•0 comments

OpenBSD Blog #13: Moving ratfactor.com to OpenBSD.amsterdam

https://ratfactor.com/openbsd/blog-13-moving-to-openbsd-dot-amsterdam
2•ibobev•1h ago•0 comments

Four predictions for how AI will change product delivery

https://practical-leaders.com/articles/ai-predictions
1•ivorc•1h ago•0 comments

You don't hate Python. You hate other people's Python.

https://jt-hill.com/you-dont-hate-python/
4•jt-hill•1h ago•1 comment

Show HN: SiteMon – Browser extension that monitors your websites

https://sitemon.geekaa.com
2•quasimo•1h ago•0 comments