frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Font Rendering from First Principles

https://mccloskeybr.com/articles/font_rendering.html
1•krapp•1m ago•0 comments

Show HN: Seedance 2.0 AI video generator for creators and ecommerce

https://seedance-2.net
1•dallen97•5m ago•0 comments

Wally: A fun, reliable voice assistant in the shape of a penguin

https://github.com/JLW-7/Wally
1•PaulHoule•6m ago•0 comments

Rewriting Pycparser with the Help of an LLM

https://eli.thegreenplace.net/2026/rewriting-pycparser-with-the-help-of-an-llm/
1•y1n0•8m ago•0 comments

Lobsters Vibecoding Challenge

https://gist.github.com/MostAwesomeDude/bb8cbfd005a33f5dd262d1f20a63a693
1•tolerance•8m ago•0 comments

E-Commerce vs. Social Commerce

https://moondala.one/
1•HamoodBahzar•8m ago•1 comments

Avoiding Modern C++ – Anton Mikhailov [video]

https://www.youtube.com/watch?v=ShSGHb65f3M
1•linkdd•10m ago•0 comments

Show HN: AegisMind–AI system with 12 brain regions modeled on human neuroscience

https://www.aegismind.app
2•aegismind_app•14m ago•1 comments

Zig – Package Management Workflow Enhancements

https://ziglang.org/devlog/2026/#2026-02-06
1•Retro_Dev•15m ago•0 comments

AI-powered text correction for macOS

https://taipo.app/
1•neuling•19m ago•1 comments

AppSecMaster – Learn Application Security with hands on challenges

https://www.appsecmaster.net/en
1•aqeisi•20m ago•1 comments

Fibonacci Number Certificates

https://www.johndcook.com/blog/2026/02/05/fibonacci-certificate/
1•y1n0•21m ago•0 comments

AI Overviews are killing the web search, and there's nothing we can do about it

https://www.neowin.net/editorials/ai-overviews-are-killing-the-web-search-and-theres-nothing-we-c...
3•bundie•26m ago•1 comments

City skylines need an upgrade in the face of climate stress

https://theconversation.com/city-skylines-need-an-upgrade-in-the-face-of-climate-stress-267763
3•gnabgib•27m ago•0 comments

1979: The Model World of Robert Symes [video]

https://www.youtube.com/watch?v=HmDxmxhrGDc
1•xqcgrek2•32m ago•0 comments

Satellites Have a Lot of Room

https://www.johndcook.com/blog/2026/02/02/satellites-have-a-lot-of-room/
2•y1n0•32m ago•0 comments

1980s Farm Crisis

https://en.wikipedia.org/wiki/1980s_farm_crisis
4•calebhwin•33m ago•1 comments

Show HN: FSID - Identifier for files and directories (like ISBN for Books)

https://github.com/skorotkiewicz/fsid
1•modinfo•38m ago•0 comments

Show HN: Holy Grail: Open-Source Autonomous Development Agent

https://github.com/dakotalock/holygrailopensource
1•Moriarty2026•45m ago•1 comments

Show HN: Minecraft Creeper meets 90s Tamagotchi

https://github.com/danielbrendel/krepagotchi-game
1•foxiel•52m ago•1 comments

Show HN: Termiteam – Control center for multiple AI agent terminals

https://github.com/NetanelBaruch/termiteam
1•Netanelbaruch•52m ago•0 comments

The only U.S. particle collider shuts down

https://www.sciencenews.org/article/particle-collider-shuts-down-brookhaven
2•rolph•55m ago•1 comments

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

1•solarisos•56m ago•3 comments

Show HN: Remotion directory (videos and prompts)

https://www.remotion.directory/
1•rokbenko•57m ago•0 comments

Portable C Compiler

https://en.wikipedia.org/wiki/Portable_C_Compiler
2•guerrilla•1h ago•0 comments

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

1•Ginsabo•1h ago•0 comments

Software Engineering Transformation 2026

https://mfranc.com/blog/ai-2026/
1•michal-franc•1h ago•0 comments

Microsoft purges Win11 printer drivers, devices on borrowed time

https://www.tomshardware.com/peripherals/printers/microsoft-stops-distrubitng-legacy-v3-and-v4-pr...
4•rolph•1h ago•1 comments

Lunch with the FT: Tarek Mansour

https://www.ft.com/content/a4cebf4c-c26c-48bb-82c8-5701d8256282
2•hhs•1h ago•0 comments

Old Mexico and her lost provinces (1883)

https://www.gutenberg.org/cache/epub/77881/pg77881-images.html
1•petethomas•1h ago•0 comments
Open in hackernews

AI Agent Testing

1•dinasorous•3w ago
As I work on AI agents, I find myself constantly thinking about how to effectively test them. As we integrate more knowledge sources and expand our agents' capabilities, testing becomes increasingly complex. As a standard practice, we use evals to ensure quality is maintained. But honestly, I feel like something is missing. The issue I’m seeing is that we, as engineers, sometimes lack sufficient domain knowledge to assess an agent's response accurately. At the same time, current tooling limits the possibility of collaborating with domain experts to perform testing together. For example, current tooling gives priority to dashboards over the readability of actual outcomes This has been my experience so far—I would love to hear your thoughts on this.

Comments

falcor84•3w ago
Your question is really abstract. Maybe give an explanation of your domain, the tooling you're currently using and its particular limitations?

Also worth saying that the priority given to metrics and dashboards over actual outcomes is a fundamental issue for any structured activity (see e.g. Goodhart's law and Campbell's Law), and has little to do with AI.