
Show HN: TrainForgeTester – deterministic scenario tests for AI agents

https://github.com/TrainForge/TrainForgeTester
2•alcray•1h ago
Hi guys,

I have built TrainForgeTester, an open-source scenario test runner for AI agents that take actions (call tools).

The idea: test how agents perform in company-specific scenarios, not just on general benchmarks. More specifically, it tests for failures such as taking the wrong action, skipping a required step, calling the wrong tool, or passing the wrong arguments.

TrainForgeTester lets you run multi-turn scenarios (you create these scenarios from your own use case and data, following the provided scenario schema) and check:

* tool calls and arguments
* strict or unordered tool execution
* expected responses
* regressions after model, prompt, or tool changes
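To make the difference between strict and unordered tool-call checks concrete, here is a minimal Python sketch. It is not TrainForgeTester's API or scenario schema: `ToolCall`, `check_tool_calls`, and the tool names `lookup_order` and `issue_refund` are hypothetical, invented for illustration only.

```python
import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation (not the project's real type)."""
    name: str
    args: dict = field(default_factory=dict)

    def key(self) -> str:
        # Canonical JSON so that argument key order does not affect comparison.
        return json.dumps({"name": self.name, "args": self.args}, sort_keys=True)


def check_tool_calls(expected, actual, ordered=True):
    """Return a list of mismatch messages; an empty list means the check passed."""
    exp = [c.key() for c in expected]
    act = [c.key() for c in actual]
    problems = []
    if ordered:
        # Strict mode: every position must match exactly.
        for i in range(max(len(exp), len(act))):
            e = exp[i] if i < len(exp) else None
            a = act[i] if i < len(act) else None
            if e != a:
                problems.append(f"step {i}: expected {e}, got {a}")
    else:
        # Unordered mode: compare as multisets, ignoring call order.
        missing = Counter(exp) - Counter(act)
        extra = Counter(act) - Counter(exp)
        problems += [f"missing call: {k}" for k in missing.elements()]
        problems += [f"unexpected call: {k}" for k in extra.elements()]
    return problems


# Hypothetical scenario: the agent should look up the order, then refund it.
expected = [
    ToolCall("lookup_order", {"order_id": 42}),
    ToolCall("issue_refund", {"order_id": 42, "amount": 10}),
]
swapped = list(reversed(expected))

strict_result = check_tool_calls(expected, swapped, ordered=True)
unordered_result = check_tool_calls(expected, swapped, ordered=False)
```

In this sketch the swapped call order fails the strict check but passes the unordered one, while a wrong argument (for example a different `amount`) fails both.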

This scenario tester is the first part of the project (roughly v0.1.0).

I’m now working on the next part: a "scenario generator" that takes messy historical company data (customer support logs, agent traces, tool calls, transcripts, etc.) and turns it into testable scenarios for this framework. Again, I’m trying to make this as deterministic as possible.

Repo: https://github.com/TrainForge/TrainForgeTester

I’d love feedback on:

* real agent-testing use cases this does not cover yet (browser use, audio, video, mouse use)
* whether this direction makes sense
* where this could go as a product/devtool
* issues, edge cases, or missing features in the repo

Any GitHub issues, forks, or PRs would be highly appreciated.

Most Companies Aren't Anywhere Near Ready for AI

https://danielmiessler.com/blog/most-companies-arent-ready-for-ai
1•rmason•1m ago•0 comments

WolfCOSE: Zero alloc, PQC, MISRA-C, FIPS 140-3 built with wolfCrypt

https://github.com/aidangarske/wolfCOSE
1•aidangarske•4m ago•0 comments

Performance of a large language model on the reasoning tasks of a physician

https://www.science.org/doi/10.1126/science.adz4433
1•kakoni•7m ago•1 comments

Llama.ttf: a font file which is also a large language model and inference engine

https://fuglede.github.io/llama.ttf/
1•smitec•10m ago•0 comments

Show HN: HypergraphZ – A Hypergraph Implementation in Zig

https://github.com/yamafaktory/hypergraphz
1•yamafaktory•11m ago•0 comments

Erm: A Local CLI That Strips Ums, Uhs, and Erms from Speech

https://doug.sh/posts/erm-a-local-cli-that-strips-ums-uhs-and-erms-from-speech/
1•dougcalobrisi•13m ago•0 comments

Show HN: Interpretable AutoResearch – Legible Agent Workflows

https://github.com/BarishNamazov/interpretable-autoresearch
2•barishnamazov•14m ago•0 comments

Bose SoundTouch Cloud Replacement

https://github.com/gesellix/Bose-SoundTouch
1•gesellix•20m ago•1 comments

My Agent Memory Library Helps Write Indie Articles

https://benemson.com/blog/agents/my-agent-memory-library-helps-write-indie-articles
1•emson•21m ago•2 comments

Skin Trackers – S&P-style indices for the CS2 skin market

https://skintrackers.com/en
1•Jorgin_•21m ago•1 comments

Urban Birds Are Rising Earlier Because of Traffic Noise (2013)

https://www.audubon.org/news/urban-birds-are-rising-earlier-because-traffic-noise
2•thunderbong•27m ago•0 comments

Technical Founders Misread Adoption

https://avelino.run/technical-founders-misread-adoption-rogers-pix/
2•rafaepta•28m ago•0 comments

WolfTPM TPM 2.0 Library Now Supports PQC Mldsa and Mlkem

https://www.wolfssl.com/wolftpm-add-tpm-2-0-v1-85-pqc-post-quantum-support/
1•aidangarske•32m ago•0 comments

Roger Penrose – Why Intelligence Is Not a Computational Process (2025)

https://www.youtube.com/watch?v=iTVN6tFknCg
2•weitzj•35m ago•0 comments

Claude Code Leak: 8100 Takedown Requests and the Birth of Claw-Code

https://www.heise.de/en/news/Claude-Code-Leak-8100-Takedown-Requests-and-the-Birth-of-Claw-Code-1...
2•smartmic•37m ago•0 comments

Professor's bold prediction: AI could help cure all diseases within a decade

https://excitech.media/p/professors-bold-prediction-ai-could
1•sminchev•39m ago•0 comments

One Interface for Everything

https://letsbuildsomething.substack.com/p/one-interface-for-everything
1•iancutzul•39m ago•0 comments

Only Law Can Prevent Extinction

https://www.lesswrong.com/posts/5CfBDiQNg9upfipWk/only-law-can-prevent-extinction
1•lumenwrites•40m ago•0 comments

New portable technology detects GPS spoofing in real time

https://www.ornl.gov/news/ornls-breakthrough-detector-protects-trucking-shipments-gps-deception
1•geox•41m ago•0 comments

My favorite adversarial review prompt

https://blog.fsck.com/2026/05/01/adversarial-review/
2•tie-in•41m ago•0 comments

Steam Controller

https://store.steampowered.com/hardware/steamcontroller
1•cl3misch•43m ago•0 comments

Show HN: Tyche: An experimental distributed trading pipeline in Go Java

https://github.com/ItsArnavSh/Tyche
2•itsarnavsh•45m ago•0 comments

QUIC packet rejection in practice – Iroh

https://www.iroh.computer/blog/quic-packet-rejection-in-practice
1•janandonly•51m ago•1 comments

University Professors Disturbed to Find Their Lectures Chopped Up into AI Slop

https://www.404media.co/asu-atomic-ai-modules-arizona-state-university/
3•abdelhousni•55m ago•1 comments

ASU Using AI Tool to Create Courses from Professors' Work Without Their

https://azfreenews.com/2026/05/asu-using-ai-tool-to-create-courses-from-professors-work-without-t...
9•abdelhousni•56m ago•0 comments

ChatGPT crashed my browser when I continued 1k+ conversations

1•Sharedmemory•56m ago•1 comments

Punk, or why I don't stream anymore

https://geohot.github.io//blog/jekyll/update/2026/05/03/punk-or-why-i-dont-stream.html
2•mefengl•57m ago•1 comments

Make Your Own Microforest

https://ambrook.com/offrange/environment/a-forest-in-your-pocket
4•bookofjoe•57m ago•0 comments

Former Nintendo Executive Says Amazon Once Requested 'Illegal' Price Discounts

https://kotaku.com/reggie-fils-aime-says-nintendo-stopped-selling-to-amazon-after-being-asked-to-...
3•m463•1h ago•0 comments

Simulating Cells Fighting to the Death

https://jamiesimon.io/blog/cell-fight/
2•jamie-simon•1h ago•0 comments