frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Show HN: Releasepages.dev make release pages from Git commits

https://www.releasepages.dev/
1•cheeseblubber•2m ago•0 comments

Engine thrust incidents spur safety alert over biocides (2020)

https://www.flightglobal.com/safety/engine-thrust-incidents-spur-safety-alert-over-biocides/137638.article
1•gone35•2m ago•0 comments

Ghosting and 'breadcrumbing': the impact of bad behavior on dating apps

https://theconversation.com/ghosting-and-breadcrumbing-the-psychological-impact-of-our-bad-behaviour-on-dating-apps-258087
1•PaulHoule•3m ago•0 comments

Restoring Arctic Exceptionalism

https://quincyinst.org/events/restoring-arctic-exceptionalism/
1•Bluestein•3m ago•0 comments

Lateralized sleeping positions in domestic cats

https://www.cell.com/current-biology/fulltext/S0960-9822(25)00507-X
1•carabiner•3m ago•0 comments

Las Vegas Through Landsat 7's Eyes (2024)

https://www.usgs.gov/media/before-after/las-vegas-through-landsat-7s-eyes
1•gnabgib•5m ago•0 comments

Tech in Iran-Israel conflict: internet blackout, crypto burns and camera spying

https://www.theguardian.com/technology/2025/jun/23/iran-israel-internet-blackout-crypto-home-camera-spying
2•Bluestein•6m ago•0 comments

Show HN: Natively – AI mobile app builder (iOS and Android)

https://natively.dev
1•hamedmp•11m ago•0 comments

OpenAI Is Ruthless [video]

https://www.youtube.com/watch?v=nFoXCLi8FCc
1•data_spy•11m ago•1 comments

The consequences of Starbucks on startup culture in neighborhoods

https://thetreeoflife.cc/demo
1•WasimBhai•14m ago•0 comments

The Ant Mill: How theoretical high-energy physics descended into groupthink

https://jespergrimstrup.substack.com/p/the-ant-mill-how-theoretical-high
1•Luc•15m ago•0 comments

National Archives to restrict public access starting July 7

https://www.archives.gov/college-park
2•LastTrain•19m ago•0 comments

Mexico is now Chinas No. 1 car export market

https://mexiconewsdaily.com/business/company-owned-byd-ship-vehicles-mexican-ports/
1•ilamont•19m ago•0 comments

Python Tools Are Quickly Adopting the New pylock.toml Standard

https://socket.dev/blog/pylock-toml-standard-adoption
1•divbzero•20m ago•0 comments

The Discovery Engine (automated system for scientific discovery)

https://zenodo.org/records/15693353
1•talos•22m ago•0 comments

Show HN: Vybetr – Hire AI app developers using tools like Lovable, Bolt and more

https://vybetr.com
4•zicxor•23m ago•0 comments

Using Lxcfs Together with Podman

https://www.die-welt.net/2025/06/using-lxcfs-together-with-podman/
1•todsacerdoti•24m ago•0 comments

Lessons from LangChain and Slack and MCP Integration

https://medium.com/@valliappanr/what-i-learned-integrating-langchain-with-slack-via-mcp-and-why-ai-code-isnt-enough-3e72248b96b1
1•valliappanr•26m ago•1 comments

Use of ch unit considered inappropriate (in certain circumstances)

https://clagnut.com/blog/2432
1•mikehall314•27m ago•0 comments

Brit Watchdog Cracks Down on Data Collection by Smart TVs, Speakers, Air Fryers

https://www.theguardian.com/technology/2025/jun/16/air-fryers-smart-tv-speakers-user-data-privacy-ico
2•m463•28m ago•1 comments

Thoughts on the AI 2027 Discourse

https://dynomight.substack.com/p/ai2027
2•paulpauper•29m ago•0 comments

Childhood and Education #10: Behaviors

https://thezvi.substack.com/p/childhood-and-education-10-behaviors
1•paulpauper•29m ago•0 comments

When Can I Stop Listening to My Enemy's Points?

https://substack.com/home/post/p-166684398
1•paulpauper•33m ago•0 comments

Show HN: Letter Lockbox – A word game I built over the weekend with Claude Code

https://www.letterlockbox.com
1•christensen143•33m ago•0 comments

Programmers and Their Monospace Blogs

https://lambdaland.org/posts/2025-06-24_reading_blogs/
1•ashton314•33m ago•0 comments

Ask HN: What's your fastest conversion from cold outreach to prepaid client?

1•iamarsibragimov•33m ago•0 comments

Namespaced Pundit Policies Without the Repetition Racket

https://alec-c4.com/posts/2025-06-24-pundit-namespaced-policies/
2•alec-c4•36m ago•1 comments

The Legacy of "The Gastronomical Me"

https://lithub.com/fidelity-to-both-pleasure-and-humiliation-on-m-f-k-fishers-feminist-realism/
2•spewil•36m ago•0 comments

Show HN: How Usage Works

https://www.usage.ai/blog/how-usage-works
4•kavehkhorram•38m ago•0 comments

Why Your Car's Touchscreen Is More Dangerous Than Your Phone

https://www.carsandhorsepower.com/featured/your-fancy-car-s-touchscreen-is-worse-than-buttons-and-studies-prove-it
2•m463•38m ago•2 comments
Open in hackernews

Show HN: Simulation-Based Testing for Agents Using AG-UI Protocol

https://github.com/langwatch/scenario
6•0xdeafcafe•5h ago

Comments

rchaves•5h ago
Hello HN!

tl;dr: We built Scenario, an open-source testing library for AI agents. It simulates real conversations with your agent, its code-driven, and lets you assert anything mid-dialogue. Repo: https://github.com/langwatch/scenario Docs: https://scenario.langwatch.ai/

I'm Rogerio, founder of LangWatch, I've been helping many customers building LLM applications in this past two years and worked with Alex on this.

Most of the efforts for LLM quality so far were about evaluations, single-turn, there was nothing actually good to test agents, it all felt forced, but we believe we cracked it now, we have built an agent testing library that test your agent by simulating a user and playing a conversation back and forth with it.

One of the key challenges there was that we had to make it compatible with all the 273+ AI frameworks (and counting) there are. Luckliy AG-UI protocol popped up recently, standardizing agents frameworks and UI interactions, this is perfect, because at the end of the day, we want our user simulator to "see" just the same that the user sees.

So we made Scenario in a way that is really easy to connect to any agent no matter the tech stack, from a simple string <-> string connection, to openai standard messages format, to AG-UI.

The other key challenge was to balance testing the open-endedness of agents vs having reliable cases you want to test, so we worked a lot on thinking through the autopilot simulation vs the fully scripted one, and here again, the goal was complete interoperability. At the end of the day, the design we achieved was simply having lambdas, that you can call at any point of the test, so it's just code, where you can connect any other evaluation or assertion tool you want, we are not restrictive.

Check out the repo and the docs, we would love to get some feedback in here!

Repo: https://github.com/langwatch/scenario Docs: https://scenario.langwatch.ai/