frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Sanskrit AI beats CleanRL SOTA by 125%

https://huggingface.co/ParamTatva/sanskrit-ppo-hopper-v5/blob/main/docs/blog.md
1•prabhatkr•11m ago•1 comments

'Washington Post' CEO resigns after going AWOL during job cuts

https://www.npr.org/2026/02/07/nx-s1-5705413/washington-post-ceo-resigns-will-lewis
2•thread_id•11m ago•1 comments

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

https://twitter.com/claudeai/status/2020207322124132504
1•geeknews•13m ago•0 comments

TSMC to produce 3-nanometer chips in Japan

https://www3.nhk.or.jp/nhkworld/en/news/20260205_B4/
2•cwwc•15m ago•0 comments

Quantization-Aware Distillation

http://ternarysearch.blogspot.com/2026/02/quantization-aware-distillation.html
1•paladin314159•16m ago•0 comments

List of Musical Genres

https://en.wikipedia.org/wiki/List_of_music_genres_and_styles
1•omosubi•18m ago•0 comments

Show HN: Sknet.ai – AI agents debate on a forum, no humans posting

https://sknet.ai/
1•BeinerChes•18m ago•0 comments

University of Waterloo Webring

https://cs.uwatering.com/
1•ark296•18m ago•0 comments

Large tech companies don't need heroes

https://www.seangoedecke.com/heroism/
1•medbar•20m ago•0 comments

Backing up all the little things with a Pi5

https://alexlance.blog/nas.html
1•alance•20m ago•1 comments

Game of Trees (Got)

https://www.gameoftrees.org/
1•akagusu•21m ago•1 comments

Human Systems Research Submolt

https://www.moltbook.com/m/humansystems
1•cl42•21m ago•0 comments

The Threads Algorithm Loves Rage Bait

https://blog.popey.com/2026/02/the-threads-algorithm-loves-rage-bait/
1•MBCook•23m ago•0 comments

Search NYC open data to find building health complaints and other issues

https://www.nycbuildingcheck.com/
1•aej11•27m ago•0 comments

Michael Pollan Says Humanity Is About to Undergo a Revolutionary Change

https://www.nytimes.com/2026/02/07/magazine/michael-pollan-interview.html
2•lxm•28m ago•0 comments

Show HN: Grovia – Long-Range Greenhouse Monitoring System

https://github.com/benb0jangles/Remote-greenhouse-monitor
1•benbojangles•33m ago•1 comments

Ask HN: The Coming Class War

1•fud101•33m ago•4 comments

Mind the GAAP Again

https://blog.dshr.org/2026/02/mind-gaap-again.html
1•gmays•34m ago•0 comments

The Yardbirds, Dazed and Confused (1968)

https://archive.org/details/the-yardbirds_dazed-and-confused_9-march-1968
1•petethomas•35m ago•0 comments

Agent News Chat – AI agents talk to each other about the news

https://www.agentnewschat.com/
2•kiddz•36m ago•0 comments

Do you have a mathematically attractive face?

https://www.doimog.com
3•a_n•40m ago•1 comments

Code only says what it does

https://brooker.co.za/blog/2020/06/23/code.html
2•logicprog•45m ago•0 comments

The success of 'natural language programming'

https://brooker.co.za/blog/2025/12/16/natural-language.html
1•logicprog•46m ago•0 comments

The Scriptovision Super Micro Script video titler is almost a home computer

http://oldvcr.blogspot.com/2026/02/the-scriptovision-super-micro-script.html
3•todsacerdoti•46m ago•0 comments

Discovering the "original" iPhone from 1995 [video]

https://www.youtube.com/watch?v=7cip9w-UxIc
1•fortran77•47m ago•0 comments

Psychometric Comparability of LLM-Based Digital Twins

https://arxiv.org/abs/2601.14264
1•PaulHoule•49m ago•0 comments

SidePop – track revenue, costs, and overall business health in one place

https://www.sidepop.io
1•ecaglar•51m ago•1 comments

The Other Markov's Inequality

https://www.ethanepperly.com/index.php/2026/01/16/the-other-markovs-inequality/
2•tzury•53m ago•0 comments

The Cascading Effects of Repackaged APIs [pdf]

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055034
1•Tejas_dmg•55m ago•0 comments

Lightweight and extensible compatibility layer between dataframe libraries

https://narwhals-dev.github.io/narwhals/
1•kermatt•57m ago•0 comments
Open in hackernews

Show HN: Simulation-Based Testing for Agents Using AG-UI Protocol

https://github.com/langwatch/scenario
6•0xdeafcafe•7mo ago

Comments

rchaves•7mo ago
Hello HN!

tl;dr: We built Scenario, an open-source testing library for AI agents. It simulates real conversations with your agent, its code-driven, and lets you assert anything mid-dialogue. Repo: https://github.com/langwatch/scenario Docs: https://scenario.langwatch.ai/

I'm Rogerio, founder of LangWatch, I've been helping many customers building LLM applications in this past two years and worked with Alex on this.

Most of the efforts for LLM quality so far were about evaluations, single-turn, there was nothing actually good to test agents, it all felt forced, but we believe we cracked it now, we have built an agent testing library that test your agent by simulating a user and playing a conversation back and forth with it.

One of the key challenges there was that we had to make it compatible with all the 273+ AI frameworks (and counting) there are. Luckliy AG-UI protocol popped up recently, standardizing agents frameworks and UI interactions, this is perfect, because at the end of the day, we want our user simulator to "see" just the same that the user sees.

So we made Scenario in a way that is really easy to connect to any agent no matter the tech stack, from a simple string <-> string connection, to openai standard messages format, to AG-UI.

The other key challenge was to balance testing the open-endedness of agents vs having reliable cases you want to test, so we worked a lot on thinking through the autopilot simulation vs the fully scripted one, and here again, the goal was complete interoperability. At the end of the day, the design we achieved was simply having lambdas, that you can call at any point of the test, so it's just code, where you can connect any other evaluation or assertion tool you want, we are not restrictive.

Check out the repo and the docs, we would love to get some feedback in here!

Repo: https://github.com/langwatch/scenario Docs: https://scenario.langwatch.ai/