Show HN: Agent-evals – Claude skill to build your own evals

https://github.com/fsilavong/agent-eval
3•sauercrowd•1h ago
I’ve spent the past 10 years working on AI in finance, with much of that time focused on building evaluation systems for production environments.

As agents become more widely adopted, more software engineering and product people have started building them. But I’ve noticed that many teams are not yet fluent in systematic evaluation, or in the processes needed to keep agent quality high over time.

For large organizations, that gap is rarely the bottleneck because dedicated teams handle evaluation. But after speaking with a number of startups, it became clear that building strong, up-to-date evals is much harder in a fast-moving startup, especially when the team does not have a data science background.

So I tried to condense as much of my experience as possible into a Claude Skill: a practical starting point for evaluating your agent.

The idea is simple: tell Claude you need evals, and it will set up a solid baseline directly in your codebase. That's it! The evals follow patterns I've seen many times before and give you a summary of what your agent does well and what it doesn't.
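For readers new to the pattern, a baseline eval with a pass/fail summary might look roughly like the sketch below. It is a hypothetical illustration only, not the code the skill generates: `EvalCase`, `run_evals`, `summarize`, and `toy_agent` are all assumed names, and the stand-in agent just uppercases its input where a real one would call a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One eval: a prompt plus a check on the agent's output (hypothetical shape)."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case through the agent and record pass/fail per case name."""
    return {case.name: case.check(agent(case.prompt)) for case in cases}


def summarize(results: Dict[str, bool]) -> str:
    """Report the pass rate and which cases passed or failed."""
    passed = sorted(name for name, ok in results.items() if ok)
    failed = sorted(name for name, ok in results.items() if not ok)
    return (
        f"{len(passed)}/{len(results)} passed\n"
        f"does well: {', '.join(passed) or 'none'}\n"
        f"needs work: {', '.join(failed) or 'none'}"
    )


def toy_agent(prompt: str) -> str:
    """Stand-in agent for illustration; a real agent would call a model here."""
    return prompt.upper()


cases = [
    EvalCase("uppercases", "hello", lambda out: out == "HELLO"),
    EvalCase("adds_greeting", "hello", lambda out: out.startswith("Hi")),
]
results = run_evals(toy_agent, cases)
print(summarize(results))
```

Keeping each check as a plain function makes it easy to start with exact-match assertions and later swap in rubric-based or model-graded checks without changing the runner.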

Looking forward to your feedback!

Anthropic's Boris Cherny: Coding is solved that's next

https://www.youtube.com/watch?v=JGubyPD_EU0
1•danebalia•16s ago•0 comments

When Networking Doesn't Work

https://www.os2museum.com/wp/when-networking-doesnt-work/
1•kencausey•1m ago•0 comments

EU accused of wasting €20B on AI computing dreams

https://www.politico.eu/article/eu-accused-wasting-20-billion-euro-ai-computing-dreams/
1•momentmaker•2m ago•0 comments

Offload MCP – Offload tasks to free models via API and save tokens

https://github.com/peterhadorn/offload-mcp
1•diioo•3m ago•0 comments

Ask HN: When did you move from AI agentic loops to simpler deterministic system?

1•laxmena•3m ago•0 comments

3D Print Flexible–Rigid Transition Mechanism for Rapid and Reversible Assembly

https://dl.acm.org/doi/10.1145/3772318.3790723
1•gnabgib•4m ago•0 comments

Breed96 – 30 years later the Amiga 500 game is back [video]

https://www.youtube.com/watch?v=E8hlHHGRCj8
1•doener•4m ago•0 comments

Half a Month of Consolation Writing Advice

https://www.astralcodexten.com/p/half-a-month-of-consolation-writing
1•paulpauper•5m ago•0 comments

Book Review: "Friendly Ambitious Nerd" by Visakan Veerasamy

https://glasshalftrue.substack.com/p/book-review-friendly-ambitious-nerd
1•paulpauper•6m ago•0 comments

Cruise ship with 17 US passengers hit by suspected hantavirus outbreak

https://www.cnn.com/2026/05/03/africa/atlantic-hantavirus-cruise-ship-dead-latam-intl
1•rawgabbit•6m ago•0 comments

The Vilification Arc

https://justanotherdot.substack.com/p/the-vilification-arc
1•mooreds•9m ago•0 comments

Ask HN: Best local agent setup for Markdown notes?

1•bwestergard•9m ago•0 comments

Inspector General Finds Homeland Security Dept. Failed to Secure Phones

https://www.nytimes.com/live/2026/05/04/us/trump-news
2•seemaze•11m ago•0 comments

Ask HN: Are employers getting the returns from AI?

3•daemon_9009•12m ago•2 comments

OpenAI Codex Surpasses Claude Code in Downloads Following April 30 Inflection

https://blog.tickertrends.io/p/openai-codex-surpasses-claude-code
2•gmays•12m ago•0 comments

The Par Programming Language

https://par.run/
1•marvinborner•12m ago•0 comments

The Effects of School Phone Bans: National Evidence from Lockable Pouches [pdf]

https://tom-dee.github.io/files/w35132.pdf
1•goplayoutside•14m ago•0 comments

Nature's Overlooked Role in National Security

https://nautil.us/natures-overlooked-role-in-national-security-1280439
2•lschueller•19m ago•0 comments

Show HN: Gitbar – A menu bar app for GitHub PRs and issues

https://usegitbar.app/
1•brunokiafuka•19m ago•0 comments

LLxprt Code Is the Anti-Claw

https://vybestack.dev/blog/rendered/2026-02-20-anti-claw.html
1•mooreds•19m ago•0 comments

Sam Altman is "the face of evil" for not reporting school shooter, says lawyer

https://arstechnica.com/tech-policy/2026/04/school-shooting-lawsuits-accuse-openai-of-hiding-viol...
2•asplake•20m ago•0 comments

Lilex. The Font for Developers

https://lilex.myrt.co/
5•hmokiguess•20m ago•1 comments

Bambu labs sends legal threat to orcaslicer dev over use of AGPL code [video]

https://www.youtube.com/watch?v=jIbpQtoz6hs
2•mindcrime•22m ago•0 comments

Practical Ways to Reduce Claude Code Token Usage

https://www.kdnuggets.com/7-practical-ways-to-reduce-claude-code-token-usage
3•sminchev•25m ago•1 comments

Recession and Revolution: Our Experience Isn't a Model or System

http://charleshughsmith.blogspot.com/2026/05/recession-and-revolution-our-experience.html
1•speckx•26m ago•0 comments

Boris Cherny: TI-83 Plus Basic Programming Tutorial (2004)

https://www.ticalc.org/programming/columns/83plus-bas/cherny/
1•suoken•29m ago•0 comments

Empty Screenings

https://walzr.com/empty-screenings
2•jbegley•30m ago•0 comments

AI startup JuliaHub raises $65M to rival Simulink

https://www.axios.com/2026/04/30/bob-muglia-ai-hardware-engineering
12•ViralBShah•30m ago•1 comments

XGrammar-2: 80x Faster Structured Generation for Agent Tool Calling

https://blog.mlc.ai/2026/05/04/xgrammar-2-fast-customizable-structured-generation
4•ubospica•31m ago•0 comments

Show HN: Full-featured CLI textarea component for React Ink

https://github.com/omranjamal/ink-textarea
1•omranjamal•31m ago•0 comments