Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run)

https://github.com/beyhangl/evalcraft
1•beyhang•8h ago
Testing AI agents is painful. Every test run calls the LLM API, costs real money, takes minutes, and gives different results each time. CI? Forget about it.

Evalcraft fixes this with cassette-based capture and replay — think VCR for HTTP, but for LLM calls and tool use.

How it works:

1. Run your agent once with real API calls. Evalcraft records every LLM request, tool call, and response into a JSON cassette file.

2. In tests, replay from the cassette. Zero API calls, zero cost, deterministic output.

3. Assert on what matters: tool call sequences, output content, cost budgets, token counts.

  run = replay("cassettes/support_agent.json")
  assert_tool_called(run, "lookup_order", with_args={"order_id": "ORD-1042"})
  assert_tool_order(run, ["lookup_order", "search_knowledge_base"])
  assert_cost_under(run, max_usd=0.01)

It's pytest-native: fixtures, markers, CLI flags. Works with OpenAI, Anthropic, LangGraph, CrewAI, AutoGen, and LlamaIndex out of the box. Adapters auto-instrument your agent with zero code changes.
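
For readers new to the cassette idea, the record-then-replay flow from the steps above can be sketched roughly as follows. This is a minimal illustration and not Evalcraft's actual implementation or file format: the `Cassette` class, its `call` method, the `tool_names` helper, and the `fake_lookup_order` tool are all hypothetical names made up for this sketch.

```python
import json
import tempfile
from pathlib import Path


class Cassette:
    """Hypothetical record/replay store (NOT Evalcraft's real API or format)."""

    def __init__(self, path, mode):
        self.path = Path(path)
        self.mode = mode  # "record" or "replay"
        self.entries = []
        self.cursor = 0
        if mode == "replay":
            # Replay mode loads previously recorded interactions from disk.
            self.entries = json.loads(self.path.read_text())

    def call(self, kind, name, args, live_fn=None):
        if self.mode == "record":
            # Record mode: perform the real call and store request + response.
            result = live_fn(**args)
            self.entries.append(
                {"kind": kind, "name": name, "args": args, "result": result}
            )
            return result
        # Replay mode: return the stored response without calling anything live.
        entry = self.entries[self.cursor]
        self.cursor += 1
        if (entry["kind"], entry["name"], entry["args"]) != (kind, name, args):
            raise AssertionError(f"unexpected call: {kind} {name} {args}")
        return entry["result"]

    def save(self):
        self.path.write_text(json.dumps(self.entries, indent=2))


def tool_names(cassette):
    """List the names of recorded tool calls, in call order."""
    return [e["name"] for e in cassette.entries if e["kind"] == "tool"]


def fake_lookup_order(order_id):
    # Stand-in for a real tool or API call (an assumption for this sketch).
    return {"order_id": order_id, "status": "shipped"}


def run_agent(cassette, live):
    # A toy "agent" that makes a single tool call through the cassette.
    order = cassette.call(
        "tool",
        "lookup_order",
        {"order_id": "ORD-1042"},
        live_fn=fake_lookup_order if live else None,
    )
    return f"Order {order['order_id']} is {order['status']}."


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "support_agent.json"

    recorder = Cassette(path, "record")
    recorded = run_agent(recorder, live=True)  # one real run
    recorder.save()

    replayer = Cassette(path, "replay")
    replayed = run_agent(replayer, live=False)  # no live call is available

assert recorded == replayed
assert tool_names(replayer) == ["lookup_order"]
```

Because replay passes no live function at all, any attempt to hit the real tool would fail immediately, which is what makes the replayed run deterministic and free.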

Also ships with golden-set management, regression detection, PII sanitization, and 16 CLI commands for inspecting/diffing cassettes.

555 tests, MIT licensed, `pip install evalcraft`.

Repo: https://github.com/beyhangl/evalcraft
PyPI: https://pypi.org/project/evalcraft/
Docs: https://beyhangl.github.io/evalcraft/docs/

Would love feedback from anyone testing agents in CI.

Show HN: Claudine – A Kanban board for your Claude Code and Codex conversations

https://claudine.pro
1•ycmatt•19s ago•0 comments

Show HN: I built the first scripting language for multiplayer game dev

https://docs.allout.game/scripting/syntax
1•joshuamanton•21s ago•0 comments

Cognitive and Physical Improvement with Positive Age Beliefs

https://www.mdpi.com/2308-3417/11/2/28
1•wjb3•58s ago•0 comments

Manual to Phil Zimmermann's PGPfone Circa 1996 [pdf]

https://philzimmermann.com/docs/pgpfone10b7.pdf
1•smalltorch•1m ago•0 comments

Self taught gen-xers with senior dev/pm exp. Where's my imposter syndrome team?

1•_hugerobots_•1m ago•0 comments

Lotus 1-2-3 on the PC with DOS

https://stonetools.ghost.io/lotus123-dos/
1•TMWNN•2m ago•0 comments

Knightian Uncertainty

https://en.wikipedia.org/wiki/Knightian_uncertainty
1•jerlendds•2m ago•0 comments

Generate cell-type specific mRNAs for better vaccines autoregressively

https://tsone.notion.site/Generate-cell-type-specific-mRNAs-for-better-vaccines-autoregressively-...
1•tdsone3•2m ago•0 comments

Withheld Epstein files with accusations against Trump released by justice dept

https://www.bbc.com/news/articles/c4g0dzg6e4mo
1•tartoran•4m ago•0 comments

Three Quiet Brothers on Long Island, All of Them Related to Hitler

https://www.nytimes.com/2006/04/24/nyregion/three-quiet-brothers-on-long-island-all-of-them-relat...
1•Anon84•6m ago•0 comments

Time to teach our children about finance

https://cointales.ai/en
1•mhalifax•6m ago•1 comment

A Plea for Lean Software (1995) [pdf]

https://berthub.eu/articles/LeanSoftware_text.pdf
1•tosh•8m ago•0 comments

Show HN: CloakPipe – Rust privacy proxy for LLM APIs with pseudonymization

1•rohansx•9m ago•0 comments

An approach to provably safe AI engineering for legacy codebases

https://evok.dev
1•devconcierge•11m ago•1 comment

M6 MacBook Pro could have four innovations new to the Mac

https://9to5mac.com/2026/03/06/m6-macbook-pro-could-have-four-innovations-new-to-the-mac/
2•blacktulip•11m ago•1 comment

We fixed Postgres connection pooling on serverless with PgDog

https://circleback.ai/blog/how-we-fixed-postgres-connection-pooling-on-serverless-with-pgdog
1•levkk•11m ago•0 comments

Interpreting Pull Request Changes Before CI Enforcement

https://github.com/signalprism/execution-boundary-interpretation
1•mattgallant001•12m ago•1 comment

Colorado SB26-051 Age Attestation

https://aphyr.com/posts/408-colorado-sb26-051-age-attestation
1•speckx•13m ago•0 comments

When Using AI Leads to "Brain Fry"

https://hbr.org/2026/03/when-using-ai-leads-to-brain-fry
1•dracula_x•14m ago•0 comments

Artificial Intelligence: friend or foe for hiring in Europe today?

https://www.ecb.europa.eu/press/blog/date/2026/html/ecb.blog20260304~d9e34fc95f.en.html
1•akyuu•16m ago•0 comments

Making Hybrid Bonding Better

https://semiengineering.com/making-hybrid-bonding-better/
1•PaulHoule•16m ago•0 comments

Building a High-Performance Postgres Time Series Stack with Iceberg

https://www.snowflake.com/en/engineering-blog/postgres-time-series-iceberg/
2•craigkerstiens•17m ago•0 comments

Advice for Staying in the Hospital for a Week

https://xeiaso.net/blog/2026/hospital-advice/
1•speckx•18m ago•0 comments

Scientists rule out a 2032 lunar impact for asteroid 2024 YR4

https://www.theregister.com/2026/03/06/no_moon_asteroid_impact/
2•LorenDB•20m ago•0 comments

Claude Code Skill to write better Lean4 proofs

https://spec.workers.io/axiom/
1•chaitanyya•21m ago•1 comment

US companies denied refunds on Trump's illegal tariffs

https://www.ft.com/content/0315349e-763e-4faa-a5b1-c02ce7801cbd
4•petethomas•24m ago•1 comment

Why Can't I Think of Anything to Vibe Code?

https://degruchy.org/2026/03/04/why-cant-i-think-of-anything-to-vibe-code/
1•speckx•24m ago•1 comment

Show HN: What Is AI Citation Optimization?

https://www.latticeocean.com/blog/what-is-ai-citation-optimization/
1•arunkumars91•25m ago•1 comment

OpenAI sued for practicing law without a license

https://www.abajournal.com/news/article/openai-sued-for-practicing-law-without-a-license
2•Jimmc414•25m ago•0 comments

Context Engineering

https://github.com/m727ichael/context-engineering
1•m727ichael•27m ago•1 comment