Show HN: Decipher x Claude Code – Infra to auto-generate and maintain E2E tests

https://docs.getdecipher.com/pages/features/testing/claude-code-integration
4•mrosenfield•2h ago
Hey HN — I'm Michael from Decipher (https://getdecipher.com). We build infrastructure for autonomously generating and maintaining end-to-end tests.

Today we’re launching our Claude Code integration.

We built this because as teams ship more code, especially with coding agents, they need more regression coverage. Claude can already generate a decent Playwright file from a repo and prompt. That solves first-draft generation. It does not solve repeatability.

A generated test is still a static guess. The real problems start when it meets the live app: the browser is logged out, a modal appears, a feature flag changes the path, a selector is stale, or the app changed in a way that requires updating the test without changing what it is supposed to verify.

That is the gap between “Claude wrote a script” and “we have durable E2E coverage.”

Our system splits that loop in two. Claude handles local planning: it reads the request, inspects the repo, infers the flow, and drafts the initial step plan. Decipher handles runtime: agents in our infrastructure run the steps in a live browser, observe what happened after each step, classify failures, and use the product knowledge captured during planning to repair the failing segment.

Once the test is on Decipher, our agents continue maintaining it against the test’s original intent. As the UI or flow changes, they update the test mechanics without silently changing what the test is supposed to verify.
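One way to picture the intent/mechanics split is a test record whose intent is fixed while its steps can be swapped out. The sketch below is illustrative only: `E2ETest`, `repair`, and the sample step strings are hypothetical names chosen for this example, not Decipher's actual data model.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class E2ETest:
    # Hypothetical model: what the test must verify. Repair never rewrites this.
    intent: str
    # Hypothetical model: how the test currently gets there. Free to change.
    steps: Tuple[str, ...]


def repair(test: E2ETest, failing_index: int, new_steps: Tuple[str, ...]) -> E2ETest:
    """Replace the failing step with a patched segment, leaving the intent untouched."""
    if not 0 <= failing_index < len(test.steps):
        raise IndexError("failing_index out of range")
    patched = test.steps[:failing_index] + new_steps + test.steps[failing_index + 1:]
    return replace(test, steps=patched)


if __name__ == "__main__":
    original = E2ETest(
        intent="user can check out",
        steps=("open cart", "click #buy", "see receipt"),
    )
    fixed = repair(original, 1, ("dismiss promo modal", "click 'Buy now'"))
    print(fixed.intent)
    print(fixed.steps)
```

Because the dataclass is frozen, a repair produces a new record rather than mutating the old one, which makes it easy to check that only the mechanics changed.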

We chose Skills + CLI instead of MCP because this is not a single tool call. It is a stateful loop: gather context, compile steps, start a remote run, inspect runtime state, patch failures, and resume. The CLI handles auth and transport. Skills keep Claude on that path and preserve a clean boundary between local context and remote execution.
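To show why this reads as a loop rather than a single call, here is a minimal sketch of the phases named above as a small state machine. The `Phase` enum, `next_phase`, and `walk` are hypothetical names for illustration; the real CLI and Skills may be structured quite differently.

```python
from enum import Enum, auto
from typing import Iterable, List


class Phase(Enum):
    GATHER_CONTEXT = auto()
    COMPILE_STEPS = auto()
    START_RUN = auto()
    INSPECT_STATE = auto()
    PATCH_FAILURE = auto()
    RESUME = auto()
    DONE = auto()


def next_phase(phase: Phase, failed: bool = False) -> Phase:
    """Return the phase that follows `phase`; `failed` only matters when inspecting."""
    if phase is Phase.INSPECT_STATE:
        return Phase.PATCH_FAILURE if failed else Phase.DONE
    linear = {
        Phase.GATHER_CONTEXT: Phase.COMPILE_STEPS,
        Phase.COMPILE_STEPS: Phase.START_RUN,
        Phase.START_RUN: Phase.INSPECT_STATE,
        Phase.PATCH_FAILURE: Phase.RESUME,
        Phase.RESUME: Phase.INSPECT_STATE,
    }
    return linear[phase]  # Phase.DONE has no successor and raises KeyError


def walk(inspections: Iterable[bool]) -> List[Phase]:
    """Trace the phases visited, given whether each successive inspection failed."""
    outcomes = iter(inspections)
    phase = Phase.GATHER_CONTEXT
    trace = [phase]
    while phase is not Phase.DONE:
        # Inspections beyond the supplied outcomes are assumed to pass.
        failed = next(outcomes, False) if phase is Phase.INSPECT_STATE else False
        phase = next_phase(phase, failed)
        trace.append(phase)
    return trace


if __name__ == "__main__":
    # One failed inspection, then a clean one after patching and resuming.
    for p in walk([True, False]):
        print(p.name)
```

The patch and resume phases feed back into inspection, so the caller has to carry state between calls; that is the part a one-shot tool invocation does not model well.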

In practice, Claude builds an initial plan and sends it through the CLI to our backend. A remote worker runs it against the live app in a cloud browser. The remote agent turns Claude’s steps into real actions on the product, figuring out the right element to click and modifying steps as needed. After each step, or on failure, the Decipher agent sends structured state back to Claude: which step ran, what the agent did, what state the page is in, what kind of failure happened, and the artifacts needed to repair it. Claude can then patch the failing steps and resume the run.
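For a rough idea of what such structured state could look like on the wire, here is a sketch that models the five pieces listed above as a JSON-serializable record. The `StepReport` class, its field names, and the sample values are assumptions made for this example; the actual payload format is not described in this post.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StepReport:
    # All field names are hypothetical; they mirror the items listed in the post.
    step: str                            # which step ran
    action: str                          # what the remote agent did
    page_state: str                      # what state the page is in
    failure_kind: Optional[str] = None   # kind of failure, or None on success
    artifacts: List[str] = field(default_factory=list)  # e.g. screenshot or log names

    @property
    def needs_repair(self) -> bool:
        return self.failure_kind is not None


def encode(report: StepReport) -> str:
    """Serialize a report to JSON for transport."""
    return json.dumps(asdict(report))


def decode(payload: str) -> StepReport:
    """Rebuild a report from its JSON form."""
    return StepReport(**json.loads(payload))


if __name__ == "__main__":
    report = StepReport(
        step="click checkout button",
        action="clicked the element labelled 'Checkout'",
        page_state="login page shown",
        failure_kind="logged_out",
        artifacts=["screenshot-3.png"],
    )
    restored = decode(encode(report))
    print(restored.needs_repair, restored.failure_kind)
```

Keeping the failure kind as an explicit field, rather than burying it in free text, lets the local side decide whether a step needs repair without parsing prose.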

Feel free to give it a try. We'd greatly appreciate any feedback you might have.

Comments

anvithA•1h ago
How are you defining test coverage and how do you know if all the possible user flows are being tested?
mrosenfield•49m ago
For us, coverage means the important user journeys are covered and stay covered. We start with likely flows from the codebase, then let users point the agent at the journeys that matter most. We also already have session replay agents in production to find bugs, and we’re now using that same system to spot coverage gaps and generate tests for missing flows.

Show HN: I made Claude Code block my distractions and track everything I ship

https://twitter.com/daxaur/status/2029258604084158559
1•daxaur•35s ago•0 comments

My MCP Server Setup: A Practical Guide to Wiring AI into Everything

https://crunchtools.com/my-mcp-server-setup-practical-guide/
1•abdelhousni•50s ago•0 comments

Man Arrested for Plotting with Others to Murder or Kidnap Two Dissidents Abroad

https://www.justice.gov/usao-sdny/pr/man-arrested-plotting-others-murder-or-kidnap-two-victims-ab...
1•737min•57s ago•0 comments

Does Altman Deserve the Heat?

https://tapestry.news/tech/altman-heat/
1•sonalidee•1m ago•0 comments

Harjus v4 adds kernel bypass and more

https://shufflingbytes.com/posts/harjus-release-4.0.0/
1•ValtteriL•1m ago•0 comments

Show HN: TerminalNexus – Turn CLI commands into reusable buttons (Windows)

1•danhof_sss•2m ago•0 comments

Why Autonomous Agents Failed the Initial Hype: An AutoGen Retrospective

https://www.youtube.com/watch?v=2cnxea3xkzM
1•alexchaomander•2m ago•1 comments

Rob Grant Obituary on Ganymede and Titan

https://www.ganymede.tv/2026/03/obituary-rob-grant/
1•nephihaha•2m ago•1 comments

Agent-experience: visual reference to patterns, surfaces, and infrastructure

https://github.com/ygwyg/agent-experience
1•simonpure•3m ago•0 comments

C++ Reflection: Another Monad

https://www.elbeno.com/blog/?p=1813
1•ingve•4m ago•0 comments

Invoicesio.app – Invoice and billing for freelancers and small businesses

https://invoicesio.app/
1•dimitrisal•4m ago•1 comments

AWS-hosted tech providers urge Middle East customers to fail over now

https://www.theregister.com/2026/03/04/aws_saas_middle_east_customer_warnings/
1•Bender•4m ago•0 comments

Dev stunned by $82K Gemini bill after unknown API key thief goes to town

https://www.theregister.com/2026/03/03/gemini_api_key_82314_dollar_charge/
1•Bender•5m ago•1 comments

Faster C software with Dynamic Feature Detection

https://gist.github.com/jjl/d998164191af59a594500687a679b98d
1•todsacerdoti•5m ago•0 comments

Get Paid for Good Posts

https://treechat.com/
3•mitya777•6m ago•0 comments

Up to 10% of Firefox crashes are due to bad memory [thread]

https://mas.to/@gabrielesvelto/116171753263415921
1•MBCook•6m ago•0 comments

With developer verification, Google's Apple envy threatens Android's open legacy

https://arstechnica.com/gadgets/2026/03/with-developer-verification-googles-apple-envy-threatens-...
1•Bender•7m ago•0 comments

Ask HN: Does Claude Code's abilities fluctuate for you too?

1•ammerfest•7m ago•0 comments

CodeRabbit tops the F1 score in Martian's code review benchmarks

https://www.coderabbit.ai/blog/coderabbit-tops-martian-code-review-benchmark
1•smb06•8m ago•0 comments

Open Source Iran War Cost Tracker: 45.7B

https://iranwarcost.com
6•koverda•9m ago•1 comments

Unfiltered bald joy in the most uplifting corner of the internet

https://okayzoomer.substack.com/p/unfiltered-bald-joy-in-the-most-uplifting
1•speckx•9m ago•0 comments

I wrote a spec-driven ISO 8583 parser/builder in Go

https://github.com/leo-aa88/go-iso8583
1•araujo88•9m ago•1 comments

Redesigning Mathematics for Elegant Physics

https://twitter.com/devrimyasar/status/2029006461267857637
1•aesopsfable•9m ago•0 comments

What AI Safety Means to Me

https://olshansky.info/thoughts/2026-03-04-what-ai-safety-means-to-me
1•Olshansky•10m ago•0 comments

Windows 12 in 2026: AI, CorePC and the Future of the AI PC

https://comuniq.xyz/post?t=837
1•01-_-•10m ago•0 comments

Show HN: Auctionnow.io – Launch a store to sell items via auction or buy-it-now

https://auctionnow.io/
4•chptung•11m ago•0 comments

Show HN: AutosClaw – security first *claw with live chat to any agent session

https://github.com/BreuerFlorian/autosclaw
1•fbreuer•12m ago•0 comments

Ex-NYPD Official Indicted for Accepting Bribes from Tech Exec

https://www.thecity.nyc/2026/02/12/kevin-taylor-phil-david-terence-banks-saferwatch-indictment/
2•PaulHoule•12m ago•0 comments

Samsung's 100% DRAM Price Hike and Why Even Apple Had to Pay Up

https://www.buysellram.com/blog/samsungs-100-dram-price-hike-and-why-even-apple-had-to-pay-up/
1•jamesbsr•13m ago•1 comments

The plan to kill Ali Khamenei

https://www.ft.com/content/bf998c69-ab46-4fa3-aae4-8f18f7387836
1•e12e•14m ago•1 comments