Show HN: Decipher x Claude Code – Infra to auto-generate and maintain E2E tests

https://docs.getdecipher.com/pages/features/testing/claude-code-integration

4•mrosenfield•2h ago

Hey HN — I'm Michael from Decipher (https://getdecipher.com). We build infrastructure for autonomously generating and maintaining end-to-end tests.

Today we’re launching our Claude Code integration.

We built this because as teams ship more code, especially with coding agents, they need more regression coverage. Claude can already generate a decent Playwright file from a repo and a prompt. That solves first-draft generation. It does not solve repeatability.

A generated test is still a static guess. The real problems start when it meets the live app: the browser is logged out, a modal appears, a feature flag changes the path, a selector is stale, or the app changed in a way that requires updating the test without changing what it is supposed to verify.

That is the gap between “Claude wrote a script” and “we have durable E2E coverage.”

Our system splits that loop in two. Claude handles local planning: it reads the request, inspects the repo, infers the flow, and drafts the initial step plan. Decipher handles runtime: agents in our infrastructure run the steps in a live browser, observe what happened after each step, classify failures, and use the product knowledge captured during planning to repair the failing segment.
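To make "classify failures" concrete, here is a rough sketch of what sorting a failed step into categories could look like. It is illustrative only and not Decipher's actual code: the `FailureKind` names, the `classify` function, and the keys in the observation dict are all hypothetical, and the rules are simple heuristics.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical failure categories, taken from the examples in the post."""
    LOGGED_OUT = "logged_out"
    UNEXPECTED_MODAL = "unexpected_modal"
    PATH_CHANGED = "path_changed"      # e.g. a feature flag rerouted the flow
    STALE_SELECTOR = "stale_selector"
    APP_CHANGED = "app_changed"        # fallback: needs a closer look


def classify(observation: dict) -> FailureKind:
    """Map what was observed after a failed step to a failure category.

    `observation` is an assumed shape: url_path, expected_path,
    modal_open, selector_found. The checks run from most to least specific.
    """
    path = observation.get("url_path", "")
    if path.startswith("/login"):
        return FailureKind.LOGGED_OUT
    if observation.get("modal_open"):
        return FailureKind.UNEXPECTED_MODAL
    expected = observation.get("expected_path")
    if expected and path != expected:
        return FailureKind.PATH_CHANGED
    if observation.get("selector_found") is False:
        return FailureKind.STALE_SELECTOR
    return FailureKind.APP_CHANGED
```

A real classifier would work from much richer signals (DOM snapshots, network traffic, screenshots); the point is only that each category suggests a different repair.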

Once the test is on Decipher, our agents continue maintaining it against the test’s original intent. As the UI or flow changes, they update the test mechanics without silently changing what the test is supposed to verify.
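One way to picture the intent-versus-mechanics split is a step record where the intent is fixed and only the mechanics can be swapped out. This is a minimal sketch under that assumption, not our real data model; `Step`, `repair`, and the selectors are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    intent: str    # what the step is supposed to verify or accomplish
    selector: str  # how it is currently carried out; the only part a repair touches


def repair(step: Step, new_selector: str) -> Step:
    """Return a copy with updated mechanics and the original intent."""
    return replace(step, selector=new_selector)


original = Step(intent="Submit the checkout form", selector="#submit-btn")
repaired = repair(original, "button[data-testid='checkout-submit']")
```

Because `repair` takes no intent argument, a maintenance pass in this sketch cannot quietly change what the step is meant to check.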

We chose Skills + CLI instead of MCP because this is not a single tool call. It is a stateful loop: gather context, compile steps, start a remote run, inspect runtime state, patch failures, and resume. The CLI handles auth and transport. Skills keep Claude on that path and preserve a clean boundary between local context and remote execution.
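The loop described above can be sketched as a small state machine. The phase names mirror the list in the paragraph; everything else (`Phase`, `next_phase`, `trace`) is hypothetical, and the sketch collapses per-step inspection into a single inspect phase per run segment.

```python
from enum import Enum, auto


class Phase(Enum):
    GATHER_CONTEXT = auto()
    COMPILE_STEPS = auto()
    START_RUN = auto()
    INSPECT_STATE = auto()
    PATCH_FAILURE = auto()
    RESUME = auto()
    DONE = auto()


def next_phase(phase: Phase, failed: bool = False) -> Phase:
    """Advance the loop; `failed` only matters when inspecting runtime state."""
    if phase is Phase.GATHER_CONTEXT:
        return Phase.COMPILE_STEPS
    if phase is Phase.COMPILE_STEPS:
        return Phase.START_RUN
    if phase in (Phase.START_RUN, Phase.RESUME):
        return Phase.INSPECT_STATE
    if phase is Phase.INSPECT_STATE:
        return Phase.PATCH_FAILURE if failed else Phase.DONE
    if phase is Phase.PATCH_FAILURE:
        return Phase.RESUME
    return Phase.DONE


def trace(failures: list[bool]) -> list[Phase]:
    """Walk the loop; each entry in `failures` is the outcome of one inspection."""
    outcomes = iter(failures)
    phase = Phase.GATHER_CONTEXT
    seen = [phase]
    while phase is not Phase.DONE:
        failed = next(outcomes, False) if phase is Phase.INSPECT_STATE else False
        phase = next_phase(phase, failed)
        seen.append(phase)
    return seen
```

For example, `trace([True, False])` passes through `PATCH_FAILURE` and `RESUME` once before reaching `DONE`. The state has to carry across several calls, which is why a single request/response tool call fits this poorly.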

In practice, Claude builds an initial plan and sends it through the CLI to our backend. A remote worker runs it against the live app in a cloud browser. The remote agent turns Claude’s steps into real actions on the product, figuring out the right element to click and modifying steps as needed. After each step, or on failure, the Decipher agent sends structured state back to Claude: what step ran, what the agent did, what state the page is in, what kind of failure happened, and the artifacts needed to repair it. Claude can then chime in and make changes.
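As a rough illustration of the structured state described above, a per-step report might look like the following. The field names and example values are assumptions made up for this sketch and do not reflect the actual wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class StepReport:
    step_index: int                      # which step ran
    action_taken: str                    # what the remote agent did
    page_state: str                      # summary of the page after the step
    failure_kind: Optional[str] = None   # None when the step succeeded
    artifacts: list = field(default_factory=list)  # e.g. screenshot or DOM snapshot references


# Made-up example of a failed step.
report = StepReport(
    step_index=3,
    action_taken="clicked 'Place order'",
    page_state="checkout page, login modal visible",
    failure_kind="unexpected_modal",
    artifacts=["screenshot-step-3.png"],
)

# Serialized form that could be handed back to the planner.
payload = json.dumps(asdict(report))
```

Keeping the failure kind and artifacts in the same record as the step index means the planner can patch just the failing segment instead of regenerating the whole test.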

Feel free to give it a try. We'd greatly appreciate any feedback you might have.

Comments

anvithA•1h ago

How are you defining test coverage and how do you know if all the possible user flows are being tested?

mrosenfield•49m ago

For us, coverage means the important user journeys are covered and stay covered. We start with likely flows from the codebase, then let users point the agent at the journeys that matter most. We already run session replay agents in production to find bugs, and we’re now using that same system to spot coverage gaps and generate tests for missing flows.
