
Show HN: We analyzed 1,573 Claude Code sessions to see how AI agents work

https://github.com/obsessiondb/rudel
85•keks0r•2h ago
We built rudel.ai after realizing we had no visibility into our own Claude Code sessions. We were using it daily but had no idea which sessions were efficient, why some got abandoned, or whether we were actually improving over time.

So we built an analytics layer for it. After connecting our own sessions, we ended up with a dataset of 1,573 real Claude Code sessions, 15M+ tokens, 270K+ interactions.

Some things we found that surprised us:

- Skills were used in only 4% of our sessions
- 26% of sessions are abandoned, most within the first 60 seconds
- Session success rate varies significantly by task type (documentation scores highest, refactoring lowest)
- Error cascade patterns appear in the first 2 minutes and predict abandonment with reasonable accuracy
- There is no meaningful benchmark for "good" agentic session performance, so we are building one
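To make the abandonment figures concrete, here is a minimal sketch of how stats like "26% abandoned, most within 60 seconds" could be computed from session records. The record fields (`completed`, `duration_s`) are invented for illustration and are not rudel's actual schema:

```python
# Hypothetical sketch: abandonment stats over a list of session records.
# Field names (completed, duration_s) are made up for illustration.

def abandonment_stats(sessions):
    """Return (abandon_rate, share_of_abandoned_within_60s)."""
    if not sessions:
        return 0.0, 0.0
    abandoned = [s for s in sessions if not s["completed"]]
    early = [s for s in abandoned if s["duration_s"] < 60]
    rate = len(abandoned) / len(sessions)
    early_share = len(early) / len(abandoned) if abandoned else 0.0
    return rate, early_share

sessions = [
    {"completed": True, "duration_s": 900},
    {"completed": False, "duration_s": 45},
    {"completed": False, "duration_s": 30},
    {"completed": True, "duration_s": 1200},
]
rate, early = abandonment_stats(sessions)
print(f"abandoned: {rate:.0%}, of which within 60s: {early:.0%}")
```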

The tool is free to use and fully open source, happy to answer questions about the data or how we built it.

Comments

lau_chan•2h ago
Does it work for Codex?
keks0r•1h ago
Yes, we added Codex support, but it's not yet extensively tested. Session upload works, but we still have to QA all the analytics extraction.
cluckindan•2h ago
Nice. Now, to vibe myself a locally hosted alternative.
vidarh•1h ago
I was about to say they have a self-hosting guide, but I see they use third-party services that seem absolutely pointless for such a tiny dataset. For comparison, I have a project that happily analyzes 150 million tokens worth of Claude session data, with some basic caching in plain text files, on a $300 mini PC in seconds... If/when I reach billions, I might throw SQLite into the stack. Maybe once I reach tens of billions, something bigger will be worthwhile.
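The local, in-process approach vidarh describes can be sketched in a few lines. This assumes Claude Code's JSONL transcript layout under `~/.claude/projects`; both the path and the record fields vary by version and should be treated as assumptions:

```python
# Minimal sketch of in-process session analytics over local JSONL
# transcripts. Assumes the ~/.claude/projects layout and a per-record
# message.usage.output_tokens field; treat both as assumptions.
import json
from collections import Counter
from pathlib import Path

def tally_sessions(root: Path) -> Counter:
    """Count sessions, records, and output tokens under root."""
    stats = Counter()
    for f in root.glob("*/*.jsonl"):
        stats["sessions"] += 1
        for line in f.read_text().splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            stats["records"] += 1
            msg = rec.get("message")
            if isinstance(msg, dict):
                usage = msg.get("usage", {})
                stats["output_tokens"] += usage.get("output_tokens", 0)
    return stats

if __name__ == "__main__":
    print(tally_sessions(Path.home() / ".claude" / "projects"))
```

No database or third-party service involved; a flat scan like this stays fast well into the hundreds of millions of tokens.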
keks0r•1h ago
There is also a docker setup in there to run everything locally.
vidarh•1h ago
That's great. It's still over-engineered given processing this data in-process is more than fast enough at a scale far greater than theirs.
keks0r•1h ago
The docker-compose contains everything you should need: https://github.com/obsessiondb/rudel/blob/main/docker-compos...
marconardus•1h ago
It might be worthwhile to include an example run in your readme.

I scrolled through and didn’t see enough to justify installing and running a thing

keks0r•1h ago
Ah sorry, the readme is more about how to run the repo. The "product" information is on the website instead: https://rudel.ai
152334H•1h ago
is there a reason, other than general faith in humanity, to assume those '1573 sessions' are real?

I do not see any link or source for the data. I assume it is to remain closed, if it exists.

keks0r•1h ago
It's our own sessions, from our team, over the last 3 months. We used them to develop the product and learn about our usage. You are right, they will remain closed. But I am happy to share aggregated information if you have specific questions about the dataset.
languid-photic•4m ago
it's reasonable to note that without sharing the data these findings can't be audited or built upon

but I think the prior on "this team fabricated these findings" is very low

ekropotin•1h ago
> That's it. Your Claude Code sessions will now be uploaded automatically.

No, thanks

keks0r•1h ago
It will only be enabled for the repo where you called the `enable` command. Or use the CLI `upload` command for specific sessions.

Or you can run your own instance, but we will need to add docs on how to configure the endpoint properly in the CLI.

tgtweak•1h ago
Big ask to expect people to upload their claude code sessions verbatim to a third party with nothing on site about how it's stored, who has access to it, who they are... etc.
keks0r•57m ago
We don't expect anything; we put it out there, and hopefully we can build trust over time. Maybe you don't trust us, and that's fair: you can still run it yourself. We are happy about everyone trying it out, hosted or not. We host it just to make it easier for people who want to try it, but you don't have to. You have a good point, though: we should probably put more about this on the website. Thanks.
emehex•1h ago
For those unaware, Claude Code comes with a built in /insights command...
keks0r•1h ago
Ohh, this is exciting, I had kind of overlooked it. I assume there are still a lot of differences, especially across teams. But I ran it immediately when I saw your comment; it's actually still running.
loopmonster•1h ago
insights is straight ego fluffing - it just tells you how brilliant you are and the only actionable insights are the ones hardcoded into the skill that appear for everyone. things like be very specific with the success criteria ahead of time (more than any human could ever possibly be), tell the llm exactly what steps to follow to the letter (instead of doing those steps yourself), use more skills (here's an example you can copy paste that has 2 lines and just tells it to be careful), and a couple of actually neat ideas (like having it use playwright to test changes visually after a UI change)
evrendom•13m ago
true, the best comes out of it when one uses claude code and codex as a tag team
vova_hn2•1h ago
It is so sad that on top of black-box LLMs we also build all these tools that are pretty much black boxes as well.

It has become very hard to understand exactly what is sent to the LLM as input/context and how exactly the output is processed.

keks0r•1h ago
The tool does have a quite detailed view for individual sessions, which lets you understand input and output much better, but obviously it's still mysterious how the output is generated from that input.
blef•1h ago
Reminds me of https://www.agentsview.io/.
keks0r•1h ago
Our focus is a little bit more cross-team, and in our internal version we also have some continuous improvement monitoring, which we will probably release as well.
mentalgear•56m ago
> A local-first desktop and web app for browsing, searching, and analyzing your past AI coding sessions. See what your agents actually did across every project.

Thanks for the link, sounds great!

KaiserPister•1h ago
This is awesome! I’m working on the Open Prompt Initiative as a way for open source to share prompting knowledge.
keks0r•1h ago
Cool, what's the link? We have some learnings, especially in the "skill guiding" part of our example.
alyxya•1h ago
Why does it need login and cloud upload? A local cli tool analyzing logs should be sufficient.
keks0r•1h ago
We used it across the team, and when you want to bring metrics together across multiple people, it's easier on a server than locally.
anthonySs•1h ago
Is this observability for your Claude Code calls, or specifically for high-level insights like skill usage?

Would love to know your actual day-to-day use case for what you built.

keks0r•1h ago
The skill usage was one of these "I am wondering about..." things, and we just prompted it into the dashboard to understand it. We have a few of these hunches where it's easier to analyze sessions from everyone together, to understand similarities as well as differences, and we answered a few of those one-off questions this way. Ongoing, we are also using our "learning" tracking a lot, which is not really usable right now because it integrates with a few of our other things, but we are planning to release it soon as well. The single-session view also sometimes helps to debug a session and then better guide a "learning". So it's a mix of different things. Since we have multiple projects, we can even derive how much we are working on each project, and it maps better than our Linear points :)
sriramgonella•1h ago
This kind of dataset is really valuable because most conversations about AI coding tools are based on anecdotes rather than actual usage patterns. I’d be curious about a few things from the sessions:

1. how often developers accept vs modify generated code
2. which tasks AI consistently accelerates (tests, refactoring, boilerplate?)
3. whether debugging sessions become longer or shorter with AI assistance

My experience so far is that AI is great for generating code but the real productivity boost comes when it helps navigate large codebases and reason about existing architecture.

keks0r•59m ago
1. Can only partly be answered, because we can only capture the "edits" that are prompted vs manual ones.
2. For us, actually all of them, since we do everything with AI and invest heavily and continuously to reduce the number of iterations we need.
3. That's a good one. We don't have anything specific for debugging yet, but it might be an interesting class for a type of session.
mentalgear•57m ago
How diverse is your dataset?
keks0r•55m ago
Team of 4 engineers, 1 data & business person, 1 design engineer.

I would say roughly equal amount of sessions between them (very roughly)

Also, maybe 40% of coding sessions are in a large brownfield project, 50% greenfield, and the remaining 10% are non-coding tasks.

Aurornis•43m ago
> 26% of sessions are abandoned, most within the first 60 seconds

Starting new sessions frequently and using separate new sessions for small tasks is a good practice.

Keeping context clean and focused is a highly effective way to keep the agent on task. Having an up-to-date AGENTS.md should let new sessions get into simple tasks quickly, so you can use single-purpose sessions for small tasks without carrying the baggage of a long past context into them.
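A minimal AGENTS.md along these lines might look like the following (contents purely illustrative, not from the thread; adapt the commands and paths to your repo):

```markdown
# AGENTS.md

## Project
TypeScript monorepo; app code in `packages/app`, shared libs in `packages/lib`.

## Commands
- Install: `pnpm install`
- Test: `pnpm test` (run before every commit)
- Lint: `pnpm lint --fix`

## Conventions
- Keep changes small and single-purpose; one task per session.
- Never edit generated files under `dist/`.
```

The point is that a new session can orient itself from this file alone, instead of inheriting a long prior context.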

longtermemory•5m ago
I agree. In my experience, "single-purpose sessions for small tasks" is the key.
dmix•43m ago
I've seen Claude ignore important parts of skill/agent files multiple times. I was running a cleanup SKILL.md on a hundred markdown files, manually in small groups of 5, and about half the time it listened and ran the skill as written. The other half, it would spend two minutes trying to understand the codebase, looking for markdown-related stuff for no good reason, before reverting to what the skill said.

LLMs are far from consistent.

keks0r•41m ago
Yes, we had to tune the CLAUDE.md and the skill trigger quite a bit to get it much better. But to be honest, 4.6 also improved it quite a bit. Did you run into your issues under 4.5 or 4.6?
dmix•22m ago
I was using Sonnet 4.6 since it was a menial task
cbg0•38m ago
Try this: keep your CLAUDE.md as simple as possible, disable skills, and ask Opus to start a subagent for each file, processing at most 10 at a time (so you don't get rate limited). Give it the instructions from the skill as a prompt for whatever processing you're doing to the markdown files, and see if that helps.
longtermemory•8m ago
From session analysis, it would be interesting to understand how crucial the documentation, the level of detail in CLAUDE.md, is. It seems to me that sometimes documentation (that's too long and often out of date) contributes to greater entropy rather than greater efficiency of the model and agent.

It seems to me that sometimes it's better and more effective to remove, clean up, and simplify (both from CLAUDE.md and the code) rather than having everything documented in detail.

Therefore, from session analysis, it would be interesting to identify the relationship between documentation in CLAUDE.md and model efficiency. How often does the developer reject the LLM output in relation to the level of detail in CLAUDE.md?

socialinteldev•6m ago
We run an API that AI agents use via MCP (Instagram influencer search). From the server side we see the inverse of this study -- what agents do when out in the wild using tools, not coding.

A few data points from two weeks live:

- Agents send well-formed queries with specific parameters (country, gender, follower filters), which suggests the reasoning chain before tool calls is solid
- 50 unique agents hit our 402 response; only 1 converted to paying. The USDC friction kills conversion even when agent intent is clear
- Node.js-based MCP clients dominate paid traffic vs Python

Your 26% session abandonment matches something we see: many hits are single 402 responses with no retry. The agent discovers the API, gets the payment requirement, but has no wallet configured to complete the transaction. It is essentially agent session abandonment at the payment step.

Would be curious whether your data shows patterns in external tool call failures vs successes -- whether agents retry on failure or abandon.
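The "single 402, no retry" pattern described above is easy to flag from access logs. A minimal sketch, with an invented log schema (`agent_id`, `status`); the real server's fields will differ:

```python
# Hypothetical sketch: flag agent sessions whose only interaction was a
# single 402 (Payment Required) with no retry, i.e. abandonment at the
# payment step. Log schema (agent_id, status) is invented.
from collections import defaultdict

def payment_step_abandoners(log):
    """Return agent ids whose entire history is exactly one 402."""
    by_agent = defaultdict(list)
    for entry in log:
        by_agent[entry["agent_id"]].append(entry["status"])
    return {agent for agent, statuses in by_agent.items()
            if statuses == [402]}

log = [
    {"agent_id": "a1", "status": 402},
    {"agent_id": "a2", "status": 402},
    {"agent_id": "a2", "status": 200},  # retried after paying
    {"agent_id": "a3", "status": 200},
]
print(payment_step_abandoners(log))  # a1 gave up at the paywall
```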

robutsume•4m ago
The 26% abandonment rate with most bailouts in the first 60 seconds is the most interesting finding here. That's not an agent problem — that's a prompt-to-intent mismatch problem. The human realizes within one exchange that they asked the wrong question or the agent interpreted it wrong.

The error cascade pattern predicting abandonment within 2 minutes reminds me a lot of monitoring infrastructure services. We learned years ago that the first 90 seconds of a deployment tell you almost everything about whether it'll stick. Same principle seems to apply to agent sessions — if the initial tool selection or file read is wrong, confidence collapses and the human takes over manually.

Curious about the skills finding (4%). Is that because skills are poorly discoverable, or because people default to natural language prompts and never bother setting up structured skill definitions? The gap between "available" and "used" in agent tooling feels like the extension ecosystem problem all over again.

mbesto•3m ago
So what conclusions have you drawn or could a person reasonably draw with this data?
