Why we built our own background agent

https://builders.ramp.com/post/why-we-built-our-background-agent

122•jrsj•3w ago

Comments

sdwr•3w ago

The commitment to reducing friction is really incredible. Are they implying that any developer could recreate the system with AI from the description?

inssein•3w ago

Probably the best internal AI platform I've seen to date, incredible work.

ostegm•3w ago

This is a great writeup! Could you share more about the sandbox <-> client communication architecture? e.g., is the agent emitting events to a queue/topic, writing artifacts to object storage, and the client subscribes; or is it more direct (websocket/gRPC) from the sandbox? I’ve mostly leaned on sandbox.exec() patterns in Modal, and I’m curious what you found works best at scale.

mootoday•3w ago

After reading the article, I built a tool like that with sprites.dev. There's a websocket to communicate stdout and stderr to the client.

Web app submits the prompt, a sandbox starts on sprites.dev and any Claude output in the sandbox gets piped to the web app for display.

Not sure I can open source it as it's something I built for a client, but ask if you have any questions.
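
A minimal sketch of the stdout/stderr piping described above, assuming a Python backend. It is not taken from the tool in question or from sprites.dev: `run_and_stream`, `_pump`, and the message shape are hypothetical names, and `send` is a stand-in for whatever WebSocket send call the server would actually use.

```python
import asyncio
import sys
from typing import Awaitable, Callable

# Hypothetical stand-in for a WebSocket "send JSON" call.
Send = Callable[[dict], Awaitable[None]]


async def _pump(stream: asyncio.StreamReader, name: str, send: Send) -> None:
    # Forward each line to the client as soon as the process writes it.
    while True:
        line = await stream.readline()
        if not line:  # EOF: the process closed this pipe
            break
        await send({"stream": name, "data": line.decode(errors="replace")})


async def run_and_stream(cmd: list[str], send: Send) -> int:
    # Start the command with both output pipes captured.
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Drain stdout and stderr concurrently so neither pipe can fill up and block.
    await asyncio.gather(
        _pump(proc.stdout, "stdout", send),
        _pump(proc.stderr, "stderr", send),
    )
    return await proc.wait()


async def demo() -> tuple[int, list[dict]]:
    # Collect messages in a list instead of sending them over a real socket.
    received: list[dict] = []

    async def fake_send(msg: dict) -> None:
        received.append(msg)

    code = await run_and_stream(
        [sys.executable, "-c",
         "import sys; print('out'); print('err', file=sys.stderr)"],
        fake_send,
    )
    return code, received
```

In a real server, `fake_send` would be replaced by the connection's send method, and the command would be the agent process running inside the sandbox.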

memset•3w ago

I work at Ramp and have always been on the “luddite” side of AI code tools. I use them, but usually I’m not that impressed, and I’m a curmudgeon when I see folks ask Claude to debug something instead of just reading the code. I’m just an old(er) neckbeard at heart.

But. This tool is scarily good. I’m seeing it “1-shot” features in a fairly sizable code base and fixes with better code and accuracy than me.

keyle•3w ago

This basically sums up where we're at. Undeniably useful but careful in approach.

ColinEberhardt•3w ago

An important point here is that it isn't doing a 1-shot implementation; it is solving the problem over multiple iterations, with a closed feedback loop.

Create the right agentic feedback loop and a reasoning model can perform far better through iteration than its first 1-shot attempt.

This is very human. How much code can you reliably write without any feedback? Very little. We iterate, guided by feedback (compiler, linter, executing and exploring).
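
A rough sketch of the kind of closed feedback loop described above, under stated assumptions: `propose` stands in for a model call, `check` stands in for the compiler, linter, or tests, and every name here is hypothetical. The stub proposer just replays three canned drafts so the loop can be run without a model.

```python
from typing import Callable, Optional

Propose = Callable[[Optional[str]], str]   # feedback from last attempt -> new candidate
Check = Callable[[str], Optional[str]]     # candidate -> error message, or None if it passes


def iterate_with_feedback(propose: Propose, check: Check,
                          max_iters: int = 5) -> tuple[str, int]:
    # Ask for a candidate, check it, and feed any failure into the next attempt.
    feedback: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        candidate = propose(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, attempt
    raise RuntimeError(f"no passing candidate after {max_iters} attempts: {feedback}")


def check_source(source: str) -> Optional[str]:
    # Stand-in for compiler + tests: the candidate must define add() with add(2, 3) == 5.
    try:
        namespace: dict = {}
        exec(compile(source, "<candidate>", "exec"), namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None


def make_stub_proposer() -> Propose:
    # Canned drafts in place of a model; a real agent would use the feedback text.
    drafts = iter([
        "def add(a, b) return a + b",        # fails to compile
        "def add(a, b):\n    return a - b",  # compiles, fails the test
        "def add(a, b):\n    return a + b",  # passes
    ])

    def propose(feedback: Optional[str]) -> str:
        return next(drafts)

    return propose
```

Calling `iterate_with_feedback(make_stub_proposer(), check_source)` walks through the two failing drafts before accepting the third, which is the point of the comment: the first attempt alone would have been rejected.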

yomismoaqui•3w ago

This Xmas a lot of programmers were converted after having some free time and playing with things like Codex, Claude Code, AMP...

ManuelKiessling•3w ago

I guess we all know and „love“ how every five minutes, some breathless hipster influencer posts „This changes everything!!!“ to every new x.y.1 AI bubble increment.

But honestly? This here really is something.

I can vividly imagine how in a not too far future, there will only be two types of product companies: those that work like this, and those that don’t — and vanish.

Edit: To provide a less breathless take myself:

What I can very realistically imagine is that just like today sane and level-headed startups go „let’s first set up some decent infrastructure-as-code, a continuous delivery pipeline, and a solid testing framework, and then start building the product for good“, in the future sane and level-headed startups will go „let’s first set up some decent infrastructure-as-code, a continuous delivery pipeline, a solid testing framework, and a Ramp-style background agent — and then start building the product for good“.

llms01•3w ago

Yeah I feel somewhat the same way. This looks like some serious engineering effort went into it, and it looks like there should be a way to measure its impact on developer productivity and quality of output. I'm a bit hesitant considering finance is not an industry you want to introduce security problems in, but nonetheless will be a good test of these tools.

If it really does work I expect there will be many paid and open source variants that other companies can adopt into their workflows. So I'll patiently wait for the outcomes before trying something like this, but I'm glad someone is.

383toast•3w ago

the chrome extension bit is super interesting and well thought out

383toast•3w ago

i wonder what percentage of PRs etc is now from non eng?

cloudking•3w ago

We use https://devin.ai for this and it works very well. Devin has its own virtual environment, IDE, terminal and browser. You can configure it to run your application and connect to whatever it needs. Devin can modify the app, test changes in the browser and send you a screen recording of the working feature with a PR.

martypitt•3w ago

Interestingly, Devin lists Ramp (the OP) as a customer on their front page.

Surprised they need both.

martypitt•3w ago

This is a really great post - and what they've built here is very impressive.

I wonder if we're at the point where building and maintaining this yourselves (assisted by an AI copilot) is now more cost-effective than buying an off-the-shelf product?

It feels like there's a LOT of moving parts here, but also it's deeply tailored to their own setup.

FWIW - I tried pointing Claude at the post and asking it to design an implementation (like the post said to do), and it struggled - but perhaps I prompted it wrong.

heffstaDug•3w ago

I had this exact idea. I pointed Codex at it and gave it context on our environment, which is pretty complex. It is struggling, but that is because the dev experience where I work is not great and not documented, so that would need to be improved before I can reliably get an agent setup as well integrated as the one this blog post details.

theturtletalks•3w ago

And here I am trying to get 1 terminal agent to control 4-5 other terminal agents

tosti•3w ago

Is overengineering the norm nowadays?

If you need a queue, lpd. If you need scheduling, cron. If you need backups, tar. If you need to communicate, email and irc. If you need to remote any of those, ssh.

Things shouldn't be difficult, yet they are.

hrimfaxi•3w ago

Do you have a specific critique or is this another dropbox-esque comment?

tosti•3w ago

I tried reading TFA but it was full of garbage.

falloutx•3w ago

This kind of project totally shows that Claude Code is nothing special; if anything, it lacks a lot of features. I hope every company develops a model-agnostic coding agent rather than using one tightly controlled by a single company.

willtemperley•3w ago

Yes. I don't think that one-size-fits-all is the future of coding agents. Different companies have different requirements. I would like to build specialised test harnesses that internal coding agents could use to iterate rapidly.

Also, inevitably these AI companies will start selling our data and become part of the surveillance state, if they're not already.

redman25•3w ago

It's really a shame because Anthropic had a lot of opportunity to show goodwill by open sourcing Claude Code.

aschen•3w ago

Reading this article and discovering how the Ramp team uses Modal for sandboxed dev environments just saved us weeks of custom infra development and potentially months of headaches. Thank you!

yoav•3w ago

Fun marketing experiment, but you basically implemented Ralph Wiggum in the cloud.

Claude Code running locally in a VM and/or with worktrees will 1-shot far better without burning cloud infra cash.

I’d bet this ends up wasting more money and time than it’s worth in practice.

suralind•3w ago

Definitely impressive. How many engineering hours did you need to build an MVP?

mootoday•3w ago

I built something similar after reading the blog post, based on sprites.dev.

A day of work to get the prototype working and a few hours the next day to allow multiple users to authenticate.

It's surprisingly simple.
