A lot of friends tell me it’s impossible. At the end of the day these models are just gradient descent and probability machines — basically a very sophisticated Markov chain. Obviously I’m exaggerating lol, but you get the idea.
Still, I wanted to see if better infrastructure around the model could push agents a little closer to being reliable independent coders instead of assistants with severe amnesia.
After reading a bunch of papers on the topic, I started experimenting with persistent memory and guided learning for agents.
The idea is simple:
Every action an agent takes gets stored. When future agents work on the same repo, they receive condensed “tips and tricks” derived from past successes and failures.
These memories are embedded so a supervisor layer can semantically search for relevant context and inject it into the prompt without blowing up the context window.
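A minimal sketch of what that retrieval step could look like. Everything here is hypothetical: the function names, the toy bag-of-words "embedding" (standing in for a real embedding model), and the sample tips are all illustrative, not the actual system.

```python
import math
from collections import Counter

# Toy "embedding": a bag-of-words vector. A real system would call an
# embedding model here; this just makes the sketch self-contained.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Condensed "tips and tricks" distilled from past agent runs on this repo.
memories = [
    "Run the migration script before touching the ORM models.",
    "The CI fails if tests import from src/ directly.",
    "Use the shared HTTP client; raw requests bypass retry logic.",
]
memory_vecs = [embed(m) for m in memories]

def relevant_tips(task: str, k: int = 2) -> list[str]:
    """Return the k stored memories most similar to the task description."""
    scored = sorted(
        zip(memory_vecs, memories),
        key=lambda mv: cosine(embed(task), mv[0]),
        reverse=True,
    )
    return [m for _, m in scored[:k]]

# The supervisor layer would prepend only these few lines to the agent's
# prompt, instead of dumping the full history into the context window.
tips = relevant_tips("fix failing CI tests that import from src")
```

The point of the top-k cutoff is exactly the context-window concern above: the agent sees a handful of relevant tips, not the whole memory store.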
I'm also experimenting with letting agents communicate with each other so multiple tasks can run in parallel — even if they overlap in code — and resolve conflicts themselves.
To test this, I built a small site.
Right now it's very bare bones, but it lets you connect:
a GitHub repo
an Anthropic API key
Linear issues
Then you create issues labeled gradient-agent in the To-Do column and the agents start working on them.
The more you iterate on a repo, the more “learning” accumulates around it. In theory this should make future PRs more stable.
The long-term goal is a system that can plan, design, code, test, and review an entire PR on its own without constant prompting.
Now the unfortunate part.
I’m a poor university student and this infrastructure is not exactly cheap lol. I added monthly free credits so people can test it, but once those run out you either wait for the reset or pay as you go. (LLM API costs still fall on you.)
Also: it’s very much a beta, so be careful with extremely complex tasks.
I’m considering hosting something like qwen3.5:122b so people can test without their own API keys. Since credits would then cover inference too, they’d run out faster, but it might still work out cheaper overall than paying per-token API prices. Curious if people would want that.
Site: https://usegradient.dev
Originally I planned to limit testing to ~20 people, but honestly I'm curious how it behaves with more users.
Pretty please don't DDOS me.
Any feedback is appreciated.