Oh cool!
> in well-tested codebases
Oh ok never mind
But also as a slightly deeper observation - agentic coding tools really do benefit significantly from good test coverage. Tests are a way to “box in” the agent and allow it to check its work regularly. While they aren’t necessary for these tools to work, they can enable coding agents to accomplish a lot more on your behalf.
(I work on Copilot coding agent)
They also have a tendency to suppress errors instead of fixing them, especially when the right thing to do is throw an error on some edge case.
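A toy illustration of the pattern (hypothetical code, not from any agent's actual output): the first function "handles" the edge case by swallowing it, when what you usually want is the second, which fails loudly so the problem surfaces.

    # The suppression anti-pattern: an edge case "handled" by hiding it.
    def parse_port(value: str) -> int:
        try:
            return int(value)
        except ValueError:
            return 0  # bad input now silently "works" on port 0

    # What the edge case usually calls for: raise so the problem surfaces.
    def parse_port_strict(value: str) -> int:
        try:
            return int(value)
        except ValueError as exc:
            raise ValueError(f"invalid port {value!r}; expected an integer") from exc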
I gave up trying to watch the stream after the third authentication timeout, but if I'd known it was this I'd maybe have tried a fourth time.
I’d love for this to blow past Cursor. Will definitely tune in to see it.
I'm senior enough that I get to frequently see the gap between what my dev team thinks of our work and what actual customers think.
As a result, I no longer care at all what developers (including myself on my own projects) think about the quality of the thing they've built.
In Copilot coding agent, we encourage the agent to be very thorough in its work and to take time to think deeply about the problem. It builds and tests code regularly to ensure it understands the impact of changes as it makes them, and stops and thinks regularly before taking action.
These choices would feel too “slow” in a synchronous, IDE-based experience, but feel natural in an “assign to a peer collaborator” UX. We lean into this to provide as rich a problem-solving agentic experience as possible.
(I’m working on Copilot coding agent)
I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-reflection on these aspects if you ask about the quality of the code or whether there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.
I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)
I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
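Even a non-AI version of this is doable today. As a sketch (the layer names and module paths here are invented, and tools like import-linter already do a more complete rule-based version of this), you can flag imports that cross layer boundaries:

    # Minimal "architecture linter": flag imports that reach into a higher layer.
    # The layer map is invented; adapt it to your own architecture definition.
    import ast
    import pathlib

    LAYERS = {"myapp/domain": 0, "myapp/services": 1, "myapp/api": 2}

    def layer_of(path: str) -> int | None:
        for prefix, rank in LAYERS.items():
            if path.startswith(prefix):
                return rank
        return None

    def check(root: str = ".") -> list[str]:
        violations = []
        for file in pathlib.Path(root).rglob("*.py"):
            src_layer = layer_of(str(file))
            if src_layer is None:
                continue
            tree = ast.parse(file.read_text(), filename=str(file))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    modules = [alias.name for alias in node.names]
                elif isinstance(node, ast.ImportFrom):
                    modules = [node.module or ""]
                else:
                    continue
                for module in modules:
                    dep_layer = layer_of(module.replace(".", "/"))
                    if dep_layer is not None and dep_layer > src_layer:
                        violations.append(f"{file}: imports {module} from a higher layer")
        return violations

    if __name__ == "__main__":
        for v in check():
            print(v)

The AI-driven version would presumably extract rules like these from a prose architecture document instead of a hardcoded map.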
Everything old is new again!
This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.
LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.
Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.
Some well-paid developers will excuse this with, "Well, if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents".
Which is true; however, there's a big caveat: time saved isn't time gained.
You can "save" 1,000 hours every night, but you don't actually get those 1,000 hours back.
Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).
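A typical session might look something like this (file names and task invented): /add pulls just the relevant files into context, /drop releases a file once you're done with it, and /clear wipes the chat history between tasks.

    $ aider
    > /add src/billing.py tests/test_billing.py
    > refactor the proration logic to handle mid-cycle upgrades
    > /drop tests/test_billing.py
    > /clear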
1 - https://github.com/plandex-ai/plandex
Also, a bit more on auto vs. manual context management in the docs: https://docs.plandex.ai/core-concepts/context-management
For example, it (Gemini 2.5) really struggles with newer parts of the ecosystem, like FastAPI, when wiring libraries like SQLAlchemy, pytest, playwright-python, etc., together.
I find more value in bootstrapping myself, and then using it to help with boilerplate once an effective safety harness is in place.
In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.
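For what it's worth, the wiring in question is roughly the following; a minimal sketch with invented model and route names, following the common FastAPI dependency-override testing recipe:

    # Minimal FastAPI + SQLAlchemy + pytest wiring (all names invented).
    from fastapi import Depends, FastAPI
    from fastapi.testclient import TestClient
    from sqlalchemy import create_engine, select
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column, sessionmaker
    from sqlalchemy.pool import StaticPool

    class Base(DeclarativeBase):
        pass

    class User(Base):
        __tablename__ = "users"
        id: Mapped[int] = mapped_column(primary_key=True)
        name: Mapped[str]

    engine = create_engine("sqlite:///./app.db")
    SessionLocal = sessionmaker(bind=engine)
    app = FastAPI()

    def get_db():
        # Yield-style dependency so the session is always closed after the request.
        db = SessionLocal()
        try:
            yield db
        finally:
            db.close()

    @app.get("/users")
    def list_users(db: Session = Depends(get_db)):
        return [{"id": u.id, "name": u.name} for u in db.scalars(select(User))]

    # pytest side: swap the real database for a shared in-memory one.
    def test_list_users():
        test_engine = create_engine(
            "sqlite://",
            connect_args={"check_same_thread": False},
            poolclass=StaticPool,  # one shared connection, so the in-memory DB persists
        )
        Base.metadata.create_all(test_engine)
        TestSession = sessionmaker(bind=test_engine)

        def override_get_db():
            db = TestSession()
            try:
                yield db
            finally:
                db.close()

        app.dependency_overrides[get_db] = override_get_db
        with TestSession() as db:
            db.add(User(name="alice"))
            db.commit()
        client = TestClient(app)
        assert client.get("/users").json() == [{"id": 1, "name": "alice"}]

The StaticPool detail is exactly the kind of thing the models tend to miss: without it, each connection gets its own empty in-memory database and the test fails mysteriously.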
Bounds, bounds, bounds, bounds. The important part for humans seems to be maintaining boundaries for the AI. If your well-tested codebase has tests that were themselves built through AI, it's probably not going to work.
I think it's somewhat telling that they can't share numbers for how they're using it internally. I want to know that Microsoft, the company famous for dogfooding, is using this day in and day out, with success. There's real stuff in there, and my brain has an insanely hard time separating the trillion dollars of hype from the usefulness.
In any case, I think this is the best use case for AI in programming—as a force multiplier for the developer. It's to the benefit of both AI and humanity for AI to avoid diminishing the creativity, agency, and critical-thinking skills of its human operators. AI should be task-oriented, but high-level decision-making and planning should always be a human task.
So I think our use of AI for programming should remain heavily human-driven for the long term. Ultimately, its use should involve enriching humans’ capabilities over churning out features for profit, though there are obvious limits to that.
[0] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a...
The actual quote by Satya says, "written by software".
In this context, assuming that humans will still be able to do high level planning anywhere near as well as an AI, say 3-5 years out, is almost ludicrous.
Similar to Google. MS now requires devs to use AI.
So far, the agent has been used by about 400 GitHub employees in more than 300 of our repositories, and we've merged almost 1,000 pull requests contributed by Copilot.
In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
(Source: I'm the product lead at GitHub for Copilot coding agent.)
Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take that load off and free me up to work on the most interesting and complex problems.
What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
Where are we wrt the agent surveying open issues (say, via JIRA), evaluating which ones it would be most effective at handling, and taking them on, ideally with some check-in for confirmation?
Or, contrariwise, wrt having product management agents which do track and assign work?
The entire website was created by Claude Sonnet through Windsurf Cascade, but with the “Fair Witness” prompt embedded in the global rules.
If you regularly guide the LLM to “consult a user experience designer”, “adopt the multiple perspectives of a marketing agency”, etc., it will make rather decent suggestions.
I’ve been having pretty good success with this approach, granted mostly at the scale of starting the process with “build me a small educational website to convey this concept”.
I'm curious to know how many Copilot PRs were not merged and/or required human takeovers.
Every bullet hole in that plane is one of the 1,000 PRs contributed by Copilot. The missing dots, and the whole missing planes, are unaccounted for - i.e., "AI ruined my morning."
Really cool, thanks for sharing! Would you perhaps consider implementing something like these stats that aider keeps on "aider writing itself"? - https://aider.chat/HISTORY.html
That seemed to drop off the GitHub changelog after February. I’m wondering if that team got reallocated to the Copilot agent.
Copilot Workspace could take a task, implement it and create a PR - but it had a linear, highly structured flow, and wasn't deeply integrated into the GitHub tools that developers already use like issues and PRs.
With Copilot coding agent, we're taking all of the great work on Copilot Workspace, and all the learnings and feedback from that project, and integrating it more deeply into GitHub and really leveraging the capabilities of 2025's models, which allow the agent to be more fluid, asynchronous and autonomous.
(Source: I'm the product lead for Copilot coding agent.)
It seems Copilot could have really owned the vibe coding space. But that didn’t happen. I wonder why? Lots of ideas gummed up in organizational inefficiencies, etc?
But the upgraded Copilot was just in response to Cursor and Windsurf.
We'll see.
Good to see an official way of doing this.
I cancelled my Copilot subscription last week, and when it expires in two weeks I'll most likely shift to local models for autocomplete/simple stuff.
That said, months ago I did experience the kind of slow agent edit times you mentioned. I don't know where the bottleneck was, but it hasn't come back.
I'm on library WiFi right now, "vibe coding" (as much as I dislike that term) a new tool for my customers using Copilot, and it's snappy.
The Claude and Gemini models tend to be the slowest (yes, including Flash). 4o is currently the fastest but still not great.
Edit: from TFA: "Using the agent consumes GitHub Actions minutes and Copilot premium requests, starting from entitlements included with your plan."
[0] https://docs.github.com/en/copilot/managing-copilot/monitori...
Now Microsoft sits on a goldmine of source code and has the ability to offer AI integration even to private repositories. I can upload my code into a private repo and discuss it with an AI.
The only thing Google can counter with would be to build tools which developers install locally, but even then I guess that the integration would be limited.
And considering that Microsoft owns the "coding OS" VS Code, it makes Google look even worse. Let's see what they come up with tomorrow at Google I/O, but I doubt that it will be a serious competition for Microsoft. Maybe for OpenAI, if they're smart, but not for Microsoft.
I'm all for new tech getting introduced and made useful, but let's make it all opt in, shall we?
AMAZING
Especially now that Copilot supports MCP, I can plug in my own custom "Tools" (i.e., function calling done by the AI agent), and I have everything I need. Never even bothered trying Cursor or Windsurf, which I'm sure are great too, but _mainly_ because they're just forks of VS Code as the IDE.
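For anyone curious what that takes, here's a minimal sketch of a custom tool server using the MCP Python SDK (pip install mcp); the tool name and logic are invented, and you'd point Copilot/VS Code at it via your MCP config:

    # Minimal MCP server exposing one custom tool (illustrative stub).
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("my-tools")

    @mcp.tool()
    def lookup_order(order_id: str) -> str:
        """Look up an order in an internal system (stubbed for the example)."""
        return f"Order {order_id}: status=shipped"

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default, which MCP clients can launch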
To me, this reads like it'll be a good junior and open up a PR with its changes, letting you (the issue author) review and merge. Of course, you can just hit "merge" without looking at the changes, but then it's kinda on you when unreviewed stuff ends up in main.
Copilot literally can't push directly to the default branch - we don't give it the ability to do that - precisely because we believe that all AI-generated code (just like human generated code) should be carefully reviewed before it goes to production.
(Source: I'm the product lead for Copilot coding agent.)