Oh cool!
> in well-tested codebases
Oh ok never mind
But also as a slightly deeper observation - agentic coding tools really do benefit significantly from good test coverage. Tests are a way to “box in” the agent and allow it to check its work regularly. While they aren’t necessary for these tools to work, they can enable coding agents to accomplish a lot more on your behalf.
(I work on Copilot coding agent)
They also have a tendency to suppress errors instead of fixing them, especially when the right thing to do is throw an error on some edge case.
I gave up trying to watch the stream after the third authentication timeout, but if I'd known it was this I'd maybe have tried a fourth time.
I’d love for this to blow past cursor. Will definitely tune in to see it.
I'm senior enough that I get to frequently see the gap between what my dev team thinks of our work and what actual customers think.
As a result, I no longer care at all what developers (including myself on my own projects) think about the quality of the thing they've built.
In many cases, this means pushing for more stable deployments which requires other quality improvements.
In coding agent, we encourage the agent to be very thorough in its work, and to take time to think deeply about the problem. It builds and tests code regularly to ensure it understands the impact of changes as it makes them, and stops and thinks regularly before taking action.
These choices would feel too “slow” in a synchronous IDE-based experience, but feel natural in an “assign to a peer collaborator” UX. We lean into this to provide as rich of a problem solving agentic experience as possible.
(I’m working on Copilot coding agent)
I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-retrospection on these aspects if you ask about the quality of the code or if there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.
I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)
I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
Everything old is new again!
This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.
LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.
Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.
The only people who believe this level of AI marketing are the people who haven't yet used the tools.
> which means no one will be watching the cost in real time.
Maybe some day there's an agentic coding tool that goes off into the weeds and runs for days doing meaningless tasks until someone catches it and does a Ctrl-C, but the tools I've used are more likely to stop short of the goal than to continue crunching indefinitely.
Regardless, it seems like a common experience for first-timers to try a light task and then realize they've spent $3, instantly setting expectations for how easy it is to run up a large bill if you're not careful.
Some well-paid developers will excuse this with, "Well if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents".
Which is true, however there's a big caveat: Time saved isn't time gained.
You can "Save" 1,000 hours every night, but you don't actuall get those 1,000 hours back.
    five_min_rate = hourly_rate / 12           # what 5 minutes of your time costs
    savings = light_edit_cost < five_min_rate  # True when the edit paid for itself
What do you mean?
If I have some task that requires 1000 hours, and I'm able to shave it down to one hour, then I did just "save" 999 hours -- just in the same way that if something costs $5 and I pay $4, I saved $1.
You still get your 24 hours, no matter how much time you save.
What actually matters is the value of what is delivered, not how much time it actually saves you. Justifying costs by "time saved" is a good way to eat up your money on time-saving devices.
You could also say you saved 41.666 people an entire 24 hour day, by "saving 1000 hours", or some other fractional way.
The way you're trying to explain it as "saving 1000 hours each day" really doesn't make any sense without further context.
And I'm sure if I hadn't written this comment I would be saving 1000 hours on a stupid comment thread.
It's like those coupon booklets they used to sell. "Over $10,000 of savings!"
Yes but how much money do I have to spend in order to save $10,000?
There was this funny commercial in the 90s for some muffler repair chain that was having a promotion: "Save Fifty Dollars..."
The theme was "What will you do with the fifty dollars you saved?" And it was people going to Disneyland or a fancy dinner date.
The people (actors) believed they were receiving $50. They acted as if it was free money. Meanwhile there was zero talk about whether their cars needed muffler repair at all.
It's called "Thinking past the sale". It's a common sales tactic.
But the LLM bill will still invoice you for all the "saved" work regardless.
The more of my washing you can take off me, the more of your time you can save by then using a washing machine or laundry service!
Saving an hour of my time is a waste, when saving an hour of your time is worth so much more. So it makes economic sense for you to pay me, to take my washing off me!
( Does that better illustrate my point? )
Also there's no way you can build a business without providing value in this space. Buyers are not that dumb.
Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).
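A typical session then looks something like this (file names hypothetical; /add, /drop and /clear are real aider commands):

    /add src/routes.py src/services/users.py   # pull just the files relevant to the task into context
    # ...describe the change, review the diff...
    /drop src/routes.py                        # shed files once they're no longer needed
    /clear                                     # wipe chat history before starting the next task

Keeping the context that lean is most of what keeps the token spend down.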
1 - https://github.com/plandex-ai/plandex
Also, a bit more on auto vs. manual context management in the docs: https://docs.plandex.ai/core-concepts/context-management
I'd also recommend creating little `README`s in your codebase that are mainly written with aider as the intended audience. In them, I'll explain architecture, what code makes (non-)sense to write in this directory, and so on. Has the side-effect of being helpful for humans, too.
Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview + pointers to other README's), and whatever README is most relevant to the scope of my session. It's super productive.
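To sketch what that looks like (a hypothetical example, not my actual project), the top-level README can be as small as:

    # Project overview
    HTTP routing lives in /routes, business logic in /services, DB models in /models.
    Route modules never touch models directly; they call into /services.
    See routes/README.md and services/README.md for per-directory conventions.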
I've yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest DeepSeek models, and while I love the price (nearly 50x cheaper?), it's just far too error-prone compared to Sonnet 3.7. It generates solid plans / architecture discussions, but, unlike Sonnet, the code it generates is often confidently off-the-mark.
For example “this module contains logic defining routes for serving an HTTP API. We don’t write any logic that interacts directly with db models in these modules. Rather, these modules make calls to services in `/services`, which make such calls.”
It wouldn’t make sense to duplicate this comment across every router sub-module. And it’s not obvious from looking at any one module that this rule is applied across all modules, without such guidance.
These little bits of scaffolding really help narrow down the scope of the code that LLMs eventually try to write.
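To make that rule concrete, the code the LLM is being steered toward looks roughly like this (a minimal FastAPI-style sketch; the module and function names are hypothetical):

    # routes/users.py - route modules only translate HTTP requests into service calls
    from fastapi import APIRouter
    from services.users import get_user  # all db/model access lives in /services

    router = APIRouter()

    @router.get("/users/{user_id}")
    async def read_user(user_id: int):
        # no db logic here; delegate to the service layer
        return await get_user(user_id)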
My experience agrees that separating the README and the TODO is super helpful for managing context.
For example it (Gemini 2.5) really struggles with newer parts of the ecosystem like FastAPI when wiring together libraries like SQLAlchemy, pytest, python-playwright, etc.
I find more value in bootstrapping myself, and then using it to help with boilerplate once an effective safety harness is in place.
In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.
This is a popular workflow I first read about here[1].
This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course correct often.
[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
[1] https://notes.jessmart.in/My+Writings/Pair+Programming+with+...
READMEs per module also help, but it really depends a lot on the model. Gemini will happily traipse all over your codebase at random, gpt-4.1 will do inline imports inside functions because it seems to lack any sort of situational awareness, Claude so far gets things mostly right.
For now.
If this product is going to be successful they are going to need the bulk of their customers at 40-100k employees.
That doesn’t matter anymore when you’re vibe coding it. No human is going to look at it anyway.
It can all be if/else on one line in one file. If it works, and if the LLMs can work on it, iterate, and implement new business requirements while keeping performance and security - code structure, quality and readability don't matter one bit.
Customers don’t care about code quality and the only reason businesses used to care is to make it less money consuming to build and ship new things, so they can make more money.
This is a common view, and I think will be the norm on the near-to-mid term, especially for basic CRUD apps and websites. Context windows are still too small for anything even slightly complex (I think we need to be at about 20m before we start match human levels), but we'll be there before you know it.
Engineers will essentially become people who just guide the AIs and verify tests.
Then as the context window increases, it’s less and less of an issue
Initial cost was around $20 USD, which later grew to (mostly polishing) $40 with some manual work.
I intentionally picked a simple stack: HTML+JS+PHP.
A couple of things:
* I'd say I'm happy about the result from a product perspective
* The codebase could be better, but I could not care less in this case
* By default, AI does not care about security unless I specifically tell it to
* Claude insisted on using old libs. When I specifically told it to use the latest and greatest, it upgraded them but left code that works just with an old version. Also it mixed the latest DaisyUI with some old version of Tailwind CSS :)
On one hand it was super easy and fun to do, on the other hand if I was a junior engineer, I bet it would have cost more.
Bounds bounds bounds bounds. The important part for humans seems to be maintaining boundaries for AI. If your well-tested codebase has the tests built thru AI, it's probably not going to work.
I think it's somewhat telling that they can't share numbers for how they're using it internally. I want to know that Microsoft, the company famous for dog-fooding, is using this day in and day out, with success. There's real stuff in there, and my brain has an insanely hard time separating the trillion dollars of hype from the usefulness.
In any case, I think this is the best use case for AI in programming—as a force multiplier for the developer. It’s for the best benefit of both AI and humanity for AI to avoid diminishing the creativity, agency and critical thinking skills of its human operators. AI should be task oriented, but high level decision-making and planning should always be a human task.
So I think our use of AI for programming should remain heavily human-driven for the long term. Ultimately, its use should involve enriching humans’ capabilities over churning out features for profit, though there are obvious limits to that.
[0] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a...
The actual quote by Satya says, "written by software".
In this context, assuming that humans will still be able to do high level planning anywhere near as well as an AI, say 3-5 years out, is almost ludicrous.
Similar to Google. MS now requires devs to use AI.
I guess maybe different teams have different requirements/workflows?
I can imagine there are groups where it is true. I mostly want to push back on the idea that there's one monolithic culture that is Microsoft.
LLM use is now part of the annual review process. It's self-reported if I'm not mistaken, but at least at Microsoft they would have plenty of data to know how often you use the tools.
So far, the agent has been used by about 400 GitHub employees in more than 300 of our repositories, and we've merged almost 1,000 pull requests contributed by Copilot.
In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
(Source: I'm the product lead at GitHub for Copilot coding agent.)
Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take the load of that and free me up to work on the most interesting and complex problems.
What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
Those orgs that value high-quality documentation won’t have undocumented codebases to begin with.
And let’s face it, like writing code, writing docs does have a lot of repetitive, boring, boilerplate work, which I bet is exactly why it doesn’t get done. If an LLM is filling out your API schema docs, then you get to spend more time on the stuff that’s actually interesting.
A good example of the kind of result is something like the Laravel documentation[1] and its associated API reference[2]. I don't believe AI can help with this.
[0]: https://en.wikipedia.org/wiki/Docstring
I'd argue the vast majority of software development is neither critical nor commonly used. Anecdotal, but I've written documentation and never got any feedback on it (whether it's good or bad), which implies it's not read or the quality doesn't matter.
Actually if you want well-written prose you'll read AI slop there too. I saw people comparing their "vibe writing" workflows for their "books" on here the other day. Nothing is to be spared, apparently
I expect though that most people don't read in that much detail, and AI generated stuff will be 80-90% "good enough", at least the same if not better than someone who doesn't actually like writing documentation.
> What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
Isn't that already the case for a lot of software development? If it's boring and mundane, an AI can do it too so you can focus on more difficult or higher level issues.
Of course, the danger is that, just like with other automated PRs like dependency updates, people trust the systems and become flippant about it.
So they won’t like doing their job?
I use all of these tools, but you also know what "they're doing"...
I know our careers are changing dramatically, or going away (I'm working on a replacement for myself), but I just like listening to all the "what we're doing is really helping you..."
Doing either of them _well_ - the way you do when you actually care about them and they actually matter - is still so far beyond LLMs. Good documentation and good tests are such a differentiator.
If/when will this take over your job?
If we're expected to even partially believe the marketing, LLM coding agents are useful today at junior level developer tasks and improving quickly enough that senior tasks will be doable soon too. How do you convince so many junior and senior level devs to build that?
I get paid for the mundane, boring, annoying tasks, and I really like getting paid.
It’s very, very far from possible today.
The goal here is for it to be able to do everything, taking 100% of the work
2nd best is to do the hard, big value adds so companies can hire cheap labor for the boring shit
3rd best is to only do the mundane and boring stuff
Where does the "most" come from? There's a certain sense of satisfaction in knowing I've tested a piece of code per my experience in the domain, coupled with knowledge of where we'll likely be in six months. The same can be said for documentation - hell, on some of the projects I've worked on we've had entire teams dedicated to it, and on a complicated project where you're integrating software from multiple vendors the costs of getting it wrong can be astronomical. I'm sorry you feel this way.
> There's a certain sense of satisfaction in knowing I've tested a piece of code per my experience in the domain coupled with knowledge of where we'll likely be in six months.
One of the other important points about writing unit tests isn't just to confirm the implementation, but to improve upon it through the process of writing tests and discovering additional requirements, edge cases, etc. (TDD and all that). I suppose it's possible at some point an AI could be complex enough to try out additional edge cases, or confirm against a design document, and do those parts as well... but idk, it's still after-the-fact testing instead of at design time, so it's less valuable imo.
I would not be surprised if things end up the other way around – humans doing the boring and annoying tasks that are too hard for AI, and AI doing the fun easy stuff ;-)
where are we wrt the agent surveying open issues (say, via JIRA) and evaluating which ones it would be most effective at handling, and taking them on, ideally with some check-in for confirmation?
Or, contrariwise, from having product management agents which do track and assign work?
The entire website was created by Claude Sonnet through Windsurf Cascade, but with the “Fair Witness” prompt embedded in the global rules.
If you regularly guide the LLM to “consult a user experience designer”, “adopt the multiple perspectives of a marketing agency”, etc., it will make rather decent suggestions.
I’ve been having pretty good success with this approach, granted mostly at the scale of starting the process with “build me a small educational website to convey this concept”.
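For example, a global rule along these lines (my paraphrase, not the actual "Fair Witness" prompt) is enough to nudge it into that behavior:

    Before finalizing any UI change, consult a user experience designer:
    critique the layout from a UX perspective and revise once.
    For copy, adopt the multiple perspectives of a marketing agency and
    an end user, then reconcile the two.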
I'm curious to know how many Copilot PRs were not merged and/or required human take-overs.
Every bullet hole in that plane is the 1k PRs contributed by Copilot. The missing dots, and whole missing planes, are unaccounted for. I.e., "AI ruined my morning".
By that logic, literally every statement would be survivorship bias.
Everyone who has used AI coding tools interactively or as agents knows they're unpredictably hit or miss. The old, non-agent Copilot has a dashboard that shows org-wide rejection rates for paying customers. I'm curious to learn what the equivalent rejection rate for the agent is for the people who make the thing.
It seems places don't prioritize it, so you don't see it very often. Some developers are outright dismissive of the practice.
Unfortunately, AI seemingly won't help with that.
Really cool, thanks for sharing! Would you perhaps consider implementing something like these stats that aider keeps on "aider writing itself"? - https://aider.chat/HISTORY.html
We started with Pro+ and Enterprise first because of the higher number of premium requests included with the monthly subscription.
Whilst we've seen great results within GitHub, we know that Copilot won't get it right every time, and a higher allowance of free usage means that a user can play around and experiment, rather than running out of credits quickly and getting discouraged.
We do expect to open this up to Pro and Business subscribers - and we're also looking at how we can extend access to open source maintainers like yourself.
It is difficult to get a man to understand something when his salary depends upon his not understanding it. - Upton Sinclair
Who would've thought (except you) that this would be one of the things that AI would be especially suited for. I don't know what this progression means in the long run. Will good engineers just become 1000x more productive as they manage X number of agents building increasingly complex code (with other agents constantly testing, debugging, refactoring and documenting them) or will we just move to a world where we just have way fewer engineers because there is only a need for so much code.
Well, that's back-rationalization. I saw advances like meta sentiment analysis on medical papers back in the 00s. Deep learning was clearly just the beginning. [0]
> Who would've thought (except you)
You're othering me, which is rude, and you're speaking as though you speak for an entire group of people. Seems kind of arrogant.
0. (2014) https://www.ted.com/talks/jeremy_howard_the_wonderful_and_te...
My view is in between yours: a bit of column A and B, in the sense that both outcomes will play out to an extent. There will be fewer engineers, but not by the factor of productivity (Jevons paradox will play out but eventually tap out); there will be even more software, especially at the low end; and the engineers who remain will be expected to be smarter and work harder for the same or less pay, grateful they got a job at all. There will be more "precision and rigor" and more keeping up required of workers, but less reward for the workers who perform it. In a capitalist economy it won't be seen as a profession to aspire to anymore by most people.
Given most people don't live to work, and use their career to finance and pursue other life meanings, it won't be viable for most people long term, especially when other careers give "more bang for buck" w.r.t. effort put into them. Given the uncertainty in the SWE career that most people I know are feeling right now, on the balance of risk/reward I recommend newcomers go down another career path, especially juniors who have a longer runway. To be transparent, I want to be wrong, but the risk of this is getting higher every day.
i.e. AI is a dream for the capital class, and IMO potentially disastrous for social mobility long term.
Even in the early days of LLM-assisted coding tools, I already knew there would be executives who would say: let's replace our pool of expensive engineers with a less expensive license. But the only factor that led to this decision is cost comparison. Not quality, not maintenance load, and very much not customer satisfaction.
We absolutely have not reached anything resembling anyone's definition of a singularity, so you are very much still not proven correct in this. Unless there are weaker definitions of that than I realised?
I think you'll be proven wrong about the economy too, but only time will tell there.
How does this align with Microsoft's AI safety principles? What controls are in place to prevent Copilot from deciding that it could be more effective with fewer limitations?
That ensures that all of Copilot's code goes through our normal review process which requires a review from an independent human.
I'd like a breakdown of this phrase: how much human work vs Copilot, and in what form, autocomplete vs agent. It's not specified; this seems more like marketing trickery than real data.
Pretty much every developer at GitHub is using Copilot in their day-to-day work, so its influence touches virtually every code change we make ;)
Genuine question, but is CoPilot use not required at GitHub? I'm not trying to be glib or combative, just asking based on Microsoft's current product trajectory and other big companies (e.g. Shopify) forcing their devs to use AI and scoring their performance reviews based on AI use.
Copilot said: There is currently no official GitHub setting or option to remove or hide the sidebar with "Latest Changes" and similar widgets from your GitHub home page.
I'm using this as an example to show that it is no longer possible to set up a GitHub account to NOT use CoPilot, even if it just lurks in the corner of every page waiting to offer a suggestion. Like many A.I. features it's there, whether you want to use it or not, without an option to disable.
So I'm suss of the "pretty much every developer" claim, no offense.
That's a fun stat! Are humans in the #1-4 slots? It's hard to know what processes are automated (300 repos sounds like a lot of repos!).
Thank you for sharing the numbers you can. Every time a product launch is announced, I feel like it's a gleeful announcement of a decrease in my usefulness. I've got imposter syndrome enough; perhaps Microsoft might want to speak to the developer community and let us know what they see happening? Right now it's mostly the pink slips that are doing the speaking.
After hearing feedback from the community, we’re planning to share more on the GitHub Blog about how we’re using Copilot coding agent at GitHub. Watch this space!
Ah yes, the takeoff.
Similarly, the newest MS Word has CoPilot that you "don't have to use" but you still have to put up with the "what would you like to write today?" prompt request at the start of every document or worse "You look like you're trying to write a...formal letter...here are some suggestions."
As part of the dogfooding I could see them really pushing hard to try having agents make and merge PRs, at which point the data is tainted and you don't know if the 1,000 PRs were created or merged to meet demand or because devs genuinely found it useful and accurate.
I'm interested in the [vague] ratio of {internallyDevelopedTool} vs alternatives - essentially the "preference" score for internal tools (accounting for the natural bias towards one's own agent for testing/QA/data purposes). Any data, however vague, would be great.
(and if anybody has similar data for _any_ company developing their own agent, please shout out).
Without data, a comprehensive study, and peer review, it's a hell no. Would GitHub be willing to submit to academic scrutiny to prove it?
This was true up until around 15 years ago. Hasn't been the case since.
https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-3...
We have invested plenty of money and time into nuclear fusion with little progress. The list of key achievements from CERN[1] is also meager in comparison to the investment put in, especially if you consider their ultimate goal to be applying research beyond pure theory.
They just cut down their workforce, letting some of their AI people go. So, I assume there isn't that much success.
Have they tried dogfooding their dogshit little tool called Teams in the last few years? Cause if that's what their "famed" dogfooding gets us, I'm terrified to see what lies in wait with Copilot.
That seemed to drop off the Github changelog after February. I’m wondering if that team got reallocated to the copilot agent.
Copilot Workspace could take a task, implement it and create a PR - but it had a linear, highly structured flow, and wasn't deeply integrated into the GitHub tools that developers already use like issues and PRs.
With Copilot coding agent, we're taking all of the great work on Copilot Workspace, and all the learnings and feedback from that project, and integrating it more deeply into GitHub and really leveraging the capabilities of 2025's models, which allow the agent to be more fluid, asynchronous and autonomous.
(Source: I'm the product lead for Copilot coding agent.)
It seems Copilot could have really owned the vibe coding space. But that didn’t happen. I wonder why? Lots of ideas gummed up in organizational inefficiencies, etc?
But the upgraded Copilot was just in response to Cursor and Windsurf.
We'll see.
Good to see an official way of doing this.
I cancelled my Copilot subscription last week, and when it expires in two weeks I'll most likely shift to local models for autocomplete/simple stuff.
That said, months ago I did experience the kind of slow agent edit times you mentioned. I don't know where the bottleneck was, but it hasn't come back.
I'm on library WiFi right now, "vibe coding" (as much as I dislike that term) a new tool for my customers using Copilot, and it's snappy.
The claude and gemini models tend to be the slowest (yes, including flash). 4o is currently the fastest but still not great.
Cursor is quicker, I guess it's a response parsing thing - when they make the decision to show it in the UI.
Edit: From the TFA: Using the agent consumes GitHub Actions minutes and Copilot premium requests, starting from entitlements included with your plan.
[0] https://docs.github.com/en/copilot/managing-copilot/monitori...
(Source: I'm on the product team for Copilot coding agent.)
In my experience using Claude Sonnet 3.7 in GitHub Copilot extension in VSCode, the model produced hideously verbose code, completely unnecessary stuff. GPT-4.1 was a breath of fresh air.
One of the things we've done here is to treat Copilot's commits like commits from a first-time contributor to an open source project.
When Copilot pushes changes, your GitHub Actions workflows won't run by default, and you'll have to click the "Approve and run workflows" button in the merge box.
That gives you the chance to run Copilot's code before it runs in Actions and has access to your secrets.
(Source: I'm on the product team for Copilot coding agent.)
On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
Oh wow, that was great - particularly if I then look at my own body parts (like my palm) that I know are not moving, it's particularly disturbing. That's a really well done effect, I've seen something similar but nothing quite like that.
I doubt LLMs have anything like what we would conceptualize as trust. They have information, which is regurgitated because it is activated as relevant.
That being said, many humans don't really have a strong concept of information validation as part of day to day action and thinking. Development theory talks about this in terms of 'formal operational' thinking and 'personal epistemology' - basically how does thinking happen and then how is knowledge in those models conceptualized. Learning Sciences research generally talks about Piaget and formal operational before adulthood and stages of personal epistemology in higher education.
Research consistently suggests that about 50% of adults are not able to consistently operate in the formal thinking space. The behavior you are talking about is also typical of 'absolutist' epistemic perspectives, where answers are right or wrong and aren't meaningfully evaluated - just identified as relevant or not. Credibility is judged by whether information comes from a trusted source - most often an authority figure - not evaluated by the person knowing it.
That's not hallucination. That's just an optical illusion.
Would you be able to drop me an email? My address is my HN login @github.com.
(I work on the product team for Copilot coding agent.)
Then people realized that was BS, so the marketing moved on to "This will enhance everyone's jobs, as a companion that will help everyone".
People also realized that was pure BS. A few more marketing rebrands later and we're at the current situation, where we try to equate it to the lowest possible rung of employee they can think of, because surely Junior == Incompetent Idiot You Can't Trust Not To Waste Your Time†. The funny part is that they have been objectively and undeniably getting better since the early days of the hype bubble, yet the push now is that they're "basically junior level!". Speaks volumes IMO, how those goalposts keep getting moved whenever people actually use these systems in real work.
---
† IMO working with Juniors has given me some of the best moments of my career. It allowed space for me to grow my own knowledge, while I got to talk to and help someone extremely passionate, if a bit overeager. This stance on Juniors is, frankly, baffling to me because it's so far from my experience of how they tend to work; oftentimes they're a million times better than those "10x rockstars" you hear about all the time.
Now Microsoft sits on a goldmine of source code and has the ability to offer AI integration even to private repositories. I can upload my code into a private repo and discuss it with an AI.
The only thing Google can counter with would be to build tools which developers install locally, but even then I guess that the integration would be limited.
And considering that Microsoft owns the "coding OS" VS Code, it makes Google look even worse. Let's see what they come up with tomorrow at Google I/O, but I doubt that it will be a serious competition for Microsoft. Maybe for OpenAI, if they're smart, but not for Microsoft.
https://developers.google.com/gemini-code-assist/docs/review...
Definitely not Google Code, but better than Cloud Source Repositories.
I'm all for new tech getting introduced and made useful, but let's make it all opt in, shall we?
That isn't running locally
AMAZING
Especially now that Copilot supports MCP I can plug in my own custom "Tools" (i.e. function calling done by the AI agent), and I have everything I need. Never even bothered trying Cursor or Windsurf, which I'm sure are great too, but _mainly_ since they're just forks of VSCode, which I already use as my IDE.
I'm looking forward to seeing how Agent Mode is better. Copilot has been such a great experience so far I haven't tried to keep up with every little new feature they add, and I've fallen behind.
My favorite tool that I've written is one that simply lets me specify named blocks by name, in a prompt, and AI figures out how to use the tool to read each block. A named block is defined like:
    # block_begin MyBlock
    ...lines of code
    # block_end
So I can just embed those markers around the code rather than pasting it into prompts.
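The core of such a tool is just a scan for those markers. A minimal sketch (the read_block function is hypothetical; in practice it's exposed to the agent as an MCP tool):

    # Return the lines between "# block_begin <name>" and "# block_end".
    def read_block(path: str, name: str) -> str:
        out, capturing = [], False
        for line in open(path):
            stripped = line.strip()
            if stripped == f"# block_begin {name}":
                capturing = True
            elif stripped == "# block_end" and capturing:
                break
            elif capturing:
                out.append(line.rstrip("\n"))
        return "\n".join(out)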
Wait, is this going to pollute the `gh` tool? Please tell me this isn't happening.
Sure! How can I help you?
(Source: I'm on the product team for Copilot coding agent.)
I kind of love the idea that all of this works in the familiar flow of raising an issue and having a magic coder swoop in and making a pull request.
At the same time, I have been spoiled by Cursor. I feel I would end up preferring that the magic coder is right there with me in the IDE where I can run things and make adjustments without having to do a followup request or comment on a line.
Stop fighting and sink!
But rest assured that with Github Copilot Coding Agent, your codebase will develop larger and larger volumes of new, exciting, underexplored technical debt that you can't be blamed for, and your colleagues will follow you into the murky depths soon.
You can tell because they advertise “Pro” and “Pro+” but then the FAQ reads,
> Does GitHub use Copilot Business or Enterprise data to train GitHub’s model?

> No. GitHub does not use either Copilot Business or Enterprise data to train its models.
Aka, even paid individual plans are getting brain raped.
https://docs.github.com/en/copilot/managing-copilot/managing...
And this one too: https://docs.github.com/en/site-policy/privacy-policies/gith...
Copilot has been pretty useless. It couldn’t maintain context for more than two exchanges.
Copilot: here’s some C code to do that
Me: convert that to $OTHER_LANGUAGE
Copilot: what code would you like me to convert?
Me: the code you just generated
Copilot: if you can upload a file or share a link to the code, I can help you translate it …
It points me in a direction that’s a minimum of 15 degrees off true north (“true north” being the goal for which I am coding), usually closer to 90 degrees. When I ask for code, it hallucinates over half of the API calls.
Anyway, I can just use another LLM that serves me better.
I recently created a course for LinkedIn Learning on using generative AI for creating SDKs[0]. When I was onsite with them to record it, I found my GitHub Copilot calls kept failing.. with a network error. Wha?
Turns out that LinkedIn doesn't allow people onsite to use Copilot, so I had to put my MiFi in the window and connect to that to do my work. It's wild.
Btw, I love working with LinkedIn and have 15+ courses with them in the last decade. This is the only issue I've ever had.. but it was the least expected one.
0: https://www.linkedin.com/learning/build-with-ai-building-bet...
They definitely use it for full-time SWEs
Source: I work there
Our key differentiator is cross-platform support - we work with Jira, Linear, GitHub, and GitLab - rather than limiting teams to GitHub's ecosystem.
GitHub's approach is technically impressive, but our experience suggests organizations derive more value from targeted automation that integrates with existing workflows rather than requiring teams to change their processes. This is particularly relevant for regulated industries where security considerations supersede feature breadth. Not everyone can just jump off of Jira on a moment's notice.
Curious about others' experiences with integrating AI into your platforms and tools. Has ecosystem lock-in affected your team's productivity or tool choices?
Automating the reputation and network of an individual person doesn't seem like a good fit for an LLM, regardless of the person. But the _decisionmaking_ capacities for a position that's largely trend-following is something that's at the very least well-supported by interacting with a well-trained model.
In my mind, though, that doesn't look like a niched service that you sell to a company. That looks like a cofounder-type for someone with an idea and a technical background. If you want to build something but need help figuring out how to market and sell it, you could do a lot worse than just chatting with Claude right now and taking much of its advice.
That might just by my own lack of bizdev expertise, though.
Then we have very different interpretations of what constitutes a medium complexity task
Microsoft, besides maybe Google and OpenAI, is among the only ones actually exploring the practical usefulness of AI. Other kiddies like Sonnet and whatnot are still chasing meaningless numbers and benchmark scores; that sort of stuff may appeal to high school kids or the immature, but burning billions of dollars and energy resources just to sound like a cool kid?
I found it very confusing - we have GH Business, with Copilot active. Could not find a way to upgrade our Copilot to the level required by the agent.
I tried using my personal Copilot for the purpose of trialing the agent - again, a no-go, as my Copilot is "managed" by the organization I'm part of.
Also, you will want to add more control over who can assign things to Copilot Agent - just having write access to the repository is a poor discriminator, I think.
https://github.com/dotnet/runtime/pull/115733 https://github.com/dotnet/runtime/pull/115732 https://github.com/dotnet/runtime/pull/115762
A wall of noise that tells you nothing of any substance but with an authoritative tone as if what it's doing is objective and truthful - Immediately followed by:
- The 8 actual lines of code (discounting the tests & boilerplate) it wrote to actually fix the issue is being questioned by the person reviewing the code, it seems he's not convinced this is actually fixing what it should be fixing.
- Not running the "comprehensive" regression tests at all
- When they do run, they fail
- When they get "fixed" oh-so confidently, they still fail. Fifty-nine failing checks. Some of these tests take upward of an hour to run.
So the reviewer here has to read all the generated slop in the PR description and try to grok what the PR is about, read through the changes himself anyway (thankfully it's only a ~50 line diff in this situation, but imagine if this was a large refactor of some sort with a dozen files changed), and then drag it by the hand multiple times to try to fix issues it itself is causing. All the while you have to tag the AI as if it's another colleague and talk to it as if it's not just going to spit out whatever inane bullshit it thinks you want to hear based on the question asked. Test failed? Well, tests fixed! (no, they weren't)
And we're supposed to be excited about having this crap thrust on us, with clueless managers being sold on this being a replacement for an actual dev? We're being told this is what peak efficiency looks like?
To me, this reads like it'll be a good junior and open up a PR with its changes, letting you (the issue author) review and merge. Of course, you can just hit "merge" without looking at the changes, but then it's kinda on you when unreviewed stuff ends up in main.
Copilot literally can't push directly to the default branch - we don't give it the ability to do that - precisely because we believe that all AI-generated code (just like human generated code) should be carefully reviewed before it goes to production.
(Source: I'm the product lead for Copilot coding agent.)