"Usability declines in inverse proportion to the number of vice-presidents who sign the release notes." Law of Interface Inversion
Liking kiro a lot these days
Or is it more of a vibe code thing where every new feature from everyone is recreated by every other company in a matter of days?
Do they even realize they are destroying their own industry economics? The only reason anyone uses big tech is because there are no alternatives
No special environment instructions required.
One nice perk on the ChatGPT Team and Enterprise plans is that Codex environments can be shared, so my work setting this up saved my coworkers a bunch of time. I pretty much just showed how it worked to my buddy and he got going instantly
It's interesting that most people seem to prefer working locally; I love that the async setup lets me code from my mobile phone while on the road.
Tapping the prompts in is the easy part, but the async model is different to work with: I feel more like a manager than a co-developer.
Sometimes I realize that CC is going nuts and stop it before it goes too far (and consumes too much). With this async setup, you might come back after a couple of hours and find utter madness (and millions of tokens burned).
A tight feedback loop is best for me. The opposite of these async models. At least for now.
There used to be this thesis in software of [Cathedral vs Bazaar](https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar); the modern version of it is that you either 1) build your own cathedral and bring the user to your house. It is a more controlled environment and deployment is easier, but the upside is more limited, and it also suggests the model can't perform out-of-distribution. OpenAI has taken this approach for all of its agentic offerings, whether ChatGPT Agent or Codex.
2) The alternative is the Bazaar, where you bring the agent to the user and let it interact with 1000 different apps/things/variables in their environment. It is 100x more difficult to pull off, and you need better models that are more adaptable, but the payoff is higher. The issues that you raised (env setup/config/etc.) are temporary and fixable.
Cathedral = sandbox env in the provider's cloud, so [codex](https://chatgpt.com/codex) uses this model. Their codex-cli product is the Bazaar model, where you run it on your computer, in your own environment.
Claude Code, on the other hand, doesn't have a cloud-based sandboxing product; you have to run it on your computer, so it's the Bazaar model. You can also run it in ways Anthropic never envisioned (e.g. give it control of your house). Cursor also follows the same model, albeit they have been trying to get into the cathedral model with their background agent (as someone also pointed out below), presumably so as not to lose market share to Codex/Jules/etc.
You can deploy it as a GitHub Action right now.
Tag it in any new issue, PR, etc.
Future history will highlight Claude Code as the first true-form agent. The other analogies aren't intuitive enough for the evolution of an OS-native agent into eventual AI robotics.
-----
> The software essay contrasts two different free software development models:
> The cathedral model, in which source code is available with each software release, but code developed between releases is restricted to an exclusive group of software developers. GNU Emacs and GCC were presented as examples.
> The bazaar model, in which the code is developed over the Internet in view of the public. Raymond credits Linus Torvalds, leader of the Linux kernel project, as the inventor of this process. Raymond also provides anecdotal accounts of his own implementation of this model for the Fetchmail project
-----
Source: Wikipedia
If you're a software developer and especially if you're doing open source, CATB is still worth a read today. It's free on the author's website: http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral...
From the introduction:
>No quiet, reverent cathedral-building here—rather, the Linux community seemed to resemble a great babbling bazaar of differing agendas and approaches (aptly symbolized by the Linux archive sites, who'd take submissions from anyone) out of which a coherent and stable system could seemingly emerge only by a succession of miracles.
> The fact that this bazaar style seemed to work, and work well, came as a distinct shock. As I learned my way around, I worked hard not just at individual projects, but also at trying to understand why the Linux world not only didn't fly apart in confusion but seemed to go from strength to strength at a speed barely imaginable to cathedral-builders.
It then goes on to analyze why this worked at all, and if the successful bazaar-style model can be replicated (it can).
But pushing this existing process - which was designed for limited participation by a scarce pool of human contributors - onto the use case of managing a potentially huge reservoir of agent suggestions is going to get brittle quickly. Basically, more suggestions require a more streamlined and scriptable review workflow.
Which is why I think working in the command line with your agents - similar to Claude and Aider - is going to be where human maintainers can most leverage the deep scalability of async and parallel agents.
> is way better than having to set up git worktrees or any other type of sandbox yourself
I've built up a helper library that does this for you for either aider or claude here: https://github.com/sutt/agro. And for FOSS purposes, I want to prevent MS, OpenAI, etc from controlling the means of production for software where you need to use their infra for sandboxing your dev environment.
And I've been writing about how to use CLI tricks to review the outputs on some case studies as well: https://github.com/sutt/agro/blob/master/docs/case-studies/i...
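If you just want the core idea without the library, a stripped-down sketch looks roughly like this (not agro's actual code; the task names and the agent command are placeholders):

```python
import subprocess
from pathlib import Path

def spawn_agent_worktree(repo: Path, task: str, base: str = "main") -> Path:
    """Create an isolated git worktree on its own branch and launch a CLI
    agent (aider here, but anything works) inside it."""
    branch = f"agent/{task}"
    workdir = repo.parent / f"{repo.name}-{task}"
    # One branch + one directory per task keeps parallel agents from
    # stepping on each other or on your own checkout.
    subprocess.run(["git", "-C", str(repo), "worktree", "add",
                    "-b", branch, str(workdir), base], check=True)
    subprocess.run(["aider", "--message", f"Implement task: {task}"],
                   cwd=workdir, check=True)
    return workdir

def review_queue(repo: Path, base: str = "main") -> None:
    """Scriptable review: print a diffstat for every agent branch."""
    branches = subprocess.run(
        ["git", "-C", str(repo), "branch", "--list", "agent/*",
         "--format=%(refname:short)"],
        capture_output=True, text=True, check=True).stdout.split()
    for branch in branches:
        print(f"== {branch} ==")
        subprocess.run(["git", "-C", str(repo), "diff", "--stat",
                        f"{base}...{branch}"], check=True)
```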
OK found a good page for the plans here… ymmv if you're not logged in:
"Had to" implies that you pointed other models at the task and they failed, or that Grok is your go-to model for this.
Can you explain?
Edit: it seems this is a hosted version. Would be nice if they actually joined up some of their products.
(Or a deprecated code fine tuned model)
You give it a task and it produces a PR. While gemini-cli is more like pair programming with AI.
We already see bots that monitor repos to bump versions. I suspect we will see this expand to handle larger version bumps, minor issues, minor features. Basically junior dev learning tasks.
I stopped worrying about what techbros think a long time ago. I saw one on Twitter yesterday slinging a blockchain AI NFT filesystem that will ingest and organize your documents for you.
If a company prefers small teams right now, at the cost of not having juniors to grow into seniors in the future, they are well within their rights to make that decision.
Might be an awful decision, might be a smart one, in any case there is no “we” here.
Even for the next 5 years I'd like to be able to have some capable humans in my teams.
Part of living in a society is considering the social impact of things. Such as the erosion of training opportunities for young talent.
Each business can make their own decisions, but someone should be thinking about the greater good. “Within your rights” doesn’t mean it’s a good thing, nor should that be the sole standard we set for members of our society. Same reason people hire interns and write technical blogs, open source code and sponsor school hackathons. Sometimes the greater good should be a consideration.
I’m sorry but almost nobody does this for the greater good
I think the main reason I'm not personally excited about AI is that... no, I don't, actually.
I'm in my late 40s. I have had many opportunities to move into management. I haven't because while I enjoy working with others, I derive the most satisfaction from feeling like I'm getting my hands dirty and doing work myself.
Spending the entire day doing code reviews of my army of minions might be strictly more productive, but it's not a job I would enjoy having. I have never, for a second, felt some sort of ambitious impulse to move up the org chart and become some sort of executive giving marching orders.
The world that AI boosters are driving towards seems to me to be one where the only human jobs left are effectively middle management, where the leaf nodes of the org chart are all machines. It may be the case that such a world has greater net productivity and that stock prices will go up.
But it's not a world that feels meaningful, dignified, or desirable to me.
At the same time, I've realized that "let me just try to squeeze out the last of my career" is a really unhealthy mindset for me to hold. It sort of locks me into a feeling like my best days are behind me or something.
So I am trying to dabble in using AI for coding and trying to make sure I stay open-minded and open to learning new things. I don't want to feel like a dinosaur.
There are many perspectives on coding agents because there are many different types of engineers, with different levels of experience.
In my interactions I've found that junior engineers overestimate or overuse the capabilities of these agents, while more senior engineers are better calibrated.
The biggest challenge I see is what to do in 5 years once a generation of fresh engineers never learned how compilers, operating systems, hardware, memory, etc actually work. Innovation almost always requires deep understanding of the fundamentals, and AI may erode our interest in learning these critical bits of knowledge.
What I see as a hiring manager is senior (perhaps older) engineers commanding higher comp, while junior engineers become increasingly less in demand.
Agents are here to stay, but I'd estimate your best engineering days are still ahead.
now I'm liberated to do all the crap I don't like and never code. fuck off
How can you do good design work if the only "people" who have experience with what you're designing are the AI agents you order around? I guess if you're designing an API that you only intend to be used by other AI agents, that's probably fine.
At some point, though, it's gotta feel like working at a pet food company coming up with new cat food recipes. You can be very successful by focus testing on cats, but you'll never really know the thing you're making. (No judgement if you do want to eat cat food, I guess.)
At that level, you almost never get to be hands-on with code; the closest you get is code reviews. Instead you "deliver value" through identifying large-scale opportunities, proposing projects for them, writing design and architecture docs, and conducting "alignment meetings" where you convince peers and other teams to build the parts needed to achieve your vision. The actual coding grunt work is done by a bunch of other, typically more junior engineers.
That is also the role that often gets derided as "architecture astronauts." But it is still an extremely technical role! You need to understand all the high-level quirks of the underlying systems (and their owners!) to ensure they can deliver what you envision. But your primary skills become communication and people skills. When I was in that role, I liked to joke that my favorite IDEs are "IntelliJ, Google Docs, and other engineers."
You'll note that is a very different role from management, where your primary responsibilities are more people-management and owning increasingly large divisions of the business. As a senior engineer you're still a leaf node in the org-chart, but as a manager you have a sub-tree that you are trying to grow. That is where org-chart climbing (and uncharitably, "empire-building") become the primary skillset.
As such, the current Coding Agent paradigm seems very well-suited for senior engineers. A lot of the skillsets are the same, only instead of having to persuade other teams you just write a design doc and fire off a bunch of agents, review their work, and if you don't like their outputs, you can try again or drop down to manual coding.
Currently, I'm still at the "pair-program with AI" stage, but I think I'll enjoy having agents. These days I find that coding is just a means to an end that is personally more satisfying: solving problems.
I have tried this a few times, and it's not there yet. The failures are consistently-shaped enough to make me wonder about the whole LLM approach.
Compared to handing off to other engineers there are a few problems:
- other engineers learn the codebase much better over time, vs relying on either a third party figuring out the right magic sauce to make it understand/memoize/context-ize your codebase or a bunch of obnoxious prompt engineering
- other engineers take feedback and don't make the same types of mistakes over and over. I've had limited luck with things like "rules" for more complex types of screwups - e.g. "don't hack a solution for one particular edge case three-levels deep in a six-level call tree, find a better abstraction to hoist out the issue and leave the codebase better than you found it"
- while LLMs are great at writing exhaustive coverage tests of simple functionality, they aren't users of the project and generally struggle to really get into that mindset to anticipate cross-cutting interactions that need to be tested; instead you get a bunch of local maxima "this set of hacks passes all the current testing" candidate solutions
- the "review" process starts to become silly and demoralizing when your coworker is debating with you about code neither of you wrote in a PR (I sure hope you're still requiring a second set of human eyes on things, anyway!)
If you have a huge backlog of trivial simple small-context bugs, go nuts! It'll help you blow through that faster! But be prepared to do a lot of QA ;)
Generally I'd call most of the issues "context rot" in that even after all the RL that's been done on these things to deal better with out-of-distribution scenarios, they still struggle with the huge amount of external context that is necessary for good engineering decision making in a large established codebase. And throwing more snippets, more tickets, more previous PRs, etc, at it seems to rapidly hit a point of diminishing returns as far as its "judgement" in picking and following the right bits from that pile at the right time.
It's like being a senior engineer with a team of interns who aren't progressing: you're stuck cleaning up crappy PRs constantly, without ever growing into the role of an architect who has mentored, and is now guiding, a bunch of other staff and senior engineers who themselves handle more of the nitty-gritty.
Maybe the models get better, maybe they don't. But for now, I find it's best to go for the escape hatch quickly once things start going sideways. Because me getting better at using today's models won't cause any generational leap forward. That feels like it will only come from lower level model advances, and so I don't want to get better at herding interns-who-can't-learn-from-me. Better for now to stick to mentoring the other people instead.
There seem to be three categories of developers:
1. Those that are motivated by "building things". The actual programming is just a means to an end.
2. Those that are motivated by the salary alone and either hate the work or are indifferent to it.
3. Those that are motivated by the art of programming itself. Hands on keyboard, thinking through a problem and solving it with code.
Developers that fall into categories 1 and 2 love AI. It's basically a dream come true for them ("I knocked out 3 side projects in a month" for #1 and "You're telling me that all I have to do is supervise the AI and I still get paid?" for #2).
It's basically a living nightmare for developers in category 3.
I've noticed that founders seem to be way higher on AI than non-founders. I think a lot of founders fit into category 1.
But, yes, I think that's a good breakdown of where most of the reward from coding comes from.
Not saying that I trust LLMs more…
I still end up ahead.
My experience using these is that it takes more time to reverse engineer the bloat they spill out than to write the thing myself.
God help you if you attempt to teach them anything, they will say "You're absolutely right!" and then continue churning out the same broken code.
Then you have to restart with a "fresh" context and give them the requirements from scratch and hope that this time they come up with something better.
The former CTO of Stripe, for one.
They show you the code they produce. Why wouldn’t you trust it after reading it?
Where can I check them out?
To communicate with the Jules team join https://discord.gg/googlelabs
Are you on Google Pro or using it free?
Also, I've found that even with 60, over an entire full day/night of using it for different things, I never went over 10 tasks and didn't feel like I was losing anything. To be clear, I've used this every weekend for months and I mean that I've never gone over 10 on any one day, not overall.
15 should be plenty, especially if you aren't paying for it. I will likely never use 100 even on my busiest of weekends
Our asynchronous coding agent can run Docker in its GitHub Actions-powered development environment - for example it could start a Dockerized web server.
You can learn more about the agent at https://docs.github.com/en/copilot/concepts/coding-agent/cod....
There are both obvious annoying UI bugs (which should be easy to fix unless they vibe coded the whole thing) and the output of the tool isn't very good for anything but the simplest problems.
If the model was really good, I'd love this, but it's not.
Might be worth trying again now:
"Jules now uses the advanced thinking capabilities of Gemini 2.5 Pro to develop coding plans, resulting in higher-quality code outputs"
Looking at "Google AI Ultra" it looks like I get this Jules thing, Gemini App, Notebook, etc. But if I want Gemini CLI, then I've got to go through the GCP hellscape of trying to create subscriptions, billing accounts then buying Google Code Assist or something, but then I can't get the Gemini app.
Then of course, this Google AI gives me YouTube Premium for some reason (no idea how that's related to anything).
One of the common tests I've seen for the Google models specifically is understanding of YT videos: Summarization, transcription, diarization, etc. One of their APIs allows you to provide a YT video ID rather than making you responsible for downloading the content yourself.
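For reference, the pattern with the google-genai Python SDK looks roughly like this (sketched from memory; the model name and video URL are placeholders, and exact type names may drift between SDK versions):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# The YouTube URL is passed as file_data, so the API fetches the video
# itself; you never download or re-upload the content.
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=types.Content(parts=[
        types.Part(file_data=types.FileData(
            file_uri="https://www.youtube.com/watch?v=VIDEO_ID")),
        types.Part(text="Summarize this video and give a speaker-labeled transcript."),
    ]),
)
print(response.text)
```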
"Can't share the subscription because the other person in your family is in another country."
Okay guess I'll change countr- "No you can't change your Google Workspace account's country."
Yes, of course, the individual employees know. But the decision making for these kinds of things is usually a full-time middle manager, who isn't deciding on behalf of Google as a whole, but on behalf of their organization within Google (could be 50 people, could be 2000). It's not just _not_ that manager's job to make the globally optimal decision for Google, it's actually likely often in direct conflict with their job, which is basically "set the priorities of your org such that they launch things that make your boss look good to his boss". Spending headcount on maintaining niche stuff is usually not that (and takes resources away from whatever is).
The integration between Google products is definitely one of the things that keeps me with them.
I've seen more than a few companies that are no better at their core service than the giants.
Small companies I've been working in are sometimes even worse.
Web app which takes 10+ sec to fully load? That's ok, focus on the new features!
IMO, Google is still the best option. I like GMail (their spam filtering is nearly flawless), Google Drive and Docs is the right mix of working and complexity, Google Photos integrates well with what I use, etc.
It's basically Google One, with the tradeoff of more rough edges in exchange for email on my own domain.
I've occasionally looked at switching away (to proton, MSFT, etc) and the most likely switch would be to a personal Google account.
I'm arguably "in too deep" because it'll soon be 20 years that I'm a google customer for that domain. At the time they were definitely the best option (I used to even self-host DNS for my domain on my home desktop).
In the corporate world, having faced a mix of options over my career, I still prefer the google stack (with the exception of google chat which I last used 3 years ago and wow was that bad at the time).
Trying to get that stuff resolved was such a pain that I eventually had to ask a friend who knew someone that worked at Google for assistance. Their support team had absolutely no public contact info available. I eventually managed to get my data and migrate the services I actually use (Google Fi and Youtube) to a non-workspace account.
The funny thing is that a few months later they tried to send a $60 bill to collections because they reopened the account for 2 days for me to migrate things off. I was originally going to pay it to just get them off my back, but Google's own collections agency wouldn't let me pay through card or check or anything. The only way I could pay was to "Log into your Google Workspace account" which NO LONGER EXISTED.
Now it's just an amusing story about incompetence to look back on, but at the time it was stressful because I almost lost my domains, cell phone number, and email addresses all at once. Now I never trust anything to a single company.
Their reputation precedes them.
The dev-focused products are a sideshow amongst the different fiefdoms at Google. They will never get first billing or focus.
When you enter Google Cloud you end up in a shroud of darkness in which admins can’t see the projects created by users.
I’m the admin for our Google Workspace, and I can’t even tell whether we have access to Google AI Studio or not. Their tutorials are complete bullshit, and the docs are just plain wrong because they reference things that are not reflected in the platform.
I had to switch back to English because their automated translations are so awful. Did they really not think to have at least one person review each document before releasing it to the public?!
It’s a 450-billion-dollar company and they can’t see that they’ve added so many layers of confusion that 99% of their users won’t need. The biggest issue is that they won’t solve this anytime soon; they’ve dug themselves into a bottomless pit.
Typically applied to mobile phone plans but applicable to many other markets.
That's my hope anyway.
For example, in a professional context, my company uses both Google Workspace and GCP.
With Google Workspace, our subscription includes Gemini, Veo 3, Jules, etc. Everything is covered by the subscription model: rate-limited, but with no metered billing. The main entry point is gemini.google.com.
However, every time we need to use the API, we have to go through GCP. It gives us access to some more advanced models, like Veo 3 instead of Veo 3-fast, and more features. It's unlimited usage, but pay-as-you-go. The main entry point is GCP Vertex AI.
And the two teams, Google Workspace and GCP, are quite separate. They often don't really know what the other provides.
OT question about Google Workspaces: What's the difference between My Drive, Shared Drives, and "Workspaces"? When would I want to use each in a team setup?
My Drive holds files you created or uploaded (which count against your space quota); Shared is where things that others have shared with you go, along with public Drive links you've visited.
Workspace is a shared space private to the company/organization workspace group.
In the (latest of three different) Go SDKs, you can use either Vertex AI or Gemini. But not every feature exists in both. Gemini can use uploaded files as attachments, and Vertex AI can use RAG stores, for example. Gemini uses API-key-based authentication, while Vertex AI uses the traditional credentials. All in the same SDK.
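Roughly, the Python flavor of the same SDK exposes the identical split; a minimal sketch (the project and location values are placeholders):

```python
from google import genai

# Gemini Developer API: API-key auth (supports uploaded-file attachments).
gemini_client = genai.Client(api_key="YOUR_API_KEY")

# Vertex AI: Application Default Credentials plus a GCP project/region
# (supports Vertex-only features such as RAG stores).
vertex_client = genai.Client(
    vertexai=True,
    project="my-gcp-project",  # placeholder
    location="us-central1",    # placeholder
)

# Same call surface regardless of backend.
print(gemini_client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Which backend am I talking to?",
).text)
```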
It's a mess.
Happens with their other products as well, eg their meet / chat / voice / hangouts products.
One can make an argument that other Gemini stuff shouldn't be in there because it's not dev related, but Jules at least should
Its sandbox is very limited and prevents proper grounding IMO. However, if their sandbox works for your project, it will be alright.
That said, Gemini is very powerful for its quality long-context capabilities: https://www.reddit.com/r/ClaudeAI/comments/1miweuv/comment/n...
Do you let juniors complete full features without asking questions or make them check in when they get flustered?
In contrast, Claude Code seems to interpret my prompts better and helps me ship real product features for users.
Maybe it’s a system prompt issue; it’s likely my prompting that’s causing the problem. But Claude Code seems to understand my intent better.
I think the software is now a very important part of the training process, which is why I think only frontier labs are capable of shipping "actual" agents.
Anthropic has figured something out here that others have not.
I’m impressed by Gemini Pro 2.5’s NLP capabilities. I use that model in production on several projects. My comments are directed only at Gemini CLI. Which FWIW is better than OpenAI Codex CLI, but much worse (for me) than Claude Code.
Even with Pro, the strict token limits combined with the model's tendency to add unrequested modifications means I run out of tokens before completing my intended tasks. Others have the same issue https://github.com/google-gemini/gemini-cli/issues/4300
But I sometimes reach for it for code review in particular since it calls out to o3 via its “oracle” tool
I'm building integrations for both Claude Code and AMP! AMP also provides really important features of a harness that others haven't quite caught up on. OpenCode, sort of, but that is driven in a bit of a cultish open source way.
I have found it is not very good at creating new projects with different React libraries inside existing projects (for instance, an admin UI that I had it place inside my existing server project).
If you start noticing it change directories and move around and delete/move directories a lot, you should stop the process, reconsider what you're telling it to do and how, then start from scratch with a new task.
I have very limited spare time these days, but sometimes on my walk to work I can think of an idea/feature, plan out what I want it to do (and sometimes use the github app to revise the existing code), then send out a few jobs. By the time I get home in the evening I've got a few PRs to review. Most of the code is useless to me, but it usually runs, and means I can jump straight into testing out the idea before going back and writing it properly myself.
Next step is to add automatic builds to each PR, so that on the way home I can just check out the different branches on my phone instead of waiting to be home to run the ios simulator :D
It might be worth trying again.
"Jules now uses the advanced thinking capabilities of Gemini 2.5 Pro to develop coding plans, resulting in higher-quality code outputs"
It tries to be funny and authentic, but the cheap-looking mascot and low-contrast text make it feel like IBM pretending to be a vibe-coded startup.
Google has (or had) distinct branding, with its austere, no-nonsense style in the past; then it moved into a clunky-but-not-AWS design aesthetic with GCP (which is still recognizable), and now the AI products look so completely inconsistent that you can’t even tell they’re from Google.
- https://blog.langchain.com/introducing-open-swe-an-open-sour...
- https://github.com/newsroom/press-releases/coding-agent-for-...
At a cursory glance, it did a great job. It failed the first time. I gave it the error message and it fixed it. I was shocked it ran after that. Not bad for the free plan.
The worst part is that there is no "STOP" button to quickly get it out of the loop it's stuck in.
"Write a basic raytracer in Rust that will help me learn the fundamentals of raytracing and 3D graphics programming."
Last time, it apparently had working or at least compiling code, but it refused to push the changes to the branch so I could actually look at it. I asked it, cajoled it, guilted it, threatened it; it just would not push the damn code. So I have no idea if it worked.
This time it wrote some code but created two main.rs files in separate directories. It split the code randomly across the two directories and then got very confused about why it didn't run right. I explained the problem and it got very lost running around the whole filesystem trying to run the program or cargo in random directories, then gave up.
Not sure why folks continue to zero shot things.
It does what it wants, often just “finishes” a task prematurely, and asking follow-ups does nothing besides making it ramble for a bit; the env sometimes doesn’t persist and stuff just stops working. For a while it just failed instantly because the model was dead or something.
Out of the dozen times I tried it, I think I merged maybe one of its PRs. The rest I trashed and reassigned to a different agent.
My ranking
- Claude Code (through gh action), no surprise there
- ChatGPT Codex
- GitHub Copilot Agent
- Jules
I will try it again today to see if the full release changed anything (they give a 3-month free trial to previous testers), but if it’s the same, I wouldn’t use or pay for Jules. Just use Codex or the GitHub agent. Sorry for the harsh words.
These async agents compete in a different space, not the pair-programming-on-your-machine space I would put the CLI tools in.