Only a closed set of languages is supported, and the hook for installing additional software at startup doesn't seem to be fully functional at the moment.
Interested to give this a go. But I would also need it to be able to run docker compose and playwright, to keep things on the rails.
Codex handles this much better. You choose when to make a PR, and you can also just copy a .patch or a git apply command to your clipboard.
EDIT. They might have fixed this. Just testing. Does the Android mobile app have Claude Code support yet, or is it still, annoyingly, an iOS-only thing?
EDIT2. It creates a public branch but not a PR. I'd still prefer that was a manual step.
creating container -> cloning repo -> making change -> test -> send PR
is too slow a loop for me to do anything very useful. It's only good for trivial "one-shot" stuff.
I love the feature set of Claude Code and my entire workflow has been fine-tuned around it, but I had to go to Codex this month. Hopefully the Claude Code team spends some time to slow down and focus on bugs.
Everything Anthropic does from an engineering standpoint is bad, they're a decent research lab and that's it.
This may be true, but then I wonder why it is still the case that no other agentic coding tool comes close to Claude Code.
Take Gemini Pro: excellent model let down by a horrible Gemini CLI. Why are the major AI companies not investing heavily in tooling? So far all the efforts I've seen from them are laughable. Every few weeks there is an announcement of a new tool, I go to try it, and soon drop it.
It seems to me that the current models are as good as they are going to be for a long time, and a lot of the value to be had from LLMs going forward lies in the tooling.
Claude is a very good model for "vibe coding" and content creation. It's got a highly collapsed distribution that causes it to produce good output with poor prompts. The problem is that collapsed distribution means it also tends to disobey more detailed prompts, and it also has a hard time with stuff that's slightly off manifold. Think of it like the car that test drives great but has no end of problems under atypical circumstances. It's also a naturally very agentic, autonomous model, so it does well in low information scenarios where it has to discover task details.
I like that Codex commits using your identity, as if they were your changes. And I like that you can interact with it directly from the PR as if it were a team member.
I've been using Sonnet whenever I run into the Codex limit, and the difference is stark. Twice yesterday I had to get Codex to fix something Sonnet just got entirely wrong.
I registered a domain a year ago (pine.town) and it came up for renewal, so I figured that, instead of deleting it, I'd build something on it, and came up with the idea of an infinite collaborative pixel canvas with a "cozy town" vibe. I have ZERO experience with frontend, yet Codex just built me the entire damn thing over two days of coding:
It's the first model I can work with and be reasonably assured that the code won't go off the rails. I keep adding and adding code, and it hasn't become a mess of spaghetti yet. That having been said, I did catch Codex writing some backend code that could have been a few lines simpler, so I'm sure it's not as good as me at the stuff I know.
Then again, I wouldn't even have started this without Codex, so here we are.
I do look at the backend code it writes, and it seems moderately sane. Sometimes it overcomplicates things, which makes me think that there are a few dragons in the frontend (I haven't looked), but by and large it's been ok.
Oh.
Not good enough for you?
If I skip 5 Pro but still have a large task, I have Codex write a spec file to use as a task list and to review for completeness as it works.
This is how you can use Codex without a plan mode.
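A rough sketch of what reviewing such a spec file for completeness could look like. The checkbox layout, the `parse_spec` helper, and the sample tasks are all hypothetical; the comment above doesn't say what format the spec file takes.

```python
import re

# Assumption: the spec file tracks tasks as Markdown checkboxes,
# "- [ ]" for pending and "- [x]" for done. This format is hypothetical.
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def parse_spec(text):
    """Return (done, pending) lists of task descriptions from a spec."""
    done, pending = [], []
    for line in text.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # headings and prose are ignored
        bucket = done if match.group(1).lower() == "x" else pending
        bucket.append(match.group(2).strip())
    return done, pending


# Made-up spec contents, for illustration only.
SPEC = """\
# Spec: example feature
- [x] Add database migration
- [ ] Implement API endpoint
- [ ] Write integration tests
"""

done, pending = parse_spec(SPEC)
print(f"{len(done)} done, {len(pending)} pending")
for task in pending:
    print("TODO:", task)
```

The same file can serve both as the agent's task list and as something you skim yourself to see what is left.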
I have a similar workflow to the parent: GPT 5 Pro for help with specifications and deep troubleshooting, and Codex to ground it in my actual code and project and to execute the changes.
Yes, Codex is still very early. We use it because it's the best model. The client experience will only get better from here. I noticed they onboarded a bunch of devs to the Codex project on GitHub around the time of 5's release.
That hasn't been my experience at all, neither with the Codex UI since it became available to Pro users, nor with the CLI since I first started using it. GPT 5 Pro will (can, to be precise) only read what you give it; Codex goes out searching for what it needs, almost always.
I wonder how much of it comes down to how models "train us" to work in ways they are most effective.
Sonnet is much less successful.
It's really easy to steer both Claude Code and Codex away from that, though: plop "Don't do any other changes than the ones requested" in the system prompt/AGENTS.md and they mostly do well with that.
I've tried the same with Gemini CLI, and Gemini seems to mostly ignore the overall guidelines you set up for it. Not sure why it's so much worse at that.
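For the AGENTS.md tip above, a minimal sketch of adding that instruction to a project file without duplicating it on repeat runs. The `ensure_guideline` helper and the bullet formatting are hypothetical choices; only the quoted instruction text and the AGENTS.md file name come from the comment.

```python
import tempfile
from pathlib import Path

# The instruction quoted in the comment above, verbatim.
GUIDELINE = "Don't do any other changes than the ones requested"


def ensure_guideline(path, guideline=GUIDELINE):
    """Append the guideline as a bullet if missing; return True if changed."""
    target = Path(path)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    if guideline in existing:
        return False  # already present, leave the file alone
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    target.write_text(
        existing + separator + "- " + guideline + "\n", encoding="utf-8"
    )
    return True


if __name__ == "__main__":
    # Demo in a throwaway directory so no real project file is touched.
    with tempfile.TemporaryDirectory() as tmp:
        agents_file = Path(tmp) / "AGENTS.md"
        print(ensure_guideline(agents_file))
        print(ensure_guideline(agents_file))
        print(agents_file.read_text(encoding="utf-8"))
```

Whether a given agent actually honours the file is a separate question, as the Gemini CLI experience suggests.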
After all these years, maybe even decades, of seeing your blog posts and projects on here, surely you must have had more experience with frontend than ZERO since you first appeared here? :)
That specific part doesn't have anything to do with Claude Web though, does it? When I use Codex and Claude, they repeatedly look things up in the local git history when working on something where I've mentioned that I worked on a branch or similar. As long as you make any sort of mention that you've used git, directly or indirectly, they'll go looking for it, is my feeling.
I'd like to build an integration with Whisper Memos (https://whispermemos.com/)
Then I'd be able to dictate a note on my Apple Watch such as:
> Go into repository X and look at the screen Y, and fix bug Z.
That'd be so cool.
- Time to start your container (or past project) is ~1 sec to 1 min.
- Fully supported NixOS container with isolated, cloned agent layer. Most tools available locally to cut download times and AI web access risk.
- GitHub connections are persistent. Agents do a reasonable job with clean local commits.
- Very fast dev loops (plan/build/test/architect/fix/test/document/git commit/push to user layer) with adjustable user involvement.
- Phone app is fully featured... I've never built apps on road trips before Replit.
- Uses Claude Code currently (has used ChatGPT in the past).
Tips:
- Consider tig to help manage git from the CLI before you push to GitHub.
- GitLab can be connected but is clumsy, with occasional server state refreshes.
- Startups that haven't committed to an IDE yet and expect compatibility with NixOS would have strong reason to consider this. It should save them the need to build their own OS-local AI code through early builds.
Is the 1.5 years that I have left worth it? (I already have an Associate's Degree).