But overall, looks very nice and I'm looking forward to giving it a try.
That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...
Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.
Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.
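For anyone wondering what "port offsets" could look like in practice, here is a minimal sketch. The service names, base ports, and stride are all made up for illustration; nothing here comes from the Codex docs. The idea is simply that each environment gets the same set of services, shifted by a fixed amount so they can run side by side on one machine.

```python
# Hypothetical base ports for one project; names and numbers are illustrative only.
BASE_PORTS = {"web": 3000, "api": 8000, "db": 5432}
STRIDE = 10  # assumed gap between environments


def ports_for(env_index: int, base_ports: dict = BASE_PORTS, stride: int = STRIDE) -> dict:
    """Return the ports for environment number env_index (0 keeps the base ports)."""
    if env_index < 0:
        raise ValueError("env_index must be non-negative")
    offset = env_index * stride
    # Every service in the environment shifts by the same offset.
    return {name: port + offset for name, port in base_ports.items()}


if __name__ == "__main__":
    for i in range(3):
        print(i, ports_for(i))
```

The stride only has to be larger than the number of environments you would ever run with ports that close together; a bigger stride is the safer choice if base ports sit near each other.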
But overall it does seem to be consistently improving. Looking to see how this makes it easier to work with.
BTW OpenAI should think a bit about polishing their main apps instead of trying to come out with new ones while the originals are still buggy.
We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!
I.e., I think the Codex web app on a self-hosted machine would be great. This is important when you need a beefier machine (potentially with a GPU).
I thought Codex team tweeted about something coming for Xcode users - but maybe it just meant devs who are Apple users, not devs working on Apple platform apps...
I have yet to hit usage limits with Codex. I continually hit them with Claude. I use them both the same way: hands on the wheel and very interactive, small changes, and I tell them both to update a file to keep up with what’s done and what’s left to do as I test.
Codex gets caught in a loop more often trying to fix an issue. I tell it to summarize the issue, what it’s tried and then I throw Claude at it.
Claude can usually fix it. Once it is fixed, I tell Claude to note the fix in the same file and then go back to Codex.
Begs the question if Anthropic will follow up with a first-class Claude Code "multi agent" (git worktree) app themselves.
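For context on the git worktree part, a rough sketch of how one checkout per agent can be set up by hand. The branch naming scheme and the sibling-directory layout are assumptions for illustration, not how any of these apps actually do it; the code only builds the `git worktree add` commands and does not run them.

```python
import shlex
from pathlib import Path


def worktree_command(repo: Path, task: str) -> list:
    """Build a `git worktree add` command giving one agent its own checkout and branch."""
    branch = f"agent/{task}"                    # hypothetical branch naming scheme
    path = repo.parent / f"{repo.name}-{task}"  # sibling directory, also an assumption
    # -C runs git inside the main repo; -b creates the new branch for the worktree.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


if __name__ == "__main__":
    for task in ["fix-login", "add-tests"]:
        print(shlex.join(worktree_command(Path("/tmp/myrepo"), task)))
```

Each agent then works in its own directory on its own branch, so parallel sessions do not overwrite each other's uncommitted changes.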
I ended up building a terminal[0] with Tauri and xterm that works exactly how I want.
0 - screenshot: https://x.com/thisritchie/status/2016861571897606504?s=20
That's like calling Coca Cola a random beverage vendor
- workspace agent runner apps (like Conductor) get more and more obsolete
- "vibe working" is becoming a thing - people use folder based agents to do their work (not just coding)
- new workflows seem to be evolving into folder based workspaces, where agents can self-configure MCP servers and skills + memory files and instructions
kinda interested to see if openai has the ideas & shipping power to compete with anthropic going forward; anthropic does not only have an edge over openai because of how op their models are at coding, but also because they innovate on workflows and ai tooling standards; openai so far has only followed in adoption (mcp, skills, now codex desktop) but rarely pushed the SOTA themselves.
linux / windows requires extra testing as well as some adjustments to the software stack (e.g. liquid glass only works on mac); to get the thing out the door ASAP, they release macos first.
looks like the same framework they used to build chatgpt desktop (electron)
edit - from another comment:
> Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!
May give a go at this and Claude Code desktop as well, but Cursor guys are still working the hardest to keep themselves alive.
I love competition
Once this app (or a similar app by Anthropic) allows me to have the same level of "orchestration" but on a remote machine, I'll test it.
From a developer's perspective it makes sense, though. You can test experimental stuff where configurations are almost the same in terms of OS and underlying hardware, so no weird, edge-case bugs at this stage.
To me this still feels like the wrong way to interact with a coding agent. Does this lead people to success? I've never seen it not go off the rails in some way unless you provide clear boundaries as to what the scope of the expected change is. It's gonna write code even if you don't want it to yet, it's gonna write the test first or the logic first, whichever you don't want it to do. It'll be much too verbose or much too hacky, etc.
> gh-address-comments address comments
Inspiring stuff. I would love to be the one writing GH comments here. /s
But maybe there's a complementary gh-leave-comments to have it review PRs for you too.
First phase: Plan. Mandatory to complete, as well as get AI feedback from a separate context or model. Iterate until complete.
Only then move on to the Second Phase: make edits.
Better planning == Better execution
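The two-phase flow above can be sketched as a small loop. This is only an illustration under assumptions: `draft_plan`, `review_plan`, and `make_edits` are hypothetical callables standing in for whatever agents or models you use, and "empty feedback means approved" is a convention invented for the sketch.

```python
from typing import Callable


def plan_then_edit(
    issue: str,
    draft_plan: Callable[[str, str], str],  # (issue, feedback) -> plan
    review_plan: Callable[[str], str],      # plan -> feedback; "" means approved
    make_edits: Callable[[str], str],       # plan -> result of the edit phase
    max_rounds: int = 5,
) -> str:
    """Iterate on a plan until a separate reviewer approves it, then make edits."""
    feedback = ""
    for _ in range(max_rounds):
        plan = draft_plan(issue, feedback)  # phase one: plan (or revise the plan)
        feedback = review_plan(plan)        # feedback from a separate context or model
        if not feedback:
            return make_edits(plan)         # phase two only starts after approval
    raise RuntimeError("plan was not approved within max_rounds")
```

The point of the structure is that `make_edits` is unreachable until the reviewer signs off, so no code gets written against a half-finished plan.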
With Codex, I increasingly can skip the plan step, and it just toils along until it has finished the issue. It can be more "lazy" at times and ask before going ahead more often, but usually in a reasonable scope (and sometimes at points where I think other services would have gone ahead on a wrong tangent and burnt more tokens of their more limited usage).
I wouldn't be surprised if, within the next 1-2 model iterations, a plan step is no longer worth the effort, given a good enough initial written issue.
Weaker models give the experience you describe, and with a 100% LLM-written codebase I think it can end up in a hall of mirrors.
Now I have an idea to try: a second LLM processing pass that normalizes the vibe-code to some personal style and standard, to break it out of the Stack Overflow snippet maze it can get itself into.
One cool thing about this: upon installing it immediately found all previous projects I've used with Codex and has those projects in the sidebar with all of the "threads" (sessions) I've had with Codex on these projects!
What I like is that the sessions are highly configurable from their plan.md which translates a md document into a process. So you can tweak and add steps. This is similar to some of the other workflow tools I've seen around hooks and such -- but presented in a way that is easy for me to use. I also like that it can update the plan.md as it goes to dynamically add steps and even add "hooks" as needed based on the problem.
Of those that are, most are not vibe coding, so an editor is still required at many points
Apple is great but this is OpenAI devs showing their disconnect from the mainstream. It's complacent at best, contemptuous at worst.
SamA or somebody really needs to give the product managers here a kick up the arse.
Wouldn’t native give better performance and more system integration?
I’m aware Mac OS has some isolation/sandboxes but without running codex via docker I wouldn’t be running codex.
(Appreciate there are still risks)
Is there more information about it? For how long and what are the limits?
Bunch of the features u listed were already in the codex extension too. False outrage at its finest.
From the video, I can see how this app would be useful in:
- Creating branches without having to open another terminal, or creating a new branch before the session.
- Seeing diff in the same app.
- working on multiple sessions at once without switching CLI
- I quite like the “address the comments”, I can see how this would be valuable
I will give it a try for sure
Is it in the main Codex build? There doesn’t seem to be an experiment for it.
/experimental > enable collab
It works pretty great. Codex team, I know you’re reading this. Just call them subagents. Using the word “collab” to describe working with subagents and collab_mode to describe plan mode is not helpful.
Here's the Codex tech stack in case anyone was interested like me.
Framework: Electron 40.0.0
Frontend:
- React 19.2.0
- Jotai (state management)
- TanStack React Form
- Vite (bundler)
- TypeScript
Backend/Main Process:
- Node.js
- better-sqlite3 (local database)
- node-pty (terminal emulation)
- Zod (validation)
- Immer (immutable state)
Build & Dev:
- pnpm (package manager)
- Electron Forge
- Vitest (testing)
- ESLint + Prettier
Native/macOS:
- Sparkle (auto-updates)
- Squirrel (installer)
- electron-liquid-glass (macOS vibrancy effects)
- Sentry (error tracking)
So many of the things that pioneered the way for the truly good (Claude, Gemini) to evolve. I am thankful for what they have done.
But the quality is gone, and they are now in catch-up mode. This is clear, not just from the quality of GPT-5.x outputs, but from this article.
They launch something new, flashy, should get the attention of all of us. And yet, they only launch to Apple devices?
Then, there are typos in the article. Again. I can't believe they would be sloppy about this with so much on the line. EDIT: since I know someone will ask, couple of examples - "7MM Tokens", "...this prompt initial prompt..."
And why are they not giving the full prompt used for these examples? "...that we've summarized for clarity" but we want to see the actual prompt. How unclear do we need to make our prompts to get to the level that you're showing us? Slight red flag there.
Anyway, good luck to them, and I hope it improves! Happy to try it out when it does, or at the very least, when it exists for a platform I own.
Everything is Codex for code?
Codex App is something brand new and different (or evolved) from Codex CLI and it's different than Codex Cloud aka Codex Web. There is also codex for vscode.
Then we have the Codex models beside the GPT models.
Session A knocks it out of the park. Chef’s kiss.
Session B just does some random vandalism.
But you can already do that, in the terminal. Open your favourite terminal, use splits or tmux and spin up as many claude code or codex instances as you want. In parallel. I do it constantly. For all kinds of tasks, not only coding.
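For anyone who has not tried the tmux route, a minimal sketch of what that setup amounts to. It assumes `tmux` plus the `codex` and `claude` CLIs are on your PATH, and the session name is made up. The function only builds the tmux commands; the commented-out lines at the bottom show how you might run them.

```python
def tmux_commands(session: str, agent_commands: list) -> list:
    """Build tmux invocations that start one pane per agent in a detached session."""
    if not agent_commands:
        raise ValueError("need at least one agent command")
    first, *rest = agent_commands
    # The first agent runs in the session's initial pane.
    cmds = [["tmux", "new-session", "-d", "-s", session, first]]
    for command in rest:
        # Each further agent gets its own split; re-tile so panes stay usable.
        cmds.append(["tmux", "split-window", "-t", session, command])
        cmds.append(["tmux", "select-layout", "-t", session, "tiled"])
    return cmds


if __name__ == "__main__":
    for cmd in tmux_commands("agents", ["codex", "claude", "codex"]):
        print(" ".join(cmd))
    # To actually launch (not done here):
    #   import subprocess
    #   for cmd in tmux_commands("agents", ["codex", "claude"]):
    #       subprocess.run(cmd, check=True)
    # then attach with: tmux attach -t agents
```

A pane closes when its command exits, which is usually what you want for a one-off agent session.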
drcongo•1h ago
Translated from Marketingspeak, this is presumably "we're also desperate for some people to actually use it because everyone shrugged and went back to Claude Code when we released it".