I have a pretty complex project, so I need to keep an eye on the agent to make sure it doesn't go off the rails and delete all the code just to get a build to pass (it wouldn't be the first time).
In fact, you should convert your code to spaces, at least before the LLM sees it. It'll improve your results by making the code look more like the model's training data.
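That conversion is a one-liner in most languages. A minimal sketch in Python, assuming a tab width of 4 (match whatever convention your codebase actually uses; the filename is just an example):

    # Expand tabs to spaces before handing source code to a model.
    # The tab width of 4 is an assumption; match your codebase's convention.
    from pathlib import Path

    def detab(path: str, tab_width: int = 4) -> str:
        """Return the file's contents with tabs expanded to spaces."""
        return Path(path).read_text().expandtabs(tab_width)

    print(detab("main.py"))  # illustrative filename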
I don't like that at all. Actually running the code is the single most effective protection we have against coding mistakes, from both humans and machines.
I think it's absolutely worth the complexity and performance overhead of hooking up a real container environment.
Not to mention you can run a useful code execution container in 100MB of RAM on a single CPU (or slice thereof). Simulating that with an LLM takes at least one GPU and 100GB or more of VRAM.
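As a rough illustration of how cheap such a sandbox can be, here is a minimal sketch using Docker through the docker-py SDK. The image, limits, and command are illustrative assumptions, not a hardened setup:

    # Minimal code-execution sandbox sketch using docker-py.
    # Image, limits, and command are illustrative; real sandboxes need more hardening.
    import docker

    client = docker.from_env()
    output = client.containers.run(
        "python:3.12-slim",                # any small base image
        ["python", "-c", "print(2 + 2)"],  # the code under test
        mem_limit="100m",                  # ~100MB of RAM, as noted above
        nano_cpus=500_000_000,             # half a CPU core
        network_disabled=True,             # no network for untrusted code
        remove=True,                       # delete the container afterwards
    )
    print(output.decode())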
But when I installed Codex and tried to make a simple bugfix, I got rate limited almost immediately. As in, after the agent had taken three "steps".
Are you meant to only use Codex with their $200 "unlimited" plans? Thanks!
I was tempted to give Codex a try but a colleague was stung by their pricing. Apparently if you go over your Pro plan allocation, they just quietly and automatically start billing you per-token?
https://cdn.openai.com/pdf/97cc5669-7a25-4e63-b15f-5fd5bdc4d...
EnPissant•54m ago
- The smartest model I have used. It solves problems better than Opus 4.1.
- It can be lazy. With Claude Code / Opus, once given a problem, it will generally work until completion. Codex will often perform only the first few steps and then ask if I want to continue with the rest. It does this even if I tell it not to stop until completion.
- I have seen severe degradation near max context. For example, it will just repeat the next steps every time I tell it to continue, and I have to manually compact (roughly what that means is sketched after this comment).
I'm not sure if the problems are GPT-5 or Codex. I suspect a better Codex could resolve them.
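"Compacting" here means folding older conversation turns into a summary so the transcript fits back under the context limit. A rough, tool-agnostic sketch of the idea; the token budget and the summarize() call are stand-ins, not Codex internals:

    # Rough sketch of context compaction: when the transcript nears the
    # context limit, replace older turns with a summary.
    # MAX_TOKENS, KEEP_RECENT, and summarize() are assumptions, not Codex internals.
    MAX_TOKENS = 200_000
    KEEP_RECENT = 10  # keep the most recent turns verbatim

    def count_tokens(messages: list[str]) -> int:
        return sum(len(m.split()) for m in messages)  # crude stand-in for a tokenizer

    def summarize(messages: list[str]) -> str:
        return f"[summary of {len(messages)} earlier turns]"  # really an LLM call

    def compact(messages: list[str]) -> list[str]:
        if count_tokens(messages) < MAX_TOKENS:
            return messages
        old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
        return [summarize(old)] + recent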
brookst•46m ago
Very frustrating, and happening more often.
Jcampuzano2•37m ago
But they have suffered quite a lot of degradation and quality issues recently.
To be honest, unless Anthropic does something very impactful soon, I think they're losing the moat they had with developers as more and more of them jump to Codex and other tools. They kind of massively threw away their lead, imo.
Jcampuzano2•34m ago
Everyone else slowly caught up and/or surpassed them while quality control issues and service degradation plagued their system, ALL while they had the most expensive models relative to their intelligence.
bjackman•8m ago
My experience after a month or so of heavy use is exactly this. The AI is rock solid. I'm pretty consistently impressed with its ability to derive insights from the code, when it works. But the client is flaky, the backend is flaky, and the overall experience for me is always "I wish I could just use Claude".
Say 1 in 10 queries craps out (often the client OOMs, even though I have 192GB of RAM). That sounds like a 10% reliability issue, but in practice it pushes me into "fuck this, I'll just do it myself", so it knocks out more like 50% of the product's value.
(Still, I wouldn't be surprised if this gets fixed over the next few months; it could easily be very competitive, IMO.)