
Show HN: Open Codex – OpenAI Codex CLI with open-source LLMs

https://github.com/codingmoh/open-codex
106•codingmoh•9mo ago
Hey HN,

I’ve built Open Codex, a fully local, open-source alternative to OpenAI’s Codex CLI.

My initial plan was to fork their project and extend it. I even started doing that. But it turned out their code has several leaky abstractions, which made it hard to override core behavior cleanly. Shortly after, OpenAI introduced breaking changes. Maintaining my customizations on top became increasingly difficult.

So I rewrote the whole thing from scratch using Python. My version is designed to support local LLMs.

Right now, it only works with phi-4-mini (GGUF) via lmstudio-community/Phi-4-mini-instruct-GGUF, but I plan to support more models. Everything is structured to be extendable.

At the moment I only support single-shot mode, but I intend to add an interactive chat mode, function calling, and more.

You can install it using Homebrew:

   brew tap codingmoh/open-codex
   brew install open-codex

It's also published on PyPI:

   pip install open-codex

Source: https://github.com/codingmoh/open-codex

Comments

strangescript•9mo ago
curious why you went with Phi as the default model, that seems a bit unusual compared to current trends
codingmoh•9mo ago
I went with Phi as the default model because, after some testing, I was honestly surprised by how high the quality was relative to its size and speed. The responses felt better on some reasoning tasks - while running on way less hardware.

What really convinced me, though, was the focus on the kinds of tasks I actually care about: multi-step reasoning, math, structured data extraction, and code understanding. There’s a great Microsoft paper on this: "Textbooks Are All You Need", and solid follow-ups with Phi‑2 and Phi‑3.

jasonjmcghee•9mo ago
agreed - I thought qwen2.5-coder was kind of the standard line of small non-reasoning coding models right now
codingmoh•9mo ago
I saw pretty good reasoning quality with phi-4-mini. But alright - I’ll still run some tests with qwen2.5-coder and plan to add support for it next. Would be great to compare them side by side in practical shell tasks. Thanks so much for the pointer!
siva7•9mo ago
At least it can't be worse than the original codex using o4-mini.
codingmoh•9mo ago
fair jab - haha; if we’re gonna go small, might as well go fully local and open. At least with phi-4-mini you don’t need an API key, and you can tweak/replace the model easily
KTibow•9mo ago
Without any changes, you can already use Codex with a remote or local API by setting base URL and key environment variables.
asadm•9mo ago
i think this was made before that PR was merged into codex.
KTibow•9mo ago
Good correction - while the SDK used has supported changing the API through environment variables for a long time, Codex only recently added Chat Completions support.
xiphias2•9mo ago
Maybe that was part of the reason they accepted the PR. Forks would happen anyway if they didn't allow other LLMs.

A bit like how Android came after the iPhone with an open-source implementation.

kingo55•9mo ago
Does it work for local though? It's my understanding this is still missing.
KTibow•9mo ago
If your favorite LLM inference program can run a Chat Completions API.
codingmoh•9mo ago
Thanks for bringing that up - it's exactly why I approached it this way from the start.

Technically you can use the original Codex CLI with a local LLM - if your inference provider implements the OpenAI Chat Completions API, with function calling, etc. included.

But based on what I had in mind - the idea that small models can be really useful if optimized for very specific use cases - I figured the current architecture of Codex CLI wasn't the best fit for that. So instead of forking it, I started from scratch.

Here's the rough thinking behind it:

   1. You still have to manually set up and run your own inference server (e.g., with ollama, lmstudio, vllm, etc.).
   2. You need to ensure that the model you choose works well with Codex's pre-defined prompt setup and configuration.
   3. Prompting patterns for small open-source models (like phi-4-mini) often need to be very different - they don't generalize as well.
   4. The function calling format (or structured output) might not even be supported by your local inference provider.
Codex CLI's implementation and prompts seem tailored for a specific class of hosted, large-scale models (e.g. GPT, Gemini, Grok). But if you want to get good results with small, local models, everything - prompting, reasoning chains, output structure - often needs to be different.
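
To make that concrete - "implements the OpenAI Chat Completions API" means a generic OpenAI client can be pointed at your local server. A minimal sketch (the endpoint and model name assume a local Ollama instance; any compatible server works the same way, and this is not how Codex CLI or open-codex is actually wired up):

    from openai import OpenAI

    # Point a standard OpenAI client at a local Chat Completions server.
    # http://localhost:11434/v1 is Ollama's OpenAI-compatible endpoint;
    # the api_key only needs to be a non-empty string for local servers.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="local")

    resp = client.chat.completions.create(
        model="phi4:latest",
        messages=[{"role": "user", "content": "List all Python files in this repo"}],
        # Codex CLI also relies on function calling via the `tools` parameter;
        # if the model or server doesn't support it, requests fail with a 400.
    )
    print(resp.choices[0].message.content)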

So I built this with a few assumptions in mind:

   - Write the tool specifically to run _locally_ out of the box, no inference API server required.
   - Use the model directly (currently phi-4-mini via llama-cpp-python).
   - Optimize the prompt and execution logic _per model_ to get the best performance.
Instead of forcing small models into a system meant for large, general-purpose APIs, I wanted to explore a local-first, model-specific alternative that's easy to install and extend — and free to run.
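
And a rough sketch of the "use the model directly" idea - load a GGUF build of phi-4-mini with llama-cpp-python, one model-specific prompt in, one shell command out. The model path and prompt wording here are placeholders, not the actual open-codex code:

    from llama_cpp import Llama

    # Load a local GGUF build of phi-4-mini; no inference server involved.
    llm = Llama(model_path="./Phi-4-mini-instruct-Q4_K_M.gguf", n_ctx=4096, verbose=False)

    # Single-shot mode: translate one natural-language request into one command.
    result = llm.create_chat_completion(
        messages=[
            {"role": "system",
             "content": "Translate the user's request into a single POSIX shell command. "
                        "Reply with the command only, no explanation."},
            {"role": "user", "content": "find all TODO comments in python files"},
        ],
        temperature=0.2,
        max_tokens=128,
    )
    print(result["choices"][0]["message"]["content"])

The point is that the prompt, decoding settings, and output format live right next to the model, so they can be tuned per model instead of assuming one hosted API.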
xyproto•9mo ago
This is very convenient and nice! But I could not get it to work with the best small models available for Ollama for programming, like https://ollama.com/MFDoom/deepseek-coder-v2-tool-calling for example.
smcleod•9mo ago
That's a really old model now. Even the old Qwen 2.5 coder 32b model is better than DSv2
codingmoh•9mo ago
I want to add support for qwen 2.5 next
manmal•9mo ago
QwQ-32 might be worth looking into also, as a high level planning tool.
codingmoh•9mo ago
Thank you so much!
smcleod•9mo ago
Hopefully Qwen 3, and maybe if we're lucky Qwen 3 Coder, will be out this week too.
smcleod•9mo ago
Also GLM 4 is pretty amazing - https://www.reddit.com/r/LocalLLaMA/comments/1k4w9p2/i_uploa...
codingmoh•9mo ago
Thanks, I'll have a look
codingmoh•9mo ago
Thanks so much!

Was the model too big to run locally?

That’s one of the reasons I went with phi-4-mini - surprisingly high quality for its size and speed. It handled multi-step reasoning, math, structured data extraction, and code pretty well, all on modest hardware. Quantized versions of Phi-1.5 / Phi-2 even run on a Raspberry Pi, as others have demonstrated.

xyproto•9mo ago
The models work fine with "ollama run" locally.

When trying out "phi4" locally with:

   open-codex --provider ollama --full-auto --project-doc README.md --model phi4:latest

I get this error:

    OpenAI rejected the request. Error details: Status: 400, Code: unknown, Type: api_error,
    Message: 400 registry.ollama.ai/library/phi4:latest does not support tools.
    Please verify your settings and try again.
shmoogy•9mo ago
Codex merged in to allow multiple providers today - https://github.com/openai/codex/pull/247
bravura•9mo ago
Sorry, does that mean I can use anthropic and gemini with codex? And switch during the session?
asadm•9mo ago
yes
ai-christianson•9mo ago
> So I rewrote the whole thing from scratch using Python

So this isn't really codex then?

user_4028b09•9mo ago
Great work making Codex easily accessible with open-source LLMs – really excited to try it!
vincent0405•9mo ago
Cool project! It's awesome to see someone taking on the challenge of a fully local Codex alternative.
underlines•9mo ago
Don't forget https://ollama.com/library/deepcoder which ranks really well for its size
submeta•9mo ago
Sounds great! Although I would prefer Claude Code to be open-sourced, as it's the tool that works best for vibe coding - albeit expensive when using Anthropic's models via the API. There is an unofficial clone ("Anon Kode"), but it's not legitimate.
Philpax•9mo ago
I believe anon-kode is a decompiled Claude Code, so it should work identically when paired with Claude.
submeta•9mo ago
Unfortunately it does not. Where I can feed Claude Code a file larger than 256k, Anon Kode (like Roo) will complain that the file is too large when using Gemini 2.5 Pro.
fcap•9mo ago
Why fork and use Open Codex when OpenAI opened the original up to multiple models? Just trying to understand.
codingmoh•9mo ago
Hey, that is a very good question - I've answered it before. I hope you don't mind if I simply copy-paste my previous answer:

Technically you can use the original Codex CLI with a local LLM - if your inference provider implements the OpenAI Chat Completions API, with function calling, etc. included.

But based on what I had in mind - the idea that small models can be really useful if optimized for very specific use cases - I figured the current architecture of Codex CLI wasn't the best fit for that. So instead of forking it, I started from scratch.

Here's the rough thinking behind it:

   1. You still have to manually set up and run your own inference server (e.g., with ollama, lmstudio, vllm, etc.).
   2. You need to ensure that the model you choose works well with Codex's pre-defined prompt setup and configuration.
   3. Prompting patterns for small open-source models (like phi-4-mini) often need to be very different - they don't generalize as well.
   4. The function calling format (or structured output) might not even be supported by your local inference provider.
Codex CLI's implementation and prompts seem tailored for a specific class of hosted, large-scale models (e.g. GPT, Gemini, Grok). But if you want to get good results with small, local models, everything - prompting, reasoning chains, output structure - often needs to be different. So I built this with a few assumptions in mind:

   - Write the tool specifically to run _locally_ out of the box, no inference API server required.
   - Use the model directly (currently phi-4-mini via llama-cpp-python).
   - Optimize the prompt and execution logic _per model_ to get the best performance.
Instead of forcing small models into a system meant for large, general-purpose APIs, I wanted to explore a local-first, model-specific alternative that's easy to install and extend — and free to run.
danielktdoranie•9mo ago
but what about the Codex Giggas my niggas?
jpmonette•9mo ago
You can also do the same with OpenAI Codex (with Ollama for example):

1. ~ codex --provider ollama
2. Run: /model
3. Pick your model
4. Profit!
