Show HN: Open Codex – OpenAI Codex CLI with open-source LLMs

47•codingmoh•7h ago

Hey HN,

I’ve built Open Codex, a fully local, open-source alternative to OpenAI’s Codex CLI.

My initial plan was to fork their project and extend it. I even started doing that. But it turned out their code has several leaky abstractions, which made it hard to override core behavior cleanly. Shortly after, OpenAI introduced breaking changes. Maintaining my customizations on top became increasingly difficult.

So I rewrote the whole thing from scratch using Python. My version is designed to support local LLMs.

Right now, it only works with phi-4-mini (GGUF) via lmstudio-community/Phi-4-mini-instruct-GGUF, but I plan to support more models. Everything is structured to be extendable.

At the moment I only support single-shot mode, but I intend to add interactive (chat mode), function calling, and more.

You can install it using Homebrew:

   brew tap codingmoh/open-codex
   brew install open-codex

It's also published on PyPI:

   pip install open-codex

Source: https://github.com/codingmoh/open-codex

Comments

strangescript•5h ago

curious why you went with Phi as the default models, that seems a bit unusual compared to current trends

codingmoh•5h ago

I went with Phi as the default model because, after some testing, I was honestly surprised by how high the quality was relative to its size and speed. The responses felt better in some reasoning tasks-but were running on way less hardware.

What really convinced me, though, was the focus on the kinds of tasks I actually care about: multi-step reasoning, math, structured data extraction, and code understanding.There’s a great Microsoft paper on this: "Textbooks Are All You Need" and solid follow-ups with Phi‑2 and Phi‑3.

jasonjmcghee•4h ago

agreed - thought the qwen2.5-coder was kind of standard non-reasoning small line of coding models right now

codingmoh•4h ago

I saw pretty good reasoning quality with phi-4-mini. But alright - I’ll still run some tests with qwen2.5-coder and plan to add support for it next. Would be great to compare them side by side in practical shell tasks. Thanks so much for the pointer!

siva7•5h ago

At least it can't be worse than the original codex using o4-mini.

codingmoh•4h ago

fair jab - haha; if we’re gonna go small, might as well go fully local and open. At least with phi-4-mini you don’t need an API key, and you can tweak/replace the model easily

KTibow•4h ago

Without any changes, you can already use Codex with a remote or local API by setting base URL and key environment variables.

asadm•4h ago

i think this was made before that PR was merged into codex.

KTibow•3h ago

Good correction - while the SDK used has supported changing the API through environment variables for a long time, Codex only recently added Chat Completions support recently.

kingo55•4h ago

Does it work for local though? It's my understanding this is still missing.

KTibow•3h ago

If your favorite LLM inference program can run a Chat Completions API.

codingmoh•2h ago

Thanks for bringing that up - it's exactly why I approached it this way from the start.

Technically you can use the original Codex CLI with a local LLM - if your inference provider implements the OpenAI Chat Completions API, with function calling, etc. included.

But based on what I had in mind - the idea that small models can be really useful if optimized for very specific use cases - I figured the current architecture of Codex CLI wasn't the best fit for that. So instead of forking it, I started from scratch.

Here's the rough thinking behind it:

   1. You still have to manually set up and run your own inference server (e.g., with ollama, lmstudio, vllm, etc.).
   2. You need to ensure that the model you choose works well with Codex's pre-defined prompt setup and configuration.
   3. Prompting patterns for small open-source models (like phi-4-mini) often need to be very different - they don't generalize as well.
   4. The function calling format (or structured output) might not even be supported by your local inference provider.

Codex CLI's implementation and prompts seem tailored for a specific class of hosted, large-scale models (e.g. GPT, Gemini, Grok). But if you want to get good results with small, local models, everything - prompting, reasoning chains, output structure - often needs to be different.

So I built this with a few assumptions in mind:

   - Write the tool specifically to run _locally_ out of the box, no inference API server required.
   - Use model directly (currently for phi-4-mini via llama-cpp-python).
   - Optimize the prompt and execution logic _per model_ to get the best performance.

Instead of forcing small models into a system meant for large, general-purpose APIs, I wanted to explore a local-first, model-specific alternative that's easy to install and extend — and free to run.

xyproto•4h ago

This is very convenient and nice! But I could not get it to work with the best small models available for Ollama for programming, like https://ollama.com/MFDoom/deepseek-coder-v2-tool-calling for example.

smcleod•4h ago

That's a really old model now. Even the old Qwen 2.5 coder 32b model is better than DSv2

codingmoh•4h ago

I want to add support for qwen 2.5 next

manmal•3h ago

QwQ-32 might be worth looking into also, as a high level planning tool.

codingmoh•3h ago

Thank you so much!

codingmoh•4h ago

Thanks so much!

Was the model too big to run locally?

That’s one of the reasons I went with phi-4-mini - surprisingly high quality for its size and speed. It handled multi-step reasoning, math, structured data extraction, and code pretty well, all on modest hardware. Phi-1.5 / Phi-2 (quantized versions) also run on raspberry pi as others have demonstrated.

shmoogy•3h ago

Codex merged in to allow multiple providers today - https://github.com/openai/codex/pull/247

bravura•2h ago

Sorry, does that mean I can use anthropic and gemini with codex? And switch during the session?

asadm•1h ago

yes

ai-christianson•3h ago

> So I rewrote the whole thing from scratch using Python

So this isn't really codex then?

Evertop: E-ink IBM XT clone with 100+ hours of battery life

Prolog Adventure Game

Blog hosted on a Nintendo Wii

'Immediate red flags': questions raised over 'expert' much quoted in UK press

Show HN: Dia, an open-weights TTS model for generating realistic dialogue

The Future of Compute: Nvidia's Crown Is Slipping

Cheating the Reaper in Go

Astronomers confirm the existence of a lone black hole

Launch HN: Magic Patterns (YC W23) – AI Design and Prototyping for Product Teams

Ultra-precision formation flying demonstration for space-based interferometry

A new form of verification on Bluesky

A M.2 HDMI capture card

101 Basic Computer Games

Cekura (Formerly Vocera) (YC F24) Is Hiring

Pipelining might be my favorite programming language feature

LLM-powered tools amplify developer capabilities rather than replacing them

Pydrofoil: Accelerating Sail-based instruction set simulators

Show HN: Open Codex – OpenAI Codex CLI with open-source LLMs

A new record for California's highest tree

FTC takes action against Uber for deceptive billing and cancellation practices

AI assisted search-based research works now

An update to our pricing

How I use Kate editor

Dumb statistical models, always making people look bad

Getting forked by Microsoft

Adding keyword parameters to Tcl procs

Reworking 30 lines of Linux code could cut power use by up to 30 percent

Tabular Programming: A New Paradigm for Expressive Computing

Is 1 Prime, and Does It Matter?

Out of the Fog

Evertop: E-ink IBM XT clone with 100+ hours of battery life

Prolog Adventure Game

Blog hosted on a Nintendo Wii

'Immediate red flags': questions raised over 'expert' much quoted in UK press

Show HN: Dia, an open-weights TTS model for generating realistic dialogue

The Future of Compute: Nvidia's Crown Is Slipping

Cheating the Reaper in Go

Astronomers confirm the existence of a lone black hole

Launch HN: Magic Patterns (YC W23) – AI Design and Prototyping for Product Teams

Ultra-precision formation flying demonstration for space-based interferometry

A new form of verification on Bluesky

A M.2 HDMI capture card

101 Basic Computer Games

Cekura (Formerly Vocera) (YC F24) Is Hiring

Pipelining might be my favorite programming language feature

LLM-powered tools amplify developer capabilities rather than replacing them

Pydrofoil: Accelerating Sail-based instruction set simulators

Show HN: Open Codex – OpenAI Codex CLI with open-source LLMs

A new record for California's highest tree

FTC takes action against Uber for deceptive billing and cancellation practices

AI assisted search-based research works now

An update to our pricing

How I use Kate editor

Dumb statistical models, always making people look bad

Getting forked by Microsoft

Adding keyword parameters to Tcl procs

Reworking 30 lines of Linux code could cut power use by up to 30 percent

Tabular Programming: A New Paradigm for Expressive Computing

Is 1 Prime, and Does It Matter?

Out of the Fog

Show HN: Open Codex – OpenAI Codex CLI with open-source LLMs

Comments