Maincoder-1B: a 1B-parameter transformer model trained from scratch for code generation. It hits 76% on HumanEval, the best result we’re aware of among open models in this size range.
The motivation is simple. Strong coding models don’t have to be large. With better data processing across pre-, mid-, and post-training, plus RL-based post-training, small models can deliver surprisingly strong functional correctness while staying cheap and fast.
Why we care about small models:
Low latency / low cost → usable for interactive tools and large batch jobs
Local & on-device inference → privacy-sensitive and offline workflows
Many fast rollouts → program synthesis with search, verification loops (see the sketch after this list), RL environments, fine-tuning to personal preferences
Composable systems → cascades, speculative decoding, tool-use agents
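To make the "many fast rollouts" point concrete, here is a minimal generate-and-verify sketch. It is not from the release: generate and tests are hypothetical stand-ins for the model's sampling call and whatever checks (unit tests, linters, type checks) you run against a candidate.

    # Minimal verification-loop sketch: sample many cheap completions from a
    # small model and keep the first one that passes every test.
    # `generate` and `tests` are hypothetical stand-ins, not part of the release.
    from typing import Callable, Optional, Sequence

    def best_of_n(prompt: str,
                  generate: Callable[[str], str],
                  tests: Sequence[Callable[[str], bool]],
                  n: int = 32) -> Optional[str]:
        for _ in range(n):                    # many rollouts are cheap with a 1B model
            candidate = generate(prompt)      # one sampled completion
            if all(test(candidate) for test in tests):
                return candidate              # first candidate that passes all checks
        return None                           # fall back to a larger model or a human

The same loop is what search- or RL-style setups amortize: with a small, fast model you can afford n in the dozens per task.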
Maincoder-1B performs best on small, self-contained tasks (2,048-token context), and it’s not meant for security- or safety-critical code without human review. It’s designed for practical deployment and should quantize well.
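For anyone who wants to try the quantization claim locally, here is a minimal 4-bit loading sketch with Hugging Face transformers, assuming a standard causal-LM checkpoint. The repo id below is a placeholder guess, and it assumes the bitsandbytes package and a CUDA GPU.

    # Minimal 4-bit inference sketch with Hugging Face transformers + bitsandbytes.
    # The repo id is a placeholder; check the actual release for the real one.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    repo = "Maincoder/Maincoder-1B"  # hypothetical repo id
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # needs bitsandbytes + CUDA
        device_map="auto",
    )

    prompt = "def fibonacci(n: int) -> int:\n"
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tok.decode(out[0], skip_special_tokens=True))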
Weights are released under Apache 2.0.
armeet•3h ago
When you step back, it's kind of absurd that a 1B model can achieve this eval score. Makes me very excited about the future of on-device inference.
necovek•1h ago
It kind of is expected, right? If a 70B model can have great overall performance, a 1B model focused on coding and a single language could even be comparable.
I am actually hoping we see more per-language models soon, though obviously a model can't be as "smart" if trained only on a single language.