Hi everyone. I’ve been experimenting with local models for autonomous coding, specifically qwen2.5-coder:32b. Using standard tools like Aider directly against Ollama, the model kept suffering from severe context drift, hitting a hard ceiling of a 20% pass rate on multi-step tasks.
Instead of throwing more parameters at it (which caused OOM errors for me), I built a deterministic RAG wrapper in Java that intercepts the model's output and forces it to consult a local index before executing code. By fencing the LLM's stochastic output behind deterministic retrieval and enforcing strict structural patterns on its responses, the pass rate jumped to 100% on the same test sets, and execution was roughly 4.6x faster.
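To make the interception idea concrete, here is a minimal sketch (not the actual wrapper; names like `LOCAL_INDEX` and `unresolvedSymbols` are illustrative) of the core gate: before any model-proposed code runs, every symbol it references is checked against a local index, and an unresolved symbol triggers a correction round instead of execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the deterministic gate described above.
// The real index layer is richer; this just shows the control flow.
public class DeterministicWrapper {
    // Toy "local index": symbol name -> file that defines it.
    static final Map<String, String> LOCAL_INDEX = Map.of(
        "parseConfig", "src/Config.java",
        "runTask", "src/Runner.java"
    );

    // Returns the symbols the model referenced that the index cannot
    // resolve; a non-empty result blocks execution and triggers a
    // self-correction round instead.
    static List<String> unresolvedSymbols(List<String> referenced) {
        List<String> missing = new ArrayList<>();
        for (String sym : referenced) {
            if (!LOCAL_INDEX.containsKey(sym)) {
                missing.add(sym);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Model output referencing one real and one hallucinated symbol.
        List<String> fromModel = List.of("parseConfig", "frobnicate");
        List<String> missing = unresolvedSymbols(fromModel);
        if (missing.isEmpty()) {
            System.out.println("EXECUTE");
        } else {
            System.out.println("RETRY: " + missing);
        }
    }
}
```

Because the check is a deterministic lookup rather than another model call, the retry loop cannot drift: either every referenced symbol resolves and execution proceeds, or the model gets a bounded, factual error to correct.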
I wrote down the full methodology, the benchmark details (using the Aider Polyglot suite), and some architectural notes on how the 'Index Layer' handles the self-correction loop in the article linked above. (Note: The site has an English/Spanish toggle at the top).
I'd love to hear your thoughts or if anyone has tackled local context drift in a similar way.
Beating Aider's 20% pass rate on local Qwen 32B using deterministic RAG
ebercruzdev•2h ago