I developed a design called GCRI (Generalized Cognitive Refinement Iteration) to test whether LLMs can reach 'System 2' reasoning without external tools or self-correction loops. By forcing agents into a structured debate, I scored 95% on HumanEval with GPT-OSS-120B. Thought I'd share the methodology and results.
drbt•2h ago