This paper introduces BRAID (Bounded Reasoning for Autonomous Inference and Decisions), a structured prompting framework that replaces free-form chain-of-thought with bounded, symbolic reasoning encoded as Mermaid flowcharts.
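To make "reasoning encoded as a Mermaid flowchart" concrete, here is a minimal sketch of how such a prompt might be assembled. The flowchart content, the instruction wording, and the helper name `build_flowchart_prompt` are all hypothetical illustrations, not BRAID's actual prompt format.

```python
# Hypothetical sketch: the flowchart and the wording are illustrative
# assumptions, not the prompt format used in the paper.

# A small Mermaid flowchart that bounds the reasoning steps for a math word problem.
EXAMPLE_FLOWCHART = """flowchart TD
    A[Read the problem] --> B[Extract the given quantities]
    B --> C{Is every needed quantity known?}
    C -- No --> D[Derive the missing quantity]
    D --> C
    C -- Yes --> E[Compute the result]
    E --> F[State the final answer]
"""


def build_flowchart_prompt(question: str, flowchart: str) -> str:
    """Combine a question and a Mermaid flowchart into a single prompt string."""
    return (
        "Follow the flowchart below step by step. "
        "Visit only the nodes it defines, then give the final answer.\n\n"
        "Flowchart:\n"
        f"{flowchart.strip()}\n\n"
        f"Question: {question.strip()}"
    )


if __name__ == "__main__":
    print(build_flowchart_prompt("A train travels 120 km in 2 hours. What is its speed?",
                                 EXAMPLE_FLOWCHART))
```

The point of the sketch is only that the model receives an explicit, finite set of steps rather than an open-ended "think step by step" instruction.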
We evaluate BRAID across GSM-Hard, SCALE MultiChallenge, and AdvancedIF.
Key findings:
- Structured symbolic reasoning improves accuracy on complex tasks
- Smaller models using BRAID often match or outperform larger models that use classic prompting
- Significant cost-efficiency gains (up to 74× performance-per-dollar)
- Even SOTA models see accuracy gains when pure performance is the goal
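For readers unfamiliar with the performance-per-dollar figure, the sketch below shows one plausible way to compute such a ratio. It assumes the metric is accuracy divided by total inference cost, which may differ from the paper's exact definition. The numbers are made up for illustration and are not taken from the benchmark.

```python
# Hypothetical sketch: assumes performance-per-dollar = accuracy / cost.
# The paper's exact definition may differ; the example numbers are invented.

def performance_per_dollar(accuracy: float, cost_usd: float) -> float:
    """Return accuracy per dollar spent on inference."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return accuracy / cost_usd


def efficiency_ratio(candidate: tuple[float, float], baseline: tuple[float, float]) -> float:
    """Return how many times more accuracy-per-dollar the candidate delivers.

    Each argument is an (accuracy, cost_usd) pair.
    """
    return performance_per_dollar(*candidate) / performance_per_dollar(*baseline)


if __name__ == "__main__":
    small_structured = (0.75, 0.5)   # illustrative: small model, structured prompt
    large_classic = (0.5, 2.0)       # illustrative: large model, classic prompt
    print(efficiency_ratio(small_structured, large_classic))  # prints 6.0
```

Under this definition, a cheaper model can score a large multiple even with a modest accuracy gain, so the ratio is best read alongside the raw accuracy numbers.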
dashersw•1h ago
All benchmarks and detailed logs are public: https://benchmark.openserv.ai
Happy to discuss methodology, evaluation choices, limitations, or failure cases.