It combines graph-based schema grounding, multi-model SQL generation, sandboxed execution, and semantic validation instead of relying on prompting alone.
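To make the sandboxing step concrete, here is a minimal sketch of the idea (illustrative only, not the project's actual code). It assumes SQLite, which is the format Spider's databases ship in, and uses Python's standard `sqlite3` module. The helper names `make_sandbox` and `run_sandboxed`, the 2-second timeout, and the allow-list are all assumptions made for the example.

```python
import sqlite3
import time

# Assumption: generated SQL only ever needs to read data.
ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,    # the SELECT statement itself
    sqlite3.SQLITE_READ,      # reading a column
    sqlite3.SQLITE_FUNCTION,  # calling SQL functions such as COUNT
}


def _read_only(action, arg1, arg2, db_name, source):
    """Authorizer callback: permit read-only actions, deny everything else."""
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def make_sandbox(schema_sql):
    """Build a throwaway in-memory database, then lock it to read-only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)  # setup runs before the authorizer is installed
    conn.commit()
    conn.set_authorizer(_read_only)
    return conn


def run_sandboxed(conn, sql, timeout_s=2.0):
    """Run one candidate query; return (rows, error), one of which is None."""
    deadline = time.monotonic() + timeout_s
    # A non-zero return value from the progress handler aborts the query.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)
    finally:
        conn.set_progress_handler(None, 0)
```

The error string returned for a rejected or failed query could be fed back to the generator for a retry, though whether that loop exists here is a design choice this sketch does not assume.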
Evaluated honestly on Spider:
• ~55% accuracy on a single DB
• ~34% cross-DB zero-shot
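For readers comparing numbers: the post does not say which Spider metric these figures use. The sketch below assumes execution accuracy in a simplified form, where a prediction counts as correct if it returns the same rows as the gold query, ignoring row order. The function names are hypothetical, and the official Spider evaluator is stricter in places (for example, it also reports exact set match), so treat this as an illustration of the idea rather than the scoring actually used.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same multiset of rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    # Counter compares rows as a multiset, so row order does not matter.
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
```

Ignoring row order is a simplification: it would wrongly accept a prediction that drops a required ORDER BY.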
No APIs, no fine-tuning, everything runs locally.
Would appreciate feedback on the architecture and evaluation.