tldr: We took our hypertuned coding agent, trained it on millions of internal data engineering workflows and data, with specialized custom-built tools, and it only managed to complete 3 more tasks than Claude Code (out of 43) on a super niche domain-specific benchmark.
att126•1h ago