At first, we kind of hated AI tools. We’d ask one to build something and get the most generic-looking nonsense we’d ever seen. Over time, along with everyone else, we learned about .md files, context windows, and how to basically “onboard” an agent into a project.
The benchmark tries to measure that gap directly: agents suck until you tell them not only about your code, but also about the context around what you’re building.