> The agent consists of three nested agentic loops, shown below. Each loop executes an LLM to produce one step of reasoning and a tool invocation. The tool is executed and the outputs are attached to the agent's context. The outermost agent loop is an orchestrator that repeatedly calls the two other agents, the build-fixer agent and the test-fixer agent. The build-fixer agent tries to build a particular target and makes modifications to files until the target builds successfully or the agent gives up. The test-fixer agent tries to run a particular test and makes modifications until the test succeeds or the agent gives up (and in the process, it may use the build-fixer agent to address build failures in the test).
> Despite no special prompts or other optimizations, early tests were very encouraging, successfully fixing failed tests 30% of the time. CogniPort was particularly effective for test fixes, platform-specific conditionals, and data representation fixes. We're confident that as we invest in further optimizations of this approach, we will be even more successful.
Jesus. They used gemini-flash! On a Google-scale problem, and got promising early results. On real problems with real data! Granted, this problem lends itself to automated testing better than most (it helps to have something to migrate from, since you kinda know the "ground truth" or expected behaviour).
Absolutely bananas that this is possible, and with such a "cheap" model.
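For anyone trying to picture the three nested loops from the first quote, here is a rough toy sketch of the control flow. To be clear, this is my guess at the shape, not the article's code: `Workspace`, `agent_loop`, `run_build_fixer`, `run_test_fixer`, `run_orchestrator`, and the step limits are all made-up names, the "LLM" is a hard-coded policy function, and "editing a file" just decrements a counter.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Workspace:
    """Hypothetical stand-in for a source tree: counts of outstanding breakages."""
    build_errors: int
    test_failures: int

def status(ws: Workspace) -> str:
    """Pretend build/test output that gets attached to an agent's context."""
    if ws.build_errors:
        return f"build failed: {ws.build_errors} errors"
    if ws.test_failures:
        return f"tests failed: {ws.test_failures} failures"
    return "all green"

Policy = Callable[[list[str]], str]  # stand-in for the LLM: context -> tool name
Tool = Callable[[], str]             # a tool returns its output as text

def agent_loop(policy: Policy, tools: dict[str, Tool], done: Callable[[], bool],
               first_observation: str, max_steps: int = 10) -> bool:
    """One generic loop: pick a tool, run it, append the output to the context."""
    context = [first_observation]
    for _ in range(max_steps):
        if done():
            return True
        tool_name = policy(context)          # one "reasoning step" plus tool choice
        context.append(tools[tool_name]())   # tool output joins the context
    return done()                            # False means the agent gave up

def run_build_fixer(ws: Workspace) -> bool:
    """Inner loop: edit files until the target builds or steps run out."""
    def edit_file() -> str:
        ws.build_errors -= 1                 # toy "fix" of one build error
        return status(ws)
    return agent_loop(lambda ctx: "edit_file", {"edit_file": edit_file},
                      lambda: ws.build_errors == 0, status(ws))

def run_test_fixer(ws: Workspace) -> bool:
    """Inner loop: fix the test, delegating to the build fixer on build failures."""
    def edit_file() -> str:
        ws.test_failures -= 1                # toy "fix" of one test failure
        return status(ws)
    def fix_build() -> str:
        run_build_fixer(ws)
        return status(ws)
    def policy(ctx: list[str]) -> str:
        return "fix_build" if ctx[-1].startswith("build failed") else "edit_file"
    return agent_loop(policy, {"edit_file": edit_file, "fix_build": fix_build},
                      lambda: ws.build_errors == 0 and ws.test_failures == 0,
                      status(ws))

def run_orchestrator(ws: Workspace, max_rounds: int = 5) -> bool:
    """Outer loop: repeatedly call the two fixer agents as if they were tools."""
    def call_build_fixer() -> str:
        run_build_fixer(ws)
        return status(ws)
    def call_test_fixer() -> str:
        run_test_fixer(ws)
        return status(ws)
    def policy(ctx: list[str]) -> str:
        return "build_fixer" if ctx[-1].startswith("build failed") else "test_fixer"
    return agent_loop(policy,
                      {"build_fixer": call_build_fixer, "test_fixer": call_test_fixer},
                      lambda: status(ws) == "all green", status(ws), max_rounds)

if __name__ == "__main__":
    ws = Workspace(build_errors=3, test_failures=2)
    print(run_orchestrator(ws), status(ws))
```

The part I find interesting is that all three levels are the same loop with different tool sets, and "gives up" is just a step budget running out. In the real system the policy would be a model call and the tools would actually build, test, and edit.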
NitpickLawyer•1h ago