We let our Software Factory run autonomously for 15 hours. The output was 83 lines of highly optimized C++ and 714 lines of tests, about an 8.6:1 test-to-code ratio. It fixed the bottleneck in a large codebase and improved the TPC-H benchmark 2x. It verified the memory leak using ASAN. Total spend was $160 on LLM calls.
davidbuniat•1h ago