I’m a DevOps engineer who wanted to test whether an LLM could build high-quality systems code when guided by a strong architectural mental model.
I guided Codex to build AllocDB (a "one resource, one winner" database) using TigerStyle principles: strict determinism, zero-allocation hot paths, and logical time.
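For anyone unfamiliar with the "one resource, one winner" model, the core contract is that at most one claimant ever succeeds for a given resource; every later claim fails. Here's a minimal in-memory Python sketch of that contract (the names and API here are my own illustration, not AllocDB's actual interface):

```python
import threading

class ClaimTable:
    """Illustrative sketch only: each resource key gets at most one winner.
    The first claimant wins; all later claims for the same key fail."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}  # resource key -> winning owner id

    def claim(self, resource: str, owner: str) -> bool:
        """Atomically claim a resource. Returns True only for the winner."""
        with self._lock:
            if resource in self._owners:
                return False  # someone already won this resource
            self._owners[resource] = owner
            return True

table = ClaimTable()
print(table.claim("gpu-0", "job-a"))  # → True  (job-a wins gpu-0)
print(table.claim("gpu-0", "job-b"))  # → False (gpu-0 already taken)
```

The real database enforces this under crashes and partitions, which is exactly what the Jepsen harness below is checking.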
Because I didn't trust the AI (or my own coding skills), I had it build a Jepsen harness on my homelab's KubeVirt infrastructure to "bully" the database into failing. After several iterations, it now passes a 15-scenario Jepsen matrix.
I'd love to hear from the community whether this is interesting to you, and whether you have any suggestions for improving the system or the testing harness!