If the agent can apply these processes to the output, then we're on our way to getting a good chunk of our work done for us. Even from the product POV, if the agent is allowed to experiment by making deployments and checking user-facing metrics, it could eventually build a software product on its own. But we should still solve the coding part first, since code seems easier to verify objectively and quickly.
If the agent has a fast and safe feedback loop to experiment with, it can go through more cycles, faster, and improve its output.
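A minimal sketch of what that loop might look like; `generate` and `verify` here are hypothetical callables standing in for the model call and for whatever closes the loop (tests, type checker, sandboxed deploy), nothing in this snippet is a real API:

```python
from typing import Callable, Optional

def improve(
    generate: Callable[[str, str], str],        # (task, feedback) -> candidate code
    verify: Callable[[str], tuple[bool, str]],  # candidate -> (passed?, feedback)
    task: str,
    max_cycles: int = 5,
) -> Optional[str]:
    """Generate-verify-refine: keep proposing candidates until one
    passes verification or the cycle budget runs out."""
    feedback = ""
    for _ in range(max_cycles):
        candidate = generate(task, feedback)  # propose a solution
        ok, feedback = verify(candidate)      # fast, safe, objective check
        if ok:
            return candidate                  # verified output
    return None                               # budget exhausted
```

The faster and safer `verify` is, the more of these cycles fit into the same wall-clock budget.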
And verification ("evaluation", as we call it now) really is the key, though most people working on "AI apps" haven't figured that out yet.
Follow Hamel to catch up on the state of the art: https://x.com/HamelHusain
a3w•10h ago
Symbolic AI seems to prove everything it states, but it never comes up with novel ideas either.
Let's see if we get neurosymbolic AI that can do something neither could do on its own. I doubt it; AI might just be a doom cult after all.
tasuki•8h ago
A sufficiently rich type system (think Idris rather than C) or a sufficiently powerful test suite (e.g. property-based tests) should do the trick.
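For the test-suite half, here's a minimal sketch using Hypothesis (a real Python property-based testing library); `my_sort` is a hypothetical stand-in for agent-generated code, and the properties are what a verifier could check automatically across many random inputs:

```python
from hypothesis import given, strategies as st

def my_sort(xs: list[int]) -> list[int]:
    # Stand-in for the agent-generated code under verification.
    return sorted(xs)

@given(st.lists(st.integers()))
def test_output_is_ordered(xs: list[int]) -> None:
    out = my_sort(xs)
    assert all(a <= b for a, b in zip(out, out[1:]))

@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs: list[int]) -> None:
    assert my_sort(my_sort(xs)) == my_sort(xs)
```

Run it with pytest; each property is checked against 100 generated inputs by default, and any failure is shrunk to a minimal counterexample, which is exactly the kind of objective, fast feedback an agent loop needs.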