I repeatedly rewrite prompts, restate the same constraints, and write detailed acceptance criteria, yet still end up with broken or non-functional code. It's very frustrating, to say the least. Yesterday alone I spent about $200 on generations that now require significant manual rewrites just to make them work.
At that point, the gains are questionable. My biggest success is having the model take over the first design in my app, and I take it from there, but the hundreds if not thousands of lines of code it generates are so messy that it's insanely painful to refactor the mess afterwards.
I understand we are all in different camps for a multitude of reasons:
- The jouissance of rote coding and abstraction
- The tree of knowledge specifically in programming, and which branches and nodes we each currently sit at in our understanding
- Technical paradigms that humans may have argued about have now shifted to obvious answers for agentic harnesses (think something like TDD: I for one barely used it as a style because I've mostly worked in startups building apps and found it not worth the cost of my labour, but agentic harness loops absolutely excel at it)
- The geography and size of the markets we work in
- The complexity of the subject matter / domain expertise
- The cost-prohibitive nature of token-based programming (not everyone can afford it, and the big fish seemingly have quite the advantage going forth)
- Agentic coding has proven it can build UIs very easily, and depending on experience, it can build a great many things easily. It excels when it has feedback loops such as linting or simple JavaScript errors, which are observability problems in my opinion. Once it can do full-stack observability (APM, system, network), its ability to reason about and correct problems on the fly for any complex system seems entirely within reach from my purview.
- At the human nature level, some individuals prefer to think in 0s and 1s, some in words, some in between, and so on. What type of communication do agentic setups prefer?
With some of the above intuition, which is easily up for debate, I've decided to lean 100% into agentic coding. I think it will be absolutely everywhere, obviously with humans in the loop, but I don't think humans will need to review the pull requests. I am personally treating it as an existential threat to my career after having seen enough of what it's capable of (with some imagination and a bit of a gambling spirit, as we mere mortals surely can't predict the future).
With my gambit, I'm choosing not to exit the tech scene and am instead optimistically investing my mental prowess into figuring out where "humans in the loop" will be positioned. Currently I'm looking into CI-level tooling, the known parts being code quality and all the various forms of software testing paradigms. The emerging evals, in my mind, will keep evolving and will do a lot more than test our ideas of model intelligence and chatbot responses.
---
A more practical rant: if you are building a recommendation engine for A and B, the engine could have X number of modules that each return a score, which when combined make up the final decision between A and B. Forgive me, but let's just use dating as an example. A product manager would say we need a new module to calculate relevance between A and B based on their food preferences. An agentic harness can easily code that module and create the tests for it. The product manager could ask an LLM to make a list of 1000 reasons why two people might be suitable for dating. The agent could easily go away and code and test all those modules, and probably maintain technical consistency, but drift from the company's philosophical business model. I am looking into building "semantic linting" for codebases: how can the agent maintain the code so it aligns with the company's business model? And if for whatever reason those 1000 modules need to be refactored, how can it keep that alignment through the refactor? Essentially I'm trying to make a feedback loop between the company's needs and the code itself, to stop the agent and the business from drifting apart in either direction, and to give the agent automatic feedback it can use to fix the drift. In short, I think there will be new tools invented that we humans will be mastering, as per Karpathy's point.
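To make the idea a bit more concrete, here is a rough Python sketch of what I mean, not a real tool. Every name in it (`ScoreModule`, `BUSINESS_PRINCIPLES`, `semantic_lint`, the principle labels) is made up for illustration, and I'm assuming each module declares which business principle it is supposed to serve and returns a score between 0 and 1. The "lint" here is just a set lookup; a real semantic linter would presumably need something much smarter than that.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Profile = Dict[str, List[str]]

# Hypothetical: the principles the business has actually signed off on.
BUSINESS_PRINCIPLES = {"shared_values", "lifestyle_fit"}


@dataclass(frozen=True)
class ScoreModule:
    name: str
    principle: str  # the business principle this module claims to serve
    weight: float
    score: Callable[[Profile, Profile], float]  # assumed to return 0.0..1.0


def food_preference_score(a: Profile, b: Profile) -> float:
    """Jaccard overlap of the two profiles' food preferences."""
    foods_a, foods_b = set(a.get("foods", [])), set(b.get("foods", []))
    union = foods_a | foods_b
    return len(foods_a & foods_b) / len(union) if union else 0.0


def combined_score(modules: List[ScoreModule], a: Profile, b: Profile) -> float:
    """Weighted average of all module scores (the final A-vs-B decision input)."""
    total_weight = sum(m.weight for m in modules)
    if total_weight == 0:
        return 0.0
    return sum(m.weight * m.score(a, b) for m in modules) / total_weight


def semantic_lint(modules: List[ScoreModule]) -> List[str]:
    """Flag modules whose declared principle is not part of the business model."""
    return [
        f"{m.name}: principle '{m.principle}' is not in the business model"
        for m in modules
        if m.principle not in BUSINESS_PRINCIPLES
    ]


modules = [
    ScoreModule("food_preferences", "lifestyle_fit", 1.0, food_preference_score),
    # An agent-generated module that is technically fine but off-strategy.
    ScoreModule("zodiac_match", "astrology", 1.0, lambda a, b: 1.0),
]

for finding in semantic_lint(modules):
    print(finding)
```

The findings are plain strings on purpose: the point is that they can be fed straight back to the agent the same way a linter or test failure would be, which is the feedback loop I'm after.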
alexcos•2h ago
ldng•1h ago