After these findings, any rational person would take a step back and consider whether they are actually using these models properly.
Even if you believe that LLM code output nowadays is both 100% correct and as performant as possible (it isn't), having the lowest LOC is still the ideal, because the simplest functional implementation will always remain the best, all else being equal. Even more so considering this is a bloody Rails blog, not a highly complex project with no existing reference point.
But Garry Tan, he isn't most people.
Instead, he doubles down, calls a teenager names for doing some frankly fair, polite, and professional analysis of a poor codebase, and does anything but reflect that maybe, just maybe, he might be wrong.
Mind you, this would be childish and stupid even if he had written these offences himself. At least with handcrafted poor code there is a sunk-cost element to it. But here there is not: his emotional involvement in this code should be zero, just like the actual effort expended.
We are talking about code he has likely never even skimmed. Code that is unusably unoptimised. Code for a simple blog that contains deficiencies such as uncompressed PNGs, broken accessibility, etc., which any decent hobbyist or old-school automated tooling would catch pretty quickly, no "AI" magic required. One run of e.g. Lighthouse shows that this is unusably poor, though for that one must focus on something other than "look, I am spending thousands to get ever more unaudited output".
LLMs for coding, even agentic processes with limited intervention, are incredibly powerful and valuable. But even though I audit every line of code I receive from a model, I have little to no emotional investment in that code and have no problem throwing it out completely if I find any issue with it, far more so than before.
Despite all of that, rather than saying, "Yeah, this is poor, let's just get rid of it, thanks for pointing that out, egg on my face, let me just vibe code a better replacement now that I know what to look for", he became emotional and enraged, over code he never wrote.
As someone who does evals myself, gstack overall looks very odd to me. It reads as built by someone who struggles to view these models through any lens beyond quantity = productivity, which is the exact opposite of my goals. I will always tend towards fewer tokens of output at much higher quality. Faster, less expensive, easier to audit: what is there not to prefer?
In any case, if gstack makes LLMs struggle to create a maintainable blog (something these models, with all their flaws, most certainly can do), that should give major pause: maybe this is barking up the wrong tree. Stepping away from gstack for a while and seeing that a solution in the hundreds of LOC is just as achievable (and likely better overall) might do a world of good.
Godspeed Garry, may we soon finish the DSM-VI with some new entries focused on the harm these LLMs can cause in certain people, so they may get the help they so desperately need. Alternatively, there is always starting his own FS and trying to get that into the Linux kernel...