After these findings, any rational person would take a step back and consider whether they are actually using these models properly.
Even if you believe that LLM code output nowadays is both 100% correct and as performant as possible (it isn't), having the lowest LOC is still the ideal, because the simplest functional implementation will always remain the best, all else being equal. Even more so considering this is a bloody Rails blog, not a highly complex project with no existing reference point.
But Garry Tan, he isn't most people.
Instead, he doubles down, calls a teenager names for doing some frankly fair, polite, and professional analysis of a poor codebase, and does anything but reflect that maybe, just maybe, he might be wrong.
Mind you, this would be childish and stupid even if he had written these offences himself. At least with handcrafted poor code there is a sunk-cost element to it. But here there is not: his emotional involvement in this code should be zero, just like the actual effort expended.
We are talking about code he has likely never even skimmed. Code that is unusably unoptimised. Code for a simple blog that contains deficiencies such as uncompressed PNGs, broken accessibility, etc., which any decent hobbyist or old-school automated tooling would catch pretty quickly, no "AI" magic required. One run of e.g. Lighthouse shows that this is unusably poor, though for that one must focus on something other than "look, I am spending thousands to get ever more unaudited output".
LLMs for coding, even agentic processes with limited intervention, are incredibly powerful and valuable. But even though I audit every line of code I receive from a model, I have little to no emotional investment in that code and have no problem throwing it out completely if I find any issue with it, far more so than before.
Despite all of that, rather than saying, "Yeah, this is poor, let's just get rid of it, thanks for pointing that out, egg on my face, let me just vibe code a better replacement now that I know what to look for", he became emotional and enraged, over code he never wrote.
As someone who does evals myself, gstack overall looks very odd to me. It reads as built by someone who struggles to view these models through any lens beyond quantity = productivity, which is the exact opposite of my goals. I will always tend towards fewer tokens of output at much higher quality. Faster, less expensive, easier to audit: what is there not to prefer?
In any case, if gstack makes LLMs struggle to create a maintainable blog (something these models, with all their flaws, most certainly can do), that should give major pause: maybe this is barking up the wrong tree. Stepping away from gstack for a while and seeing that a solution in the hundreds of LOC is just as achievable (and likely better overall) might do a world of good.
Godspeed Garry, may we soon finish the DSM-VI with some new entries focused on the harm these LLMs can cause in certain people, so they may get the help they so desperately need. Alternatively, there is always starting his own FS and trying to get that into the Linux kernel...