The Kolmogorov argument gets stronger in the age of LLM-assisted coding. Tokenizers chunk camelCase unpredictably: "parseHTTPSURL" might become ["parse", "HTTP", "SUR", "L"] or worse, depending on the model. snake_case, by contrast, splits cleanly at underscores, preserving semantic boundaries.
This matters because every semantic signal you pass to the model improves code generation quality. When the model sees "user_authentication_token" as three distinct concepts, it reasons about them correctly. When it sees "userAuthenticationToken" as arbitrary subword chunks, you're relying on the model to reconstruct meaning from noise.
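To make the asymmetry concrete, here's a toy sketch (my own heuristic splitters, not a real BPE tokenizer): splitting snake_case is a deterministic one-liner, while splitting camelCase requires a heuristic that falls apart on adjacent acronyms.

```python
import re

def split_snake(name: str) -> list[str]:
    # Underscores are explicit word boundaries: the split is deterministic.
    return name.split("_")

def split_camel(name: str) -> list[str]:
    # Heuristic camelCase splitter: uppercase runs, capitalized words,
    # lowercase runs, or digit runs. Adjacent acronyms defeat it.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", name)

print(split_snake("user_authentication_token"))  # ['user', 'authentication', 'token']
print(split_camel("parseHTTPSURL"))              # ['parse', 'HTTPSURL'] -- HTTPS/URL boundary lost
```

The camelCase splitter happens to handle "userAuthenticationToken" fine, but "parseHTTPSURL" comes out as one fused acronym blob: the boundary between HTTPS and URL is unrecoverable without a dictionary. Underscores carry that boundary for free.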
The author's point about tooling reliability applies directly: LLMs are just another tool that has to parse your identifiers. And as more of our codebases become conversations between humans and coding assistants, conventions that reduce ambiguity stop being style preferences and start being engineering decisions.
adamzwasserman•2h ago