Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.
I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm, how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't quite do what I want them to. And many cases of "hmm... that would work, but it would read the entire file twice for no reason".
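To make that concrete, here's the kind of pattern I mean; this is a made-up sketch of a plausible-looking suggestion, not actual Copilot output:

```python
# Hypothetical generated snippet: parses the puzzle input correctly,
# but opens and reads the same file twice for no reason...
lines = open("input.txt").read().splitlines()
numbers = [int(x) for x in open("input.txt").read().split()]

# ...when a single read would do:
text = open("input.txt").read()
lines = text.splitlines()
numbers = [int(x) for x in text.split()]
```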
My guess, however, is that it's a net gain for quality and productivity. Humans make bugs too and there need to be processes in place to discover and remediate those regardless.
I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.
The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".
The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.
Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions of what LLMs are useful for.
> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.
I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization that contains junior engineers I'd add something specific to help junior engineers understand how they should approach LLM use.
Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.
The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so. That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
Years ago I had to spend many months building nothing but Models (as in MVC) for a huge data import/ingest system the company I worked at was rewriting. It was just messy enough that it couldn't be automated. I almost lost my mind from the dull monotony and even started having attendance issues. I know today that it could have been done with an LLM in minutes. It's almost crazy how much time I put into that project compared to what it would take me today.
It was only then that she introduced us to the glory that was Adobe Dreamweaver, which (obviously) increased our productivity tenfold.
> This can be extraordinarily powerful for summarizing documents — or of answering more specific questions of a large document like a datasheet or specification.
That dash shouldn't be there. That's not a parenthetical clause; it's an element in a list separated by "or." You can just remove the dash and the sentence becomes correct.
> First, to those who can recognize an LLM’s reveals (an expanding demographic!), it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can’t be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).
> Specifically, we must be careful to not use LLMs in such a way as to undermine the trust that we have in one another
> our writing is an important vessel for building trust — and that trust can be quickly eroded if we are not speaking with our own voice
This is a technical document that is useful in illustrating how the guy (who gave a talk once that I didn't understand but was captivated by, and who is well-respected in his field) intends to guide his company's use of the technology, so that other companies and individual programmers may learn from it too.
I don’t think the objective was to take any outright ethical stance, but to provide guidance about something ostensibly used at an employee’s discretion.
This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
My general procedure for using an LLM to write code, which is in the spirit of what is advocated here (a rough code sketch follows below), is:
1) First, feed the existing relevant code into an LLM. This is usually just a few source files in a larger project.
2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it to not write code at this point.
3) Let it speak about the plan, and make sure that I like it. I will converse to address any deficiencies that I see, and I almost always do.
4) I then tell it to generate the code.
5) I skim and test the code to see if it's generally correct, and have it make corrections as needed.
6) Closely read the entire generated artifact at this point and make manual corrections (occasionally automated corrections like "replace all C-style casts with the appropriate C++-style casts", followed by a review of the diff).
The hardest part for me is #6, where I feel a strong emotional bias towards not doing it, since I am not yet aware of any errors compelling such action.
This allows me to operate at a higher level of abstraction (architecture) and removes the drudgery of turning an architectural idea into written, precise code. But, when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language: with those tools, you can understand how they work, rapidly form a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.
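To make the staging concrete, here is a minimal sketch of steps 1-4 driven from a script. It assumes the OpenAI Python SDK; the model name, file paths, and prompts are all illustrative placeholders, not a specific recommendation:

```python
# A minimal sketch of the plan-first workflow (steps 1-4), assuming openai>=1.0.
# The model, file list, and prompts are hypothetical placeholders.
from pathlib import Path
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"

# Step 1: feed in the existing relevant code (a few files, not the whole project).
files = ["src/parser.h", "src/parser.cpp"]  # hypothetical paths
context = "\n\n".join(f"--- {p} ---\n{Path(p).read_text()}" for p in files)

# Steps 2-3: describe the change, ask for an architecture, and forbid code for now.
history = [{
    "role": "user",
    "content": context + "\n\nI want to add streaming input to this parser. "
               "Propose an architecture and discuss tradeoffs. Do NOT write code yet.",
}]
plan = client.chat.completions.create(model=MODEL, messages=history)
print(plan.choices[0].message.content)  # read the plan; iterate here until it's right

# Step 4: only once the plan looks good, ask for the implementation.
history += [
    {"role": "assistant", "content": plan.choices[0].message.content},
    {"role": "user", "content": "The plan looks good. Now generate the code."},
]
code = client.chat.completions.create(model=MODEL, messages=history)
print(code.choices[0].message.content)
```

Steps 5 and 6 (testing and the close read) stay manual by design; the staged prompts only keep the model from jumping to code before the plan is agreed on.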
By this article's own standards, there are now 2 authors who don't understand what they've produced.
I think this gets at a key point... but I'm not sure of the right way to articulate it.
A human-written comment may be worth something, but an LLM-generated one is cheap/worthless.
The nicest phrase I've seen capturing this thought: "I'd rather read the prompt".
It's probably just as good to let an LLM generate it again as it is to publish something written by an LLM.
Text, images, art, and music are all methods of expressing our internal ideas to other human beings. Our thoughts are the source, and these methods are how they are expressed. Our true goal in any form of communication is to understand the internal ideas of others.
An LLM expresses itself in all the same ways, but the source isn't an individual; it's a giant dataset. This could be considered an expression of the aggregate thoughts of humanity, which is fine in some contexts (like retrieval of ideas and information highly represented in the data/world), but not when presented as expressing the thoughts of an individual.
LLMs express the statistical summation of everyone's thoughts. They present the mean, when what we're really interested in are the data points a couple of standard deviations away from it. That's where all the interesting, unique, and thought-provoking ideas are. Diversity is at the core of the human experience.
---
An interesting paradox is the use of LLMs for translation into a non-native language. LLMs are actively being used to express an individual's ideas in words better than they could with their limited language proficiency, but those of us on the receiving end expect the expression to mirror its source, and so have immediate suspicions about the legitimacy of the individual's thoughts. Which is a little unfortunate for those who just want to express themselves better.
> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.
Don't the same arguments against using LLMs to write one's prose also apply to code? Was the structure of the code, and the ideas within it, the engineers'? Or did they come from the LLM? And so on.
Before I'm misunderstood as an LLM minimalist, I want to say that I think they're incredibly good at solving blank-page syndrome: just getting a starting point on the page is useful. But the code you actually want to ship is so far from what LLMs write that I think of them more as a crutch for blank-page syndrome than as "good at writing code de novo".
I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.
It can be addressed with prompting, but you have to fight this constantly.
He is a long way from Sun.
It sets the rule that things must actually be read when there's a social expectation (code interviews, for example), but otherwise… remarks that use of LLMs to assist comprehension has little downside.
I find two problems with this:
- there's an incoherence here: if LLMs were flawless at reading and summarization, there would be no difference from reading the original. And if they aren't flawless, then that flaw also extends to the non-social cases.
- in practice, I haven't found LLMs to be such good reading assistants. I've sent them to check a linked doc and they've just read the index and inferred the context, for example. Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than following the three links.
There is a significant risk in placing a translation layer between content and reader.
This probably doesn't give them enough credit. If you can feed an LLM a list of crash dumps, it can do a remarkable job producing both analyses and fixes. And I don't mean just for super-obvious crashes. I was most impressed with a deadlock where numerous engineers had tried and failed to understand exactly how to fix it.