UP: It lets us state intent in plain language, specs, or examples. We can ask the model to invent code, tests, docs, diagrams—tasks that previously needed human translation from intention to syntax.
BUT SIDEWAYS: Generation is a probability distribution over tokens. Outputs vary with sampling temperature, seed, context length, and even with identical prompts.
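As a toy illustration of that point, here is a minimal sketch of temperature-scaled sampling over a made-up logit vector (function name and numbers are mine, purely for illustration): the same logits give different tokens under different temperatures and seeds.

```python
import math
import random

def sample(logits, temperature=1.0, seed=None):
    """Sample one token index from logits after temperature scaling."""
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

With a fixed seed the draw is reproducible; at very low temperature the distribution collapses onto the argmax, which is why "temperature 0" feels deterministic even though generation is still sampling.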
I think the tricky part is that we tend to assume prompts with similar semantic meaning will give the same outputs (as they would with a human), while LLMs can give vastly different outputs from a single spelling mistake, or a "!" instead of a "?". The effect varies greatly per model.
To your second part, I wouldn't make that assumption - I can see how a non-technical person might, but surely programmers wouldn't? I've certainly produced very different output from what I intended in boring old C with a misplaced semicolon, after all!
Trust me, this response would have been totally different if I were in a different mood.
I don't think that's how you should think about these things being non-deterministic though.
Let's call that technical determinism, and then introduce a separate concept, practical determinism.
What I'm calling practical determinism is your ability as the author to predict (determine) the results. Two different prompts that mean the same thing to me will give different results, and my ability to reason about the results from changes to my prompt is fuzzy. I can have a rough idea, I can gain skill in this area, but I can't gain anything like the same precision as I have reasoning about the results of code I author.
I do hope he takes the time to get good with them!
I dunno, sometimes it's helpful to learn about the perspectives of people who've watched something from afar as well, especially if they already have broad knowledge and context that is adjacent to the topic itself, and have lots of people around them deep in the trenches that they've discussed with.
A bit like historians still can provide valuable commentary on wars, even though they (probably) haven't participated in the wars themselves.
It's also a huge barrier to adoption by mainstream businesses, which are used to working to unambiguous business rules. If it's tricky for us developers, it's even more frustrating for end users. Very often they end up just saying, f* it, this is too hard.
I also use LLMs to write code, and for that they are a huge productivity boon. Just remember to test! But I'm noticing that use of LLMs in mainstream business applications lags the hype quite a bit. They are touted as panaceas, but like any IT technology they are tricky to implement. People always underestimate the effort necessary to get a real return, even with deterministic apps. With non-deterministic apps it's an even bigger problem.
Counting tokens is the only reliable defence I've found to this.
It would make sense to me for the chat context to raise an exception. Maybe I should read the docs further…
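A pre-flight check along these lines is one way to do it. This is a rough sketch with a crude chars/4 token estimate; in practice you'd want a real tokenizer (e.g. tiktoken for OpenAI models). The message shape and the `reserve_for_output` parameter are my assumptions for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate (~4 chars/token for English text).
    Swap in a real tokenizer for anything serious."""
    return max(1, len(text) // 4)

def check_context(messages, context_window: int, reserve_for_output: int = 512) -> int:
    """Raise before sending if the prompt likely exceeds the model's window,
    leaving room for the completion. Returns the estimated prompt size."""
    used = sum(estimate_tokens(m["content"]) for m in messages)
    if used + reserve_for_output > context_window:
        raise ValueError(
            f"prompt ~{used} tokens + {reserve_for_output} reserved for output "
            f"exceeds context window of {context_window}"
        )
    return used
```

Raising client-side like this at least gives consistent behaviour regardless of whether the endpoint errors, truncates silently, or signals overflow some other way.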
Not every endpoint works the same way. I'm pretty sure LM Studio's OpenAI-compatible endpoints will silently (from the client's perspective) truncate the context rather than throw an error; it's up to the client to make sure the context fits in those cases.
OpenAI's own endpoints do return an error and refuse the request if you exceed the context length, though. I think I've also seen others use the "finish_reason" attribute to signal that the context length was exceeded, rather than setting an error status code on the response.
Overall, even "OpenAI-compatible" endpoints often aren't 100% faithful reproductions of the OpenAI endpoints, sadly.
What do you do if you want to support multiple models in your LLM gateway? Do you throw an error if a user sets temperature for o3, thus dumping the problem on them? Or just ignore it, but potentially creating confusion because temperature will seem to not work for some models?
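One middle ground is to filter per model and log loudly, with an optional strict mode that rejects instead. This is just a sketch: the capability table is hypothetical (a real gateway would load it from config or provider metadata), though o3 genuinely does reject sampling parameters like temperature.

```python
import logging

# Hypothetical per-model table of unsupported sampling parameters.
UNSUPPORTED = {
    "o3": {"temperature", "top_p"},
}

def prepare_request(model: str, params: dict, strict: bool = False) -> dict:
    """Drop (or, in strict mode, reject) params the target model ignores."""
    bad = UNSUPPORTED.get(model, set()) & params.keys()
    if bad and strict:
        raise ValueError(f"{model} does not support: {sorted(bad)}")
    for key in sorted(bad):
        logging.warning("ignoring unsupported param %r for model %s", key, model)
    return {k: v for k, v in params.items() if k not in bad}
```

Neither option is great, which I think is your point: silent dropping hides the problem, strict rejection pushes it onto every caller. At least a warning leaves a trail when someone wonders why temperature "does nothing".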
Not actually true. Fuzzing and mutation testing have been here for a while.
Otherwise yeah, there are a bunch of non-deterministic technologies, processes and workflows missing, like what Machine Learning folks have been doing for decades - also software, also non-deterministic, but off-topic given the context of the article, as I read it.
This is not the first rodeo of our profession with non-determinism.
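To make the fuzzing point concrete, here's a minimal random fuzz loop - a toy sketch, nothing like a real coverage-guided fuzzer such as AFL. The buggy `parse` target is made up for illustration.

```python
import random

def fuzz(fn, n_cases=1000, seed=0):
    """Throw random byte strings at fn; collect any inputs that raise."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 33)))
        try:
            fn(data)
        except Exception:
            failures.append(data)
    return failures

def parse(data: bytes) -> int:
    """Deliberately buggy toy parser: chokes on NUL bytes."""
    if b"\x00" in data:
        raise ValueError("unexpected NUL")
    return len(data)
```

The inputs are random, yet the workflow is perfectly respectable engineering - we've been deliberately injecting non-determinism into testing for decades and reining it in with seeds and minimized reproducers.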
Languages are created to support both computers and humans. And to most humans, abstractions such as those presented by, say, Hibernate annotations are as non-deterministic as can be. To the computer it is all the same, but that is increasingly becoming less relevant, given that software is growing and has to be maintained by humans.
So, yes, LLMs are interesting, but not necessarily that much of a game-changer when compared to the mess we are already in.
oytis•3d ago
Even if we assume there is value in it, why should it replace (even if in part) the previous activity of reliably making computers do exactly what we want?
dist-epoch•4h ago
darkwater•3h ago
dist-epoch•1h ago
A contrived example: there are only 100 MB of disk space left, but 1 GB of logs to write. The LLM discards 900 MB of logs and keeps only the most important lines.
Sure, you can nitpick this example, but it's the kind of edge-case handling where LLMs can "do something reasonable" that previously required hard coding and special casing.
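For contrast, the hard-coded special casing might look like this sketch - severity keywords, ranking, and the byte-budget logic are all assumptions I'm making up to illustrate the old-school approach:

```python
def keep_important(lines, budget_bytes):
    """Hard-coded fallback: keep lines by severity rank until the budget runs out."""
    priority = {"ERROR": 0, "WARN": 1, "INFO": 2, "DEBUG": 3}

    def rank(line):
        for level, p in priority.items():
            if level in line:
                return p
        return 4  # unknown severity sorts last

    kept, used = [], 0
    for line in sorted(lines, key=rank):
        size = len(line.encode()) + 1  # +1 for the newline
        if used + size > budget_bytes:
            break
        kept.append(line)
        used += size
    return kept
```

The rigid version is predictable but brittle (multi-line stack traces, unfamiliar log formats); the LLM version degrades more gracefully on inputs nobody anticipated, at the cost of predictability.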
oytis•3h ago
dist-epoch•1h ago
So you trade reliability to get to that extra 20% of hard cases.
pydry•49m ago
When I watch juniors struggle they seem to think that it's because they don't think hard enough, whereas it's usually because they didn't build enough infrastructure that would prevent them from needing to think too hard.
As it happens, when it comes to programming, LLM unreliabilities seem to align quite closely with ours so the same guardrails that protect against human programmers' tendencies to fuck up (mostly tests and types) work pretty well for LLMs too.
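A concrete example of that kind of guardrail: a small typed reference function plus tests that pin down its behaviour, so it doesn't matter whether a human or an LLM (re)writes the body. The function here is entirely made up for illustration.

```python
from typing import List

def moving_average(xs: List[float], window: int) -> List[float]:
    """Reference behaviour the guardrail pins down."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

def test_moving_average():
    """Guardrail tests: these must keep passing after any rewrite."""
    assert moving_average([1.0, 2.0, 3.0, 4.0], 2) == [1.5, 2.5, 3.5]
    assert moving_average([5.0], 1) == [5.0]
    try:
        moving_average([1.0], 0)
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Off-by-one window handling and unchecked edge cases are exactly the mistakes both juniors and LLMs make, and the same test pins both down.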
furyofantares•24m ago
Maybe that does add up to solving harder, higher-level real-world problems (business problems) from a practical standpoint; perhaps that's what you mean, rather than technical problems.
Or maybe you're referring to producing software which utilizes LLMs, rather than using LLMs to program software (which is what I think the blog post is about, but we should certainly discuss both.)
kookamamie•1h ago
Insanity•1h ago
(Attaching too much value to the person instead of the argument is more of an ‘argument from authority’)
kookamamie•22m ago