An LLM is not unbiased, and you would know that if you tested LLMs.
Apart from biases, an LLM is not a reliable oracle; you would know that if you tested LLMs.
The reliabilities and unreliabilities of LLMs vary in discontinuous and unpredictable ways from task to task, model to model, and within the same model over time. You would know this if you tested LLMs. I have. Why haven’t you?
Ideas like this are promoted by people who don’t like testing, and don’t respect it. That explains why a concept like this is treated as equivalent to a tested fact. There is a name for it: wishful thinking.
Given the economic component of LLM wishes, we can look at prior instances of wishing-at-scale: https://en.wikipedia.org/wiki/Tulip_mania
I don't think I made the point very clear in the blog (I will rectify that), but I am saying that because LLMs are so easily biased by their prompting, they sometimes perform better at black box testing tasks than they do at white box testing.
I don't want to have a big argument about this right at this moment. But-- truly-- thank you for replying!
Also agree on the specification formality. Even a less formal spec provides a clearer boundary for the LLM during code generation, which should improve the generated code.
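For instance, even something as light as a signature plus a docstring with pre/postconditions already pins down what the generated code has to satisfy. (Illustrative example of my own, not from the article; the function is hypothetical.)

    def clamp(value: float, low: float, high: float) -> float:
        """Return value limited to the inclusive range [low, high].

        Precondition:  low <= high
        Postcondition: low <= result <= high,
                       and result == value whenever low <= value <= high
        """
        ...  # body left for the LLM to generate against the spec above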
They are biased by the training dataset, which probably also reflects the biases of the people who select the training dataset.
They are biased by the system prompts that are embedded into every request to keep them on the rails.
They are even biased by the prompt that you write, which can lead them to incorrect conclusions if you design it to steer them there.
I think it is a very careless mistake to think of LLMs as unbiased or neutral in any way.
When I talk about "unbiased oracles" I am speaking in the context of black box testing. I'm not suggesting they are free from all forms of bias. Instead, the key distinction I'm trying to draw is their lack of implementation-level bias towards the specific code they are testing.
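To make that distinction concrete, here is a rough sketch (the function names and prompt wording are mine, purely illustrative, and don't come from any particular tool):

    # Black box: the model only ever sees the specification, so its tests
    # cannot inherit assumptions baked into the implementation.
    def black_box_prompt(spec: str) -> str:
        return (
            "Write unit tests for a function with this specification. "
            "Assume nothing about how it is implemented.\n\n"
            f"Specification:\n{spec}"
        )

    # White box: the implementation sits in the context window, which tends
    # to anchor the model on what the code does rather than what it should do.
    def white_box_prompt(spec: str, source: str) -> str:
        return (
            "Write unit tests for the following function.\n\n"
            f"Specification:\n{spec}\n\n"
            f"Implementation:\n{source}"
        )

The only difference is whether the code under test appears in the prompt; in the black box case the spec is the model's only anchor.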
First of all, you'll note that by the exact same reasoning, all people are also biased. You know this. Everyone knows that all people are biased. This isn't something you don't know.
So if every single intelligence, human or not, is biased, what is this article truly talking about? The article is basically saying LLMs are LESS biased than humans. Why are LLMs less biased than humans? Well, maybe because the training set of an LLM is less biased than the training set given to a human. This makes sense, right? A human will be made more biased by his individual experience and his parents' biases, while an LLM is literally inundated with as many sources of textual information as possible, with no attempt at bias, due to the sheer volume of knowledge they are trying to shove in there.
The article is basically referring to this.
But you will note, interestingly, that LLMs are biased more towards textual data. They understand the world as if they have no eyes and ears, only text, so the way they think reflects this bias. But in terms of textual knowledge, I think we can all agree they are less biased than humans.
Evidence: an LLM is not an atheist or a theist or an agnostic. But you, reader, are at the very least one of those three things.
Jensson•6h ago
If one of these tests is wrong, though, it will ruin the whole thing. And LLMs are much more likely to make a math error (which would result in a faulty test) than to implement a math function the wrong way, so this probably won't make it better at generating code.
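To illustrate the failure mode (hypothetical function, with an expected value that is deliberately the kind of wrong constant a model can produce when it does the arithmetic itself):

    import math

    def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
        """Standard amortized loan payment formula -- easy to implement correctly."""
        r = annual_rate / 12
        return principal * r / (1 - (1 + r) ** -months)

    def test_monthly_payment():
        # The implementation above is fine, but the expected value below is not:
        # the correct payment is about 1110.21, so this generated test "fails"
        # a correct function and poisons the whole suite.
        assert math.isclose(monthly_payment(100_000, 0.06, 120), 943.56, rel_tol=1e-4)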
MarcoDewey•2h ago
The bet that I am making is that the system reduces its error rate by splitting a broad task into two more focused tasks.
However, it is possible that generating meaningful test cases is a harder problem (with a higher error rate) than producing code. If this is the case, then this idea I am presenting would compound the error rate.
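A back-of-the-envelope version of that trade-off, with made-up error rates purely for illustration and assuming the two stages fail independently:

    def pipeline_success(test_error: float, code_error: float) -> float:
        """Probability that both stages come out correct, given independent errors."""
        return (1 - test_error) * (1 - code_error)

    # If splitting the broad task genuinely lowers each stage's error rate,
    # the product can beat a single monolithic step:
    print(pipeline_success(0.05, 0.05))  # 0.9025
    # But if test generation turns out to be the harder problem,
    # the errors compound instead:
    print(pipeline_success(0.30, 0.05))  # 0.665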