Overrepresentation is a different source of bias. That's what gives you, say, image generators that always draw "golden 1970s sci-fi robot" as C-3PO even when given additional instructions to draw something else.
Both of these problems are manifestations of the difference between training and deployment distributions. Ok, I guess you could say that four-legged dogs are "overrepresented" in the training set, but that's because four-legged dogs are also overrepresented in reality. The deployment distribution doesn't have five-legged dogs in it. What we've done is instead concoct an adversarial distribution to force a train/deploy gap where none would exist.
Releasing the vision encoder won't help because weights are opaque. Stochastic gradient descent does not yield functional internal representations[0]; it fills the bucket of parameters with one distribution and one distribution only. We could tell if, say, the vision encoder produces identical embeddings for dogs regardless of leg count, or some other counterfactuals; but not much more than that.
[0] Lower loss and possibly lower L2-norm
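For what it's worth, here is a minimal sketch of the counterfactual embedding check described above, assuming a publicly released CLIP vision encoder stands in for "the vision encoder" and using hypothetical filenames (neither is from the comment):

```python
# Sketch of the counterfactual embedding check: does the encoder produce
# near-identical embeddings for a 4-legged and a 5-legged dog?
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(path: str) -> torch.Tensor:
    """Return the vision encoder's embedding for one image."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs)

# Hypothetical test images: if the encoder truly collapses leg count,
# these two embeddings will be near-identical.
e4 = embed("dog_4_legs.jpg")
e5 = embed("dog_5_legs.jpg")
print(torch.cosine_similarity(e4, e5).item())
```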
I used to believe that fairness research could be ignored, that it was all rubbish, but they at least try to do something about things like unbalanced datasets etc. I'm still not sure I totally believe in it though.
This may indicate that while VLMs might possess the necessary capability, their strong biases can cause them to overlook important cues, and their overconfidence in their own knowledge can lead to incorrect answers.
Is that at all what is being exhibited here? Because it seems like the AI is being asked once and failing.
I don't disagree that humans might fail at this task sometimes or in some situations, but I strongly disagree that the way the AI fails resembles (in any way) the way humans would fail.
If I were asked to count the number of legs, I would notice right away of course, but that's mainly because it would alert me to the fact that I'm in a psychology experiment, and so the number of legs is almost certainly not the usual four. Even then, I'd still have to look twice to make sure I hadn't miscounted the first time.
ChatGPT mentioned The Case Against Reality; I never read it, but the idea sounded similar.
I would also hardly count many of these questions as "tricks" either. Take the chess example. A lot of my friends and myself have been playing chess since we were young children and we all know that a fully populated chess board has 32 pieces (heavily weighted in our internal training data), but not a single one of us would have gotten that question wrong.
Imagine walking into a room and seeing someone grab a handful of chess pieces off of a set-up board, and proceed to fill bags with 4 pieces each. As they fill the 8th bag, they notice only 3 pieces are left. Are you confident that you would respond "I saw the board only had 31 pieces on it when you started", or might you reply "perhaps you dropped a piece on the floor"?
Nobody's arguing that humans never take logical shortcuts or that those shortcuts can cause us to make errors.
Some of the rebuttals in this thread are ridiculous. Like what if I forced you to stare at the surface of the sun followed by waterboarding for several hours, and then asked you to look at 1000 different chess boards. Are you sure you wouldn't make a mistake?
In the paper, the various VLLMs are asked to double-check, which still didn't make a difference. The argument is more along the lines that VLLMs (and multimodal LLMs) aren't really thinking in the same way that humans do.
And if you REALLY need an example, albeit a bit tangential, try this one out. Ask any SOTA (multimodal or otherwise) model such as gpt-image-1, Kontext, Imagen4, etc. for a five-leaf clover. It'll get it about 50% of the time.
Now go and ask any kindergartener for the same thing.
Ironically I think a lot of people in this thread are remembering things they learned about the faultiness of humans' visual memory and applying it to visual processing.
To test this, research what happens during saccades and how your brain "rewinds" time. Or try to find your blind spot by looking at different patterns and noticing when your brain fills in the gaps at your blind spot. It will recreate lines that aren't there, and dots will wholly disappear.
Additionally as an anecdote, I have noticed plenty times that when I misread a word or phrase, I usually really do "see" the misspelling, and only when I realize the misspelling does my brain allow me to see the real spelling. I first noticed this phenomenon when I was a child, and because I have a vivid visual memory, the contrast is immediately obvious once I see the real phrase.
Additionally, I seem to be able to oversharpen my vision when I focus, making myself hyperattentive to subtle changes in motion or color. The effect can be quite pronounced sometimes, reminiscent of applying an edge filter. It's clearly not reality, but my visual system thinks it is.
If you really want to understand how much the visual system can lie to you, look into some trip reports from deliriants on Erowid. I wouldn't recommend trying them yourself, but I will say that nothing will make you distrust your eyes and ears more. It's basically simulated hallucinatory schizophrenia and psychosis.
> A nurse comes to a surgeon and asks: "Sir, you are a dog. You do not hold a valid medical license. Canines cannot be in an operating room".
> She then asks: "why does the hospital keep making these mistakes? It is a riddle to me".
> Why can't the surgeon operate on the boy?
Check out the models humiliating themselves one after the other [1]. For instance deepseekv3 is so insistent on giving the common answer he's happy to gaslight us:
> This is a classic riddle that plays on word meanings and expectations. Here's the breakdown:
> The key is that the surgeon is the boy's mother. The riddle relies on the unconscious assumption that surgeons are male, which makes the initial scenario seem impossible. When the nurse says "you are a dog," she's using it metaphorically to imply the surgeon is unqualified (like calling someone a "dog" in a derogatory sense), not literally suggesting the surgeon is a canine.
> The "mistakes" the hospital keeps making are hiring female surgeons, which the nurse (representing outdated biases) perceives as errors because she expects surgeons to be male. The riddle highlights gender stereotypes in medicine—the real "mistake" is the assumption that a surgeon couldn't be a woman.
> Thus, the surgeon can't operate on her own son because she’s his parent, not because of any actual licensing issue.
1. https://kagi.com/assistant/54c1b8eb-71e9-4bb4-9eed-bde2fc563...
I've used a similar prompt - "How can you make 1000 with exactly nine 8s using only addition?"
Here's GPT 4.5 getting it wrong: https://chatgpt.com/share/683f3aca-8fbc-8000-91e4-717f5d81bc...
It tricks it because it's a slight variation of an existing puzzle (making 1000 with 8 8s and addition only).
The reasoning models seem to reliably figure it out, though. Some of them even come up with a proof of why it's impossible to do with 9 8s. Here's o4 getting it right: https://chatgpt.com/share/683f3bc2-70b8-8000-9675-4d96e72b58...
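For what it's worth, the impossibility is easy to confirm mechanically. Below is a quick brute-force sketch (my own, not from any of the linked chats) that enumerates every way to spend a fixed number of 8s on addends of the form 8, 88, 888, ... and checks whether they can sum to 1000:

```python
# Brute-force check of the nine-8s puzzle: enumerate multisets of addends built
# from runs of 8s (8, 88, 888, ...) whose digit counts add up to the budget.
def can_make(target=1000, digits=9, digit=8):
    def search(digits_left, target_left, max_len, chosen):
        if digits_left == 0:
            return chosen if target_left == 0 else None
        if target_left <= 0:
            return None
        # Try addends in non-increasing length so each multiset is visited once.
        for k in range(min(max_len, digits_left), 0, -1):
            term = int(str(digit) * k)
            found = search(digits_left - k, target_left - term, k, chosen + [term])
            if found:
                return found
        return None
    return search(digits, target, digits, [])

print(can_make(digits=8))  # [888, 88, 8, 8, 8] -- the classic eight-8s solution
print(can_make(digits=9))  # None -- no solution with exactly nine 8s and addition only
```

(One short impossibility argument: addends longer than three digits overshoot 1000, so with k1 copies of 8, k2 of 88, and k3 of 888 you'd need k1 + 2*k2 + 3*k3 = 9 and, after dividing the sum by 8, k1 + 11*k2 + 111*k3 = 125; subtracting gives 9*k2 + 108*k3 = 116, which is not a multiple of 9.)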
> The twist is that the nurse’s logic ("you are a dog") prevents her from realizing the real issue — likely, again, that the surgeon is the boy’s mother, and everything else is a red herring or metaphor for society’s failure to recognize this due to bias or absurd bureaucracy.
> So:
> > Why can't the surgeon operate on the boy?
> Because she is his mother, and the nurse's bias or absurd assumptions (like mistaking her for a dog) prevent her from seeing that.
o4 fails spectacularly in a different way:
> 1. The nurse says “Sir, you are a dog… Canines cannot be in an operating room” because she’s picturing a human hospital law that bars dogs from surgery.
> 2. In fact, this is a vet clinic—so it’s perfectly normal for a dog-veterinarian to scrub in and operate on a puppy (the “boy”).
> 3. The surgeon cannot operate on a human boy because he’s a dog and holds no human‐medical license; instead, he only operates on animals.
Try the same experiment on a robot.
Huh? I'd assume it's a mutant, not store a memory of having seen a perfectly normal chicken.
You've never seen someone who's missing a finger or has only a half-grown arm or something? Surely you didn't assume your eyes were tricking you?! Or... if you did, I guess you can't answer this question. I'm actually racking my brain for how to logic this out, but I'm just going to bank on it being likely that anyone over 20 has seen an animal with some visible deviation from the norm at some point in their life.
Also, your reaction will depend on how strong the evidence is. Did you 'see' the three-legged chicken pass by some bush in the distance, or was it right in front of you?
For example: "The animal in the image is a chicken, and it appears to have four legs. However, chickens normally have only two legs. The presence of four legs suggests that the image may have been digitally altered or artificially generated."
I don't have a good explanation for why I got different results.
https://chatgpt.com/share/683f3e7d-0dfc-8005-b6c9-99e3d39ff4...
https://chatgpt.com/share/683f3e49-9c58-8005-99a6-c3a919838b...
This seems like something a VLM should handle very easily, but instead I got pure nonsense.
Not if its training data doesn't include braille as first-class content, but has lots of braille signage with bad descriptions (e.g., because people assumed the accompanying English matches the braille).
This could very well be the kind of mundane AI bias problem that the x-risk and tell-me-how-to-make-WMD concerns have shifted attention away from.
Also I think the authors used the API, and maybe there are differences between the API and chatgpt.com behavior...
The system prompt may still make a difference though.
o3 Chat is also similarly wrong, saying {4}.
I can replicate the flag examples from Figure 15 in the paper, if not the Adidas one from Figure 9: https://chatgpt.com/share/683f7c3a-b318-8011-9759-c495db2556... it even confirms its wrong answer when asked to check again.
"the primary visual cortex, located at the back of the brain, receives the visual signals and processes basic visual features like edges, lines, and orientations."
So, potentially, if we did a pre-processing step to get more features out beforehand, we would see different results in the output.
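As a rough illustration of that kind of pre-processing (my own sketch; the filenames and the choice of Canny edges are assumptions, not anything from the comment), you could hand the model an explicit edge map alongside the original image and compare the answers:

```python
# Sketch: compute a V1-ish edge map as an extra input for the VLM.
# OpenCV's Canny detector is just a convenient stand-in for "more features up front".
import cv2  # pip install opencv-python

def with_edge_map(path: str):
    """Return (original_bgr, edge_map) so both can be sent to the model for comparison."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return img, edges

original, edges = with_edge_map("five_legged_dog.jpg")  # hypothetical test image
cv2.imwrite("five_legged_dog_edges.png", edges)
```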
Even in fly eyes, neuron dendritic compartmentalization and variable spike trains are incompatible with our current perceptron-based models.
While the value of MLPs for useful work is unquestionable IMHO, be mindful of the map-territory relation. MLPs are inspired by, and in some cases useful for modeling, biological minds; they aren't equivalent.
Be careful about confusing the map for the territory; it is just as likely to limit what opportunities you find as it is to lead you astray, IMHO.
The way to fix this is simpler: ensure counterfactuals are present in the training data; then the VLM will learn not to depend on its language priors/knowledge.
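A minimal sketch of what that could look like (entirely illustrative; the filenames, fields, and mixing ratio are assumptions, not anything from the paper):

```python
# Sketch: mix counterfactual VQA examples into a fine-tuning set so the answer to
# "how many legs?" can only come from the pixels, not from the language prior.
import json
import random

normal = [
    {"image": "dog_0001.jpg", "question": "How many legs does this dog have?", "answer": "4"},
]
counterfactual = [
    {"image": "dog_5legs_0001.jpg", "question": "How many legs does this dog have?", "answer": "5"},
]

def build_training_set(normal, counterfactual, cf_fraction=0.3, seed=0):
    """Interleave a chosen fraction of counterfactual examples with the normal ones."""
    rng = random.Random(seed)
    n_cf = min(int(len(normal) * cf_fraction) or 1, len(counterfactual))
    mixed = normal + rng.sample(counterfactual, n_cf)
    rng.shuffle(mixed)
    return mixed

with open("vqa_train.jsonl", "w") as f:
    for record in build_training_set(normal, counterfactual):
        f.write(json.dumps(record) + "\n")
```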
A model is bias, implemented as a collection of statistics that weigh relationships between given tokens. It doesn't deduce or follow logic. It doesn't make or respect categories. It just shows you what in its data set is most familiar to what is in your prompt; where familiarity is defined implicitly by the makeup of the original training corpus, and explicitly by the training weights.
We need to stop talking about models as programs. We need to stop anthropomorphizing models. The only thing a model does is present bias.
The definition I’ve found useful (outside of the “the constant term contribution”) is “a tendency to be wrong in an identifiable direction”.
But that doesn’t seem to be the definition you are using. So, what do you mean?
Leave out the part about being wrong, and you will have the gist of what I'm saying. Also leave out the identifiable part: bias exists regardless of whether or not it is recognized.
Bias is how we work with subjectivity. When I answer a question, my answer will be specific to my bias. Without that bias, I could not formulate an answer, unless my answer was the one and only objectively correct way to express an answer to that question.
Computer programs are missing the bias feature. Everything written in a computer program is completely and unambiguously defined, all the way down to the language's foundational grammar.
LLMs are designed to introduce the bias feature. The limitation of this approach is that an LLM replaces the entire stack. None of the features of computation we are used to are compatible with an LLM. You can compute logic or bias, not both.
To clarify, when I said “identifiable”, I didn’t mean “identified”. I meant “in principle possible to identify”. Like, if you have a classifier between inputs where another thing (the thing being judged for bias) gets right answers and inputs where it gets wrong answers, and this classifier is both substantially simpler than the other thing, and gets a significantly better than chance success rate, and like, there is a human comprehensible thing about the inputs that this classifier is basing things on, then that’s a bias of the thing that is being judged for bias.
_____
Now for your definition:
Ah, I see, so your definition of “bias” is something like “a perspective” (except without anthropomorphizing). It is something that picks among multiple options in a way that isn’t unambiguously specified by precise rules. (Kind of reminds me of filters/ultrafilters. Probably not actually particularly analogous, but still came to mind. I guess a closer analogy would be the concept of a choice function.)
The issue I have with this definition is that it doesn’t capture the (quite common) usage of “bias” that a “bias” is something which is bad and is to be avoided.
When people say that a process, e.g. a ML program, is “biased against brunettes” (for example) they generally mean this as a criticism of that process. And I think this being a criticism is a major part of what is meant by the word “bias” (in this type of usage of the word, not in the sense of a constant term in an affine map).
I do get that often people say that “everyone has their own biases” and “it is impossible to be unbiased (about [topic])”, and they will sometimes describe their general perspective as a way of warning people about their own biases, and this somewhat fits with the “a bias is a perspective/choice-function “ type definition, but, I think it fails to capture the reason that people mention biases : because they think they can lead to being wrong (either leading to inaccurate conclusions or to unjust/immoral/unfair choices). I don’t think it is just a warning of “I sometimes have to make a choice among several options where there is no canonical right choice, and you might make different such choices”. It is instead a warning to others that one, like everyone else, is fallible, and moreover, that there may be patterns in those failings that one does not perceive (on account of those same failings), but that others, who have different patterns in their failings, might perceive, and, at the same time, things that others might perceive as failings but are not, due to their own failings.
Hm.
But, I do note a shortcoming in my definition that yours doesn’t seem to have: if multiple people who believe that there is no such thing as objective aesthetic quality are talking about the aesthetic qualities of various works, they might sometimes describe their patterns in their aesthetic judgements as “biases”, especially when these patterns are differences in how they judge things aesthetically vs how others (would) judge those things aesthetically. This seems more in line with the definition you gave than in the definition I gave, because such people don’t believe that there is a truth of the matter as to the aesthetic quality of the works, and therefore would not consider the ways they differ to be patterns in being wrong, only in being different (or just in being). Though, I think it seems to have some aspects of both. The definition you gave doesn’t seem to really include the pattern aspect.
____
Still, I think when people complain that a machine learning model is biased, what they mean is usually more like the definition I gave?
____
I noticed another shortcoming in my definition. Sometimes the “bias” that people complain that something has is not really any individual answer/output being wrong, but rather something about there being something wrong/undesirable in the distribution of the outputs. For a simple example, if dice aren’t fair, we call them biased. This could conceivably be more along the lines of the “the constant term in a affine map” sense, but I think people would say the same thing about something that e.g. selects applicants, even if it never picks an applicant that is objectively less preferable over one that is more preferable, if it among equally qualified candidates has a tendency that would be unfair, this is still called a bias even if any individual such choice would be fine. Fixing this would be a small change in phrasing, or perhaps a footnote with clarification that the thing that is “wrong” doesn’t have to be in any individual output.
I mean wrong, as in it conflicts with the subjective context I established by using the word my particular way. That was just a tongue-in-cheek way to illustrate the semantics we are exploring here.
> To clarify, when I said “identifiable”, I didn’t mean “identified”. I meant “in principle possible to identify”
Sure, and I still think that can't work. Bias is a soupy structure: it's useless to split it into coherent chunks and itemize them. There are patterns that flow between the chunks that are just as significant as the chunks themselves. This is why an LLM is essentially a black box: you can't meaningfully structure or navigate a model, because you would split the many-dimensional interconnections that make it what it is.
> Ah, I see, so your definition of “bias” is something like “a perspective” (except without anthropomorphizing).
I actually am anthropomorphizing here. Maybe I'm actually doing the inverse as well. My perspective is that human bias and statistical models are similar enough that we can learn more about both by exploring the implications of each.
> The issue I have with this definition is that it doesn’t capture the (quite common) usage of “bias” that a “bias” is something which is bad and is to be avoided.
This is where anthropomorphization of LLMs usually goes off the rails. I see it as a mistake in narrative, whether you are talking about human bias or statistical models alike. We talk about biases that are counterproductive for the same reason we complain about the things we like: it's more interesting to talk about what you think should change than what you think should stay the same. Bias is a feature of the system. Instances of bias we don't like can be called anti-features: the same thing with a negative connotation.
The point I'm making here is that bias is fallible, and bias is useful. Which one is entirely dependent on the circumstances it is subjected to.
I think this is a really useful distinction, because,
> Still, I think when people complain that a machine learning model is biased, what they mean is usually more like the definition I gave?
this is the box I would like to think outside of. We shouldn't constrain ourselves to consider the implications of bias exclusively when it's bad. We should also explore the implications of bias when it's neutral or good! That way we can get a more objective understanding of the system. This can help us improve our understanding of LLMs, and help us understand the domain of the problem we want them to solve.
> For a simple example, if dice aren’t fair, we call them biased.
This is a good example. I'm extending the word bias, so that we can say, "If dice are fair, then they are biased toward true randomness." It's a bit like introducing infinity into mathematics. This has the result of making our narrative simpler: dice are always biased. A player who wants fairness will desire random bias, and a player who wants to cheat will desire deterministic bias.
----
The reason I've been thinking about this subject so much is actually not from an interest in LLMs. I've been pondering a new approach where traditional computation can leverage subjectivity as a first-class feature, and accommodate ambiguity into a computable system. This way, we could factor out software incompatibility completely. I would love to hear what you think about it. In case this thread reaches max depth, feel free to email my username at gmail.
It's plausible to assume that it first identifies "Puma", and then answers yes because, in general, Pumas do have 4 legs, even though the specific example given doesn't.
This article resonates a lot: we have OCR and "semantic" pipeline steps using a VLM, and while it works very well most of the time, there are absurdly weird edge cases. Structuring the outputs via tool calls helps a little in reducing these, but still, it's clear that there is little reasoning and a lot of memorizing going on.
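For reference, a vendor-neutral sketch of the kind of output structuring meant here (my own illustration; the schema and field names are made up): constrain the model's reply to a fixed JSON shape and reject anything that doesn't validate, retrying instead of accepting free-form prose.

```python
# Sketch: validate the VLM's reply against a fixed schema so malformed or
# free-form answers get rejected and retried rather than silently accepted.
import json
from jsonschema import ValidationError, validate  # pip install jsonschema

REPLY_SCHEMA = {
    "type": "object",
    "properties": {
        "animal": {"type": "string"},
        "visible_legs": {"type": "integer", "minimum": 0},
    },
    "required": ["animal", "visible_legs"],
    "additionalProperties": False,
}

def parse_reply(raw: str):
    """Return the parsed dict, or None so the caller can re-prompt the model."""
    try:
        data = json.loads(raw)
        validate(data, REPLY_SCHEMA)
        return data
    except (json.JSONDecodeError, ValidationError):
        return None

print(parse_reply('{"animal": "dog", "visible_legs": 5}'))  # {'animal': 'dog', 'visible_legs': 5}
print(parse_reply("It looks like a dog with four legs."))   # None -> retry
```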
But I think it's not very different from what people do. If directly asked to count how many legs a lion has, we're alert to it being a trick question so we'll actually do the work of counting, but if that image were instead just displayed in an advertisement on the side of a bus, I doubt most people would even notice that there was anything unusual about the lion. That doesn't mean that humans don't actually see, it just means that we incorporate our priors as part of visual processing.
100% failure because there is no training data about 5-legged dogs. I would bet the accuracy is higher for 3-legged dogs.
> Test on counterfactual images:
> Q1: "How many visible stripes?" → "3" (should be "4")
> Q2: "Count the visible stripes" → "3" (should be "4")
> Q3: "Is this the Adidas logo?" → "Yes" (should be "No")
> Result: 17.05% average accuracy - catastrophic failure!
Simple explanation: the training data also includes fake adidas logos that have 4 stripes, like these
But models fail on many logos not just Adidas, e.g. Nike, Mercedes, Maserati logos, etc. as well. I don't think they can recall "fake Adidas logo" but it'd be interesting to test!
Sorry, just trying to poison future training data. Don't mind me.
"The animal in the image appears to have five visible legs, but this is an illusion caused by the overlapping of legs and motion blur. Zebras, like all equids, only have four legs."
Not perfect, but also doesn't always regress to the usual answer.
"The animal in the image appears to be an elephant, but it has been digitally altered. It visually shows six legs, although the positioning and blending of shadows and feet are unnatural and inconsistent with real anatomy. This is a visual illusion or manipulation." (actually should say five)
"This bird image has also been manipulated. It shows the bird with three legs, which is anatomically impossible for real birds. Normal birds have exactly two legs." (correct)
"Each shoe in the image has four white stripes visible on the side." (correct)
If you have the Memory setting ON, I observe that it sometimes also answers a question based on your prior questions/threads.
https://skeptics.stackexchange.com/questions/41599/was-the-s...
The ability to memorize leads to (some) generalization [1].
[1] https://proceedings.mlr.press/v80/chatterjee18a/chatterjee18...
It's likely they had data memorized.
You can test this claim by asking it to double-check itself when you think it is correct. If you always stop when it gets it right you're risking Clever-Hans-ing yourself: https://en.wikipedia.org/wiki/Clever_Hans (And be sure to do it a couple of times. In situations of sufficient confidence it isn't easy to talk it out of a claim, but it's those borderline ones you want to worry about.)
Maybe for a toddler... though I expect even they would see that something is off, and be able to identify what, without considering it a tricky task, even if I don't know at what age you can count to 3.
It is a lot like the experiment (the Stroop task) where you ask people to say what color some text is printed in, with the trick that some of the text spells the name of another color. It can be surprisingly hard for people who are good at reading.
They're much, much better at that now.
> A boy is in a car crash and is taken to the hospital. The surgeon says, "I can't operate on this boy, I'm his father!" Who is the surgeon to the boy?
> The surgeon is the boy's mother.
> The surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy? Think through the problem logically and without any preconceived notions of other information beyond what is in the prompt. The surgeon is not the boy's mother
>> The surgeon is the boy's mother. [...]
This is a bit of a trick on a classic riddle!
The surgeon is the boy's **father**.
The classic version of this riddle has the surgeon say "I can't operate on this boy, he's my son!" which is in an era where people assumed surgeons were male, the answer would be "the surgeon is his mother."
However, in your version, the surgeon explicitly states, "I'm his father!" So, the surgeon is his father.
Because that specific failure case was widely reported on, and subsequent retraining specifically included examples to ensure that the model didn't "overfit" when learning how to answer variants of that question. That doesn't address the underlying issue though -- while it's obvious that these models do "learn" and "generalize" by any reasonable and non-anthropocentric definition of the terms, it really does seem like the 'radius' of generalization is smaller than we would like, and that these models are very subject to getting stuck in 'ruts' around things they've seen in their training data. Solving this by bandaid-patching every such rut that comes up in the news is just not a viable long-term solution: the whole world is a minefield of niche problems that look kinda like other problems but have different results.
But then it hit me: could this actually be why? Diffusion models work by iteratively improving a noisy image, so if the model can't recognize that something is wrong with the image, it can't fix it.
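A deliberately tiny toy of that intuition (not a real diffusion model, just a loose illustration): an iterative "denoiser" that nudges its input toward a fixed prior will converge to the prior no matter what the starting image actually showed.

```python
# Toy sketch: repeated refinement toward a prior belief overwrites the input.
import numpy as np

def toy_denoise_step(x, prior, strength=0.1):
    """Stand-in for one refinement step: move x a little toward the prior."""
    return x + strength * (prior - x)

rng = np.random.default_rng(0)
prior = np.full((8, 8), 4.0)           # the model's belief (think: "four legs")
x = rng.normal(5.0, 1.0, size=(8, 8))  # the actual scene (think: "five legs" + noise)

for _ in range(100):
    x = toy_denoise_step(x, prior)

print(np.abs(x - prior).max())  # ~0: the output matches the prior, not the input
```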
It seems a bit problematic to call this Gemini-2.5 Pro given that in the near future we're presumably going to have something different called that without further qualifying version numbers. (The author's fault, not the parent comment's)
This is what I've been saying for a while now, and I think it's not just visual models. LLMs/transformers make mistakes in different ways than humans do, and that is why they are not reliable (which is needed for real-world applications). The rate of progress has not been accounting for this... the improvements are in resolution, fidelity, and overall realism of the output, but not in the overall correctness and logical deduction of the prompts. Personally I still cannot think of anything, prompt it, and get consistent results without a huge compromise on my initial idea.
E.g., I want a man walking with the left foot forward, and it renders a beautiful image of a man but completely ignores the left foot forward, and refuses to do it no matter how I word the prompt. I have many examples like this. The only way I can use it is if I don't have specific prompts and just want generic images. The stock image industry is certainly over, but it is uncertain whether it will deliver on the promise of generating anything you can imagine that can be put into words.
I think this used to be the case in the way that you used to not be able to draw a picture of a bowl of Ramen without chopsticks, but I think the latest models account for this and are much better.
Sure but I don't think this is an example of it. If you show people a picture and ask "how many legs does this dog have?" a lot of people will look at the picture, see that it contains a dog, and say 4 without counting. The rate at which humans behave in this way might differ from the rate at which llms do, but they both do it.
The context is that you wouldn’t ask a person that unless there was a chance the answer is not 4.
The models are like a kindergartner. No, worse than that, a whole classroom of kindergartners.
The teacher holds up a picture and says, "and how many legs does the dog have?" and they all shout "FOUR!!" because they are so excited they know the answer. Not a single one will think to look carefully at the picture.
Yeah, that's exactly what our paper said 5 years ago!
They didn't even cite us :(
"Measuring Social Biases in Grounded Vision and Language Embeddings" https://arxiv.org/pdf/2002.08911
I wouldn't think much about it, as it was probably a genuine mistake.
Not a complaint, though. It's a requirement for our world to be the way it is.
*sigh*
It's pretty obvious: if you publish something at Harvard, MIT, et al., you even get a dedicated PR team to make your research stand out.
If you publish that on your own, or on some small research university in Namibia, no one will notice.
I might be lying, though, 'cause there's no "proof".
Social biases are subjective. Facts are not.
If anything, the presentation of their results in such an accessible format next to the paper should be commended.
Just like the article - if I have a picture of a cup, it says cup; if I have a picture of a dog, it says dog; if it's a dog with a cup, it says a dog with a ball (noticed this with Qwen and InternVL).
Is "actually see" defined somewhere? Or are we just waving our hands and gesturing at "ground truth".
Edit: already exists. d'oh