This was with a relatively small neural network fine-tuned on a tiny dataset of 33k face images from a dating site.
If I had a million dollars, I'd gladly wager it that some company with a deep dataset, like Google, could create a 99%-or-better profiler that works just off a video of someone's face (not a single still image, though I'd bet a single-image profiler could beat 90%).
Transformers allow a nearly arbitrary feature-vector length - if sexuality correlates at all with any of a million different facial features, then neural networks will be able to detect it. If you're doing a binary "straight or not" test, without distinguishing between all the values of "not-straight", then you could use a very shallow, very wide transformer architecture with a million features, train it on a consumer card, and get accuracy in the 90% range.
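For concreteness, here's what a "shallow but wide" binary classifier could look like in PyTorch. Everything here is illustrative - the patch embeddings, dimensions, and head are my assumptions, not anything from a real study:

```python
# Hypothetical sketch: a single-layer but very wide transformer
# encoder over precomputed face-patch embeddings, ending in one
# binary logit. All dimensions are made up for illustration.
import torch
import torch.nn as nn

class ShallowWideClassifier(nn.Module):
    def __init__(self, d_model=1024, ff_dim=16384, nhead=8):
        super().__init__()
        # One encoder layer; the width lives in the feed-forward block
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=ff_dim, batch_first=True)
        self.head = nn.Linear(d_model, 1)  # binary "straight or not"

    def forward(self, x):                # x: (batch, n_patches, d_model)
        h = self.encoder(x)
        return self.head(h.mean(dim=1))  # pool patches, emit one logit

model = ShallowWideClassifier()
logits = model(torch.randn(4, 64, 1024))  # toy batch of 4 "faces"
loss = nn.functional.binary_cross_entropy_with_logits(
    logits.squeeze(1), torch.tensor([1., 0., 1., 0.]))
```

A model like this is only a few tens of millions of parameters, which is the kind of thing that trains comfortably on a single consumer GPU.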
That initial study had technical flaws, not least of which were the binary classification of gay vs. straight and the use of only white subjects. Technically, they used a base model, VGG-Face: a 16-layer convolutional network whose penultimate layer produces a 4096-dimensional feature vector.
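As I understand it, the study's recipe was essentially transfer learning: freeze the pretrained face network, use its penultimate-layer activations as features, and fit a simple linear classifier on top. A minimal sketch of that general shape, with a random-feature stand-in for the CNN and toy labels since none of the real pipeline is reproduced here:

```python
# Hedged sketch of the transfer-learning recipe: frozen pretrained
# CNN as feature extractor, linear classifier on top. The extractor
# below is a placeholder, not a real VGG-Face call.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def extract_features(images):
    """Stand-in for a frozen pretrained face CNN: one
    4096-dimensional embedding per face crop."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(images), 4096))

# Toy placeholder data; the real pipeline would load face crops.
images = list(range(200))
labels = np.array([i % 2 for i in images])

X = extract_features(images)                  # (200, 4096)
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, labels, scoring="roc_auc", cv=5)
print(scores.mean())  # ~0.5 on random features, as expected
```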
Human accuracy was rated at about 50% - effectively a coin toss, with a slight accuracy advantage for women.
That base model is less powerful than something like nanoGPT; GPT-2 is orders of magnitude more complex and far more capable.
If you did this with nuance, skill, and high technical savvy, with a sophisticated model of sexual preference (not the 1950s notion of straight-or-not-straight), you could get a very accurate and deeply creepy piece of software.
This works for emotions, nonverbal communication, truthfulness, etc. Biometrics can provide a terrifyingly deep analysis of things you consider private and hidden but which nonetheless surface as unintended evidence available for analysis.
If you had a few hundred of these types of analyzers - say, for psychological factors, fitness, health issues, sexuality, political preference, etc. - then you could not only get a highly accurate snapshot of people through deanonymized bulk surveillance data freely available on the market, you could create LLMs tuned specifically to the features and preferences of each individual, then use A/B testing on your virtual populations to maximize engagement, force specific reactions and behaviors in response to media (timing, pacing, content, framing), and so on, and so forth.
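To make that last step concrete, here's a toy sketch of the A/B-testing idea, under the entirely hypothetical assumption that you already have per-person response models to simulate against:

```python
# Toy illustration of "A/B test against a virtual population":
# score two message variants against simulated per-person response
# models and keep the better one. Everything here is hypothetical.
import random

def simulated_response(profile, variant):
    """Stand-in for a per-individual model predicting engagement."""
    random.seed(hash((profile, variant)) % 2**32)
    return random.random()

profiles = [f"person_{i}" for i in range(1000)]   # fake population
variants = ["framing_A", "framing_B"]

scores = {v: sum(simulated_response(p, v) for p in profiles)
          for v in variants}
best = max(scores, key=scores.get)
print(best, scores)  # keep whichever framing the virtual crowd "prefers"
```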
We're not nearly as inscrutable, private, or resilient as many people think, and there's all sorts of data being misused already. Maybe we should get that universal digital bill of rights thing going before BlackRock or Honeywell or the DNC decide to go all in on AI.
edit: To clarify, I'm not cheering this stuff on. No university would allow the study, and most companies would open themselves up to significant legal scrutiny if such a thing were ever used and they got caught, but this is a weekend project for a quant at a big firm - with all the AI infrastructure out there, it'll cost them 20 hours and a case of Red Bull, and the time, knowledge, effort, and cost to achieve things like this are dropping fast.
There's a key conditional embedded deeply in that comment.
It's not a physiognomy trope or a claim that straight people have fundamentally different facial features or grow differently - it's the macro and micro expressions, the behaviors, the style and presentation choices, and how those intentional, active features play out on the substrate of the individual's facial structure. A small video snippet communicates a very large amount of information.

TikTok could do this - and then create another model that inferred psychology and sexuality from watch patterns, and yet another model that described how different types interact and network, and yet another that described how information propagates through various networks, and so on, and so forth. Through differential analysis and repeated refinement of various models, you get to some very intrusive and scary places.
Anyway, /ramble. We need a digital bill of rights.
Wonder why they mention "white male job candidates" specifically? Seems a bit odd.
The paper: https://insights.som.yale.edu/sites/default/files/2025-01/AI...
Ah yes, Yale going back to its eugenics roots https://www.antieugenicscollective.org I am somehow not surprised.
> Yale faculty, alumni and administrators helped found the American Eugenics Society in the 1920s and brought its headquarters to the New Haven Green in 1926.
Not odd at all; it's there to remove an obvious source of bias: the model recognizing race.
I am supportive of the effort, but this seems to snipe at a choice that is (to me) intended to remove a point where bias would clearly enter.
It is odd because that means they already had to separate the dataset by race, and we know how well that works. What specific shade of skin are they picking for their threshold? Are they measuring skull sizes to pick and choose? Isn't that back to "phrenology" and eugenics? Then, how do they define "men" and "women"? Maybe someone is neither, but now they're stuck labeled in a category they don't want to be in.
Correlation… does not mean… causation.
Pretty people generally do better in life, because people are nicer to, more receptive to, and more trusting of good-looking people.
This of course correlates to earnings.
This does not, however, correlate to performance - earnings are a poor proxy for performance in general.
So if this paper is taken seriously, even computers will be biased towards pretty people, and the spiral tightens.
Average CEO height is six feet, so that must mean tall applicants inherently have a better chance of doing well, right?
I have questions. How do facial expression, clothes, and hairstyle impact the model’s predictions? How about Facetune and insta filters? Would putting a clickbaity YouTube thumbnail at the top of my resume make me more employable?
This lines up with what I once heard "second hand" from faculty at a business school about publishing in academic business journals. It was something along the lines of being a bunch of dancing monkeys pumping out content entertaining to readers of HBR and such.
Otherwise they would have just led with the "fact" instead of speculation (which is most of what legacy news traffics in these days).
Questions:
1. Should they do it (morally)? No
2. Should they do it (profit motive)? No
3. Can they do it (legally)? Probably not.
4. Can they do it (technologically)? Yes
5. If they do it is it accurate? (See palm reading...)