This is probably better phrased as "LLMs may not provide consistent answers due to changing data and built-in randomness."
Barring rare(?) GPU race conditions, LLMs produce the same output given the same inputs.
With batching, the matrix shapes and a request's position within them aren't deterministic, and that leads to non-deterministic results regardless of sampling temperature or seed.
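A minimal sketch of the underlying issue, assuming numpy just for illustration: floating-point addition isn't associative, so the same values reduced in a different order (which is what happens when batch shapes change and different kernels or tilings get picked) can give slightly different results.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(10_000).astype(np.float32)

    s1 = np.sum(x)                                # one reduction order
    s2 = np.sum(x.reshape(100, 100).sum(axis=0))  # a different reduction order
    print(s1 == s2)  # often False: float32 addition is not associative

The same effect, repeated across every matmul in the forward pass, can be enough to flip a sampled token somewhere in a long generation even at temperature 0.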
Even with a black-box API, just because you don't know how the result is calculated doesn't mean it's non-deterministic. It's the underlying algorithm that determines that, and an LLM is deterministic.
It’s inherently non-deterministic because it reflects the reality of different requests arriving at the servers at the same time. And I don’t believe there are any realistic workarounds if you want to keep costs reasonable.
Edit: there might be workarounds if matmul algorithms gave stronger guarantees than they do today (invariance under row/column swaps). I'm not enough of an expert to say how feasible that is, especially in a quantized scenario.
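For what that invariance would mean concretely, here's a rough check (PyTorch assumed; whether the two results match bit-for-bit depends on the backend and which kernel gets selected for each shape):

    import torch

    torch.manual_seed(0)
    W = torch.randn(4096, 4096)
    x = torch.randn(1, 4096)
    batch = torch.cat([x, torch.randn(7, 4096)])  # same row, embedded in a larger batch

    y_alone   = x @ W            # row computed on its own
    y_batched = (batch @ W)[:1]  # same row computed as part of a batch
    print(torch.equal(y_alone, y_batched))  # not guaranteed True: kernel/tiling can differ by shape

A "batch-invariant" matmul would guarantee this prints True no matter what else is in the batch; today's libraries generally don't promise that.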
Inference on a generic LLM may not be subject to these non-determinisms even on a GPU though, idk
dekhn from a decade ago cared a lot about stable outputs. dekhn today thinks sampling from a distribution is a far more practical approach for nearly all use cases. I could see it mattering when the false negative rate of a medical diagnostic exceeded a reasonable threshold.
But it does when coupled with non-deterministic request batching, which is the case.
If he did have a sense of what people expect, he would know nobody wants Grok to give his personal opinion on issues. They want Grok to explain the emotional landscape of controversial issues, explaining the passion people feel on both sides and the reasons for their feelings. Asked to pick a side with one word, the expected response is "As an AI, I don't have an opinion on the matter."
He may be using a specific ideological framework that prioritizes contrarian or ‘anti-woke’ narratives to instruct Grok's tuning. That's turning out to be disastrous. He needs someone like Amanda Askell at Anthropic to help guide the tuning.
Absolutely. That said, I'm not sure Sam Altman, Dario Amodei, and others are notably empathetic either.
LLM bugs are weird.
Ignoring the context of the past month, where he has repeatedly said he plans on 'fixing' the bot to align with his perspective, feels like the LLM world's equivalent of "to me it looked like he was waving awkwardly", no?
It’s so fascinating that right-wing views are so similar to what is usually decried in the next sentence.
Perhaps you should start looking for other methods to educate yourself.
"I’m sure you believe everything you’re saying. But what I’m saying is that if you believed something different, you wouldn’t be sitting where you’re sitting."
Simon may well be right - xAI might not have directly instructed Grok to check what the boss thinks before responding - but that's not to say xAI wouldn't be more likely to release a model that does agree with the boss a lot and privileges what he has said when reasoning.