Imagine the ads on TV: "Has AI lied about you? Your case could be worth millions. Call now!"
- https://www.msba.org/site/site/content/News-and-Publications...
- https://www.reuters.com/legal/government/judge-disqualifies-...
- https://calmatters.org/economy/technology/2025/09/chatgpt-la...
Simple example: A prospective employer refuses to hire you because of some blatant falsehood generated by an LLM.
At least once a week there is another US court case where the judge absolutely rips apart an attorney for AI-generated briefs and statements featuring made-up citations and nonexistent cases. I am not even following the topic closely, and yet I still run across one at least once a week.
Here are a couple of the most recent ones I spotted: Mezu v. Mezu (Oct 29)[0] and USA v. Glennie Antonio McGee (Oct 10)[1].
0. https://acrobat.adobe.com/id/urn:aaid:sc:US:a948060e-23ed-41...
1. https://storage.courtlistener.com/recap/gov.uscourts.alsd.74...
Big tech created a problem for themselves by allowing people to believe the things their products generate using LLMs are facts.
We are only reaching the obvious conclusion of where this leads.
On the other hand, training a small model to hallucinate less would be a significant development. Perhaps during post-training fine-tuning, after getting a sense of what depth of factual knowledge the model has actually absorbed, you could add a chunk of training samples pairing questions that go beyond the model's factual knowledge with the response "Sorry, I'm a small language model and that question is out of my depth." I know we all hate refusals, but surely there's room to improve them.
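Something like this, as a minimal sketch of what those refusal samples could look like (the file name, sample format, and probe questions are all assumptions on my part, not any lab's actual recipe):

    import json

    # Hypothetical: questions the small model has already failed on during a
    # factual probe, paired with the refusal we want it to learn.
    OUT_OF_DEPTH_QUESTIONS = [
        "What was the population of Ulan Bator in 1987?",
        "Who won the 1953 Tour de Romandie?",
    ]

    REFUSAL = "Sorry, I'm a small language model and that question is out of my depth."

    with open("refusal_samples.jsonl", "w") as f:
        for question in OUT_OF_DEPTH_QUESTIONS:
            # Chat-style fine-tuning record; mix these in with the normal
            # instruction-tuning data so the model learns when to refuse.
            record = {"messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": REFUSAL},
            ]}
            f.write(json.dumps(record) + "\n")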
I always thought that was a correct and useful observation.
Obviously it's asking for a lot to try to cram more "self awareness" into small models, but I doubt the current state of the art is a hard ceiling.
This has already been tried; Llama pioneered it, as far as I can infer from public knowledge (maybe OpenAI did it years earlier, I don't know).
They looped through a bunch of Wikipedia pages, generated questions from the information there, posed them to the LLM, and whenever the answer did not match what was in Wikipedia, they fine-tuned on that question paired with "Sorry, I don't know ...".
Then we went one step further and fine-tuned it to use search in these cases instead of saying "I don't know": fine-tune it on the answer toolCall("search", "that question", ...) or whatever.
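Roughly like this, as an illustrative sketch (the helper names, model.generate, and the toolCall string are assumptions on my part, not any real pipeline's API):

    # Illustrative sketch of the data-generation loop described above.

    def answers_match(answer: str, reference: str) -> bool:
        # Crude check for the sketch: the reference answer appears verbatim.
        return reference.strip().lower() in answer.strip().lower()

    def build_finetune_samples(pages, model, generate_qa_pairs, use_search=True):
        """For every factual question the model gets wrong, emit a sample that
        teaches it to either refuse or reach for the search tool."""
        samples = []
        for page in pages:
            for question, reference in generate_qa_pairs(page):
                answer = model.generate(question)
                if answers_match(answer, reference):
                    continue  # the model already knows this fact
                target = (f'toolCall("search", "{question}")'
                          if use_search else "Sorry, I don't know.")
                samples.append({"prompt": question, "completion": target})
        return samples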
Something close to the above is how all models with search tool capability are fine tuned.
All these hallucinations persist despite those efforts; it was much worse before.
This whole method depends on the assumption that there is actually a path in the internal representation that fires when the model is about to hallucinate. The results so far tell us that this is only partially true. There's no way to quantify it, of course.
To fix that properly we likely need training objective functions that incorporate some notion of correctness of information. But that's easier said than done.
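Easier said than done indeed. As a rough illustration, one way to sneak correctness into the objective is reward shaping during RL-style fine-tuning, where the fact checker is the genuinely hard, unsolved part (everything here is a hypothetical sketch):

    def correctness_reward(sampled_text, references, factuality_checker, lam=1.0):
        # factuality_checker scores the sampled output against reference
        # material in [0, 1]; building that checker is the unsolved part.
        score = factuality_checker(sampled_text, references)
        # Fully supported claims get +lam, unsupported ones get -lam.
        return lam * (2.0 * score - 1.0)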
> The consistent pattern of bias against conservative figures demonstrated by Google’s AI systems is even more alarming. Conservative leaders, candidates, and commentators are disproportionately targeted by false or disparaging content.
That's a little rich given the current administration's relationship to the truth. The present power structure runs almost entirely on falsehoods and conspiracy theories.
I think they plausibly might, through no fault of Google, if only because scandals involving conservatives might be statistically more likely.
https://en.wikipedia.org/wiki/Stephen_Colbert_at_the_2006_Wh...
I don't think so, no.
> just an accurate description of party members as a whole?
It wouldn't be. While enough Republicans have been caught being gay to remove the element of surprise and plausibly serve as the basis of LLM hallucinations, most of them haven't been, so such a hallucination wouldn't actually be accurate, merely unsurprising.
One potential solution to the accuracy problem is to turn facts into a marketplace. Make AIs deposit collateral for the facts they emit and have them lose the collateral and pay it to the user when it's found that statements they presented were false.
AI would be standing behind its words by having something to lose, like humans do. A facts marketplace would make claims easy to challenge and costly to get wrong.
There's a working POC implementation of a facts marketplace in my submissions.
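The core mechanic could be as simple as this toy sketch (purely illustrative, not the POC mentioned above; the hard part, deciding what counts as "found to be false", is left as an input):

    from dataclasses import dataclass

    @dataclass
    class Claim:
        emitter: str
        text: str
        stake: float

    class FactsMarket:
        """Toy model: every statement an AI emits is backed by collateral, and
        a successful challenge pays that collateral out to the challenger."""

        def __init__(self):
            self.balances = {}  # account name -> balance

        def assert_fact(self, emitter, text, stake):
            # The emitter locks up collateral when it states something as fact.
            self.balances[emitter] = self.balances.get(emitter, 0.0) - stake
            return Claim(emitter, text, stake)

        def resolve(self, claim, challenger, found_false):
            if found_false:
                # Statement shown to be false: challenger takes the collateral.
                self.balances[challenger] = self.balances.get(challenger, 0.0) + claim.stake
            else:
                # Statement stood up: collateral is returned to the emitter.
                self.balances[claim.emitter] += claim.stake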
Unfortunately, it's also worth pointing out that neither Marsha Blackburn nor Robby Starbuck are reliable narrators historically; nor are they even impartial actors in this particular story.
Blackburn has a long history of fighting to regulate Internet speech in order to force platforms to push ideological content (her words, not mine). So it's not surprising that this story originated as part of an unrelated lawsuit over First Amendment rights on the Internet, or that Blackburn's response is to call for it all to be shut down until it can be regulated according to her partisan agenda (again, her words, not mine) - something she has already pushed for via legislation she coauthored.
Yes, I understand that this was not the intended use. But at some point if a consumer product can be abused so badly and is so easy to use outside of its intended purposes, it's a problem for the business to solve and not for the consumer.
But fundamentally, the reason ChatGPT became so popular, as opposed to incumbents like Google or Wikipedia, is that it dispensed with the idea of attributing quotes to sources. Even if 90% of the things it says can be attributed, it's by design that it can say novel stuff.
The other side of the coin is that for things that are not novel, it attributes the quote to itself rather than sharing the credit with sources, which is what made the thing so popular in the first place, as if it were some kind of magic trick.
These are obviously not fixable; they are part of the design. My theory is that the liabilities will be equivalent to, if not greater than, the revenue recouped by OpenAI; the liabilities will just take a lot longer to realize, considering not only the length of trials but also the time it takes for case law and even new legislation to be created.
In 10 years, Sama will be fighting to make the thing an NFP again and have the government bail it out of all the lawsuits that it will accrue.
Maybe you can't just do things
Calling it "AI", shoving it into many existing workflows as if it's competently answering questions, and generally treating it like an oracle IS being neglectful.
Uhhh… net positive for who exactly?
Or am I not following your logic correctly?
Let's take a weaker example: some sugary soda. Tons of people drink sugary sodas. Are they truly a net benefit to society, or a net negative social cost? Pointing out that there are a high number of users doesn't mean the product inherently has a high amount of positive social outcomes. For a lot of those drinkers the outcomes are incredibly negative, and for a large chunk of society the general outcome is slightly worse. I'm not trying to argue sugary sodas deserve to be completely banned, but it's not a given they're beneficial just because a lot of people bothered to buy them. We can't say Coca-Cola is obviously good for people because it's being bought in massive quantities.
Do the same analysis for smoking cigarettes. A product that had tons of users. Many many hundreds of millions (billions?) of users using it all day every day. Couldn't be bad for them, right? People wouldn't buy something that obviously harms them, right?
AI might not be like cigarettes and sodas, sure. I don't think it is. But just saying "X has Y number of weekly active users, therefore it must be a net positive" as evidence of it truly being a positive in their lives is drawing a correlation that may or may not exist. If you want to show it's positive for those users, show those positive outcomes, not just some user count.
How confident are you that 800M people know what the negative aspects are to make it a net positive for them?
It's worth noting too that how we talk about and use AI models is very different from how we talk about other types of models. So maybe it's not surprising people don't understand them as models.
It told me it can't and I could do it myself.
I told it again.
Again it told me it can't, but here's how I could do it myself.
I told it it sucks and that ChatGPT etc. can do it for me.
Then it went and, I don't know, scraped Airbnb or used a previous search it must have had, and pulled up rooms with an Airbnb link for each.
…
After using a bunch of products, I now think a common option they all need is a toggle between a "Monkey's Paw" mode (Do As I Say) and a "Do What I Mean" mode.
Basically, a switch between where the user takes responsibility and where the AI does.
If it can't do or isn't allowed to do something in Monkey's Paw mode, then just stop with a single sentence. Don't go on a roundabout gaslighting trip.
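As a rough illustration of the toggle (hypothetical prompts, not anything any product actually ships):

    # Hypothetical system prompts for the two modes described above.
    MODES = {
        "do_as_i_say": (
            "Follow the user's literal instruction. If you cannot or are not "
            "allowed to do it, say so in one sentence and stop. Do not offer "
            "alternatives or explain how the user could do it themselves."
        ),
        "do_what_i_mean": (
            "Infer the user's underlying goal and take whatever reasonable "
            "steps (search, tools, rephrasing) get them there."
        ),
    }

    def system_prompt(mode):
        return MODES[mode]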
You usually need to gate useful technology away from the normies. E.g. Kickstarter used to have a problem where normies thought they were pre-ordering a finished product, so it had to pivot to being primarily a pre-order site.
Anything that is actually experimental and has less than very high performance needs to be gated away from the normies.
Especially in parliamentary democracies where people already take political quizzes to make sense of all the parties and candidates on the ballot.
So uninformed people participating isn't an unfortunate side effect, but rather the point: making everybody feel included in the decision making processes, to make people more likely to accept political change.
"I think we do democracy not because we think the masses are informed and make good decisions, but rather because it's the best system for ensuring peaceful transitions of power, thereby creating social stability which is conducive to encouraging investment in the future.
The lack of warlords leads to peaceful transitions. Trump can feel all he wants about the 2020 election but his sphere of influence was too small to take control.
This isn't the case for all those power struggles when a monarch dies. Each lord had their own militia they could mobilize to take control, which leads to stuff like the War of the Roses.
We had this same issue going into the Civil War, where the US army was mostly militias, so it was pretty easy to band the southern ones together and go fight the North. That doesn't work so well post-1812, once a unified federal army exists. Of course, if you start selectively replacing generals with loyalists, you start creating a warlord.
"Let me ask Grok who I should vote for..."
what on earth??
practically every metropolitan area and tons of smaller communities have multiple news sources that publish "voting guides", in addition to voter pamphlets that go out before elections which detail candidates' positions, ballot initiatives, etc.
barring that you can also just... do your "frantic googling" before the election. it's not a waste of your time to put a little of it toward understanding the political climate of your area and maybe once in a while forming an opinion instead of whatever constitutes a "moderate" position during the largest rightward shift of the overton window in decades.