Gemini 2.5: Our most intelligent models are getting even better

https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/

47•meetpateltech•4h ago

Comments

russfink•4h ago

Why don’t companies publish hashes of emitted answers so that we, e.g. teachers, could verify whether the AI produced a given result?

staticman2•4h ago

It would be pretty trivial to paraphrase the output, wouldn't it?

fenesiistvan•4h ago

Change one character and the hash will not match anymore...
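A minimal sketch of that point, assuming the provider published SHA-256 hashes of exact answer text (the `answer_hash` helper and the sample strings are hypothetical, not anything a vendor actually offers):

```python
import hashlib


def answer_hash(text: str) -> str:
    """Return the SHA-256 hex digest of the exact answer text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical model output and a copy with one character changed.
original = "The mitochondria is the powerhouse of the cell."
edited = "The mitochondria is the powerhouse of the cell!"

# Hypothetical registry of hashes the provider would have published.
published = {answer_hash(original)}

print(answer_hash(original) in published)  # exact copy is found
print(answer_hash(edited) in published)    # one-character edit is not
```

A cryptographic hash only answers "is this byte-for-byte the same text?", so any edit, however small, falls outside the registry.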

Atotalnoob•3h ago

There are the issues others mentioned, but you could also independently write, word for word, what an LLM says.

It’s statistically unlikely, but possible.

perdomon•3h ago

Hashes of every answer to every question and every variation of that question? If that were possible, you’d still need to account for the extreme likelihood of the LLM providing a differently worded answer (it virtually always will). This isn’t how LLMs or hashing algorithms work. I think the answer is that teachers need to adjust to the changing technological landscape. It’s long overdue, and LLMs have almost ruined homework.

fuddy•2h ago

Hashing every answer you ever give is the kind of thing hashing algorithms are used for. The trouble is that the user can trivially make an equally good variant with virtually any change (well, an unlimited number of possible changes), and nobody has hashed that variant.

haiku2077•3h ago

Ever heard of the meme:

"can I copy your homework?"

"yeah just change it up a bit so it doesn't look obvious you copied"

evilduck•3h ago

Local models are possible and nothing in that area of development will ever publish a hash of their output. The huge frontier models are not reasonably self-hosted but for normal K-12 tasking a model that runs on a decent gaming computer is sufficient to make a teacher's job harder. Hell, a small model running on a newer phone from the last couple of years could provide pretty decent essay help.

haiku2077•3h ago

Heck, use a hosted model for the first pass, send the output to a local model with the prompt "tweak this to make it sound like it was written by a college student instead of an AI"

BriggyDwiggs42•3h ago

There’s an actual approach, from years ago, where you have the LLM generate patterns of slightly less likely words and can then detect it easily. They don’t want to do any of that stuff because cheating students are their users.

subscribed•3h ago

This is exactly where users of English as a second language are being accused of cheating -- we didn't grow up with the living language, but learnt it from movies, classic books, and in school (the luckiest ones).

We use rare or uncommon words because of how we learned and were taught. Weaponising it against us is not just a prejudice, it's idiocy.

You're postulating using a metric that shows how much someone deviates from the bog standard, and that will also discriminate against the smart, homegrown erudites.

This approach is utterly flawed.

haiku2077•2h ago

I remember when my parents sent me to live with my grandparents in India for a bit, all the English language books available were older books, mostly British authors. I think the newest book I read that summer that wasn't a math book was Through the Looking Glass.

BriggyDwiggs42•1h ago

I’m referencing a paper I saw in passing multiple years ago, so forgive me if I didn’t elaborate the exact algorithm. The LLM varies its word selection in a patterned way, e.g. most likely word, 2nd most, 1st, 2nd, and so on. It’s statistically impossible for an ESL person to happen to do this by accident.
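A toy sketch of the alternating-rank idea as described in this comment, not the algorithm from the (unnamed) paper. Everything here is an assumption for illustration: `VOCAB`, the `ranked_candidates` stand-in for a real model's ranking, and the `pattern_score` detector.

```python
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "home", "away"]


def ranked_candidates(prev: str) -> list[str]:
    """Hypothetical stand-in for a model: rank the vocabulary given the previous word."""
    shift = sum(ord(c) for c in prev) % len(VOCAB)
    return VOCAB[shift:] + VOCAB[:shift]


def generate(n: int, start: str = "the") -> list[str]:
    """Emit n words, alternating between the top-ranked and second-ranked candidate."""
    words = [start]
    for i in range(n):
        rank = i % 2  # 0 = most likely, 1 = second most likely
        words.append(ranked_candidates(words[-1])[rank])
    return words


def pattern_score(words: list[str]) -> float:
    """Fraction of positions whose word matches the alternating-rank pattern."""
    if len(words) < 2:
        return 0.0
    hits = 0
    for i in range(len(words) - 1):
        expected = ranked_candidates(words[i])[i % 2]
        hits += words[i + 1] == expected
    return hits / (len(words) - 1)


watermarked = generate(20)
human = ["the", "cat", "sat", "home", "away"]
print(pattern_score(watermarked), pattern_score(human))
```

The detector needs the same ranking function the generator used, and text that follows the pattern at every position scores 1.0. Paraphrasing the output breaks the pattern, which is the same weakness raised elsewhere in the thread.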

dietr1ch•3h ago

I see the problem you face, but I don't think it's that easy. It seems you can rely on hashes being noisy and alter questions or answers a little bit to get around the LLM homework naughty list.

silisili•3m ago

Just ctrl-f for an em dash and call it a day.

cye131•2h ago

The new 2.5 Pro (05-06) definitely does not have any sort of meaningful 1 million context window, as many users have pointed out. It does not even remember to generate its reasoning block at 50k+ tokens.

Their new pro model seemed to just trade off fluid intelligence and creativity for performance on closed-end coding tasks (and hence benchmarks), which unfortunately seems to be a general pattern for LLM development now.
