
Has anyone else found Google's AI overview to be oddly error prone?

43•ckemere•9mo ago
I've been quite impressed by Google's AI overviews. This past week, though, I was interested in what I thought was a fairly simple question - to calculate compound interest.

Specifically, I was curious about how Harvard's endowment has grown from its initial £780 in 1638, so I asked Google to calculate compound interest for me. A variety of searches all yield a reasonable formula which is then calculated to be quite wrong. For example:

- {calculate the present value of $100 compounded annually for 386 years at 3% interest} yields $0.736.
- {how much would a 100 dollar investment in 1638 be worth in 2025 if invested} yields $3,903.46.
- {100 dollars compounded annually for 386 years at 3 percent} yields "The future value of the investment after 386 years is approximately $70,389."
- And my favorite: {100 dollars compounded since 1638} gives a variety of outcomes for different interest rates: "A = 100 * (1 + 0.06)^387, A ≈ 8,090,950.14; A = 100 * (1 + 0.05)^387, A ≈ 10,822,768.28; A = 100 * (1 + 0.04)^387, A ≈ 14,422,758.11".

How can we be so reasonable and yet so bad!?
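For reference, the arithmetic the overview keeps fumbling is a one-liner. A quick sanity check (a sketch of the standard formula, not anything Google runs) shows the right ballpark for the third query above:

```python
def future_value(pv: float, rate: float, years: int) -> float:
    """Future value of pv compounded annually: FV = PV * (1 + r)**n."""
    return pv * (1 + rate) ** years

# $100 at 3% for 386 years comes to roughly $9.0 million,
# nowhere near the $0.736, $3,903.46, or $70,389 the overview produced.
fv = future_value(100, 0.03, 386)
print(f"${fv:,.0f}")
```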

Comments

joegibbs•9mo ago
It's terrible. Gemini 2.5 Pro is great, but the AI overviews must be using a smaller model. I hate it when I look up something niche and it smugly tells me that I must be mistaken because there is no such thing. Also it gives annoyingly family-friendly responses to questions that it would be better off not responding to. The other day I was trying to find a Sopranos quote about two kinds of businesses being recession-proof, one of which being "certain aspects of entertainment" (i.e. prostitution) and it was telling me the certain aspects were filmmaking and music because they make people happy.
cma•9mo ago
Why wouldn't they use 2.5 flash first, and then if an identical query is made by lots of people rerun it with 2.5 pro? Sometimes it seems much more error prone than 2.5 pro or even 2.0 even on common searches.
cratermoon•9mo ago
LLMs can't do math.
3np•9mo ago
This. People need to manage their expectations.
Spivak•9mo ago
We're giving them calculators though, surely Google could provide a limited set of tools given Search already has a fairly sophisticated calculator.

I've been having my AI stuff successfully do math since early gp3 days with this method— even before "tool calling."
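A minimal sketch of that pattern, for illustration only: the `CALC(...)` marker and helper names below are invented, and real systems use structured tool-call APIs. The idea is simply that the model emits the expression instead of the answer, and the host evaluates it deterministically.

```python
import ast
import operator
import re

# Operators we allow in model-emitted arithmetic expressions.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a pure-arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def answer(model_output: str) -> str:
    """Replace CALC(...) markers in model output with computed values."""
    return re.sub(
        r"CALC\(([^)]*)\)",
        lambda m: f"{safe_eval(m.group(1)):,.2f}",
        model_output,
    )

print(answer("The future value is $CALC(100 * 1.03 ** 386)."))
```

The model still decides *which* expression to emit, so parsing mistakes remain possible, but the arithmetic itself stops being a source of error.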

ianks•9mo ago
LLMs and tempered expectations, like oil and water
ckemere•9mo ago
Why offer a solution then? Seems fairly easy for google to avoid giving the final number?
cratermoon•9mo ago
That's not how LLMs work. These systems take a stream of tokens, do some linear algebra to find a stream of tokens within the parameters of their vector similarity, and spit out the result.
jbs789•9mo ago
Expectations can only be managed by someone who has sufficient understanding - in this case Google by not providing the result.
scarface_74•9mo ago
LLMs can’t do math. But that’s a solved problem. ChatGPT has had a built in Python runtime that can do math for years - at least the paid version.
cratermoon•9mo ago
Oh boy, python but now it costs $10,000 per formula.

Why?

scarface_74•9mo ago
It’s $20/month for ChatGPT+.
zacksiri•9mo ago
I recently used Gemini and Google Search (with overview) to check whether a snack I bought from Japan had expired. I used Gemini to take a picture of the label, which was written in Japanese.

One item said 25/7/25, the other said 25/7/24. As you can imagine, I was sure the first one was safe, but the second one was confusing.

It told me it was safe to eat because the Japanese date format is Year/Month/Day.

I looked up the Japanese date format in Google (with overview) just to confirm. I guess we'll find out. Will report back soon.

elicksaur•9mo ago
I think I’d call these examples “predictable” failures instead of “odd”.
mergy•9mo ago
They are often awful for me. Examples: recommending installation of packages and software that don't exist, or settings changes that don't exist in applications, etc. They fill the page, but it's sadly noise, so it cheapens the whole experience when I would have just preferred a link to a page from a person who knows what the hell they're talking about.
whatamidoingyo•9mo ago
> recommending installation of packages and software that doesn't exist

"slopsquatting" is the term coined for this.

Essentially, bad actors are registering these packages and uploading malware. If you happen to just blindly follow the AI, there's a chance your system gets infected.

namaria•9mo ago
Why would you use an LLM for this? A simple spreadsheet can do this sort of calculation easily and deterministically.

Also, the assumption of '3% interest' is wrong. There are records of stretches achieving 15% returns for several years and reaching 23% in 2007, for example.

https://www.bloomberg.com/news/articles/2005-01-11/harvard-l...

https://www.wsj.com/articles/SB118771455093604172

This was 2 minutes of old school search, no LLM needed.

amanaplanacanal•9mo ago
Long term interest rates over hundreds of years are a lot closer to 3% than 15%. You can't extrapolate a few good years like that.
namaria•9mo ago
You can't compound 780 seventeenth-century pounds for 400 years and get a dollar amount either. The whole exercise is spurious. But that's beside the point.

What I am saying is that asking an LLM to do interest calculations is absurd in itself, let alone the absurd setting of trying to calculate interest rates across four centuries and different denominations.

It would be much more rational, in seeking to understand the growth of the Harvard endowment, to search for factual information about its modern history; that is my point. And if you want to do abstract financial-modelling exercises, just use spreadsheets. Either way, LLMs are a hilariously bad fit.

780 compounded by 3% per year for ~400 years is about 100 million by the way. So ignoring all else, off by at least two orders of magnitude.

ckemere•9mo ago
Of course! I was originally imagining google would give me a website with an embedded calculator. I was most surprised by how everything was beautifully accurate up until the end when the number felt suspicious. (Much less suspicious than the examples I posted, actually.)
namaria•9mo ago
You can just search for "interest calculator"; there are many such sites.

Also, I don't quite get it:

> everything was beautifully accurate up until the end when the number felt suspicious

The LLM generated text about compounding interest over 400 years from early modern british pounds to modern dollars was accurate? How is it possible to be accurate about an absurd operation?

snypher•9mo ago
I don't think they intended to use AI; they tried to search and were presented with Google's AI summary, i.e. time wasted before the actual search results.
AznHisoka•9mo ago
I clicked on the first tab in the search results and had no idea it now redirects to "AI mode". So yep, that's possible.
ZeroGravitas•9mo ago
I saw a report via Simon Willison that if you make up a phrase and add "meaning" to the end of your Google search, it'll invent a meaning for it.

His example was "A swan won't prevent a hurricane meaning"

https://simonwillison.net/2025/Apr/23/meaning-slop/

nitwit005•9mo ago
It seems expectedly error prone.

Aside from the general limitations of this technology, Google needs this to be quite cheap if it runs for every request.

There is not a lot of revenue for a single search, and right now the AI results are actually pushing the links people are paying Google to display further down the page.

drpixie•9mo ago
We're all sadly gullible.

We're all in IT. We know what an LLM is. But still we're fooled!?

bhouston•9mo ago
I think it is because it is using a "mini" model with the search results as a RAG source so they can afford to use it on every single query. Thus it doesn't know very much and doesn't have much context to work with.
throwaway290•9mo ago
LLMs are all bad at math. But there are worse ways Google fails.

For example, people asked "does Lululemon use <name of some chinese company> to make its products" and Google said "yes", with no source except one TikTok video that falsely claims this to boost sales in the face of tariffs (ignoring that the company isn't in the actual supplier list published by Lululemon on its site).

Which means basically people would see that tiktok, go to fact check on google if it's true, and google overview will say "yes" (+ paragraphs of text that no one reads) citing that tiktok.

A vicious circle of LLM fact-checking. Google used to be immune to it until it started shoving chatbot output into people's faces.

minecraft001•9mo ago
I recently searched for a person and it concatenated the lives of several different people with the same name together like “X is a senator… He is also a professional baseball player…”
Hojojo•9mo ago
There was this Reddit post yesterday where it completely makes up being able to plant flowers in Elden Ring https://www.reddit.com/r/Eldenring/comments/1k6hupy/thanks_g...

The AI overview is worse than useless. It either hallucinates things or it treats shitposts as equally valid information-wise as anything else.

rsynnott•9mo ago
Our robot overlords are _terrible_ at anything financial. I'm on a financial forum where, lately, people are _constantly_ posting stuff that they got from The Oracle and asking what they're doing wrong because they don't understand the result, and the answer is inevitably that ChatGPT or whatever fed them plausible-looking rubbish (this is the _real_ AI safety problem; laypeople tend to take the output as infallible, even though it's usually rubbish.)

Though also as a sidenote Harvard's endowment probably wasn't put in a bank account with a flat 3% interest rate for a few hundred years...

mindslight•9mo ago
It's not "AI" - its an LLM that is doing the equivalent of a college freshman padding out a paper for length. Confident, verbose, polished, but ultimately based on little hard reasoning - aka bullshitting. When it's wrong, and if you notice it and call it out, it will happily apologize and "correct" itself with the same well-written prose while making another mistake (or even the same exact one). LLMs certainly have utility, but it's more as generating inputs to some verifying processes rather than as a standalone oracle that competently answers questions.
potbelly83•9mo ago
It's not the AI doing math wrong as a lot of people are commenting. It's the way it's parsing your sentence. When it reads 'the present value of $100' it thinks that today's value of the investment is $100, and it needs to determine what the investment was worth 386 years ago (assuming a 3% interest rate).
malfist•9mo ago
Almost like there's a reason we use highly specific languages when telling a cpu what to do
dankwizard•9mo ago
No, you're wrong too. If it were doing that, the answer would not have been $0.736, since $0.736 compounded at 3% over 386 years grows to far more than $100.
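The present-value reading can be checked directly (a two-line sketch, using the 3% rate assumed in the original query):

```python
# Present value of $100 discounted at 3% over 386 years: PV = FV / (1 + r)**n
pv = 100 / 1.03 ** 386
print(f"${pv:.4f}")  # roughly $0.0011, so $0.736 matches neither reading
```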
NikkiA•9mo ago
Just invent a 'common' saying and add 'explanation' at the end.
alissa_v•9mo ago
Totally resonate with this. It feels less like a helpful overview and more like a confidently wrong pop-up I have to dismiss or fact-check before I can get to the actual search results (the links below).

I saw one example where someone asked about the fastest way to boil water, and the AI overview confidently stated that adding salt lowers the boiling point significantly, making it boil faster. It sounds vaguely scientific but gets a fundamental concept completely backward! That's the kind of error that's more worrying than just bad math – it confidently misrepresents basic, easily verifiable science.

It's a strange feeling having to approach Google search results with a layer of skepticism now, which used to be the gold standard for getting quickly pointed to reliable info. The AI Overview feels like a glossy, sometimes misleading, advertisement for the links I actually wanted in the first place.
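The salt claim is indeed backward: dissolved salt *raises* the boiling point (boiling-point elevation, ΔT = i·Kb·m), and only slightly. A back-of-envelope check, where the quantities (about one tablespoon, ~17 g of NaCl, in 1 kg of water) are assumptions for the sake of the estimate:

```python
# Boiling-point elevation: dT = i * Kb * m
Kb = 0.512          # ebullioscopic constant of water, degC * kg / mol
i = 2               # van 't Hoff factor for fully dissociated NaCl
grams_salt = 17.0   # assumed: roughly one tablespoon
molar_mass = 58.44  # g/mol, NaCl
kg_water = 1.0

molality = (grams_salt / molar_mass) / kg_water   # mol solute per kg solvent
delta_t = i * Kb * molality
print(f"boiling point rises by about {delta_t:.2f} degC")  # ~0.30 degC
```

So a typical pinch or spoonful of salt changes the boiling point by a fraction of a degree, in the opposite direction from what the overview claimed.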

CM30•9mo ago
It's hilariously bad. For one thing, it seems to try to find an 'explanation' for anything you type in, no matter how ridiculous it might be or whether there's even any info on it online at all. It's become a common meme online to come up with random fake idioms, throw them into Google, and see the nonsense it comes up with in its desperate attempt to make sense of the non-phrase.

And even if you do ask a legitimate question, you have to then hope the system knows what you actually mean rather than taking every word in your question literally and returning a complete non answer. So you might ask "was [actor name] in Chicago (the movie)?", only for Google to say "no, [actor name] doesn't live in Chicago".

Add in the dangerous misinformation, the occasionally extremist answers, and its attempts to make up sources when it can't find any, and, well, it's basically useless for just about everything.

babyent•9mo ago
Call me old fashioned but I’ve been searching and using Google list results instead of the AI Overview.

Btw Google, you’re welcome I’m clicking links and making you money. For my needs, you’re a great search engine.

xp84•9mo ago
As far as I can tell, these aren't made by asking a competent model to answer the "question". Based on what they seem to do, it looks like they take a model (a "mini" type of model?), pipe in the contents of the first 5 or 10 results from the slopfest that is Google's current search results, and tell it to summarize THAT.

This is why it tells you to eat rocks. It is a very narrow sample of webpages and suffers from not even contextualizing each page it is reading to wonder if it’s a troll, satire, propaganda, fiction, or fact.

I have taken to ignoring them completely. I’d rather ask ChatGPT directly than trust these - and often I do just that. It’s much more accurate.

What’s frustrating is that the real estate these occupy was until a few years ago where they’d put text extracts quoted directly from a short-ish list of reputable sites. Same purpose, different content. While it was arguably a bit abusive of the sites to extract and display their contents there, the information used to be pretty reliable as a result.