Only a matter of time before someone does it though.
Unlike code, where everything is on display, these formulas are hidden in each cell. You won't see the problem unless you click on the cell, so you'll have a hard time finding the cause.
Little stuff like splitting text more intelligently or following the formatting seen elsewhere would be very satisfying.
AFAIK there is no 'git for Excel' to diff and undo changes, especially not a built-in one (i.e. 'free' both cost-wise and security-wise, where add-ons and macros are not allowed).
My limited experience has been that it is difficult to keep LLMs from changing random things besides what they're asked to change, which could cause big problems if those changes are untrackable in Excel.
Same deal there -- the original author was a genius and was the only person who knew how it was set up or how it worked.
What I’m saying is that if you really believed we were 2, maybe 3 years tops from AGI or the singularity or whatever, you would spend 0 effort on a domain that already seems to be served by 3rd parties who are already using your models! An Excel wrapper for an LLM isn’t exactly cutting-edge AI research.
They’re desperate to find something that someone will pay a meaningful amount of money for that even remotely justifies their valuation and continued investment.
Being able to select a few rows and then use plain language to describe what I want done is a time saver, even though I could probably muddle through the formulas if I needed to.
It is an entire agent loop. You can ask it to build a multi-sheet analysis of your favorite stock and it will. We are seeing a lot of early adopters use it for financial modeling, research automation, and internal reporting tasks that used to take hours.
- stop using the free plan
- don't use gemini flash for these tasks
- learn how to do things over time and know that all ai models have improved significantly every few months
I would’ve expected “make a vlookup or pivot table that tells me x” or “make this data look good for a slide deck” to be easier problems to solve.
For easy spreadsheet stuff (which is what 80% of average white-collar workers are doing when using Excel) I’d imagine the same approach. Try to do what I want, and even if you’re half wrong, the good 50% is still worth it and a better starting point.
Vibe coding an app is like vibe coding a “model in excel”. Sure you could try, but most people just need to vibe code a pivot table
Thousands of unreported COVID cases: https://news.ycombinator.com/item?id=24689247
Thousands of errors in genetics research papers: https://news.ycombinator.com/item?id=41540950
Wrong winner announced in national election: https://news.ycombinator.com/item?id=36197280
Countries across the world implement counter-productive economic austerity programs: https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt#Metho...
I don’t think the frontier labs have the bandwidth or domain knowledge (or dare I say skills) to do tier 5 tasks well. Even their chat UIs leave a lot to be desired and that should be their core competency.
However I would think more of elite data centers rather than commodity data centers. That's because I see Tier 4 being deeply involved in their data centers and thinking of buying the chips to feed their data centers. I wouldn't be so inclined to throw in my opinion immediately if I found an article showing this ordering of the tiers, but being a tweet of a podcast it might have just been a rough draft.
Spend a few years in an insurance company, a manufacturing plant, or a hospital, and then the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have well-functioning memory.)
This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge; if you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.
My wife works in insurance operations - everyone she manages from the top down lives in Excel. For line employees a large percentage of their job is something like "Look at this internal system, export the data to excel, combine it with some other internal system, do some basic interpretation, verify it, make a recommendation". Computer Use + Excel Use isn't there yet...but these jobs are going to be the first on the chopping block as these integrations mature. No offense to these people but Sonnet 4.5 is already at the level where it would be able to replicate or beat the level of analysis they typically provide.
It's one thing to fudge the language in a report summary, which can be subjective; numbers, however, are not subjective. It's widely known that LLMs are terrible at even basic maths.
Even Google's own AI summary admits it, which surprised me; marketing won't be happy.
Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.
- Log in to the internal system that handles customer policies
- Find all policies that were bound in the last 30 days
- Log in to the internal system that manages customer payments
- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.
- Flag any divergences above X% for accounting/finance to follow up on.
Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
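For anyone who wants to see how mechanical that check is, here is a rough Python sketch of the steps above. Everything in it is made up for illustration: the field names (`policy_id`, `premium`, `bound_on`, `amount`), the 5% threshold standing in for X, and the sample rows. It assumes the two exports have already been loaded into lists of dicts and that premiums are positive.

```python
from datetime import date, timedelta


def flag_divergences(policies, payments, today, threshold_pct=5.0, window_days=30):
    """Return (policy_id, premium, paid, reason) for policies needing follow-up."""
    cutoff = today - timedelta(days=window_days)

    # Total payments per policy (a policy may have several payments).
    paid_by_policy = {}
    for pay in payments:
        pid = pay["policy_id"]
        paid_by_policy[pid] = paid_by_policy.get(pid, 0.0) + pay["amount"]

    flagged = []
    for pol in policies:
        # Only look at policies bound inside the window.
        if not (cutoff <= pol["bound_on"] <= today):
            continue
        paid = paid_by_policy.get(pol["policy_id"])
        if paid is None:
            flagged.append((pol["policy_id"], pol["premium"], None, "no payment"))
            continue
        diff_pct = abs(paid - pol["premium"]) / pol["premium"] * 100
        if diff_pct > threshold_pct:
            flagged.append(
                (pol["policy_id"], pol["premium"], paid, f"off by {diff_pct:.1f}%")
            )
    return flagged


# Hypothetical sample data standing in for the two system exports.
policies = [
    {"policy_id": "P1", "premium": 1000.0, "bound_on": date(2025, 1, 20)},
    {"policy_id": "P2", "premium": 500.0, "bound_on": date(2025, 1, 25)},
    {"policy_id": "P3", "premium": 800.0, "bound_on": date(2025, 1, 10)},
    {"policy_id": "P4", "premium": 1200.0, "bound_on": date(2024, 11, 1)},
]
payments = [
    {"policy_id": "P1", "amount": 1000.0},
    {"policy_id": "P2", "amount": 440.0},
    {"policy_id": "P4", "amount": 100.0},
]

flagged = flag_divergences(policies, payments, today=date(2025, 2, 1))
for row in flagged:
    print(row)
```

With this sample data, P2 is flagged for the payment mismatch and P3 for having no payment at all; P4 is ignored because it was bound outside the 30-day window. The hard part in real life is the exporting and the messy data, not this loop.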
Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.
The one thing LLMs should consistently do is ensure that formatting is correct, which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulas. Not a week ago GPT 5 got confused about whether a plus or a minus was necessary in a basic question: "I'm 323 days old, when is my birthday?"
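For reference, the plus/minus question is a two-liner in ordinary date arithmetic. This is only a sketch: the "today" date is an arbitrary example, and treating a Feb 29 birthday as Mar 1 in non-leap years is my own assumption.

```python
from datetime import date, timedelta


def birth_date(today, age_in_days):
    # Being N days old means you were born N days *before* today: subtract.
    return today - timedelta(days=age_in_days)


def _anniversary(born, year):
    try:
        return born.replace(year=year)
    except ValueError:
        # Born on Feb 29 and `year` is not a leap year.
        # Assumption for this sketch: treat Mar 1 as the birthday.
        return date(year, 3, 1)


def next_birthday(today, age_in_days):
    born = birth_date(today, age_in_days)
    candidate = _anniversary(born, today.year)
    if candidate <= today:
        candidate = _anniversary(born, today.year + 1)
    return candidate


today = date(2025, 10, 1)  # arbitrary example date
print(birth_date(today, 323))     # 2024-11-12
print(next_birthday(today, 323))  # 2025-11-12
```

So it's a minus to get the birth date, and the next birthday falls 42 days after the example date.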
My concern would be more with how to check the work (i.e., make sure that the formulas are correct and no columns are missed) because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.
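To make the "diff of a spreadsheet" idea concrete, here is a minimal sketch of what such a check could look like, assuming you have somehow dumped each version of a sheet into a dict of cell address to raw content (formula text or literal). The dict representation, the `diff_cells` name, and the sample cells are all hypothetical; getting the formulas out of the workbook is the part left out here.

```python
import re


def _cell_key(addr):
    # Sort "B2" before "B10" and "Z1" before "AA1": by row, then column.
    col, row = re.fullmatch(r"([A-Z]+)(\d+)", addr).groups()
    return (int(row), len(col), col)


def diff_cells(before, after):
    """Compare two {address: content} snapshots of the same sheet."""
    changes = []
    for addr in sorted(set(before) | set(after), key=_cell_key):
        old, new = before.get(addr), after.get(addr)
        if old == new:
            continue
        if old is None:
            kind = "added"
        elif new is None:
            kind = "removed"
        else:
            kind = "changed"
        changes.append((addr, kind, old, new))
    return changes


# Hypothetical snapshots taken before and after an edit.
before = {"A1": "100", "B1": "=A1*1.2", "C1": "=SUM(A1:B1)"}
after = {"A1": "100", "B1": "=A1*1.25", "D1": "=C1/2"}

changes = diff_cells(before, after)
for change in changes:
    print(change)
```

This only catches cell-level content changes. It says nothing about formatting, named ranges, or whether a change in one cell silently broke a dependent cell, which is where the real pain is.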
The more complicated the spreadsheet and the more dependencies it has, the greater the room for error. These are probabilistic machines. You can use them, I use them all the time for different things, but you need to treat them like employees you can't even trust to copy a bank account number correctly.
I hate smartsheet…
Excel or R. (Or more often, regex followed by pen and paper followed by more regex.)
For cases where that is not available, we should use a human and never an LLM.
not just in a spreadsheet, any kind of deterministic work at all.
find me a reliable way around this. i don't think there is one. mcp/functions are a band aid and not consistent enough when precision is important.
after almost three years of using LLMs, i have not found a single case where i didn't have to review its output, which takes as long or longer than doing it by hand.
ML/AI is not my domain, so my knowledge is not deep nor technical. this is just my experience. do we need a new architecture to solve these problems?
Now, granted, that can also happen because Alex fat-fingered something in a cell, but that's something that's much easier to track down and reverse.
Privatized insurance will always find a way to pay out less if it can get away with it. That is just the nature of having the trifecta of profit motive, socialized risk and light regulation.
It's the nature of everything. They agree to pay you for something. It's nothing specific to "profit motive" in the sense you mean it.
Source?
Some people - normal people - understand the difference between the holistic experience of a mathematically informed opinion and an actual model.
It's just that normal people always wanted the holistic experience of an answer. Hardly anyone wants a right answer. They have an answer in their heads, and they want a defensible journey to that answer. That is the purpose of Excel in 95% of places it is used.
Lately people have been calling this "sycophancy." This was always the problem. Sycophancy is the product.
Claude Excel is leaning deeply into this garbage.
The issue isn’t in creating a new monstrosity in excel.
The issue is the poor SoB who has to spelunk through the damn thing to figure out what it does.
Excel is the sweet spot of just enough to be useful, capable enough to be extensible, yet gated enough to ensure everyone doesn’t auto run foreign macros (or whatever horror is more appropriate).
In the simplest terms - it’s not Excel, it’s the business logic. If an Excel file works, it’s because there's someone who “gets” it in the firm.
Neat formatting didn't save any model from having the wrong formula pasted in.
Being neat was never a substitute for being well rested, or sufficiently caffeinated.
Have you seen how AI functions in the hands of someone who isn't a domain expert? I've used it for things I had no idea about, like Astro+ web dev. User ignorance was magnified spectacularly.
This is going to have Jr Analysts dumping well formatted junk in email boxes within a month.
Who are these teams that can get value from Anthropic? One MCP and my context window is used up and Claude tells me to start a new chat.
I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
It's much less about 'greenfield' new excel sheets and much more about fixing/improving existing ones. If it works as well as Claude Code works for code, then it will get pretty crazy adoption I suspect (unless Microsoft beats them to it).
Until Microsoft does its anti-competitive thing and finds a way to break this in the file format, because this is exactly what Copilot in Excel does.
That said, Copilot in Excel is pretty much hot garbage still so anything will be better than that.
So they can fire the two dudes that take care of it, lose 15 years of in-house knowledge to save 200k a year, and cry in a few months when their magic tool shits the bed?
Massive win indeed
Not sure if it's binary like that, but as startups we will probably collect the leftover scraps instead.
lies, damn lies, statistics, and then Excel deciding cell data types.
It's obviously not the same experience for everyone. (If you are one of those energized while working in a chat window, you might be in a minority, given what we see from the ongoing massacre of brains in education.)
Paraphrasing something I read here, "people don't use ChatGPT to learn more, they use it to study less".
Maybe some folk would be better off.
This is what I want AI to do, not generate wrong answers and hallucinate girlfriends.
Disclosure: My company builds ingestion pipelines for large multi-tab Excel files, PDFs, and CSVs.
https://www.anthropic.com/news/advancing-claude-for-financia...
- Get answers about any cell in seconds: Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
- Test scenarios without breaking formulas: Update assumptions across your entire model while preserving all dependencies. Test different scenarios quickly—Claude highlights every change with explanations for full transparency.
- Debug and fix errors: Trace #REF!, #VALUE!, and circular reference errors to their source in seconds. Claude explains what went wrong and how to fix it without disrupting the rest of your model.
- Build models or fill existing templates: Create draft financial models from scratch based on your requirements. Or populate existing templates with fresh data while maintaining all formulas and structure.
cube00•2h ago
LLMs are not deterministic.
I'd argue over the short term humans are more deterministic. I ask a human the same question multiple times and I get the same answer. I ask an LLM and each answer could be very different depending on its "temperature".
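A toy illustration of what "temperature" does, for anyone who hasn't seen it. This is a sketch of the usual softmax-with-temperature sampling idea using made-up scores, not any vendor's actual decoder; treating temperature 0 as "always pick the top score" is a common convention that I'm assuming here.

```python
import math
import random


def sample(logits, temperature, rng):
    """Pick an index from `logits` using softmax sampling at `temperature`."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy (always the top score).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate answers
rng = random.Random(0)

greedy = {sample(logits, 0, rng) for _ in range(200)}
warm = {sample(logits, 1.5, rng) for _ in range(200)}
print(greedy)  # always the same choice
print(warm)    # several different choices across repeated asks
```

Same question, same scores, different answers once the temperature is above zero.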
worldsayshi•2h ago
But I agree with the sentiment. It seems it is more important than ever to agree on what it means to understand something.
dang•1h ago
https://news.ycombinator.com/newsguidelines.html