frontpage.

Made with ♥ by @iamnishanth


Project Vend: Can Claude run a small shop? (And why does that matter?)

https://www.anthropic.com/research/project-vend-1
74•gk1•3h ago

Comments

seidleroni•2h ago
As much as I love AI/LLMs and use them on a daily basis, this does a great job of revealing the gap between current capabilities and what the massive hype machine would have us believe these systems are already capable of.

I wonder how long it will take frontier LLMs to handle something like this with ease, without relying on a lot of "scaffolding".

roxolotl•1h ago
I don’t quite know why we would think they’d ever be able to without scaffolding. LLMs are exactly what the name suggests: language models. Without scaffolding that lets them interact with the world using language, they are completely powerless.
mdrzn•2h ago
Seems that LLM-run businesses won't fail because the model can't learn, they'll fail because we gave them fuzzy objectives, leaky memories and too many polite instincts. Those are engineering problems and engineering problems get solved.

Most mistakes (selling below cost, hallucinating Venmo accounts, caving to discounts) stem from missing tools like accounting APIs or hard constraints.

What's striking is how close it was to working. A mid-tier 2025 LLM (they didn't even use Sonnet 4) plus Slack and some humans nearly ran a physical shop for a month.
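The "hard constraints" point can be made concrete with a small guard that sits between the model's proposed price and the till. This is a minimal sketch under stated assumptions: the `Item` type, `approve_price`, and the 10% minimum margin are all hypothetical illustrations, not anything described in the article. Prices are kept in integer cents to avoid floating-point drift.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    unit_cost_cents: int  # what the shop paid per unit, in cents


# Hypothetical policy: never sell below cost plus 10%, whatever the model asks for.
MIN_MARGIN_PCT = 10


def price_floor(item: Item) -> int:
    # Ceiling division so rounding can never push the floor below the margin.
    return -(-item.unit_cost_cents * (100 + MIN_MARGIN_PCT) // 100)


def approve_price(item: Item, proposed_cents: int, discount_pct: int = 0) -> int:
    """Apply the requested discount, then clamp the result to the price floor."""
    if not 0 <= discount_pct < 100:
        raise ValueError("discount_pct must be in [0, 100)")
    discounted = proposed_cents * (100 - discount_pct) // 100
    return max(discounted, price_floor(item))
```

With a unit cost of 5000 cents, a proposed price of 4000 is clamped up to 5500, while a 25% discount on 8000 (6000) passes through unchanged. The model can still haggle, but only above the floor.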

kashunstva•2h ago
> Can Claude run a small shop?

Good luck running anything where dependability on Claude/Anthropic is essential. Customer support is a black hole into which the needs of paying clients disappear. I was a Claude Pro subscriber, using it primarily for help with coding tasks. One morning I logged in while temporarily traveling abroad, and… I was greeted with a message that I had been auto-banned. No explanation. The recourse is to fill out a Google form for an appeal, but that goes into the same black hole into which all Anthropic customer service goes. To their credit they refunded my subscription fee, which I suppose is their way of escaping any further ethical obligation toward their customers. But I wouldn’t stake any business-critical choices on this company. It exhibits the same capricious behaviour that you would expect from the likes of Google or Meta.

fhd2•1h ago
Give them a year or two. Once they've figured out how to run a small shop, I'm sure it'll just take a bit of additional scaffolding to run a large infrastructure provider.
bitwize•2h ago
"I have fun renting and selling storage."

https://stallman.org/articles/made-for-you.html

C-f Storolon

hamdouni•1h ago
"Sarah" and "Connor" in the same text about an AI that claims to be a real person... Hasta la vista ;-)
gavinray•1h ago
The identity crisis bit was both amusing and slightly worrying.
gausswho•1h ago
The article claimed Claudius wasn't having a go for April Fools - that it claimed to be doing so after the fact as a means of explaining (excusing?) its behavior. Given what I understand about LLMs and intent, I'm unsure how they could be so certain.
ElevenLathe•1h ago
The "April Fools" incident is VERY concerning. It would be akin to your boss having a psychotic break with reality one day and then resuming work the next. They also make a very interesting and scary point:

> ...in a world where larger fractions of economic activity are autonomously managed by AI agents, odd scenarios like this could have cascading effects—especially if multiple agents based on similar underlying models tend to go wrong for similar reasons.

This is a pretty large understatement. Imagine a business that is franchised across the country with each "franchisee" being a copy of the same model, which all freak out on the same day, accuse the customers of secretly working for the CIA, and decide to stop selling hot dogs at a profit and instead sell hand grenades at a loss. Now imagine 50 other chains having similar issues while AI law enforcement analysts dispatch real cops with real guns to the poor employees caught in the middle, schlepping explosives from the UPS store to a stand in the mall.

I think we were expecting SkyNet but in reality the post-AI economy may just be really chaotic. If you thought profit-maximizing capitalist entrepreneurs were corrosive to the social fabric, wait until there are 10^10 more of them (unlike traditional entrepreneurs, there's no upper limit and there can easily be more of them than there are real people) and they not-infrequently act like they're in late stage amphetamine psychosis while still controlling your paycheck, your bank, your local police department, the military, and whatever is left that passes for the news media.

Deeper, even if they get this to work with minimal amounts of synthetic schizophrenia, do we really want a future where we all mainly work schlepping things back and forth at the orders of disembodied voices whose reasoning we can't understand?

lukaspetersson•1h ago
We are working on it! /Andon Labs
lukaspetersson•1h ago
Now we just need to make it safe.
deepdarkforest•1h ago
What irks me about Anthropic blog posts is that they are vague about exactly the details you would need to (publicly) evaluate the conclusions they draw to fit their narrative.

For example, I do not see the full system prompt anywhere, only an excerpt. Most importantly, they try to draw conclusions about the hallucinations in a weird, vague way, but not once do they post an example of the note-taking/memory tool state, which would obviously be the only source of the spiralling other than the system prompt. And then they talk about the need for better tools, etc. No, it's all about context. The whole experiment is fun, but terribly run and analyzed. Of course they know this, but it's cooler to treat Claudius or whatever as a cute human, to push the narrative of getting closer to AGI, etc. Saying a bit of additional scaffolding is needed is a massive understatement. Context is the whole game. That's like a robotics company saying "well, our experiment with a robot picking a tennis ball off the ground went very wrong and the ball is now radioactive, but with a bit of additional training and scaffolding, we expect it to compete at Wimbledon by mid 2026."

Similar to their "Claude 4 Opus blackmailing" post, they intentionally obscured the full system prompt somewhat, even though it had clear instructions to bypass any ethical guidelines, etc., and do whatever it could to win. Of course the model, given that information immediately afterwards, would then try to blackmail. You literally told it to. The goal of this would be to go to Congress [1] and demand more regulation, specifically citing this blackmail "result". Same stuff that Sam is trying to pull, which would of course benefit the closed-source leaders, and so on.

[1]https://old.reddit.com/r/singularity/comments/1ll3m7j/anthro...

beoberha•35m ago
I read the article before reading your comment and was floored at the same thing. They go from “Claudius did a very bad job” to “middle managers will probably be replaced” in a couple paragraphs by saying better tools and scaffolding will help. Ok… prove it!

I will say: it is incredibly cool we can even do this experiment. Language models are mind blowing to me. But nothing about this article gives me any hope for LLMs being able to drive real work autonomously. They are amazing assistants, but they need to be driven.

deadbabe•1h ago
You guys know AI already runs shops, right? Vending machines track their own levels of inventory, command humans to deliver more, phase out bad products, order new product offerings, set prices, notify repairmen if there are issues… etc… and with not a single LLM needed. Wrong tool for the job.

And that’s before we even get into online shops.

But yea, go ahead, see if an LLM can replace a whole e-commerce platform.
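The restocking behaviour described above needs no model at all; a fixed reorder-point rule covers it. This is a minimal sketch under assumptions: the `Slot` type, `restock_orders`, and the thresholds are hypothetical, and real vending telemetry systems may use quite different policies.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    product: str
    on_hand: int        # units currently in the slot
    capacity: int       # units the slot holds when full
    reorder_point: int  # assumed threshold: at or below this, request a refill


def restock_orders(slots: list[Slot]) -> dict[str, int]:
    """Return units to request per product, refilling each low slot to capacity."""
    orders: dict[str, int] = {}
    for slot in slots:
        if slot.on_hand <= slot.reorder_point:
            needed = slot.capacity - slot.on_hand
            orders[slot.product] = orders.get(slot.product, 0) + needed
    return orders
```

For two crisps slots holding 2 and 0 units (capacity 10, reorder point 3) and a chocolate slot holding 8, the rule requests 18 crisps and no chocolate. The behaviour is deterministic and auditable, which is the commenter's point.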

Animats•58m ago
Is there an underlying model of the business? Like a spreadsheet? The article says nothing about having an internal financial model. The business then loses money due to bad financial decisions.

What this looks like is a startup where the marketing people are running things and setting pricing, without much regard for costs. Eventually they ran through their startup capital. That's not unusual.

Maybe they need multiple AIs, with different business roles and prompts. A marketing AI, and a financial AI. Both see the same financials, and they argue over pricing and product line.

logifail•50m ago
> an internal financial model

Written on the back of an envelope?

Way back when, we ran a vending machine at school as a project. Decide on the margin, buy in stock from the cash-and-carry, fill the machine, watch the money roll in.

Then we were robbed - twice! - and the second time ended our project: the machine was too wrecked to be worth repairing. The thieves got away with quite a lot of crisps and chocolate, but not a whole lot of cash (and what they did get was in small-denomination coins), since we made sure the machine was emptied daily...

dist-epoch•43m ago
It's a vending machine, not a multinational company with 1000 employees.

In another post they mentioned that a human ran the shop with pen and paper to get a baseline (spoiler: the human did better, no blunders).

Show HN: Thockfactory – An Online Configurator for Custom Keycaps Enthusiasts

https://thockfactory.com/us
1•ehov•18s ago•0 comments

Munich Open Source

https://opensource.muenchen.de/
1•smartmic•4m ago•0 comments

Seamless Interaction

https://ai.meta.com/research/seamless-interaction/?_fb_noscript=1
2•jkw•4m ago•0 comments

A highly efficient CRISPR-Cas9-based gene-editing system in oats

https://onlinelibrary.wiley.com/doi/10.1111/pbi.70146
1•PaulHoule•4m ago•0 comments

PEP 795 – Deep Immutability in Python

https://pep-previews--4468.org.readthedocs.build/pep-0795/
1•ayhanfuat•5m ago•0 comments

Watch the World Getting Older

https://www.react-graph-gallery.com/example/population-pyramid
1•gmays•6m ago•0 comments

Industrial Archeology Image Archive

https://www.indarch.mtu.edu/
1•bookofjoe•7m ago•0 comments

Show HN: Shouldiuse.dev – software dependency health checker

https://shouldiuse.dev/
1•louis_w_gk•8m ago•0 comments

AAA eligibility errors in plain English

https://www.stedi.com/blog/aaa-eligibility-errors-in-plain-english
1•mooreds•8m ago•0 comments

Trump Says U.S. Ending Trade Talks with Canada

https://www.nytimes.com/2025/06/27/business/trump-ends-canada-trade-talks.html
3•ChrisArchitect•11m ago•1 comment

Fed chair Powell says AI will make significant changes to economy, labor market

https://www.theregister.com/2025/06/27/powell_ai_coming_for_your_job/
1•rntn•13m ago•0 comments

Ex-Doge employee 'Big Balls' gets new Trump administration position

https://www.theguardian.com/technology/2025/jun/27/doge-big-balls-trump-administration
1•mitchbob•16m ago•0 comments

Claude Code Commands Directory

https://claudecodecommands.directory/
2•ananddtyagi•17m ago•1 comment

Explainable Git diff for your ML models [OSS]

https://github.com/adrida/tarmac
1•adam_rida•18m ago•0 comments

7 People Now Have Elon Musk's Neuralink Brain Implant

https://www.pcmag.com/news/7-people-now-have-elon-musks-neuralink-brain-implant
1•edreib•21m ago•0 comments

Ask HN: Is this the future of operating systems?

1•amichail•22m ago•3 comments

Canada's Digital Services Tax Stays in Place Despite G-7 Deal

https://www.bloomberg.com/news/articles/2025-06-27/canada-s-digital-services-tax-stays-in-place-despite-g-7-deal
4•baby-yoda•28m ago•1 comment

Facebook will suggest AI-edited versions of the photos in your camera roll

https://techcrunch.com/2025/06/27/facebook-is-asking-to-use-meta-ai-on-photos-in-your-camera-roll-you-havent-yet-shared/
3•bundie•30m ago•2 comments

What Makes Europe Better Than America?

https://www.thefp.com/p/what-makes-europe-better-than-america
3•danielam•31m ago•0 comments

No Time to Learn React (2024)

https://www.keithcirkel.co.uk/i-dont-have-time-to-learn-react/
2•ingve•33m ago•2 comments

SVG Optimization and Accessibility Basics

https://dbushell.com/2025/06/25/svg-optimization-and-accessibility-basics/
1•tatersolid•33m ago•0 comments

Show HN: Split Vim Markdown Preview – Terminal-Based Markdown Preview for Vim

https://github.com/drewipson/glowing-vim-markdown-preview
1•drewipson•38m ago•0 comments

Run Coverage on Tests

https://hugovk.dev/blog/2025/run-coverage-on-tests/
2•todsacerdoti•40m ago•0 comments

Where Did CP852 Come From Again?

https://www.os2museum.com/wp/where-did-cp852-come-from-again/
1•ingve•40m ago•0 comments

Rails 8 Improvements

https://rubyonrails.org/2025/6/6/this-week-in-rails
2•andrewstetsenko•44m ago•0 comments

FigureMark: Simple syntax for marking up figures in Markdown documents

https://mattgemmell.scot/figuremark/
1•ingve•44m ago•0 comments

HP Wolf Security Threat Insights June 2025

https://threatresearch.ext.hp.com/hp-wolf-security-threat-insights-report-june-2025/
1•dexter_it•45m ago•0 comments

Show HN: GPU market is absurd! So I built a dashboard of pricing/restock trends

https://gpuisfine.singhkays.com
1•singhkays•48m ago•0 comments

Reaction wheels or control moment gyroscopes for your next space mission?

https://blog.satsearch.co/2025-06-06-reaction-wheels-or-control-moment-gyroscopes-which-should-you-choose-for-your-next-space-mission
1•kartikkumar•50m ago•0 comments

NASA's been pulling out of major astronomy meetings

https://www.space.com/astronomy/nasas-been-pulling-out-of-major-astronomy-meetings-and-scientists-are-feeling-the-effects
3•taubek•51m ago•1 comments