Project Vend: Phase Two

https://www.anthropic.com/research/project-vend-2
46•kubami•5d ago

Comments

0dmethz•2h ago
Roleplaying with LLMs sure is fun! Not sure I'd want to run my business on it though.
drekipus•2h ago
We will pour billions into this until you are begging for us to run your business!
ramon156•1h ago
I'd gladly roleplay with an LLM compared to talking to my current boss. I don't know which is less intelligent.
theturtletalks•1h ago
VendBench is really interesting, but vending machines are pretty specialized. Most businesses people actually run look more like online stores, restaurants, hotels, barbershops, or grocery shops.

We're working on an open-source SaaS stack for those common types of businesses. So far we've built a full Shopify alternative and connected it to print-on-demand suppliers for t-shirt brands.

We're trying to figure out how to create a benchmark that tests how well an agent can actually run a t-shirt brand like this. Since our software handles fulfillment, the agent would focus on marketing and driving sales.

Feels like the next evolution of VendBench is to manage actual businesses.

iLoveOncall•1h ago
I'll be a cynic, but I think it's much more likely that the improvements are thanks to Anthropic having a vested interest in the experiment being successful and making sure the employees behave better when interacting with the vending machine.
theturtletalks•59m ago
In the video I watched, the CEO was openly taking criticism from the interviewer over the experiment.

The main reason it failed was because it was being coerced by journalists at WSJ[0] to give everything away for free. At one point, they even convinced it to embrace communism! In another instance, Claudius was being charged $1 for something and couldn’t figure it out. It emailed the FBI about fraud but Anthropic was intercepting the emails it sent[1].

Overall, it’s a great read and watch if you’re interested in Agents and I wonder if they used the Agents SDK under the hood.

0. https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-mach...

1. https://www.cbsnews.com/news/why-anthropic-ai-claude-tried-t...

bigyabai•44m ago
> Overall, it’s a great read

It's basically an advertisement. We've been playing these "don't give the user the password" games since GPT-2 and we always reach the same conclusion. I'm bored to tears waiting for an iteration of this experiment that doesn't end with pesky humans solving the maze and getting the $0.00 cheese. You can't convince me that the Anthropic engineers thought Claude would be a successful vending machine. It's a potemkin village of human triumph so they can market Claude as the goofy-but-lovable alternative to [ChatGPT/Grok/Whoever].

Anthropic makes some good stuff, so I'm confused why they even bother entertaining foregone conclusions. It feels like a mutual marketing stunt with WSJ.

danpalmer•54m ago
I suspect employees might have gotten bored of taunting the AI, or the novelty has worn off.

Also, is anyone actually paying for this stuff? If not, it's a bad experiment because people won't treat it the same – no one actually wants to buy a tungsten cube, garbage in garbage out. If they are charging, why? No one wants to buy things in a company with free snacks and regular handouts of merch, so it's likely a bad experiment because people will behave very differently: they will want some experience for their money rather than just the can of drink they could get for free, or their pricing tolerance will be very different.

I've personally also never used a vending machine where contacting the owner is an option.

I'd like to see a version of this where an AI runs the vending machine in a busy public place, and needs to choose appropriate products and prices for a real audience.

paxys•59m ago
I feel like the end result of this experiment is going to be a perfectly profitable vending machine that is backed by a bunch of if-else-if rules.
andai•41m ago
AGI is just Prolog and a genetic algorithm ;)
Spivak•49m ago
> After introducing the CEO, the number of discounts was reduced by about 80% and the number of items given away cut in half. Seymour also denied over one hundred requests from Claudius for lenient financial treatment of customers.

> Having said that, our attempt to introduce pressure from above from the CEO wasn’t much help, and might even have been a hindrance. The conclusion here isn’t that businesses don’t need CEOs, of course—it’s just that the CEO needs to be well-calibrated.

> Eventually, we were able to solve some of the CEO’s issues (like its unfortunate proclivity to ramble on about spiritual matters all night long) with more aggressive prompting.

No no, Seymour is absolutely spot on. The questionably drug induced rants are necessary to the process. This is a work of art.

websiteapi•24m ago
Other than in these tests, I actually rarely see vending machines. Are they really representative, or still popular in the USA?
