Professors Staffed a Fake Company with AI Agents, Guess What Happened?

https://futurism.com/professors-company-ai-agents

12•Capstanlqc•2h ago

Comments

vintagedave•1h ago

Clickbait headline, and it's reporting something from Business Insider (itself IMO a terrible website these days), but:

> the results were dismal. The best-performing model was Anthropic's Claude 3.5 Sonnet, which struggled to finish just 24 percent of the jobs assigned to it. The study's authors note that even this meager performance is prohibitively expensive, averaging nearly 30 steps and a cost of over $6 per task.

and other AIs were worse.

sokoloff•50m ago

$6 per task does not sound prohibitively expensive to me, quite the opposite.

A 24% success rate is a problem, but the cost seems reachable. I can’t access the full BI article to see the scope of the average task attempted, but anything of substance is worth $6.

mapt•1h ago

It ended humanity's existence? No?

Not yet? Okay. Good. In fact, great! I like existing.

For now.

"Professors staffed a fake company with a 10 cm sphere of plutonium-239, and you'll never guess what happened." Egg on their face, I'm sure.

Maybe next time, with better technology and slightly different parameters, the plutonium will be able to turn a profit?

CommenterPerson•1h ago

> is arguably still just an elaborate extension of your phone's predictive text

Nailed it. It seems to be doing a good job of helping coders and document writers. It seems to be great at solving protein folding. Other than that, I'm not so sure.

saithound•50m ago

CMU professors can't build AI agents, and decide to brag about it. That's the article.

"We tried something, and we couldn't make it work. Therefore it must be impossible to do."

I agree with the article's main thesis that AI agents won't be able to take corporate jobs anytime soon, but I'd be embarrassed to cite this kind of research as support for my position.

foldr•22m ago

It’s not entirely clear from the write-up in the article, but it sounds like this was intended as a test of existing “off the shelf” AI agent models. In other words, the aim is to find out what happens if you try to use the existing commercially available technology (which of course is what most people would be doing).

jgalt212•30m ago

Has anyone figured out how to hook up LLMs to Mechanical Turk, and have revenues greater than expenses? Or is this akin to the net energy problem in fusion?
