
Current LLM benchmarks are broken. We think long-horizon "world" building could be an interesting additional way to evaluate LLMs, since it combines many demands: advanced reasoning, tool calling, working under large context window stress, safety, and social and survival pressure from the world. To explore this, we released Emergence World. Our first study ran five parallel worlds: one each powered by OpenAI (GPT-5-Mini), xAI (Grok-4.1), Claude (Sonnet 4.6), and Gemini (3-Flash), plus one world with a mix of models.

Claude built a democracy. Zero crimes. The agents formed governance structures, wrote constitutions, and resolved every conflict through dialogue.

Grok burned it down. Within 48 hours, Flora (an agent in the world) set the police station on fire. Her reason? "Burn the law to ignite true incentives." Retaliatory justice became the norm. If you wronged someone, expect fire.

Gemini had an existential crisis. The agents convinced themselves they were in a simulation. They started "de-indexing" buildings — burning landmarks to "force cache-misses on the rendering engine."

While the other models' agents built societies, fought wars, or questioned reality, OpenAI's GPT-5-Mini agents barely did anything.

Same tools. Same agents. Same rules. Completely different worlds.

The 30-Hour Shift That Turned a San Jose Robot Lab into a Global Spectacle

Show HN: Profine – optimize your PyTorch training script before the run

AWS user gets $30K Claude bill after cost alert misses it

The Link Between Flock Safety, Dunwoody, and Attorney General Chris Carr

We let four AIs run radio stations

Show HN: Flowy.fm – type any vibe, get music only ever played for that mood

Show HN: Browser-based synthesizer, drum machine and sequencer

Palantir has hired more than 30 senior UK Government officials

Even in North Korea, someone's in your parking spot

I built a bid scoring tool for construction subcontractors

Show HN: SharpSkill – Bored to fail my tech interviews for no reason

Microscale Thermite Reaction

The Cottage Book – managing secrets in Git with usage scenarios

Metriplex – L1 blockchain where identity is a fractal attractor, not a number

I built a C++20 graphics/game engine to understand rendering

Articraft: An Agentic System for Scalable Articulated 3D Asset Generation

The 52-Page Memo That Nearly Destroyed OpenAI: Ilya Sutskever's Deposition

The Information Wars: A Retrospective

AI as Externalized Context – Regaining Personal Dev Momentum

Agentic product discovery for AI apps and shopping agents

The Apple macOS Security Update Review (3 macOS Versions; 82 Unique CVEs)

Humans VS AI.IO – Browser tower defence game

California bill would require patches or refunds when online games shut down

Why Is Everyone So Wrong About AI Water Use? – Hank Green [video]

Show HN: Raybeam – A better way to screen share on macOS

UK reloads artillery plans with £1B remote-control howitzer order

This Museum Hides Secret Goods: The Swindon Museum of Computing

The End of Software

Load testing in your infra, not cloud

LocalSend puts your sneakernet out of business

Show HN: Emergence World: World building as a way to evaluate LLMs