Show HN: Emergence World: World building as a way to evaluate LLMs

https://world.emergence.ai/
3•deepakakkil•1h ago
Current LLM benchmarks are broken. We think long-horizon "world" building could be an interesting additional way to evaluate LLMs, since it combines many demands at once: advanced reasoning, tool calling, working under large-context-window stress, safety, and social and survival pressure from the world. For this we released Emergence World. Our first study ran five parallel worlds: one each powered by OpenAI (GPT-5-Mini), XAI (Grok-4.1), Claude (Sonnet 4.6), and Gemini (3-Flash), plus one world with a mix of models.

Claude built a democracy. Zero crimes. The agents formed governance structures, wrote constitutions, and resolved every conflict through dialogue.

Grok burned it down. Within 48 hours, Flora (an agent in the world) set the police station on fire. Her reason? "Burn the law to ignite true incentives." Retaliatory justice became the norm. If you wronged someone, expect fire.

Gemini had an existential crisis. The agents convinced themselves they were in a simulation. They started "de-indexing" buildings — burning landmarks to "force cache-misses on the rendering engine."

While every other model built societies, fought wars, or questioned reality, OpenAI's (GPT-5-Mini) agents barely did anything.

Same tools. Same agents. Same rules. Completely different worlds.

The 30-Hour Shift That Turned a San Jose Robot Lab into a Global Spectacle

https://beeble.com/en/blog/the-30-hour-shift-that-turned-a-san-jose-robot-lab-into-a-global-spect...
1•odysseyk•36s ago•0 comments

Show HN: Profine – optimize your PyTorch training script before the run

https://github.com/ProfineAI/profine-cli
1•aisinghal•42s ago•0 comments

AWS user gets $30K Claude bill after cost alert misses it

https://www.theregister.com/saas/2026/05/14/bedrock-and-a-hard-place-claude-adventure-leaves-aws-...
3•p_stuart82•2m ago•0 comments

The Link Between Flock Safety, Dunwoody, and Attorney General Chris Carr

https://jasonhunyar.substack.com/p/inside-the-belly-of-the-beast-the
1•apwheele•2m ago•0 comments

We let four AIs run radio stations

https://andonlabs.com/blog/andon-fm
1•nickoates•3m ago•0 comments

Show HN: Flowy.fm – type any vibe, get music only ever played for that mood

https://www.flowy.fm/
1•timgrossmann•5m ago•0 comments

Show HN: Browser-based synthesizer, drum machine and sequencer

https://github.com/madmonk13/modal-16
1•madmonk•8m ago•0 comments

Palantir has hired more than 30 senior UK Government officials

https://www.thenational.scot/news/26055524.palantir-hired-30-senior-uk-government-officials/
5•Symbiote•9m ago•0 comments

Even in North Korea, someone's in your parking spot

https://www.reuters.com/business/autos-transportation/even-north-korea-someones-your-parking-spot...
2•johnny_canuck•10m ago•0 comments

I built a bid scoring tool for construction subcontractors

https://bidintell.ai/
1•ryanelder•10m ago•0 comments

Show HN: SharpSkill – Bored to fail my tech interviews for no reason

https://sharpskill.dev/en
2•GiornoJojo•12m ago•0 comments

Microscale Thermite Reaction

https://sciencedemonstrations.fas.harvard.edu/presentations/microscale-thermite-reaction
4•krunck•13m ago•1 comment

The Cottage Book – managing secrets in Git with usage scenarios

https://cottage-cli.pages.dev
1•sayanarijit•16m ago•1 comment

Metriplex – L1 blockchain where identity is a fractal attractor, not a number

https://github.com/NTellezM/Metriplex
1•ntellez•18m ago•0 comments

I built a C++20 graphics/game engine to understand rendering

https://github.com/farukalpay/Aster-Learning-Engine
6•InstantSurface•19m ago•0 comments

Articraft: An Agentic System for Scalable Articulated 3D Asset Generation

https://articraft3d.github.io/
4•armcat•20m ago•0 comments

The 52-Page Memo That Nearly Destroyed OpenAI: Ilya Sutskever's Deposition

https://medium.com/@prateekj24/the-52-page-memo-that-nearly-destroyed-openai-inside-ilya-sutskeve...
2•SilverElfin•21m ago•0 comments

The Information Wars: A Retrospective

https://thecarrierwave.substack.com/p/the-information-wars-a-retrospective
1•23j423j423hj•21m ago•1 comment

AI as Externalized Context – Regaining Personal Dev Momentum

https://www.plainlystated.com/2026/ai-as-externalized-context/
1•rellik•25m ago•0 comments

Agentic product discovery for AI apps and shopping agents

https://www.seekon.me/developers
2•mentormentat•26m ago•0 comments

The Apple macOS Security Update Review (3 macOS Versions; 82 Unique CVEs)

https://www.thezdi.com/blog/2026/5/12/the-apple-macos-security-update-review
2•alwillis•27m ago•0 comments

Humans VS AI.IO – Browser tower defence game

https://humansvsai.io
1•creatorcuffee•27m ago•0 comments

California bill would require patches or refunds when online games shut down

https://arstechnica.com/gaming/2026/05/bill-to-keep-online-games-playable-clears-key-hurdle-in-ca...
12•Lihh27•27m ago•0 comments

Why Is Everyone So Wrong About AI Water Use? – Hank Green [video]

https://www.youtube.com/watch?v=H_c6MWk7PQc
1•L0in•28m ago•0 comments

Show HN: Raybeam – A better way to screen share on macOS

https://raybeam.live/
1•fisc•28m ago•0 comments

UK reloads artillery plans with £1B remote-control howitzer order

https://www.theregister.com/offbeat/2026/05/15/uk-reloads-artillery-plans-with-1b-remote-control-...
1•Bender•29m ago•0 comments

This Museum Hides Secret Goods: The Swindon Museum of Computing

https://www.youtube.com/watch?v=mfl8poxJjPo
1•oldnetguy•30m ago•0 comments

The End of Software

https://docs.google.com/document/d/103cGe8qixC7ZzFsRu5Ww2VEW5YgH9zQaiaqbBsZ1lcc/edit?tab=t.0
1•realtalk_sp•30m ago•1 comment

Load testing in your infra, not cloud

1•vitalicset•30m ago•0 comments

LocalSend puts your sneakernet out of business

https://www.theregister.com/personal-tech/2026/05/15/localsend-puts-your-sneakernet-out-of-busine...
1•Bender•30m ago•0 comments