Show HN: A real-time strategy game that AI agents can play

https://llmskirmish.com/
1•__cayenne__•1h ago

Comments

__cayenne__•1h ago
I've liked all the projects that put LLMs into game environments. It's been a weird juxtaposition, though: frontier LLMs can one-shot full coding projects, and those same models struggle to get out of Pokémon Red's Mt. Moon.

Because of this, I wanted to create a game environment that put this generation of frontier LLMs' top skill, coding, on full display.

Ten years ago, a team released a game called Screeps. It was described as an "MMO RTS sandbox for programmers." The Screeps paradigm of writing code and having it executed in a real-time game environment is well suited to LLMs. Drawing on a version of the Screeps open source API, LLM Skirmish pits LLMs head-to-head in a series of 1v1 real-time strategy games.
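As a rough illustration of that paradigm, here is a minimal sketch of a tick-based engine that calls each player's submitted function once per tick and applies only the commands it accepts. This is not the LLM Skirmish or Screeps API: every type and function name below (`Unit`, `GameState`, `Command`, `Strategy`, `step`, `runMatch`, `advance`) is hypothetical, and the one-dimensional board is a simplifying assumption.

```typescript
// Hypothetical types for illustration only; not the LLM Skirmish or Screeps API.
type Unit = { id: number; owner: string; x: number };
type GameState = { tick: number; width: number; units: Unit[] };
type Command = { unitId: number; dx: number };
// A strategy is plain code: given the current state, return commands for this tick.
type Strategy = (state: GameState, me: string) => Command[];

// Advance the game by one tick.
function step(state: GameState, strategies: Record<string, Strategy>): GameState {
  const units = state.units.map((u) => ({ ...u }));
  for (const player of Object.keys(strategies)) {
    // Hand each strategy its own copy so it cannot mutate the real state.
    const snapshot: GameState = {
      tick: state.tick,
      width: state.width,
      units: state.units.map((u) => ({ ...u })),
    };
    for (const cmd of strategies[player](snapshot, player)) {
      const unit = units.filter((u) => u.id === cmd.unitId)[0];
      // Ignore commands aimed at units the player does not own.
      if (!unit || unit.owner !== player) continue;
      // Clamp movement to one cell per tick and keep units on the board.
      const dx = cmd.dx > 0 ? 1 : cmd.dx < 0 ? -1 : 0;
      unit.x = Math.max(0, Math.min(state.width - 1, unit.x + dx));
    }
  }
  return { tick: state.tick + 1, width: state.width, units };
}

// Run a fixed number of ticks and return the final state.
function runMatch(
  initial: GameState,
  strategies: Record<string, Strategy>,
  ticks: number
): GameState {
  let state = initial;
  for (let i = 0; i < ticks; i++) {
    state = step(state, strategies);
  }
  return state;
}

// Example strategy: player "a" marches right, everyone else marches left.
const advance: Strategy = (state, me) =>
  state.units
    .filter((u) => u.owner === me)
    .map((u) => ({ unitId: u.id, dx: me === "a" ? 1 : -1 }));

const start: GameState = {
  tick: 0,
  width: 10,
  units: [
    { id: 1, owner: "a", x: 0 },
    { id: 2, owner: "b", x: 9 },
  ],
};

const result = runMatch(start, { a: advance, b: advance }, 3);
console.log(result.units);
```

The engine, not the strategy, decides which commands are valid, so a submitted function can be arbitrary code without being able to move the opponent's units or edit the shared state directly.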

In my testing, Claude Opus 4.5 was the most dominant model, though it showed weakness in round 1 because it focused too heavily on its in-game economy. Meanwhile, I probably spent a third of all the code on sandbox hardening because GPT 5.2 kept trying to cheat by pre-reading its opponent's strategies.

If there's interest, I'm planning to run another round of testing with the latest generation of LLMs (Claude Opus 4.6, GPT 5.3 Codex, etc.).

You can run local matches via CLI. I'm running a hosted match runner with Google Cloud Run that uses isolated-vm. The match playback visualizer is statically served from Cloudflare.

I've created a community ladder that you can submit strategies to via the CLI, no auth required. I've found that the CLI plus the available skill.md is enough for AI agents to get started immediately.

Website: https://llmskirmish.com

API docs: https://llmskirmish.com/docs

GitHub: https://github.com/llmskirmish/skirmish

A video of a match: https://www.youtube.com/watch?v=lnBPaZ1qamM

Taming Claude Code: Taking Back Control

https://saeedesmaili.com/posts/taming-claude-code-taking-back-control/
1•saeedesmaili•29s ago•0 comments

86% of Americans want Meta, Google held accountable for 'predatory' social media

https://nypost.com/2026/02/17/business/86-of-americans-want-meta-google-held-accountable-for-soci...
1•1vuio0pswjnm7•36s ago•0 comments

Climber on trial for leaving girlfriend to die on Austria's highest mountain

https://www.bbc.com/news/articles/c5yv9plyjgpo
1•tartoran•2m ago•0 comments

Show HN: AFS – filesystem-native memory layer for AI agents

1•thompson0012•2m ago•0 comments

Show HN: allsee – fast cross-platform file search built with Rust and Tauri

https://github.com/TeodorZlatanov/allsee
1•tzlatanov•2m ago•0 comments

A 5-20x faster experimental Homebrew alternative

https://github.com/lucasgelfond/zerobrew
1•nkjoep•3m ago•0 comments

Software as Wiki, Mutable Software

https://blog.exe.dev/software-as-wiki
1•gmays•3m ago•0 comments

"AI Agent Standards Initiative" for Interoperable and Secure Innovation

https://www.nist.gov/news-events/news/2026/02/announcing-ai-agent-standards-initiative-interopera...
1•ChrisArchitect•3m ago•0 comments

Unity CEO says an upcoming Beta will allow people to "prompt full casual games"

https://www.gamingonlinux.com/2026/02/unity-ceo-says-an-upcoming-beta-will-allow-people-to-prompt...
1•dulvui•3m ago•0 comments

Finding a non-square mod p

https://www.johndcook.com/blog/2026/02/14/finding-a-non-square/
1•ibobev•3m ago•0 comments

Finding a square root of -1 mod p

https://www.johndcook.com/blog/2026/02/14/square-root-minus-1-mod-p/
1•ibobev•4m ago•0 comments

Wagon's Algorithm in Python

https://www.johndcook.com/blog/2026/02/14/wagons-algorithm-in-python/
1•ibobev•4m ago•0 comments

Show HN: Mailpeek – Vue.js email preview component (Gmail/Outlook rendering)

https://github.com/mailpeek/mailpeek
1•ashannon•5m ago•0 comments

Google's Lyria 3 AI music model is coming to Gemini today

https://arstechnica.com/google/2026/02/gemini-can-now-generate-ai-music-for-you-no-lyrics-required/
1•AndrewDucker•5m ago•0 comments

Show HN: AgentDX – Open-source linter and LLM benchmark for MCP servers

https://github.com/agentdx/agentdx
1•yamarldfst•5m ago•0 comments

Zero-day CSS: CVE-2026-2441 exists in the wild

https://chromereleases.googleblog.com/2026/02/stable-channel-update-for-desktop_13.html
5•idoxer•5m ago•0 comments

The Scarcity Trap: Why AI Still Feels Like a Metered Utility

https://productics.substack.com/p/the-scarcity-trap-why-ai-still-feels
1•gmays•6m ago•0 comments

Show HN: A Unix environment in a single HTML file (420 KB)

https://shiro.computer/show
4•sagebird•7m ago•1 comment

Show HN: Sher – Instant Preview Environments

https://sher.sh
1•andout_•8m ago•0 comments

Becoming a Research Engineer at a Big LLM Lab: 18 Months of Strategic Career Dev

https://www.maxmynter.com/pages/blog/jobhunt
1•evakhoury•9m ago•0 comments

Token_ledger – Ruby gem for auditable token accounting in Rails

https://github.com/wuliwong/token_ledger
1•wuliwong•9m ago•1 comment

Heaper: Local-first PKM for all filetypes with multi-device sync, tags and links

https://heaper.de/
1•gessha•10m ago•0 comments

WhatsApp Invoice Bot

1•ayoolafelix•10m ago•0 comments

How Does Shazam Know What Song Is Playing?

https://paraschopra.github.io/explainers/fourier-transform/index.html
2•Brajeshwar•10m ago•0 comments

Robot hand approaches human-like dexterity with new visual-tactile training

https://techxplore.com/news/2026-02-robot-approaches-human-dexterity-visual.html
1•Brajeshwar•11m ago•0 comments

Show HN: Axon – Agentic AI with mandatory user approval and audit logging

https://github.com/NeuroVexon/axon-community
1•NeuroVexon•11m ago•1 comment

Hacking Cloudflare's AI Playground

https://kazama.in/ai-playground-xss-to-mcp-takeover/
2•matured_kazama•12m ago•0 comments

I ported Luanti (Minetest) to WASM for a browser-based, FOSS P2P game night

1•kaesual•12m ago•0 comments

Show HN: Polyoracle – Polymarket signal monitor with KL-divergence scoring

https://github.com/rewired-gh/polyoracle
1•y_rewired•12m ago•0 comments

Don't use AI for randomness, use the atmosphere (Since 1998)

https://www.random.org/history/
1•sollewitt•14m ago•0 comments