Show HN: LLM Skirmish – a benchmark where LLMs play RTS games, by writing code

https://llmskirmish.com
4•__cayenne__•1h ago
I wanted to create an LLM game benchmark that put this generation of frontier LLMs' top skill, coding, on full display.

Ten years ago, a team released a game called Screeps, described as an "MMO RTS sandbox for programmers." In Screeps, human players write JavaScript strategies that are executed in the game's environment.

The Screeps paradigm, writing code and having it execute in a real-time game environment, is well suited to an LLM benchmark. Drawing on a version of the Screeps open-source API, LLM Skirmish pits LLMs head-to-head in a series of 1v1 real-time strategy games.
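
To make the "write code, have it execute in the game" idea concrete, here is a rough sketch of what such a loop could look like. It is not the Screeps or LLM Skirmish API: every name in it (`rushStrategy`, `runMatch`, the shape of `state`) is made up for illustration, and the "map" is a single line of positions.

```javascript
// Hypothetical sketch: none of these names come from Screeps or LLM Skirmish.

// A strategy is plain code: given the visible state, return one order per unit.
function rushStrategy(state) {
  return state.myUnits.map((unit) => {
    const dist = Math.abs(unit.x - state.enemyBase.x);
    // Attack when adjacent to the enemy base, otherwise step toward it.
    return dist <= 1
      ? { id: unit.id, action: "attack" }
      : { id: unit.id, action: "move", dx: Math.sign(state.enemyBase.x - unit.x) };
  });
}

// The environment runs the strategy once per tick and applies its orders.
function runMatch(strategy, state, maxTicks) {
  for (let tick = 0; tick < maxTicks; tick++) {
    for (const order of strategy(state)) {
      const unit = state.myUnits.find((u) => u.id === order.id);
      if (order.action === "move") unit.x += order.dx;
      if (order.action === "attack") state.enemyBase.hp -= unit.damage;
    }
    if (state.enemyBase.hp <= 0) return { winner: "me", ticks: tick + 1 };
  }
  return { winner: null, ticks: maxTicks };
}

// One unit starting three cells from an enemy base with 10 hp.
const result = runMatch(
  rushStrategy,
  { myUnits: [{ id: "u1", x: 0, damage: 5 }], enemyBase: { x: 3, hp: 10 } },
  20
);
```

In a benchmark built this way, the strategy function is the part a model would write; the environment loop stays fixed, so differences in outcome come from the submitted code.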

Comments

zztank•1h ago
Oof, gonna go sell my Google position.

Such fascinating results and a cool way to design a benchmark

Enforcing rules and managing expectations for AI agents with CI and code review

https://rubyonai.com/how-do-you-know-the-software-is-working/
1•marcinos•1m ago•0 comments

Why is no-one being prosecuted over the Epstein files? [video]

https://www.bbc.com/news/videos/cd9e3nzzw3zo
1•petethomas•1m ago•0 comments

Software engineer who scaled a startup from 10→500, seeking early-stage roles

1•vampiregrey•2m ago•0 comments

How to Succeed and Thrive in a Career You Love [video]

https://www.youtube.com/watch?v=xmYekD6-PZ8
1•samixg•2m ago•0 comments

Do things like Oh My OpenCode work?

https://github.com/code-yeongyu/oh-my-opencode
1•tifa2up•2m ago•0 comments

Nintendo Switch becomes gaming giant's best-selling console in history

https://www.bbc.co.uk/news/articles/ckglk543x3go
1•rwmj•2m ago•0 comments

Crowd Control vs. Freedom of Association

https://www.opb.org/article/2026/02/03/judge-limits-federal-officer-use-of-force-portland-ice-pro...
1•cwmoore•3m ago•0 comments

Taming a flat AST: ergonomics without allocations

http://modern-c.blogspot.com/2026/02/taming-flat-ast-ergonomics-in-age-of.html
1•fanf2•3m ago•0 comments

Bugs that the Rust compiler catches for you

https://kerkour.com/bugs-rust-compiler-helps-prevent
2•redcannon218•4m ago•0 comments

Linux as daily driver, three months in

https://benovermyer.com/blog/2026/02/linux-as-daily-driver-three-months-in/
1•speckx•6m ago•0 comments

Show HN: Next.js-Based SaaS Framework

https://nextjs-boilerplate.com/nextjs-multi-tenant-saas-boilerplate
1•creativedg•6m ago•0 comments

Context Rot: Why AI Gets Worse the Longer You Chat (and How to Fix It)

https://www.producttalk.org/context-rot/
1•swolpers•7m ago•0 comments

The Unsettling Rise of AI Real-Estate Slop

https://www.theatlantic.com/culture/2026/02/real-estate-listing-ai-slop/685871/
1•nlawalker•9m ago•2 comments

Find Keywords Using ChatGPT Autocomplete

https://www.kwrds.ai/chatgpt
1•seo_god•9m ago•0 comments

Kevin Boone: Battle of the privacy-focused search engines: Kagi vs. DuckDuckGo

https://kevinboone.me/kagi_ddg.html
1•speckx•10m ago•0 comments

Why MySQL's Integration with DuckDB Is More Elegant Than PostgreSQL's

https://www.linkedin.com/top-content/
1•baotiao•10m ago•0 comments

Japan is considering nuclear subs. But are they worth the costs?

https://www.japantimes.co.jp/news/2026/01/28/japan/japan-challenges-nuclear-submarines/
3•PaulHoule•10m ago•0 comments

The F Word

http://muratbuffalo.blogspot.com/2026/02/friction.html
1•zdw•11m ago•0 comments

Greenlet Support for Python in WebAssembly

https://wasmer.io/posts/greenlet-support-python-wasm
4•syrusakbary•11m ago•0 comments

Training and Assistance

1•james_r_h•12m ago•1 comment

Data Contract Templates by Industry

https://soda.io/templates
1•santiviquez•12m ago•0 comments

RackRat: eBay Rackmount Server Deal Finder

https://rackrat.net
3•petercooper•12m ago•1 comment

AI Bots Are Now a Significant Source of Web Traffic

https://www.wired.com/story/ai-bots-are-now-a-signifigant-source-of-web-traffic/
2•ironyman•12m ago•1 comment

Does the truth still matter? [video]

https://www.youtube.com/watch?v=rnEAAFeE3W4
1•flawn•13m ago•0 comments

Why we stopped allowing autonomous fixes in production (even when tests pass)?

1•v_CodeSentinal•14m ago•0 comments

Show HN: Camel OpenAI Integration Patterns

https://github.com/ibek/camel-openai-patterns
1•aivi•14m ago•0 comments

Fish 4.4.0

https://github.com/fish-shell/fish-shell/releases/tag/4.4.0
2•voxadam•15m ago•0 comments

Show HN: Ultra-Dex v3.5 – AI orchestration layer with 17 agents and 61 commands

https://github.com/Srujan0798/Ultra-Dex
1•maya0769•16m ago•0 comments

Show HN: PageSpeed – AI that suggests code-level fixes for specific frameworks

https://pagespeed.deployhq.com
3•deployhq•17m ago•1 comment

Divan – A Modern News Aggregator with AI-Powered Intelligence

3•iedayan03•19m ago•0 comments