I wanted to create an LLM game benchmark that put this generation of frontier LLMs' top skill, coding, on full display.

Ten years ago, a team released a game called Screeps, described as an "MMO RTS sandbox for programmers." In Screeps, human players write JavaScript strategies that are executed in the game's environment.

The Screeps paradigm, writing code and having it execute in a real-time game environment, is well suited to an LLM benchmark. Drawing on a version of the open-source Screeps API, LLM Skirmish pits LLMs head-to-head in a series of 1v1 real-time strategy games.
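To make the paradigm concrete, here is a minimal sketch of a game engine that calls player-written code once per tick. Everything in it (`createWorld`, `runMatch`, `rushStrategy`, the order format, the one-dimensional map) is a hypothetical toy made up for illustration. It is not the Screeps API and not how LLM Skirmish is implemented.

```javascript
// Toy illustration only: hypothetical names, not the Screeps or LLM Skirmish API.

// A tiny world: two units on a line, each with an owner and hit points.
function createWorld() {
  return {
    tick: 0,
    units: [
      { id: "a1", owner: "A", x: 0, hits: 3 },
      { id: "b1", owner: "B", x: 5, hits: 3 },
    ],
  };
}

// A "strategy" is a function the engine calls every tick. It receives a
// read-only view of the game and returns a list of orders for its units.
function rushStrategy(view) {
  const orders = [];
  for (const unit of view.myUnits) {
    const target = view.enemyUnits[0];
    if (!target) continue;
    const distance = Math.abs(target.x - unit.x);
    if (distance <= 1) {
      orders.push({ unitId: unit.id, type: "attack", targetId: target.id });
    } else {
      orders.push({ unitId: unit.id, type: "move", dx: Math.sign(target.x - unit.x) });
    }
  }
  return orders;
}

// The engine validates and applies orders; players never mutate the world directly.
function applyOrders(world, owner, orders) {
  for (const order of orders) {
    const unit = world.units.find((u) => u.id === order.unitId && u.owner === owner);
    if (!unit || unit.hits <= 0) continue;
    if (order.type === "move") {
      unit.x += Math.sign(order.dx);
    } else if (order.type === "attack") {
      const target = world.units.find((u) => u.id === order.targetId);
      if (target && Math.abs(target.x - unit.x) <= 1) target.hits -= 1;
    }
  }
}

// Run a 1v1 match: each tick, ask every strategy for orders, then apply them.
function runMatch(strategies, maxTicks) {
  const world = createWorld();
  while (world.tick < maxTicks) {
    world.tick += 1;
    const pending = Object.entries(strategies).map(([owner, strategy]) => {
      // Hand each player copies of the living units, so the view is read-only.
      const alive = world.units.filter((u) => u.hits > 0).map((u) => ({ ...u }));
      const view = {
        tick: world.tick,
        myUnits: alive.filter((u) => u.owner === owner),
        enemyUnits: alive.filter((u) => u.owner !== owner),
      };
      return [owner, strategy(view)];
    });
    for (const [owner, orders] of pending) applyOrders(world, owner, orders);

    const survivors = new Set(world.units.filter((u) => u.hits > 0).map((u) => u.owner));
    if (survivors.size <= 1) {
      return { winner: [...survivors][0] ?? null, ticks: world.tick };
    }
  }
  return { winner: null, ticks: world.tick }; // draw on timeout
}
```

For example, `runMatch({ A: rushStrategy, B: () => [] }, 20)` pits the rush strategy against a player that never issues orders. In a benchmark along these lines, the strategy function is the part the model would write.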

Show HN: LLM Skirmish – a benchmark where LLMs play RTS games, by writing code