
Hi all, I wanted to share the leaderboard I have created, and am still working on, to rank LLMs. My results are very similar to those of ARC-AGI 2, the only exception being that DeepSeek is rated higher on my leaderboard. The test is kept closed-source. The plan is that once the top models max out on a given task in the test, we will adopt new criteria to differentiate them.

The test currently comprises 10 scores; on 9 of them, no model scores above 0. Check it out and let me know what you think! Thanks

Show HN: PaySentry – Open-source control plane for AI agent payments

Show HN: Moli P2P – An ephemeral, serverless image gallery (Rust and WebRTC)

The Crumbling Workflow Moat: Aggregation Theory's Final Chapter

Pax Historia – User and AI powered gaming platform

Show HN: I built a RAG engine to search Singaporean laws

Scams, Fraud, and Fake Apps: How to Protect Your Money in a Mobile-First Economy

Porting Doom to My WebAssembly VM

Cognitive Style and Visual Attention in Multimodal Museum Exhibitions

Full-Blown Cross-Assembler in a Bash Script

Logic Puzzles: Why the Liar Is the Helpful One

Optical Combs Help Radio Telescopes Work Together

Show HN: Myanon – fast, deterministic MySQL dump anonymizer

The Tao of Programming

Forcing Rust: How Big Tech Lobbied the Government into a Language Mandate

PanelBench: We evaluated Cursor's Visual Editor on 89 test cases. 43 fail

Can You Draw Every Flag in PowerPoint? (Part 2) [video]

Show HN: MCP-baepsae – MCP server for iOS Simulator automation

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

Show HN: Sem – Semantic diffs and patches for Git

Hello world does not compile

Show HN: ZigZag – A Bubble Tea-Inspired TUI Framework for Zig

Metaphor+Metonymy: "To love that well which thou must leave ere long"(Sonnet73)

Show HN: Django N+1 Queries Checker

Emacs-tramp-RPC: High-performance TRAMP back end using JSON-RPC instead of shell

Protocol Validation with Affine MPST in Rust

Female Asian Elephant Calf Born at the Smithsonian National Zoo

Show HN: Zest – A hands-on simulator for Staff+ system design scenarios

Show HN: DeSync – Decentralized Economic Realm with Blockchain-Based Governance

Automatic Programming Returns

Why Are There Still So Many Jobs? The History and Future of Workplace Automation [pdf]

Show HN: ZetaCrush – An Intelligent LLM Leaderboard