tilth gives AI agents structural code intelligence (tree-sitter definitions, callee resolution, smart outlining) via MCP. I benchmarked it on 21 code navigation tasks across 4 real repos (Express, FastAPI, Gin, ripgrep).

-> https://github.com/jahala/tilth

Results: Sonnet 4.5 — 26% cheaper per correct answer (79% → 86% accuracy). Opus 4.6 — 14% cheaper (and the only model+mode combo to crack the hardest task). Haiku 4.5 — 82% cheaper when forced to use tilth (69% → 100% accuracy at $0.04/answer).

We measure “cost per correct answer” — what you’d expect to spend before getting a usable answer under retry. A wrong answer isn’t a cheap success.
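A minimal sketch of how such a metric can be computed, assuming every attempt costs the same on average and succeeds independently with probability equal to the measured accuracy (the benchmark's exact formula may differ, and the function name here is hypothetical). Under those assumptions the expected number of attempts until the first correct answer is 1 / accuracy, so the expected spend is the per-attempt cost divided by accuracy:

```python
def cost_per_correct_answer(cost_per_attempt: float, accuracy: float) -> float:
    """Expected spend until the first correct answer when retrying.

    Assumption (not from the benchmark): attempts are independent and
    each succeeds with probability `accuracy`, so the expected number
    of attempts is 1 / accuracy.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_attempt / accuracy


# At 100% accuracy the metric equals the per-attempt cost.
print(cost_per_correct_answer(0.04, 1.0))

# Hypothetical comparison: a cheaper but less accurate run can still
# cost more per correct answer than a pricier, more accurate one.
print(cost_per_correct_answer(0.03, 0.5) > cost_per_correct_answer(0.04, 1.0))
```

This is why a wrong answer is never a cheap success: lower accuracy inflates the expected number of retries, and with it the expected spend.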

Interesting finding: smarter models adopt MCP tools voluntarily (Sonnet 95%, Opus 94%), but Haiku ignores them (9%). Instruction tuning didn’t help. Removing the overlapping built-in tools did.

https://github.com/jahala/tilth/blob/main/benchmark/README.m...

PS: I don't have the budget to run the benchmark many times with Opus, so if any token whales have capacity to run some benchmarks, please feel free to PR results.

Show HN: Tilth v0.3 – 17% cheaper AI code navigation (279 runs, 3 Claude models)