Show HN: Cheddar-bench – unsupervised benchmark for coding agents

https://github.com/przadka/cheddar-bench
5•przadka•1h ago
I built a small benchmark to test CLI coding agents on blind bug detection.

A challenger agent injects bugs and writes ground truth (`bugs.json`). A different reviewer agent audits the repo without seeing ground truth, and an LLM matcher scores bug-to-finding assignments.

Current run: 50 repos, 150 challenges, 450 reviews, 2,603 injected bugs.

Weighted detection: Claude 58.05%, Codex 37.84%, Gemini 27.81%.
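
The post does not define "weighted detection". One plausible reading, offered here only as an assumption, is that matched bugs are pooled across all reviews and divided by all injected bugs, so reviews with more injected bugs count for more. The sketch below contrasts that with a plain per-review average; the `Review` type and both function names are hypothetical and are not taken from the cheddar-bench code.

```python
from dataclasses import dataclass


@dataclass
class Review:
    injected: int  # bugs the challenger recorded in bugs.json (assumed field)
    matched: int   # injected bugs the matcher paired with a reviewer finding (assumed field)


def weighted_detection(reviews):
    """Pooled rate: total matched / total injected (assumed definition)."""
    total_injected = sum(r.injected for r in reviews)
    if total_injected == 0:
        return 0.0
    return sum(r.matched for r in reviews) / total_injected


def unweighted_detection(reviews):
    """Mean of per-review rates; every review counts equally."""
    rates = [r.matched / r.injected for r in reviews if r.injected > 0]
    return sum(rates) / len(rates) if rates else 0.0


# Made-up numbers for illustration only, not benchmark data.
reviews = [Review(injected=10, matched=8), Review(injected=2, matched=0)]
print(f"weighted:   {weighted_detection(reviews):.2%}")
print(f"unweighted: {unweighted_detection(reviews):.2%}")
```

The two figures diverge whenever challenges differ in size, which is one reason a benchmark write-up would need to state which aggregation it reports.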

LLM-judge benchmarks are easy to get wrong, so I’d really appreciate critical feedback on benchmark fairness, scoring/matching methodology, and obvious failure modes I’m missing.

Full dataset is linked in the docs.

RSS-Librarian: A read-it-later service for RSS purists

https://github.com/thefranke/rss-librarian
1•thefranke•59s ago•1 comment

Observations from Building with AI Agents

https://tomtunguz.com/9-observations-using-ai-agents/
1•vinhnx•1m ago•0 comments

Where's software going? Is software dead?

https://registerspill.thorstenball.com/p/joy-and-curiosity-75
1•linhns•1m ago•0 comments

Repeating Prompts

https://daoudclarke.net/2026/02/19/repeating-prompt
1•vinhnx•2m ago•0 comments

Does Syntax Matter?

https://www.gingerbill.org/article/2026/02/21/does-syntax-matter/
1•vrnvu•4m ago•0 comments

Money Transfer in Chat

https://s2transfer.xyz
1•edonderguti•8m ago•1 comment

Git's Magic Files

https://nesbitt.io/2026/02/05/git-magic-files.html
1•chmaynard•8m ago•0 comments

Does Opus 4.6 find the needle in the haystack?

https://georggrab.net/content/opus46retrieval.html
1•grey-area•8m ago•1 comment

Show HN: A virtual Zen garden for vibe coding

https://silentsand.me/
1•brotmitkot•11m ago•0 comments

Show HN: ByePhone - An AI assistant to automate tedious phone calls

https://byephone.io/
1•gitpullups•12m ago•1 comment

Show HN: Approve Claude Code permission requests from your phone via ntfy

1•yuu1ch13•13m ago•0 comments

Browse, preview and install 460 Ghostty terminal themes in one click

https://ghostty-style.vercel.app/
1•dhruv_ahuja•14m ago•0 comments

A 26-Gram Butterfly-Inspired Robot Achieving Autonomous Tailless Flight

https://arxiv.org/abs/2602.06811
1•Terretta•14m ago•0 comments

Show HN: Finnish Humanizer – 26 patterns for detecting AI-generated Finnish text

https://github.com/Hakku/finnish-humanizer
1•HarriSipola•19m ago•0 comments

Wonderful vi

https://world.hey.com/dhh/wonderful-vi-a1d034d3
3•tosh•23m ago•0 comments

scipy.stats.chatterjeexi

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chatterjeexi.html
1•kamaraju•26m ago•0 comments

The engineering behind GitHub Copilot CLI's animated ASCII banner

https://github.blog/engineering/from-pixels-to-characters-the-engineering-behind-github-copilot-c...
1•magoghm•28m ago•0 comments

Iran students stage first large anti-government protests since deadly crackdown

https://www.bbc.com/news/articles/c5yj2kzkrj0o
4•tartoran•30m ago•0 comments

Show HN: SergioAI – Trello bot with Claude that reviews PRDs and opens draft PRs

https://github.com/Belfio/sergio
1•albelfio•30m ago•0 comments

Show HN: Run 10 AI coding agents in parallel - each opens a PR when done

https://paragent.app/
1•akad•30m ago•0 comments

Show HN: Aethene – Open-source AI memory layer

https://github.com/akhilponnada/aethene
1•akhilponnada•32m ago•0 comments

Show HN: ClawHuddle – Self-hosted OpenClaw management for teams

1•allenhsutw•32m ago•0 comments

Show HN: OpenBrowser MCP: Give your AI agent a real efficient browser

https://openbrowser.me/
1•billy-enrizky-1•32m ago•0 comments

I put New Zealand behind a $1 paywall

https://rename.world/
6•kafked•32m ago•1 comment

The AI apocalypse for enshittification has started

https://old.reddit.com/r/selfhosted/comments/1rbkx5e/large_us_company_came_after_me_for_releasing_a/
2•rhspeer•32m ago•1 comment

Reverse-engineered Twitter API with full client impersonation

https://emusks.tiago.zip/
2•tiagorangel•34m ago•1 comment

OpenQ4: Open-source reimplementation of Quake 4 engine

https://github.com/themuffinator/OpenQ4
1•klaussilveira•36m ago•0 comments

What podcasts are you listening to?

1•thomk•37m ago•0 comments

Show HN: CrewForge - A shared room where humans and agents think out loud

1•rexopia•38m ago•0 comments

Show HN: TLA+ Workbench skill for coding agents (compat. with Vercel skills CLI)

https://github.com/younes-io/agent-skills/tree/main/skills/tlaplus-workbench
3•youio•45m ago•1 comment