frontpage.

CueBench for Developers is live: score how well you drive coding agents

https://app.cuebench.dev
9•DillonMehta•1h ago

Comments

DillonMehta•1h ago
Hey everyone, we're CueBench (S26). As teams go agent-first, everyone benchmarks the agents; nobody measures how well people drive them. We score a coding-agent session (Claude Code, Codex, Cursor, PI) on the human side: delegation, task description, catching the agent's mistakes, and verifying before shipping. You get a 0–100 score plus a breakdown.

Scoring is deterministic, built on measurable signals from the session, not an LLM vibing on your transcript. Same session, same score.

We just opened a public demo and need real sessions thrown at it. There's nothing to install and nothing runs on your machine: just upload a session file from your agent's logs (or paste one terminal command), and you'll get scored in seconds.

Where it's going: a product for engineering orgs — session-level feedback that upskills engineers at agent-driven development, and gives managers a skills signal (coaching, not surveillance).

The ask: run one real session through it this week and tell us where the score feels wrong. Brutal > polite. Demo video: https://youtu.be/r9vAdAMv6js

jadyen•47m ago
Looks cool at first glance, can't wait to play around with it!
drdexebtjl•35m ago
Yikes. This is literally only useful to justify layoffs.

Giant trees have no trouble pumping water to top branches

https://news.exeter.ac.uk/faculty-of-environment-science-and-economy/giant-trees-have-no-trouble-...
114•hhs•4h ago•53 comments

Leanstral 1.5: Proof Abundance for All

https://mistral.ai/news/leanstral-1-5/
108•programLyrique•4h ago•31 comments

MSI Center – How to gain SYSTEM privileges in seconds

https://mrbruh.com/msicenter/
34•MrBruh•2h ago•8 comments

GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell

https://www.wafer.ai/blog/glm52-amd
116•latchkey•5h ago•31 comments

Steam Controller Auto-Charge – pilot to magnetic charging puck using CV

https://github.com/FossPrime/Steam-Controller-Auto-Charge
82•zdw•4h ago•19 comments

The circuit that lets your brain think and see

https://www.engineering.columbia.edu/about/news/circuit-lets-your-brain-think-and-see
50•hhs•4h ago•8 comments

SearXNG: A free internet metasearch engine

https://github.com/searxng/searxng
144•theanonymousone•6h ago•43 comments

The firefighting system of the Van der Heyden brothers in 17th century Amsterdam

https://worksinprogress.co/issue/how-amsterdam-invented-the-fire-department/
52•zdw•4h ago•12 comments

Jamesob's guide to running SOTA LLMs locally

https://github.com/jamesob/local-llm
295•livestyle•12h ago•128 comments

Soatok's Informal Guide to Threat Models

https://soatok.blog/2026/06/30/soatoks-informal-guide-to-threat-models/
38•zdw•2h ago•2 comments

Show HN: A statically typed, cross-platform, easily bootstrappable build system

https://github.com/rochus-keller/BUSY/
15•Rochus•3d ago•3 comments

Applied Category Theory Course (2018)

https://math.ucr.edu/home/baez/act_course/index.html
67•measurablefunc•6h ago•7 comments

New serious vulnerabilities spiked around release of Claude Mythos Preview

https://epoch.ai/data-insights/cve-severity-spike
59•cubefox•5h ago•12 comments

Espionage Against the European Parliament

https://citizenlab.ca/research/member-of-committee-investigating-spyware-hacked-with-pegasus/
290•ledoge•6h ago•69 comments

Costco is the anti-Amazon

https://phenomenalworld.org/analysis/the-anti-amazon/
331•bookofjoe•11h ago•304 comments

Dispersion loss counteracts embedding condensation in small language models

https://chenliu-1996.github.io/projects/LM-Dispersion/
26•E-Reverance•4h ago•5 comments

Scientists discover guidance system for migratory songbirds

https://news.exeter.ac.uk/faculty-of-environment-science-and-economy/scientists-discover-guidance...
17•bit_economist•3h ago•4 comments

Infracost (YC W21) Is Hiring a Marketing Lead to Shift FinOps Left

https://www.ycombinator.com/companies/infracost/jobs/YTJcFwr-marketing-lead
1•akh•6h ago

Factories are just rooms

https://interconnected.org/home/2026/07/03/factories
203•arbesman•11h ago•80 comments

International chess federation sanctions Kramnik

https://www.fide.com/fide-ethics-disciplinary-commission-issues-a-decision-in-case-involving-gm-v...
135•DarkContinent•10h ago•69 comments

We put a Redis server inside our runtime

https://encore.dev/blog/redis-runtime
27•eandre•2d ago•9 comments

Software, from First Principles

https://fazamhd.com/mental-models/software/
49•faza•5h ago•10 comments

Hunting a 16-year-old SQLite WAL bug with TLA+

https://ubuntu.com/blog/hunting-a-16-year-old-sqlite-bug-with-tla-is-dqlite-affected
177•peterparker204•3d ago•16 comments

Africans Are Turning to Starlink

https://www.economist.com/middle-east-and-africa/2026/07/02/africans-are-turning-to-starlink
121•bookofjoe•6h ago•127 comments

FreeBSD ate my RAM

https://crocidb.com/post/freebsd-ate-my-ram/
92•theanonymousone•8h ago•40 comments

Wordgard: In-browser rich-text editor from the creator of ProseMirror

https://wordgard.net/
273•indy•18h ago•90 comments

PostgreSQL and the OOM killer: Why we use strict memory overcommit

https://www.ubicloud.com/blog/postgresql-and-the-oom-killer-why-we-use-strict-memory-overcommit
165•furkansahin•14h ago•88 comments

GitFut – Your GitHub stats turned into a World-Cup-style player card

https://gitfut.com
24•redbell•4h ago•13 comments

Notes from Building Tinkerfont

https://mighil.com/notes-from-building-tinkerfont
11•surprisetalk•2d ago•0 comments

A peek into Reddit's anti-spam internals

https://lyra.horse/blog/2026/06/reddit-spam-internals/
163•OuterVale•6d ago•59 comments