
CueBench for Developers is live: score how well you drive coding agents

https://app.cuebench.dev

9•DillonMehta•1h ago

Comments

DillonMehta•1h ago

Hey everyone, we're CueBench (S26). As teams go agent-first, everyone benchmarks the agents; nobody measures how well people drive them. We score a coding-agent session (Claude Code, Codex, Cursor, PI) on the human side: delegation, task description, catching the agent's mistakes, and verifying before shipping. You get a 0–100 score plus a breakdown.

Scoring is deterministic, built on measurable signals from the session, not an LLM vibing on your transcript. Same session, same score.

We just opened a public demo and need real sessions thrown at it. There is nothing to install and nothing runs on your machine: just upload a session file from your agent's logs (or paste one terminal command) and you get scored in seconds.

Where it's going: a product for engineering orgs — session-level feedback that upskills engineers at agent-driven development, and gives managers a skills signal (coaching, not surveillance).

The ask: run one real session through it this week and tell us where the score feels wrong. Brutal > polite. Demo video: https://youtu.be/r9vAdAMv6js

jadyen•47m ago

Looks cool at first glance, can't wait to play around with it!

drdexebtjl•35m ago

Yikes. This is literally only useful to justify layoffs.

Giant trees have no trouble pumping water to top branches

https://news.exeter.ac.uk/faculty-of-environment-science-and-economy/giant-trees-have-no-trouble-...

114•hhs•4h ago•53 comments

Leanstral 1.5: Proof Abundance for All

https://mistral.ai/news/leanstral-1-5/

108•programLyrique•4h ago•31 comments

MSI Center – How to gain SYSTEM privileges in seconds

https://mrbruh.com/msicenter/

34•MrBruh•2h ago•8 comments

GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell

https://www.wafer.ai/blog/glm52-amd

116•latchkey•5h ago•31 comments

Steam Controller Auto-Charge – pilot to magnetic charging puck using CV

https://github.com/FossPrime/Steam-Controller-Auto-Charge

82•zdw•4h ago•19 comments

The circuit that lets your brain think and see

https://www.engineering.columbia.edu/about/news/circuit-lets-your-brain-think-and-see

50•hhs•4h ago•8 comments

SearXNG: A free internet metasearch engine

https://github.com/searxng/searxng

144•theanonymousone•6h ago•43 comments

The firefighting system of the Van der Heyden brothers in 17th century Amsterdam

https://worksinprogress.co/issue/how-amsterdam-invented-the-fire-department/

52•zdw•4h ago•12 comments

Jamesob's guide to running SOTA LLMs locally

https://github.com/jamesob/local-llm

295•livestyle•12h ago•128 comments

Soatok's Informal Guide to Threat Models

https://soatok.blog/2026/06/30/soatoks-informal-guide-to-threat-models/

38•zdw•2h ago•2 comments

Show HN: A statically typed, cross-platform, easily bootstrappable build system

https://github.com/rochus-keller/BUSY/

15•Rochus•3d ago•3 comments

Applied Category Theory Course (2018)

https://math.ucr.edu/home/baez/act_course/index.html

67•measurablefunc•6h ago•7 comments

New serious vulnerabilities spiked around release of Claude Mythos Preview

https://epoch.ai/data-insights/cve-severity-spike

59•cubefox•5h ago•12 comments

Espionage Against the European Parliament

https://citizenlab.ca/research/member-of-committee-investigating-spyware-hacked-with-pegasus/

290•ledoge•6h ago•69 comments

Costco is the anti-Amazon

https://phenomenalworld.org/analysis/the-anti-amazon/

331•bookofjoe•11h ago•304 comments

Dispersion loss counteracts embedding condensation in small language models

https://chenliu-1996.github.io/projects/LM-Dispersion/

26•E-Reverance•4h ago•5 comments

Scientists discover guidance system for migratory songbirds

https://news.exeter.ac.uk/faculty-of-environment-science-and-economy/scientists-discover-guidance...

17•bit_economist•3h ago•4 comments

Infracost (YC W21) Is Hiring a Marketing Lead to Shift FinOps Left

https://www.ycombinator.com/companies/infracost/jobs/YTJcFwr-marketing-lead

1•akh•6h ago

Factories are just rooms

https://interconnected.org/home/2026/07/03/factories

203•arbesman•11h ago•80 comments

International chess federation sanctions Kramnik

https://www.fide.com/fide-ethics-disciplinary-commission-issues-a-decision-in-case-involving-gm-v...

135•DarkContinent•10h ago•69 comments

We put a Redis server inside our runtime

https://encore.dev/blog/redis-runtime

27•eandre•2d ago•9 comments

Software, from First Principles

https://fazamhd.com/mental-models/software/

49•faza•5h ago•10 comments

Hunting a 16-year-old SQLite WAL bug with TLA+

https://ubuntu.com/blog/hunting-a-16-year-old-sqlite-bug-with-tla-is-dqlite-affected

177•peterparker204•3d ago•16 comments

Africans Are Turning to Starlink

https://www.economist.com/middle-east-and-africa/2026/07/02/africans-are-turning-to-starlink

121•bookofjoe•6h ago•127 comments

FreeBSD ate my RAM

https://crocidb.com/post/freebsd-ate-my-ram/

92•theanonymousone•8h ago•40 comments

Wordgard: In-browser rich-text editor from the creator of ProseMirror

https://wordgard.net/

273•indy•18h ago•90 comments

PostgreSQL and the OOM killer: Why we use strict memory overcommit

https://www.ubicloud.com/blog/postgresql-and-the-oom-killer-why-we-use-strict-memory-overcommit

165•furkansahin•14h ago•88 comments

GitFut – Your GitHub stats turned into a World-Cup-style player card

https://gitfut.com

24•redbell•4h ago•13 comments

Notes from Building Tinkerfont

https://mighil.com/notes-from-building-tinkerfont

11•surprisetalk•2d ago•0 comments

A peek into Reddit's anti-spam internals

https://lyra.horse/blog/2026/06/reddit-spam-internals/

163•OuterVale•6d ago•59 comments