frontpage.

FrontierCode

https://cognition.ai/blog/frontier-code
26•streamer45•1h ago

Comments

swyx•27m ago
:wave: i was on the team! AMA.

some headlines

- 3000 rubrics on code quality. First benchmark to measure: "would this code actually get merged?"

- 20+ expert open-source maintainers created tasks on their own repos to capture their opinions & taste.

- 40+ hours of real human work per task. total 1000+ hours of real life software maintainer work captured in dataset

- results in 81% lower false positive rate than SWE-Bench Pro

- High quality bar: many QA stages & each task manually reviewed by Cognition researchers (examples in post)

Opus 4.8 scores 13% on FrontierCode Diamond.

one of my goals was also to datamine interesting stuff even on the easy tasks. for example, if you squint you can see the answer to "WTF Happened in late 2025" with coding models: https://x.com/swyx/status/2064081945567580323

great_psy•24m ago
How do you measure quality at scale? Is there another model that determines whether it adheres to the codebase's standards?

swyx•15m ago
see Beyond Unit Tests and Novel Grading Methods in TFA.

i think something like ~60% are LLM-as-judge rubrics and the rest are as described. every rubric is validated by a maintainer. 3000 rubrics total

singpolyma3•19m ago
Since no one knows or can agree on what "code quality" is and we can't measure it for human output, I'm dubious about measuring it for LLMs

fHr•15m ago
babe wake up another eval dropped

Give your friends a chance to abandon you

https://www.bitsofwonder.co/p/give-your-friends-a-chance-to-abandon
1•eatitraw•1m ago•0 comments

Freiburg-based startup, is entering the market with caloric cooling technology

https://www.ipm.fraunhofer.de/en/press-publications/press-releases/Qurie-GmbH-founding-electrocal...
1•doener•1m ago•0 comments

My Rust SIMD code was silently running as scalar Part 2

https://coloneltoad.substack.com/p/why-my-windows-benchmarks-were-lying
1•tolugenius•3m ago•0 comments

LazyWM – Tiling window manager for Windows 11 / 10

https://kojcinemir.github.io/lazywm/index.html
1•pozitronij•7m ago•0 comments

Core AI Models

https://github.com/apple/coreai-models
1•gok•10m ago•0 comments

YouTube thumbnails, titles, tags, and editable headline text in one click

https://loop-tube.com/
1•yashness•11m ago•0 comments

Animations in my Google Slides game are SUPER tedious and long

https://old.reddit.com/r/IndieDev/comments/1u0jcg8/animations_in_my_google_slides_game_are_super/
1•greazy•14m ago•0 comments

Project Brain – Persistent Second Brain for Claude Code, Windsurf, Cursor, etc.

https://raw.githubusercontent.com/OoneBreath/claude-code-project-brain/main/docs/demo.gif
1•Slav_fixflex•14m ago•0 comments

Attacking Ruby on Rails Applications (2016)

https://phrack.org/issues/69/12#article
2•downbad_•15m ago•0 comments

Discover container machines – WWDC26 – Videos Developer

https://developer.apple.com/videos/play/wwdc2026/389/
2•tambourine_man•15m ago•0 comments

Brit fraudsters using AI to doctor 'evidence' in motor insurance claims

https://www.theregister.com/ai-and-ml/2026/06/08/motor-insurance-frausters-abusing-ai-to-exaggera...
1•Bender•21m ago•0 comments

The Nerdy Escorts Cashing in on Silicon Valley's AI Boom

https://www.forbes.com/sites/annatong/2026/06/07/the-nerdy-escorts-cashing-in-on-silicon-valleys-...
2•doener•22m ago•2 comments

Signal, DuckDuckGo among firms weighing Canada exit over lawful access bill

https://globalnews.ca/news/11886905/lawful-access-bill-c-22-companies-services-canada/
1•Cider9986•23m ago•0 comments

App Trust Preview – Quick Look Safety Reports for macOS Apps

https://apptrustpreview.com/
1•IGHOR•23m ago•0 comments

Python JIT compiler project under threat after steering council

https://www.theregister.com/devops/2026/06/08/python-jit-compiler-may-be-removed/5252079
1•Bender•23m ago•1 comment

WWDC26: Discover container machines – Apple [video]

https://www.youtube.com/watch?v=Q2xD6zkDz-s
2•tambourine_man•25m ago•0 comments

Linux EFS File-System May Have New Maintainer, Or It Might Just Get Removed

https://www.phoronix.com/news/Linux-EFS-File-System-2026
1•Bender•26m ago•1 comment

OpenAI Kicks Off IPO Process in Test of Investor Appetite for Top AI Labs

https://www.wsj.com/tech/ai/openai-kicks-off-ipo-process-in-test-of-investor-appetite-for-top-ai-...
3•toephu2•27m ago•0 comments

Apple Watch for Your Kids

https://www.apple.com/apple-watch-for-your-kids/
2•antfarm•27m ago•0 comments

Pythagora-Io/GPT-Pilot Compromised Credential Stealer Blocked by Python Linter

https://github.com/Pythagora-io/gpt-pilot/issues/1182
1•kurmiashish•31m ago•1 comment

MoE expert co-activations: Reordering inputs yields easy throughput gains

https://blog.doubleword.ai/moe-expert-coactivations
1•kkm•33m ago•0 comments

AI and the Redmonk Language Rankings

https://briandouglas.ie/redmonk-language-rankings-2026/
1•coneonthefloor•33m ago•0 comments

SwiftUI Only Makes It Easy to Develop Bad Apps

https://daringfireball.net/2026/06/swiftui_only_makes_it_easy_to_develop_bad_apps
1•robenkleene•33m ago•0 comments

How PICO-8 unlocked Frédéric Souchu's dreams

https://nanark.medium.com/a-pico-8-story-how-the-fantasy-console-unlocked-fr%C3%A9d%C3%A9ric-souc...
1•atan2•38m ago•0 comments

We brought Arc-style profile switching to Dia

https://diarc.app
1•0x6A75616E•38m ago•1 comment

OpenAI Submits S-1 Draft to SEC

https://openai.com/index/openai-submits-confidential-s-1/
40•hackerBanana•38m ago•11 comments

The sample efficiency black hole

https://www.dwarkesh.com/p/the-sample-efficiency-black-hole
1•crescit_eundo•40m ago•0 comments

Mathematics Is Out There

https://aeon.co/essays/for-sergiu-klainerman-maths-is-a-fact-to-be-divined
3•cainxinth•40m ago•0 comments

OpenAI Files S-1

https://twitter.com/openainewsroom/status/2064094175541461220
6•davidbarker•44m ago•0 comments

OpenAI Confidentially Files for IPO

https://www.cnbc.com/2026/06/08/openai-confidentially-files-for-ipo-prepping-wall-street-for-ai-d...
22•rvz•45m ago•0 comments