frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

DeepCodeBench: Real-World Codebase Understanding by Q&A Benchmarking

https://www.qodo.ai/blog/deepcodebench-real-world-codebase-understanding-by-qa-benchmarking/
84•blazercohen•4mo ago

Comments

four_fifths•4mo ago
If you do a bit of digging into most of the popular benchmarks that all the big labs report on, you'll see pretty quickly that they have almost zero correlation with any real world tasks.

The approach that they're taking here of working backwards from a OS repo pull request and reverse engineering a question is unusually well thought out for a benchmark.

I haven't dug into more of the dataset questions yet, but the example they give in the blog post for the question generated for Hugging Face Transformer's repo gives me hope that this could actually be a solid benchmark:

> How do the fast image and video processor base classes prevent shared mutable state when instantiating multiple instances?

qsort•4mo ago
I particularly like their usage of LLM-as-a-judge. They don't go "hey chatgpt, sort these from best to worst based on vibes", rather they extract a set of ground truths and check how the answer compares, a task that SOTA LLM can do kind of reliably. It's a very smart way to circumvent the problems introduced by pure LLM-as-a-judge methods.
Tiberium•4mo ago
Seems like an interesting benchmark, but my takeaway from the results is that Codex is almost as good enough as their custom solution (no mention of the underlying model) and only requires a $20 ChatGPT subscription to start using it (of course with limits), without having to shell out $$$ for an enterprise Qodo plan to use Qodo Aware - https://www.qodo.ai/products/qodo-aware/. The "free" plan in Qodo Aware only lets users work with 100 hand-picked open-source repositories.

It also would be nice if the article clearly mentioned what specific model settings were used for Claude Code and Codex. Both of those allow changing the reasoning level, so if the benchmark was done using the default settings, it seems a little unfair - they have a result of their own agent at high reasoning as a separate entry.

esafak•4mo ago
This is in relation to their newly-announced "context agent": https://www.qodo.ai/blog/introducing-qodo-aware-deep-codebas...
asdev•4mo ago
Agentic search is good enough for code search and code understanding, indexing/fancy techniques will only slight outperform for a lot more effort

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
451•klaussilveira•6h ago•109 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
791•xnx•12h ago•481 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
152•isitcontent•6h ago•15 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
145•dmpetrov•7h ago•63 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
19•matheusalmeida•1d ago•0 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
46•quibono•4d ago•4 comments

A century of hair samples proves leaded gas ban worked

https://arstechnica.com/science/2026/02/a-century-of-hair-samples-proves-leaded-gas-ban-worked/
84•jnord•3d ago•8 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
257•vecti•8h ago•120 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
192•eljojo•9h ago•127 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
321•aktau•13h ago•155 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
317•ostacke•12h ago•85 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
403•todsacerdoti•14h ago•218 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
328•lstoll•13h ago•237 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
19•kmm•4d ago•1 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
50•phreda4•6h ago•8 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
110•vmatsiiako•11h ago•34 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
189•i5heu•9h ago•132 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
149•limoce•3d ago•79 comments

Make Trust Irrelevant: A Gamer's Take on Agentic AI Safety

https://github.com/Deso-PK/make-trust-irrelevant
7•DesoPK•1h ago•3 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
240•surprisetalk•3d ago•31 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
985•cdrnsf•16h ago•417 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
21•gfortaine•4h ago•2 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
43•rescrv•14h ago•17 comments

I'm going to cure my girlfriend's brain tumor

https://andrewjrod.substack.com/p/im-going-to-cure-my-girlfriends-brain
58•ray__•3h ago•14 comments

Evaluating and mitigating the growing risk of LLM-discovered 0-days

https://red.anthropic.com/2026/zero-days/
36•lebovic•1d ago•11 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
5•gmays•1h ago•0 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
77•antves•1d ago•57 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
40•nwparker•1d ago•10 comments

The Oklahoma Architect Who Turned Kitsch into Art

https://www.bloomberg.com/news/features/2026-01-31/oklahoma-architect-bruce-goff-s-wild-home-desi...
20•MarlonPro•3d ago•4 comments

How virtual textures work

https://www.shlom.dev/articles/how-virtual-textures-really-work/
28•betamark•13h ago•23 comments