DeepCodeBench: Real-World Codebase Understanding by Q&A Benchmarking

https://www.qodo.ai/blog/deepcodebench-real-world-codebase-understanding-by-qa-benchmarking/
84•blazercohen•4mo ago

Comments

four_fifths•4mo ago
If you do a bit of digging into most of the popular benchmarks that all the big labs report on, you'll see pretty quickly that they have almost zero correlation with any real-world tasks.

The approach they're taking here, working backwards from an open-source repo's pull request and reverse engineering a question from it, is unusually well thought out for a benchmark.

I haven't dug into more of the dataset questions yet, but the example they give in the blog post, a question generated for Hugging Face's Transformers repo, gives me hope that this could actually be a solid benchmark:

> How do the fast image and video processor base classes prevent shared mutable state when instantiating multiple instances?

qsort•4mo ago
I particularly like their usage of LLM-as-a-judge. They don't go "hey chatgpt, sort these from best to worst based on vibes"; rather, they extract a set of ground truths and check how the answer compares against them, a task that SOTA LLMs can do fairly reliably. It's a very smart way to circumvent the problems introduced by pure LLM-as-a-judge methods.
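
To make the idea concrete, here is a minimal sketch of fact-based grading, assuming the judge is only ever asked one narrow question per ground-truth fact. This is not Qodo's implementation: `score_answer`, `Verdict`, and `keyword_judge` are hypothetical names, the facts and answer in the usage lines are invented, and the keyword matcher is a toy stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical judge interface: does `answer` state `fact`? (yes/no)
Judge = Callable[[str, str], bool]


@dataclass
class Verdict:
    fact: str
    supported: bool


def score_answer(
    answer: str, facts: Sequence[str], judge: Judge
) -> tuple[float, list[Verdict]]:
    """Return the fraction of ground-truth facts the answer covers, plus per-fact verdicts."""
    verdicts = [Verdict(fact, judge(fact, answer)) for fact in facts]
    if not verdicts:
        # No ground truth to compare against, so nothing can be credited.
        return 0.0, verdicts
    recall = sum(v.supported for v in verdicts) / len(verdicts)
    return recall, verdicts


def keyword_judge(fact: str, answer: str) -> bool:
    """Toy stand-in for an LLM call: every word of the fact must appear in the answer."""
    answer_words = set(answer.lower().split())
    return all(word in answer_words for word in fact.lower().split())


# Invented example data, for illustration only.
facts = ["deep copy", "frozen dataclass"]
answer = "Each instance makes a deep copy of its defaults"
recall, verdicts = score_answer(answer, facts, keyword_judge)
print(recall, [(v.fact, v.supported) for v in verdicts])
```

The point of the design is that the judge never ranks or rates whole answers; it only makes small, checkable decisions, and the score is plain arithmetic over them.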
Tiberium•4mo ago
Seems like an interesting benchmark, but my takeaway from the results is that Codex is almost as good as their custom solution (no mention of the underlying model) and only requires a $20 ChatGPT subscription to start using it (with limits, of course), without having to shell out $$$ for an enterprise Qodo plan to use Qodo Aware - https://www.qodo.ai/products/qodo-aware/. The "free" plan in Qodo Aware only lets users work with 100 hand-picked open-source repositories.

It would also be nice if the article clearly stated which model settings were used for Claude Code and Codex. Both allow changing the reasoning level, so if the benchmark was run with their default settings, it seems a little unfair: they list their own agent at high reasoning as a separate entry.

esafak•4mo ago
This is in relation to their newly-announced "context agent": https://www.qodo.ai/blog/introducing-qodo-aware-deep-codebas...
asdev•4mo ago
Agentic search is good enough for code search and code understanding; indexing and other fancy techniques will only slightly outperform it, for a lot more effort.