frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

DeepCodeBench: Real-World Codebase Understanding by Q&A Benchmarking

https://www.qodo.ai/blog/deepcodebench-real-world-codebase-understanding-by-qa-benchmarking/
84•blazercohen•5mo ago

Comments

four_fifths•4mo ago
If you do a bit of digging into most of the popular benchmarks that all the big labs report on, you'll see pretty quickly that they have almost zero correlation with any real world tasks.

The approach that they're taking here of working backwards from a OS repo pull request and reverse engineering a question is unusually well thought out for a benchmark.

I haven't dug into more of the dataset questions yet, but the example they give in the blog post for the question generated for Hugging Face Transformer's repo gives me hope that this could actually be a solid benchmark:

> How do the fast image and video processor base classes prevent shared mutable state when instantiating multiple instances?

qsort•4mo ago
I particularly like their usage of LLM-as-a-judge. They don't go "hey chatgpt, sort these from best to worst based on vibes", rather they extract a set of ground truths and check how the answer compares, a task that SOTA LLM can do kind of reliably. It's a very smart way to circumvent the problems introduced by pure LLM-as-a-judge methods.
Tiberium•4mo ago
Seems like an interesting benchmark, but my takeaway from the results is that Codex is almost as good enough as their custom solution (no mention of the underlying model) and only requires a $20 ChatGPT subscription to start using it (of course with limits), without having to shell out $$$ for an enterprise Qodo plan to use Qodo Aware - https://www.qodo.ai/products/qodo-aware/. The "free" plan in Qodo Aware only lets users work with 100 hand-picked open-source repositories.

It also would be nice if the article clearly mentioned what specific model settings were used for Claude Code and Codex. Both of those allow changing the reasoning level, so if the benchmark was done using the default settings, it seems a little unfair - they have a result of their own agent at high reasoning as a separate entry.

esafak•4mo ago
This is in relation to their newly-announced "context agent": https://www.qodo.ai/blog/introducing-qodo-aware-deep-codebas...
asdev•4mo ago
Agentic search is good enough for code search and code understanding, indexing/fancy techniques will only slight outperform for a lot more effort

Show HN: HalalCodeCheck – Verify food ingredients offline

https://halalcodecheck.com/
1•pythonbase•2m ago•0 comments

Student makes cosmic dust in a lab, shining a light on the origin of life

https://www.cnn.com/2026/02/06/science/cosmic-dust-discovery-life-beginnings
1•Brajeshwar•4m ago•0 comments

In the Australian outback, we're listening for nuclear tests

https://www.abc.net.au/news/2026-02-08/australian-outback-nuclear-tests-listening-warramunga-faci...
1•defrost•4m ago•0 comments

'Hermès orange' iPhone sparks Apple comeback in China

https://www.ft.com/content/e2d78d04-7368-4b0c-abd5-591c03774c46
1•Brajeshwar•5m ago•0 comments

Show HN: Goxe 19k Logs/S on an I5

https://github.com/DumbNoxx/goxe
1•nxus_dev•6m ago•1 comments

The async builder pattern in Rust

https://blog.yoshuawuyts.com/async-finalizers/
1•fanf2•7m ago•0 comments

(Golang) Self referential functions and the design of options

https://commandcenter.blogspot.com/2014/01/self-referential-functions-and-design.html
1•hambes•8m ago•0 comments

Show HN: Model Training Memory Simulator

https://czheo.github.io/2026/02/08/model-training-memory-simulator/
1•czheo•10m ago•0 comments

Claude Code Controller

https://github.com/The-Vibe-Company/claude-code-controller
1•shidhincr•14m ago•0 comments

Software design is now cheap

https://dottedmag.net/blog/cheap-design/
1•dottedmag•14m ago•0 comments

Show HN: Are You Random? – A game that predicts your "random" choices

https://github.com/OvidijusParsiunas/are-you-random
1•ovisource•19m ago•0 comments

Poland to probe possible links between Epstein and Russia

https://www.reuters.com/world/poland-probe-possible-links-between-epstein-russia-pm-tusk-says-202...
1•doener•27m ago•0 comments

Effectiveness of AI detection tools in identifying AI-generated articles

https://www.ijoms.com/article/S0901-5027(26)00025-1/fulltext
2•XzetaU8•33m ago•0 comments

Warsaw Circle

https://wildtopology.com/bestiary/warsaw-circle/
1•hackandthink•34m ago•0 comments

Reverse Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
1•pacod•39m ago•0 comments

The AI4Agile Practitioners Report 2026

https://age-of-product.com/ai4agile-practitioners-report-2026/
1•swolpers•40m ago•0 comments

Digital Independence Day

https://di.day/
1•pabs3•44m ago•0 comments

What a bot hacking attempt looks like: SQL injections galore

https://old.reddit.com/r/vibecoding/comments/1qz3a7y/what_a_bot_hacking_attempt_looks_like_i_set_up/
1•cryptoz•45m ago•0 comments

Show HN: FlashMesh – An encrypted file mesh across Google Drive and Dropbox

https://flashmesh.netlify.app
1•Elevanix•46m ago•0 comments

Show HN: AgentLens – Open-source observability and audit trail for AI agents

https://github.com/amitpaz1/agentlens
1•amit_paz•47m ago•0 comments

Show HN: ShipClaw – Deploy OpenClaw to the Cloud in One Click

https://shipclaw.app
1•sunpy•49m ago•0 comments

Unlock the Power of Real-Time Google Trends Visit: Www.daily-Trending.org

https://daily-trending.org
1•azamsayeedit•51m ago•1 comments

Explanation of British Class System

https://www.youtube.com/watch?v=Ob1zWfnXI70
1•lifeisstillgood•52m ago•0 comments

Show HN: Jwtpeek – minimal, user-friendly JWT inspector in Go

https://github.com/alesr/jwtpeek
1•alesrdev•55m ago•0 comments

Willow – Protocols for an uncertain future [video]

https://fosdem.org/2026/schedule/event/CVGZAV-willow/
1•todsacerdoti•56m ago•0 comments

Feedback on a client-side, privacy-first PDF editor I built

https://pdffreeeditor.com/
1•Maaz-Sohail•1h ago•0 comments

Clay Christensen's Milkshake Marketing (2011)

https://www.library.hbs.edu/working-knowledge/clay-christensens-milkshake-marketing
2•vismit2000•1h ago•0 comments

Show HN: WeaveMind – AI Workflows with human-in-the-loop

https://weavemind.ai
9•quentin101010•1h ago•2 comments

Show HN: Seedream 5.0: free AI image generator that claims strong text rendering

https://seedream5ai.org
1•dallen97•1h ago•0 comments

A contributor trust management system based on explicit vouches

https://github.com/mitchellh/vouch
2•admp•1h ago•1 comments