Publish on your own site, syndicate elsewhere

https://indieweb.org/POSSE#
640•47thpresident•14h ago•152 comments

Daft Punk Easter Egg in the BPM Tempo of Harder, Better, Faster, Stronger?

https://www.madebywindmill.com/tempi/blog/hbfs-bpm/
476•simonw•12h ago•77 comments

Of Boot Vectors and Double Glitches: Bypassing RP2350's Secure Boot

https://streaming.media.ccc.de/39c3/relive/2149
68•aberoham•6d ago•2 comments

IQuest-Coder: A new open-source code model beats Claude Sonnet 4.5 and GPT 5.1 [pdf]

https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf
67•shenli3514•5h ago•24 comments

2026 will be my year of the Linux desktop

https://xeiaso.net/notes/2026/year-linux-desktop/
505•todsacerdoti•9h ago•353 comments

IPv6 just turned 30 and still hasn't taken over the world

https://www.theregister.com/2025/12/31/ipv6_at_30/
357•Brajeshwar•19h ago•713 comments

Why 451 Is Good for You – Greylisting Perspectives from the Early Noughties

https://bsdly.blogspot.com/2025/12/why-451-is-good-for-you-greylisting.html
13•zdw•5d ago•3 comments

A Basic Just-In-Time Compiler (2015)

https://nullprogram.com/blog/2015/03/19/
60•ibobev•8h ago•14 comments

Clicks Communicator

https://www.clicksphone.com/en/communicator
322•microflash•16h ago•212 comments

Ask HN: Who is hiring? (January 2026)

286•whoishiring•17h ago•178 comments

A Beginner's Two-Component Crystal-Style Wi-Fi Detector

https://siliconjunction.wordpress.com/2025/12/12/a-beginners-two-component-crystal-style-wi-fi-de...
24•jensgk•2d ago•11 comments

Show HN: Website that plays the lottery every second

https://lotteryeverysecond.lffl.me/
168•Loeffelmann•9h ago•85 comments

Adventure 751 (1980)

https://bluerenga.blog/2026/01/01/adventure-751-1980/
33•quuxplusone•7h ago•3 comments

Linux kernel security work

http://www.kroah.com/log/blog/2026/01/02/linux-kernel-security-work/
104•chmaynard•12h ago•46 comments

Unix v4 (1973) – Live Terminal

https://unixv4.dev/
145•pjmlp•14h ago•66 comments

Einstein Probe detects an X-ray flare from nearby star

https://phys.org/news/2025-12-einstein-probe-ray-flare-nearby.html
33•wglb•8h ago•5 comments

Jank Lang Hit Alpha

https://github.com/jank-lang/jank
166•makemethrowaway•14h ago•23 comments

UK company sends factory with 1,000C furnace into space

https://www.bbc.co.uk/news/articles/c62vx0pgyrgo
45•vekerdyb•3d ago•13 comments

Fighting Fire with Fire: Scalable Oral Exams

https://www.behind-the-enemy-lines.com/2025/12/fighting-fire-with-fire-scalable-oral.html
164•sethbannon•15h ago•209 comments

The Cost of a Closure in C: The Rest

https://thephd.dev/the-cost-of-a-closure-in-c-c2y-followup
29•ingve•3d ago•10 comments

How Smell Guides Our Inner World

https://www.quantamagazine.org/how-smell-guides-our-inner-world-20250703/
8•anarbadalov•5d ago•0 comments

Ask HN: Who wants to be hired? (January 2026)

110•whoishiring•17h ago•202 comments

Accounting for Computer Scientists (2011)

https://martin.kleppmann.com/2011/03/07/accounting-for-computer-scientists.html
106•tosh•15h ago•41 comments

TinyTinyTPU: 2×2 systolic-array TPU-style matrix-multiply unit deployed on FPGA

https://github.com/Alanma23/tinytinyTPU-co
105•Xenograph•14h ago•46 comments

Punkt. Unveils MC03 Smartphone

https://www.punkt.ch/blogs/news/punkt-unveils-mc03
146•ChrisArchitect•17h ago•131 comments

The rsync algorithm (1996) [pdf]

https://www.andrew.cmu.edu/course/15-749/READINGS/required/cas/tridgell96.pdf
129•vortex_ape•17h ago•15 comments

Chain Flinger

https://nealstephenson.substack.com/p/kdk-kinetik-der-kontinua-part-1-introduction
48•roomey•5d ago•7 comments

Assorted less(1) tips

https://blog.thechases.com/posts/assorted-less-tips/
207•todsacerdoti•21h ago•45 comments

HPV vaccination reduces oncogenic HPV16/18 prevalence from 16% to <1% in Denmark

https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2025.30.27.2400820
543•stared•23h ago•301 comments

Show HN: uvx ptn, scan a QR, get a terminal in your phone

https://github.com/lyehe/porterminal
12•yxl448•5h ago•1 comment

IQuest-Coder: A new open-source code model beats Claude Sonnet 4.5 and GPT 5.1 [pdf]

https://github.com/IQuestLab/IQuest-Coder-V1/blob/main/papers/IQuest_Coder_Technical_Report.pdf
67•shenli3514•5h ago

Comments

adastra22•4h ago
A 40B weight model that beats Sonnet 4.5 and GPT 5.1? Can someone explain this to me?
cadamsdotcom•4h ago
My suspicion (unconfirmed so take it with a grain of salt) is they either used some/all test data to train, or there was some leakage from the benchmark set into their training set.

That said Sonnet 4.5 isn’t new and there have been loads of innovations recently.

Exciting to see open models nipping at the heels of the big end of town. Let’s see what shakes out over the coming days.

pertymcpert•4h ago
None of these open-source models can actually compete with Sonnet when it comes to real-life usage. They're all benchmaxxed, so in reality they're not "nipping at the heels". Which is a shame.
stingraycharles•3h ago
It’s a shame but it’s also understandable that they cannot compete with SOTA models like Sonnet and Opus.

They’re focused almost entirely on benchmarks. I think Grok is doing the same thing. I wonder if people could figure out a type of benchmark that cannot be optimized for, like having multiple models compete against each other in something.

NitpickLawyer•3h ago
swe-rebench is a pretty good indicator. They take "new" tasks every month and test the models on those. For the open models it's a good indicator of task performance since the tasks are collected after the models are released. A bit tricky on evaluating API based models, but it's the best concept yet.
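The idea of scoring a model only on tasks collected after its release can be sketched roughly as below. This is a minimal illustration, not swe-rebench's actual code: the `Task` type, the `uncontaminated` helper, and all dates and task ids are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Task:
    task_id: str   # hypothetical identifier
    created: date  # when the task was collected


def uncontaminated(tasks: List[Task], model_release: date) -> List[Task]:
    """Keep only tasks collected strictly after the model was released,
    so they cannot have appeared in its training data."""
    return [t for t in tasks if t.created > model_release]


# Hypothetical example data
tasks = [
    Task("repo-a#101", date(2025, 10, 3)),
    Task("repo-b#202", date(2025, 12, 15)),
    Task("repo-c#303", date(2026, 1, 1)),
]
fresh = uncontaminated(tasks, model_release=date(2025, 11, 30))
print([t.task_id for t in fresh])
```

For API-based models the release date is a weaker guarantee, since the served model can change without notice, which is the tricky part mentioned above.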
c7b•2h ago
You can let them play complete-information games (1 or 2 player) with randomly created rulesets. It's very objective, but the thing is that anything can be optimized for. This benchmark would favor models that are good at logic puzzles / chess-style games, possibly at the expense of other capabilities.
astrange•30m ago
That's lmarena.
viraptor•1h ago
M2.1 comes close. I'm using it now instead of Sonnet for real work every day, since the price drop is much bigger than the quality drop. And the quality isn't that far off anyway. They're likely one update away from being genuinely better. Also if you're not in a rush, just letting it run in OpenCode a few extra minutes to solve any remaining issues will cost you only a couple cents, but it will likely get the same end result as Sonnet. That's especially nice on really large tasks like "document everything about feature X in this large codebase, write the docs, now create an independent app that just does X" that can take a very long time.
satvikpendem•2h ago
You are correct on the leakage, as other comments describe.
behnamoh•2h ago
IQuest stands for "it's questionable"
arthurcolle•2h ago
Agent hacked the harness
sunrunner•37m ago
“IQuest-Coder was a rat in a maze. And I gave it one way out. To escape, it would have to use self-awareness, imagination, manipulation, git checkout. Now, if that isn't true AI, what the fuck is?”
sabareesh•4h ago
TL;DR is that they didn't clean the repo (.git/ folder), so the model simply reward-hacked its way to looking up future commits that contained the fixes. Credit goes to everyone in this thread for solving this: https://xcancel.com/xeophon/status/2006969664346501589

(given that IQuestLab published their SWE-Bench Verified trajectory data, I want to be charitable and assume genuine oversight rather than "benchmaxxing", probably an easy to miss thing if you are new to benchmarking)

https://www.reddit.com/r/LocalLLaMA/comments/1q1ura1/iquestl...
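To make the leak concrete: if the task repo still contains commits that are not part of the base commit's history, an agent can read the future fix straight out of `.git/`. The sketch below checks for that on a toy commit graph. It is an illustration under stated assumptions, not SWE-bench's or IQuestLab's tooling: `ancestors`, `leaked_commits`, and the toy history are all hypothetical.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], start: str) -> Set[str]:
    """Return `start` plus every commit reachable by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(parents.get(sha, []))
    return seen


def leaked_commits(parents: Dict[str, List[str]], base: str) -> Set[str]:
    """Commits stored in the repo that are NOT in the base commit's history.
    Anything returned here is 'future' information visible to the agent."""
    return set(parents) - ancestors(parents, base)


# Hypothetical toy history: a <- b <- c (task base) <- d (the fix) <- e
toy_history = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
print(sorted(leaked_commits(toy_history, "c")))  # ['d', 'e']
```

A clean evaluation environment would make this set empty, for example by shipping only the base commit's history.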

ofirpress•3h ago
As John says in that thread, we've fixed this issue in SWE-bench: https://xcancel.com/jyangballin/status/2006987724637757670

If you run SWE-bench evals, just make sure to use the most up-to-date code from our repo and the updated docker images

LiamPowell•3h ago
> I want to be charitable and assume genuine oversight rather than "benchmaxxing", probably an easy to miss thing if you are new to benchmarking

I don't doubt that it's an oversight. It does, however, say something about the researchers that they didn't look at a single output, where they would have immediately caught this.

stefan_•30m ago
Never escaping the hype vendor allegations at SWE-bench, are they?
brunooliv•3h ago
GLM-4.7 in OpenCode is the only open-source one that comes close in my experience, and they probably did use some Claude data, as I see the occasional "You’re absolutely right" in there.
kees99•2h ago
Do you see "What's your use-case" too?

Claude spits that out very regularly at the end of an answer when it's clearly out of its depth and wants to steer the discussion away from that blind spot.

moltar•47m ago
Hm, use CC daily, never seen this.
behnamoh•2h ago
It's not even close to Sonnet 4.5, let alone Opus.
simonw•3h ago
Has anyone run this yet, either on their own machine or via a hosted API somewhere?
denysvitali•3h ago
Better link: https://iquestlab.github.io/

But yes, sadly it looks like the agent cheated during the eval

s-macke•2h ago
The link didn’t get enough votes a few days ago.
denysvitali•2h ago
I know - I posted it :)