ProgramBench: Can Language Models Rebuild Programs from Scratch?

https://arxiv.org/abs/2605.03546
19•jonbaer•3h ago

Comments

vatsachak•1h ago
In before "but they did not use my agent swarm"
makerofthings•1h ago
It’s the annoying thing about AI. If it works, the AI is magic. If it doesn’t work, you’re using it wrong.
NitpickLawyer•47s ago
So, would you change your view if someone else runs this bench w/ a different harness and gets better results?
red75prime•8m ago
In science N=1 is statistically insignificant. In business it might mean that you have a product.
keyle•1h ago
How long until AI is not even writing code but producing machine code?

Think about it, all these compilers, tooling, what a waste!

I imagine a future where chipset makers will provide a model you can just prompt to "act upon that chipset" and voila, "You're absolutely right! Here is your binary."

We won't be developers, we won't be devops, we'll be rollmops! /s

_pdp_•1h ago
Coding agents can write ASM. But if you mean writing the actual byte-code, that will require a very different approach, at a very different level of abstraction, which LLMs are not designed for. Keep in mind that all LLMs are trained first on text and then fine-tuned on code.
keyle•52m ago
Good point! Long live ASM! Wasm everything!!1 /jk
quinnjh•1h ago
My hunch is that it would take years of hundreds of thousands of developers working with machine code, posting Stack Overflow questions about machine code, and publishing GitHub repos written in it with documentation. That's all the free labor LLMs leveraged to use high-level langs.

>We won't be developers, we won't be devops, we'll be modelops! /s

I can still see this happening with higher-level langs. The thing is, the compiler is not replaced in the training data; more likely LLMs will give rise to semideterministic layers on top of the compilers.

I could see Nvidia achieving this first, given how nice the devex is with CUDA.

osti•41m ago
I heard they are already proficient at assembly languages.
_pdp_•1h ago
I am not surprised but this one sticks out...

> Models favor monolithic, single-file implementations that diverge sharply from human-written code.

Well, all of our code is monolithic, with some files close to 20K lines of code, and we do use coding agents - not for the original code, but as of late. I've always had the hunch that splitting everything into tiny files does not improve AI coding agent performance, although that feels counterintuitive given model context constraints.

To me the important parts of a program should be clustered together so the implementation is obvious. Scattering the implementation across various files all over the source tree does not help much in building the mental model.

That also closely matches how software used to be written in the past.

BurningPenguin•40m ago
Kinda surprising to me, since I had some trouble with Cursor & Co. once a file went over ~800 lines. It repeatedly failed to edit it, until I split it up into multiple logical components. As it should have been from the beginning...

Though, it was some time ago, so things might have improved?

_pdp_•17m ago
In VSCode, basically any model can edit the 20K-line file without any issues. The coding harness does not read the entire file at once, though. It reads it in chunks, so the file size does not really matter. What matters is how close together the things the agent needs for the edit are.
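
The chunked reading described above can be sketched roughly as follows. This is a minimal illustration, not the code of VSCode or any particular harness; the function name `read_window` and its parameters are hypothetical.

```python
from itertools import islice


def read_window(path, start_line, num_lines):
    """Return up to num_lines lines starting at start_line (1-indexed).

    The file is streamed line by line, so only the requested window is
    kept in memory, however large the file is.
    """
    if start_line < 1 or num_lines < 0:
        raise ValueError("start_line must be >= 1 and num_lines >= 0")
    with open(path, encoding="utf-8", errors="replace") as fh:
        window = islice(fh, start_line - 1, start_line - 1 + num_lines)
        return [line.rstrip("\n") for line in window]
```

Under this assumption, the cost of an edit depends on how many windows the agent has to pull in, which is why related code sitting close together helps more than a small file count.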
Garlef•12m ago
> Scattering the implementation in various files all over the source tree

If you take the source tree seriously, you can communicate a lot through how it is structured.

luca-ctx•48m ago
RE: monolithic, single-file implementations

We have a lint that caps source code files at 650 LOC and it works really well.
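
A lint of this kind can be sketched in a few lines. This is only an assumption about how such a check might look, not the commenter's actual lint: the function names, the `.py` suffix filter, and counting every physical line (blank lines and comments included) as LOC are all hypothetical choices. Only the 650 cap comes from the comment.

```python
from pathlib import Path

MAX_LOC = 650  # cap from the comment above; everything else is assumed


def count_lines(path: Path) -> int:
    """Count physical lines in a file, blank lines and comments included."""
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        return sum(1 for _ in fh)


def oversized_files(root: Path, suffixes=(".py",), max_loc: int = MAX_LOC):
    """Return (path, line_count) pairs for files under root exceeding max_loc."""
    offenders = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            n = count_lines(path)
            if n > max_loc:
                offenders.append((path, n))
    return offenders
```

In CI, one would typically fail the build whenever `oversized_files(Path("."))` returns a non-empty list.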

miguel_martin•31m ago
It’s unfortunate that they didn’t eval using subagents/orchestration for such a complex set of tasks (from what I can tell), e.g. analyze the program to produce an initial spec -> code -> review, then rinse and repeat, with each of those steps allocated to a separate subagent.

I would be interested to see if there’s a significant quantifiable difference.
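
For concreteness, the spec -> code -> review loop described above could be wired up something like this. It is a hypothetical sketch, not anything the ProgramBench paper evaluated: `rebuild_loop`, the three callables standing in for subagents, and the `max_rounds` cutoff are all assumptions.

```python
from typing import Callable, Tuple


def rebuild_loop(
    analyze: Callable[[str], str],
    write_code: Callable[[str, str], str],
    review: Callable[[str, str], Tuple[bool, str]],
    program: str,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Run analyze once, then alternate write_code and review.

    Each callable stands in for a separate subagent. Returns the last
    code produced and whether the reviewer accepted it.
    """
    spec = analyze(program)  # subagent 1: derive a spec from the program
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        code = write_code(spec, feedback)  # subagent 2: implement the spec
        accepted, feedback = review(spec, code)  # subagent 3: critique it
        if accepted:
            return code, True
    return code, False
```

Swapping different models or prompts in behind each callable would be one way to measure whether orchestration changes the scores.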

NitpickLawyer•1m ago
This might actually be the whole value prop of this benchmark. Forget their initial scores, take open models (so we can be sure the base doesn't change), and test different combinations of harness + prompts + strategies + whatever memthing is popular today. See if the scores improve. Repeat.

Valve releases Steam Controller CAD files under Creative Commons license

https://www.digitalfoundry.net/news/2026/05/valve-releases-steam-controller-cad-files-under-creat...
1307•haunter•15h ago•400 comments

Appearing productive in the workplace

https://nooneshappy.com/article/appearing-productive-in-the-workplace/
996•diebillionaires•14h ago•376 comments

Permacomputing Principles

https://permacomputing.net/principles/
97•andsoitis•4h ago•29 comments

Diskless Linux boot using ZFS, iSCSI and PXE

https://aniket.foo/posts/20260505-netboot/
60•stereo-highway•3h ago•19 comments

SQLite Is a Library of Congress Recommended Storage Format

https://sqlite.org/locrsf.html
166•whatisabcdefgh•9h ago•34 comments

Vibe coding and agentic engineering are getting closer than I'd like

https://simonwillison.net/2026/May/6/vibe-coding-and-agentic-engineering/
546•e12e•15h ago•583 comments

ProgramBench: Can Language Models Rebuild Programs from Scratch?

https://arxiv.org/abs/2605.03546
19•jonbaer•3h ago•16 comments

Chevrolet Performance eCrate package (400v/200hp)

https://www.chevrolet.com/performance-parts/crate-engines/ecrate
13•mindcrime•2d ago•3 comments

The Mathematical Dance Inside Plant Cells

https://www.quantamagazine.org/the-hidden-mathematical-dance-inside-plant-cells-20260504/
24•isaacfrond•1d ago•2 comments

The Vatican's Website in Latin

https://www.vatican.va/latin/latin_index.html
117•ks2048•5h ago•67 comments

RSS Feeds Send Me More Traffic Than Google

https://shkspr.mobi/blog/2026/05/rss-feeds-send-me-more-traffic-than-google/
52•SpyCoder77•6h ago•11 comments

From Supabase to Clerk to Better Auth

https://blog.val.town/better-auth
243•stevekrouse•13h ago•166 comments

Google Cloud fraud defense, the next evolution of reCAPTCHA

https://cloud.google.com/blog/products/identity-security/introducing-google-cloud-fraud-defense-t...
279•unforgivenpasta•13h ago•265 comments

Pen pal programs endure in a digital age

https://apnews.com/article/pen-pals-letters-comeback-bc87e1b9c229665bafd368e19751d6ca
37•petethomas•1d ago•3 comments

What I Learned Making an App for My Family

https://mendelgreenberg.com/posts/ourcar/
33•chabad360•17h ago•5 comments

Programming Still Sucks

https://www.stvn.sh/writing/programming-still-sucks-fqffhyp
291•jeromechoo•11h ago•120 comments

Show HN: Hallucinopedia

http://halupedia.com/
202•bstrama•14h ago•187 comments

Building the TD4 4-Bit CPU

https://jayakody2000lk.blogspot.com/2026/05/building-td4-4-bit-cpu.html
10•zdw•2h ago•6 comments

Show HN: Tilde.run – Agent sandbox with a transactional, versioned filesystem

https://tilde.run/
153•ozkatz•15h ago•105 comments

Community firmware for the Xteink X4 e-paper reader

https://github.com/crosspoint-reader/crosspoint-reader
84•dmos62•1d ago•21 comments

Finding the differences in a series of power supplies

https://www.lttlabs.com/articles/2026/05/05/testing-psu-series
40•LabsLucas•1d ago•2 comments

Building my own Vi text editor in BASIC

https://leetusman.com/nosebook/yvi
49•zeech•1d ago•22 comments

Learning the Integral of a Diffusion Model

https://sander.ai/2026/05/06/flow-maps.html
128•benanne•12h ago•20 comments

A Theory of Deep Learning

https://elonlit.com/scrivings/a-theory-of-deep-learning/
172•elonlit•1d ago•37 comments

Perturb-MARS: Reading mouse experiments through a human lens

https://www.noetik.blog/p/perturb-mars-reading-mouse-experiments
18•crescit_eundo•2d ago•2 comments

SoundOff: Low-Cost Passive Ultrasound Tags

https://yibo-fu.com/SoundOff-Low-cost-Passive-Ultrasound-Tags-for-Non-invasive-and-Non
60•jonbaer•13h ago•1 comment

Ted Turner has died

https://www.cnn.com/2026/05/06/us/ted-turner-death
263•pseudolus•16h ago•207 comments

Inkscape 1.4.4

https://inkscape.org/doc/release_notes/1.4.4/Inkscape_1.4.4.html
282•s1291•11h ago•84 comments

Show HN: PHP-fts – Full-text search engine in pure PHP, no extensions

https://github.com/olivier-ls/php-fts
64•asmodios•10h ago•15 comments

Wolfgang Koeppen's Structural Musicality

https://www.theparisreview.org/blog/2026/05/04/wolfgang-koeppens-structural-musicality/
6•prismatic•2d ago•0 comments