frontpage.

1-Bit Hokusai's "The Great Wave" (2023)

https://www.hypertalking.com/2023/05/08/1-bit-pixel-art-of-hokusais-the-great-wave-off-kanagawa/
155•stephen-hill•3d ago•24 comments

New 10 GbE USB adapters are cooler, smaller, cheaper

https://www.jeffgeerling.com/blog/2026/new-10-gbe-usb-adapters-cooler-smaller-cheaper/
353•calcifer•8h ago•199 comments

Martin Galway's music source files from 1980's Commodore 64 games

https://github.com/MartinGalway/C64_music
54•ingve•4h ago•6 comments

Google plans to invest up to $40B in Anthropic

https://www.bloomberg.com/news/articles/2026-04-24/google-plans-to-invest-up-to-40-billion-in-ant...
684•elffjs•22h ago•672 comments

Lambda Calculus Benchmark for AI

https://victortaelin.github.io/lambench/
49•marvinborner•3h ago•16 comments

How to Implement an FPS Counter

https://vplesko.com/posts/how_to_implement_an_fps_counter.html
82•vplesko•3d ago•15 comments

A web-based RDP client built with Go WebAssembly and grdp

https://github.com/nakagami/grdpwasm
35•mariuz•3h ago•12 comments

Panipat: The Rise of the Mughals

https://www.historytoday.com/archive/feature/panipat-rise-mughals
22•Thevet•3d ago•12 comments

Plain text has been around for decades and it’s here to stay

https://unsung.aresluna.org/plain-text-has-been-around-for-decades-and-its-here-to-stay/
178•rbanffy•13h ago•78 comments

A 3D Body from Eight Questions – No Photo, No GPU

https://clad.you/blog/posts/questionnaire-mlp/
110•arkadiuss•3d ago•19 comments

Only One Side Will Be the True Successor to MS-DOS – Windows 2.x

https://blisscast.wordpress.com/2026/04/21/windows-2-gui-wonderland-12a/
21•keepamovin•3h ago•16 comments

Humpback whales are forming super-groups

https://www.bbc.com/future/article/20260416-the-humpback-super-groups-swarming-the-seas
160•andsoitis•3d ago•80 comments

Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git)

https://github.com/nex-crm/wuphf
145•najmuzzaman•5h ago•67 comments

A Man Who Invented the Future

https://hedgehogreview.com/web-features/thr/posts/the-man-who-invented-the-future
41•apollinaire•3d ago•12 comments

Paraloid B-72

https://en.wikipedia.org/wiki/Paraloid_B-72
238•Ariarule•3d ago•44 comments

Sabotaging projects by overthinking, scope creep, and structural diffing

https://kevinlynagh.com/newsletter/2026_04_overthinking/
476•alcazar•1d ago•115 comments

Replace IBM Quantum back end with /dev/urandom

https://github.com/yuvadm/quantumslop/blob/25ad2e76ae58baa96f6219742459407db9dd17f5/URANDOM_DEMO.md
230•pigeons•13h ago•33 comments

My audio interface has SSH enabled by default

https://hhh.hn/rodecaster-duo-fw/
281•hhh•19h ago•84 comments

The mail sent to a video game publisher

https://www.gamefile.news/p/panic-mail-arco-despelote-time-flies-thank-goodness-teeth
84•colinprince•4d ago•1 comment

Iliad fragment found in Roman-era mummy

https://www.thehistoryblog.com/archives/75877
216•wise_blood•3d ago•69 comments

Commenting and Approving Pull Requests

https://www.jakeworth.com/posts/on-commenting-and-approving-pull-requests/
9•jwworth•2d ago•6 comments

PCR is a surprisingly near-optimal technology

https://nikomc.com/2026/04/22/pcr/
64•mailyk•2d ago•10 comments

Open source memory layer so any AI agent can do what Claude.ai and ChatGPT do

https://alash3al.github.io/stash?_v01
101•alash3al•13h ago•50 comments

There Will Be a Scientific Theory of Deep Learning

https://arxiv.org/abs/2604.21691
297•jamie-simon•20h ago•126 comments

Education must go beyond the mere production of words

https://www.ncregister.com/commentaries/schnell-repairing-the-ruins
97•signor_bosco•14h ago•47 comments

Cosmology with Geometry Nodes

https://www.blender.org/user-stories/cosmology-with-geometry-nodes/
88•shankysingh•13h ago•3 comments

Email could have been X.400 times better

https://buttondown.com/blog/x400-vs-smtp-email
214•maguay•2d ago•175 comments

DeepSeek v4

https://api-docs.deepseek.com/news/news260424
1987•impact_sy•1d ago•1516 comments

Work with the garage door up (2024)

https://notes.andymatuschak.org/Work_with_the_garage_door_up
174•jxmorris12•3d ago•121 comments

Turbo Vision 2.0 – a modern port

https://github.com/magiblot/tvision
171•andsoitis•10h ago•45 comments

Lambda Calculus Benchmark for AI

https://victortaelin.github.io/lambench/
49•marvinborner•3h ago

Comments

tromp•3h ago
The corresponding repo https://github.com/VictorTaelin/LamBench describes this as:

    λ-bench
    A benchmark of 120 pure lambda calculus programming problems for AI models.
    → Live results
    What is this?
    λ-bench evaluates how well AI models can implement algorithms using pure lambda calculus. Each problem asks the model to write a program in Lamb, a minimal lambda calculus language, using λ-encodings of data structures to implement a specific algorithm.
    The model receives a problem description, data encoding specification, and test cases. It must return a single .lam program that defines @main. The program is then tested against all input/output pairs — if every test passes, the problem is solved.
"Live results" wrongly links to https://victortaelin.github.io/LamBench/ rather than the correct https://victortaelin.github.io/lambench/

An example task (writing a lambda calculus evaluator) can be seen at https://github.com/VictorTaelin/lambench/blob/main/tsk/algo_...

Curiously, gpt-5.5 is noticeably worse than gpt-5.4, and opus-4.7 is slightly worse than opus-4.6.
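To give a flavour of the λ-encodings the tasks require, here is a sketch of Church numerals using Python closures. This is purely illustrative: Lamb's actual syntax is not reproduced here, only the encoding idea (data as functions).

```python
# Church numerals: the number n is a function that applies f n times.
# This mirrors the lambda-encodings lambda-bench asks models to use.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting how many times f is applied."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```

Lists, booleans, and even an evaluator (as in the linked task) are built the same way, with every data structure encoded as a function.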

lioeters•26m ago
As an admirer of your work with binary lambda calculus, etc., I'm curious to hear your thoughts on the author's company with HVM and interaction combinators. https://higherorderco.com/ I've always felt there was untapped potential in this area, and their work seems like a way toward a practical application for parallel computing and maybe leveraging LLMs using a minimal language specification.
dataviz1000•1h ago
lambench is single-attempt: one shot per problem.

I don't think they understand how LLMs work. To truly benchmark a non-deterministic, probabilistic model, they would need to run each problem about 45 times. LLMs are distributions and behave accordingly.

The better story is how the models behave on the same problem after 5 samples, 15 samples, and 45 samples.

That said, using lambda calculus is a brilliant subject for benchmarking.

The models are reliably incorrect. [0]

[0] https://adamsohn.com/reliably-incorrect/
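The multi-sample evaluation described above is usually summarized as a pass@k metric. A minimal sketch of the standard unbiased estimator (function name and the 45-sample figure are illustrative, taken from the comment):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of
    k samples, drawn without replacement from n total attempts of which
    c were correct, solves the problem."""
    if n - c < k:
        return 1.0  # too few failures left to fill k draws: guaranteed hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# A model that solves a problem in 12 of 45 attempts:
# pass@1 = 12/45, while pass@5 and pass@15 are substantially higher.
```

Reporting pass@1 alongside pass@5 and pass@15 would distinguish models that reliably solve a problem from those that occasionally stumble onto a solution.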

NitpickLawyer•1h ago
New, unbenched problems are really the only way to differentiate the models, and every time I see one it's along the same lines: models from the top labs are neck and neck, and the rest of the bunch are nowhere near. That should kinda calm down the "Opus killer" marketing we've seen these past few months every time a new model releases, especially the small ones from China.

It's funny that even one of the strongest research labs in China (DeepSeek) has said there's still a gap to Opus, after releasing a humongous 1.6T model, yet the internet goes crazy and we now have people claiming [1] a 27B dense model is "as good as Opus"...

I'm a huge fan of local models and have been using them regularly ever since devstral1 released, but you really have to adapt to their limitations if you want to do anything productive. Same with the other "cheap" "Opus killers" from China. Some work, some look like they work, but they go haywire at first contact with a real, non-benchmarked task.

[1] - https://x.com/julien_c/status/2047647522173104145

cmrdporcupine•1h ago
The question isn't whether it's "as good as Opus" but whether there exists something that costs a tenth as much to use yet can still competently write code.

Honestly, I was "happy" with December-2025-era AI, or even earlier. Yes, what's come after has been smarter, faster, and cleverer, but the biggest boost in productivity was simply the release of Opus 4.5 and GPT 5.2/5.3.

And yes it might be a competitive disadvantage for an engineer not to have access to the SOTA models from Anthropic/OpenAI, but at the same time I feel like the missing piece at this point is improvements in the tooling/harness/review tools, not better-yet models.

They already write more than we can keep up with.

NitpickLawyer•1h ago
Oh, I agree. Last year I tried making each model a "daily driver", including small ones like gpt5-mini / haiku, and open ones, like glm, minimax and even local ones like devstral. They can all do some tasks reliably, while struggling at other tasks. But yeah, there comes a point where, depending on your workflows, some smaller / cheaper models become good enough.

The problem is the overhypers: they overhype small/open models and make them sound close to the SotA. They really aren't. It's one thing to say "this small model is good enough to handle some tasks in production code", and a different thing to say "close to Opus". One makes sense; the other just sets the wrong expectations, and is obviously false.

cmrdporcupine•59m ago
I am desperate for tooling that puts me back in charge, with the models just as advisors. In that case the "smart level" is just a dial.

I'm probably going to have to make it myself.

vorticalbox•18m ago
Some IDEs already have this. In Zed you can stick it in "ask" mode.

Being able to use it as a rubber duck while it can also read the code works quite well.

There are a few APIs at work I have never worked on, and the person who wrote them no longer works with us, so AI fills that gap well.

adrian_b•17m ago
There is no doubt that for many tasks the SotA models of OpenAI and Anthropic are better than the available open-weights models.

Nevertheless, I do not believe that OpenAI, Anthropic, or Google know any secret sauce for training LLMs better. I believe their current superiority is just due to brute force: their LLMs are bigger, and they have been trained on much more data than the other LLM producers have been able to access.

Moreover, for myself, I can extract much more value from an LLM that is not constrained by a metered per-token cost and where I have full control over the harness used to run the model. Even if the OpenAI or Anthropic models had been much better than the competing models, I would still have been able to accomplish more useful work with an open-weights model.

I have already passed once through the transition from fast mainframes and minicomputers, accessed remotely and shared with other users, to slow personal computers over which I had absolute control. Despite the differences in theoretical performance, I could do much more with a PC, and the same is true when I have absolute control over an LLM.

adrian_b•1h ago
Benchmarks for LLMs without complete information about the tested models are hard to interpret.

For the OpenAI and Anthropic models it is clear that they have been run by their owners, but for the other models there are a great number of options for running them, which may use the full models or only quantized variants, with very different performance.

For instance, the model list contains both "moonshotai/kimi-k2.6" and "kimi-k2.6", with very different results, but there is no information about the difference between these two labels, which refer to the same LLM.

Moreover, as others have said, such a benchmark does not prove that a certain cheaper model cannot solve a problem. It merely happened not to solve it within the benchmark; running it multiple times, possibly with adjusted prompts, may still solve the problem.

While for commercial models running them many times can be too expensive, when you run an LLM locally you can afford to run it many more times than when you are worried about the token price or about reaching the subscription limits.

NitpickLawyer•1h ago
Agreed. But, at least as of yesterday, dsv4 was only served by DeepSeek. And, more importantly, that's what the "average" experience would be if you set up something easy like OpenRouter. Sure, with proper tuning and so on you can be sure you're getting the model at its best. But are you, if you just set up OpenRouter and go brrr? Maybe. Maybe not.
cmrdporcupine•32m ago
I think it's important to point out that DeepSeek was basically soft-launching their v4 model, and they weren't emphasizing it as some sort of SOTA-killer but more as proof of a potentially non-NVIDIA serving world, and as a venue for their current research approaches.

I think/hope we'll see a 4.2 that looks a lot better, same as 3.2 was quite competitive at the time it launched.

cmrdporcupine•1h ago
Odd to see GPT 5.5 behind 5.4?
no_op•53m ago
The author posted new results using the API (apparently the original run was through Codex), and 5.5 moves to the top: https://x.com/VictorTaelin/status/2047818978664268071
internet_points•1h ago
Would love to see where the mistral stuff lands.

Also, being from Victor Taelin, shouldn't this be benching Interaction Combinators? :)

maciejzj•30m ago
Can anyone more familiar with lambda calculus speculate on why all the models fail to implement FFT? There are a gazillion FFT implementations in various languages on the web, and the actual Cooley-Tukey algorithm is rather short.
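For reference, the algorithm really is only a few lines in a conventional language; a minimal recursive radix-2 Cooley-Tukey sketch in Python (not lambda-encoded, just to show how little there is to it):

```python
import cmath

def fft(xs):
    """Recursive radix-2 Cooley-Tukey FFT; len(xs) must be a power of 2."""
    n = len(xs)
    if n == 1:
        return xs[:]
    evens = fft(xs[0::2])   # DFT of even-indexed samples
    odds = fft(xs[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size DFTs.
        w = cmath.exp(-2j * cmath.pi * k / n) * odds[k]
        out[k] = evens[k] + w
        out[k + n // 2] = evens[k] - w
    return out
```

The hard part for the models is presumably not the recursion but λ-encoding complex arithmetic and list splitting, where there is far less training data.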