Cockpit is a web-based graphical interface for servers

https://github.com/cockpit-project/cockpit
101•modinfo•2h ago•58 comments

Astral to Join OpenAI

https://astral.sh/blog/openai
1138•ibraheemdev•9h ago•709 comments

Google details new 24-hour process to sideload unverified Android apps

https://arstechnica.com/gadgets/2026/03/google-details-new-24-hour-process-to-sideload-unverified...
376•0xedb•5h ago•401 comments

How the Turner twins are mythbusting modern technical apparel

https://www.carryology.com/insights/how-the-turner-twins-are-mythbusting-modern-gear/
63•greedo•2d ago•27 comments

Return of the Obra Dinn: spherical mapped dithering for a 1bpp first-person game

https://forums.tigsource.com/index.php?topic=40832.msg1363742#msg1363742
192•PaulHoule•3d ago•25 comments

Show HN: Three new Kitten TTS models – smallest less than 25MB

https://github.com/KittenML/KittenTTS
281•rohan_joshi•7h ago•87 comments

Be intentional about how AI changes your codebase

https://aicode.swerdlow.dev
30•benswerd•1h ago•14 comments

EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages

https://esolang-bench.vercel.app/
37•matt_d•2h ago•10 comments

Noq: n0's new QUIC implementation in Rust

https://www.iroh.computer/blog/noq-announcement
121•od0•4h ago•17 comments

Waymo Safety Impact

https://waymo.com/safety/impact/
150•xnx•2h ago•124 comments

From Oscilloscope to Wireshark: A UDP Story (2022)

https://www.mattkeeter.com/blog/2022-08-11-udp/
61•ofrzeta•3h ago•11 comments

Clockwise acquired by Salesforce and shutting down next week

https://www.getclockwise.com
42•nigelgutzmann•3h ago•23 comments

NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute

https://qlabs.sh/10x
73•sdpmas•4h ago•11 comments

4Chan mocks £520k fine for UK online safety breaches

https://www.bbc.com/news/articles/c624330lg1ko
214•mosura•8h ago•331 comments

“Your frustration is the product”

https://daringfireball.net/2026/03/your_frustration_is_the_product
369•llm_nerd•11h ago•226 comments

Launch HN: Voltair (YC W26) – Drone and charging network for power utilities

42•wweissbluth•6h ago•22 comments

Juggalo makeup blocks facial recognition technology (2019)

https://consequence.net/2019/07/juggalo-makeup-facial-recognition/
215•speckx•10h ago•133 comments

Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster

https://blog.skypilot.co/scaling-autoresearch/
101•hopechong•6h ago•44 comments

OpenBSD: PF queues break the 4 Gbps barrier

https://undeadly.org/cgi?action=article;sid=20260319125859
171•defrost•9h ago•53 comments

An update on Steam / GOG changes for OpenTTD

https://www.openttd.org/news/2026/03/19/steam-changes-update
242•jandeboevrie•5h ago•170 comments

Tesla: Failure of the FSD's degradation detection system [pdf]

https://static.nhtsa.gov/odi/inv/2026/INOA-EA26002-10023.pdf
135•doener•2h ago•59 comments

Xiaomi launches next-gen SU7 with 902 km range and Lidar, still undercuts Tesla

https://electrek.co/2026/03/19/xiaomi-launches-next-gen-su7-902-km-range-undercuts-tesla/
41•breve•1h ago•3 comments

The Shape of Inequalities

https://www.andreinc.net/2026/03/16/the-shape-of-inequalities/
86•nomemory•8h ago•14 comments

The Need for an Independent AI Grid

https://amppublic.com/
7•olalonde•1h ago•0 comments

Connecticut and the 1 Kilometer Effect

https://alearningaday.blog/2026/03/19/connecticut-and-the-1-kilometer-effect/
33•speckx•5h ago•24 comments

macOS 26 breaks custom DNS settings including .internal

https://gist.github.com/adamamyl/81b78eced40feae50eae7c4f3bec1f5a
296•adamamyl•7h ago•145 comments

Anthropic takes legal action against OpenCode

https://github.com/anomalyco/opencode/pull/18186
319•_squared_•3h ago•267 comments

I turned Markdown into a protocol for generative UI

https://fabian-kuebler.com/posts/markdown-agentic-ui/
64•FabianCarbonara•9h ago•33 comments

Afroman found not liable in defamation case

https://nypost.com/2026/03/18/us-news/afroman-found-not-liable-in-bizarre-ohio-defamation-case/
1065•antonymoose•13h ago•603 comments

Android developer verification: Balancing openness and choice with safety

https://android-developers.googleblog.com/2026/03/android-developer-verification.html
12•WalterSobchak•2h ago•2 comments

EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages

https://esolang-bench.vercel.app/
37•matt_d•2h ago

Comments

deklesen•1h ago
Mhh... my hunch is that part of this is that all Python keywords are (I assume) 1 token each. For those very weird languages, tokenization might make it harder to reason over the tokens.

Would love to see how the benchmark results change if the esoteric languages were tweaked a bit so that every keyword is a single token.

chychiu•1h ago
Considering that Brainfuck only has 8 characters and models are scoring at 6.2%, I don't think tokenization is the issue.

altruios•1h ago
The only issue. *

Reasoning is hard, and reasoning about colors while wearing glasses that obfuscate the real colors is even harder... but that's not the core issue if your brain is not wired correctly to reason.

I suspect the way out of this is to separate knowledge from reason: to train reasoning with zero knowledge and zero language... and then to train language on top of a pre-trained-for-reasoning model.

__alexs•1h ago
I had hoped we might finally be ushering in a bold new era of programming in Malbolge, but apparently that was too optimistic.

bwestergard•1h ago
I'm shocked to see how poorly these models, which I find useful day to day, do in solving virtually any of the problems in Unlambda.

Before looking at the results my guess was that scores would be higher for Unlambda than any of the others, because humans that learn Scheme don't find it all that hard to learn about the lambda calculus and combinatory logic.

But the model that did the best, Qwen-235B, got virtually every problem wrong.

__alexs•1h ago
They are also weirdly bad at Brainfuck which is basically just a subset of C.

simianwords•1h ago
I bet I can do better by allowing this: the LLM can pull documentation of the language from the web to understand how it works.

If the LLM has “skills” for that language, it will definitely increase accuracy.

orthoxerox•31m ago
> Frontier models score ~90% on Python but only 3.8% on esoteric languages, exposing how current code generation relies on training data memorization rather than genuine programming reasoning.

I would probably score about the same, does this prove I also rely on training data memorization rather than genuine programming reasoning?

Or does this simply show that esolangs are hard to reason in by design? A more honest approach would use a "real", but relatively unpopular, language. Make them use CoffeeScript or Ada or PL/I or Odin or that other systems programming language that that very opinionated guy is implementing on top of QBE.

iloveoof•21m ago
Try MUMPS: widely used, but with little training data online, probably less than some esolangs.

wavemode•19m ago
> I would probably score about the same, does this prove I also rely on training data memorization rather than genuine programming reasoning?

Setting aside whether this benchmark is meaningful or not - the argument you're making is faulty. There are indeed humans who can write complete programs in Brainfuck and these other esolangs. The fact that you personally can't is not logically relevant.
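
For readers who have not met the language this thread keeps returning to: Brainfuck's entire instruction set is the eight characters `>`, `<`, `+`, `-`, `.`, `,`, `[` and `]`, each acting on a byte tape and a single pointer. Below is a minimal interpreter sketch in Python. It is not taken from the benchmark or from any commenter. The function name `run_bf`, the 30,000-cell tape, the wrapping 8-bit cells, the wrapping pointer, and "EOF reads as 0" are all assumptions (common conventions, but implementations differ).

```python
def run_bf(program: str, input_bytes: bytes = b"", tape_len: int = 30_000) -> bytes:
    """Interpret a Brainfuck program and return everything it printed.

    Assumptions (not fixed by the thread): 8-bit wrapping cells, a pointer
    that wraps around the tape, and reading past the end of input stores 0.
    """
    # Pre-match brackets so each loop jump is a dictionary lookup.
    jumps = {}
    stack = []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at position {pos}")
            start = stack.pop()
            jumps[start] = pos
            jumps[pos] = start
    if stack:
        raise ValueError(f"unmatched '[' at position {stack[-1]}")

    tape = bytearray(tape_len)
    ptr = 0      # data pointer
    pc = 0       # program counter
    in_pos = 0   # next unread input byte
    out = bytearray()

    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr = (ptr + 1) % tape_len
        elif ch == "<":
            ptr = (ptr - 1) % tape_len
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            if in_pos < len(input_bytes):
                tape[ptr] = input_bytes[in_pos]
                in_pos += 1
            else:
                tape[ptr] = 0  # assumed EOF behaviour
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # repeat the loop body
        # Any other character is treated as a comment and ignored.
        pc += 1

    return bytes(out)
```

As a usage example, `run_bf("++++++++[>++++++++<-]>+.")` returns `b"A"`: the loop adds 8 to the second cell eight times (64), one more `+` makes 65, and `.` emits that byte. The interpreter is small, but even a one-character program needs a counting loop like this, which is part of why writing in the language is hard despite the tiny syntax.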