frontpage.

Forked Garry Tan's gstack and adapted for Google's Antigravity and Gemini-CLI

https://github.com/asecretcompany/gstack-fork
1•andrewjneumann•26s ago•0 comments

I Spoke to AI Agent Claude – Sen Bernie Sanders

https://www.youtube.com/watch?v=h3AtWdeu_G0
1•timetraveller26•2m ago•0 comments

ShouldIBuildThat finds app opportunities that appear across multiple signals

https://www.shouldibuildthat.com/
1•da352•2m ago•0 comments

Building a UI Framework [pdf]

https://software.hixie.ch/ui-frameworks.pdf
1•jarek-foksa•5m ago•0 comments

IdeaClaw – one sentence, get a camera-ready paper, BP, DD reports, health report

https://github.com/StartripAI/ideaClaw
1•AlfredHua1•7m ago•0 comments

What's in a name? – The unknown faces of history

https://www.uni-bonn.de/en/news/048-2026
1•hhs•7m ago•0 comments

Making an Argument for (Voluntary) Online Identity Verification

https://agoraid.com/blog/supporting-online-identity-verification/
1•kisamoto•8m ago•0 comments

To Catholic thinkers, Pentagon's AI demands violate 'human dignity'

https://www.washingtonpost.com/nation/2026/03/19/anthropic-war-ai-catholic-church/
1•reaperducer•11m ago•0 comments

I built a database scoring what separates high-scoring pitch decks from the rest

https://www.unbiasedventures.ch/pitch-deck-examples-2026/
1•peterweisz•11m ago•0 comments

House speaker, Intel chiefs make new push to renew surveillance law

https://www.reuters.com/legal/government/republican-speaker-intel-chiefs-make-new-push-renew-surv...
3•petethomas•12m ago•0 comments

Replacing Anki: what I learned building a language app (1k users, $21 MRR)

https://www.indiehackers.com/post/i-built-a-language-learning-app-to-replace-anki-1-000-users-21-...
1•vital_pavlenko•13m ago•0 comments

Agent-rendered: the pattern that replaces runtime infra with build-time AI

https://gumeo.github.io/post/agent-rendered-infrastructure/
1•gumeo•16m ago•0 comments

Vulnerabilities in OpenClaw: A Complete Enterprise Security Analysis

https://ClawNanny.com/docs_viewer?markdown_url=/static/docs/ClawNanny_OpenClaw_Enterprise_Securit...
1•OpenSystemApps•17m ago•0 comments

Minecraft Source Code Is Interesting

https://www.karanjanthe.me/posts/minecraft-source/
2•KMJ-007•17m ago•0 comments

AI Pentester

https://www.noscope.com/
1•realtryhackme•18m ago•0 comments

Update iOS to protect your iPhone from web attacks

https://support.apple.com/en-us/126776
1•tech234a•19m ago•0 comments

New "PolyShell" flaw allows unauthenticated RCE on Magento e-stores

https://www.bleepingcomputer.com/news/security/new-polyshell-flaw-allows-unauthenticated-rce-on-m...
1•uyzstvqs•19m ago•0 comments

Generalized Dot-Product Attention: Tackling Real-World Challenges in GPU Kernels

https://pytorch.org/blog/generalized-dot-product-attention-tackling-real-world-challenges-in-gpu-...
1•matt_d•19m ago•0 comments

Delve (YC W24) – Fake Compliance as a Service – Part I

https://deepdelver.substack.com/p/delve-fake-compliance-as-a-service
2•sebmellen•20m ago•0 comments

M^2RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

https://arxiv.org/abs/2603.14360
1•matt_d•21m ago•0 comments

COW Fork: Zero-Copy Sandbox Cloning for AI Agents

https://multikernel.io/2026/03/19/sandlock-cow-fork/
1•wang_cong•22m ago•0 comments

Netcup Increases Prices over 21%

https://www.netcup.com/en/priceadjustment
1•gethly•23m ago•0 comments

Location of French aircraft carrier leaked in real time via Strava user on board

https://www.lemonde.fr/international/article/2026/03/19/stravaleaks-le-porte-avions-charles-de-ga...
1•asp1•23m ago•0 comments

360B tokens, 3M customers, 6 engineers

https://vercel.com/blog/360-billion-tokens-3-million-customers-6-engineers
1•gmays•28m ago•0 comments

Beat Paxos

http://muratbuffalo.blogspot.com/2026/03/break-paxos.html
2•ingve•28m ago•0 comments

Things That Turbo Pascal Is Smaller Than (2011)

https://prog21.dadgum.com/116.html
1•birdculture•30m ago•0 comments

Justice Department Disrupts Iranian Cyber Enabled Psychological Operations

https://www.publicnow.com/view/938A7EFC4064A4EE42581494308F60A13767ADEA
1•Animats•30m ago•1 comments

US Jobless Claims Fell Last Week to Lowest Since January

https://www.bloomberg.com/news/articles/2026-03-19/us-jobless-claims-declined-last-week-to-lowest...
1•DGAP•31m ago•0 comments

Kalshi in Hot Water – What This Means for Startups Like PolyBets

https://www.nytimes.com/2026/03/17/technology/arizona-criminal-charges-kalshi.html
3•realJared54•31m ago•0 comments

Crypto.com lays off 12% of workforce as latest company to cite AI in job cuts

https://www.cnbc.com/2026/03/19/crypto-com-layoffs-12percent-ai-job-loss.html
2•DGAP•32m ago•0 comments

EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages

https://esolang-bench.vercel.app/
30•matt_d•1h ago

Comments

deklesen•1h ago
Hmm... my hunch is that part of this is tokenization: I assume every Python keyword is a single token, while for these very weird languages the tokenizer may split the source in ways that make it harder to reason over.

Would love to see how the benchmark results change if the esoteric languages were modified slightly so that all of their keywords are single tokens.

chychiu•1h ago
Considering that Brainfuck only has 8 characters and models are scoring 6.2%, I don't think tokenization is the issue.
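
For readers who have not seen the language, the "8 characters" point can be illustrated with a minimal interpreter sketch. This is not taken from the benchmark; the function name `run_brainfuck`, the 30,000-cell tape, the 8-bit wrapping cells, and the choice to read 0 at end of input are all assumptions (common conventions, but implementations differ).

```python
def run_brainfuck(program: str, input_data: str = "") -> str:
    """Interpret a Brainfuck program and return everything it printed."""
    # Pre-match brackets so each loop jump is a dictionary lookup.
    jumps = {}
    stack = []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            start = stack.pop()
            jumps[start] = pos
            jumps[pos] = start
    if stack:
        raise ValueError("unmatched '['")

    tape = [0] * 30000   # assumed tape size
    ptr = 0              # data pointer
    pc = 0               # program counter
    inp = iter(input_data)
    out = []
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256   # assumed 8-bit wrapping cells
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            # Assumed convention: store 0 when the input is exhausted.
            tape[ptr] = ord(next(inp, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]   # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]   # go back to the loop start
        # Any other character is treated as a comment and ignored.
        pc += 1
    return "".join(out)


# 8 * 8 = 64, plus 1 is 65, the character code of "A".
print(run_brainfuck("++++++++[>++++++++<-]>+."))
```

The whole language fits in one `if` chain, so the vocabulary itself is tiny; whatever makes the benchmark hard, it is not the number of distinct symbols.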
altruios•54m ago
The only issue. *

Reasoning is hard, reasoning about colors while wearing glasses that obfuscate the real colors... even harder... but that's not the core issue if your brain is not wired correctly to reason.

I suspect the way out of this is to separate knowledge from reason: to train reasoning with zero knowledge and zero language... and then to train language on top of a pre-trained-for-reasoning model.

__alexs•1h ago
I had hoped we might finally be ushering in a bold new era of programming in Malbolge, but apparently that was too optimistic.
bwestergard•1h ago
I'm shocked to see how poorly these models, which I find useful day to day, do in solving virtually any of the problems in Unlambda.

Before looking at the results my guess was that scores would be higher for Unlambda than any of the others, because humans that learn Scheme don't find it all that hard to learn about the lambda calculus and combinatory logic.

But the model that did the best, Qwen-235B, got virtually every problem wrong.

__alexs•1h ago
They are also weirdly bad at Brainfuck, which is basically just a subset of C.
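
The "subset of C" remark refers to the well-known one-to-one mapping from each Brainfuck command to a C statement. A rough sketch of that mapping follows; the names `BF_TO_C` and `brainfuck_to_c`, the 30,000-byte tape, and the unhandled end-of-input case for `getchar()` are assumptions made for illustration, not anything the benchmark specifies.

```python
# Each Brainfuck command corresponds to exactly one C statement.
BF_TO_C = {
    ">": "++p;",
    "<": "--p;",
    "+": "++*p;",
    "-": "--*p;",
    ".": "putchar(*p);",
    ",": "*p = getchar();",   # end of input is not handled specially here
    "[": "while (*p) {",
    "]": "}",
}


def brainfuck_to_c(program: str) -> str:
    """Translate a Brainfuck program into equivalent C source text."""
    lines = [
        "#include <stdio.h>",
        "",
        "int main(void) {",
        "    static unsigned char tape[30000];",   # assumed tape size
        "    unsigned char *p = tape;",
    ]
    depth = 1   # current indentation level inside main()
    for ch in program:
        stmt = BF_TO_C.get(ch)
        if stmt is None:
            continue   # non-command characters are comments
        if ch == "]":
            depth -= 1
            if depth < 1:
                raise ValueError("unmatched ']'")
        lines.append("    " * depth + stmt)
        if ch == "[":
            depth += 1
    if depth != 1:
        raise ValueError("unmatched '['")
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines)


print(brainfuck_to_c("+[-]."))
```

The translation needs only a pointer, increments and decrements, `while`, `putchar`, and `getchar`, which is the sense in which Brainfuck programs are a small slice of C.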
simianwords•1h ago
I bet I can do better by allowing this: the LLM can pull documentation for the language from the web to understand how it works.

If the LLM has "skills" for that language, it will definitely increase accuracy.

orthoxerox•16m ago
> Frontier models score ~90% on Python but only 3.8% on esoteric languages, exposing how current code generation relies on training data memorization rather than genuine programming reasoning.

I would probably score about the same, does this prove I also rely on training data memorization rather than genuine programming reasoning?

Or does this simply show that esolangs are hard to reason in by design? A more honest approach would use a "real", but relatively unpopular, language. Make them use CoffeeScript or Ada or PL/I or Odin or that other systems programming language that that very opinionated guy is implementing on top of QBE.

iloveoof•6m ago
Try MUMPS: widely used, but with little training data online. Probably less than some esolangs.
wavemode•4m ago
> I would probably score about the same, does this prove I also rely on training data memorization rather than genuine programming reasoning?

Setting aside whether this benchmark is meaningful or not - the argument you're making is faulty. There are indeed humans who can write complete programs in Brainfuck and these other esolangs. The fact that you personally can't is not logically relevant.