I ran this as a practitioner benchmark, not as a vendor takedown.
The main result is not “GLM-5.1 is bad.” At 80K+ context, the same model family performs well: GLM-5.1 reached 98.7% in the main matrix and 100% in the matched preserved-thinking control. The 32K failure is specifically about tool-mediated runtime conditions in OpenCode 1.3.17, where about 21K tokens of built-in context were already consumed before the task itself had any room to work.
The paper does not claim Z.AI benchmarks are fake, that GLM-5.1 cannot work at 32K, or that the supplemental probes rank tools. I included raw SQLite DBs, runners, verifiers, checksums, reproduction docs, limitations, and a reviewer FAQ.
I work at 0G Foundation, but this is personal research and has no connection to 0G. I would especially welcome reproduction attempts with different coding tools or models.