"A million token context," Big AI says. But the model is accurate for 2-4K tokens

https://unagent.eu/2025/04/22/misleading-promises-of-long-context-llm/
2•kzawpl•11mo ago

Comments

kzawpl•11mo ago
Over the last two years there have been claims of better long-context capabilities for LLMs, but these are often tested with exact text search. A new benchmark called NoLiMa shows that the long-context capability of LLMs is still poor if you want the model to perform some abstraction and reasoning.
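To make the "exact text search" point concrete, here is a minimal Python sketch. The filler text, the needles, and the function names are hypothetical and are not NoLiMa's actual data or harness. A plain keyword lookup solves the classic needle-in-a-haystack setup, but it finds nothing once the needle relates to the question only by association, which is the kind of non-literal matching NoLiMa is built to probe.

```python
import random
from typing import Optional

# Hypothetical filler; a real benchmark would use long natural-language text.
FILLER = [
    "The weather was mild and the market was busy.",
    "She spent the afternoon reading in the garden.",
    "The train left the station a few minutes late.",
]

def build_haystack(needle: str, n_sentences: int, seed: int = 0) -> list[str]:
    """Return n_sentences filler sentences with the needle inserted at a random position."""
    rng = random.Random(seed)
    sentences = [rng.choice(FILLER) for _ in range(n_sentences)]
    sentences.insert(rng.randrange(n_sentences + 1), needle)
    return sentences

def keyword_retrieve(keyword: str, haystack: list[str]) -> Optional[str]:
    """Exact-text search: return the first sentence containing the keyword, if any."""
    for sentence in haystack:
        if keyword.lower() in sentence.lower():
            return sentence
    return None

# Literal setup: the question "Which character lives in Helsinki?" shares the word
# "Helsinki" with the needle, so plain string matching finds it.
literal = build_haystack("Yuki lives in Helsinki.", 1000)
print(keyword_retrieve("Helsinki", literal))     # Yuki lives in Helsinki.

# Non-literal setup: the needle never mentions Helsinki. Answering requires knowing
# that the Kiasma museum is in Helsinki, so string matching finds nothing.
nonliteral = build_haystack("Yuki lives next to the Kiasma museum.", 1000)
print(keyword_retrieve("Helsinki", nonliteral))  # None
```

A model that has only learned to locate a matching string behaves much like `keyword_retrieve`. It passes the first test and fails the second, however long its nominal context window is.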
vessenes•11mo ago
Meh. NoLiMa is helpful in that it shows what we all "feel" when working with models: there's a marked drop-off in accuracy and intelligence once we get past 4-32K tokens of context, depending on the model.

But it seems unreasonable to be super worried about this. A year or two ago, models couldn't reliably find needles in long-context haystacks. Once training and test strategies produced trainable data for that task, it became something that could be done perfectly across millions of tokens of context. There just hasn't yet been a good way to incentivize models to do anything more than remember locations.

We are (mostly) paying the full cost of attending to the entire context in current architectures, so it seems pretty reasonable that we will be able to train those architectures to attend more fully across the context if we get the right training data into (ideally) an RL loop.
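To put a number on "paying the full cost," here is a minimal pure-Python sketch that assumes standard dense softmax attention. Some models use cheaper variants, such as sliding-window attention, and this sketch does not model them. Each query is scored against every key, so the number of scores grows with the square of the context length. The function names are illustrative and do not come from any library.

```python
import math

def attention_weights(queries: list[list[float]], keys: list[list[float]]) -> list[list[float]]:
    """Dense scaled dot-product softmax weights: one row per query, one column per key.

    Every query is scored against every key, so an n-token context yields an
    n x n score matrix per head, per layer. There is no causal mask here. With a
    mask, roughly half the scores are used, which is still quadratic in n.
    """
    d = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def score_count(n_tokens: int) -> int:
    """Number of query-key scores that full attention computes (one head, one layer)."""
    return n_tokens * n_tokens

# Toy self-attention over 4 tokens with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w = attention_weights(x, x)
print(len(w), len(w[0]))       # 4 4
print(score_count(4096))       # 16777216
print(score_count(1_000_000))  # 1000000000000
```

Going from 4K to 1M tokens multiplies the score count by about 60,000. The compute spent on attending across the context is already there. Whether the model learns to use it is largely a training question.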

NoLiMa is an okay test, but I think the most recent OpenAI tests are significantly better and quite interesting. OpenAI-MRCR and Graphwalks are both super smart ideas about how to programmatically generate data that is easy to evaluate and that forces better cross-context attention.

From their GPT-4.1 announcement: "Graphwalks fills the context window with a directed graph composed of hexadecimal hashes, and then asks the model to perform a breadth-first search (BFS) starting from a random node in the graph. We then ask it to return all nodes at a certain depth."
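Here is a rough Python sketch of how a Graphwalks-style item might be generated and graded. The 8-character hex node names, the `src -> dst` edge serialization, and every function name are illustrative assumptions, not OpenAI's published generator. The key property is that the answer key comes from an ordinary BFS, so an item is cheap to build and check at any context length.

```python
import random
from collections import deque

def random_hex_graph(n_nodes: int, n_edges: int, seed: int = 0) -> dict[str, list[str]]:
    """Hypothetical generator: random directed graph whose nodes are 8-char hex strings."""
    rng = random.Random(seed)
    node_set: set[str] = set()
    while len(node_set) < n_nodes:
        node_set.add(f"{rng.getrandbits(32):08x}")
    nodes = sorted(node_set)
    graph: dict[str, list[str]] = {node: [] for node in nodes}
    for _ in range(n_edges):
        src, dst = rng.sample(nodes, 2)  # two distinct nodes, so no self-loops
        if dst not in graph[src]:
            graph[src].append(dst)
    return graph

def render_edges(graph: dict[str, list[str]]) -> str:
    """Serialize the graph as 'src -> dst' lines, the text that would fill the context."""
    return "\n".join(f"{src} -> {dst}" for src, dsts in graph.items() for dst in dsts)

def nodes_at_depth(graph: dict[str, list[str]], start: str, depth: int) -> set[str]:
    """BFS from start. Return every node whose shortest distance from start equals depth."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # nodes beyond the target depth are never needed
        for nxt in graph.get(node, []):
            if nxt not in dist:  # first visit in BFS order gives the shortest distance
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {node for node, d in dist.items() if d == depth}

# Deterministic toy example: a1 reaches {b2, c3} at depth 1, then {d4, e5} at depth 2.
toy = {"a1": ["b2", "c3"], "b2": ["d4"], "c3": ["d4", "e5"], "d4": [], "e5": ["a1"]}
print(sorted(nodes_at_depth(toy, "a1", 2)))  # ['d4', 'e5']

# A generated instance: the prompt text plus a ground-truth answer for grading.
g = random_hex_graph(n_nodes=200, n_edges=600, seed=42)
start = random.Random(7).choice(list(g))
prompt = render_edges(g)
answer = nodes_at_depth(g, start, depth=3)
```

Grading can then be a plain set comparison between the nodes the model returns and `answer`. Unlike string search, finding them means following edges scattered across the whole context.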

MRCR asks for direct quotes at semantically identified locations in the text. For example, poems about tapirs, bears and ballerinas, as well as stories about tapirs, bears and ballerinas, are generated, perhaps fifty of each. The system is then asked "give me the third poem about tapirs". This requires counting, conceptual attention, and distinguishing between stories and poems.
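A toy version of the setup described above can be sketched in Python. The item texts are placeholders. The generator, the exact-match grader, and all names are hypothetical, and OpenAI's actual MRCR data format and scoring differ in ways not covered here. The point is that the answer key is a simple filter over the generated items, while the model has to count occurrences and tell poems from stories.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    kind: str   # "poem" or "story"
    topic: str  # e.g. "tapirs"
    text: str   # placeholder; a real benchmark would contain generated prose

def build_haystack(topics: list[str], kinds: list[str], per_combo: int, seed: int = 0) -> list[Item]:
    """Hypothetical generator: per_combo items for every (kind, topic) pair, shuffled."""
    rng = random.Random(seed)
    items = []
    uid = 0
    for kind in kinds:
        for topic in topics:
            for _ in range(per_combo):
                items.append(Item(kind, topic, f"<{kind} about {topic}, id {uid}>"))
                uid += 1
    rng.shuffle(items)
    return items

def nth_item(items: list[Item], kind: str, topic: str, n: int) -> Optional[str]:
    """Ground truth for 'give me the n-th <kind> about <topic>' (1-indexed, document order)."""
    matches = [it for it in items if it.kind == kind and it.topic == topic]
    return matches[n - 1].text if 1 <= n <= len(matches) else None

def grade(answer: str, expected: str) -> bool:
    """Exact-match grading; a real benchmark might use a softer string-similarity score."""
    return answer.strip() == expected

haystack = build_haystack(["tapirs", "bears", "ballerinas"], ["poem", "story"], per_combo=50)
expected = nth_item(haystack, "poem", "tapirs", 3)  # "give me the third poem about tapirs"
```

Stories about tapirs are interleaved with the poems, so a model that keys only on the topic word, or loses count partway through the context, would often return the wrong item.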

They only test their own models on MRCR for the benchmark graph, but it's still worth reviewing: the accuracy curves are super interesting. https://openai.com/index/gpt-4-1/
