"A million token context," Big AI says. But the model is accurate for 2-4K tokens

https://unagent.eu/2025/04/22/misleading-promises-of-long-context-llm/
2•kzawpl•1y ago

Comments

kzawpl•1y ago
Over the last two years there have been claims of better long-context capabilities for LLMs, but these are often tested with exact text search. A new benchmark called NoLiMa shows that the long-context capability of LLMs is still poor if you want the LLM to perform some abstraction and reasoning.
vessenes•1y ago
Meh. NoLiMa is helpful, in that it shows what we all "feel" working with models -- there's a marked drop-off in accuracy and intelligence as we get past 4-32k of context, depending on the model.

But it seems unreasonable to be super worried about this. A year or two ago, models couldn't easily find needles in haystacks of long context. Once training and test strategies produced trainable content for that task, it became something that could be done perfectly across millions of tokens of context. There has not yet been a good way to incentivize models to do anything more than remember locations.
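For contrast with the harder tests discussed in this thread, a classic needle-in-a-haystack check can be sketched in a few lines: plant one sentence at a chosen depth in filler text, ask for it back, and score by literal substring match. The filler, needle, and helper names below are hypothetical; the point is that success only requires locating exact text, which is the kind of retrieval models have learned to do well.

```python
def build_niah_prompt(filler: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    idx = round(depth * len(filler))
    return " ".join(filler[:idx] + [needle] + filler[idx:])


def literal_hit(answer: str, expected: str) -> bool:
    """Score by case-insensitive substring match, i.e. pure literal retrieval."""
    return expected.strip().lower() in answer.lower()


# Hypothetical example: 1,000 filler sentences with one planted fact.
filler = [f"Sentence number {i} is filler." for i in range(1000)]
needle = "The secret code for the vault is 7342."
prompt = build_niah_prompt(filler, needle, depth=0.35)
question = "What is the secret code for the vault?"
print(literal_hit("The code is 7342.", "7342"))  # True
```

A test that needs abstraction cannot be graded this way, because the answer no longer shares surface text with the question.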

We are (mostly) paying the full costs of attending to the entire context in current architectures, and it seems pretty reasonable that we will therefore be able to train those architectures to more fully attend across context if we get the right training data into (ideally) an RL loop.
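A rough worked illustration of what "paying the full costs" means, assuming standard dense self-attention in which each of the $n$ tokens scores every token (ignoring causal masking, which roughly halves the count, and sparse or sliding-window variants):

```latex
\text{query-key scores per head per layer} = n^2, \qquad
n = 4096:\; 4096^2 = 16{,}777{,}216, \qquad
n = 10^6:\; (10^6)^2 = 10^{12} \approx 6 \times 10^4 \cdot 4096^2
```

So the compute for cross-context attention is already being spent at long lengths; the argument is that training, not architecture, is what limits how usefully it is spent.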

NoLiMa is an okay test, but I think the most recent OpenAI tests are significantly better and quite interesting; OpenAI-MRCR and Graphwalks are both super smart ideas about how to programmatically generate data that is easy to evaluate and forces better cross-context attention.

From their GPT-4.1 announcement: "Graphwalks fills the context window with a directed graph composed of hexadecimal hashes, and then asks the model to perform a breadth-first search (BFS) starting from a random node in the graph. We then ask it to return all nodes at a certain depth."
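A minimal sketch of how a Graphwalks-style item could be generated and graded, assuming a random directed graph with 16-hex-digit node names, an `a -> b` edge-list prompt, and exact set match as the score. The function names, prompt wording, and graph sizes are hypothetical, not OpenAI's actual generator.

```python
import random
from collections import deque


def make_graph(num_nodes: int, num_edges: int, seed: int = 0):
    """Random directed graph whose node names are hex strings (hypothetical format)."""
    if num_edges > num_nodes * (num_nodes - 1):
        raise ValueError("too many edges for a simple directed graph")
    rng = random.Random(seed)
    nodes = set()
    while len(nodes) < num_nodes:
        nodes.add(f"{rng.getrandbits(64):016x}")
    nodes = sorted(nodes)
    edges = set()
    while len(edges) < num_edges:
        a, b = rng.sample(nodes, 2)  # two distinct nodes, so no self-loops
        edges.add((a, b))
    return nodes, sorted(edges)


def nodes_at_depth(edges, start, depth):
    """Ground truth: nodes whose BFS (shortest-path) distance from start equals depth."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # nothing deeper is needed
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sorted(n for n, d in dist.items() if d == depth)


def render_prompt(edges, start, depth):
    lines = [f"{a} -> {b}" for a, b in edges]
    lines.append(f"Perform a BFS from {start}. Return all nodes at depth {depth}.")
    return "\n".join(lines)


def exact_set_score(answer_nodes, expected):
    return set(answer_nodes) == set(expected)


# Usage: build one item and its answer key.
nodes, edges = make_graph(num_nodes=200, num_edges=600, seed=42)
start = nodes[0]
prompt = render_prompt(edges, start, depth=2)
expected = nodes_at_depth(edges, start, 2)
```

Because depth is defined by BFS, a node reachable in one hop and also in three hops counts only at depth 1, so the model has to track visited nodes across the whole context rather than match any single line.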

MRCR asks for direct quotes at semantically identified locations in the text. For example, poems about tapirs, bears and ballerinas, as well as stories about tapirs, bears and ballerinas, are generated, perhaps fifty of each. The system is then asked, "give me the third poem about tapirs." This requires counting, conceptual attention, and also distinguishing between stories and poems.
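The MRCR setup described above can be sketched as: generate items tagged by kind and topic, shuffle them into one context, and take the n-th matching item in context order as the answer key. This is a hypothetical illustration: the placeholder bodies stand in for generated poems and stories, and the fuzzy `similarity` grader is one plausible choice, not necessarily what OpenAI uses.

```python
import random
from difflib import SequenceMatcher


def build_haystack(kinds, topics, per_combo, seed=0):
    """Shuffled list of (kind, topic, body) items; bodies are placeholders."""
    rng = random.Random(seed)
    items = []
    for kind in kinds:
        for topic in topics:
            for _ in range(per_combo):
                # A real benchmark would use model-generated text here.
                body = f"A {kind} about {topic} (id {rng.getrandbits(32):08x})."
                items.append((kind, topic, body))
    rng.shuffle(items)
    return items


def nth_match(items, kind, topic, n):
    """Answer key: body of the n-th (1-based) item of `kind` about `topic`, in context order."""
    seen = 0
    for k, t, body in items:
        if k == kind and t == topic:
            seen += 1
            if seen == n:
                return body
    raise ValueError(f"fewer than {n} {kind}s about {topic}")


def similarity(answer, expected):
    """Fuzzy score in [0, 1] based on difflib's matching-subsequence ratio."""
    return SequenceMatcher(None, answer, expected).ratio()


# Usage: fifty of each kind/topic pair, then ask for the third tapir poem.
hay = build_haystack(["poem", "story"], ["tapirs", "bears", "ballerinas"], per_combo=50, seed=7)
context = "\n\n".join(body for _, _, body in hay)
question = "Give me the third poem about tapirs."
expected = nth_match(hay, "poem", "tapirs", 3)
```

Getting the answer right needs all three skills the comment lists: filtering by topic, separating poems from stories, and counting occurrences in order.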

They only test their own models on MRCR for the benchmark graph, but it's still worth reviewing: the accuracy curves are super interesting. https://openai.com/index/gpt-4-1/
