
"A million token context" Big AI says. But the model is accurate for 2-4K tokens

https://unagent.eu/2025/04/22/misleading-promises-of-long-context-llm/
2•kzawpl•10mo ago

Comments

kzawpl•10mo ago
Over the last two years there have been claims of better long-context capabilities for LLMs, but these are often tested on exact text search. A new benchmark called NoLiMa shows that the long-context capability of LLMs is still poor when the model has to perform some abstraction and reasoning.
vessenes•10mo ago
Meh. NoLiMa is helpful, in that it shows what we all "feel" working with models -- there's a marked dropoff in accuracy and intelligence as we get past 4-32k of context, depending on the model.

But it seems unreasonable to be super worried about this -- a year or two ago, models couldn't easily find needles in haystacks of long context. As training and test strategies delivered trainable content, this became a thing that could be done perfectly across millions of tokens of context. There has not yet been a good way to incentivize models to do anything more than remember locations.

We are (mostly) paying the full costs of attending to the entire context in current architectures, and it seems pretty reasonable that we will therefore be able to train those architectures to more fully attend across context if we get the right training data into (ideally) an RL loop.

NoLiMa is an okay test, but I think the most recent OpenAI tests are significantly better and quite interesting; OpenAI-MRCR and Graphwalks are both super smart ideas about how to programmatically generate data that is easy to evaluate and forces better cross-context attention.

From their 4.1 announcement: Graphwalks fills the context window with a directed graph composed of hexadecimal hashes, and then asks the model to perform a breadth-first search (BFS) starting from a random node in the graph. We then ask it to return all nodes at a certain depth.
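
To make the Graphwalks idea concrete, here is a minimal Python sketch of how such a task and its ground-truth answer could be generated. This is not OpenAI's code: the function names, the hash length, and the reading of "nodes at a certain depth" as "nodes whose shortest distance from the start is exactly that depth" are all my assumptions.

```python
import random


def random_hash(rng, length=8):
    """Return a random hexadecimal string (hypothetical node label)."""
    return "".join(rng.choice("0123456789abcdef") for _ in range(length))


def make_graph(num_nodes, edges_per_node, seed=0):
    """Build a random directed graph as {node: [successor, ...]}.

    Assumption: every node gets the same number of outgoing edges.
    """
    rng = random.Random(seed)
    labels = set()
    while len(labels) < num_nodes:
        labels.add(random_hash(rng))
    nodes = sorted(labels)  # sorted so the result is reproducible
    return {node: rng.sample(nodes, edges_per_node) for node in nodes}


def nodes_at_depth(graph, start, depth):
    """Breadth-first search; return nodes first reached after `depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for successor in graph.get(node, []):
                if successor not in seen:
                    seen.add(successor)
                    next_frontier.append(successor)
        frontier = next_frontier
    return set(frontier)
```

The edge list would be written into the prompt, and the model's answer compared against `nodes_at_depth(graph, start, depth)`. Because the expected set is computed rather than hand-labelled, scoring is trivial at any context length.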

MRCR asks for direct quotes at semantically identified locations in the text. For example, poems about tapirs, bears and ballerinas, as well as stories about tapirs, bears and ballerinas, are generated, perhaps fifty each. The system is then asked "give me the third poem about tapirs". This requires counting, conceptual attention, and also distinguishing between stories and poems.
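
A rough Python sketch of how an MRCR-style item and its expected answer could be built, going only by the description above. It is not OpenAI's implementation: the placeholder strings stand in for model-generated poems and stories, and `build_items` and `nth_match` are hypothetical names.

```python
import random


def build_items(kinds, topics, copies, seed=0):
    """Create `copies` placeholder texts per (kind, topic) pair, shuffled.

    Assumption: real text would be generated by a model; here each item is
    just a tagged placeholder string.
    """
    rng = random.Random(seed)
    items = [
        {"kind": kind, "topic": topic, "text": f"<{kind} about {topic}, variant {i}>"}
        for kind in kinds
        for topic in topics
        for i in range(copies)
    ]
    rng.shuffle(items)
    return items


def nth_match(items, kind, topic, n):
    """Return the text of the n-th (1-based) item with this kind and topic."""
    count = 0
    for item in items:
        if item["kind"] == kind and item["topic"] == topic:
            count += 1
            if count == n:
                return item["text"]
    return None  # fewer than n matching items in the context
```

For "give me the third poem about tapirs" the expected quote would be `nth_match(items, "poem", "tapirs", 3)`, and the model's output can be scored by string similarity against it. Stories about tapirs and poems about bears act as distractors, which is what forces the counting and the poem/story distinction.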

They only test their own models on MRCR for the benchmark graph, but it's still worth reviewing: the accuracy curves are super interesting. https://openai.com/index/gpt-4-1/
