I've always tried to remain apolitical and unbiased, but it's hard to overlook who's behind a technology you wanna buy. Not that sama and others are saints either; it's just that Elon is very obvious and vocal about it.
It's a shame, really, because Grok is a good model. But Elon promised to open-source the previous model, and it took them forever to do that with Grok 3. Sorry, but I wanna buy from someone who keeps their promises ("FSD by next year").
Clearly
Kinda reminds me of the video game from Ender's Game.
Is being tuned for right-wing viewpoints the same as not being tuned for political correctness? Because there is tuning happening toward a specific viewpoint:
https://gizmodo.com/elon-says-hes-working-to-fix-grok-after-...
Basically, the major free options out there for LLMs are OpenAI, Google, Perplexity, DeepSeek, Meta, and Grok. (I could be missing stuff here, but those are the main players.) DeepSeek is out because of China ties. OpenAI and Perplexity have CEOs that seem incredibly shifty to me. I refuse to give Meta and Google any more info than I have to, so I'm avoiding them. Hence we fall back to Grok. Again, maybe not a completely logical progression, but it's my choice and I get to live with the consequences :)
Literally none of the options you listed are that objectionable.
Do what the rest of us do and switch frequently. Don't use mekafurhur and you'll be fine.
In terms of models, Grok 4 Fast has essentially zero restrictions on safety, which a) makes it unusable for most applications that allow user input and b) makes it extremely useful for certain applications.
I personally use the best tool for the job, which Grok sometimes is.
But what about the quality of the model? It seems like Grok is pushing the wrong metrics again, after launching Fast.
Obviously major architectural changes need a bigger context window. But try to aggressively modularize your tasks as much as you can, and where possible run batch jobs to keep your workflow moving while each task stays a smaller chunk.
People who flag don't do it because they don't want to dig in. They are almost universally a force for suppression and ignorance: the billionaire imperialist fatcat's friend, desperate to keep things out of the public eye.
reasonableklout•1h ago
The limiting factors are typically:

1. Often there are latency/throughput requirements for model serving which become challenging to fulfill at a certain context length.
2. The model has to be _trained_ to use the desired context length, and training becomes prohibitively expensive at larger contexts.
(2) is even a big enough problem that some popular open source models that claim to support large context lengths in fact are trained on smaller ones and use "context length extension" hacks like YaRN to trick the model into working on longer contexts at inference time.
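YaRN itself is a bit more involved than plain interpolation (it rescales only the lower-frequency RoPE bands and adds an attention-temperature correction), but the core trick behind these extension hacks is easy to sketch. A minimal numpy illustration, with made-up trained/target lengths rather than any real model's:

```python
import numpy as np

# Minimal sketch of linear position interpolation, the simplest of the
# context-extension hacks (YaRN is a refinement of this idea). The
# trained/target lengths here are hypothetical.
def interpolated_rope_freqs(dim, trained_len=8192, target_len=32768, base=10000.0):
    scale = target_len / trained_len               # e.g. 4x extension
    freqs = base ** (-2.0 * np.arange(dim // 2) / dim)
    # Slowing every rotation by `scale` makes position 32768 produce the
    # same RoPE angles the model saw at position 8192 during training,
    # so inference-time positions land back in the trained range.
    return freqs / scale
```

The catch is that squeezing positions together this way trades positional resolution for range, which is part of why models extended this way tend to degrade on genuinely long inputs.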
TheCoolGuy•1h ago
This has obvious issues, since you're now losing information from the now-unseen tokens, which becomes significant if your context window is small compared to the question/answer you're looking at. That's why companies try to offer stupidly large context windows.

The problem is they're not training on the large context window; they're training on something smaller (2048 tokens and up). Due to how attention is set up, you can train on a small amount of context and extrapolate to far more tokens, because training uses RoPE, which encodes words by their offset to neighboring words. This lets you effectively 2x, 3x, 10x, even 100x the number of tokens you generate versus what you trained on with some consistency, BUT it still causes a lot of consistency issues, since the model ends up in a "this was trained on snippets but not the entire thing" situation: it has a notion of the context but not fundamentally the entire combined context.
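For what it's worth, the reason RoPE extrapolates at all is that the attention score between two rotated vectors depends only on their relative offset, not on their absolute positions. A self-contained numpy sketch (toy sizes, using the half-split "rotate_half" pairing convention):

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    # Rotate pairs of dimensions of x by pos * theta_i, where
    # theta_i = base^(-2i/d). Half-split pairing: dim i pairs with dim i+d/2.
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-2.0 * np.arange(half) / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Attention scores after RoPE depend only on the *offset* between query
# and key positions, so absolute positions the model never saw can still
# produce familiar-looking score patterns:
rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)
near = rope_rotate(q, pos=5) @ rope_rotate(k, pos=2)        # offset 3
far  = rope_rotate(q, pos=1005) @ rope_rotate(k, pos=1002)  # same offset 3
assert np.allclose(near, far)
```

Same offset, same score, wherever the pair sits in the sequence; what the model lacks at long range is training signal for the long offsets themselves.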
nbardy•1h ago
And sure, maybe not all 2M of it is usable, but they're reliably pushing the frontier here.
ggeorgovassilis•1h ago