Grok Code Fast 1

90•Terretta•2h ago

Comments

cft•1h ago

it's free in Cursor till Sept 2. My experience is subpar so far

giancarlostoro•1h ago

Its focus seems to be on faster responses, which Grok 3 definitely is good at. I have a different approach to LLMs and coding: I want to understand their proposed solutions and not just paste garbled-up code (unless it's scaffolded). If you treat every LLM as a piecemeal thing when designing code (or really when trying to figure out anything) and go step by step, you get better results from most models.

oulipo2•48m ago

It's bad, AND it's made by a nazi sympathizer

so it goes to the trashcan

tasty_freeze•23m ago

To the people downvoting this comment -- it isn't just that Musk made a couple of very sharp nazi salutes. You may say, oh, that was just an unfortunate similarity, he wasn't doing a nazi salute at all. But he has a history of boosting nazi posts on twitter. Oh, Musk posts so often he can't vet the source of all of his retweets. But if those are mistakes, the fact is he never makes a mistake in the other direction, which strongly suggests it wasn't an accident.

Eg, https://www.msn.com/en-us/news/world/musk-retweets-hitler-di...

johnfn•1h ago

Ah, so this is what the Sonic model that Cursor had was. I've been doing this personal bench where I ask each model to create a 3D render of a guy using a laptop on a desk. I haven't written up a post to show the different output from each model, yet, but it's been a fun way to test the capabilities. Opus was probably the best -- Sonic put the guy in the middle of the desk, and the laptop floating over his head. Sonic was very fast, though!

boole1854•1h ago

It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

eterm•1h ago

It depends how fast.

If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.

Ad absurdum, if it could ingest and work on an entire project in milliseconds, then it has much greater value to me than a process which might take a day to do the same, even if the likelihood of success is also strongly affected.

It simply enables a different method of interactive working.

Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.

Latency can have critical impact on not just user experience but the very way tools are used.

Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.

postalcoder•49m ago

Besides being a faster slot machine, to the extent that they're any good, a fast agentic LLM would be very nice to have for codebase analysis.

giancarlostoro•46m ago

> If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.

Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.

ffsm8•31m ago

I thought the current vibe was doing the former to produce the latter and then use the output as the task plan?

giancarlostoro•18m ago

I don't know what other people are doing. I mostly use LLMs for:

* Scaffolding

* Ask it what's wrong with the code

* Ask it for improvements I could make

* Ask it what the code does (amazing for old code you've never seen)

* Ask it to provide architect level insights into best practices

One area where they all seem to fail is lesser-known packages: they tend to reference functionality that is either no longer there or never was. They hallucinate, which is part of why I don't ask for too much.

Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok, it might be worthwhile.

34679•40m ago

>If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts, could possibly be more valuable than a slow higher quality output.

Before MoE was a thing, I built what I called the Dictator, which was one strong model working with many weaker ones to achieve a similar result as MoE, but all the Dictator ever got was Garbage In, so guess what came out?

_kb•17m ago

You just need to scale out more. As you approach infinite monkeys, sorry - models, you'll surely get the result you need.

6r17•1h ago

Tbh I kind of disagree; there are certain use cases where speed would legitimately be much more interesting, such as generating a massive amount of HTML. Though I agree this makes it look like even more of a joke for anything serious.

They reduce the costs though!

jsheard•1h ago

That's far from the worst metric that xAI has come up with...

https://xcancel.com/elonmusk/status/1958854561579638960

esafak•59m ago

I agree. Coding faster than humans can review it is pointless. Between fast, good, and cheap, I'd prioritize good and cheap.

Fast is good for tool use and synthesizing the results.

peab•58m ago

Depends on what for.

For autocompleting simple functions (string manipulation, function definitions, etc), the quality bar is pretty easy to hit, and speed is important.

If you're just vibe coding, then yeah, you want quality. But if you know what you're doing, I find having a dumber fast model is often nicer than a slow smart model that you still need to correct a bit, because it's easier to stay in flow state.

With the slow reasoning models, the workflow is more like working with another engineer, where you have to review their code in a PR

jml78•58m ago

To a point. If gpt5 takes 3 minutes to output and qwen3 does it in 10 seconds, and the agent can iterate 5 times to finish before gpt5, why do I care if gpt5 one-shot it and qwen took 5 iterations?
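A rough sketch of that arithmetic follows. The 3 minute and 10 second figures are the ones above. The success probabilities are invented for illustration, and treating attempts as independent with a fixed success rate is a simplifying assumption that real agent loops may not satisfy.

```python
def expected_seconds(seconds_per_attempt: float, p_success: float) -> float:
    """Expected wall-clock time until the first successful attempt.

    Assumes independent attempts with a fixed success probability, so the
    number of attempts is geometric with mean 1 / p_success.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return seconds_per_attempt / p_success


def break_even_probability(fast_seconds: float, slow_seconds: float,
                           slow_p: float) -> float:
    """Per-attempt success rate the fast model needs to tie on expected time."""
    return fast_seconds * slow_p / slow_seconds


# Slow model: 180 s per attempt, assumed to always succeed.
slow = expected_seconds(180, 1.0)
# Fast model: 10 s per attempt, assumed to succeed 1 time in 5.
fast = expected_seconds(10, 0.2)
print(f"slow: {slow:.0f} s, fast: {fast:.0f} s")

# Below this success rate the fast model loses despite its speed.
print(f"break-even: {break_even_probability(10, 180, 1.0):.3f}")
```

Under these assumptions the fast model only wins while its per-attempt success rate stays above roughly 1 in 18. If extra iterations do not raise the chance of success at all, speed alone does not help.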

wahnfrieden•21m ago

It doesn’t though. Fast but dumb models don’t progressively get better with more iterations.

furyofantares•48m ago

Fast can buy you a little quality by getting more inference on the same task.

I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds. I will usually have eyeballed the code somewhere in the middle here but I'm not fully reviewing until this whole dance is done.

I mean, I obviously agree with you in that I've chosen the slowest models available at every turn here, but my point is I would be very excited if they also got faster because I am using a lot of extra inference to buy more quality before I'm touching the code myself.

dotancohen•27m ago

> I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds.

I'd love to hear how you have this set up.

mchusma•23m ago

This is a nice setup. I wonder how much it helps in practice? I suspect most of the problems opus has for me are more context related, and I’m not sure more models would help. Speculation on my part.

giancarlostoro•47m ago

I'm more curious whether it's based on Grok 3 or what; I used to get reasonable answers from Grok 3. If that's the case, the trick that works for Grok and basically any model out there is to ask for things in order and piecemeal, not all at once. Some models will be decent at the 'all at once' approach, but when others and I have asked in steps, we got much better output. I'm not yet sure how I feel about Grok 4; I have not really been impressed by it.

londons_explore•34m ago

A a a a a a a a a a a a a a a.

At least this comment was written fast.

M4v3R•21m ago

Speed absolutely matters. Of course if the quality is trash then it doesn't matter, but a model that's on par with Claude Sonnet 4 AND very speedy would be an absolute game changer in agentic coding. Right now you craft a prompt, hit send and then wait, and wait, and then wait some more, and after some time (anywhere from 30 seconds to minutes later) the agent finishes its job.

It's not long enough for you to context switch to something else, but long enough to be annoying and these wait times add up during the whole day.

It also discourages experimentation if you know that every prompt will potentially take multiple minutes to finish. If it instead finished in seconds then you could iterate faster. This would be especially valuable in the frontend world where you often tweak your UI code many times until you're satisfied with it.

defen•10m ago

> I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

We already know that in most software domains, fast (as in, getting it done faster) is better than 100% correct.

Workaccount2•1h ago

Is this the model that is the "Coding" version of Grok-4 promised when Grok-4 had awful coding benchmarks?

I guess if you cannot do well in benchmarks, you instead pick one that's easier to pump up and run with that: speed. Looking online for benchmarks, the first thing that came up was a reddit post from an (obvious) spam account[1] gloating about how amazing it was on a bunch of subs.

[1]https://www.reddit.com/user/Suspicious_Store_137/

bearjaws•1h ago

Yay more garbage code - faster

A hint to all AI companies, nobody wants quickly generated broken code.

echelon•49m ago

AI coding tools are amazing and if you don't use them, that's fine. But lots of people, myself included, are finding tremendous utility in these models.

I'm getting 30-50% larger code changes in per day now. Yesterday I plumbed six slightly mechanical, but still major changes through our schema, several microservice layers, API client libraries, and client code. I wrote down the change sites ahead of time to track progress: 54. All requiring individual business logic. This would have been tedious without tab complete.

And that's not the only thing I did yesterday.

I wouldn't trust these tools with non-developers, but in our hands they're an exoskeleton. I like them like I like my vim movements.

A similar analogy can be made for the AI graphics design and editing models. They're extremely good time saving tools, but they still require a human that knows what they're doing to pilot them.

esafak•1h ago

"On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness."

Let's see this harness, then, because third party reports rate it at 57.6%

https://www.vals.ai/models/grok_grok-code-fast-1

RedMist•55m ago

I've been testing Grok for a few days, and it feels like a major step backward. It randomly deleted some of my code - something I haven't had happen in a long time.

While the top coding models have become much more trustworthy lately, Grok isn't there yet. It doesn't matter if it's fast and/or free; if you can't trust a tool with your code, you can't use it.

mwigdahl•53m ago

Full Self Coding?

RedMist•39m ago

No, making edits to an existing codebase.

(If that's what you meant)

pdabbadabba•29m ago

I think that was just a joke about "Full Self Driving" -- and how it still doesn't work.

hu3•49m ago

Interesting. Available in VSCode Copilot for free.

https://i.imgur.com/qgBq6Vo.png

I'm going to test it. My bottleneck currently is waiting for agent to scan/think/apply changes.

threeducks•22m ago

I have been testing it since yesterday in VS Code and it seemed fine so far. But I am also happy with all the GPT-4 variants, so YMMV.

cendyne•46m ago

My experience with 'sonic' during the stealth phase was that it did stuff plenty fast, but the quality was slightly off target for some things. It did create tests and then iterate on those tests, but the tests it wrote didn't actually verify intended behavior. They only verified that mocks were called with the intended inputs, while missing the larger picture of how the code is used.
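To illustrate the difference, here is a minimal sketch. Every name in it is hypothetical and made up for this example, not taken from anything the model wrote. A test that only checks mock calls passes against a buggy implementation, while a test on the observable result fails.

```python
from unittest.mock import Mock


def apply_discount(price, percent):
    """Hypothetical helper: take `percent` off `price`."""
    return price * (100 - percent) / 100


def checkout(cart, percent, discount=apply_discount):
    """Hypothetical code under test: discount each item, then total them."""
    return sum(discount(price, percent) for price in cart)


def buggy_checkout(cart, percent, discount=apply_discount):
    """Makes the same calls to `discount` but throws the results away (a bug)."""
    for price in cart:
        discount(price, percent)
    return sum(cart)


def mock_only_test(checkout_fn):
    # Only checks how the collaborator was called, not what came back.
    fake = Mock(return_value=0)
    checkout_fn([10, 20], 50, discount=fake)
    fake.assert_any_call(10, 50)
    fake.assert_any_call(20, 50)


def behavior_test(checkout_fn):
    # Checks the observable result a caller actually depends on.
    assert checkout_fn([10, 20], 50) == 15


mock_only_test(checkout)        # passes
mock_only_test(buggy_checkout)  # also passes: the bug goes unnoticed
behavior_test(checkout)         # passes
try:
    behavior_test(buggy_checkout)
except AssertionError:
    print("behavior test caught the bug")
```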

Demiurge•46m ago

I've actually seen really good outputs from the regular Grok 4. The issue seemed to be that it didn't explain anything and just made some changes, which, like I said, were pretty good. I never wanted a faster version, I just wanted a bit more feedback and explanations for suggested changes.

What I recently found much more valuable, and why I now prefer GPT-5 over Sonnet 4, is that if I start asking it to give me different architectural choices, it's really quite good at summarizing trade-offs and offering step-by-step navigation towards problem solving. I am liking this process a lot more than trying to "one shot" or getting tons of code completely rewritten that's unrelated to what I am really asking for. This seems to be a really bad problem with Opus 4.1 Thinking or even Sonnet Thinking. I don't think it's accurate to rate models on "one-shotting" a problem. Rate them on how easy they are to work with as an assistant.

cft•42m ago

I have the same experience, except while I agree that GPT-5 is better than Sonnet 4 for architecture and deep thinking, Sonnet 4 still seems to be better for just banging out code when you have a well-defined and a very detailed plan.

Incipient•26m ago

I noticed it pop up on copilot so gave it about two attempts. Neither were fast, and both were incredibly average. Gpt4.1 and 5-mini do a better job, and 5-mini was faster...but I find speed of response varies hugely and seemingly randomly throughout the day.

disposition2•19m ago

This will probably be an unpopular, wet blanket opinion...

But anytime I hear of Grok or xAI, the only thing I can think about is how it's hoovering up water from the Memphis municipal water supply and running natural gas turbines, all to power a chatbot.

Looks like they are bringing even more natural gas turbines online...great!

https://netswire.usatoday.com/story/money/business/developme...

d0gsg0w00f•14m ago

Where do OpenAI and Anthropic get their water?

onlyrealcuzzo•14m ago

Why can't it suck up water right from the Mississippi and do Once-Through cooling? Isn't it close? There's definitely more than enough water

mchusma•16m ago

Fast is cool! Totally has its place. But I use Claude code in a way right now where it’s not a huge issue and quality matters more.

Opus 4.1 is by far the best right now for most tasks. It’s the first model I think will almost always pump out “good code”. I do always plan first as a separate step, and I always ask it for plans or alternatives first and always remind it to keep things simple and follow existing code patterns. Sometimes I just ask it to double check before I look at it and it makes good tweaks. This works pretty well for me.

For me, Sonnet 3.5 was a clear step up in coding. I thought 3.7 was worse, 2.5 Pro equivalent, and Sonnet 4 equal to or maybe a tiny bit better than 3.5. Opus 4.1 is the first one that feels to me like a solid step up over Sonnet 3.5. This of course required me to jump to the Claude Code Max plan, but it's the first model to be worth that (I wouldn't pay that much for just Sonnet).

ceroxylon•1m ago

According to the model card it is extremely fast, can be hijacked 25% of the time, has access to search tools, and has a propensity for dishonesty.

I also think it is optimistic to think the jailbreak percentage will stay at "0.00" after public use, but time will tell.

https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf
