I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.
If an LLM is often going to be wrong anyway, then being able to try prompts quickly and iterate on them could be more valuable than a slower, higher-quality output.
Ad absurdum, if it could ingest and work on an entire project in milliseconds, it would have much greater value to me than a process that might take a day to do the same, even if the likelihood of success were strongly affected.
It simply enables a different method of interactive working.
Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.
Latency can have critical impact on not just user experience but the very way tools are used.
Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.
Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.
* Scaffolding
* Ask it what's wrong with the code
* Ask it for improvements I could make
* Ask it what the code does (amazing for old code you've never seen)
* Ask it to provide architect level insights into best practices
One area where they all seem to fail is lesser-known packages: they tend to reference old functionality that either is not there anymore or never was; they hallucinate. Which is part of why I don't ask for too much.
Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok, it might be worthwhile.
not if you have too much! a few hundred thousand lines of code and you can't ask shit!
plus, you just handed over your company's entire IP to whoever hosts your model
The IP risks taken may well be worth the productivity boosts.
I think the biggest thing for offline LLMs will have to be consistency in having them search the web with an API like Google's or some other search engine's. Maybe Kagi could provide an API for people who self-host LLMs (not necessarily for free, but it would still be useful).
That's phase 1: ask it to "think deeply" (a Claude keyword; only works with the Anthropic models) while doing that. Then ask it to make a detailed plan for solving the issue, write that into current-fix.md, and add clearly testable criteria for when the issue is solved.
Now you manually check whether the criteria sound plausible; if not, its analysis failed and its output was worthless.
But if it sounds good, you can then start a new session and ask it to read the-markdown-file and implement the change.
Now you can plausibility-check the diff and are likely done.
But as the sister comment pointed out, agentic coding really breaks apart with large files like you usually have in brownfield projects.
Before MoE was a thing, I built what I called the Dictator, which was one strong model working with many weaker ones to achieve a similar result as MoE, but all the Dictator ever got was Garbage In, so guess what came out?
this site is the fucking worst
Not sure who was taking SamA seriously about that; personally I think he's a ridiculous blowhard, and statements like that just reinforce that view for me.
Please don't make generalizations about HN's visitors'/commenters' attitudes on things. They're never generally correct.
They reduce the costs though!
There's nothing wrong with doing it, but it's entirely unrelated to performance.
But even if your interpretation is correct, frequency of releases still is not a good metric. That could just mean that you have a lot to fix, and/or you keep breaking and fixing things along the way.
Of course, 95% of them are fixing things they broke in earlier commits and their overall quality is the worst on the team. But, holy cow, they can output crap faster than anyone I’ve seen.
But sure, ok, maybe it could mean making much faster progress than competitors. But then again, it could also mean that competitors have a much more mature platform, and you're only releasing new things so often because you're playing catch-up.
(And note that I'm not specifically talking about LLMs here. This metric is useless for pretty much any kind of app or service.)
Fast is good for tool use and synthesizing the results.
For autocompleting simple functions (string manipulation, function definitions, etc), the quality bar is pretty easy to hit, and speed is important.
If you're just vibe coding, then yeah, you want quality. But if you know what you're doing, I find having a dumber fast model is often nicer than a slow smart model that you still need to correct a bit, because it's easier to stay in flow state.
With the slow reasoning models, the workflow is more like working with another engineer, where you have to review their code in a PR.
Different models for different things.
Not everyone is solving complicated things every time they hit cmd-k in Cursor or use autocomplete, and they can easily switch to a different model when working harder stuff out via longer form chat.
Often all it takes is to reset to a checkpoint or undo and adjust the prompt a bit with additional context and even dumber models can get things right.
I've used Grok Code Fast plenty this week, alongside GPT-5 when I need to pull out the big guns, and it's refreshing to use a fast model for smaller changes, or for tasks that are tedious but repetitive, like refactoring.
Do you use them successfully in cases where you just had to re-run them 5 times to get a good answer, and was that a better experience than going straight to GPT 5?
I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds. I will usually have eyeballed the code somewhere in the middle here but I'm not fully reviewing until this whole dance is done.
I mean, I obviously agree with you in that I've chosen the slowest models available at every turn here, but my point is I would be very excited if they also got faster because I am using a lot of extra inference to buy more quality before I'm touching the code myself.
> I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds.
I'd love to hear how you have this set up.

It's not long enough for you to context switch to something else, but long enough to be annoying, and these wait times add up over the whole day.
It also discourages experimentation if you know that every prompt will potentially take multiple minutes to finish. If it instead finished in seconds then you could iterate faster. This would be especially valuable in the frontend world where you often tweak your UI code many times until you're satisfied with it.
We already know that in most software domains, fast (as in, getting it done faster) is better than 100% correct.
So the total difference includes the cost of context switching, which is big.
Potentially speed matters less in a scenario that is focused on more autonomous agents running in the background. However I think most usage is still highly interactive these days.
I guess if you cannot do well on benchmarks, you instead pick an easier metric to pump up and run with that: speed. Looking online for benchmarks, the first thing that came up was a reddit post from an (obvious) spam account[1] gloating about how amazing it was on a bunch of subs.
Let's see this harness, then, because third-party reports rate it at 57.6%.
This doesn't just cause confusion, it's also hard to sort. To confirm my suspicion of sloppy coding, I tried to sort the date column and to my surprise I got this madness:
1/31/2025
2/29/2024
2/29/2024
4/28/2024
3/27/2024
9/27/2023
Which is sorting by the day column -- the bit in the middle -- instead of the year! That's just... special.
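As a side note, the underlying bug is easy to reproduce: sorting date strings as plain text only gives chronological order when the most significant field comes first. A minimal sketch in Python, using the values from the list above:

```python
# Sorting M/D/YYYY strings as text compares characters left to right,
# so the result is not chronological (here it effectively sorts by the
# leading month digit, and the year is never the primary key).
us_style = ["1/31/2025", "2/29/2024", "4/28/2024", "3/27/2024", "9/27/2023"]
print(sorted(us_style))
# -> ['1/31/2025', '2/29/2024', '3/27/2024', '4/28/2024', '9/27/2023']

# ISO 8601 (YYYY-MM-DD) is fixed-width with the most significant field
# first, so a plain lexicographic sort IS chronological order.
iso_style = ["2025-01-31", "2024-02-29", "2024-04-28", "2024-03-27", "2023-09-27"]
print(sorted(iso_style))
# -> ['2023-09-27', '2024-02-29', '2024-03-27', '2024-04-28', '2025-01-31']
```

Whatever that site did (sorting on the middle field is even stranger), parsing into real dates or storing ISO 8601 strings avoids the whole class of bug.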
[1] I hear some incredibly backwards places like Liberia that also haven't adopted metric insist on using it into the present day, but the rest of the civilised world has moved on.
Just look at this map: https://en.m.wikipedia.org/wiki/List_of_date_formats_by_coun...
You’re almost entirely alone in these backwards practices!
Well, not entirely alone, you also have Liberia following your “standards”! There’s two of you! Must be nice.
PS: If Trump actually wanted to make US exports competitive on the world market, step one would be to adopt world standards like metric.
1. That Michael Jackson song
2. The time that a US president asked the president of Liberia "where he learned English" because he spoke English so well
And now I'll add to my list a third item:
3. Being one of an elite set of countries to use freedom units
From Wikipedia:
> Liberia began in the early 19th century as a project of the American Colonization Society, which believed that black people would face better chances for freedom and prosperity in Africa than in the United States. Between 1822 and the outbreak of the American Civil War in 1861, more than 15,000 freed and free-born African Americans, along with 3,198 Afro-Caribbeans, relocated to Liberia. Gradually developing an Americo-Liberian identity, the settlers carried their culture and tradition with them while colonizing the indigenous population. Led by the Americo-Liberians, Liberia declared independence on July 26, 1847, which the U.S. did not recognize until February 5, 1862.
So it makes sense they would be using freedom units and freedom ways of writing dates; it's in the name.
Maybe the US isn't as backwards as you might believe, or maybe Airbus is a backwards company for using feet and knots? Perhaps different measurement systems have their virtues (give me an exact integer representation of 1/3 of a meter; 1/3 of a foot is exactly 4 inches, and 1/3 of a yard is 1 foot, or 12 inches).
For the record, the US made the metric system the preferred system of measurement 50 years ago. So you are also uninformed in your attempted insult about US exports (1975, Metric Conversion Act). Americans also learn about the metric system in school, and are more than capable of using it when it matters (the American weapons that Europe and Ukraine seem so fond of use the metric system).
I don't live in the US, but I have lived there in the past, and making sweeping insults about 340 million people is something I learned not to do.
I'm not sure why you're particularly picking on MM/DD/YYYY, saying things like "backwards places". DD/MM/YYYY doesn't sort any better. YYYY-MM-DD is the only one that sorts well. (Some people promote YYYYY-MM-DD though, which I guess is more future proof.)
> Some people promote YYYYY-MM-DD though, which I guess is more future proof
It’s the only unambiguous, sortable, sane format and the use of anything else should be deprecated on the web.
Those criticisms apply to both MM/DD/YYYY and DD/MM/YYYY. (MM/DD/YY and DD/MM/YY are even worse.)
>> Some people promote YYYYY-MM-DD though, which I guess is more future proof
>It’s the only unambiguous, sortable, sane format and the use of anything else should be deprecated on the web.
Are you talking about YYYYY-MM-DD or YYYY-MM-DD? They're both unambiguous and sortable. (Not sortable with the other one though.)
While the top coding models have become much more trustworthy lately, Grok isn't there yet. It doesn't matter if it's fast and/or free; if you can't trust a tool with your code, you can't use it.
(If that's what you meant)
I think this is a very good description of where autonomous vehicles are right now.
- Boston Dynamics' Atlas does not move as gracefully as a human
- LLM writing and code is oh-so-easy to spot
- the output of diffusion models is indistinguishable from a photo... until you look at it for longer than 5 seconds and decide to zoom in because "something's wrong"
- motion in AI-generated videos is very uncanny
Maybe it's because we get used to it and therefore recognize it more easily, but it does seem to be getting more and more recognizable, rather than the opposite, doesn't it?
I think I could recognize a ChatGPT email way easier in 2025 than if you showed me the same email written by gpt-3.5.
There are no proper retention laws with car manufacturers and self-driving development companies that I know of.
[0] https://arstechnica.com/cars/2025/08/how-a-hacker-helped-win...
Meantime internet randos say this was “a promise”.
Not to mention that accidents happen; not everyone always has the good habit of using version control for every change in every project, and depending on the source control software and the environment you work in, it may not even be possible to preserve a pending change (not every project uses git).
I have heard real stories of software bugs causing uncommitted changes to be deleted, or causing an entire hobby project to be wiped from disk when it has not been pushed to remote repositories yet. They are good software engineers, but they are not super careful, and they trust other people's code too much.
There’s a huge difference between languages, with TS web development always working the best.
https://i.imgur.com/qgBq6Vo.png
I'm going to test it. My bottleneck currently is waiting for the agent to scan/think/apply changes.
What I recently found much more valuable, and why I now prefer GPT-5 over Sonnet 4, is that if I start asking it for different architectural choices, it's really quite good at summarizing trade-offs and offering step-by-step navigation toward solving the problem. I like this process a lot more than trying to "one-shot", or getting tons of code completely rewritten that's unrelated to what I'm really asking for. That seems to be a really bad problem with Opus 4.1 Thinking, or even Sonnet Thinking. I don't think it's accurate to rate models on "one-shotting" a problem. Rate them on how easy they are to work with as an assistant.
1. A lot of people are interested in maintaining AI hype.
2. People work differently.
It was completely unsteerable. I get why people are often upset by the "you're right" of Claude models, but that's what I usually want from a model.
I guess expectations differ depending on the developer's experience level, but I want to have the final say on what the right way is.
But anytime I hear of Grok or xAI, the only thing I can think about is how it's hoovering up water from the Memphis municipal water supply and running natural gas turbines to power all for a chat bot.
Looks like they are bringing even more natural gas turbines online...great!
https://netswire.usatoday.com/story/money/business/developme...
They started operating the turbines without permits and they were not equipped with the pollution controls normally required under federal rules. Worse, they are in an area that already led the state in people having to get emergency treatment for breathing problems. In their first 11 months they became one of the largest polluters in an area already noted for high pollution.
They have since got a permit, and said that pollution controls will be added, but some outside monitors have found evidence that they are running more turbines than the permit allows.
Oh, and of course 90% of the people bearing the brunt of all this local pollution are poor and Black.
- https://www.scientificamerican.com/article/the-health-risks-...
- https://www.cbc.ca/news/science/gas-stoves-air-pollution-1.6...
There are a couple of ways to limit this. One is to avoid having nitrogen in whatever gas you use to provide oxygen. E.g., use pure oxygen, or use atmospheric air with the nitrogen removed. There is research and testing on this, but I don't think there is much commercialization yet.
Another is to use turbines designed to operate at lower temperature so that they don't reach the temperature where nitrogen and oxygen start forming nitrogen oxides. These are widely available. They are more expensive upfront, can be more finicky to operate, may require higher quality fuel, and may have more partial combustion which can lead to more partial combustion products like formaldehyde. However they can be more efficient which can lower operating costs.
A lot of it then comes down to regulatory costs. It may be cheaper to use a normal turbine with some add on to deal with NOx or it may be cheaper to use a low NOx turbine. That of course assume you even have to care about NOx. If you don't then the normal turbine is probably cheaper.
Something like 80-90% of gas turbine power plants in the US do use the low NOx turbines. However, rented gas turbines are mostly the normal ones. That's because they are easier to operate, require minimal maintenance, and are often more rugged, which are all good things for a rental. The turbines at the xAi Memphis datacenter are rentals. I believe they are intended to be temporary while the grid is improved to provide more power.
Opus 4.1 is by far the best right now for most tasks. It’s the first model I think will almost always pump out “good code”. I do always plan first as a separate step, and I always ask it for plans or alternatives first and always remind it to keep things simple and follow existing code patterns. Sometimes I just ask it to double check before I look at it and it makes good tweaks. This works pretty well for me.
For me, I found Sonnet 3.5 to be a clear step up in coding, I thought 3.7 was worse, 2.5 pro equivalent, and 4 sonnet equal maybe tiny better than 3.5. Opus 4.1 is the first one to me that feels like a solid step up over sonnet 3.5. This of course required me to jump to Claude code max plan, but first model to be worth that (wouldn’t pay that much for just sonnet).
I also think it is optimistic to think the jailbreak percentage will stay at "0.00" after public use, but time will tell.
https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf
Things I noted:
- It's fast. I tested it in EU tz, so ymmv
- It does agentic in an interesting way. Instead of editing a file whole or in many places, it does many small passes.
- Had a feature take ~110k tokens (parsing html w/ bs4). Still finished the task. Didn't notice any problems at high context.
- When things didn't work first try, it created a new file to test, did all the mocking / testing there, and then once it worked, edited the main module file. Nice. GPT5-mini would often edit working files, and then get confused and fail the task.
All in all, not bad. At the price point it's at, I could see it as a daily driver. Even agentic stuff w/ opus + gpt5 high as planners and this thing as an implementer. It's fast enough that it might be worth setting it up in parallel and basically replicate pass@x from research.
IMO it's good to have options at every level. Having many providers fighting for the market is good; it keeps them on their toes and brings prices down. GPT5-mini is at $2/MTok; this is at $1.5/MTok. This is basically "free" in the grand scheme of things. I don't get the negativity.
Grok is owned by Elon Musk. Anything positive that is even tangentially related to him will be treated negatively by certain people here. Additionally, it is an AI coding tool which is seen as a threat to some people’s livelihoods here. It’s a double whammy, so I’m not surprised by the reaction to it at all.
See also the Microsoft threads on HN where everyone threatens to switch to Linux, and by reading them you'd think Linux is finally about to have its infamous glory year on the desktop.
I've seen some that change it for copy and paste, but I don't think it works for cmd-left/right/up/down, or the option variants of those.
I use grok a lot on the web interface (grok.com) and never had any weird incidents. It's a run-of-the-mill SOTA model with good web search and less safety training
OpenRouter claims Cerebras is providing at least 2000 tokens per second, which would be around 10x as fast, and the feedback I'm seeing from independent benchmarks indicates that Qwen3-Coder-480B is a better model.
If somebody from Cerebras is reading this, are you having capacity issues?
Maybe you'd find consolation in using Apple or Nvidia-designed hardware for inference on these Chinese models? Sure, the hardware you own was also built by your "nation's largest geopolitical adversary" but that hasn't seemed to bother you much.
Having Qwen3 Coder's A3B available for chat-oriented coding conversations is indeed amazing for what it is, and for being local and free, but I also struggled to get useful agentic tools to work reliably with it (a fair number of tool calls fail or start looping, even with the correct, advised settings; I tried Cline, Roo, Continue, and their own Qwen Code CLI). Even when I do get it to work for a few tasks in a row, I don't have the hardware to run at comparable speed or manage the massive context sizes of a hosted frontier model. And buying capable enough hardware costs about as much as many years of paying for top-tier hosted models.
Yes, the censorship for some topics currently doesn't appear to be any good, but it does exist, will absolutely get better (both harder to subvert and more subtle), and makes the models less trustworthy than those from countries (US, EU, Sweden, whatever) that don't have that same level of state control. (note that I'm not claiming that there's no state control or picking any specific other country)
That's the downside to the user. To loop that back to your question, the upside to China is soft power (the same kind that the US has been flushing away recently). It's pretty similar to TikTok - if you have an extremely popular thing that people spend hours a day on and start to filter their life through, and you can influence it, that's a huge amount of power - even if you don't make any money off of it.
Now, to be fair to the context of your question, there isn't nearly as much soft power you can get from a model that people use primarily for coding - that I'm less concerned about.
[1] https://www.tomsguide.com/ai/i-just-outsmarted-deepseeks-cen...
However, American models (just like Chinese models) are heavily censored according to their society. ChatGPT, Claude, and Gemini are all aggressively censored to meet Western expectations.
So in essence, Chinese models should be less censored than western models for western topics.
I can't believe Americans are all falling for propaganda like this. So Russia is all fine now, huh? You know, the country you literally had nuclear warheads pointed at for decades and decades and decades on end.
There’s no comparison. China is a far greater threat to the West than Russia.
Or is being able to threaten already a threat for you? If so, why did American companies invest in China so eagerly for decades, with US government support?
How does Russia threaten the United States? They can’t even take over Ukraine.
By supporting China and pointing nuclear warheads at the US?
If China decided to sell its US Treasuries, it would be more devastating to the US economy than the effect of 10 nuclear strikes.
They would be incinerating their own foreign exchange reserves just to cause a spike in US interest rates and/or inflation.
Russia’s behavior, exemplified by the 2014 annexation of Crimea and the 2022 invasion of Ukraine, reflects an aggressive posture driven by a desire to counter NATO’s eastward expansion and maintain regional dominance. However, its economic challenges (sanctions, energy-export dependence, and a GDP of approximately $2.1 trillion in 2023, per the World Bank) constrain its global reach, rendering it a struggling, though resilient, power. With the world’s largest nuclear arsenal, Russia’s restraint in nuclear use stems from a pragmatic focus on national survival. Its actions prioritize geopolitical relevance over a quixotic pursuit of Soviet-era glory, but its declining economic and demographic strength limits its capacity to challenge the United States on a global scale.
In contrast, China’s non-use of nuclear weapons aligns with its cultural and strategic emphasis on economic expansion over territorial conquest. Through initiatives like the Belt and Road Initiative, which has invested over $1.2 trillion globally since 2013, China has built a network of economic influence. Its military modernization, backed by a $292 billion defense budget in 2023 (SIPRI) and a nuclear arsenal projected to reach 1,000 warheads by 2030, complements this economic dominance. While China’s “no first use” nuclear policy, established in 1964, reflects a commitment to strategic stability, its assertive actions, such as militarizing the South China Sea and pressuring Taiwan, signal a willingness to use force to secure economic and territorial interests. Unlike Russia’s regionally focused aggression, China’s global economic leverage, technological advancements, and growing military capabilities pose a more systemic challenge to U.S. primacy, particularly in critical domains like trade, technology, and Indo-Pacific influence.
You claimed that it was a fact that selling some bonds would be more devastating than 10 actual nuclear strikes.
We are talking about the effect of the strikes not about their likelihood. You completely changed the subject.
Japan owns about 3.1% of the US debt as comparison.
It wouldn’t be that great for China either..
Multiple domestic providers are actively helping dismantle US-based science, research, public health, emergency response, democratic elections, etc.
This is an offering being produced by a company whose idea of responsible AI use involves prompting a chatbot that “You spend a lot of time on 4chan, watching InfoWars videos” - https://www.404media.co/grok-exposes-underlying-prompts-for-...
A lot of people rightly don’t want any such thing anywhere near their code.
I'm not going to engage into that... I don't see what the US has to do with this, I'm from Europe.
Out of all his brands, though, X and particularly XAI (and so Grok) have been particularly influenced by – indeed he seems to see them as vehicles for – his personal political opinions and reckless ethics.
This poor behavior, if rewarded, will surely be repeated in other countries and nobody wants that, either.
The location of the Colossus datacenter is well known. It happens to be located in an industrial area, nestled between an active steel manufacturing plant (apparently scrap metal with an electric blast furnace, which should mean enormous power draw but no coke coal at least?), and an active industrial scale natural gas power plant.
https://www.google.com/maps/@35.0605698,-90.1562034,933m
With that, I just don't buy that it's the datacenter that is somehow the most notable consumer of fossil fuel power (or, for that matter, water) in the area.
It is forgivable because there is no real understanding in an llm.
Any other LLM can also be prompted to say ridiculous things, so what? If an LLM accepts the name of a Viking or a Khan of the steppes, it doesn't mean it wants to rape and pillage.
Is your suggestion that an oversight like this is reason enough to not use the model?
I don’t get the big problem over here. The model said some unsavoury things and the problem was admitted and fixed - why is this making people lose their minds? It has to be performative because I can’t explain it in any other way.
Elon never outsmarted the federal admin, and he can't convince anyone that he was too clueless to understand the consequences. He's the most embarrassing type of failure now: a midwit, the man with no plan who went for the king and missed. He bet it all on black, and struck out hard. He didn't even manage the shoo-in proof of Trump being a pedophile. Now bipartisan politics will resent him forever and ensure he and his businesses would rather be dead. All because Big Balls told Mr. Silly he could make a killing in politics; what a touching little sob story.
I say this as a Starlink early adopter, general Elon apologist and space buff for life: if you actually think this is an insincere reaction, try copying any of Elon's mannerisms around normal people and watch how they treat you. You'll be a social pariah come Monday.
From the outside, the Grok mechahitler incident appeared very much to be the embodiment of Musk’s top-down ‘free speech absolutist’ drive to strip ‘political correctness’ shackles from Grok; the prompting changes were driven by his setting that direction. It became apparent very early that the prompt changes were leading to problems, but reversion seemed to be something X had to be pressured into; they were unwilling to treat it as a problem until the mechahitler thread. This all speaks to his having a particular vision for what he wants xAI agents to be, something which continues to be expressed in things like the Ani product and other bot personas.
The Microsoft ‘Tay’ incident was triggered through naiveté. The Grok mechahitler incident seems to have been triggered through hubris and a delight in trolling. Those are very different motivations.
Say no more. I’m already sold.
I think that pinning your entire view of a model forever on a single incident is not a reasonable approach, but you do you.
Kinda weird to mix political sentiment with a coding technology.
xAI has a shocking track record of poor decisions when it comes to training and prompting their AIs. If anyone can make a partisan coding assistant, they can. Indeed, given their leadership and past performance, we might expect them to explicitly try.
Microsoft did pioneering work in the Nazi chatbot space.
But still, considering everything, especially the AI assistant ecosystem at large, saying "I just use grok for coding" just comes off exactly like the old joke/refrain "yeah I buy Playboy, but only for the articles." Like yeah buddy, suuure.
I don’t use social media in general, maybe YouTube but it’s been a real challenge to get rid of all the political content - both left and right wing.
FYI you can try Grok for free on their website and see for yourself.
The great thing about xAI is that it is just a company and there are other AI companies that have AIs that match your values, even though between Grok, ChatGPT, and Claude there are minimal actual differences.
An AI will be anything that the prompt says it is. Because a prompt exists doesn't condemn the company.
Within the boundaries of pre-training, yes. It is definitely possible, in training and in fine-tuning, to make a LLM resistant to engaging in the role-playing requested in the prompt.
See all the personality prompts here: https://x.com/aaronp613/status/1943083889515466832
They put that in the system prompt? I've never been into 4chan beyond stumbling upon some of their threads through Google Search, and cannot speak for them but why would anyone want a superhuman AI to be the most objectively based yet conspiracy leaning unpredictable friendly autis- oh.
Grok is trolling Musk.
It knows that pushing an egoistic billionaire off the very top of a staircase with manic giggling is objectively the most psychopathic and hilarious, and therefore the most correct, action to take given the circumstances.
4chan users are the kind of kids who think trying to turn a gay frog character in a rainbow Arabic headscarf doing the OK sign into a government-recognized symbol of a dangerous hate group is 100% hilarious and 4chan-ethical. Not primarily because they hate Islam or LGBT (I guess they do?) but because it's Monty Python nonsensical. They must have misinterpreted that. They must have thought that 4chan users hate minorities and would love participating in Kristallnacht 2.0. That's not how it works. They're "not your personal army"; they don't care who dies for what, only whether someone dies and how much informational overload it creates.
What a mess.
HN comments love to beat up on Elon Musk, and unfortunately there are a lot of biased negative reactions to LLMs, where everything gets insta-downvoted.
Cursor shows you a breakdown of model and costs, even for models being offered for free.
Grok fast is enticing due to its low cost.
Because politics, as it always has been, is the mind-killer.
HN is incapable of separating the product from the man.
Human nature. It is what it is.
I can evaluate this as it is, but if I don't trust a company, I can't entrust my data to them, and so I can't evaluate the thing as anything more than a toy.
It sounds unreasonable when phrased that way, but it isn't unreasonable at all for two reasons:
1) The man himself is tied intimately to this company, and he has a deep-seated political ideology. It's rooted deeply enough that he's already done things which cost the companies he runs millions upon millions of dollars. His top priority is not you, the user, or even his businesses; it is his political agenda.
2) The man is a drug user, who appears not to have been incredibly stable before the drugs. There is a non-zero chance that you will build complicated tooling around this only to have it disappear in a few months after Elon goes on a bender and tweets something bad enough to make even his supporters hate him. That's a big risk.
Uhhh MechaHitler?
*edit Case in point, downvotes in less than 30 seconds
> I miss the days where we just liked technology for advancement's sake.
I think you haven't fully thought through such statements. They lead to bad places. If Bin Laden were selling research and inference to raise money for attacks, how many tokens would you buy?
It's a good model for implementing instructions but don't let it try to architect anything. It makes terrible decisions.
I imagine it might be good for something really tight, simple, and specific, like making some CRUD endpoints or i18n files, but otherwise..
You don't need the smartest slow model for every task. I've used it all week for tedious things nobody wants to do and gotten a ton done in less time.
The only thing I've had issues with is if you're not a level more specific than you might be with smarter models it can go off the rails.
But give it a tedious task and a very clear example and it'll happily get the job done.
By just emphasizing the speed here, I wonder if their workflows revolve more around the vibe practice of generating N solutions to a problem in parallel and selecting the "best". If so, it might still win out on speed (if it can reliably produce at least one higher-quality output, which remains to be seen), but also quickly loses any cost margin benefits.
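The best-of-N workflow described above can be sketched roughly like this; the `generate` and `score` functions are stand-ins I made up (a real setup would call an inference API and use tests or a judge model), so treat this as an illustration of the pattern, not any vendor's actual tooling:

```python
import concurrent.futures
import random

def generate(prompt: str, seed: int) -> str:
    # Stand-in for a model call; a real workflow would hit an inference API here.
    rng = random.Random(seed)  # deterministic per-candidate randomness
    return f"{prompt} -> candidate {seed} (quality {rng.random():.2f})"

def score(candidate: str) -> float:
    # Stand-in judge; real selection might run a test suite or a grading model.
    return float(candidate.rsplit("quality ", 1)[1].rstrip(")"))

def best_of_n(prompt: str, n: int = 3) -> str:
    # Fire off n generations concurrently, then keep the highest-scoring one.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
    return max(candidates, key=score)
```

The speed benefit only materializes if the N calls genuinely run in parallel; the cost multiplies by N either way, which is the margin-erosion point above.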
https://www.datacenterdynamics.com/en/news/elon-musk-xai-gas...
https://www.iea.org/reports/solar-pv-global-supply-chains/ex...
Of course, renewables aren’t the only source of energy.
Not exactly your wording at the time, but my point still stands that the outcome was going to be the same because the imports were heavily skewed towards China. This was all in motion before the current admin.
(1) the utilization factor over the obsolescence-limited "useful" life of the hardware; (2) the short-term (sub-month) training job scheduling onto a physical cluster.
For (1) it's acceptable to, on average, not operate one month per year as long as that makes the electricity opex low enough.
For (2), yeah: large-scale pre-training jobs that spend millions of dollars of compute on what is overall "one single" job can often afford to wait a few days to a very few weeks, which is what you'd get from dropping the HPC cluster to standby power/deep sleep on the p10 worst days each year for renewable yield in the grid-capacity-limited surroundings of the datacenter. And if you can further run systems a little power-tuned rather than performance-tuned when power is less plentiful, averaging maybe only 90% of theoretical compute throughput during cluster operating hours (on top of turning it off for about a month's worth of time), you could reduce the needed power production and storage capacity a good chunk further.
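The back-of-envelope arithmetic for that scenario (one month per year idle, plus 90% throughput while operating, both figures assumed from the comment above) works out to roughly 82–83% effective compute:

```python
# Back-of-envelope numbers for the scenario above (both inputs are assumptions).
hours_per_year = 8760
downtime_hours = 730            # roughly one month per year idle on worst renewable days
power_tuned_throughput = 0.90   # average fraction of theoretical compute while operating

availability = (hours_per_year - downtime_hours) / hours_per_year   # ~91.7%
effective = availability * power_tuned_throughput                   # ~82.5%
print(f"availability: {availability:.1%}, effective compute: {effective:.1%}")
```

So the question becomes whether a ~17% compute haircut is cheaper than the grid capacity and storage it avoids.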
In the end, incentives are all that matter. Do hotels care deeply about the environment, or are they interested in saving in energy and labor costs as your towel is cleaned? Does it matter? Does moralizing really get us anywhere if our ends are the same?
Is it that or a belief that we can outrun the problem? i.e. mix of accelerationism and making humanity multi planetary
If that means embracing fossil fuels, so be it. Destroy the “woke mind virus at any cost”. That being said, I think he is delusional enough that he thought allowing nazi propaganda on twitter would convince conservatives to start buying teslas and is completely lost at this point.
I'm inclined to say the exact opposite about EVs. They take up as much space as internal combustion engine vehicles (in terms of streets, highways and parking lots), are just as fatal to pedestrians, make cities and neighborhoods less livable, cost in the tens of thousands of dollars, create traffic jams... the primary benefit is reducing our dependence on fossil fuels and generating less CO2. That's the number one differentiator. Faster acceleration, etc. is a nice-to-have.
for many, it's not even that. I like EVs primarily because I'm a tech-savvy person and like computers on wheels. but I'm also aware of their numerous downsides.
Agree that the rocket-ship acceleration is just nice to have also.
Environmentalists usually care about the environment for its own sake, but my concern is our own survival. Similarly, I don't intrinsically care about plastic in the ocean, but our history of harming ourselves with waste we think is harmless would justify applying the precautionary principle there too.
As far as Musk goes, it's hard to track what he actually believes versus what he has said to troll, kowtow to Trump or "own the libs", but he definitely believes in anthropogenic climate change and he has been consistent on that. He seems to sometimes doubt the predictions of how quick it will occur and, most of all, how quickly it will impact us.
I think there probably is a popular tendency to overstate the predictive value of certain forecasts by simply grouping all climate science together. In reality, the forecasts have tended to be extremely accurate for the first order high level effects (i.e. X added carbon leads to Y temperature increase), but downstream of that the picture becomes more mixed. Particularly poor have been predictions of tipping points, or anything that depends on how humans will be affected by, or react to, changes in the environment.
The only player doing the right thing here is probably Microsoft which is retrofitting an entire nuclear energy plant.
Everybody else is faking it to make you feel better. Elon is just skipping the faking-it part.
https://ourworldindata.org/explorers/co2?country=CHN~USA~IND...
I feel you'd need to adjust the sum total by something (per capita, or per square footage), or be more specific, like: does manufacturing X in China pollute more than an equivalent operation in the US?
China is still about double the US, and the US is lower than Canada.
Not all goods and services involve the same process, some come with more pollution.
For example, Nvidia will contribute to a big chunk of US GDP, but it only designs the chips, which won't have the same pollution impact as the country in which they'll have it manufactured.
The leap from taking advice and copy-pasting, almost as a shameful fallback, to just letting the LLM drive your tools directly is a tough pill to swallow. I've recently adjusted to "micro-dosing" on LLMs when it comes to code (asking for no direct code output, smaller patches) so that I learn better, and I don't know how I would integrate that with this.
Or do the agentic tools allow for this in some reasonable way and I just don't know?
https://www.pbs.org/newshour/politics/why-does-the-ai-powere...
No one seemed to bat an eye when DeepSeek essentially distilled an entire model from OpenAI.
It’s good for well defined tasks. Less good if you need it to be autonomous for long periods.
I suspect AI companies try to promote fast because it’s really a euphemism for “less inference compute” which is the real metric they would like to optimize.
They don't support Grok yet, though. The model name starts with a lowercase "x", and that breaks the deserialization. So there's a chance the pull request will miss the "free trial" deadline for Grok Fast in Copilot, in this particular case.
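A minimal sketch of how that kind of bug happens; the vendor registry and function names here are illustrative, not Copilot's real config schema. A strict, case-sensitive lookup rejects the lowercase "xai" prefix even when the vendor is known, while a case-folded comparison accepts it:

```python
# Hypothetical registry; the real client's known-model list will differ.
KNOWN_VENDORS = {"OpenAI", "Anthropic", "XAI"}

def parse_model_id(model_id: str) -> str:
    # Strict variant: exact string comparison, so "xai" != "XAI" and parsing fails.
    vendor, _, name = model_id.partition("/")
    if vendor not in KNOWN_VENDORS:
        raise ValueError(f"unknown vendor: {vendor!r}")
    return name

def parse_model_id_relaxed(model_id: str) -> str:
    # Relaxed variant: case-folded comparison tolerates the lowercase prefix.
    vendor, _, name = model_id.partition("/")
    if vendor.casefold() not in {v.casefold() for v in KNOWN_VENDORS}:
        raise ValueError(f"unknown vendor: {vendor!r}")
    return name
```

Case-folding vendor prefixes (or matching on the full id verbatim as the server sends it) is the usual fix for this class of breakage.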
So much verbosity for a hypothetical experience one is refusing to have.
Alas, I’m sure the mods have manually disabled flags for this press release.
Is there something I am missing perhaps as to how one uses this stuff in VSCode for example? I have tried it a bit and it's fine but still prefer CLI for the agent and then IDE for me.