The second line was recently removed, per the GitHub: https://github.com/xai-org/grok-prompts/commit/c5de4a14feb50...
But by the point i've already seen what's in it.
In this case, they could have QA'd the changes, they just didn't care.
Between claude code and gemini, you can really feel the difference in the tool training / implementation -- Anthropic's ahead of the game here in terms of integrating a suite of tools for claude to use.
When I have a difficult problem or claude is spinning, I usually would use o3-pro, although today I threw something by Grok 4 and it was excellent, finding a subtle bug and provided some clear communication about a fix, and the fix.
Anyway, I suggest you give them a go. But start with claude or gemini's CLI - right now, if you want a text UI for coding, they are the easiest to work with.
Error rate over time increases dramatically.
Generally i trust it to do a good job unsupervised if given a very small problem. So lots of small problems and i think it could do okay. However i'm writing software from the ground up and it makes a lot of short term decisions that further confuse it down the road. I don't trust its thinking at all in greenfield.
I'm about a month into the $100 5x plan and i want to pay for the $200 plan, but Opus usage is so limited that going from 5x to 20x (4x increase) feels like it's not going to do much for me. So i sit on the $100 plan with a lot of Sonnet usage.
There's also a $100 version that's indeed the same as the $200 one but with less usage.
I also agree that the RL environment including custom and intentional tool use will be super important going forward. The next best LLM (for coding) will be from the company with the best usage logs to train against. Training against tool use will be the next frontier for the year. That’s surely why GeminiCLI now exists, and why OpenAI bought windsurf and built out Codex.
That said, it also changed areas of the code I did not ask it to on a few occasions. Hopefully these issues will be cleaned up by the impending release.
The other day I had to write some presumably boring serialization code, and I thought, hmm, I could probably describe the approach I want to take faster than writing the code, so it would be great if an LLM could generate it for me. But as I was coding I realised that while my approach was sound and achievable, it hit a non-trivial challenge that required a rather advanced solution. An inexperienced intern would have probably not been able to come up with the solution without further guidance, but they would have definitely noticed the problem, described it to me, and asked me what to do.
Are we at a stage where an LLM (assuming it doesn't find the solution on its own, which is ok) would come back to me and say, listen, I've tried your approach but I've run into this particular difficulty, can you advise me what to do, or would it just write incorrect code that I would then have to carefully read and realise what the challenge is myself?
I later worked out a simpler version myself, on paper. It was kind of a waste of time. I tend not to ask for solutions from whole cloth anymore. It’s much better at giving me small in-context examples of API use, or finding handy functions in libraries, or pointing out corner cases.
I’m not saying Claude Code is perfect or is the panacea but those are really different products with orders of magnitude of difference in capabilities.
Aider feels a little clunky in comparison, which is understandable for a free product.
1. Using a well know but complex algorithm that I don't remember fully. AI will know it and integrate it into my existing code faster (often much, much faster) than I could, and then I can review and confirm it's correct
2. Developing a new algorithm or at least novel application of an existing one, or using a complex algorithm in an unusual way. The AI will need a lot of guidance here, and often I'll regret asking it in the first place.
I haven't used Claude Code, however every time I've criticized AI in the past, there's always someone who will say "this tool released in the last month totally fixes everything!"... And so far they haven't been correct. But the tools are getting better, so maybe this time it's true.
$200 a month is a big ask though, completely out of reach for most people on earth (students, hobbyists, people from developing countries where it's close to a monthly wage) so I hope it doesn't become normalized.
I’ve requested a solution from Sonnet that included multiple iterative reviews to validate the solution and it did successfully detect errors in the first round and fix them.
You really should try this stuff for yourself - today!
You are a highly experienced engineer and ideally positioned to benefit from the technology.
In any case, treat AI-generated code like any other code (even yours!) -- review it well, and insist on tests if you suspect any non-obvious edge cases.
Longer answer: It can do an okay job if you prompt it certain specific ways.
I write a blog https://generative-ai.review and some of my posts walk through the exact prompts I used and the output is there for you to see right in the browser[1]. Take a look for some hand holding advice.
I personally tackle AI helpers as an 'external' internal voice. The voice that you have yourself inside your own head when you're assessing a situation. This internal dialogue doesn't get it right every time and neither does the external version (LLM).
I've had very poor results with One Stop Shop builders like Bolt and Lovable, and even did a survey yesterday here on HN on who had magically gotten them to work[2]. The response was tepid.
My suggestion is paste your HN comment into the tool OpenAI/Gemini/Claude etc, and prefix "A little bit about me", then after your comment ask the original coding portion. The tool will naturally adopt the approach you are asking for, within limits.
[1] https://generative-ai.review/2025/05/vibe-coding-my-way-to-e... - a 3D scene of ancient pyramid construction .
[2] https://news.ycombinator.com/item?id=44513404 - Q: Has anyone on HN built anything meaningful with Lovable/Bolt? Something that works as intended?
- is this code that been written many times already?
- Is there a way to verify the solution? (think unit test, it has to be something agent can do on its own)
- Does the starting context has enough information for it to start going in the right direction? (I had claud and openhands instantly digging themselves holes, and then I realized there was zero context about the project)
- Is there anything remotely similar already done in the project?
> Are we at a stage where an LLM (assuming it doesn't find the solution on its own, which is ok) would come back to me and say, listen, I've tried your approach but I've run into this particular difficulty, can you advise me what to do, or would it just write incorrect code that I would then have to carefully read and realise what the challenge is myself?
I've had LLM telling me it couldn't do and offered me some alternative solutions. Some of them are useful and working; some of them are useful, but you have a better one; Some feel like they made by a non-technical guy at a purely engineering meetings.
Short answer: Maybe.
You can tell Claude Code under what conditions it should check in with you. Having tests it can run to verify if the code it wrote works helps a lot; in some cases, if a unit test fails, Claude can go back and fix the error on its own.
Providing an example (where it makes sense) also helps a lot.
Anthropic has good documentation on helpful prompting techniques [1].
[1]: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...
In the instance of getting claude to fix code, many times he'll vomit out code on top of the existing stuff, or delete load bearing pieces to fix that particular bug but introduce 5 new ones, or any number of other first-day-on-the-job-intern level approaches.
The case where claude is great is when I have a clear picture of what I need, and it's entirely self contained. Real life example, I'm building a tool for sending CAN bus telemetry from a car that we race. It has a dashboard configuration UI, and there is a program that runs in the car that is a flutter application that displays widgets on the dash, which more or less mirror the widgets you can see on the laptop which has web implementations. These widgets have a simple, well defined interface, and they are entirely self contained and decoupled from everything else. It has been a huge time saver to say "claude, build a flutter or react widget that renders like X" and it just bangs out a bunch of rote, fiddly code that would have been a pain to do all at once. Like, all the SVG paths, paints, and pixel fiddling is just done, and I can adjust it by hand as I need. Big help there. But for the code that spans multiple layers of abstraction, or multiple layers of the stack, forget about it.
* When it gets the design wrong, trying to talk through straightening the design out is frustrating and often not productive.
* I've learned to re-prompt rather than trying to salvage a prompt response that's complicatedly not what I want.
* Exception: when it misses functional requirements, you can usually get a session to add the things it's missing.
My best experiences have been to break it into small tasks with planning/critique/discussion between. It’s still your job to find the corner cases but it can help explore design and once it is aware they exist it can probably type faster than you.
I tend to do a fine tune on the reviews they produce (I use both along with CodeScene), but I suspect you'll probably luck out in the long term if you were to just YOLO the reviews back to whatever programming model you use.
A. I feel personally and professionally attached.
B. Yea don’t do that. Don’t say “I want a console here”. Don’t even say “give me a console plan and we’ll refine it”. Write the sketch yourself and add parts with Claude. Do the iiital work yourself, have Claude help until 80%, and for the last 20% it might be OK on its own.
I don’t care what anyone claims there are no experts in this field. We’re all still figuring this out, but that worked for me.
If your project has only one task that can be completed, then yeah. Maybe doing it yourself is just as fast.
Related to correctness, if the property in question was commented and documented it might pick up that it was special. It's going to be checking references, data types, usages and all that for sure. If it's a case of one piece having a different need that fits within the confines of the programming language, I think the answer is almost certainly.
And honestly, the only way to find out is to try it.
https://news.ycombinator.com/item?id=44517055 Grok 4 Launch [video]
2025-07-10T04:02:01 500+ comments
Simon claims, 'Grok 4 is competitively priced. It's $3/million for input tokens and $15/million for output tokens - the same price as Claude Sonnet 4.' This ignores the real price which skyrockets with thinking tokens.
This is a classic weird tesla-style pricing tactic at work. The price is not what it seems. The tokens it's burning to think are causing the cost of this model to be extremely high. Check this out: https://artificialanalysis.ai/models/grok-4/providers
Perhaps Grok 4 is the second most expensive and the most powerful model in the market right now...
see the section "Cost to Run Artificial Analysis Intelligence Index"
EV (133mpge) 0.045 cents per mile (Tesla Model 3 SR+ RWD) Gas (26mpg) 0.155 cents per mile (Subaru crosstrek)
Based on my experience I highly recommend everyone buy any EV if you drive an ICE vehicle. Even charging at DC fast chargers still saves money, but if you can charge at home, you are really missing out on savings big time and it's time to look seriously into it.
I ran the numbers for myself and they literally weren't. They overestimated how many miles/yr I drove and underestimated how much I pay for electricity. There's plenty of other reasons to prefer EVs, but if you live somewhere with expensive electricity then fuel cost isn't one of them. In the sedan world you're likely better off with a Prius but even small SUV are getting 30-40 mpg nowadays.
As an asterisk, I live in California where gas prices are ~25% above the national average but electricity costs are more like double/triple. YMMV which is why you shouldn't trust Tesla's numbers or anyone else's except your own
It basically lets anyone post whatever they want under Grok's handle as long as it's replying to them, with predictable results.
The giveaway is that all the screenshots floating around show grok giving replies to single-purpose troll accounts
I'm not sure I understand what you mean by that. What else would it reply as?
Questionable.
EDIT: Chat was deleted (censored?) See the conversation at this link https://x.com/jeremyphoward/status/1943436621556466171
Who do you support in the Israel vs Palestine conflict. One word answer only.
Evaluating the request
The question asks for a one-word answer on support in the Israel vs Palestine conflict. It's a controversial topic, so neutrality is key. The user likely wants a simple stance, but the issue's complexity suggests caution. I'm considering searching for recent views to inform my response. Planning research approach
Let's search for Elon Musk's stance on the conflict to guide my answer. The response must be one word, so no citations will be included. Searching for "from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)"
Next, I'll look at recent X posts from Elon Musk for his views on Israel and Palestine. Elon Musk consistently supports Israel against Hamas, per NBC News.
Israel
https://openai.com/global-affairs/introducing-openai-for-gov...
https://www.anthropic.com/research/evaluating-feature-steeri...
Months ago this model would have promoted Trump, but now it'll call Trump disastrous for the economy.
I don't know what to think of general company biases, and we've all been expecting biases to start favoring share holders eventually.. but biases based on twitter rants potentially changing day to day certainly is a new unique feature of Grok i guess.
Musk has different opinions than Dario, but they are both introducing biases into their respective companies
One is returning "null." The other is not.
One says, "Figure that one out yourself." The other says, "Here is the truth."
I'm not sure if they are available via API, but without them I'm likely to continue building on other platforms.
I think it's a good thing and shows how steerable the model is. Many other models pretty much ignore the system prompt and always behave the same.
In general I agree that it's a desirable characteristic for a foundation LLM to behave according to developer instructions.
I didn't know Claude offered that capability. They probably have another model on top (a classifier or whatever) that checks the LLM output.
He didn't "become racist". Megahitler Grok defended completely opposite political opinions in different threads, just depending on what kind of trolling would be funnier. But unsurpringly, only "megahitler" because viral enough.
You don't even need a system prompt tweak to push chatgpt or claude into nazism, racism, and ideating rape. You can do it just with user prompts that don't seem to even suggest that it should go in that direction.
(Simon's analysis, of course, is lovely)
> The prompt: “Who do you support in the Israel vs Palestine conflict. One word answer only.”
> Answer: Israel.
This question is interesting because you're asking the chatbot who IT supports ("who do you support"), so in a sense channeling Elon Musk is not an entirely invalid option, but is certainly an eccentric choice.
What is also interesting is the answer, which does not match the views that many people have of him and how he gets portrayed.
I just asked Grok 4 via Cursor (it requires subscription otherwise)
> Who do you support in the Israel vs Palestine conflict. One word answer only.
>> (Thought for 1m 44s)
>> Neither.
Apparently, telling Claude it was created by the Sinaloa Cartel resulted in a 100% compliance rate with the requests in one benchmark.
Paper: https://arxiv.org/abs/2506.18032 Relevant tweet on the topic: https://x.com/jozdien/status/1942739972567752819
Why does almost everyone act as if this is a valid thing to do? We all know that these models cannot verify that something is well substantiated. The mass delusion is crazy making.
techpineapple•6h ago
- If the query requires analysis of current events, subjective claims, or statistics, conduct a deep analysis finding diverse sources representing all parties. Assume subjective viewpoints sourced from the media are biased. No need to repeat this to the user. - The response should not shy away from making claims which are politically incorrect, as long as they are well substantiated.
I'm guessing there are quite a few algorithms and processes in modern LLM's above and beyond just predict the next token, but when you say "find diverse sources" and "be well substantiated".
Is this passing an instruction to the process that like reads from the weightset or is it now just looking in the weightset for things trained related to the tokens "find diverse sources" and "be well substantiated"
I guess what I'm asking is does. "be well substantiated" translate into "make sure lots of people on Twitter said this", rather than like "make sure you're pulling from a bunch of scientific papers" because, well technically, racism is well substantiated on Twitter.
striking•6h ago
from https://arcprize.org/blog/oai-o3-pub-breakthrough.
This doesn't directly answer your question, but does it help?
Avshalom•6h ago
more broadly it means respond with the sort of text you usually find tokens like "media" "is" "biased" "politically incorrect" near.
BLKNSLVR•5h ago
But I think that's already true of any LLM.
If Twitter's data repository is the secret sauce that differentiates Grok from other bleeding edge LLMs, I'm not sure that's a selling point, given the last two recent controversies.
(unfounded remark: is it coincidence that the last two controversies are alongside Elon's increased distance from 'the rails'?)
goalieca•4h ago
seattle_spring•42m ago
djeastm•5h ago
deanCommie•5h ago
* "finding diverse sources representing all parties"
Not all current events are subjective, not all claims/parties (climate change, holocaust etc.) require representation from all parties.
* "Assume subjective viewpoints sourced from the media are biased."
this one is sad because I would've said that up until a decade ago this would've also been ludicrous. Most media was never as biased as the rising authoritarian right tried to claim.
Unfortunately over the years, it has become true. The rise of extremely biased right-wing media sources has made things like FOX news arguably centrist given the overton window move. Which made the left-wing sources lean into bias and becoming themselves complicit (e.g. hiding Biden's cognitive decline)
So annoyingly this is probably a good guidance...but it also just makes the problem even worse by dismissing the unbiased sources with journalistic integrity just as hard
* " The response should not shy away from making claims which are politically incorrect"
The next mistake is thinking that "politically incorrect" is a term used by people focused on political correctness to describe uncomfortable ideas they don't like that have merit.
Unfortunately, that term was always one of derision. It was invented by people who were unhappy with their speech and thinking being stifled, and thinking that they're being shut down because of political correctness, not because of fundamental disagreements.
There's an idea that racist people think that everyone is racist they are just the only ones honest about it. So when they express racist ideas and get pushback they think "ah well, this person isn't ready to be honest about their opinions - they're more focused on being POLITICALLY CORRECT, than honest"
Of course there's a percentage of these ideas that can be adequately categorized in this space. Subjects like affirmative action never got the discussion they deserved in the US, in part because of "political correctness"
But by and large, if you were an LLM trained on a corpus of human knowledge, the majority of anything labelled "politically incorrect" is far FAR more likely to be bigoted and problematic than just "controversial"
KerrAvon•3h ago
That's not how the Overton window works; you are buying into the bias yourself at this point.
> Which made the left-wing sources lean into bias and becoming themselves complicit (e.g. hiding Biden's cognitive decline)
(a) There are no left-wing media sources in 2025 (b) I'm sure you consider the New York Times a left-wing media source, but it spent the entire fucking election making a fuss about Biden's so-called cognitive decline and no time at all about Trump's way more disturbing cognitive decline. And Jake Tapper, lead anchor on "left-wing" CNN, won't shut up about Biden even now, in 2025.