Going to download Gemini CLI right now™ and see how it performs™ against Cursor, Claude Code, Aider, OpenCode, Droid, Warp, Devin, and ForgeCode.
In the end, the model did run `git bisect` (if we're to believe the video, at least), but only for pointless reasons: it isn't being used for what it's usually used for. Why did it run bisect at all? Well, the user asked the LLM to use `git bisect` to find a specific commit, but that doesn't make sense. `git bisect` is not for that, so what the user is asking for isn't possible.
Instead of stopping and saying "Hey, that's not the right idea, did you mean ... ?" to make sure the request is actually possible and what the user wants, the model runs its own race and starts invoking a bunch of other git commands, because that's how you'd actually find the commit the user is looking for. Then it finally does some git bisecting just for fun, even though it had already found the right commit.
I think I see the same thing when letting LLMs code as well. If you give them some work that is actually impossible, but the words kind of make sense, they'll produce something, just not what you wanted. I think they're doing exactly the same thing: bypassing what you clearly instructed so they at least do something.
I'm not sure if I'm just hallucinating that they're acting like that, but LLMs doing "the wrong thing" has hit me more than once. Imagine something more dangerous than `do a git bisect`: it seems to me like that video is telling us Gemini 3 Pro will act exactly the same way, with no improvements on that front.
Also, do these blog posts not go through review from engineering before they're published? Besides the video not really showcasing anything of interest, the prompt itself doesn't make any sense, and that would have been caught if an engineer who uses git at least weekly had reviewed it beforehand.
Video is really a terrible format for terminal demos; you've got to pause it because the screen flashes text faster than you can read...
But what is that actually doing? It looks like by the time it's running the git bisect, it already knows what the commit is and could have just returned it. The only reason it ran any bisecting at all was that the user (erroneously) asked it specifically to use git bisect. It didn't have to.
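To make the point about bisect concrete: `git bisect` is a binary search for the first commit where something went from good to bad, so it only helps when you don't yet know which commit that is. Below is a minimal Python sketch of that idea, not git's actual implementation. The commit IDs, the `is_bad` check, and the function name are all hypothetical, and it assumes a linear history ordered oldest to newest where the first commit is known good and the last is known bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the breakage happened at mid or earlier
        else:
            lo = mid  # the breakage happened after mid
    return commits[hi]


# Hypothetical history: everything from "e5" onwards is broken.
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits we had to test
    return history.index(commit) >= history.index("e5")


print(first_bad_commit(history, is_bad))  # e5
print(tested)  # ['d4', 'f6', 'e5']
```

If you already know the commit you're after (say, from searching the log), there is nothing left for this search to narrow down, which is why the bisect step in the video looks redundant.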
> gemini
It seems like you don't have access to Gemini 3.
Learn more at https://goo.gle/enable-preview-features
To disable Gemini 3, disable "Preview features" in /settings.
• 1. Switch to gemini-2.5-pro
• 2. Stop
Note: You can always use /model to select a different option.
Google never disappoints with their half-assed launches.
chis•44m ago
Currently my ranking is:
* Cursor composer: impressively fast and able but not tuned to be that agentic, so it's better for one-shot code changes than long-running tasks. Fantastic UI.
* Claude Code: Works great if you can set up a verifiable environment and a clear plan, then set it loose to build something for an hour.
* Grok: Similar to cursor composer but slower and more agentic. Not currently using.
* ChatGPT Codex, Gemini: Haven't tried yet.
embedding-shape•23m ago
Gemini CLI has the lowest rate limits, the least ability to steer the models (not sure if that's a model or tooling thing, but I cannot get any of the Google models to stop outputting code comments constantly and everywhere), and the API seemingly becomes unavailable frequently for some reason.
Claude Code is fast and easy to steer, but the quality degrades quickly and randomly, seemingly by time of day. I'm not sure if they're running differently quantized models at different times, but there is a clear quality difference depending on when in the day I use it, strangely. Haven't found a way of verifying this though, ideas welcome.
Codex CLI is probably what I use the most, with "gpt-5+high". It's kind of slow, a lot slower than Claude Code, but it almost always gets it right on the first try, and seemingly no other model+tool combination follows instructions as well. Even if your AGENTS.md is almost overflowing with rules and requirements, it seems to nail things anyway.
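On the "ideas welcome" part about time-of-day quality: one rough way to check might be to run the same fixed set of tasks at different hours, record whether each run passed, and compare pass rates per hour. Here is a minimal sketch of the bookkeeping only. The function name and the sample timestamps are made-up placeholders, not measurements, and it assumes you have some pass/fail check of your own (for example, a test suite run against the generated code).

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def pass_rate_by_hour(runs: Iterable[Tuple[str, bool]]) -> Dict[int, float]:
    """Group (ISO timestamp, passed) records by hour of day and
    return the fraction of passing runs for each hour."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [passes, runs]
    for timestamp, passed in runs:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour][0] += int(passed)
        totals[hour][1] += 1
    return {hour: p / n for hour, (p, n) in sorted(totals.items())}


# Placeholder records purely to show the shape of the input.
sample_runs = [
    ("2025-01-01T09:05:00", True),
    ("2025-01-01T09:40:00", True),
    ("2025-01-01T21:10:00", False),
    ("2025-01-01T21:30:00", True),
]

print(pass_rate_by_hour(sample_runs))  # {9: 1.0, 21: 0.5}
```

You'd need a lot of runs per hour over several days before a difference means anything, since model output is noisy even at a fixed time.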