Maybe it's just the kind of work I'm doing (a lot of web development with HTML/SCSS), and Google has crawled the internet, so they have more data to work with.
I reckon different models are better at different kinds of work, but Gemini is pretty excellent at UI/UX web development, in my experience
Very excited to see what 3.0 is like
You need to give it detailed instructions and be willing to do the plumbing yourself, but we've found it to be very good at it
I default to using ChatGPT since I like the Projects feature (missing from Gemini I think?).
I occasionally run the same prompts in Gemini to compare. A couple notes:
1) Gemini is faster to respond in 100% of cases (most of my prompts kick ChatGPT into thinking mode). ChatGPT is slow.
2) The longer thinking time doesn’t seem to correlate with better quality responses. If anything, Gemini provides better quality analyses despite shorter response time.
3) Gemini (and Claude) are more censored than ChatGPT. Gemini/Claude often refuse medical-related prompts, while ChatGPT will answer.
I went back to the censored chat I mentioned earlier, and got it to give me an answer when adding "You are a lifestyle health coach".
* Creative writing: Gemini is the unmatched winner here by a huge margin. I would personally go so far as to say Gemini 2.5 Pro is the only borderline kinda-sorta usable model for creative writing if you squint your eyes. I use it to criticize my creative writing (poetry, short stories) and no other model understands nuances as much as Gemini. Of course, all models are still pretty much terrible at this, especially in writing poetry.
* Complex reasoning (e.g. undergrad/grad level math): Gemini is the best here imho by a tiny margin. Claude Opus 4.1 and Sonnet 4.5 are pretty close but imho Gemini 2.5 writes more predictably correct answers. My bias is algebra stuff, I usually ask things about commutative algebra, linear algebra, category theory, group theory, algebraic geometry, algebraic topology etc.
On the other hand Gemini is significantly worse than Claude and GPT-5 when it comes to agentic behavior, such as searching a huge codebase to answer an open ended question and write a refactor. It seems like its tool calling behavior is buggy and doesn't work consistently in Copilot/Cursor.
Overall, I still think Gemini 2.5 Pro is the smartest overall model, but of course you need to use different models for different tasks.
It doesn't perform nearly as well as Claude or even Codex for my programming tasks though
The other big use-case I like Gemini for is summarizing papers or teaching me scholarly subjects. Gemini's more verbose than GPT-5, which feels nice for these cases. GPT-5 strikes me as terrible at this, and I'd also put Claude ahead of GPT-5 in terms of explaining things in a clear way (maybe GPT-5 could meet what I expect better though with some good prompting)
Joking obviously but I've noticed this too, I put up with it because the output is worth it.
But yeah it does do that otherwise. At one point it told me I'm a genius.
It isn't Gemini (the product, those are different orgs) though there may (deliberately left ambiguous) be overlap in LLM level bytes.
My recommendation for you in this use-case comes from the fact that AI Mode is a product built to be a good search engine first, presented to you in the interface of an AI chatbot, rather than Gemini (the app/site), which is an AI chatbot that had search tooling added to it later (like its competitors).
AI Mode does many more searches (in my experience) for grounding and synthesis than Gemini or ChatGPT.
How often do you encounter loops?
I have used Pro Mode in ChatGPT since it became available, and have tried Claude, Gemini, DeepSeek and more from time to time, but none of them ever get close to Pro Mode; it's just insanely better than everything.
So when I hear people comparing "X to ChatGPT", are you testing against the best ChatGPT has to offer, or are you comparing it to "Auto" and calling it a day? I understand people not testing their favorite models against Pro Mode as it's kind of expensive, but it would really help if people actually gave some more concrete information when they say "I've tried all the models, and X is best!".
(I mainly do web dev, UI and UX myself too)
I am, continuously, and have been since ChatGPT Pro appeared.
- Convert the whole codebase into a string
- Paste it into Gemini
- Ask a question
People seem to be very taken with "agentic" approaches where the model selects a few files to look at, but I've found it very effective and convenient just to give the model the whole codebase, and then have a conversation with it, get it to output code, modify a file, etc.
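For what it's worth, the "convert the whole codebase into a string" step can be a tiny script. Here's a minimal sketch in TypeScript/Node; the extension list, the skipped directories, and the path-header format are my own assumptions, not something from the comment above:

    // Walk the repo and concatenate source files into one big string for pasting.
    import { readdirSync, readFileSync, statSync } from "fs";
    import { join } from "path";

    function collectSource(dir: string, exts = [".ts", ".js", ".html", ".scss"]): string {
      let out = "";
      for (const name of readdirSync(dir)) {
        const path = join(dir, name);
        if (statSync(path).isDirectory()) {
          if (name === "node_modules" || name === ".git") continue; // skip vendored/VCS dirs
          out += collectSource(path, exts);
        } else if (exts.some((e) => name.endsWith(e))) {
          // Prefix each file with its path so the model can refer to files by name.
          out += `\n===== ${path} =====\n` + readFileSync(path, "utf8");
        }
      }
      return out;
    }

    console.log(collectSource(process.cwd())); // pipe to your clipboard, then paste into Gemini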
Then, for each subsequent conversation, I would ask the model to use this summary file as a reference.
The overall idea is the same, but going through an intermediate file allows for manual amendments in case the model consistently forgets some things; it also gives the model an easier time finding information and reasoning about the codebase in a pre-summarized format.
It's sort of like giving the model rich metadata and an index of the codebase instead of dumping the raw data on it.
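A rough sketch of what "use this file as reference" can look like in practice; the file name CODEBASE_SUMMARY.md and the prompt wording are illustrative assumptions, not the commenter's actual setup:

    import { readFileSync } from "fs";

    // Model-generated index/summary of the codebase, amended by hand wherever the
    // model kept forgetting details.
    const summary = readFileSync("CODEBASE_SUMMARY.md", "utf8");
    const question = process.argv[2] ?? "Where is the session token validated?";

    // Each new conversation is seeded with the summary instead of the raw source dump.
    const prompt =
      "Use the following codebase summary as your reference.\n\n" +
      summary +
      "\n\nQuestion: " + question;
    console.log(prompt);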
Also, use Google AI Studio, not the regular Gemini plan, for the best results; you'll have more control over the output.
I "grew up", as it were, on StackOverflow, when I was in my early dev days and didn't have a clue what I was doing I asked question after question on SO and learned very quickly the difference between asking a good question vs asking a bad one
There is a great Jon Skeet blog post from back in the day called "Writing the perfect question" - https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-...
I think this is as valid as ever in the age of AI, you will get much better output from any of these chatbots if you learn and understand how to ask a good question.
For writing and editorial work, I use Gemini 2.5 Pro (Sonnet seems simply worse, while GPT-5 is too opinionated).
For coding, Sonnet 4.5 (usually).
For brainstorming and background checks, GPT-5 via ChatGPT.
For data extraction, GPT-5. (Seems to be the best at this "needle in a haystack" kind of task.)
However if you get the hang of it, it can be very powerful
Between the two, 100% of my code is written by AI now, and has been since early July. Total gamechanger vs. earlier models, which weren't usable for the kind of code I write at all.
I do NOT use either as an "agent." I don't vibe code. (I've tried Claude Code, but it was terrible compared to what I get out of GPro 2.5.)
But the past few days I started getting an "AI Mode" in Google Search that rocks. Way better than GPT-5 or Sonnet 4.5 for figuring out things and planning. And I've been using it without my account (weird, but I'm not complaining). Maybe this is Gemini 3.0. I would love for it to be good at coding. I'm near limits on my Anthropic and OpenAI accounts.
Somewhat amusing 4th wall breaking if you open Python from the terminal in the fake Windows. Examples:
1. If you try to print something using the "Python" print keyword, it opens a print dialog in your browser.
2. If you try to open a file using the "Python" open keyword, it opens a new browser tab trying to access that file.
That is, it's forwarding the print and open calls to your browser.
    } else if (mode === 'python') {
      if (cmd === 'exit()') {
        mode = 'sh';
      } else {
        try {
          // Safe(ish) eval for demo purposes.
          // In production, never use eval. Use a JS parser library.
          // Mapping JS math to appear somewhat pythonesque
          let result = eval(cmd);
          if (result !== undefined) output(String(result));
        } catch (e) {
          output(`Traceback (most recent call last):\n File "<stdin>", line 1, in <module>\n${e.name}: ${e.message}`, true);
        }
      }
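Concretely, since the "Python" input is just eval'd as JavaScript in the page (as the snippet above shows), Python built-ins resolve to browser globals. A tiny illustration of the effect, not taken from the site's code:

    // "print" resolves to window.print and "open" to window.open, hence the 4th-wall break.
    eval('print("hello")');    // opens the browser's print dialog
    eval('open("notes.txt")'); // opens a new tab trying to load notes.txt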
In the Gemini app 2.5 Pro also regularly repeats itself VERBATIM after explicitly being told not to multiple times to the point of uselessness.
It's my go-to coder; it just jibes better with me than Claude or GPT. Better than my home hardware can handle.
What I really hope for in 3.0: that their context length is really 1 million. In my experience, 256k is the real limit.
Based on what I'm hearing from friends who work at Google and are using it for coding, we're all going to be very disappointed.
Edit: It sounds like they don't actually have Gemini 3 access, which would explain why they aren't happy with it.
Source: I work at Google (on payments, not any AI teams). Opinions mine not Google's.
So I get ChatGPT to spec out the work as a developer brief, including suggested code, then I give it to Gemini to implement.
This has been the same for every single LLM I've used, ever, they're all terrible at that.
So terrible that I've stopped going beyond two messages in total. If it doesn't get it right on the first try, it's less and less likely to get it right with every message you add.
Better to always start fresh, iterate on the initial prompt instead.
More importantly, because of the way AI Studio does A/B testing, the only output we can get is for a single prompt, and I personally maintain that, beyond a basic read on speed, latency, and prompt adherence, output from a single prompt is not a good measure of day-to-day performance. It also, naturally, cannot tell us a thing about handling multi-file ingest and tool calls, but hype will be hype.
That there are people who are ranking alleged performance solely by one-prompt A/B testing output says a lot about how unprofessionally some evaluate model performance.
Not saying the Gemini 3.0 models couldn't be competitive; I just want to caution against getting caught up in over-excitement and possible disappointment. It's the same reason I dislike speculative content in general: it rarely gets put into proper context, because that isn't as eye-catching.