Computer use in Gemini 3.5 Flash

https://blog.google/innovation-and-ai/models-and-research/gemini-models/introducing-computer-use-gemini-3-5-flash/

75•swolpers•2h ago

Comments

satvikpendem•1h ago

There's still no MCP support in the Gemini app, which would be very useful for getting various pieces of info as a user just via chatting. For example, I recently wanted to get an Airbnb and filter by specific criteria, including house image analysis. Gemini couldn't do it, so I had to do it in Codex.

tonyrice•1h ago

This is why I don't always use the official Gemini Web app. Lately I've found that it's more useful to utilize a CLI. I'm looking forward to the day they add MCP in the web.

pregseahorses•1h ago

Gemini CLI now requires an Antigravity subscription.

singingtoday•1h ago

CLI doesn't work with my subscription..

anticorporate•1h ago

Yeah, it seems like this is the biggest missing feature from the Gemini ecosystem.

If I can't connect MCP, there's really no selling point for me to use Gemini from my watch, car, smart speaker, etc. If I'm already bound to using my own front end, then I'm only evaluating Gemini as a model/API, at which point it has many competitors that may be cheaper or better fit for the task.

thejaycampbell•1h ago

agreed... this is where they lost me too

mitchell_h•25m ago

I'm fairly convinced Claude's strongest point is the app. AI users aren't anywhere near as mature or smart as youtube/hn would have folks believe. The claude app is amazing for bridging that gap.

airstrike•1h ago

Computer use is such a terrible idea. It's slow, insecure, error prone, expensive.

I guess if you're trying to get people to tokenmaxx it may look like a valid strategy, but ain't no way this will be delightful to users.

I think it's a symptom of just not understanding how LLMs should interface with the OS because we're still in their early days.

Eventually there'll be an iPhone moment for the ergonomics of LLM usage outside of coding

nzach•58m ago

> Computer use is such a terrible idea. It's slow, insecure, error prone, expensive.

And yet having an agent able to use a computer on your behalf is really useful.

Recently I gave a NixOS VM to my hermes agent and it has been a good experience. I don't really care if he destroys the machine, since I can just roll back to an earlier version, and for any meaningful data he creates for me I make sure he creates a repo, commits and pushes to my private Gitea instance.

airstrike•39m ago

> And yet having an agent able to use a computer on your behalf is really useful.

It is, but there's no need for it to be viewing your screen, browsing websites and watching ads.

That stuff is for humans, not for LLMs.

nzach•20m ago

Sure, I don't want an agent watching MY screen. That's why I gave him his own environment, and pretty quickly he discovered that you can open chrome and make it render to a framebuffer, this way he is able to 'view' the website. And apparently with this he is able to bypass a lot of 'anti-bot' measures.

mlmonkey•1h ago

It's funny how in their own graph, https://storage.googleapis.com/gweb-uniblog-publish-prod/ima... Gemini 3.5 Flash is beaten hands down by both Opus 4.8 and GPT 5.5, and yet the graph is drawn as if Gemini wins ... :-D

sheept•1h ago

It highlights the Gemini models blue since that's what the article is about. The bar heights seem consistent with the values.

mroche•1h ago

The graph has Gemini 3.5 Flash matching Sonnet 4.6, losing to Opus 4.8, and slightly behind GPT-5.5 by 0.3 points... That's not that much of a hands-down loss for Gemini for this specific workload benchmark.

The methodology used:

https://deepmind.google/models/evals-methodology/gemini-3-5-...

Methodology: All Gemini scores are pass @1 except where otherwise noted. "Single attempt" settings allow no majority voting or parallel test-time compute. All of the results are run with the Gemini API for the model-id gemini-3.5-flash with default sampling settings unless indicated otherwise below. To reduce variance, we average over multiple trials for smaller benchmarks.

All the results for non-Gemini models are sourced from providers' self reported numbers unless otherwise mentioned below. For Claude Opus 4.7, Sonnet 4.6, and GPT-5.5 we default to reporting maximum thinking/reasoning settings available, but when reported results are not available we use best available reasoning results.

gb2d_hn•1h ago

It's honest - people who know what they are looking at will take speed and token costs into account. I don't use Gemini 3.5 for coding, but I use it as something in between a search engine and agent.

beastman82•1h ago

No UI like their competitors Claude CoWork or Codex. This is vaporware

villgax•1h ago

Will it skip Ads lol

humblyCrazy•1h ago

I looked at their demo and it does not

chatmasta•6m ago

Better question might be will it skip recaptcha?

knollimar•39m ago

Where is 3.5 pro?

zuzululu•33m ago

performance is quite impressive given that it's 3x cheaper than 5.5

fridder•25m ago

I wonder if it will be better at building TUIs. It has been absolutely abysmal at interacting with them and building them

chatmasta•8m ago

Claude can build UI but it sucks at testing it and iterating on it. Fable showed some improvements in this regard but alas.

revolvingthrow•16m ago

People using google’s models: am I holding it wrong or are the guardrails really overtuned?

I had the dubious pleasure of testing Gemini of late and I kept running into refusals. How do I transfer a SIM number from one provider to another? No. What should I consider when making backups on NTFS less prone to data loss and more bitrot resistant? No. Evaluate this piece of code? No.

I’m not sure if it’s cold feet from the mythos situation or what, but it reminds me of the dark days where you couldn’t use ai for much of anything. But then I go to chatgpt 5.5 and it does mostly everything I want outside of the usual cybersecurity boogeyman that you run into now and then.

kordlessagain•4m ago

I love antigravity. I’ve had zero issues with it.
