Tell HN: Google increased existing finetuned model latency by 5x

9•deaux•1d ago
Five days ago, the latency of our finetuned Gemini 2.5 Flash models suddenly jumped by 5x. For those less familiar: finetuned models like these are often used to get close to a big model's performance on one specific task at much lower latency and cost. That means they usually serve realtime, high-volume production use cases where you want to respond to the user quickly; otherwise, finetuning generally isn't worth it. Many teams spend a few thousand dollars (at a minimum) finetuning a model for one such task.

Five days ago, Google released Nano Banana Pro (Gemini 3.0 Image Preview) to the world. And since that same day, the latency of our existing finetuned models has suddenly quintupled. We've talked with other startups that also use finetuned 2.5 Flash models, and they're seeing exactly the same thing, including those in different regions. Obviously, this has a big impact on all of our products.

From Google's side: nothing but silence, and this is with paid support. The reply to our initial support ticket was a request for basic information that had already been provided in the ticket or was trivially obvious. Since then, more than 48 hours have passed with nothing.

Of course, the timing could be pure coincidence, though we've never seen latency instability like this before. But we can all see the most likely explanation: Nano Banana Pro and Gemini 3 Preview are consuming a huge amount of compute, and Google is simply sacrificing finetuned model performance to serve them. After this, it's impossible to take them seriously for business use; who knows what they'll do next time. For all their faults, OpenAI have been a bastion of stability, despite being the most B2C-focused of all the frontier model providers. Google, with Vertex, claims to be all about enterprise, and then breaks its business customers' products to get consumers their Ghibli images 1% faster. They have surely received plenty of tickets about this, and given Google's engineering, they must have automated monitoring that catches a latency increase this large immediately. Temporary outages are understandable and happen everywhere (see the recent AWS and Cloudflare incidents), but 5x latency for more than five days, assuming they even fix it, is effectively a multi-day outage of the service.
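On the point that a jump this large should trip automated monitoring immediately: a minimal sketch of such a check, assuming you already log per-request latencies (in milliseconds) for a baseline window and a recent window. The function names, the sample numbers, and the 2x threshold are hypothetical illustrations, not part of Vertex or any Google API.

```python
from statistics import median


def latency_ratio(baseline_ms, recent_ms):
    """Ratio of the recent median latency to the baseline median latency."""
    if not baseline_ms or not recent_ms:
        raise ValueError("both windows need at least one sample")
    base = median(baseline_ms)
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return median(recent_ms) / base


def is_regression(baseline_ms, recent_ms, threshold=2.0):
    """Flag a regression when the recent median is at least `threshold` x baseline.

    The 2.0 default is a hypothetical choice; tune it to your own traffic.
    """
    return latency_ratio(baseline_ms, recent_ms) >= threshold


# Hypothetical samples: ~400 ms before, ~2000 ms after (a 5x jump).
baseline = [400, 420, 380, 410, 390]
recent = [2000, 2100, 1950, 2050, 1900]
print(latency_ratio(baseline, recent), is_regression(baseline, recent))  # prints 5.0 True
```

Comparing medians rather than means keeps a handful of slow outlier requests from triggering false alarms, while a sustained 5x shift like the one described here is flagged on the first check.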

I'm posting this mostly as a warning to other startups here to not rely on Google Vertex for user-facing model needs going forward.

Color.io Is Going Offline

7•hilti•1h ago•2 comments

Ask HN: Should account creation/origin country be displayed on HN profiles?

21•megraf•19h ago•29 comments

Ask HN: Hearing aid wearers, what's hot?

350•pugworthy•2d ago•208 comments

Ask HN: Scheduling stateful nodes when MMAP makes memory accounting a lie

22•leo_e•1d ago•17 comments

Ask HN: Have major security breaches been less common lately?

3•Wowfunhappy•9h ago•3 comments

Ask HN: Opinions on facial recognition at airports?

4•bjourne•18h ago•22 comments

Ask HN: What did Stripe change (Value Add)?

4•dzonga•15h ago•4 comments

Google attacking human thought with Gemini in Google Keep

8•fellowniusmonk•15h ago•1 comment

Ask HN: Is Techmeme getting paid to boost certain articles?

3•dabockster•11h ago•1 comment

Ask HN: Hetzner asking for passport for new account? Just me, or everyone?

2•casenmgreen•11h ago•4 comments

Ask HN: Good resources to learn financial systems engineering?

134•_1tan•2d ago•27 comments

Ask HN: How does one move from BigTech to more fulfilling places?

8•conqrr•8h ago•3 comments

Thoughts of a Neopagan / The Metal Ages

2•5wizard5•12h ago•3 comments

Ask HN: What is your monitor setup?

6•iwebdevfromhome•14h ago•9 comments

Ask HN: What work problems would your company pay to solve?

12•aryanchaurasia•1d ago•11 comments

A logging loop in GKE cost me $1,300 in 3 days – 9.2x my actual infrastructure

8•nthypes•1d ago•4 comments

Tell HN: Wanted to give dang appreciation

51•razodactyl•2d ago•4 comments

NeuroCode – A Structural Neural IR for Codebases

3•gabrielekarra•1d ago•0 comments

Tell HN: Cursor charged 19 subscriptions, won't refund

10•devtailz•1d ago•5 comments

Don't obsess with security and privacy unless they are your core business

8•amano-kenji•2d ago•13 comments

Ask HN: What tools do you pay for today that feel overpriced or frustrating?

7•psicombinator•2d ago•10 comments

GhostBin: A lightweight pastebin, built with Go and Redis

3•sanaf•1d ago•0 comments

Ask HN: How do you balance creativity, love for the craft, and money?

19•introvertmac•3d ago•10 comments

Ask HN: Is America in Recession?

23•register•3d ago•35 comments

Ask HN: Photos corrupted on Google Pixel phones over time?

6•poolnoodle•1d ago•8 comments

Malicious Bun Script Found in NPM Package Bumps

4•kothariji•1d ago•1 comment

Ask HN: Working in a language that isn't your native one. How hard was it?

10•william-cooke•3d ago•17 comments

Why isn't there an open-source (project) game?

4•triilman•2d ago•6 comments

ZetaShare: Building private file transfer with WebRTC

3•masterdegrees•2d ago•0 comments