
Batch Mode in the Gemini API: Process More for Less

https://developers.googleblog.com/en/scale-your-ai-workloads-batch-mode-gemini-api/
62•xnx•3d ago

Comments

tripplyons•3h ago
For those who aren't aware, OpenAI has a very similar batch mode (50% discount if you wait up to 24 hours): https://platform.openai.com/docs/api-reference/batch

It's nice to see competition in this space. AI is getting cheaper and cheaper!
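For context, OpenAI's batch input is a .jsonl file with one request per line, which you then upload (purpose="batch") and submit via the batches endpoint with a 24h completion window. A minimal sketch of building that file (model name and prompts are placeholders, not from the thread):

```python
import json

def batch_line(custom_id: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    """One line of the .jsonl input file the OpenAI Batch API expects."""
    return json.dumps({
        "custom_id": custom_id,          # your own ID, echoed back in the results file
        "method": "POST",
        "url": "/v1/chat/completions",   # which API endpoint each request targets
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Write the input file; afterwards, with the openai SDK, you would upload it
# with purpose="batch" and call client.batches.create(input_file_id=...,
# endpoint="/v1/chat/completions", completion_window="24h").
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(["Summarize doc 1", "Summarize doc 2"]):
        f.write(batch_line(f"request-{i}", prompt) + "\n")
```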

fantispug•1h ago
Yes, this seems to be a common capability - Anthropic and Mistral have something very similar as do resellers like AWS Bedrock.

I guess it lets them better utilise their hardware in quiet times throughout the day. It's interesting they all picked 50% discount.

qrian•1h ago
Bedrock has a batch mode, but only for Claude 3.5, which is about a year old, so it isn't very useful.
bayesianbot•25m ago
DeepSeek has gone a slightly different route: they give an automatic 75% discount between 16:30 and 00:30 UTC.

https://api-docs.deepseek.com/quick_start/pricing
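Since DeepSeek's window wraps past midnight, a scheduler deciding when to submit has to handle the wraparound; a small sketch (window times taken from the comment above):

```python
from datetime import datetime, time, timezone

# DeepSeek's off-peak window per their pricing page: 16:30-00:30 UTC.
START, END = time(16, 30), time(0, 30)

def in_discount_window(t: time) -> bool:
    """True if t falls inside a daily window, including one that wraps midnight."""
    if START <= END:
        return START <= t < END
    return t >= START or t < END  # wrapping case: 16:30 -> midnight -> 00:30

# e.g., gate submissions on: in_discount_window(datetime.now(timezone.utc).time())
```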

dlvhdr•24m ago
The latest price increases beg to differ
dsjoerg•2h ago
We used the previous version of this batch mode, which went through BigQuery. It didn't work well for us at the time because we were in development mode and we needed faster cycle time to iterate and learn. Sometimes the response would come back much faster than 24 hours, but sometimes not. There was no visibility offered into what response time you would get; just submit and wait.

You have to be pretty darn sure that your job is going to do exactly what you want to be able to wait 24 hours for a response. It's like going back to the punched-card era. If I could get even 1% of the batch in a quicker response and then the rest more slowly, that would have made a big difference.

cpard•2h ago
It seems that a 24h SLA is standard for batch inference among the vendors, and I wonder how useful it can be when you have no visibility into when the job will be delivered.

I wonder why they do that and who is actually getting value out of these batch APIs.

Thanks for sharing your experience!

vineyardmike•2h ago
It’s like most batch processes: it’s not useful if you don’t know what the response will be and you’re iterating interactively. For data pipelines, analytics workloads, etc., you can handle that delay because no one is waiting on the response.

I’m a developer working on a product that lets users upload content. This upload is not time sensitive. We pass the content through a review pipeline that does moderation, analysis, and some business-specific checks that the user uploaded relevant content. We’re migrating some of that to an LLM-based approach because (in testing) the results are just as good, and tweaking a prompt is easier than updating code. We’ll probably use a batch API for this and accept that content can take 24 hours to be audited.

3eb7988a1663•2h ago
Think of it like you have a large queue of work to be done (e.g., summarize N decades of historical documents). There is little urgency to the outcome because the bolus is so large. You just want to maintain steady progress on the backlog, where cost optimization is more important than timing.
YetAnotherNick•1h ago
Contrary to other comments, it's likely not because of queue or general batch reasons. I think it is because LLMs are unique in that they require a lot of fixed nodes because of VRAM requirements, and hence are harder to autoscale. So the batch jobs are likely executed when they have free resources from the interactive servers.
jampa•50m ago
> who is actually getting value out of these batch APIs

I used the batch API extensively for my side project, where I wanted to ingest a large amount of images, extract descriptions, and create tags for searching. After you get the right prompt, and the output is good, you can just use the Batch API for your pipeline. For any non-time-sensitive operations, it is excellent.

serjester•1h ago
We've submitted tens of millions of requests at a time and never had it take longer than a couple hours - I think the zone you submit to plays a role.
Jensson•36m ago
> If I could get even 1% of the batch in a quicker response and then the rest more slowly, that would have made a big difference.

You can do this, just send 1% using the regular API.
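That split is easy to do client-side: route a small random sample through the interactive API for a quick sanity check and queue the rest as a batch. A hypothetical sketch (the 1% figure comes from the comment above; function names are made up):

```python
import random

def split_for_preview(items, preview_fraction=0.01, seed=0):
    """Split a workload into a small synchronous sample and a batch remainder.

    The sample goes through the regular (interactive) API so you can inspect
    outputs immediately; the remainder goes to the cheaper batch endpoint.
    """
    rng = random.Random(seed)     # fixed seed so the split is reproducible
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_preview = max(1, int(len(shuffled) * preview_fraction))
    return shuffled[:n_preview], shuffled[n_preview:]

preview, batch = split_for_preview(list(range(1000)))
# preview: 10 items for the regular API; batch: 990 items for the batch API
```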

nnx•2h ago
It would be nice if OpenRouter supported batch mode too, sending a batch and letting OpenRouter find the best provider for the batch within given price and response time.
pugio•43m ago
Hah, I've been wrestling with this ALL DAY. Another example of Phenomenal Cosmic Powers (AI) combined with itty bitty docs (typical of Google). The main endpoint ("https://generativelanguage.googleapis.com/v1beta/models/gemi...") doesn't even have actual REST documentation in the API. The Python API has 3 different versions of the same types. One of the main ones (`GenerateContentRequest`) isn't available in the newest path (`google.genai.types`) so you need to find it in an older version, but then you start getting version mismatch errors, and then pydantic errors, until you finally decide to just cross your fingers and submit raw JSON, only to get opaque API errors.

So, if anybody else is frustrated and not finding anything online about this, here are a few things I learned, specifically for structured output generation (which is a main use case for batching) - the individual request JSON should resolve to this:

```json
{
  "request": {
    "contents": [
      { "parts": [ { "text": "Give me the main output please" } ] }
    ],
    "system_instruction": {
      "parts": [ { "text": "You are a main output maker." } ]
    },
    "generation_config": {
      "response_mime_type": "application/json",
      "response_json_schema": {
        "type": "object",
        "properties": {
          "output1": { "type": "string" },
          "output2": { "type": "string" }
        },
        "required": [ "output1", "output2" ]
      }
    }
  },
  "metadata": { "key": "my_id" }
}
```

To get actual structured output, don't just do `generation_config.response_schema`, you need to include the mime-type, and the key should be `response_json_schema`. Any other combination will either throw opaque errors or won't trigger Structured Output (and will contain the usual LLM intros "I'm happy to do this for you...").
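Since every line of the .jsonl has that same shape, it's worth generating them programmatically; a sketch using the field names from the comment above (taken from the commenter's findings, not official docs):

```python
import json

def gemini_batch_line(key: str, user_text: str, system_text: str, schema: dict) -> str:
    """One .jsonl line in the structured-output shape described above."""
    return json.dumps({
        "request": {
            "contents": [{"parts": [{"text": user_text}]}],
            "system_instruction": {"parts": [{"text": system_text}]},
            "generation_config": {
                # Both fields are needed: the mime type AND response_json_schema
                # (not response_schema) -- otherwise structured output won't trigger.
                "response_mime_type": "application/json",
                "response_json_schema": schema,
            },
        },
        "metadata": {"key": key},  # echoed back, used to match results to inputs
    })

schema = {
    "type": "object",
    "properties": {"output1": {"type": "string"}, "output2": {"type": "string"}},
    "required": ["output1", "output2"],
}
line = gemini_batch_line("my_id", "Give me the main output please",
                         "You are a main output maker.", schema)
```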

So you upload a .jsonl file with the above JSON, and then you try to submit it for a batch job. If something is wrong with your file, you'll get a "400" and no other info. If something is wrong with the request submission you'll get a 400 with "Invalid JSON payload received. Unknown name \"file_name\" at 'batch.input_config.requests': Cannot find field."

I got the above error endless times when trying their exact sample code:

```
BATCH_INPUT_FILE='files/123456' # File ID
curl https://generativelanguage.googleapis.com/v1beta/models/gemi... \
  -X POST \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type:application/json" \
  -d "{
    'batch': {
      'display_name': 'my-batch-requests',
      'input_config': {
        'requests': {
          'file_name': ${BATCH_INPUT_FILE}
        }
      }
    }
  }"
```

Finally got the job submission working via the Python API (`file_batch_job = client.batches.create()`), but remember: if something is wrong with the file you're submitting, they won't tell you what, or how.

great_psy•29m ago
Is this an indication of the peak of the AI bubble ?

In a way this is saying that there are some GPUs just sitting around so they would rather get 50% than nothing for their use.

graeme•15m ago
Seems more like electricity pricing, which has peak and offpeak pricing for most business customers.

To handle peak daily load you need capacity that goes unused in offpeak hours.

reasonableklout•15m ago
Why do you think that this means "idle GPU" rather than a company recognizing a growing need and allocating resources toward it?

It's cheaper because it's a different market with different needs, which can be served by systems optimizing for throughput instead of latency. Feels like you're looking for something that's not there.

Show HN: Pangolin – Open source alternative to Cloudflare Tunnels

https://github.com/fosrl/pangolin
110•miloschwartz•7h ago•16 comments

Postgres LISTEN/NOTIFY does not scale

https://www.recall.ai/blog/postgres-listen-notify-does-not-scale
352•davidgu•3d ago•134 comments


Australia is quietly introducing age checks for search engines like Google

https://www.abc.net.au/news/2025-07-11/age-verification-search-engines/105516256
45•ahonhn•1h ago•21 comments

The ChompSaw: A Benchtop Power Tool That's Safe for Kids to Use

https://www.core77.com/posts/137602/The-ChompSaw-A-Benchtop-Power-Tool-Thats-Safe-for-Kids-to-Use
122•surprisetalk•3d ago•81 comments

Series of posts on HTTP status codes

https://evertpot.com/http/
18•antonalekseev•1d ago•4 comments

What is Realtalk’s relationship to AI? (2024)

https://dynamicland.org/2024/FAQ/#What_is_Realtalks_relationship_to_AI
242•prathyvsh•13h ago•81 comments

Show HN: Open source alternative to Perplexity Comet

https://www.browseros.com/
188•felarof•11h ago•65 comments

FOKS: Federated Open Key Service

https://foks.pub/
197•ubj•16h ago•43 comments

Apple-1 Computer, handmade by Steve Jobs [video]

https://www.youtube.com/watch?v=XdBKuBhdZwg
33•guiambros•2d ago•6 comments

Graphical Linear Algebra

https://graphicallinearalgebra.net/
205•hyperbrainer•13h ago•15 comments

Flix – A powerful effect-oriented programming language

https://flix.dev/
240•freilanzer•15h ago•96 comments

Measuring the impact of AI on experienced open-source developer productivity

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
556•dheerajvs•12h ago•357 comments

Red Hat Technical Writing Style Guide

https://stylepedia.net/style/
180•jumpocelot•14h ago•77 comments

America's fastest-growing suburbs are about to get expensive

https://www.vox.com/future-perfect/417892/suburbs-sunbelt-housing-affordability-yimby
6•littlexsparkee•28m ago•1 comments

Launch HN: Leaping (YC W25) – Self-Improving Voice AI

57•akyshnik•11h ago•27 comments

eBPF: Connecting with Container Runtimes

https://h0x0er.github.io/blog/2025/06/29/ebpf-connecting-with-container-runtimes/
44•forxtrot•9h ago•1 comments

Analyzing database trends through 1.8M Hacker News headlines

https://camelai.com/blog/hn-database-hype/
133•vercantez•3d ago•66 comments

Grok 4

https://simonwillison.net/2025/Jul/10/grok-4/
239•coloneltcb•9h ago•178 comments

AI coding tools can reduce productivity

https://secondthoughts.ai/p/ai-coding-slowdown
89•gk1•5h ago•59 comments

Diffsitter – A Tree-sitter based AST difftool to get meaningful semantic diffs

https://github.com/afnanenayet/diffsitter
108•mihau•16h ago•28 comments

Belkin ending support for older Wemo products

https://www.belkin.com/support-article/?articleNum=335419
68•apparent•10h ago•54 comments

Nerve pain drug gabapentin linked to increased dementia, cognitive impairment

https://medicalxpress.com/news/2025-07-nerve-pain-drug-gabapentin-linked.html
38•clumsysmurf•3h ago•24 comments

Researchers create 3D interactive digital room from simple video

https://news.cornell.edu/stories/2025/06/researchers-create-3d-interactive-digital-room-simple-video
5•rbanffy•3d ago•0 comments

Matt Trout has died

https://www.shadowcat.co.uk/2025/07/09/ripples-they-cause-in-the-world/
168•todsacerdoti•21h ago•47 comments

Regarding Prollyferation: Followup to "People Keep Inventing Prolly Trees"

https://www.dolthub.com/blog/2025-07-03-regarding-prollyferation/
47•ingve•3d ago•1 comments

The Lumina Probiotic May Cause Blindness in the Same Way as Methanol

https://substack.com/home/post/p-168042147
58•exolymph•1h ago•21 comments

Is Gemini 2.5 good at bounding boxes?

https://simedw.com/2025/07/10/gemini-bounding-boxes/
264•simedw•16h ago•58 comments

Foundations of Search: A Perspective from Computer Science (2012) [pdf]

https://staffwww.dcs.shef.ac.uk/people/J.Marshall/publications/SFR09_16%20Marshall%20&%20Neumann_PP.pdf
11•mooreds•3d ago•0 comments

Show HN: Cactus – Ollama for Smartphones

128•HenryNdubuaku•9h ago•48 comments