You have to be pretty darn sure that your job is going to do exactly what you want to be able to wait 24 hours for a response. It's like going back to the punched-card era. If I could get even 1% of the batch in a quicker response and then the rest more slowly, that would have made a big difference.
I wonder why they do that and who is actually getting value out of these batch APIs.
Thanks for sharing your experience!
I’m a developer working on a product that lets users upload content. This upload is not time sensitive. We pass the content through a review pipeline, where we did moderation and analysis, and some business-specific checks that the user uploaded relevant content. We’re migrating some of that to an LLM based approach because (in testing) the results are just as good, and tweaking a prompt is easier than updating code. We’ll probably use a batch API for this and accept that content can take 24 hours to be audited.
I used the batch API extensively for my side project, where I wanted to ingest a large amount of images, extract descriptions, and create tags for searching. After you get the right prompt, and the output is good, you can just use the Batch API for your pipeline. For any non-time-sensitive operations, it is excellent.
You can do this, just send 1% using the regular API.
So, if anybody else is frustrated and not finding anything online about this, here are a few things I learned, specifically for structured output generation (which is a main use case for batching) - the individual request JSON should resolve to this:
```json { "request": { "contents": [ { "parts": [ { "text": "Give me the main output please" } ] } ], "system_instruction": { "parts": [ { "text": "You are a main output maker." } ] }, "generation_config": { "response_mime_type": "application/json", "response_json_schema": { "type": "object", "properties": { "output1": { "type": "string" }, "output2": { "type": "string" } }, "required": [ "output1", "output2" ] } } }, "metadata": { "key": "my_id" } } ```
To get actual structured output, don't just do `generation_config.response_schema`, you need to include the mime-type, and the key should be `response_json_schema`. Any other combination will either throw opaque errors or won't trigger Structured Output (and will contain the usual LLM intros "I'm happy to do this for you...").
So you upload a .jsonl file with the above JSON, and then you try to submit it for a batch job. If something is wrong with your file, you'll get a "400" and no other info. If something is wrong with the request submission you'll get a 400 with "Invalid JSON payload received. Unknown name \"file_name\" at 'batch.input_config.requests': Cannot find field."
I got the above error endless times when trying their exact sample code: ``` BATCH_INPUT_FILE='files/123456' # File ID curl https://generativelanguage.googleapis.com/v1beta/models/gemi... \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'requests': { 'file_name': ${BATCH_INPUT_FILE} } } } }" ```
Finally got the job submission working via the python api (`file_batch_job = client.batches.create()`), but remember, if something is wrong with the file you're submitting, they won't tell you what, or how.
In a way this is saying that there are some GPUs just sitting around so they would rather get 50% than nothing for their use.
To handle peak daily load you need capacity that goes unused in offpeak hours.
It's cheaper because it's a different market with different needs which can be served by systems optimizing for throughput instead latency. Feels like you're looking for something that's not there.
tripplyons•3h ago
It's nice to see competition in this space. AI is getting cheaper and cheaper!
fantispug•1h ago
I guess it lets them better utilise their hardware in quiet times throughout the day. It's interesting they all picked 50% discount.
qrian•1h ago
bayesianbot•25m ago
https://api-docs.deepseek.com/quick_start/pricing
dlvhdr•24m ago