If you can ignore Vertex, most of the complaints here are solved - the non-Vertex APIs have easy-to-use API keys, a great debugging tool (https://aistudio.google.com), a well-documented HTTP API, and good client libraries too.
I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straight-forward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
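The streaming endpoint returns server-sent events; here's a minimal stdlib-only sketch of the parsing loop (the linked code uses ijson to parse the streamed JSON incrementally - the sample payload below is illustrative, not a captured response):

```python
import io
import json

def stream_text(lines):
    """Yield text chunks from Gemini-style SSE lines ("data: {...}")."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]

# Simulated output of streamGenerateContent?alt=sse
sample = io.StringIO(
    'data: {"candidates": [{"content": {"parts": [{"text": "Hello"}]}}]}\n'
    '\n'
    'data: {"candidates": [{"content": {"parts": [{"text": ", world"}]}}]}\n'
)
print("".join(stream_text(sample)))  # → Hello, world
```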
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
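Switching is mostly a matter of pointing an OpenAI-style client at a different base URL. A sketch of the equivalent raw request (the base URL path follows the page above; the model name is just one example):

```python
import json

# The OpenAI-compatible base URL documented for the Gemini API
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

# A standard OpenAI-style chat completion payload
payload = {
    "model": "gemini-2.0-flash",
    "messages": [{"role": "user", "content": "Hello"}],
}

url = f"{BASE_URL}/chat/completions"
body = json.dumps(payload)
# POST `body` to `url` with an "Authorization: Bearer <GEMINI_API_KEY>" header,
# or use the regular OpenAI SDK with base_url=BASE_URL and your Gemini key.
print(url)
```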
Anthropic have the same feature now as well: https://docs.anthropic.com/en/api/openai-sdk
Vertex AI is for gRPC, service auth, and region control (among other things): ensuring data remains in a specific region, authenticating with the instance's service account, and getting slightly better latency and TTFT (time to first token).
> If you want to disable thinking, you can set the reasoning effort to "none".
For other APIs, you can set the thinking tokens to 0 and that also works.
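In the REST API the zero-budget option lives under generationConfig; a sketch of the request fragment (field names per the Gemini API's thinkingConfig):

```python
import json

# REST-style request fragment that disables thinking via a zero token budget
generation_config = {
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 0}
    }
}
print(json.dumps(generation_config))
```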
For deploying, on GitHub I just use a dedicated service account for CI/CD and put the JSON payload in an environment secret, like an API key. The only extra step is that you need to copy it to the filesystem for some things to work, usually to a file named google_application_credentials.json.
If you use Cloud Build you shouldn't need to do anything.
- There are principals (users, service accounts).
- Each one needs to authenticate in some way. There are options here: SAML, OIDC, or Google Sign-In for users; other options for service accounts.
- Permissions guard the things you can do in Google Cloud.
- There are built-in roles that wrap up sets of permissions.
- You can create your own custom roles.
- Attach roles to principals to give them parcels of permissions.
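Concretely, that last step is a role binding in an IAM policy, which ties a principal to a role (the role and service-account names below are illustrative placeholders):

```json
{
  "bindings": [
    {
      "role": "roles/aiplatform.user",
      "members": [
        "serviceAccount:ci-deployer@my-project.iam.gserviceaccount.com"
      ]
    }
  ]
}
```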
And even if you don't ask, there are many examples. But I feel ya. The right example to fit your need is hard to find.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
I still don't understand the distinction between Gemini and Vertex AI apis. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem but it's only created more confusion, for me at least.
from google import genai
from google.oauth2 import service_account

sa_creds = service_account.Credentials.from_service_account_file(
    SA_FILE,
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/generative-language",
    ],
)

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
    http_options={"api_version": "v1beta1"},
    credentials=sa_creds,
)
That `vertexai=True` does the trick: you can use the same code without this option, and you will not be using "Vertex". Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
For me, the main reason for "using Vertex", as in this example, is that the Start AI Cloud Credit ($350K) is only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
It feels like the "Anthos" days to me, with Google now pushing their enterprise-grade MLOps platform, but all in all I am grateful for their generosity and the great Gemini model.
As a replacement for SA files one can use, e.g., user accounts with SA impersonation, external identity providers, or run on a GCP VM or GKE and use built-in identities.
(ref: https://cloud.google.com/iam/docs/migrate-from-service-accou...)
https://github.com/ryao/gemini-chat
The main thing I do not like is that token counting is rate limited. My local offline copies have stripped out the token counting, since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.
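One workaround for the rate-limited countTokens endpoint is a rough local estimate. The 4-characters-per-token heuristic below is an approximation, not the model's real tokenizer, and the trimming helper is a hypothetical sketch:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def trim_history(messages, budget):
    """Drop the oldest messages until the estimated total fits the budget."""
    kept = list(messages)
    while len(kept) > 1 and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)
    return kept

history = ["a" * 400, "b" * 400, "c" * 400]  # ~100 estimated tokens each
print(len(trim_history(history, 250)))  # → 2
```

Because the estimate is deliberately conservative and cheap, it never touches the API, so it works even when you are already being rate limited.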
Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.
Example for 1.5:
https://github.com/googleapis/python-aiplatform/blob/main/ve...
I agree the API docs are not high on the usability scale. No examples, just reference information with pointers to types, which embed other types, which use abstract descriptions. Figuring out what sort of JSON payload you need to send can take... a bunch of effort.
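For reference, once you dig it out of those nested types, the minimal generateContent body is just a `contents` list of role/parts objects (the prompt text is illustrative):

```python
import json

# Minimal request body for models/<model>:generateContent
payload = {
    "contents": [
        {"role": "user", "parts": [{"text": "Explain JSON in one sentence."}]}
    ]
}
print(json.dumps(payload, indent=2))
```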
It's the best model out there.
That would all still be OK-ish except that their JS library only accepts a local path, which it then attempts to read using the Node `fs` API. Serverless? Better figure out how to shim `fs`!
It would be trivial to accept standard JS buffers. But it’s not clear that anyone at Google cares enough about this crappy API to fix it.
You can? Google limits HTTP requests to 20MB, but both the Gemini API and Vertex AI API support embedded base64-encoded files and public URLs. The Gemini API supports attaching files that are uploaded to their Files API, and the Vertex AI API supports files uploaded to Google Cloud Storage.
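Inlining a file as base64 (staying under the ~20MB request cap) looks roughly like this; the bytes are a stand-in for real file contents:

```python
import base64

file_bytes = b"\x89PNG..."  # stand-in for real image bytes

# Request body with an inline base64-encoded file part
payload = {
    "contents": [{
        "role": "user",
        "parts": [
            {"text": "Describe this image."},
            {"inlineData": {
                "mimeType": "image/png",
                "data": base64.b64encode(file_bytes).decode("ascii"),
            }},
        ],
    }]
}

# The encoding round-trips cleanly
decoded = base64.b64decode(payload["contents"][0]["parts"][1]["inlineData"]["data"])
assert decoded == file_bytes
```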
Here's the code: https://github.com/simonw/tools/blob/main/gemini-mask.html
I agree though, their marketing and product positioning is super confusing and weird. They are running their AI business in a very very very strange way. This has created a delay in their dominance of this space, though I don't think it has created an opportunity for others.
Using Gemini inside BigQuery (this is via Vertex) is such a stupid good solution. Along with all of the other products that support BigQuery (datastream from cloudsql MySQL/postgres, dataform for query aggregation and transformation jobs, BigQuery functions, etc.), there's an absolutely insane amount of power to bring data over to Gemini and back out.
It's literally impossible for OpenAI to compete because Google has all of the other ingredients here already and again, the user base.
I'm surprised AWS didn't come out stronger here, weird.
jauntywundrkind•7h ago
In 2012, Google was far ahead of the world in making the vast majority of their offerings intensely API-first, intensely API accessible.
It all changed in such a tectonic shift. The Google Plus/Google+ era was this weird new reality where everything Google did had to feed into this social network, but there was nearly no API available to anyone else (short of some very simple posting APIs). Google flipped a bit: the whole company stopped caring about the rest of the world and APIs, and grew intensely focused on internal use, on themselves, looking only within.
I don't know enough about the LLM situation to comment, but Google squandering such a huge lead - so clearly ceasing to care about the world and intertwingularity, becoming so intensely internally focused - was such a clear, clear, clear fall. There's the Google Graveyard of products, but the loss in my mind is more clearly that Google gave up on APIs long ago, and has never performed any clear act of repentance for such a grievous misstep against the open world and open possibilities, in favor of a closed and internal focus.
simonw•5h ago
Google's APIs have a way steeper learning curve than is necessary. So many of their APIs depend on complex client libraries or technologies like gRPC that aren't used much outside of Google.
Their permission model is diabolically complex to figure out too - same vibes as AWS, Google even used the same IAM acronym.
PantaloonFlames•3h ago
I don't see that dependency, with ANY of the APIs. They're all documented. I invoke them directly from within emacs, or you can curl them. I almost never use the wrapper libraries.
I agree with your point that the client libraries are large and complicated, for my tastes. But there's no inherent dependency of the API on the library; the dependency arrow points the other direction. The libraries are optional, and in my experience you can find 3p libraries that are thinner and more targeted if you like.
aaronbrethorst•4h ago
Maybe we’ll get a do-over with Google.