We’ve been building a demo of a multi-agent application that self-hosts a Gemma LLM for the Cloud Next '26 conference. We found that while ADK has built-in authentication support when agents call remote agents or MCP servers, there is nothing similar when it is configured to use a custom LLM endpoint on Cloud Run (running Ollama or vLLM).
Cloud Run's built-in IAM auth is great for security, but ADK's LiteLLM connector doesn't natively manage the lifecycle of the Google-signed ID token. As a result, agents start failing with 401s once the initial token expires (typically after one hour).
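For reference, here is a minimal sketch of the lifecycle the connector leaves unmanaged. It assumes the google-auth library: it fetches an ID token via ADC with `google.oauth2.id_token.fetch_id_token`, using the Cloud Run service URL as the audience, reads the token's `exp` claim, and refetches a few minutes before the token lapses. `CachedIdToken`, `jwt_expiry`, and the 5-minute skew are illustrative choices. They are not part of ADK or LiteLLM.

```python
import base64
import json
import time
from typing import Callable, Optional


def fetch_id_token_via_adc(audience: str) -> str:
    """Fetch a Google-signed ID token for `audience` using Application Default Credentials."""
    # Imported lazily so the cache below can be exercised without google-auth installed.
    import google.auth.transport.requests
    import google.oauth2.id_token

    request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(request, audience)


def jwt_expiry(token: str) -> float:
    """Read the `exp` claim (Unix seconds) from a JWT payload without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64url padding
    return float(json.loads(base64.urlsafe_b64decode(payload_b64))["exp"])


class CachedIdToken:
    """Caches an ID token and refreshes it shortly before expiry (hypothetical helper)."""

    def __init__(
        self,
        audience: str,
        fetcher: Callable[[str], str] = fetch_id_token_via_adc,
        skew_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.audience = audience
        self._fetcher = fetcher
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self, force_refresh: bool = False) -> str:
        # Refetch when forced (e.g. after a 401), when empty, or when inside the skew window.
        if force_refresh or self._token is None or self._clock() >= self._expires_at - self._skew:
            self._token = self._fetcher(self.audience)
            self._expires_at = jwt_expiry(self._token)
        return self._token
```

On Cloud Run the fetch goes through the metadata server. Locally, `fetch_id_token` typically needs service-account credentials rather than user ADC. Passing `force_refresh=True` covers the case where a request still comes back 401 despite the cache.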
This post explores three ways to solve this:
Static Authorization headers, set once, for ephemeral, scale-to-zero workloads that don't outlive the token.
Subclassing LiteLLMClient in Python to intercept acompletion requests, fetch credentials via ADC, and refresh the token automatically on HTTP 401 errors.
Using a litellm-proxy sidecar container to abstract away authentication entirely.
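Here is a minimal sketch of the second approach. It makes two assumptions: that ADK exposes `LiteLLMClient` in `google.adk.models.lite_llm` with an `acompletion(model, messages, tools, **kwargs)` method, and that the LiteLLM provider route forwards `extra_headers` to the endpoint. `IdTokenLiteLLMClient` and the `token_provider(force_refresh)` callable are hypothetical names. The provider could, for example, wrap `google.oauth2.id_token.fetch_id_token` with a cache. A stand-in base class keeps the sketch importable without ADK installed.

```python
import asyncio
from typing import Any, Callable

try:
    # ADK's wrapper around litellm.acompletion (module path and signature assumed).
    from google.adk.models.lite_llm import LiteLLMClient
except ImportError:  # stand-in so this sketch can run without ADK installed
    class LiteLLMClient:  # type: ignore[no-redef]
        async def acompletion(self, model, messages, tools, **kwargs):
            raise NotImplementedError("install google-adk for the real client")


def _is_unauthorized(exc: BaseException) -> bool:
    # LiteLLM exceptions carry the upstream HTTP status in `status_code`.
    return getattr(exc, "status_code", None) == 401


class IdTokenLiteLLMClient(LiteLLMClient):
    """Adds a Cloud Run ID token to every request and retries once on HTTP 401."""

    def __init__(self, token_provider: Callable[[bool], str]) -> None:
        super().__init__()
        # Hypothetical callable: token_provider(force_refresh) -> ID token string.
        self._token_provider = token_provider

    async def acompletion(self, model, messages, tools, **kwargs: Any):
        base_headers = dict(kwargs.pop("extra_headers", None) or {})
        for force_refresh in (False, True):
            # google-auth token fetches are blocking, so run them off the event loop.
            token = await asyncio.to_thread(self._token_provider, force_refresh)
            headers = {**base_headers, "Authorization": f"Bearer {token}"}
            try:
                return await super().acompletion(
                    model, messages, tools, extra_headers=headers, **kwargs
                )
            except Exception as exc:
                # Only a 401 on the first attempt triggers a refresh-and-retry.
                if force_refresh or not _is_unauthorized(exc):
                    raise
```

Wiring it in would look roughly like `LiteLlm(model=..., llm_client=IdTokenLiteLLMClient(provider))`, assuming your ADK version accepts an `llm_client` argument. Retrying only once stops a misconfigured IAM binding from turning into a retry loop.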
Would love to get any feedback on these patterns!