GitHub: https://github.com/arifozgun/OpenGem
The Context: Like many developers, I was constantly hitting "429 Quota Exceeded" errors while building AI agents and processing large payloads on free tiers. I wanted to build freely without calculating API costs for every test request.
How it works: I reverse-engineered the official Gemini CLI authentication to get standard API access. However, a single free Google account's quota depletes quickly. To solve this, I built a Smart Load Balancer at the core of OpenGem.
What it does:
- You connect multiple idle/free Google accounts to the dashboard via OAuth.
- OpenGem acts as a standard endpoint (`POST /v1beta/models/{model}`).
- It routes traffic to the least-used account. If an account hits a real 429 quota limit, OpenGem instantly detects it, puts that account on a 60-minute cooldown, and seamlessly retries with the next available account. It differentiates between simple RPM bursts and actual limits.
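To make the routing idea concrete, here is a minimal sketch of least-used selection with a 60-minute cooldown and retry. It is not the actual OpenGem code: the `Account` shape, the `Outcome` classification, and the function names are all hypothetical, and it assumes the upstream response has already been classified as a short RPM burst or a real quota limit.

```typescript
// Hypothetical sketch; none of these names come from the OpenGem codebase.
type Account = { id: string; used: number; cooldownUntil: number };

// 60-minute cooldown, as described in the post.
const COOLDOWN_MS = 60 * 60 * 1000;

// Assumed classification of one upstream attempt:
// "burst" = short-lived RPM 429, "quota" = real exhausted-quota 429.
type Outcome =
  | { kind: "ok"; body: string }
  | { kind: "burst" }
  | { kind: "quota" };

type Send = (account: Account) => Promise<Outcome>;

// Return the account with the fewest requests among those not cooling down.
function pickLeastUsed(accounts: Account[], now: number): Account | undefined {
  return accounts
    .filter((a) => a.cooldownUntil <= now)
    .sort((a, b) => a.used - b.used)[0];
}

// Put an account on cooldown after a real quota error.
function markQuotaExhausted(account: Account, now: number): void {
  account.cooldownUntil = now + COOLDOWN_MS;
}

// Try accounts in least-used order until one succeeds or none are left.
async function route(
  accounts: Account[],
  send: Send,
  now: () => number = Date.now,
): Promise<string> {
  const skipped = new Set<string>(); // accounts already tried for this request
  for (;;) {
    const candidates = accounts.filter((a) => !skipped.has(a.id));
    const account = pickLeastUsed(candidates, now());
    if (!account) throw new Error("no account available");

    account.used += 1;
    const result = await send(account);
    if (result.kind === "ok") return result.body;

    // Only a real quota error triggers the long cooldown;
    // a burst just skips this account for the current request.
    if (result.kind === "quota") markQuotaExhausted(account, now());
    skipped.add(account.id);
  }
}
```

In this sketch a burst never sets a cooldown, so the same account is eligible again on the very next request, while a quota error removes it from rotation for an hour.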
Tech specs:
- Fully compatible with official Google SDKs (`@google/genai`), LangChain, and standard SSE streaming (no broken [DONE] chunks).
- Supports native "tools" (Function Calling) for agentic workflows.
- Raised payload limit to 50 MB for massive contexts.
- AES-256-GCM encryption for all sensitive configs and OAuth tokens at rest.
- Toggle between Firebase Firestore and a fully offline Local JSON database.
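For anyone curious what AES-256-GCM at rest can look like in Node, here is a small sketch using the built-in `node:crypto` module. The helper names and the storage layout (nonce, then auth tag, then ciphertext, base64-encoded) are assumptions for illustration and may differ from what OpenGem actually does; key management is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers; the layout (12-byte nonce | 16-byte tag | ciphertext)
// is an assumption, not OpenGem's documented format.
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag by default
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptToken(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the data was tampered with.
  return Buffer.concat([
    decipher.update(ciphertext),
    decipher.final(),
  ]).toString("utf8");
}
```

The key must be exactly 32 bytes. Because GCM is authenticated, a modified or wrongly keyed blob fails at `decipher.final()` instead of silently producing garbage.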
It’s strictly for educational purposes and personal research, meant to reduce friction when testing and prototyping. The entire project is MIT licensed.
I’m currently running it with my own side projects, and so far it has handled heavy agent tasks without issues. I would love any feedback on the load balancing logic, security implementations, or just general thoughts!