I'm building Nexus Gateway, an AI gateway that helps developers reduce LLM API costs.
Problem: Many applications send repeated or semantically similar prompts to LLMs, which leads to unnecessary API calls and higher costs.
Solution: Nexus Gateway uses semantic caching to detect similar prompts and serve cached responses instead of calling the LLM again.
Features:
• Semantic caching to reduce repeated API calls
• Multi-model support (OpenAI, Gemini, Llama, Anthropic)
• BYOK support
• PII protection and sovereign AI layer (in progress)
Goal: Reduce LLM costs by 40–70% while improving latency.
I’d really appreciate feedback from the community.
Website: https://www.nexus-gateway.org
Sunnyanand_dev•1h ago
I built this because I noticed that many AI applications repeatedly send very similar prompts to LLM APIs, which means developers end up paying for the same reasoning multiple times.
Nexus Gateway tries to solve this using semantic caching. Instead of only checking for exact prompt matches, it detects semantically similar prompts and can serve cached responses when appropriate.
Current features include:
• Multi-model support (OpenAI, Gemini, Anthropic, Llama)
• BYOK (Bring Your Own Key)
• Semantic caching to reduce repeated API calls
• Model routing
I'm currently also working on:
• PII protection layers
• Sovereign AI support for regulated industries like banks and hospitals
My goal is to build an infrastructure layer that helps teams reduce LLM costs and improve latency without changing much of their existing code.
I'd love feedback from the community, especially around:
• semantic caching strategies
• similarity thresholds
• enterprise security requirements
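On the threshold question specifically, the core trade-off is easy to show with toy numbers (again using a bag-of-words stand-in for a real embedding; the prompts and thresholds below are invented for illustration):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding, standing in for a real model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

cached = embed("reset my account password")
paraphrase = embed("how do i reset my password")
unrelated = embed("cancel my subscription")

s_para = cosine(cached, paraphrase)   # moderate similarity
s_unrel = cosine(cached, unrelated)   # low, but nonzero (shares "my")

# A strict threshold (e.g. 0.9) rejects the paraphrase -> cache miss,
# paying for an LLM call that was arguably unnecessary.
strict_miss = s_para < 0.9

# A very loose threshold (e.g. 0.25) accepts the unrelated query ->
# a false hit that serves the wrong cached answer.
loose_false_hit = s_unrel >= 0.25
```

Strict thresholds protect correctness at the cost of hit rate; loose ones do the opposite, so the right value likely depends on the embedding model and how tolerant the application is of a near-match answer.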
Happy to answer any technical questions.