I built this because I kept running into two big headaches while scaling AI apps. First was the anxiety of accidentally leaking customer PII to OpenAI. Second was paying providers over and over for semantically identical queries.
I looked at existing routers, but I wanted something optimized for speed and strict privacy. So I built Sentinel Gateway using Go and Redis.
How it works:
Real-Time Trace Observability: Every request is logged in a central dashboard. You can see exactly what users are asking, which provider handled it, and confirm that PII was scrubbed in real time. It’s a black-box recorder for your LLM implementation.
Network-Level PII Guard: Instead of writing regex scrubbers in every app, you route through the gateway. It redacts SSNs, credit cards, and emails before the payload ever touches a public API.
Semantic Caching: A Redis layer checks for semantic similarity. If a user asks the same question, it serves the response in ~13ms and bypasses LLM costs entirely.
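The cache lookup presumably boils down to a nearest-neighbor check over query embeddings. A toy sketch, with an in-memory slice standing in for the Redis vector index and an assumed (not actual) similarity threshold:

```go
package main

import (
	"fmt"
	"math"
)

// entry pairs a stored query embedding with its cached LLM response.
type entry struct {
	embedding []float64
	response  string
}

// cosine computes cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// lookup returns a cached response when a prior query is close enough.
// The 0.95 threshold is a placeholder tuning knob, not the real value.
func lookup(cache []entry, query []float64) (string, bool) {
	const threshold = 0.95
	for _, e := range cache {
		if cosine(e.embedding, query) >= threshold {
			return e.response, true
		}
	}
	return "", false
}

func main() {
	cache := []entry{{[]float64{0.9, 0.1, 0.4}, "cached answer"}}
	// A near-duplicate query skips the provider entirely.
	if resp, ok := lookup(cache, []float64{0.88, 0.12, 0.41}); ok {
		fmt.Println(resp)
	}
}
```

The threshold is exactly where the edge cases live: set it too loose and "refund policy for EU customers" can collide with "refund policy for US customers", which are near-identical in embedding space but must not share an answer.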
Multi-Provider Billing: It tracks token throughput and raw costs for OpenAI, Anthropic, Gemini, and Groq in one place.
I just finished a clean wipe of the database logs to get ready for launch. There is a free Hobby tier live now where you can test the PII redaction and watch the ~13ms cache hits in real time.
I am a solo dev and would love your honest feedback. What obvious edge cases am I missing in the caching logic?