This got me thinking about a related trust boundary issue though: even with credentials protected, the agent can still be manipulated through its inputs. Prompt injection via tool outputs or RAG retrieval can trick an agent into calling those credentialed endpoints in unintended ways. Your calendar API key is safe, but a malicious payload in an email body could still instruct the agent to "delete all meetings" through the legitimate Wardgate-protected endpoint.
I've been working on PromptShield, which tackles the input validation layer (sanitizing what comes back from tools/retrieval before it hits the model). These feel like complementary pieces of the same puzzle.
Curious about your threat model assumptions - are you primarily defending against credential exfiltration, or also thinking about the abuse-through-legitimate-channels vector? The access rules and logging you mention could be really powerful for the latter too (rate limiting, anomaly detection, etc).
So you would configure this:
  endpoints:
    calendar:
      preset: google-calendar
      auth:
        credential_env: WARDGATE_CRED_GOOGLE_CALENDAR
      capabilities:
        read_data: allow
        create_events: allow
        update_events: ask
        delete_events: ask
So updating or deleting events requires human permission. Time controls and rate limiting are already included.
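Roughly, allow/ask rules like the ones in the config above could be evaluated as follows. This is a minimal Go sketch, not Wardgate's actual code: the `Policy` type, the `Decide` and `Authorize` methods, the approval callback, and the deny-by-default behavior for unlisted capabilities are all assumptions made for illustration.

```go
package main

import "fmt"

// Action is what the proxy does with a request. "allow" and "ask" mirror
// the config values; "deny" as the default is an assumption of this sketch.
type Action string

const (
	Allow Action = "allow"
	Ask   Action = "ask"
	Deny  Action = "deny"
)

// Policy maps a capability name (e.g. "delete_events") to an action.
type Policy map[string]Action

// Decide returns the configured action, denying anything not listed.
func (p Policy) Decide(capability string) Action {
	if a, ok := p[capability]; ok {
		return a
	}
	return Deny
}

// Authorize resolves a capability to a final yes/no. The approve callback
// stands in for a human-approval channel (hypothetical); it is only
// consulted for "ask" capabilities.
func (p Policy) Authorize(capability string, approve func(string) bool) bool {
	switch p.Decide(capability) {
	case Allow:
		return true
	case Ask:
		return approve(capability)
	default:
		return false
	}
}

func main() {
	calendar := Policy{
		"read_data":     Allow,
		"create_events": Allow,
		"update_events": Ask,
		"delete_events": Ask,
	}
	// Simulate a human who rejects every approval request.
	reject := func(string) bool { return false }

	for _, c := range []string{"read_data", "delete_events", "share_calendar"} {
		fmt.Printf("%s: %s, permitted=%v\n", c, calendar.Decide(c), calendar.Authorize(c, reject))
	}
}
```

The point of the "ask" tier is that an injected "delete all meetings" instruction still reaches a human before anything destructive happens, even though the request arrives through a legitimate endpoint.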
Also on the to-do list is an LLM adapter that could detect prompt injection and handle identity masking and credential-triggered approvals. Anomaly detection is on the to-do list too.
The threat model is agents leaking data (because of gullibility, prompt injection, or dumb actions), and the goal is either to prevent that or to detect it early.
avoutic•1h ago
I built Wardgate [1] because I wanted agents to access my calendar, tasks, e-mail and other services, but not by giving them my actual credentials or giving them full access.
For some services you can create API keys with limited scope, but most often an API key just grants full access.
Wardgate is a proxy: agents call Wardgate endpoints, Wardgate injects real credentials, enforces access rules, and logs everything. The agent never sees your keys.
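To make the credential-injection idea concrete, here is a minimal Go sketch built on the standard library's `httputil.ReverseProxy`. It is not taken from Wardgate's source: `newInjectingProxy`, `demo`, the `/events` path, and the token values are hypothetical, and access rules and logging are left out. A fake upstream API echoes back the `Authorization` header it received, so you can see that the agent's placeholder was replaced.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newInjectingProxy forwards requests to upstream and overwrites whatever
// Authorization header the agent sent with the real credential.
func newInjectingProxy(upstream *url.URL, token string) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	director := proxy.Director
	proxy.Director = func(r *http.Request) {
		director(r) // default rewrite of scheme, host and path
		r.Host = upstream.Host
		r.Header.Set("Authorization", "Bearer "+token)
	}
	return proxy
}

// demo runs a fake upstream API behind the proxy and returns the
// Authorization header that the upstream actually received.
func demo() (string, error) {
	upstreamSrv := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			io.WriteString(w, r.Header.Get("Authorization"))
		}))
	defer upstreamSrv.Close()

	target, err := url.Parse(upstreamSrv.URL)
	if err != nil {
		return "", err
	}
	proxySrv := httptest.NewServer(newInjectingProxy(target, "real-secret"))
	defer proxySrv.Close()

	// The agent only knows the proxy URL and a placeholder token.
	req, err := http.NewRequest(http.MethodGet, proxySrv.URL+"/events", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer agent-placeholder")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	seen, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("upstream saw:", seen)
}
```

The upstream sees `Bearer real-secret` while the agent only ever handled the placeholder, which is the property the proxy design relies on.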
Written in Go, easy to self-host. Has presets for common services (Todoist, GitHub, Gmail, etc.) and IMAP/SMTP adapters for email.
Happy to discuss the architecture or take feedback.
[1] https://github.com/wardgate/wardgate