I initially built this to solve a personal headache: I wanted an AI agent to handle project management tasks on my company’s intranet. I needed it to persist cookies across sessions (to handle SSO) and then scrape a Kanban board.
Existing AI browser tools (like current MCP implementations) often force unsolicited data into the context window—dumping the full accessibility tree, console logs, and network errors whether you asked for them or not.
webctl is an attempt to solve this with a Unix-style CLI:
- Filter before context: You pipe the output to standard tools. webctl snapshot --interactive-only | head -n 20 means the LLM only sees exactly what I want it to see.
- Daemon Architecture: It runs a persistent background process. The goal is to keep the browser state (cookies/session) alive while you run discrete, stateless CLI commands.
- Semantic targeting: It uses ARIA roles (e.g., role=button name~="Submit") rather than fragile CSS selectors.
Disclaimer: The daemon logic for state persistence is still a bit experimental, but the architecture feels like the right direction for building local, token-efficient agents.
It’s basically "Playwright for the terminal."
philipbjorge•3w ago
How is it different?
hugs•3w ago
"browser automation for ai agents" is a popular idea these days.
cosinusalpha•3w ago
The main difference is likely the targeting philosophy. webctl relies heavily on ARIA roles/semantics (e.g. role=button name="Save") rather than injected IDs or CSS selectors. I find this makes the automation much more robust to UI changes.
Also, I went with Python for V1 simply for iteration speed and ecosystem integration. I'd love to rewrite in Rust eventually, but Python was the most efficient way to get a stable tool working for my specific use case.