We are the team behind Tabstack (https://tabstack.ai), part of Mozilla. We just open-sourced Pilo (pronounced PIE-low), the core engine that powers our automation platform. You can check it out on GitHub at https://github.com/mozilla/pilo.
Pilo is an agentic web automation library. Instead of writing rigid scripts with CSS selectors, you give it a natural language goal (e.g., "Find the best pizza in Seattle and extract the ratings") and it autonomously navigates the browser to achieve it.
We built this because we were struggling to make reliable agents for our own /automate endpoint. Existing tools were either too brittle (breaking on minor DOM changes) or too heavy (feeding raw HTML to LLMs, blowing up context windows).
Here is how Pilo solves those problems:
- Accessibility Tree over HTML: Instead of parsing raw HTML "soup," Pilo captures the browser's accessibility tree (via Playwright's _snapshotForAI). This gives the LLM a semantic, stable view of the page (buttons, links, inputs) rather than div hell.
- Context Compression: We pipe that tree through a compression engine. We map verbose role names to short ones (like listitem -> li), shorten reference IDs, and deduplicate repetitive text. This reduces token usage by 60-80% without losing interactive elements, allowing for much longer agent loops.
- Layered Error Handling: The web is flaky. Pilo treats navigation failures as distinct from interaction failures. It uses timeout escalation for network issues (doubling wait times) and will automatically restart the browser instance if it detects a "stuck" state or DNS failure.
- Agentic Loop: It follows a strict Plan -> Observe -> Act -> Validate loop. It even includes a separate validation step where a second LLM "grades" the final output against the original success criteria before returning it.
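To make the compression and timeout-escalation bullets concrete, here is a minimal Python sketch. It is not Pilo's actual code (Pilo's real implementation lives in the repo); it only illustrates the ideas. The snapshot line format (`- role "name" [ref=...]`), the alias table beyond listitem -> li, the function names, and the 60-second cap are all assumptions made for illustration.

```python
import re

# Hypothetical alias table: only listitem -> li comes from the post above.
ROLE_ALIASES = {"listitem": "li", "paragraph": "p", "navigation": "nav"}

# Assumed snapshot format: '<indent>- role "name" [ref=e123]'.
REF_PATTERN = re.compile(r"\[ref=([A-Za-z0-9_-]+)\]")
ROLE_PATTERN = re.compile(r"^(\s*- )(\w+)")


def compress_snapshot(snapshot: str) -> str:
    """Shorten roles and ref IDs, and drop repeated non-interactive lines."""
    ref_ids = {}   # original ref -> short sequential ID
    seen = set()   # lines already emitted, used for deduplication
    kept = []

    def short_ref(match):
        ref = match.group(1)
        # Assign the next sequential ID the first time a ref is seen.
        ref_ids.setdefault(ref, str(len(ref_ids) + 1))
        return f"[{ref_ids[ref]}]"

    for line in snapshot.splitlines():
        if not line.strip():
            continue
        # 1. Map verbose role names to short aliases.
        line = ROLE_PATTERN.sub(
            lambda m: m.group(1) + ROLE_ALIASES.get(m.group(2), m.group(2)),
            line,
        )
        # 2. Shorten reference IDs. Lines with a ref are treated as
        #    interactive and are never dropped.
        has_ref = REF_PATTERN.search(line) is not None
        line = REF_PATTERN.sub(short_ref, line)
        # 3. Deduplicate repeated text on lines without a ref.
        key = line.strip()
        if not has_ref and key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


def escalating_timeouts(base_ms: int, attempts: int, cap_ms: int = 60_000) -> list:
    """Double the wait on each retry, up to an assumed cap."""
    return [min(base_ms * 2 ** i, cap_ms) for i in range(attempts)]
```

A real pipeline would also need to keep the short-ID-to-original-ref mapping around, so that an action the LLM issues against a short ID can be resolved back to the element in the browser; this sketch discards it for brevity.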
The "Cool" Part (Browser Extension)
Since the core logic is decoupled from the runtime, we packaged it into a browser extension. You can install it, type a prompt, and literally watch the agent drive your local browser tab in real time. It’s a great way to debug how the LLM "sees" the page.
Why Open Source?
We sell the managed infrastructure (scaling browsers, persistent sessions, etc.) at Tabstack. But the execution engine itself, the thing that decides "click here" or "scroll there", should be open. You can run Pilo entirely on your own machine with your own API keys without paying us a dime.
You can read more about it on our blog: https://tabstack.ai/blog/introducing-pilo-browser-automation
Or check out the repo, install it, and give it a try: https://github.com/mozilla/pilo
We’d love to hear your feedback on the compression pipeline or how you’re handling agent state in your own projects.
Happy to answer any questions!
verdverm•1h ago
1. I now have to understand N frameworks, their quirks and handles, their prompts and tools. I certainly don't want to be locked into their strict loop definition.
2. Most of them could be extensions, or even just a skill, within other frameworks.
I prefer to remain a minimalist for now and use projects like this for inspiration
MrTravisB•1h ago
Choices are great, and our goal is to let you piece together a setup to your own liking. We want Pilo to work with your existing tools, not against them. If you just want to rip out our accessibility tree compression pipeline and use it as a standalone skill in your own custom framework, we consider that a massive win.
That is exactly why we are open sourcing it. We want to see what others can do with it.
If there is a framework or tool this could work with but does not currently, we would love to hear about it.
verdverm•19m ago
I'll open an issue for tracking
---
said issue: https://github.com/mozilla/pilo/issues/318