The subroutine itself is a deterministic script composed of discovered network calls to the site's backend plus page interactions such as click/type/find.
The key architectural decision: the script executes inside the webpage itself, not through a proxy, not in a headless worker, not out of process. Because the script dispatches requests from the tab's own execution context, auth cookies, CSRF tokens, TLS session state, and signed headers are attached to every request for free. No certificate installation, no TLS fingerprint modification, no separate auth stack to maintain.
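A minimal sketch of why in-page execution matters: a request dispatched from the tab's context only needs app-level tokens forwarded explicitly; the browser handles cookies and TLS itself. The helper name, header name, and meta-tag convention below are illustrative assumptions, not the extension's actual API.

```typescript
// Hypothetical helper: build the init object for a request fired from the
// page's own execution context. The browser attaches session cookies and
// reuses the TLS session automatically; we only forward the CSRF token.
interface InPageRequestInit {
  method: string;
  credentials: "include"; // tells fetch to send the site's cookies
  headers: Record<string, string>;
  body: string;
}

function buildInPageRequest(
  url: string,
  body: unknown,
  csrfToken: string | null,
): InPageRequestInit {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  // Header name varies per site; many apps expose the token in a meta tag:
  // document.querySelector('meta[name="csrf-token"]')?.getAttribute("content")
  if (csrfToken) headers["X-CSRF-Token"] = csrfToken;
  return {
    method: "POST",
    credentials: "include",
    headers,
    body: JSON.stringify(body),
  };
}
```

A proxy or headless worker would have to reconstruct all of this state; in-page, it is already there.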
During recording, the extension intercepts network requests (a MAIN-world fetch/XHR patch with a webRequest fallback). We score and trim ~300 captured requests down to ~5 based on method, timing relative to DOM events, and origin. Volatile GraphQL operation IDs are detected and force a DOM-only fallback before they can break silently on the next run.
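The scoring pass can be sketched as below. The weights and field names are illustrative assumptions; the post only states the three signals (method, timing relative to DOM events, origin).

```typescript
// A captured request plus the signals used to rank it.
interface LoggedRequest {
  url: string;
  method: string;
  timeMs: number;         // when the request fired
  lastDomEventMs: number; // most recent recorded click/type before it
  sameOrigin: boolean;
}

function scoreRequest(r: LoggedRequest): number {
  let score = 0;
  // Mutating calls are more likely to be the action being recorded.
  if (r.method !== "GET") score += 3;
  // A request fired right after a DOM event was probably caused by it.
  if (r.timeMs - r.lastDomEventMs < 1000) score += 2;
  // First-party endpoints beat analytics/CDN noise.
  if (r.sameOrigin) score += 1;
  return score;
}

function trimRequests(log: LoggedRequest[], keep = 5): LoggedRequest[] {
  return [...log].sort((a, b) => scoreRequest(b) - scoreRequest(a)).slice(0, keep);
}

// The MAIN-world capture side (sketch only; runs inside the page):
// const origFetch = globalThis.fetch;
// globalThis.fetch = async (input, init) => {
//   log.push({ url: String(input), method: init?.method ?? "GET", /* ... */ });
//   return origFetch(input, init);
// };
```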
The generated code combines network calls with DOM actions (click, type, find) in the same function via an rtrvr.* helper namespace. Point the agent at a spreadsheet of 500 rows and, with a single LLM call, parameters are assigned and 500 subroutines are kicked off.
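The shape of a generated subroutine might look like the sketch below. The rtrvr.* helper signatures are not documented in the post, so the interface and selectors here are hypothetical stubs; the point is that DOM steps and a replayed backend call live in one function.

```typescript
// Hypothetical stand-in for the rtrvr.* helper namespace.
interface RtrvrHelpers {
  fetch(url: string, init?: { method?: string; body?: string }): Promise<unknown>;
  click(selector: string): Promise<void>;
  type(selector: string, text: string): Promise<void>;
  find(selector: string): Promise<unknown>;
}

// A generated subroutine: DOM actions to reach the right UI state,
// then a direct backend call discovered during recording.
async function sendMessage(
  rtrvr: RtrvrHelpers,
  params: { to: string; body: string },
): Promise<void> {
  await rtrvr.click('[data-testid="new-message"]');       // DOM action
  await rtrvr.type('input[name="recipient"]', params.to); // DOM action
  await rtrvr.fetch("/api/messages", {                    // replayed network call
    method: "POST",
    body: JSON.stringify(params),
  });
}
```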
Key use cases:
- record sending an IG DM, then keep a reusable, callable routine that sends DMs at zero token cost
- create a routine that fetches the latest products in a site's catalog, then call it to pull thousands of products via direct GraphQL queries
- set up a routine that files an EHR form from parameters passed to the tool; the AI infers the parameters from the current page context and calls the tool
- reuse a routine daily to sync outbound messages from LinkedIn/Slack/Gmail to a CRM via an MCP server
We see the fundamental reason browser agents haven't taken off: for repetitive tasks, going through the inference loop on every run is unnecessary. Better to record once and have the LLM generate a script that leverages every available way to interact with a site and the wider web: calling backend APIs directly, interacting with the DOM, and calling third-party tools/APIs/MCP servers.
quarkcarbon279•1h ago