It can also adapt by creating its own tools if its 60+ built-in tools fall short. It runs in a Debian Docker container with Chromium, and you interact with it through the terminal; you can watch it work through a VNC viewer.
Comments
Unical-A•1h ago
Interesting approach. How are you handling the DOM processing inside the sandbox without spiking CPU usage? If it's not making constant API calls, is the vision model running locally (WASM/WebGPU), or are you using a clever way to diff the page state before sending it to the LLM?
grimm8080•1h ago
Yes, it is kind of a clever way to diff the page state. The AI agent has some built-in tools to deal with this. The user gives a prompt, say "Watch for X words." The LLM then runs the provided tool with the necessary args, and the tool runs a Python loop that checks the DOM for the match while the LLM sleeps. Once it's found, the LLM is woken up. There's also a tool for watching for pixel changes in certain regions; it works in a similar way.
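The sleep-then-wake watcher described above can be sketched roughly like this. This is a minimal illustration, not the project's actual code: `get_page_text` is a hypothetical injected callable standing in for whatever headless-browser call the real tool uses to read the DOM.

```python
import time

def watch_for_words(get_page_text, target_words, poll_interval=1.0, timeout=300.0):
    """Poll the page text until any target word appears, then return that word.

    get_page_text: callable returning the current DOM text (a stand-in for a
    real browser call, e.g. via Chromium's DevTools protocol).
    Returns None if the timeout elapses without a match.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        text = get_page_text()
        for word in target_words:
            if word in text:
                return word  # match found: the agent would now wake the LLM
        time.sleep(poll_interval)  # the LLM stays asleep between polls
    return None  # timed out without a match
```

The point of this shape is that the loop burns almost no tokens: the LLM issues one tool call, the cheap Python loop does the waiting, and the model is only invoked again when there is something new to reason about.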