We know there's a lot of noise about different browser agents. If you've tried any of them, you know they're slow, expensive, and inconsistent. That's why we built an agent specifically for running test cases and optimized it just for that:
- Pure vision instead of an error-prone "set-of-marks" system (the colorful boxes you see in browser-use, for example)
- A tiny VLM (Moondream) instead of OpenAI/Anthropic computer use, for dramatically faster and cheaper execution (rough sketch of this vision step below)
- Two agents: one that plans and adapts test cases, and one that executes them quickly and consistently
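For a sense of what "pure vision" means in practice: screenshot the page, ask the model to point at the element you describe in plain language, and click the coordinates it returns. Here's a simplified Python sketch, not our actual code; the moondream client calls (md.vl, model.point) and the normalized coordinates are assumptions based on its docs:

```python
# Rough sketch of the pure-vision step: no DOM parsing, no set-of-marks overlay.
# The moondream client usage (md.vl / model.point, normalized output) is assumed
# from its docs, not taken from our implementation.
from io import BytesIO

import moondream as md
from PIL import Image
from playwright.sync_api import sync_playwright

model = md.vl(api_key="YOUR_MOONDREAM_KEY")  # placeholder key

with sync_playwright() as p:
    page = p.chromium.launch(headless=True).new_page(
        viewport={"width": 1280, "height": 800}
    )
    page.goto("https://example.com/login")  # illustrative URL

    # 1. Screenshot the current page.
    shot = Image.open(BytesIO(page.screenshot()))

    # 2. Ask the tiny VLM to point at an element described in plain English.
    point = model.point(shot, "the blue 'Sign in' button")["points"][0]

    # 3. Click the returned (assumed normalized) coordinates; that's the whole action.
    page.mouse.click(point["x"] * 1280, point["y"] * 800)
```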
The idea is that the planner builds up a general plan, which the executor runs. We can save this plan and re-run it with only the executor for quick, cheap, and consistent runs. When something goes wrong, the executor kicks back out to the planner agent, which re-adjusts the test.
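In pseudocode, the loop looks roughly like the toy sketch below; the class and function names are made up for illustration, not our actual API.

```python
# Toy sketch of the planner/executor split; everything here is illustrative.
from dataclasses import dataclass, field

@dataclass
class Plan:
    steps: list = field(default_factory=list)

class Planner:
    """Stands in for the big model: slow, but can (re)write the plan."""
    def build_plan(self, test_case: str) -> Plan:
        return Plan(steps=[f"step for: {test_case}"])
    def adjust_plan(self, test_case: str, state: str) -> Plan:
        return Plan(steps=[f"recovery step from: {state}"])

class Executor:
    """Stands in for the Moondream-only executor: fast, cheap, no heavy reasoning."""
    def run_step(self, step: str) -> bool:
        print("executing:", step)
        return True  # pretend the step succeeded
    def current_state(self) -> str:
        return "screenshot-of-current-page"

def run_test(test_case: str, planner: Planner, executor: Executor, saved_plan=None) -> Plan:
    # Slow path: only call the big planner model when there is no cached plan.
    plan = saved_plan or planner.build_plan(test_case)
    for step in plan.steps:
        # Fast path: every step runs with the tiny executor model.
        if not executor.run_step(step):
            # Something changed on the page: kick back out to the planner
            # and re-run with a freshly adjusted plan.
            fresh = planner.adjust_plan(test_case, executor.current_state())
            return run_test(test_case, planner, executor, saved_plan=fresh)
    return plan  # cache this so repeat runs skip the planner entirely

plan = run_test("user can log in", Planner(), Executor())
plan = run_test("user can log in", Planner(), Executor(), saved_plan=plan)  # replay: executor only
```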
It’s completely open source. Would love to have more people try it out and tell us how we can make it great.
anerli•11h ago
Where it gets interesting is that we can save the execution plan that the big model comes up with and run it with ONLY Moondream if the plan is specific enough, then switch back out to the big model if some action path requires adjustment. This means we can run repeated tests much more efficiently and consistently.
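To give a sense of what "specific enough" means, a saved plan is basically a list of steps whose targets are described concretely enough that Moondream can point at them directly. Something like this (illustrative shape, not the real format):

```python
# Illustrative shape of a saved plan, not the actual on-disk format.
saved_plan = {
    "test": "user can log in",
    "steps": [
        {"action": "click", "target": "email field in the login form"},
        {"action": "type",  "target": "email field", "text": "test@example.com"},
        {"action": "click", "target": "blue 'Sign in' button below the form"},
        {"action": "check", "target": "dashboard header that greets the user"},
    ],
}
# Each target is a plain-language description Moondream can point at directly,
# so replays never need the big planner model unless a step stops matching.
```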
tough•11h ago
There's also https://github.com/lm-sys/RouteLLM and other similar projects.
I guess your system isn't as oriented toward open-ended tasks, so you can just build workflows that decide which model to use at each step; these routing mechanisms are more useful for open-ended tasks that don't fit into a workflow so well (maybe?)
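To make that contrast concrete, here's a rough sketch (all names and the difficulty heuristic are made up): a workflow can hard-code the model per step type, while a RouteLLM-style router has to score each open-ended prompt and pick a model per request.

```python
# Workflow-style: the step type decides the model up front,
# which works because a test run has a fixed structure.
STEP_MODEL = {
    "plan":    "big-planner-model",  # slow/expensive, only when (re)planning
    "execute": "moondream",          # tiny VLM for every routine step
}

def model_for_step(step_type: str) -> str:
    return STEP_MODEL[step_type]

# Router-style (RouteLLM-like): open-ended requests have no fixed step type,
# so a learned scorer picks strong vs. weak model per prompt.
# The "scorer" below is a placeholder heuristic, purely for illustration.
def route_prompt(prompt: str, threshold: float = 0.5) -> str:
    difficulty = min(len(prompt) / 500, 1.0)  # stand-in score in [0, 1]
    return "strong-model" if difficulty > threshold else "weak-model"

print(model_for_step("execute"))  # moondream
print(route_prompt("Summarize this page and plan a multi-step refund flow"))
```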