I know the authors of Skyvern are around here sometimes. How do you think about code generation alongside vision-based approaches to agentic browser use, like OpenAI's Operator, Claude Computer Use, and Magnitude?
From my POV, the vision-based approaches are superior, but they are less amenable to codegen.
We can ask the vision-based models to explain why they are doing what they are doing, and fall back to code-based approaches for subsequent runs.
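Roughly the pattern I have in mind, as a sketch: the vision model drives the first run and records a rationale plus a replayable selector for each step; later runs skip the model entirely. vision_step() is a hypothetical stand-in for whatever VLM call you'd use, and the cache format is made up:

    import json, pathlib
    from playwright.sync_api import sync_playwright

    CACHE = pathlib.Path("actions_cache.json")

    def vision_step(screenshot_png, goal):
        # Hypothetical wrapper around whatever vision model you use.
        # It should return the next action, its rationale, and a CSS
        # selector that a code-based replay can use on later runs.
        raise NotImplementedError

    def run(url, goal):
        with sync_playwright() as p:
            page = p.chromium.launch().new_page()
            page.goto(url)
            if CACHE.exists():
                # Subsequent runs: cheap, deterministic code-based replay.
                for step in json.loads(CACHE.read_text()):
                    page.click(step["selector"])
            else:
                # First run: let the vision model drive, and record what
                # it did and why, so the next run can skip the model.
                step = vision_step(page.screenshot(), goal)
                page.click(step["selector"])
                CACHE.write_text(json.dumps([step]))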
Is AI capable of saying, "This website sucks, and doesn't work - file a complaint with the webmaster?"
I once had similar problems with the CIA's World Factbook. I shudder to think what an AI would do there.
Not fully equivalent to what Skyvern is doing, but still an interesting approach.
[1] https://www.reddit.com/r/LocalLLaMA/comments/1o8m0ti/we_buil...
showerst•56m ago
If a website isn't using Cloudflare or a JS-only design, it's generally better to skip Playwright. All the major AIs understand BeautifulSoup pretty well, and they're likely to write you a faster, less brittle scraper.
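For a plain-HTML site, that often needs nothing more than this (URL and selectors are placeholders for illustration):

    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://example.com/listings", timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # Pull each table row's cell text; no browser, no JS runtime.
    for row in soup.select("table.results tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(cells)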
showerst•29m ago
At scale, dropping the heavier dependencies and network traffic of a browser is meaningful.
suchintan•25m ago
They aren't enough for anything that's login-protected, or that requires interacting with wizards (e.g. JS-driven flows, downloading files, etc.).
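Those are the cases where you do want the browser. A hedged sketch of a login wall plus a JS-triggered download, which a plain HTTP scraper can't reach (URL, selectors, and credentials are placeholders):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()

        # Log in through the form; there's no cookie-free endpoint to hit.
        page.goto("https://example.com/login")
        page.fill("#username", "me@example.com")
        page.fill("#password", "hunter2")
        page.click("button[type=submit]")
        page.wait_for_url("**/dashboard")

        # The export is generated client-side, so there's no href to fetch.
        with page.expect_download() as dl:
            page.click("text=Export CSV")
        dl.value.save_as("export.csv")

        browser.close()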