I know the authors of Skyvern are around here sometimes -- how do you think about code generation versus vision-based approaches to agentic browser use, like OpenAI's Operator, Claude Computer Use, and Magnitude?
From my POV, the vision-based approaches are superior, but they're less amenable to codegen.
We can ask the vision-based models to explain why they are doing what they are doing, and fall back to code-based approaches for subsequent runs.
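A rough sketch of that pattern (my reading of it, not necessarily how Skyvern does it): record each action plus the model's stated rationale on the first vision-driven run, then replay the recorded actions as plain Playwright code on later runs, falling back to the vision model only for steps that break. The action format and the ask_vision_agent helper below are hypothetical.

    import json
    from playwright.sync_api import Page

    def ask_vision_agent(page: Page, instruction: str) -> dict:
        # Placeholder for the vision-based agent call; a real system would send
        # a screenshot plus the instruction to the model and parse its action.
        raise NotImplementedError

    def replay_actions(page: Page, action_log_path: str) -> None:
        """Replay actions recorded from a prior vision-agent run as plain code."""
        with open(action_log_path) as f:
            # e.g. [{"op": "click", "selector": "#submit", "why": "submit the form"}]
            actions = json.load(f)
        for i, step in enumerate(actions):
            try:
                if step["op"] == "click":
                    page.click(step["selector"], timeout=5000)
                elif step["op"] == "fill":
                    page.fill(step["selector"], step["value"], timeout=5000)
            except Exception:
                # The page changed: fall back to the vision model for this step,
                # using the recorded rationale as the instruction, and cache the
                # corrected action for the next run.
                actions[i] = ask_vision_agent(page, instruction=step["why"])
        with open(action_log_path, "w") as f:
            json.dump(actions, f, indent=2)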
Is AI capable of saying, "This website sucks, and doesn't work - file a complaint with the webmaster?"
I once had similar problems with the CIA's World Factbook. I shudder to think what an AI would do there.
Skyvern kept suggesting improvements unrelated to the issue they were testing for.
The AI isn’t mad, and won’t refuse to renew. Unless it’s being run by the client of course.
Are clients using your platform to assess vendors?
Not fully equivalent to what Skyvern is doing, but still an interesting approach.
[1] https://www.reddit.com/r/LocalLLaMA/comments/1o8m0ti/we_buil...
Thanks for sharing!
And then the third or fourth time it's automatic. It's weird, but sometimes I feel like the best way to make agents work is to metathink about how I myself work.
You don’t get that whole uncanny valley disconnect do you?
The person is the data that they have ingested and trained on through the senses exposed by their body. The body is just an interface to reality.
That being said...
LLMs are amazing for some coding tasks and fail miserably at others. My hypothesis is that there is some practical limit to how many concepts an LLM can take into account, no matter the context window, given current model architectures.
For a long time I wanted to find some sort of litmus test to measure this, and I think I found one: an easy-to-understand programming problem that can be done in a single file, yet is complex enough. I have not found a single LLM able to build a solution without careful guidance.
I wrote more about this here if you are interested: https://chatbotkit.com/reflections/where-ai-coding-agents-go...
What used to be a constant, almost-daily chore, with them breaking all the time at random intervals, is now a self-healing system that rarely ever fails.
showerst•10h ago
If a website isn't using Cloudflare or a JS-only design, it's generally better to skip playwright. All the major AIs understand beautifulsoup pretty well, and they're likely to write you a faster, less brittle scraper.
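As a minimal sketch of that approach for a static page (the URL and selector are placeholders), plain requests plus BeautifulSoup is usually all you need:

    import requests
    from bs4 import BeautifulSoup

    def scrape_titles(url: str) -> list[str]:
        # Plain HTTP fetch: no browser, no JS execution, far fewer dependencies.
        resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # The CSS selector is site-specific; adjust it for the page in question.
        return [el.get_text(strip=True) for el in soup.select("h2.title")]

    if __name__ == "__main__":
        for title in scrape_titles("https://example.com/articles"):
            print(title)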
showerst•9h ago
At scale, dropping the heavier dependencies and network traffic of a browser is meaningful.
suchintan•9h ago
They aren't enough for anything that's login-protected, or that requires interacting with wizards (e.g. JS, downloading files, etc.).
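For those cases a real browser is hard to avoid. A hedged sketch of the kind of flow that needs one (Playwright handles the session, the JS-driven wizard, and the download; the URL, selectors, and credentials are placeholders):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Log in: this alone rules out a plain HTTP + BeautifulSoup scraper
        # whenever the login flow depends on JS.
        page.goto("https://example.com/login")
        page.fill("#username", "user@example.com")
        page.fill("#password", "secret")
        page.click("button[type=submit]")
        # Step through a JS-driven wizard, then capture the file it produces.
        page.click("text=Export report")
        with page.expect_download() as download_info:
            page.click("text=Download CSV")
        download_info.value.save_as("report.csv")
        browser.close()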