The issue is that for any serious use of this concept, some manual adjustment is almost always needed. This service says, "Refine your scraper at any time by chatting with the AI agent," but from what I can tell, you can't actually see the code it generates.
Relying solely on the results and asking the AI to tweak them can work, but often the output is too tailored to a specific page and fails to generalize (essentially "overfitting"). And surprisingly, this back-and-forth can be more tedious and time-consuming than just editing a few lines of code yourself. Also, if you can't directly edit the code behind the scenes, there are situations where you'll never get the exact result you want, no matter how much you try to explain it to the AI in natural language.
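To make the "overfitting" point concrete, here's a hedged sketch (the HTML and both selectors are invented for illustration, not taken from any real service): a generated selector often pins itself to the exact DOM position of the one sample page, while a selector keyed to a semantic class survives layout changes.

```python
# Illustration of "overfitting" in generated scrapers. The HTML and
# selectors below are made up for this example.
from bs4 import BeautifulSoup

html = """
<div class="listing">
  <div><h2>Widget</h2></div>
  <div><span class="price">$9.99</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Brittle: tied to the exact DOM position of one sample page.
brittle = soup.select_one("div.listing > div:nth-of-type(2) > span")

# Robust: keyed to a semantic class, so it generalizes across pages.
robust = soup.select_one("span.price")

# Both print "$9.99" here, but only the second survives a layout tweak.
print(brittle.text, robust.text)
```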
runningmike•6h ago
renegat0x0•4h ago
My project lets you define rules for various sites, so eventually everything is scraped correctly. For YouTube, yt-dlp is also used to augment results.
I can crawl using requests, Selenium, httpx, and others. Responses are returned as JSON, so they're easy to process.
The downside is that it may not be the fastest solution, and I have not tested it against proxies.
https://github.com/rumca-js/crawler-buddy
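A minimal sketch of consuming a JSON-returning crawler service like this one. The endpoint path, port, and parameter name below are assumptions for illustration, not the project's documented API; check the repo's README for the real interface.

```python
# Sketch: query a locally running crawler server and process its JSON
# response. URL, port, query parameter, and response fields are assumed.
import requests

CRAWLER = "http://127.0.0.1:3000/run"  # assumed local server endpoint

resp = requests.get(
    CRAWLER,
    params={"url": "https://example.com"},  # assumed parameter name
    timeout=30,
)
resp.raise_for_status()

data = resp.json()  # JSON response, easy to post-process
print(data.get("title"), data.get("link"))  # field names are assumptions
```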