We started building Motie a few months back with the goal of creating an “AI Data Engineer.” We took a ‘forward deployed engineer’-style approach to refine our scope (and to avoid “boiling the ocean”) and noticed that web extraction requests came up time and time again.
We also noticed that many existing tools required a lot of upfront work (defining schemas, specifying CSS selectors), while others offered data without providing the code to scrape it.
With this release, we hope to make it incredibly easy to scrape any website* while giving technical users code to build upon and less technical users an easy interface to extract the data they need.
Features
> Natural language-based extraction: simply provide a URL (https://news.ycombinator.com/) and a prompt (“Find the top 5 stories that have more than 100 points.”)
> Full code ownership: all web scraping code can be exported
> CSV and JSON output formats
> Hosted scheduling and orchestration
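To make the example prompt concrete, here is a rough Python sketch of the filtering and output step it implies. This is an illustration only, not Motie's exported code: the story records, the field names (`title`, `points`), and the helper functions are all made up, and it assumes "top 5" means the first five qualifying stories in page order.

```python
import csv
import io
import json

# Hypothetical records standing in for stories already scraped from the page.
stories = [
    {"title": "Story A", "points": 250},
    {"title": "Story B", "points": 80},
    {"title": "Story C", "points": 120},
    {"title": "Story D", "points": 101},
    {"title": "Story E", "points": 100},
    {"title": "Story F", "points": 300},
    {"title": "Story G", "points": 150},
    {"title": "Story H", "points": 999},
]


def top_stories(rows, min_points=100, limit=5):
    """Return up to `limit` rows with strictly more than `min_points`, in page order."""
    return [row for row in rows if row["points"] > min_points][:limit]


def to_json(rows):
    """Serialize the rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "points"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    selected = top_stories(stories)
    print(to_json(selected))
    print(to_csv(selected))
```

A real scraper would also need to fetch and parse the page; that part is left out here because it depends on the site's markup.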
Current Limitations
> This release does not include support for proxies. *Scraping websites like Amazon and eBay is thus not well supported at this time. (That said, we’ve noticed a very long tail of websites that don’t require proxies!)
We’ve tried to make getting started as easy and frictionless as possible (e.g., you can use Google or GitHub SSO), and we’d love to hear the HN community’s thoughts!