Show HN: Motie – Replit for Web Scraping

https://app.motie.dev

4•jb_hn•1mo ago

Hey HN, Justin here. We’re building Motie (https://app.motie.dev), an AI agent that extracts structured data from the web and generates web scraping code, using natural language.

We started building Motie a few months back with the goal of creating an “AI Data Engineer.” We took a ‘forward deployed engineer’-style approach to refine our scope (and to avoid “boiling the ocean”) and noticed that web extraction requests came up time and time again.

We also noticed that many existing tools required a lot of upfront work (defining schemas, specifying CSS selectors), while others offered data without providing the code to scrape it.

With this release, we hope to make it incredibly easy to scrape any website* while giving technical users code to build upon and less technical users an easy interface to extract the data they need.

Features

> Natural language-based extraction: simply provide a URL (https://news.ycombinator.com/) and a prompt (“Find the top 5 stories that have more than 100 points.”)
> Full code ownership: all web scraping code can be exported
> CSV and JSON output formats
> Hosted scheduling and orchestration
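
To give a rough idea of what exported code for the example prompt might look like, here is a minimal sketch using only the Python standard library. It is not Motie's actual output. The names `StoryParser`, `top_stories`, and `SAMPLE_HTML` are hypothetical, and the `titleline` and `score` class names are assumptions about the page markup, so check them against the live page before relying on this.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample markup; the class names are assumptions, not verified.
SAMPLE_HTML = """
<tr class="athing"><td class="title"><span class="titleline">
  <a href="https://a.example">Alpha</a></span></td></tr>
<tr><td class="subtext"><span class="score">150 points</span></td></tr>
<tr class="athing"><td class="title"><span class="titleline">
  <a href="https://b.example">Beta</a></span></td></tr>
<tr><td class="subtext"><span class="score">100 points</span></td></tr>
<tr class="athing"><td class="title"><span class="titleline">
  <a href="https://c.example">Gamma</a></span></td></tr>
<tr><td class="subtext"><span class="score">101 points</span></td></tr>
"""


class StoryParser(HTMLParser):
    """Collects title, URL, and points for each story, in page order."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._field = None  # which piece of a story we are currently reading

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "span" and "titleline" in classes:
            self._field = "title_pending"
        elif tag == "a" and self._field == "title_pending":
            # First link inside the title span is treated as the story link.
            self.stories.append(
                {"title": "", "url": attr_map.get("href"), "points": 0}
            )
            self._field = "title"
        elif tag == "span" and "score" in classes:
            self._field = "score"

    def handle_data(self, data):
        if self._field == "title":
            self.stories[-1]["title"] += data
        elif self._field == "score" and self.stories:
            parts = data.split()
            if parts and parts[0].isdigit():
                # The score is attached to the most recently seen story.
                self.stories[-1]["points"] = int(parts[0])

    def handle_endtag(self, tag):
        if self._field in ("title", "score") and tag in ("a", "span"):
            self._field = None


def top_stories(html, min_points=100, limit=5):
    """Return the first `limit` stories with strictly more than `min_points`."""
    parser = StoryParser()
    parser.feed(html)
    hits = [s for s in parser.stories if s["points"] > min_points]
    return hits[:limit]


# JSON output, one of the formats mentioned in the feature list.
as_json = json.dumps(top_stories(SAMPLE_HTML), indent=2)
```

Against a live page, the HTML string would come from an HTTP fetch instead of `SAMPLE_HTML`. The filter uses a strict comparison because the prompt asks for "more than 100 points", so a story with exactly 100 points is excluded.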

Current Limitations

> This release does not include support for proxies. *Scraping websites like Amazon and eBay is thus not well supported at this time. (That said, we’ve noticed a very long tail of websites that don’t require proxies!)

We’ve tried to make getting started as easy and frictionless as possible (e.g., you can use Google or GitHub SSO), and we’d love to hear the HN community’s thoughts!

Comments

xmcp123•1mo ago

Ya know, I was ready to downvote this (AI scraping is not my favorite) but I’m not going to.

It really does have its niche: one-off complex scrapes where it’s kind of questionable if it’s worth writing a scraper.

jb_hn•1mo ago

Haha I appreciate that! And that’s exactly right. Our goal is to make it so that you don’t have to ask the question “but is it worth the time and effort…” when you want to use or explore a new dataset.

theanonymousone•1mo ago

> we’ve noticed a very long tail of websites that don’t require proxies

That tail seems to be getting harshly slaughtered by Cloudflare.

jb_hn•1mo ago

Good point – we’ve definitely noticed a lot more Cloudflare representation these days. That said, there seem to be tiers in terms of the protection they offer (and thus the protection used by the websites in this long tail), where lower tiers (so far) haven’t required proxies.

Curious if you’ve noticed any particularly well-defended, obscure websites? Would love to take a look if so.
