The issue is that for any serious use of this concept, some manual adjustment is almost always needed. This service says, "Refine your scraper at any time by chatting with the AI agent," but from what I can tell, you can't actually see the code it generates.
Relying solely on the results and asking the AI to tweak them can work, but often the output is too tailored to a specific page and fails to generalize (essentially "overfitting"). And surprisingly, this back-and-forth can be more tedious and time-consuming than just editing a few lines of code yourself. Also, if you can't directly edit the code behind the scenes, there are situations where you'll never get the exact result you want, no matter how much you try to explain it to the AI in natural language.
There are efforts going back at least fifteen years to extract ontologies from natural language [0] and HTML structure [1].
[0]: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d... (2010) [PDF]
[1]: https://doi.org/10.1016/j.dss.2009.02.011 (2009)
All those details can go in the docs / faqs section.
I know that https://expand.ai/ is doing something similar; maybe worth checking out.
Based on the website I was quite skeptical. It looks too much like an "indiehacker", minimum-almost-viable-product, fake-it-till-you-make-it, trolling-for-email-addresses kind of website.
But after a quick search on twitter, it seems like people are actually using it and reporting good results. Maybe I'll take a proper look at it at some point.
I'd still like to know more about pricing, how it deals with cloudflare challenges, non-semantic markup and other awkwardnesses.
It's not necessarily the structure of the source data (the DOM, the HTML, etc.) that needs to be contractually consistent, but rather the translator. In this case, the translator is the service behind the endpoints.
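A rough sketch of what a stable contract could look like (the field names and the translate function here are made up, just to illustrate the idea):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ProductListing:
        """Hypothetical output contract: consumers depend on this shape,
        never on the page's DOM."""
        title: str
        price_cents: Optional[int]  # None when the page shows no price
        source_url: str

    def translate(raw_html: str, url: str) -> ProductListing:
        """The translator service can swap selectors, regexes, or an LLM
        behind this function, as long as the returned shape stays stable."""
        raise NotImplementedError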
No, because a webpage makes no promise not to change. Even if you check every minute, can your system handle random 1-minute periods of unpredictable behavior? What if they remove data? What if the meaning of the data changes (e.g. a field that used to show a maximum value now shows the average)? How would your system deal with that? What if they are running an A/B test and 10% of your ‘API’ requests return a different page?
This is not a technical problem and the solution is not a technical one. You need to have some kind of relationship with the entity whose data you are consuming or be okay with the fact that everything can just stop working at any random moment in time.
When the cache is invalidated, you refetch the HTML and check its SHA-512 hash to see whether anything changed, then proceed based on yes or no.
Or something like that. It's not fast, but hashing and comparing is cheap compared to inference anyway.
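Something like this, roughly (the cache layout and the extraction function are placeholders, not any particular tool):

    import hashlib
    import requests

    # url -> (sha512 of last-seen HTML, previously extracted data)
    _cache: dict[str, tuple[str, dict]] = {}

    def get_data(url: str) -> dict:
        html = requests.get(url, timeout=30).text
        digest = hashlib.sha512(html.encode("utf-8")).hexdigest()

        cached = _cache.get(url)
        if cached and cached[0] == digest:
            return cached[1]  # page unchanged: reuse prior result, skip inference

        data = run_llm_extraction(html)  # placeholder for the expensive step
        _cache[url] = (digest, data)
        return data

    def run_llm_extraction(html: str) -> dict:
        # stand-in for whatever model or parser does the real extraction
        return {}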
You get fucked on one side by Google promoting AI Overviews and Hindustan Times articles for everything in your niche, and on the other by these scrapers knocking your server offline.
My project lets you define rules for various sites, so eventually everything is scraped correctly. For YouTube, yt-dlp is also used to augment results.
I can crawl using requests, Selenium, httpx, and others. Responses come back as JSON, so they're easy to process; a rough sketch of a call is at the end of this comment.
The downside is that it may not be the fastest solution, and I have not tested it against proxies.
https://github.com/rumca-js/crawler-buddy
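A sketch of the kind of call you'd make (the endpoint path, port, and response fields here are placeholders, not the project's actual API; the README documents the real interface):

    import requests

    # Hypothetical usage: endpoint and field names are illustrative only.
    resp = requests.get(
        "http://localhost:8000/crawl",
        params={"url": "https://example.com"},
        timeout=60,
    )
    resp.raise_for_status()
    page = resp.json()
    print(page.get("title"), page.get("status_code"))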