
Ask HN: Is there a market for agentic scraping tools?

3•mxfeinberg•6h ago
As a long-time data scientist and engineer, I've had to write a couple of quick-and-dirty scrapers and bots over the years using Selenium and, more recently, Playwright. I haven't been tracking it closely, but I've also been reading about the crawl4ai project.

With the explosion of AI agents, I've been playing around with building agentic scrapers that can simply be given a prompt and a target site and return structured data in a specified format. I've also been experimenting with an extra step in which a separate model attempts to define the structured format dynamically.
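
To make "structured data in a specified format" concrete, here is a minimal sketch of the checking step, assuming the model replies with a JSON object. The function name `parse_structured` and the `{field: type}` format description are hypothetical, chosen only for illustration; a real pipeline might use a full JSON Schema validator instead.

```python
import json


def parse_structured(raw, schema):
    """Parse a model reply and check it against {field: type}.

    Raises ValueError if the reply is not a JSON object, a field is
    missing, or a field has the wrong type. Extra fields are dropped.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"{field} should be {expected.__name__}")
    # Keep only the fields the caller asked for.
    return {field: data[field] for field in schema}
```

A failed check is a natural point to retry the request or fall back to a cheaper extraction path, which matters once token costs start to add up.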

However, as with most AI projects, the token consumption can scale pretty aggressively.

Has anyone else been working on similar projects? Would people realistically pay $0.025 to $0.03 per request?
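
For anyone who wants to sanity-check the pricing, here is a rough sketch of the per-request arithmetic. The token counts and per-million-token prices below are placeholders I made up for illustration, not quotes from any provider; only the $0.025 price point comes from the question above.

```python
def cost_per_request(input_tokens, output_tokens,
                     usd_per_million_input, usd_per_million_output):
    """Return the model cost in USD for one scrape request."""
    total = (input_tokens * usd_per_million_input
             + output_tokens * usd_per_million_output)
    return total / 1_000_000


# Hypothetical numbers: one page costs 20,000 input tokens and
# 1,000 output tokens at $0.50 / $2.00 per million tokens.
cost = cost_per_request(20_000, 1_000, 0.50, 2.00)
margin = 0.025 - cost  # what is left at the low end of the price range
print(f"cost ${cost:.4f}, margin ${margin:.4f}")
```

With these made-up inputs the model cost is about $0.012 per request, so the margin at $0.025 is thin and disappears quickly if a site needs several pages or retries.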

Comments

PaulHoule•6h ago
I've been building those since 1999.

One of the weird anomalies I've been following is that people consistently overestimate how hard scraping is. In fact, the horrible difficulties that people have developing GUI applications work in favor of scraping. Your boss is afraid that the target site is going to change and you're going to have to maintain your scraper, but the boss of the guy who maintains that site is afraid that projects to change it will get bogged down, and besides, if they change anything it will tank their SEO.

I am amazed at the poor judgement that scraper developers seem to have. At work I work on a project where you can go to

   https://our.site/item/39349109
and get really nice structured JSON and you can even add one to that number and get another valid URL. Instead people use something that downloads our React application without a cache and probably scrapes the DOM produced by it to make something like the JSON they could get just by asking.
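
A minimal sketch of "just asking", assuming the endpoint returns JSON as described. `our.site` is the placeholder host from the URL above, and the helper names `item_urls` and `fetch_item` are hypothetical.

```python
import json
from urllib.request import urlopen

BASE = "https://our.site/item/"  # placeholder host, not a real endpoint


def item_urls(start_id, count):
    """Build URLs for `count` consecutive item IDs starting at `start_id`."""
    return [f"{BASE}{start_id + i}" for i in range(count)]


def fetch_item(url, timeout=10):
    """Fetch one item and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

`item_urls(39349109, 2)` gives the URL above plus the one for the next ID. No browser, no React bundle, no DOM scraping; a polite client would also add a delay between calls to `fetch_item`.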

Now... Web crawlers are totally vibe codable and people aren't intimidated anymore and are discovering just how easy it is. To be fair, in 2025 it should be possible to extract facts out of unstructured text with LLMs (at great expense), but you can get structured data out of most web sites with CSS selectors.
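
To show how little machinery selector-style extraction needs, here is a sketch using only the Python standard library. It is a stand-in that handles just the `tag.class` form of a CSS selector, not a real selector engine; in practice you would likely reach for a library with full selector support. The names `ClassTextExtractor` and `select_text` are hypothetical.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements matching `tag.class_name`."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.depth = 0       # > 0 while inside a matching element
        self.results = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same tag so we close correctly.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.class_name in classes:
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def select_text(html, tag, class_name):
    """Return the text of every `tag.class_name` element in `html`."""
    parser = ClassTextExtractor(tag, class_name)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, `select_text('<li class="item price">$5</li><li class="item">x</li>', "li", "price")` returns `['$5']`. No model calls and no tokens, which is the point.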

I've frequently had the experience of "I could use an API that gives me 80% of what I want with a really low rate limit if I debug their buggy OAuth implementation" vs. "I can change three lines from my Flickr scraper I wrote in 2009 and it just works."

What's bothering me though is that those Cloudflare nag screens that used to be performative are really starting to screw up my crawlers [1]... and the people who are slow on the draw are waking up to the dangers of web crawlers 25 years after the cool kids did. So it is getting a lot harder, which is too bad, because Cloudflare is really locking in the Google monopoly and slamming the door in front of those trying to escape the enshittification economy.

[1] could tell you what I am doing about it but then I'd have to kill you

heldrida•4h ago
If you're concerned about the costs, you could provide the process/service but require clients to provide their own LLM token. With that being said, you'd have to rethink your service charge.
