
Ask HN: Is there a market for agentic scraping tools?

3•mxfeinberg•7mo ago
As a longtime data scientist and engineer, I've had to write a couple of quick and dirty scrapers and bots over the years using Selenium and, more recently, Playwright. I haven't really been tracking it, but I've also been reading about the crawl4ai project.

With the explosion of AI agents, I've been playing around with building agentic scrapers that can simply be given a prompt and a target site and return structured data in a specified format. I've also been experimenting with an extra step in which a separate model attempts to define the structured format dynamically.

However, as with most AI projects, the token consumption can scale pretty aggressively.

Has anyone else been working on similar projects? Would people realistically pay $0.025 to $0.03 per request?
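To make the pricing question concrete, here is a rough sketch of the per-request arithmetic. The token counts and per-million-token prices are hypothetical placeholders, not figures from any real provider or from my own runs; plug in your own numbers.

```python
from decimal import Decimal


def llm_cost_per_request(input_tokens, output_tokens,
                         usd_per_m_input, usd_per_m_output):
    """Return the LLM spend in USD for a single scrape request.

    Prices are quoted per million tokens. Decimal is used so that
    small per-request amounts do not pick up floating-point noise.
    """
    million = Decimal(1_000_000)
    total = (Decimal(input_tokens) * Decimal(usd_per_m_input)
             + Decimal(output_tokens) * Decimal(usd_per_m_output))
    return total / million


# Hypothetical request: 20k tokens of page content in, 1k tokens of JSON out,
# at assumed prices of $0.50 (input) and $1.50 (output) per million tokens.
cost = llm_cost_per_request(20_000, 1_000, "0.50", "1.50")

# Margin left at the low end of the proposed price range ($0.025 per request).
margin = Decimal("0.025") - cost
```

Under those assumed numbers the margin is positive, but a multi-step agent that re-reads the page several times multiplies `input_tokens` and can erase it quickly.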

Comments

PaulHoule•7mo ago
I've been building those since 1999.

One of the weird anomalies I've been following is that people consistently overestimate how hard scraping is. In fact, the horrible difficulties that people have developing GUI applications work in favor of scraping: your boss is afraid that the target site is going to change and you're going to have to maintain your scraper, but the boss of the guy who maintains that site is afraid that projects to change it will get bogged down, and besides, if they change anything it will tank their SEO.

I am amazed at the poor judgement that scraper developers seem to have. At work I work on a project where you can go to

   https://our.site/item/39349109
and get really nice structured JSON and you can even add one to that number and get another valid URL. Instead people use something that downloads our React application without a cache and probably scrapes the DOM produced by it to make something like the JSON they could get just by asking.
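A minimal sketch of "just asking", assuming the endpoint returns JSON for consecutive item ids as described. `our.site` is the placeholder host from above, and `fetch_item` / `fetch_range` are hypothetical helper names, not part of any real client library:

```python
import json
from urllib.request import urlopen

# Placeholder URL pattern taken from the example above.
BASE_URL = "https://our.site/item/{item_id}"


def item_url(item_id: int) -> str:
    """Build the URL for one item id."""
    return BASE_URL.format(item_id=item_id)


def fetch_item(item_id: int, opener=urlopen):
    """Fetch one item and decode its JSON body.

    `opener` is injectable so the function can be exercised without
    touching the network.
    """
    with opener(item_url(item_id)) as resp:
        return json.load(resp)


def fetch_range(start: int, count: int, opener=urlopen):
    """Fetch `count` consecutive items, starting at id `start`."""
    return [fetch_item(i, opener) for i in range(start, start + count)]
```

A real crawler would also want a delay between requests and handling for ids that return an error, which this sketch leaves out.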

Now... Web crawlers are totally vibe codable and people aren't intimidated anymore and are discovering just how easy it is. To be fair, in 2025 it should be possible to extract facts out of unstructured text with LLMs (at great expense), but you can get structured data out of most web sites with CSS selectors.
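As a rough illustration of how little is needed, here is a standard-library-only sketch that does the equivalent of a simple CSS selector such as `span.price` (tag plus class). The HTML snippet and the `select_text` helper are made up for the example; in practice you would more likely reach for a real selector engine, such as BeautifulSoup's `select()`.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every <tag> element carrying a given class.

    Roughly what the CSS selector `tag.cls` would match.
    """

    def __init__(self, tag, cls):
        super().__init__()
        self.tag = tag
        self.cls = cls
        self.depth = 0      # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same tag so the right end tag closes us.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.cls in classes:
                self.depth = 1
                self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def select_text(html, tag, cls):
    """Return the stripped text of each element matching `tag.cls`."""
    parser = ClassTextExtractor(tag, cls)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Made-up page fragment for illustration.
SAMPLE = (
    '<ul>'
    '<li><span class="name">Widget</span> <span class="price sale">$9.99</span></li>'
    '<li><span class="name">Gadget</span> <span class="price">$12.00</span></li>'
    '</ul>'
)
prices = select_text(SAMPLE, "span", "price")
```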

I've frequently had the experience of "I could use an API that gives me 80% of what I want with a really low rate limit if I debug their buggy OAuth implementation" vs. "I can change three lines in the Flickr scraper I wrote in 2009 and it just works."

What's bothering me though is that those Cloudflare nag screens that used to be performative are really starting to screw up my crawlers [1]... and the people who are slow on the draw are waking up to the dangers of web crawlers 25 years after the cool kids did. So it is getting a lot harder, which is too bad, because Cloudflare is really locking in the Google monopoly and slamming the door in front of those trying to escape the enshittification economy.

[1] I could tell you what I am doing about it, but then I'd have to kill you

heldrida•7mo ago
If you're concerned about the costs, you could provide the process/service but require clients to supply their own LLM API key. That said, you'd have to rethink your service charge.