Ask HN: Claude Skills and Bs4 for intelligent scraping

1•poch4nga•2h ago
While building https://polyscores.xyz (soccer betting for Polymarket) as a side project, I was scraping a number of data sources that feed the front end and my decision making during live games. The problem is that there are so many different data models and parts of the source websites to scrape, and I don't want a dedicated solution for each of them. I want something that can handle all of them from a simple prompt.

I did all this scraping without any agent help. Having seen Claude Skills' ability to run scripts (I am an Anthropic power user), I want to build a generic scraper that can do a number of things:

1) create data models based on the data it sees

2) create target data stores, like a Postgres table, based on a model from step 1

3) scrape the data using the Claude Agent SDK + Playwright MCP

4) dump the data into the storages it creates
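To make steps 1, 2 and 4 concrete, here is a rough sketch in Python. It is only an illustration under some assumptions: the sample records are made up, the function names (`infer_model`, `create_table_sql`, `load`) and the type mapping are hypothetical, and `sqlite3` stands in for Postgres so the sketch runs without a server. Step 3 (the agent-driven scraping) is left out, so the records are hard-coded.

```python
import sqlite3

# Hypothetical mapping from Python types to Postgres column types.
PG_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}


def infer_model(records):
    """Step 1: derive {column: python_type} from a list of scraped dicts."""
    model = {}
    for record in records:
        for key, value in record.items():
            if value is None:
                model.setdefault(key, None)  # type still unknown
                continue
            seen, kind = model.get(key), type(value)
            if seen is None or seen is kind:
                model[key] = kind
            elif {seen, kind} == {int, float}:
                model[key] = float  # widen int to float
            else:
                model[key] = str  # conflicting types fall back to text
    # Columns that were only ever None default to text.
    return {key: (kind or str) for key, kind in model.items()}


def create_table_sql(table, model):
    """Step 2: render a CREATE TABLE statement for the inferred model."""
    columns = ", ".join(f'"{name}" {PG_TYPES[kind]}' for name, kind in model.items())
    return f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})'


def load(conn, table, model, records):
    """Step 4: insert the records; missing keys become NULL."""
    names = list(model)
    column_list = ", ".join(f'"{name}"' for name in names)
    # "?" is the sqlite3 placeholder; a Postgres driver would use its own style.
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
        [tuple(record.get(name) for name in names) for record in records],
    )
    conn.commit()


# Made-up sample records standing in for whatever step 3 would return.
records = [
    {"match": "Home v Away", "home_odds": 2, "live": True},
    {"match": "North v South", "home_odds": 2.45, "live": False, "minute": 63},
]
model = infer_model(records)
ddl = create_table_sql("odds", model)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
load(conn, "odds", model, records)
print(ddl)
```

The inference here is deliberately naive (flat records, four scalar types). Nested data, dates and schema changes between runs are where most of the real work would be.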

This could be used for so many different problems. But looking at the scope of the problem, this might be its own company :D

My questions are more about market discovery:

- Does anyone know of someone building something similar?

- Would anyone here find it useful?

I think I will build a small version for myself without steps 1 and 2, but I would love to know what people think.
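For a sense of what the small version might look like: without steps 1 and 2 the schema is fixed up front, so it reduces to parsing a known structure into rows. The sketch below is an assumption-heavy illustration, not the actual scraper: the HTML is a made-up sample, `TableRows` and `scrape_table` are hypothetical names, and the standard library's `html.parser` is used in place of bs4 so it runs with no dependencies.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return one dict per body row, keyed by the first (header) row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; a real run would fetch the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>team</th><th>goals</th></tr>
  <tr><td>Home</td><td>2</td></tr>
  <tr><td>Away</td><td>1</td></tr>
</table>
"""
rows = scrape_table(SAMPLE_HTML)
print(rows)
```

Every value comes back as a string, so casting to the target column types would still have to happen before the rows are written anywhere.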

Comments

DatMule•2h ago
careful anthropic, well all top, but I have direct copyright violations oct22/'25 documented when they settled copyright violations for $1.5B usd on oct09/'25. Wouldn't say it if I didn't have screenshot and recording proof. just a heads up is all tho. Claude does Code well. usually.
poch4nga•1h ago
sorry I don't get this at all
