
Asking AI to build scrapers should be easy right?

https://www.skyvern.com/blog/asking-ai-to-build-scrapers-should-be-easy-right/
27•suchintan•1h ago

Comments

showerst•56m ago
A point orthogonal to this: consider whether you need browser automation at all.

If a website isn't using Cloudflare or a JS-only design, it's generally better to skip Playwright. All the major AIs understand BeautifulSoup pretty well, and they're likely to write you a faster, less brittle scraper.
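A minimal sketch of the browser-free approach for a static page. The comment recommends BeautifulSoup; this version swaps in Python's standard-library html.parser so it runs with no dependencies. Extracting link hrefs and text is just an illustrative target, the names LinkExtractor and extract_links are hypothetical, and fetching the page (for example with urllib.request.urlopen) is left out.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a static HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> list[tuple[str, str]]:
    """Parse raw HTML (no JS execution, no browser) and return its links."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second <b>item</b></a></li></ul>'
print(extract_links(sample))  # [('/a', 'First'), ('/b', 'Second item')]
```

Nothing here launches a browser, which is the point: if the data is already in the HTML the server returns, a parser is all you need.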

pavel_lishin•54m ago
If.
Etheryte•36m ago
The vast majority of the modern internet falls into one of those two buckets though, no?
showerst•29m ago
I mostly scrape government data, so the sites are a little 'behind' on that trend, but no. Even JS-heavy sites are almost always pulling from a JSON or GraphQL source under the hood.

At scale, dropping the heavier dependencies and network traffic of a browser is meaningful.
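When a JS-heavy page is really rendering data fetched from a JSON or GraphQL endpoint, one option is to call that endpoint directly. This sketch assumes you have already spotted the endpoint in the browser's network tab; the URL, query, and field names are hypothetical placeholders, and real sites may also require cookies, auth headers, or CSRF tokens that are not shown. It uses only the standard library (urllib.request and json) and builds the request separately from sending it, so the request can be inspected without network access.

```python
import json
import urllib.request

# Hypothetical endpoint and query: substitute whatever the site's own front end calls.
ENDPOINT = "https://example.gov/graphql"
QUERY = """
query Filings($page: Int!) {
  filings(page: $page) { id title date }
}
"""


def build_graphql_request(endpoint: str, query: str, variables: dict) -> urllib.request.Request:
    """Build the same POST the page's JavaScript would send, minus the browser."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def fetch_json(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_graphql_request(ENDPOINT, QUERY, {"page": 1})
# data = fetch_json(req)           # needs a real endpoint
# rows = data["data"]["filings"]   # shape depends on the site's schema
```

Each page then costs one small HTTP round trip instead of a full browser session, which is where the savings at scale come from.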

suchintan•25m ago
Yeah, reverse engineering APIs is another fantastic approach. They aren't enough if you are dealing with wizards (e.g., Typeform), but they can work really well.
suchintan•25m ago
If you can use crawlers, definitely do.

They aren't enough for anything that's login-protected or requires interacting with wizards (e.g., JS, downloading files, etc.).

philipbjorge•52m ago
We had a similar realization here at Thoughtful and pivoted towards code generation approaches as well.

I know the authors of Skyvern are around here sometimes. How do you think about code generation in relation to vision-based approaches to agentic browser use, like OpenAI's Operator, Claude Computer Use, and Magnitude?

From my POV, vision-based approaches are superior, but they are less amenable to codegen.

suchintan•26m ago
I think they're complementary, and that's the direction we're headed.

We can ask the vision-based models to output why they are doing what they are doing, and fall back to code-based approaches for subsequent runs.
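One way to read "fall back to code-based approaches for subsequent runs": run the slow agent once, keep the steps it took as a replayable script, and only call the agent again when replay fails. This is a hypothetical sketch, not Skyvern's implementation; `agent` and `execute` are placeholder callables standing in for a vision-model run and a deterministic executor (for example, a list of Playwright calls), and the step format is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical step format: (action, target) pairs recorded from an agent run.
Step = tuple[str, str]
Script = list[Step]


@dataclass
class CachedRunner:
    """Replay a recorded script when one exists; otherwise, or on failure, ask the agent."""

    agent: Callable[[str], Script]     # slow path: vision model decides the steps
    execute: Callable[[Script], bool]  # fast path: deterministic replay, True on success
    scripts: dict[str, Script] = field(default_factory=dict)
    agent_calls: int = 0

    def run(self, task: str) -> bool:
        script = self.scripts.get(task)
        if script is not None and self.execute(script):
            return True  # cached code worked, no model call needed
        # No script yet, or the page changed and replay failed: ask the agent again.
        self.agent_calls += 1
        script = self.agent(task)
        ok = self.execute(script)
        if ok:
            self.scripts[task] = script  # reuse on subsequent runs
        return ok


def fake_agent(task: str) -> Script:
    return [("click", "#login"), ("fill", "#q=" + task)]


runner = CachedRunner(agent=fake_agent, execute=lambda steps: len(steps) > 0)
runner.run("search widgets")
runner.run("search widgets")
print(runner.agent_calls)  # 1
```

The expensive model is only consulted on the first run and whenever the cached script stops working, which is the trade-off the comment describes.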

ahstilde•51m ago
This matches our experience, too.
franze•48m ago
In AI First workshops, I now tell participants for the last exercise: "no scrapers." The lesson is to separate reasoning (AI) from data (which you have to bring). AI-coded scrapers seem logical, but they always fail. Scraping is a scaling issue, not a reasoning challenge. Also, the most interesting websites are not keen on new scrapers.
pyuser583•18m ago
Over the past few days I've spent a lot of time dealing with terribly designed UIs. Some legitimate and desired use cases are impossible because poor logic excludes them.

Is AI capable of saying, "This website sucks, and doesn't work - file a complaint with the webmaster?"

I once had similar problems with the CIA's World Factbook. I shudder to think what an AI would do there.

nithril•14m ago
The same day, there was a post on Reddit: "We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source" [1].

Not fully equivalent to what Skyvern is doing, but still an interesting approach.

[1] https://www.reddit.com/r/LocalLLaMA/comments/1o8m0ti/we_buil...

Kagi Specials

https://blog.kagi.com/kagi-specials
1•wofo•2m ago•0 comments

Zendesk seems to be compromised: tickets created without email verification

https://phpc.social/@bobmagicii/115375953781800473
1•jeroenhd•2m ago•0 comments

Apple and F1 reach 5-year media deal for US broadcasts

https://www.cnbc.com/2025/10/17/apple-f1-media-deal-streaming.html
2•vaadu•3m ago•1 comment

Renaming the Default Branch of Rust

https://blog.rust-lang.org/inside-rust/2025/10/16/renaming-the-default-branch-of-rust-langrust/
3•0xedb•4m ago•0 comments

Developing for web browsers is a lot of fun

1•logtrees•5m ago•0 comments

Show HN: I turned my resume into a catchy song. It's a game changer

https://suno.com/song/eac3ef90-4f83-4343-b31b-6895b8cf4166
1•rmtbb•7m ago•0 comments

Xi Is Never Giving Up His Newfound Leverage over Trump

https://www.bloomberg.com/news/features/2025-10-17/xi-uses-rare-earth-minerals-dominance-to-turn-...
1•zerosizedweasle•10m ago•2 comments

Europe unveils plans for 'drone wall' to shield continent from Russian threats

https://www.abc.net.au/news/2025-10-17/europe-drone-wall-defence-system-russia-threat-incursions/...
1•breve•13m ago•0 comments

List of classical music concerts with an unruly audience response

https://en.wikipedia.org/wiki/List_of_classical_music_concerts_with_an_unruly_audience_response
1•kblissett•16m ago•0 comments

Rattled Wall Street on Alert After Trillion-Dollar Risk Runup

https://www.bloomberg.com/news/articles/2025-10-17/rattled-wall-street-on-alert-after-trillion-do...
1•zerosizedweasle•16m ago•1 comment

AI Helped Bootstrap Our Startup: Atlas

https://atlascdt.substack.com/p/how-ai-helped-bootstrap-our-startup
2•duane1024•16m ago•0 comments

A petabyte worth of Omarchy in a month

https://world.hey.com/dhh/a-petabyte-worth-of-omarchy-in-a-month-a1fc538e
6•chilipepperhott•18m ago•0 comments

SIM-Swap Attack Victim

https://mayberay.bearblog.dev/so-i-was-the-victim-of-a-sim-swap-attack/
1•mugamuga•19m ago•0 comments

Artificial Intelligence – A Modern Approach (visualization of concepts)

http://aimacode.github.io/aima-javascript/
2•swatson741•22m ago•0 comments

Show HN: Quickmark Lightweight Bookmark Manager

https://github.com/drkpxl/QuickMark
1•stevenhubertron•22m ago•0 comments

MoonScript – A programmer friendly language that compiles to Lua

https://moonscript.org/
1•shakna•24m ago•0 comments

TSMC moves up 2nm production plans in Arizona

https://www.tomshardware.com/tech-industry/tsmc-moves-up-2nm-production-plans-in-arizona-ceo-also...
7•yboris•25m ago•0 comments

Free Solfege Books

https://www.solfegebooks.com/blogs/10-free-solfege-books-pdf
1•protik49•27m ago•0 comments

Republicans use deepfake video of Chuck Schumer in new attack ad

https://www.theguardian.com/us-news/2025/oct/17/republican-ad-deepfake-video-chuck-schumer
3•asib•27m ago•0 comments

Why are mothers in the developed world having fewer kids?

https://www.governance.fyi/p/35-why-is-child-per-mother-cpm-is
4•guardianbob•28m ago•1 comment

Show HN: Privacy-first work journal with no backend

https://scribe-notes.com/
2•napping_penguin•32m ago•0 comments

The Majority AI View

https://www.anildash.com//2025/10/17/the-majority-ai-view/
5•FromTheArchives•35m ago•2 comments

Poppy Game Insult to Our War Dead [Ahoy] [video]

https://www.youtube.com/watch?v=LPnOVK1766E
1•instagraham•37m ago•0 comments

Show HN: ASCII Automata

https://hlnet.neocities.org/ascii-automata/
1•california-og•39m ago•0 comments

Code from MIT's 1986 SICP video lectures

https://github.com/felipap/sicp-code
15•felipap•41m ago•0 comments

Air pollution modulates brown adipose tissue function

https://insight.jci.org/articles/view/187023
1•PaulHoule•42m ago•0 comments

Women Running from Houses

https://womenrunningfromhouses.blogspot.com/?m=1
1•triptych•42m ago•0 comments

Debian TC Overrules systemd Maintainers on /var/lock Permissions

https://linuxiac.com/debian-tc-overrules-systemd-maintainers-on-var-lock-permissions/
3•7bit•42m ago•0 comments

Basis Trade Helps Mask Who Owns $1.4T of U.S. Treasuries

https://www.bloomberg.com/news/articles/2025-10-17/fed-staff-query-missing-1-4-trillion-hedge-fun...
3•clanky•44m ago•1 comment

Kremlin envoy proposes 'Putin-Trump tunnel' to link Russia and US

https://www.reuters.com/world/kremlin-envoy-proposes-putin-trump-tunnel-link-russia-us-2025-10-17/
2•coffeecoders•44m ago•3 comments