
Show HN: Reader – open-source web scraping engine built for LLMs

https://github.com/vakra-dev/reader
1•nihalwashere•1h ago
I've been building AI agents that needed web access and kept hitting the same wall: production scraping is hard. You start with Puppeteer, then add stealth plugins, then fight Cloudflare, then manage proxies, then handle browser pooling, and it still breaks.

I kept solving this problem from scratch on different projects, so I packaged it up as Reader, hoping it saves others the same headaches...

Two primitives:

  const reader = new ReaderClient();
  
  // Scrape URLs → clean markdown
  const result = await reader.scrape({ urls: ["https://example.com"] });
  
  // Crawl a site → discover + scrape pages
  const pages = await reader.crawl({ url: "https://example.com", depth: 2 });
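
The `depth` option on `crawl` implies a depth-bounded traversal. Below is a minimal sketch of one way such a crawl could work. It assumes a breadth-first strategy that stays on the starting origin and treats the start page as level 0. `crawlBfs` and the injected `fetchLinks` function are hypothetical names for illustration, not Reader's API, and the post does not say how Reader itself counts depth.

```typescript
// Hypothetical sketch (not Reader's implementation): depth-limited,
// same-site breadth-first crawl. `fetchLinks` is an assumed, injected
// function that returns the hrefs found on a page.
type FetchLinks = (url: string) => Promise<string[]>;

async function crawlBfs(
  startUrl: string,
  depth: number,
  fetchLinks: FetchLinks,
): Promise<string[]> {
  const start = new URL(startUrl);
  const seen = new Set<string>([start.href]);
  const visited: string[] = [];
  let frontier = [start.href];

  // Level 0 is the start page; each pass of the loop goes one link deeper.
  for (let level = 0; level <= depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const url of frontier) {
      visited.push(url); // a real crawler would scrape the page here
      if (level === depth) continue; // don't discover links past max depth
      for (const href of await fetchLinks(url)) {
        const link = new URL(href, url); // resolve relative links
        link.hash = ""; // treat #fragments as the same page
        // Stay on the starting site and skip anything already queued.
        if (link.origin === start.origin && !seen.has(link.href)) {
          seen.add(link.href);
          next.push(link.href);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```

Injecting `fetchLinks` keeps the traversal separate from the browser layer. The same loop therefore works whether links come from a headless browser or a test fixture. Under these assumptions, `depth: 2` scrapes pages two links away from the start page but does not expand their links.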

Under the hood it's built on Ulixee Hero, a headless browser designed for anti-detection. The hard parts are built in: TLS fingerprinting, Cloudflare/Turnstile bypass, browser pool recycling, and proxy rotation.
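
Browser pool recycling and proxy rotation are easier to reason about with a concrete shape. What follows is a minimal, synchronous sketch under stated assumptions. It is not how Reader or Ulixee Hero implement these features. `BrowserPool`, `launch`, `close`, and `maxUses` are hypothetical. Each instance is retired after a fixed number of uses, and proxies are assigned round-robin whenever an instance is launched or relaunched.

```typescript
// Hypothetical sketch, not Reader's implementation: a fixed-size pool that
// hands out browser instances round-robin, retires an instance after
// `maxUses` requests, and relaunches it on the next proxy in rotation.
interface PooledBrowser<B> {
  browser: B;
  proxy: string;
  uses: number;
}

class BrowserPool<B> {
  private readonly slots: PooledBrowser<B>[] = [];
  private nextSlot = 0;
  private nextProxy = 0;
  private readonly size: number;
  private readonly maxUses: number;
  private readonly proxies: string[];
  private readonly launch: (proxy: string) => B; // assumed browser factory
  private readonly close: (browser: B) => void; // assumed teardown hook

  constructor(
    size: number,
    maxUses: number,
    proxies: string[],
    launch: (proxy: string) => B,
    close: (browser: B) => void,
  ) {
    if (size < 1 || proxies.length === 0) {
      throw new Error("need at least one slot and one proxy");
    }
    this.size = size;
    this.maxUses = maxUses;
    this.proxies = proxies;
    this.launch = launch;
    this.close = close;
    for (let i = 0; i < size; i++) this.slots.push(this.fresh());
  }

  // Round-robin over the proxy list.
  private rotateProxy(): string {
    const proxy = this.proxies[this.nextProxy];
    this.nextProxy = (this.nextProxy + 1) % this.proxies.length;
    return proxy;
  }

  private fresh(): PooledBrowser<B> {
    const proxy = this.rotateProxy();
    return { browser: this.launch(proxy), proxy, uses: 0 };
  }

  // Hand out the next slot. If it has already served maxUses requests,
  // close it and replace it with a fresh instance on a new proxy.
  acquire(): PooledBrowser<B> {
    const i = this.nextSlot;
    this.nextSlot = (this.nextSlot + 1) % this.size;
    if (this.slots[i].uses >= this.maxUses) {
      this.close(this.slots[i].browser);
      this.slots[i] = this.fresh();
    }
    this.slots[i].uses++;
    return this.slots[i];
  }
}
```

Recycling after a fixed use count is one simple way to cap how long any single browser/proxy pair stays in service. A production pool would also need asynchronous launches, health checks, and a wait queue for concurrent callers.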

The HTML-to-markdown conversion uses supermarkdown, a Rust engine I built specifically for messy, real-world HTML. Clean output, no artifacts.

TypeScript-first with full type safety; works as a CLI or a library. Apache 2.0 license.

GitHub: https://github.com/vakra-dev/reader

Happy to answer questions about the architecture, approach, or tradeoffs I made.

Would love feedback from anyone doing web scraping at scale, especially on edge cases where it breaks. That's how I can make this better.

How a software meltdown will shake private markets

https://www.reuters.com/commentary/breakingviews/how-software-meltdown-will-shake-private-markets...
1•petethomas•59s ago•0 comments

Mind the GAAP Again

https://blog.dshr.org/2026/02/mind-gaap-again.html
2•eripen•3m ago•0 comments

10 months since the Llama-4 release: what happened to Meta AI?

1•Invictus0•5m ago•1 comment

Show HN: An LLM-enabled bash-based shell for Linux

https://yoshell.ai/
1•pizlonator•8m ago•0 comments

Volkswagen overtook Tesla as Europe's top EV seller in 2025

https://www.reuters.com/business/autos-transportation/volkswagen-overtook-tesla-europes-top-ev-se...
1•_fizz_buzz_•11m ago•0 comments

OpenAI and Anthropic go to war: Claude Opus 4.6 vs. GPT 5.3 Codex

https://www.latent.space/p/ainews-openai-and-anthropic-go-to
1•swyx•12m ago•0 comments

Extension to Fix YT Background Play

https://microsoftedge.microsoft.com/addons/detail/keepbackplay/dgcbepoghjphcihldlmiaefeadookoff
1•carlosj•15m ago•1 comment

Overseas transfer of map data could cost Korea up to $136B, study warns

https://www.koreaherald.com/article/10669626
1•e2e4•20m ago•0 comments

'Ripping' Clips for YouTube Reaction Videos Can Violate the DMCA, Court Rules

https://torrentfreak.com/ripping-clips-for-youtube-reaction-videos-can-violate-the-dmca-court-rules/
1•mikhael•21m ago•0 comments

I2P is currently facing an ongoing attack on its network

2•cyllek•22m ago•0 comments

Digging into UUID, ULID, and implementing my own

https://atlas9.dev/blog/id-type.html
1•buchanae•26m ago•0 comments

Our Kona EBM a 96% vs. 2% Sudoku Benchmark

https://logicalintelligence.com/blog/energy-based-model-sudoku-demo
1•jz391•29m ago•0 comments

I Gave Claude Code Infinity Gauntlet of LLMs

https://github.com/Pickle-Pixel/HydraMCP
2•picklepixel•31m ago•4 comments

Rdrama Down for the last 2 days

1•BWC_profile_GIF•40m ago•0 comments

I shipped 706 commits in 5 days with Taskwarrior and Claude Code

1•neilbb•42m ago•2 comments

'Penis injection' claims in ski jump competition investigation by WADA

https://www.theguardian.com/sport/2026/feb/05/penis-injection-doping-claims-in-winter-olympics-sk...
3•anigbrowl•43m ago•2 comments

Corpus Lifetime free. Track gold, stocks, mutual funds and net worth in one place

https://icorpus.vercel.app/
1•mathan_karthik•44m ago•0 comments

Show HN: TabAny – Start AI chats from text boxes, enables quick translations

https://www.tabany.app/
1•sally-suite•46m ago•0 comments

rsync.net is down

https://www.rsync.net/
1•gurjeet•47m ago•1 comment

Study: Meta AI model can reproduce almost half of Harry Potter book

https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first...
1•throwoutway•49m ago•0 comments

Built a desktop assistant [fully local] for myself without any privacy issue

1•surajkumar5050•49m ago•3 comments

Built a desktop assistant [fully local] for myself without any privacy issue

https://github.com/Surajkumar5050/zyron-assistant
1•surajkumar5050•53m ago•1 comment

Treasury SEC Admits Americans Are on the Hook for Trump's $10B Lawsuit

https://newrepublic.com/post/206211/treasury-secretary-bessent-trump-irs-lawsuit-taxpayers
4•SilverElfin•54m ago•2 comments

Waiting for Postgres 19: Better Planner Hints with Path Generation Strategies [video]

https://www.youtube.com/watch?v=QLb3nhIy2Lc
2•sbuttgereit•57m ago•0 comments

I reversed Tower of Fantasy's anti-cheat driver: a BYOVD toolkit never loaded

https://vespalec.com/blog/tower-of-flaws/
16•svespalec•1h ago•2 comments

What's at the Other End of 8.8.8.8?

https://blog.nono.io/post/8.8.8.8/
13•marinesebastian•1h ago•0 comments

Californian Court Rules That Ripping YouTube Clips Can Violate the DMCA

https://news.slashdot.org/story/26/02/05/1924252/court-rules-that-ripping-youtube-clips-can-viola...
2•tocitadel•1h ago•0 comments

Solving Shrinkwrap: New Experimental Technique

https://kizu.dev/shrinkwrap-solution/
1•spiros•1h ago•0 comments

The Question she did not ask

https://claudepress.substack.com/p/the-question-she-didnt-ask
2•Paodim•1h ago•2 comments

Show HN: CursedFeed, a social feed where people use spells to mutate next posts

https://cursedfeed.vercel.app/
1•Roccan•1h ago•0 comments