Show HN: ScrapAI – We scrape 500 sites. AI runs once per site, not per page

https://github.com/discourselab/scrapai-cli
1•iranu•2h ago

Comments

iranu•2h ago

  Hi HN, I built this. It's been in production across 500+ websites.

  We're a research group that studies online communications. We needed to scrape hundreds of sites regularly (news,
  blogs, forums, policy orgs) and maintain all those scrapers. At 10 sites, individual scrapers were fine. At 200+,
  we were spending more time fixing broken scrapers than doing actual work. Every redesign broke something, and every
  new site meant another scraper from scratch.

  ScrapAI flips the cost model. You tell an AI agent "add bbc.co.uk to my news project." It analyzes the site, writes
  URL patterns and extraction rules, tests on 5 pages, and saves a JSON config to a database. After that it's just
  Scrapy: no AI in the loop, no per-page inference calls. The cost is ~$1-3 in tokens per website with Sonnet 4.5,
  paid once per site rather than per page.
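
  A minimal sketch of the "config once, plain crawling afterwards" idea described above. The config layout
  (`url_patterns`, `fields`), the helper names, and the use of regular expressions as extraction rules are all
  assumptions made for illustration; the post does not show ScrapAI's actual schema, and a Scrapy-based tool would
  more likely store CSS or XPath selectors.

```python
import json
import re

# Hypothetical per-site config, of the kind an agent might write once and save.
# Raw string so the JSON escape "\\." reaches json.loads unchanged.
CONFIG_JSON = r"""
{
  "site": "example.com",
  "url_patterns": ["^https://example\\.com/news/[a-z0-9-]+$"],
  "fields": {
    "title": "<h1[^>]*>(.*?)</h1>",
    "author": "<span class='author'>(.*?)</span>"
  }
}
"""


def load_config(text):
    """Parse the stored JSON config into a plain dict."""
    return json.loads(text)


def url_matches(url, config):
    """Return True if the URL matches any pattern in the config."""
    return any(re.search(pattern, url) for pattern in config["url_patterns"])


def extract(html, config):
    """Apply each field's rule to the page; fields that do not match become None."""
    result = {}
    for name, pattern in config["fields"].items():
        match = re.search(pattern, html, re.DOTALL)
        result[name] = match.group(1).strip() if match else None
    return result
```

  Once the config exists, every page is handled by `url_matches` and `extract` alone, so the per-page cost is
  ordinary parsing rather than a model call.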

  Cloudflare was the hardest part. Most tools keep a browser open for every request (~5-10s per page). We use
  CloakBrowser (open source, C++ stealth patches, 0.9 reCAPTCHA v3 score) to solve the challenge once, cache the
  cookies, kill the browser, and hit the site with normal HTTP. It re-solves every ~10 minutes. That gets through
  1,000 pages in ~8 minutes vs 2+ hours with a browser per request.
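
  The solve-once, reuse-cookies pattern can be sketched as a small time-based cache. Everything here is
  hypothetical: `CookieCache`, the `solve` callable (standing in for whatever launches the browser and returns
  cookies), and the 600-second lifetime taken from the "~10 minutes" figure above. It does not use CloakBrowser's
  API, which the post does not show.

```python
import time
from typing import Callable, Dict, Optional


class CookieCache:
    """Caches challenge cookies and re-solves only after they expire."""

    def __init__(
        self,
        solve: Callable[[], Dict[str, str]],
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._solve = solve          # expensive step: browser session, returns cookies
        self._ttl = ttl_seconds      # how long a solved set of cookies is reused
        self._clock = clock          # injectable so the expiry logic can be tested
        self._cookies: Optional[Dict[str, str]] = None
        self._solved_at = 0.0
        self.solve_count = 0         # number of times the expensive step ran

    def get(self) -> Dict[str, str]:
        """Return cached cookies, solving again only if missing or expired."""
        now = self._clock()
        if self._cookies is None or now - self._solved_at >= self._ttl:
            self._cookies = self._solve()
            self._solved_at = now
            self.solve_count += 1
        return self._cookies
```

  Each plain HTTP request would then attach `cache.get()` as its cookies, so the browser runs roughly once per
  expiry window instead of once per page.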

  The agent writes JSON configs, not Python. An agent that writes and runs code can do anything an unsupervised
  developer can — one prompt injection from a malicious page and you have a real problem. JSON goes through Pydantic
  validation before it touches the database. Worst case is a bad config that extracts wrong fields. This also makes
  it safe to use as a tool for Claws — structured web data without arbitrary code execution.
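
  To illustrate why a validated config is a narrower attack surface than generated code, here is a sketch of
  strict validation of an untrusted config. The post says ScrapAI uses Pydantic; this stand-in uses only the
  standard library so it stays self-contained, and the schema (`site`, `url_patterns`, `fields`) and the name
  `parse_config` are assumptions, not the project's real model.

```python
import json
import re
from dataclasses import dataclass
from typing import Dict, List

ALLOWED_KEYS = {"site", "url_patterns", "fields"}  # hypothetical schema


@dataclass(frozen=True)
class SiteConfig:
    site: str
    url_patterns: List[str]
    fields: Dict[str, str]


def parse_config(raw: str) -> SiteConfig:
    """Parse and validate agent-written JSON; raise ValueError on anything unexpected."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")
    if set(data) != ALLOWED_KEYS:
        raise ValueError(f"unexpected or missing keys: {sorted(set(data) ^ ALLOWED_KEYS)}")

    site = data["site"]
    if not isinstance(site, str) or not site:
        raise ValueError("site must be a non-empty string")

    patterns = data["url_patterns"]
    if not isinstance(patterns, list) or not patterns:
        raise ValueError("url_patterns must be a non-empty list")
    for pattern in patterns:
        if not isinstance(pattern, str):
            raise ValueError("each URL pattern must be a string")
        try:
            re.compile(pattern)  # reject patterns that are not valid regexes
        except re.error as exc:
            raise ValueError(f"invalid URL pattern: {pattern!r}") from exc

    fields = data["fields"]
    if not isinstance(fields, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in fields.items()
    ):
        raise ValueError("fields must map string names to string rules")

    return SiteConfig(site=site, url_patterns=patterns, fields=fields)
```

  A config that passes can still extract the wrong fields, but nothing in it is executed as code, and extra keys
  are rejected before anything is stored.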

  ~4,000 lines of Python. Scrapy, SQLAlchemy, Alembic. Apache 2.0. We recommend Claude Code with Sonnet 4.5, but it
  works with any agent that can read instructions and run shell commands. We tried GLM 4.7 and it performed
  similarly, just slower.
