Ask HN: Web scraping in production?

4•arkmm•9mo ago
Are any of you maintaining any web scrapers in production?

I've done some for side projects, automated testing, and personal scripts (checking personal bank balances, getting a Global Entry interview slot, etc.), but it always feels very brittle.

Curious what applications people have in industry and what sorts of techniques people use for reliability.

Comments

sargstuff•9mo ago
Excel web scraping [0] (vs. using Python [1] and/or ODBC/delimited files)

A few 2025 use cases [2],[3]:

   Use publicly available database information (construction, taxes, sales, traffic reports, proposed building/zoning changes, etc.) to find out what's going on within an area, e.g. a zip code, housing area, 'vacation spot', etc.
----

   creative take on topic:

      modern looming / static 'threaded' approach : https://news.ycombinator.com/item?id=43977384

      Structurally reprogrammable magnetic metamaterials hold promise for biomedicine, soft robotics. ("web" support formed via scraping material in relevant patterns) : https://techxplore.com/news/2025-05-reprogrammable-magnetic-metamaterials-biomedicine-soft.html

      3d printed smart-fabrics : https://techxplore.com/news/2025-05-d-smart-fabrics-flexibility-ability.html

----

[0] : excel scraping : https://www.youtube.com/watch?app=desktop&v=6coVzIt93vk

[1] : python scraping : https://www.youtube.com/watch?v=Oo8-nEuDBkk

[2] : https://dataforest.ai/blog/top-web-scraping-use-cases

[3] : https://www.parsehub.com/blog/web-scraping-examples/

arkmm•9mo ago
Neat - didn't realize there were affordances for scraping in Excel (but in hindsight I shouldn't be surprised).

I didn't follow the connection between modern looming and scraping though?

sargstuff•9mo ago
hint: silk spider webs & fabric threads.

Guess 3d printing should have been clarified as linear, fused deposition. Melted plastic line gets scraped along plate/material.

The 3d printed web reference, in this instance, being the in-fill pattern. [0]

Robotic metal pinching / incremental sheet forming might be a clearer example. [1]

-----

[0] : in-fill pattern : https://jlc3dp.com/blog/choosing-the-right-infill-structure-...

[1] : https://www.youtube.com/watch?v=Jc16Ob-yoDs

9d•9mo ago
Scraping is inherently brittle, but it can be very useful for short-term jobs in very specific circumstances. I haven't had any scrapers in production in maybe 10 years.
sargstuff•9mo ago
IMHO, for an "untyped" format/delimited file, yes. Directly placing/'compiling' the data into an appropriate topological construct/environment works wonders, i.e. a database, a spreadsheet, or "reports" with information beyond the raw data, etc.
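
One way to picture moving scraped data out of an "untyped" delimited form and into a database is the rough sketch below. It is only an illustration under assumptions: the `Permit` record, its field names (`zip`, `issued`, `value`), and the `permits` table are all hypothetical and stand in for whatever a real scraper extracts. Each row of strings is converted to typed values before it is stored, and rows that no longer fit are reported instead of being loaded silently, which is one way to notice that a page layout changed.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Permit:
    """Hypothetical record type; the field names are illustrative only."""
    zip_code: str
    issued: date
    value_usd: float


def parse_row(raw: dict[str, str]) -> Permit:
    """Convert one scraped row of strings into a typed record.

    Raises KeyError or ValueError when a field is missing or malformed,
    which usually means the scraped page changed shape.
    """
    zip_code = raw["zip"].strip()
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError(f"unexpected zip code: {zip_code!r}")
    issued = date.fromisoformat(raw["issued"].strip())
    value_usd = float(raw["value"].replace("$", "").replace(",", ""))
    return Permit(zip_code, issued, value_usd)


def load(rows: list[dict[str, str]], conn: sqlite3.Connection) -> tuple[int, list[str]]:
    """Insert valid rows into a typed table; return (loaded count, error messages)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS permits ("
        "zip_code TEXT NOT NULL, issued TEXT NOT NULL, value_usd REAL NOT NULL)"
    )
    loaded = 0
    errors: list[str] = []
    for i, raw in enumerate(rows):
        try:
            permit = parse_row(raw)
        except (KeyError, ValueError) as exc:
            # Keep going, but record the failure so it can be alerted on.
            errors.append(f"row {i}: {exc!r}")
            continue
        conn.execute(
            "INSERT INTO permits VALUES (?, ?, ?)",
            (permit.zip_code, permit.issued.isoformat(), permit.value_usd),
        )
        loaded += 1
    conn.commit()
    return loaded, errors
```

In a setup like this, a sudden jump in the error list is a cheap signal that the source changed, rather than finding out later from bad data in a report.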