frontpage.
Made with ♥ by @iamnishanth

NASA now allowing astronauts to bring their smartphones on space missions

https://twitter.com/NASAAdmin/status/2019259382962307393
2•gbugniot•3m ago•0 comments

Claude Code Is the Inflection Point

https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point
1•throwaw12•4m ago•0 comments

MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

https://github.com/microclaw/microclaw
1•everettjf•4m ago•2 comments

Show HN: Omni-BLAS – 4x faster matrix multiplication via Monte Carlo sampling

https://github.com/AleatorAI/OMNI-BLAS
1•LowSpecEng•5m ago•1 comments

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

https://codemanship.wordpress.com/2026/01/05/the-ai-ready-software-developer-conclusion-same-game...
1•lifeisstillgood•7m ago•0 comments

AI Agent Automates Google Stock Analysis from Financial Reports

https://pardusai.org/view/54c6646b9e273bbe103b76256a91a7f30da624062a8a6eeb16febfe403efd078
1•JasonHEIN•10m ago•0 comments

Voxtral Realtime 4B Pure C Implementation

https://github.com/antirez/voxtral.c
1•andreabat•13m ago•0 comments

I Was Trapped in Chinese Mafia Crypto Slavery [video]

https://www.youtube.com/watch?v=zOcNaWmmn0A
1•mgh2•19m ago•0 comments

U.S. CBP Reported Employee Arrests (FY2020 – FYTD)

https://www.cbp.gov/newsroom/stats/reported-employee-arrests
1•ludicrousdispla•21m ago•0 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•26m ago•1 comments

Show HN: SVGV – A Real-Time Vector Video Format for Budget Hardware

https://github.com/thealidev/VectorVision-SVGV
1•thealidev•27m ago•0 comments

Study of 150 developers shows AI generated code no harder to maintain long term

https://www.youtube.com/watch?v=b9EbCb5A408
1•lifeisstillgood•28m ago•0 comments

Spotify now requires premium accounts for developer mode API access

https://www.neowin.net/news/spotify-now-requires-premium-accounts-for-developer-mode-api-access/
1•bundie•31m ago•0 comments

When Albert Einstein Moved to Princeton

https://twitter.com/Math_files/status/2020017485815456224
1•keepamovin•32m ago•0 comments

Agents.md as a Dark Signal

https://joshmock.com/post/2026-agents-md-as-a-dark-signal/
2•birdculture•34m ago•0 comments

System time, clocks, and their syncing in macOS

https://eclecticlight.co/2025/05/21/system-time-clocks-and-their-syncing-in-macos/
1•fanf2•35m ago•0 comments

McCLIM and 7GUIs – Part 1: The Counter

https://turtleware.eu/posts/McCLIM-and-7GUIs---Part-1-The-Counter.html
2•ramenbytes•38m ago•0 comments

So whats the next word, then? Almost-no-math intro to transformer models

https://matthias-kainer.de/blog/posts/so-whats-the-next-word-then-/
1•oesimania•39m ago•0 comments

Ed Zitron: The Hater's Guide to Microsoft

https://bsky.app/profile/edzitron.com/post/3me7ibeym2c2n
2•vintagedave•42m ago•1 comments

UK infants ill after drinking contaminated baby formula of Nestle and Danone

https://www.bbc.com/news/articles/c931rxnwn3lo
1•__natty__•43m ago•0 comments

Show HN: Android-based audio player for seniors – Homer Audio Player

https://homeraudioplayer.app
3•cinusek•43m ago•1 comments

Starter Template for Ory Kratos

https://github.com/Samuelk0nrad/docker-ory
1•samuel_0xK•45m ago•0 comments

LLMs are powerful, but enterprises are deterministic by nature

2•prateekdalal•48m ago•0 comments

Make your iPad 3 a touchscreen for your computer

https://github.com/lemonjesus/ipad-touch-screen
2•0y•53m ago•1 comments

Internationalization and Localization in the Age of Agents

https://myblog.ru/internationalization-and-localization-in-the-age-of-agents
1•xenator•53m ago•0 comments

Building a Custom Clawdbot Workflow to Automate Website Creation

https://seedance2api.org/
1•pekingzcc•56m ago•1 comments

Why the "Taiwan Dome" won't survive a Chinese attack

https://www.lowyinstitute.org/the-interpreter/why-taiwan-dome-won-t-survive-chinese-attack
2•ryan_j_naughton•56m ago•0 comments

Xkcd: Game AIs

https://xkcd.com/1002/
2•ravenical•58m ago•0 comments

Windows 11 is finally killing off legacy printer drivers in 2026

https://www.windowscentral.com/microsoft/windows-11/windows-11-finally-pulls-the-plug-on-legacy-p...
2•ValdikSS•58m ago•0 comments

From Offloading to Engagement (Study on Generative AI)

https://www.mdpi.com/2306-5729/10/11/172
1•boshomi•1h ago•1 comments

OpenStreetMap overwhelmed by bots scraping data

https://twitter.com/openstreetmap/status/2016320492420878531
45•molly_radstowe•1w ago

Comments

molly_radstowe•1w ago
#OpenStreetMap hammered by scrapers hiding behind residential proxy/embedded-SDK networks.
direwolf20•1w ago
More like hammered by Google and Apple so you'll use their apps instead.
petre•1w ago
Unlikely. The data is freely available for download from geofabrik and other sources.
direwolf20•1w ago
The data is, the app isn't. OSM provides a giant data dump, not a way to view maps
Bender•1w ago
Looks like it is hosted at Equinix in NL? Or maybe just part of it? Is it behind a load balancer, maybe something like HAProxy? If so, were stick tables set up to limit rates by cookie, require people to be logged in on unique accounts, and limit anonymous access after so many requests? I know limiting anonymous access is not great, but it could be enabled only under high load, so that instead of the site going offline for everyone it would just be limited for anonymous users. Degradation vs. critical outage.

On a separate note have tcpdump captures been done on these excessive connections? Minus the IP, what do their SYN packets look like? Minus the IP what do the corresponding log entries look like in the web server? Are they using HTTP/1.1 or HTTP/2.0? Are they missing any expected headers for a real person such as cors, no-cors, navigate, accept_language?

    tcpdump -p --dont-verify-checksums -i any -NNnnvvv -B32768 -c32 -s0 port 443 and 'tcp[13] == 2'
Is there someone at OpenStreetMap that can answer these questions?
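
The "degradation vs. critical outage" idea above can be sketched roughly as follows. This is a minimal illustration in Python, not HAProxy configuration and not anything OpenStreetMap is known to run. The class name `LoadAwareLimiter`, its parameters, and the `load` value (a 0..1 figure the caller would have to supply) are all assumptions made for the example. The client key could be a cookie or session ID, as suggested in the comment.

```python
import time
from collections import defaultdict, deque


class LoadAwareLimiter:
    """Sliding-window limiter that only restricts anonymous clients under load.

    Hypothetical sketch: names and default thresholds are illustrative.
    """

    def __init__(self, anon_limit=5, window_seconds=60.0, load_threshold=0.8):
        self.anon_limit = anon_limit
        self.window_seconds = window_seconds
        self.load_threshold = load_threshold
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def allow(self, client_key, logged_in, load, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        # Normal operation, or an authenticated user: nobody is limited.
        if logged_in or load < self.load_threshold:
            return True
        # Under high load, anonymous clients get a fixed budget per window.
        return len(hits) <= self.anon_limit
```

With `anon_limit=2`, an anonymous client is refused on its third request inside the window while load is above the threshold, and logged-in users are never refused. Keying on a cookie rather than an IP matters if each address only sends a handful of requests.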
KomoD•1w ago
I think it could be worth trying to block them with TLS fingerprinting, or since they think it's residential proxies they are being hammered by, https://spur.us could be worth a try.
Bender•1w ago
My personal preference is to first make a small amount of effort finding something unique to the bots that can more often than not be dropped with a simple firewall rule or load balancer ACL. The botters almost always miss something.
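
As a rough illustration of "finding something unique to the bots", the sketch below collects a few of the tells mentioned in this thread: HTTP/1.1 and missing `Accept-Language` or `Sec-Fetch-Mode` headers. The function name `bot_tells` and the choice of signals are assumptions for the example. Each signal is weak on its own, and legitimate clients can trip any of them.

```python
def bot_tells(http_version, headers):
    """Return the heuristic 'tells' a request exhibits (hypothetical helper).

    http_version: protocol string from the access log, e.g. "HTTP/1.1".
    headers: mapping of request header names to values.
    """
    names = {name.lower() for name in headers}
    tells = []
    if http_version == "HTTP/1.1":
        tells.append("http/1.1")
    if "accept-language" not in names:
        tells.append("no accept-language")
    if "sec-fetch-mode" not in names:
        tells.append("no sec-fetch-mode")
    return tells
```

A request that shows several tells at once is a candidate for a firewall rule or load balancer ACL. A single tell is probably better treated as a hint.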
Firefishy•1w ago
Disclosure: I am part of the mostly volunteer run OpenStreetMap ops team.

Technically we are able to block and restrict the scrapers after the initial request from an IP. We've seen 400,000 IPs in the last 24 hours. Each IP only does a few requests. Most are not very good at faking browsers, but they are getting better. (HTTP/1.1 vs HTTP/2, obviously faked headers etc.)

The problem has been going on for over a year now. It isn't going away. We need journalists and others to help us push back.

Bender•1w ago
I hear ya. This is just my opinion, but I don't think journalists are going to be much help. The bots would have to be hurting something that belongs to the government, or that the government is paying for, to really get them on it, e.g. some big orgs in the government embed your maps on their sites. They would have to create legislation, then someone would have to trace the bots back to their operator for attribution, and then someone would have to file lawsuits against them once it is illegal. Or you could try using a ToS/AUP to go after them, assuming attribution. I am not a lawyer.

I think your only hope would be to either find subtle differences between them and real legit users or change how your site works so that bots have to be authenticated unless they have a whitelisted IP/CIDR or put your site behind something else that spots the bots. Beyond that all anyone can do is beef up their infrastructure to handle much more than the bots could dish out.

Have you tried silly simple things like hidden javascript puzzles the browser has to solve?
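
One common form of such a puzzle is a hash-based proof of work, sketched here in Python for both sides. This is an assumption about what a "puzzle" could look like, not a description of any particular product. In practice the `solve` step would run as JavaScript in the browser, and all function names and the default difficulty below are hypothetical.

```python
import hashlib
import secrets


def make_challenge():
    """Server side: a random, single-use challenge string."""
    return secrets.token_hex(16)


def leading_zero_bits(digest):
    """Count the leading zero bits in a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, answer, difficulty=16):
    """Server side: cheap check that the answer meets the difficulty."""
    digest = hashlib.sha256(f"{challenge}:{answer}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty=16):
    """Client side: brute-force an integer answer (expensive by design)."""
    answer = 0
    while not verify(challenge, answer, difficulty):
        answer += 1
    return answer
```

Verification costs the server one hash, while solving costs the client about 2^difficulty hashes on average. That raises the price of each scraped request, but it does not stop a scraper willing to pay it.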

Kodiack•5d ago
Hey. I run a small community forum and I've been dealing with this exact same kind of behaviour where well over 99% of requests are bad crawlers. There used to be plenty of "tells" for the faked browsers, HTTP/1.1 being a huge one. As you said, however, they're getting a bit smarter about that and it's becoming increasingly difficult to differentiate it from legitimate traffic.

It's been getting worse over the past year, with the past few weeks in particular seeing a massive change literally overnight. I had to aggressively tune my WAF rules to even remotely get things under control. With Cloudflare I'm aggressively issuing browser challenges to any browser that looks remotely suspicious, and the pass rate is currently below 0.5%. For my users' sake, a successful browser challenge is "valid" for over a month, but this still feels like another thing that'll eventually be bypassed.

I'd be keen to know if you've found any other effective ways of mitigating these most recent aggressive scraping requests. Even a simple "yes" or "no" would be appreciated; I think it's fair to be apprehensive about sharing some specific details publicly since even a lot of folks here on HN seem to think it's their right to scrape content with orders of magnitude higher throughput than all users combined.

I really don't know how this is sustainable long-term. It's eaten up quite a lot of my personal time and effort just for the sake of a hobby that I otherwise greatly enjoy.

phillipseamore•1w ago
The number of idiotic vibe coded repos I've seen on GH lately that are doing things like crawling OSM for POI data is mindboggling!
CqtGLRGcukpy•1w ago
https://xcancel.com/openstreetmap/status/2016320492420878531

https://nitter.poast.org/openstreetmap/status/20163204924208...

CqtGLRGcukpy•1w ago
They also posted about this on Mastodon / Fedi: https://en.osm.town/@osm_tech/115968544599864782
dzhiurgis•1w ago
I'll ask a dumb question: if they are "open source", then why are they bothered by it? Is it the scraping itself? Is their data not freely available for download?
wodenokoto•1w ago
Someone has to pay for bandwidth. And that someone would like the bandwidth to go to human users.
zeeZ•1w ago
Their data is freely available to download. There are weekly dumps of the entire planet and several sources for partial data. There's no need for most legitimate use cases to scrape their API.
dzhiurgis•1w ago
So the problem is that someone is stupid enough to scrape without realizing they can just download 100gb at once?

And there are so many such idiots that it's overwhelming their servers?

Something doesn't math here.

tencentshill•1w ago
That's exactly the case. It's been an issue for a while.
solaris2007•1w ago
Make the data available through BitTorrent and IPFS. Redirect IPs that make excessive requests to a response only kilobytes in size that says "use the torrents and IPFS".

As an SRE, the only legitimate concern here could be the bandwidth costs. But QoS tuning should solve that too.

Supposedly technical people crying out for a journalist to help them is super lame. Everything about this looks super lame.

zeeZ•1w ago
That data is already available. Including torrents.

https://planet.openstreetmap.org/

solaris2007•1w ago
Perfect. Now all they need to do is set up the redirect.

Every bot is doing something on behalf of a human. Now that LLMs can churn out half-assed bot scripts every "look I installed Arch Linux and ohmyzsh" script kiddie has bots too.

Bots aren't going anywhere.

"Use the web the way it was over 10 years ago plox" isn't going to do it.

Firefishy•1w ago
Disclosure: I am part of the OpenStreetMap mostly-volunteer sysadmin team fighting this.

The scrapers try hard to make themselves look like valid browsers, sending requests via residential IP addresses (400,000+ IPs at last count).

I reached out to journalists because despite strong technical measures, the abuse will not go away on its own.