
Download responsibly

https://blog.geofabrik.de/index.php/2025/09/10/download-responsibly/
167•marklit•3h ago•79 comments

Privacy and Security Risks in the eSIM Ecosystem [pdf]

https://www.usenix.org/system/files/usenixsecurity25-motallebighomi.pdf
143•walterbell•4h ago•68 comments

How I, a beginner developer, read the tutorial you, a developer, wrote for me

https://anniemueller.com/posts/how-i-a-non-developer-read-the-tutorial-you-a-developer-wrote-for-...
309•wonger_•7h ago•150 comments

A Generalized Algebraic Theory of Directed Equality

https://jacobneu.phd/
27•matt_d•3d ago•3 comments

Australian telco cut off emergency calls, firewall upgrade linked to 3 deaths

https://www.theregister.com/2025/09/21/optus_emergency_call_incident/
24•croes•53m ago•9 comments

Sj.h: A tiny little JSON parsing library in ~150 lines of C99

https://github.com/rxi/sj.h
398•simonpure•16h ago•196 comments

Why is Venus hell and Earth an Eden?

https://www.quantamagazine.org/why-is-venus-hell-and-earth-an-eden-20250915/
125•pseudolus•10h ago•190 comments

Simulating a Machine from the 80s

https://rmazur.io/blog/fahivets.html
35•roman-mazur•3d ago•4 comments

Lightweight, highly accurate line and paragraph detection

https://arxiv.org/abs/2203.09638
113•colonCapitalDee•11h ago•17 comments

How can I influence others without manipulating them?

https://andiroberts.com/leadership-questions/how-to-influence-others-without-manipulating
113•kiyanwang•10h ago•96 comments

40k-Year-Old Symbols in Caves Worldwide May Be the Earliest Written Language

https://www.openculture.com/2025/09/40000-year-old-symbols-found-in-caves-worldwide-may-be-the-ea...
143•mdp2021•4d ago•86 comments

DSM Disorders Disappear in Statistical Clustering of Psychiatric Symptoms (2024)

https://www.psychiatrymargins.com/p/traditional-dsm-disorders-dissolve?r=2wyot6&triedRedirect=true
120•rendx•6h ago•67 comments

Obsidian Note Codes

https://ezhik.jp/obsidian/note-codes/
89•surprisetalk•3d ago•30 comments

DXGI debugging: Microsoft put me on a list

https://slugcat.systems/post/25-09-21-dxgi-debugging-microsoft-put-me-on-a-list/
261•todsacerdoti•18h ago•75 comments

I uncovered an ACPI bug in my Dell Inspiron 5567. It was plaguing me for 8 years

https://triangulatedexistence.mataroa.blog/blog/i-uncovered-an-acpi-bug-in-my-dell-inspiron-5667-...
46•thunderbong•3d ago•4 comments

Why your outdoorsy friend suddenly has a gummy bear power bank

https://www.theverge.com/tech/781387/backpacking-ultralight-haribo-power-bank
217•arnon•20h ago•266 comments

Nvmath-Python: Nvidia Math Libraries for the Python Ecosystem

https://github.com/NVIDIA/nvmath-python
45•gballan•3d ago•1 comment

Show HN: Tips to stay safe from NPM supply chain attacks

https://github.com/bodadotsh/npm-security-best-practices
57•bodash•11h ago•22 comments

Calculator Forensics (2002)

https://www.rskey.org/~mwsebastian/miscprj/results.htm
81•ColinWright•3d ago•36 comments

Teach Kids Electronics Using Dough: Light Up Caterpillar Project

https://newsletter.infiniteretry.com/dough-circuits-led-caterpillar/
9•ekuck•3d ago•1 comment

We Politely Insist: Your LLM Must Learn the Persian Art of Taarof

https://arxiv.org/abs/2509.01035
24•chosenbeard•8h ago•3 comments

South Korea's President says US investment demands would spark financial crisis

https://www.reuters.com/world/china/south-koreas-president-lee-says-us-investment-demands-would-s...
66•rbanffy•5h ago•43 comments

Procedural Island Generation (VI)

https://brashandplucky.com/2025/09/28/procedural-island-generation-vi.html
57•ibobev•11h ago•4 comments

I forced myself to spend a week in Instagram instead of Xcode

https://www.pixelpusher.club/p/i-forced-myself-to-spend-a-week-in
236•wallflower•19h ago•92 comments

Node 20 will be deprecated on GitHub Actions runners

https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
97•redbell•1d ago•41 comments

Pointer Tagging in C++: The Art of Packing Bits into a Pointer

https://vectrx.substack.com/p/pointer-tagging-in-c-the-art-of-packing
55•signa11•7h ago•40 comments

INapGPU: Text-mode graphics card, using only TTL gates

https://github.com/Leoneq/iNapGPU
73•userbinator•4d ago•11 comments

How Isaac Newton discovered the binomial power series (2022)

https://www.quantamagazine.org/how-isaac-newton-discovered-the-binomial-power-series-20220831/
75•FromTheArchives•3d ago•15 comments

RCA VideoDisc's Legacy: Scanning Capacitance Microscope

https://spectrum.ieee.org/rca-videodisc
19•WaitWaitWha•3d ago•7 comments

Timesketch: Collaborative forensic timeline analysis

https://github.com/google/timesketch
116•apachepig•16h ago•12 comments

Download responsibly

https://blog.geofabrik.de/index.php/2025/09/10/download-responsibly/
165•marklit•3h ago

Comments

holowoodman•2h ago
Just wait until some AI dudes decide it is time to train on maps...
M95D•1h ago
Map slop? That's new!
jbstack•1h ago
AI models are trained relatively rarely, so it's unlikely this would be very noticeable among all the regular traffic. Just the occasional download-everything every few months.
holowoodman•1h ago
One would think so. If AI bros were sensible, responsible and intelligent.

However, the practical evidence is to the contrary: AI companies are hammering every webserver out there, ignoring conventions like robots.txt and re-downloading everything at pointlessly short intervals, annoying everyone and killing services.

Just a few recent examples from HN: https://news.ycombinator.com/item?id=45260793 https://news.ycombinator.com/item?id=45226206 https://news.ycombinator.com/item?id=45150919 https://news.ycombinator.com/item?id=42549624 https://news.ycombinator.com/item?id=43476337 https://news.ycombinator.com/item?id=35701565

Waraqa•1h ago
IMHO in the long term this will lead to a closed web where you are required to log in to view any content.
nativeit•1h ago
I’m looking forward to visiting all of the fictional places it comes up with!
cadamsdotcom•2h ago
Definitely a use case for bittorrent.
john_minsk•2h ago
If the data changes, how would a torrent client pick it up and download changes?
hambro•2h ago
Let the client curl latest.torrent from some central service and then download the big file through bittorrent.
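A rough sketch of that pattern (the torrent URL is a placeholder, and it assumes the aria2c CLI is installed as the BitTorrent client):

    import subprocess
    import urllib.request

    # Hypothetical endpoint that always serves the current .torrent for the dataset.
    TORRENT_URL = "https://example.org/datasets/latest.torrent"

    def fetch_via_bittorrent(dest_dir="."):
        # Step 1: fetch the small, frequently updated .torrent over HTTP.
        urllib.request.urlretrieve(TORRENT_URL, "latest.torrent")
        # Step 2: hand the bulk transfer to a BitTorrent client.
        subprocess.run(
            ["aria2c", "--seed-time=0", "--dir", dest_dir, "latest.torrent"],
            check=True,
        )

    if __name__ == "__main__":
        fetch_via_bittorrent()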
maeln•1h ago
A lot of torrent clients support various APIs for automatically collecting torrent files. The most common is to simply use RSS.
Klinky•1h ago
Pretty sure people used or even still use RSS for this.
extraduder_ire•1h ago
There's a BEP for updatable torrents.
Gigachad•2h ago
Sounds like some people are downloading it in their CI pipelines. Probably unknowingly. This is why most services stopped allowing automated downloads for unauthenticated users.

Make people sign up if they want a url they can `curl` and then either block or charge users who download too much.

userbinator•2h ago
I'd consider CI one of the worst massive wastes of computing resources invented, although I don't see how map data would be subject to the same sort of abusive downloading as libraries or other code.
Gigachad•2h ago
This stuff tends to happen by accident. Some org has an app that automatically downloads the dataset if it's missing, which is helpful for local development. Then it gets loaded into CI, and no one notices that it's downloading that dataset every single CI run.
mschuster91•1h ago
CI itself doesn't have to be a waste. The problem is most people DGAF about caching.
stevage•50m ago
Let's say you're working on an app that incorporates some Italian place names or roads or something. It's easy to imagine how when you build the app, you want to download the Italian region data from geofabrik then process it to extract what you want into your app. You script it, you put the script in your CI...and here we are:

> Just the other day, one user has managed to download almost 10,000 copies of the italy-latest.osm.pbf file in 24 hours!

raverbashing•19m ago
Also for some reason, most CI runners seem to cache nothing except for that minor thing that you really don't want cached.
aitchnyu•2h ago
Can we identify requests from CI servers reliably?
IshKebab•1h ago
You can reliably identify requests from GitHub's free CI, which probably covers 99% of requests.

For example GMP blocked GitHub:

https://www.theregister.com/2023/06/28/microsofts_github_gmp...

This "emergency measure" is still in place, but there are mirrors available so it doesn't actually matter too much.

ncruces•1h ago
I try to stick to GitHub for GitHub CI downloads.

E.g. my SQLite project downloads code from the GitHub mirror rather than Fossil.

Gigachad•1h ago
Sure, have a js script involved in generating a temporary download url.

That way someone manually downloading the file is not impacted, but if you try to put the url in a script it won’t work.
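One hedged sketch of how such a temporary URL could be minted and checked server-side (HMAC-signed expiry; the secret and parameter names are made up):

    import hashlib
    import hmac
    import time

    SECRET = b"server-side-signing-key"  # hypothetical key, kept off the client

    def make_temp_url(path: str, ttl: int = 300) -> str:
        # The page's JS asks the server for this; it stops validating after `ttl` seconds.
        expires = int(time.time()) + ttl
        sig = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
        return f"{path}?expires={expires}&sig={sig}"

    def check_temp_url(path: str, expires: int, sig: str) -> bool:
        expected = hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()
        return time.time() < expires and hmac.compare_digest(sig, expected)

Anyone who copies the link into a script gets rejected once the signature expires, while interactive users never notice.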

marklit•1h ago
I suspect web apps that "query" the GPKG files. Parquet can be queried surgically; I'm not sure if there is a way to do the same with GPKG.
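
For what "querying surgically" can look like in practice, here is a minimal sketch using DuckDB's httpfs extension against a remote Parquet file (the URL and column names are placeholders): only the footer plus the row groups and columns the query touches get fetched via HTTP range requests, instead of the whole file.

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")

    # Placeholder dataset; only the needed byte ranges are downloaded.
    rows = con.execute("""
        SELECT name, geometry
        FROM read_parquet('https://example.org/osm/places.parquet')
        WHERE country = 'IT'
        LIMIT 10
    """).fetchall()
    print(rows)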
rossant•2h ago
Can't the server detect and prevent repeated downloads from the same IP, forcing users to act accordingly?
jbstack•1h ago
See: "Also, when we block an IP range for abuse, innocent third parties can be affected."

Although they refer to IP ranges, the same principle applies on a smaller scale to a single IP address: (1) dynamic IP addresses get reallocated, and (2) entire buildings (universities, libraries, hotels, etc.) might share a single IP address.

Aside from accidentally affecting innocent users, you also open up the possibility of a DOS attack: the attacker just has to abuse the service from an IP address that he wants to deny access to.

imiric•1h ago
More sophisticated client identification can be used to avoid that edge case, e.g. TLS fingerprints. They can be spoofed as well, but if the client is going through that much trouble, then they should be treated as hostile. In reality it's more likely that someone is doing this without realizing the impact they're having.
imiric•1h ago
It could be slightly more sophisticated than that. Instead of outright blocking an entire IP range, set quotas for individual clients and throttle downloads exponentially. Add latency, cap the bandwidth, etc. Whoever is downloading 10,000 copies of the same file in 24 hours will notice when their 10th attempt slows down to a crawl.
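A minimal in-memory sketch of that kind of quota-plus-exponential-slowdown (all thresholds and names are invented for illustration):

    import time
    from collections import defaultdict

    WINDOW = 24 * 3600               # count downloads per client per day
    history = defaultdict(list)      # client_id -> download timestamps

    def throttle_delay(client_id: str, free_per_day: int = 3) -> float:
        now = time.time()
        recent = [t for t in history[client_id] if now - t < WINDOW]
        history[client_id] = recent + [now]
        over = len(recent) + 1 - free_per_day
        # 0s within quota, then 2s, 4s, 8s, ... capped at an hour.
        return 0.0 if over <= 0 else min(3600.0, 2.0 ** min(over, 12))

    # Before streaming the file: time.sleep(throttle_delay(client_ip))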
tlb•25m ago
It'll still suck for CI users. What you'll find is that occasionally someone else on the same CI server will have recently downloaded the file several times and when your job runs, your download will go slowly and you'll hit the CI server timeout.
detaro•22m ago
that's working as intended then, you should be caching such things. It sucking for companies that don't bother is exactly the point, no?
aitchnyu•2h ago
Do they email heavy users? We used the free Nominatim API for geocoding addresses in 2012 and our email was a required parameter. They mailed us and asked us to cache results to reduce request rates.
6581•24m ago
There's no login, so they won't have any email addresses.
crimsoneer•2h ago
I continue to be baffled that the Geofabrik folks remain the primary way to get a clean-ish OSM shapefile. Big XKCD "that one bloke holding up the internet" energy.

Also, everyone go contribute/donate to OSM.

omcnoe•1h ago
It's beneficial to the wider community, and also supports their commercial interests (OSM consulting). Win-win.
marklit•1h ago
> primary way to get a clean-ish OSM shapefile

Shapefiles shouldn't be what you're after; Parquet can almost always do a better job unless you need to either edit something or use really advanced geometry not yet supported in Parquet.

Also, this is your best source for bulk OSM data: https://tech.marksblogg.com/overture-dec-2024-update.html

If you're using ArcGIS Pro, use this plugin: https://tech.marksblogg.com/overture-maps-esri-arcgis-pro.ht...

teekert•1h ago
Whenever I read about such issues I always wonder why we all don’t make more use of BitTorrent. Why is it not the underlying protocol for much more stuff? Like container registries? Package repos, etc.
vaylian•1h ago
> Like container registries? Package repos, etc.

I had the same thoughts for some time now. It would be really nice to distribute software and containers this way. A lot of people have the same data locally and we could just share it.

maeln•1h ago
I can imagine a few things:

1. BitTorrent has a bad rep. Most people still associate it with just illegal download.

2. It requires slightly more complex firewall rules, and asking the network admin to put them in place might raise some eyebrows because of reason 1. On a very restrictive network, they might not want to allow them at all, since it opens the door to, well, BitTorrent.

3. A BitTorrent client is more complicated than an HTTP client, and not installed on most company computers / CI pipelines (for lack of need, and again reason 1). A lot of people just want to `curl` and be done with it.

4. A lot of people think they are required to seed, and for some reason that scares the hell out of them.

Overall, I think it is mostly 1 and the fact that you can just `curl` stuff and have everything working. It does sadden me that people do not understand how good a file transfer protocol BT is and how underused it is. I do remember some video game clients using BT for updates under the hood, and PeerTube uses WebTorrent, but BT is sadly not very popular.

simonmales•1h ago
At least the planet download offers BitTorrent. https://planet.openstreetmap.org/
_def•1h ago
> A lot of people think they are required to seed, and for some reason that scare the hell of them.

Some of the reasons consist of lawyers sending out costly cease-and-desist letters even to "legitimate" users.

Fokamul•1h ago
Lol, bad rep? Interesting, in my country everybody is using it to download movies :D Even more so now, after this botched streaming war. (EU)
maeln•58m ago
Which is exactly why it has a bad rep. In most people's minds, BitTorrent = illegal downloads.
_zoltan_•43m ago
downloading movies for personal use is legal in many countries.
ahofmann•26m ago
This is a useless discussion. Imagine how the firewall-guy/network-team in your company will react to that argument.
loa_in_•9m ago
This is not a useless discussion just because it'll inconvenience someone who is at work anyway.
_flux•10m ago
How about the uploading part of it, which is behind the magic of BitTorrent and is its default mode of operation?
lobochrome•6m ago
Really?? Which countries allow copyright infringement by individuals?
zwnow•32m ago
I got billed 1200€ for downloading 2 movies when I was 15. I will never use torrents again.
ioteg•25m ago
You mean some asshole asked your parents for that sum to not go to a trial that they would lose and your parents paid.
zwnow•20m ago
First off, it was like 2 months after my father's death and we didn't have time for this; secondly, my mom got an attorney that I paid. Was roughly the same amount though. We never paid them.
xzjis•31m ago
To play devil's advocate, I think the author of the message was talking about the corporate context where it's not possible to install a torrent client; Microsoft Defender will even remove it as a "potentially unwanted program", precisely because it is mostly used to download illegal content.

Obviously illegal ≠ immoral, and being a free-software/libre advocate opposed to copyright, I am in favor of the free sharing of humanity's knowledge, and therefore supportive of piracy, but that doesn't change the perception in a corporate environment.

loa_in_•10m ago
Wow, that's vile. I have many objections to this, but they all boil down to M$ telling you what you cannot do with your own computer.
nativeit•1h ago
I assume it’s simply the lack of the inbuilt “universal client” that http enjoys, or that devs tend to have with ssh/scp. Not that such a client (even an automated/scripted CLI client) would be so difficult to setup, but then trackers are also necessary, and then the tooling for maintaining it all. Intuitively, none of this sounds impossible, or even necessarily that difficult apart from a few tricky spots.

I think it’s more a matter of how large the demand is for frequent downloads of very large files/sets, which leads to a questions of reliability and seeding volume, all versus the effort involved to develop the tooling and integrate it with various RCS and file syncing services.

Would something like Git LFS help here? I’m at the limit of my understanding for this.

nativeit•1h ago
I certainly take advantage of BitTorrent mirrors for downloading Debian ISOs, as they are generally MUCH faster.
nopurpose•41m ago
All Linux ISOs collectors in the world wholeheartedly agree.
mschuster91•1h ago
Trackers haven't been necessary for well over a decade now thanks to DHT.
zaphodias•1h ago
I remember seeing the concept of "torrents with dynamic content" a few years ago, but apparently it never became a thing[1]. I kind of wish it had, but I don't know if there are critical problems (i.e. security?).

[1]: https://www.bittorrent.org/beps/bep_0046.html

charcircuit•1h ago
AFAIK Bittorrent doesn't allow for updating the files for a torrent.
trenchpilgrim•1h ago
> Like container registries?

https://github.com/uber/kraken exists, using a modified BT protocol, but unless you are distributing quite large images to a very large number of nodes, a centralized registry is probably faster, simpler and cheaper

marklit•1h ago
Amazon, Esri, Grab, Hyundai, Meta, Microsoft, Precisely, Tripadvisor and TomTom, along with tens of other businesses, got together and offer OSM data in Parquet on S3 free of charge. You can query it surgically and run analytics on it needing only MBs of bandwidth against what is a multi-TB dataset at this point. https://tech.marksblogg.com/overture-dec-2024-update.html

If you're using ArcGIS Pro, use this plugin: https://tech.marksblogg.com/overture-maps-esri-arcgis-pro.ht...

willtemperley•15m ago
It's just great that bounding box queries can be translated into HTTP range requests.
dotwaffle•49m ago
From a network point of view, BitTorrent is horrendous. It has no way of knowing the network topology, which frequently means traffic flows from eyeball network to eyeball network, for which there is no "cheap" path available (potentially causing congestion of transit ports, affecting everyone), and there is no reliable way of forecasting where the traffic will come from, making capacity planning a nightmare.

Additionally, as anyone who has tried to share an internet connection with someone heavily torrenting knows, the excessive number of connections means the overall quality of non-torrent traffic on the network goes down.

Not to mention, of course, that BitTorrent has a significant stigma attached to it.

The answer would have been a squid cache box before, but https makes that very difficult as you would have to install mitm certs on all devices.

For container images, yes, you have pull-through registries etc., but not only are these non-trivial to set up (as a service and for each client), the cloud providers charge quite a lot for storage, making it difficult to justify when not having a cache "works just fine".

The Linux distros (and CPAN and texlive etc.) have had mirror networks for years that partially address these problems, and there was an OpenCaching project running that could have helped, but it is not really sustainable for the wide variety of content that would be cached, outside of video media or packages that only appear on caches hours after publishing.

BitTorrent might seem seductive, but it just moves the problem, it doesn't solve it.

rlpb•45m ago
> From a network point of view, BitTorrent is horrendous. It has no way of knowing network topology which frequently means traffic flows from eyeball network to eyeball network for which there is no "cheap" path available...

As a consumer, I pay the same for my data transfer regardless of the location of the endpoint though, and ISPs arrange peering accordingly. If this topology is common then I expect ISPs to adjust their arrangements to cater for it, just the same as any other topology.

dotwaffle•33m ago
> ISPs arrange peering accordingly

Two eyeball networks (consumer/business ISPs) are unlikely to have large PNIs with each other across wide geographical areas to cover sudden bursts of traffic between them. They will, however, have substantial capacity to content networks (not just CDNs, but AWS/Google etc) which is what they will have built out.

BitTorrent turns fairly predictable "North/South" traffic, where capacity can be planned in advance and handed off "hot potato" as quickly as possible, into what is essentially "East/West" traffic with no clear consistency, which would cause massive amounts of congestion and/or unused capacity, as they have to carry it potentially over long distances they are not used to, with no guarantee that this large flow will exist in a few weeks' time.

If BitTorrent knew network topology, it could act smarter -- CDNs accept BGP feeds from carriers and ISPs so that they can steer the traffic, this isn't practical for BitTorrent!

sulandor•16m ago
BitTorrent will make the best use of whatever bandwidth is available. Better to think of it as a dynamic CDN which can seamlessly incorporate static CDN nodes (see webseed).

It could surely be made to care about topology, but IMHO handing that problem to congestion control and routing mechanisms at lower levels works well enough and should not be a problem.

alluro2•1h ago
People like Geofabrik are why we can (sometimes) have nice things, and I'm very thankful for them.

The level of irresponsibility/cluelessness you see from developers if you're hosting any kind of API is astonishing, so these downloads are not surprising at all... If someone, a couple of years back, had told me the things that I've now seen, I'd absolutely have dismissed them as making stuff up and grossly exaggerating...

However, by the same token, it's sometimes really surprising how rarely API developers think in terms of multiples of things - it's very often just endpoints to do actions on single entities, even if the nature of the use case is almost never at that level - so you have no other way than to send 700 requests to do "one action".

alias_neo•22m ago
> Level of irresponsibility/cluelessness you can see from developers if you're hosting any kind of an API is astonishing

This applies to anyone unskilled in a profession. I can assure you, we're not all out here hammering the shit out of any API we find.

With the accessibility of programming to just about anybody, and particularly now with "vibe-coding", it's going to happen.

Slap a 429 (Too Many Requests) in your response or something similar using a leaky-bucket algo and the junior dev/apprentice/vibe coder will soon learn what they're doing wrong.
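
For reference, a bare-bones leaky-bucket sketch of that 429 approach (rate and burst numbers are arbitrary; in practice you'd keep one bucket per client):

    import time

    class LeakyBucket:
        """Allow roughly `rate` requests/second with bursts up to `capacity`."""
        def __init__(self, rate: float = 1.0, capacity: float = 10.0):
            self.rate, self.capacity = rate, capacity
            self.level, self.last = 0.0, time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # The bucket drains at `rate` per second.
            self.level = max(0.0, self.level - (now - self.last) * self.rate)
            self.last = now
            if self.level + 1 <= self.capacity:
                self.level += 1
                return True
            return False  # caller responds with 429 Too Many Requests

    bucket = LeakyBucket(rate=0.5, capacity=5)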

- A senior backend dev

zwnow•18m ago
They mention a single user downloading a 20GB file thousands of times in a single day; why not just rate limit the endpoint?
Meneth•1h ago
Some years ago I thought, no one would be stupid enough to download 100+ megabytes in their build script (which runs on CI whenever you push a commit).

Then I learned about Docker.

trklausss•58m ago
I mean, at this point I wouldn't mind if they rate-limited downloads. A _single_ customer downloading the same file 10,000 times? Sorry, we need to provide for everyone; try again at some other point.

It is free, yes, but there is no need to either abuse it or give away as many resources for free as they can.

k_bx•51m ago
This. Maybe they could actually make some infra money out of this. Make downloads token-based with a free tier; pay if you exceed it.
stevage•52m ago
>Just the other day, one user has managed to download almost 10,000 copies of the italy-latest.osm.pbf file in 24 hours!

Whenever I have done something like that, it's usually because I'm writing a script that goes something like:

1. Download file
2. Unzip file
3. Process file

I'm working on step 3, but I keep running the whole script because I haven't yet built a way to just do step 3.

I've never done anything quite that egregious though. And these days I tend to be better at avoiding this situation, though I still commit smaller versions of this crime.

stanac•45m ago
10,000 times a day is on average 8 times a second. No way someone has 8 fixes per second, this is more like someone wanted to download a new copy every day, or every hour but they messed up milliseconds config or something. Or it's simply malicious user.

edit: bad math, it's 1 download every 8 seconds

gblargg•43m ago
When I do scripts like that I modify it to skip the download step and keep the old file around so I can test the rest without anything time-consuming.
xmprt•42m ago
My solution to this is to only download if the file doesn't exist. An additional bonus is that the script now runs much faster because it doesn't need to do any expensive networking/downloads.
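A tiny guard like the following goes a long way, especially combined with caching the file between CI runs (the Geofabrik URL follows the path pattern from the article; treat it as an assumption and adjust as needed):

    import os
    import urllib.request

    URL = "https://download.geofabrik.de/europe/italy-latest.osm.pbf"  # path pattern assumed
    LOCAL = "italy-latest.osm.pbf"

    def ensure_dataset() -> str:
        # Only hit the server when the file isn't already on disk.
        if not os.path.exists(LOCAL):
            urllib.request.urlretrieve(URL, LOCAL)
        return LOCAL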
cjs_ac•48m ago
I have a funny feeling that the sort of people who do these things don't read these sorts of blog posts.
globular-toast•42m ago
Ah, responsibility... The one thing we hate teaching and hate learning even more. Someone is probably downloading files in some automated pipeline. Nobody taught them that with great power (being able to write programs and run them on the internet) comes great responsibility. It's similar to how people drive while intoxicated or on the phone etc. It's all fun until you realise you have a responsibility.
vgb2k18•32m ago
Seems a perfect justification for using API keys. Unless I'm missing the nuance of this software model.