Not that, you know, I often take the time to do that, either - but it would improve the site and the discussions if we all did.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".
I am disappointed that they edited another guideline for the worse:
> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.
It used to just say, don't complain about voting.
If the number of votes is so taboo, why do they even show us the numbers, or user karma (and have a top list)?
Oh, absolutely not. I've seen so many autistic people literally just no-lifing wplace and collaborating on huge pieces of art there. It is absolutely not just script kiddies.
> 3 billion requests / 2 million users is an average of 1,500 req/user. A normal user might make 10-20 requests when loading a map, so these are extremely high, scripted use cases.
I don't know about that either. Users don't just load a map once; they pan and zoom all over the place looking for the art other people have made. I don't know how many requests "exploring a map for hours on end" typically generates, but I imagine a lot of people are doing exactly that.
I wouldn't completely discount automation, but these usage patterns seem far from impossible. Especially since wplace didn't expect the sudden popularity and may not have optimized their traffic patterns as much as they could have.
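Rough napkin math (every per-session number below is my own guess, not a measurement) suggests an active explorer hits that average pretty quickly:

    # Back-of-the-envelope: tile requests from one "exploring" session.
    # All numbers are assumptions for illustration; browser/CDN caching
    # would reduce the real count somewhat.
    tiles_per_viewport = 15   # assumed tiles fetched per pan/zoom step
    moves_per_minute = 10     # assumed pans/zooms while actively exploring
    minutes_exploring = 30    # assumed session length

    requests = tiles_per_viewport * moves_per_minute * minutes_exploring
    print(f"~{requests} tile requests in one session")  # ~4500, well above the 1,500/user average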
It's impossible to predict whether one's project will go viral.
> As a single user, you broke the service for everyone.
Or you did, by not having a high enough fd limit. Blaming a site for using the service too much when you advertise that there are no limits is not cool. It's not like wplace themselves were maliciously hammering the API.
Show us what you have done.
That's how agreements work. If someone says they will sell a hamburger for $5, and another person pays $5 for a hamburger, then they are entitled to a hamburger.
> On a free service.
It's up to the owner to price the service. Being overwhelmed by traffic when there are no limits is not a problem limited only to free services.
> At the moment, I don’t offer SLA guarantees or personalized support.
From the website.
> Financially, the plan is to keep renting servers until they cover the bandwidth. I believe it can be self-sustainable if enough people subscribe to the support plans.
Especially since he said Cloudflare is providing the CDN for free... Yes, running the origins costs money, but default fd limits are usually low and you can push them a lot higher. At some point you'll run into I/O limits, but the I/O at the origin seems pretty manageable if my napkin math was right.
If the files are all tiny and the fd limit is the actual bottleneck, there are ways to make that work better too. IMHO, it doesn't make sense to accept an inbound connection if you can't get an fd to read a file for it, so it's better to limit concurrent connections, let connections sit in the listen queue, and use a short keepalive timeout to make sure you're not wasting fds on idle connections. With no other knowledge, I'd put the connection limit at half the fd limit, assuming the origin server is dedicated to this and serves static files exclusively. But, to be honest, if I set up something like this, I probably wouldn't have thought about fd limits until they got hit, so no big deal... hopefully whatever I used for monitoring would include available fds by default and I'd have noticed, but that's not a default output everywhere.
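As a sketch of the kind of monitoring I mean (my own illustration, assuming a Linux box with /proc and permission to inspect the server process; the PID is a placeholder):

    # Report fd usage vs. limit for a process, plus the system-wide handle count.
    # Needs permission to read /proc/<pid>/fd (same user or root).
    import os

    def fd_usage(pid):
        in_use = len(os.listdir(f"/proc/{pid}/fd"))        # fds open right now
        soft_limit = None
        with open(f"/proc/{pid}/limits") as f:
            for line in f:
                if line.startswith("Max open files"):
                    soft_limit = int(line.split()[3])      # per-process soft limit
        allocated, _, system_max = map(int, open("/proc/sys/fs/file-nr").read().split())
        return in_use, soft_limit, allocated, system_max

    in_use, limit, sys_alloc, sys_max = fd_usage(12345)    # placeholder nginx worker PID
    print(f"process: {in_use}/{limit} fds, system: {sys_alloc}/{sys_max}")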
Or, if it’s just a few bad actors, block based on JA4/JA3 fingerprint?
I think referer-based limits are better; that way I can ask heavy users to please choose self-hosting instead of the public instance.
You want to track usage by the site, not the person, because you can ask a site to change its usage patterns in a way you can't really ask a site's users. Maybe a per-IP limit makes sense too, but you wouldn't want it low enough to be effective against something like this.
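A toy sketch of what "track usage by the site" could look like, fed from access logs or a middleware hook (illustrative only, not how OpenFreeMap actually does it):

    # Count requests per Referer host so you can see which sites drive the load
    # and contact them, instead of rate-limiting their individual visitors.
    from collections import Counter
    from urllib.parse import urlparse

    requests_by_site = Counter()

    def record(referer_header):
        host = urlparse(referer_header).netloc if referer_header else "(no referer)"
        requests_by_site[host] += 1

    # hypothetical examples of what would come out of the access log
    record("https://wplace.example/map")
    record("https://wplace.example/map")
    record(None)

    for host, count in requests_by_site.most_common(5):
        print(host, count)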
https://github.com/hyperknot/openfreemap/blob/main/docs/asse...
Presumably a caching server would have a 10GbE, 40GbE, or 100GbE NIC.
56 Gbit/s of pre-generated data is definitely something you can handle from 1 or 2 decent servers, assuming each request doesn't generate a huge number of random disk reads or something.
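Quick arithmetic on that, taking the 56 Gbit/s figure at face value:

    # How the quoted peak splits across one or two origin/cache boxes
    # compared with common NIC speeds (illustrative only).
    peak_gbit = 56
    for servers in (1, 2):
        per_server = peak_gbit / servers
        nic = "100GbE" if per_server > 40 else "40GbE"
        print(f"{servers} server(s): {per_server:.0f} Gbit/s each -> at least {nic}")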
> Using our public instance is completely free: there are no limits on the number of map views or requests. There’s no registration, no user database, no API keys, and no cookies. We aim to cover the running costs of our public instance through donations.
> Is commercial usage allowed?
> Yes.
IMHO, reading this and then just using it makes a lot of sense. Yeah, you could put a cache in front of their CDN, but why, when they said it's all good, no limits, for free?
I might wonder a bit, if I knew the bandwidth it was using, but I might be busy with other stuff if my site went unexpectedly viral.
But interesting write-up. If I were a consumer of OpenFreeMap, I would be concerned that such an availability drop was only detected by user reports.
Assuming it was close to 100% the rest of the year, that works out to 99.97% over 12 months.
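For reference, the arithmetic behind that figure:

    # 99.97% over a year corresponds to roughly 2.6 hours of total downtime.
    hours_per_year = 365 * 24            # 8760
    availability = 0.9997
    downtime_hours = (1 - availability) * hours_per_year
    print(f"{downtime_hours:.1f} hours of downtime per year")   # ~2.6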
https://community.nginx.org/t/too-many-open-files-at-1000-re...
Also, the servers were doing 200 Mbps, so I couldn't have kept up _much_ longer, no matter the limits.
NVMe disks are incredibly fast and 1k rps is not a lot (IIRC my N100 seems capable of ~40k if not for the 1 Gbit NIC bottlenecking). I'd try benchmarking without the tuning options you've got. Like, do you actually get 40k concurrent connections from Cloudflare? If you keep connections to your upstream alive (so no constant slow starts), ideally you have numCores workers that each do one thing at a time, and that's enough to max out your NIC. You only add concurrency if latency prevents you from maxing out bandwidth.
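The concurrency point in numbers (Little's law, with made-up latencies; real figures depend on the disk and tile sizes):

    # Little's law: in-flight requests ≈ request rate × per-request latency.
    def concurrency(rps, latency_seconds):
        return rps * latency_seconds

    for latency_ms in (1, 5, 50):
        print(f"1000 rps at {latency_ms} ms -> ~{concurrency(1000, latency_ms / 1000):.0f} in flight")
    # Single-digit-millisecond responses from NVMe keep this tiny; 40k concurrent
    # connections at 1000 rps would imply ~40 s per request.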
> Also, the servers were doing 200 Mbps, so I couldn't have kept up _much_ longer, no matter the limits.
For cost reasons or system overload?
If it's system overload: what kind of storage? Are you monitoring disk I/O? What kind of CPU do you have in your system? I used to push almost 10GBps of https on dual E5-2690s [1], but that was a larger file. 2690s were high end, but something more modern will have much better AES acceleration and should do better than 200 Mbps almost regardless of what it is.
[1] https://www.intel.com/content/www/us/en/products/sku/64596/i...
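For scale, a quick sanity check on the crypto side (the per-core AES-GCM throughput below is an assumed ballpark, not a benchmark of any specific CPU):

    # What fraction of one core does 200 Mbps of TLS payload encryption need?
    link_mbps = 200
    payload_bytes_per_s = link_mbps * 1e6 / 8        # ~25 MB/s to encrypt
    assumed_aes_bytes_per_core = 2e9                 # assumption: ~2 GB/s per core with AES-NI
    core_fraction = payload_bytes_per_s / assumed_aes_bytes_per_core
    print(f"{payload_bytes_per_s/1e6:.0f} MB/s ≈ {core_fraction:.1%} of one core")  # ~1%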
I'm curious what the peak req/s is like. I think it might be just barely within the range supported by benchmark-friendly web servers.
Unless there's some kind of order-of-magnitude slowdown due to the nature of the application.