Sometimes I think if backing store and swap were more clearly delineated we might have got to decent algorithms sooner. Having a huge amount of swap pre-emptively claimed was making it look like starvation, when it was just a runtime planning strategy. It's also confusing how top and vmstat report things.
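(For reference, the numbers I end up trusting are vmstat's si/so columns, which show actual paging activity rather than how much swap merely happens to be claimed; rough usage:)

    # si/so = memory swapped in from / out to disk per interval (KB by default);
    # sustained non-zero values mean the box is actively paging, not just
    # holding stale pages in swap
    vmstat 1 5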
Also, as a mainly-BSD person, I think the differences stand out. I haven't noticed an OOM-killer approach on BSD.
Ancient model: twice as much swap as memory
Old model: same amount of swap as memory
New model: however much swap your experience says this particular job mix needs to manage memory pressure fairly. That's a bit of a tall ask sometimes, so in practice: pick a number up to memory size. (A sketch of setting that up is below.)
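If you do land on a number, a swap file is the low-commitment way to try it, since it's easy to resize later. A rough sketch; the 8G and the path are placeholders, and note that fallocate isn't suitable for swap files on some filesystems (e.g. btrfs):

    sudo fallocate -l 8G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    # persist it across reboots
    echo '/swapfile none swap defaults 0 0' | sudo tee -a /etc/fstab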
This is the most important reason I try to avoid having a large swap. The duration of pathological behavior at near-OOM is proportional to the amount of swap you have. The sooner your program is killed, the sooner your monitoring system can detect it ("Connection refused" is much more clear cut than random latency spikes) and reboot/reprovision the faulty server. We no longer live in a world where we need to keep a particular server online at all cost. When you have an army of servers, a dead server is preferable to a misbehaving server.
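If you want that fail-fast behaviour for one service rather than the whole box, a per-unit cap gets you most of the way; a sketch assuming cgroup v2, with "myapp" and the 4G limit made up:

    # hard memory cap plus zero swap: the service gets OOM-killed promptly
    # instead of thrashing while the rest of the box stays usable
    systemd-run --scope -p MemoryMax=4G -p MemorySwapMax=0 ./myapp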
OP tries to argue that a long period of thrashing will give you an opportunity for more visibility and controlled intervention. This does not match my experience. It takes ages even to log in to a machine that is thrashing hard, let alone run any serious commands on it. The sooner you just let it crash, the sooner you can restore the system to a working state and inspect the logs in a more comfortable environment.
Like something is going very wrong if the system is in that state, so I want everything to die immediately.
So they invested in additional swap space, let the processes slowly grow, swap out leaked stuff and restart them all over the weekend...
on some workloads this may represent a non-trivial drop in performance due to stale, anonymous pages taking space away from more important use
WTF?
So even if you never run into OOM situations, adding a couple gigabytes of swap lets you free up that many gigabytes of RAM for file caching, and suddenly your application is on average 5x faster - but takes 3 seconds longer to service that one obscure API call that needs to dig all those pages back up. YMMV if you prefer consistently poor performance over inconsistent but usually much better performance.
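You can watch that trade happening if you're curious; roughly:

    # how much anonymous memory is parked in swap vs. how much RAM
    # is currently going to the page cache
    grep -E 'SwapTotal|SwapFree|SwapCached|Cached' /proc/meminfo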
Eventually I wrote a small script that does the equivalent of "sudo swapoff -a && sudo swapon -a" to eagerly pull everything back into RAM, but I was surprised by how many people seemed to think there's no legitimate reason to ever want to do so.
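(Roughly what such a script amounts to, with a guard so that swapoff itself can't push the box into OOM; purely a sketch:)

    #!/bin/sh
    # refuse to unswap unless the swapped pages will fit back into RAM
    swap_used=$(awk '/SwapTotal/ {t=$2} /SwapFree/ {f=$2} END {print t-f}' /proc/meminfo)
    mem_avail=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
    if [ "$swap_used" -lt "$mem_avail" ]; then
        sudo swapoff -a && sudo swapon -a
    else
        echo "not enough free RAM to absorb swap, skipping" >&2
        exit 1
    fi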
Worst thing: I left 5% of my SSD unused, which will actually be used for garbage collection and other stuff. That's OK.
What I don't understand is why modern Linux is so shy about touching swap. With old kernels, Linux happily pushed unused pages to swap, so even if you weren't short on memory, your swap would hold tens or hundreds of MB, and that's a great thing. Modern kernels just keep swap usage at 0 until memory is exhausted.
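One knob worth checking is vm.swappiness (higher means the kernel is more willing to swap out idle anonymous pages instead of dropping page cache); a sketch:

    # default on most distros is 60
    sysctl vm.swappiness
    # be more willing to push idle anonymous pages to swap
    sudo sysctl vm.swappiness=100
    # persist across reboots
    echo 'vm.swappiness = 100' | sudo tee /etc/sysctl.d/99-swappiness.conf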
That's a couple of terabytes of swap on servers these days, and even on laptops I wouldn't want to deal with 300-ish GB of swap.
Configure it to fire at like 5% and forget it.
I've never seen the OOM killer do its dang job, with or without swap.
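For a userspace killer along the lines of earlyoom (just an example, it isn't named here), the 5% setup looks roughly like:

    # kill the biggest memory hog once available memory drops below 5%
    earlyoom -m 5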