Hosting staging envs in pricey cloud environments seems crazy to me, but I understand why you'd want to: modern cloud setups can have a lot of moving parts.
I'd still like a staging + prod, but keeping the dev environments on a separate beefy server seems smart.
It offloads things like:

- Power usage
- Colo costs
- Networking (a big one)
- Storage (SSD wear / HDD pools)
- etc.
It is a long list, but what it doesn't allow you to do is make trade-offs, like spending way less but accepting downtime if your switch dies, etc.
For a staging env these are things you might want to do.
It's fun the first time, but becomes an annoying faff when it has to be repeated constantly.
On Heroku, Vercel and similar, you git push and you're running. On a Linux server you set up the OS, server authentication, the application itself, the systemd units, the reverse proxy, code deployment, SSL key management, monitoring, etc. etc.
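Just the "run my app as a service" step alone looks something like this (a minimal sketch; the unit name and paths are placeholders):

    # /etc/systemd/system/myapp.service -- name and paths are hypothetical
    sudo tee /etc/systemd/system/myapp.service <<'EOF'
    [Unit]
    Description=My app
    After=network.target

    [Service]
    User=myapp
    WorkingDirectory=/srv/myapp
    ExecStart=/srv/myapp/bin/server
    Restart=always

    [Install]
    WantedBy=multi-user.target
    EOF
    sudo systemctl daemon-reload
    sudo systemctl enable --now myapp

And that's one item out of the eight on that list.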
I still do prefer a linux server due to the flexibility, but the UX could be a lot better.
This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.
And the overlap between what Nix does and what the 'cloud' does for you is only partial. (Eg it can still make sense to use Nix in the cloud.)
Certainly true, but there are a whole lot of tools to automate those operations so that you aren't doing them constantly.
Cloud is easy until it's not. For 90% of us, maybe we don't need multi-region with hot and cold storage.

For those that need it, it's necessary.
Is it mostly developer insecurity, or mostly tech leadership insecurity?
Configuring a web server is a low-difficulty task that should be within reach of any good software developer given 3 days to study it. It's absurd for a developer who needs a web server to insist on paying a large rent and ceding control to some 3rd party instead of just configuring it themselves.
It’s a lot cheaper than me learning to bake as well as he does—not to mention dedicating the time every day to get my daily bread—and I’ll never need bread on the kind of scale that would make it worth my time to do so.
But the cloud is different. None of the financial scale benefits are passed on to you. You save serious money running it in-house. The arguments around scale have no validity for the vast, vast majority of use cases.
Vercel isn't selling bread: they're selling a fancy steak dinner, and yes, you can make steak at home for much less, and if you eat fancy steak dinners at fancy restaurants every night you're going to go broke.
So the key is to understand whether your vendors are selling you bread, or a fancy steak dinner, and to not make the mistake of getting the two confused.
I wonder, though—at the risk of overextending the metaphor—what if I don’t have a kitchen, but I need the lunch meeting to be fed? Wouldn’t (relatively expensive) catering routinely make sense? And isn’t the difference between having steak catered and having sandwiches catered relatively small compared to the alternative of building out a kitchen?
What if my business is not meaningfully technical: I’ll set up applications to support our primary function, and they might even be essential to the meat of our work. But essential in the same way water and power are: we only notice it when it’s screwed up. Day-to-day, our operational competency is in dispatching vehicles or making sandwiches or something. If we hired somebody with the expertise to maintain things, they’d sit idle—or need a retainer commensurate with what the Vercels and Herokus of the world are charging. We only need to think about the IT stuff when it breaks—and maybe to the extent that, when we expect a spike, we can click one button to have twice as much “application.”
In that case, isn’t it conceivable that it could be worth the premium to buy our way out of managing some portion of the lower levels of the stack?
Water is cheap, yes. Salt isn't all that cheap, but you only need a little bit.
> [...] and I’ll never need bread on the kind of scale that would make it worth my time to do so.
If you knead bread by hand, it's a very small scale affair. Your physique and time couldn't afford you large scale bread making. You'd need a big special mixer and a big special oven etc. for that. And you'd probably want a temperature- and moisture-controlled room just for letting your dough rise.
https://postmates.com/store/restaurant-depot-4538-s-sheridan...
I blush to admit that I do from time to time pay $21 for a single sourdough loaf. It’s exquisite, it’s vastly superior to anything I could make myself (or anything I’ve found others doing). So I’m happy to pay the extreme premium to keep the guy in business and maintain my reliable access to it.
It weighs a couple of pounds, though I’m not clear how the water weight factors in to the final weight of a loaf. And I’m sure that flour is fancier than this one. I take your point—I don’t belong in the bread industry :)
(Similarly to how you pay Amazon or Google etc not just for the raw cloud resources, but for the system they provide.)
I grew up in Germany, but now live in Singapore. What's sold as 'good' sourdough bread here would make you fail your baker's training in Germany: huge holes in the dough and other defects. How am I supposed to spread butter over this? And Mischbrot, a mixture of rye and wheat, is almost impossible to find.
So we make our own. The goal is mostly to replicate the everyday bread you can buy in Germany for cheap, not to hit any artisanal highs. (Though they are massively better IMHO than anything sold as artisanal here.)
Interestingly, the German breads we are talking about are mostly factory made. Factory bread can be good, if that's what customers demand.
See https://en.wikipedia.org/wiki/Mischbrot
Going on a slight tangent: with tropical heat and humidity, non-sourdough bread goes stale and moldy almost immediately. Sourdough bread can last for several days or even a week without going moldy in a paper bag on the kitchen counter outside the fridge, depending on how sour you go. If you are willing to toast your bread, going stale during that time isn't much of an issue either.
(Going dry is not much of an issue with any bread here--- sourdough or not, because it's so humid.)
Also skills - some people just bake better than others.
Wait, what? Salt is literally one of the cheapest materials per kilogram that exists, in any context, including non-food contexts. The cost is almost purely transportation from the point of production. High-quality salt is well under a dollar a pound. I am currently using salt that I bought at 0.29 euro for 500g. You can get similar in the US (slightly more expensive).
This was a meme among chemical engineers. Some people complain in reviews on Amazon that the salt they buy is cut with other chemicals that make it less salty. The reality is that there is literally nothing you could cut it with that is cheaper than salt.
I think this is partly responsible for the increased popularity of SQLite as a backend. It's super simple, and Litestream for recovery isn't that complicated.
Most apps don't need 5 9s, but they do care about losing data. Eliminate the possibility of losing data, without paying tons of $ to also eliminate potential outages, and you'll get a lot of customers.
You get X resources in the cloud and know that a certain request/load profile will run against it. You have to configure things to handle that load, and are scored against other people.
Things like Lambda do fit in this model, but they are too inefficient to model every workload.
Amazon lacks vision.
* The big caveat: If you don't incur the exact same devops costs that would have happened with a linux instance.
Many tools (containers in particular) have cropped up that have made things like quick, redundant deployment pretty straightforward and cheap.
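For instance, with Docker's built-in swarm mode, a redundant, self-healing deployment is a couple of commands (a sketch; the image and service name are placeholders):

    docker swarm init
    docker service create --name web --replicas 2 -p 80:80 nginx:alpine

Kill one replica and swarm restarts it - the kind of thing that used to take real infrastructure work.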
Cloud isn't worth it until suddenly it is because you can't deploy your own servers fast enough, and then it's worth it until it exceeds the price of a solid infrastructure team and hardware. There's a curve to how much you're saving by throwing everything in the cloud.
As cloud marches on it continues to seem like a grift.
The cloud cost includes everything.
Today the smallest, and even large, AWS machines are a joke (comparable to a mobile phone from 15 years ago, or a terrible laptop today), and about three to six months of rent costs as much as buying the hardware outright.
If you're on the cloud without getting a 75% discount, you will save money and headcount by doing everything on-prem.
As an example: my Macbook Pro from 2015 had 16 GiB RAM, and that's what my MacBook Air from 2025 also has.
Oh, and the new machine has unified RAM. The old machine had a bit of extra RAM in the GPU that I'm not counting here.
As far as I can tell, the new RAM is a lot faster. That counts for something. And presumably also uses less power.
Dokku can be an option if you need to maintain Heroku-style endpoints.
Quick question: how long would it take to provision and set up another server if this one dies?
Which means that if they want to test what it will look like running in the cloud for prod, they're going to either need a pre-prod environment or go YOLO.
But to provision a new server, as these are "stateless" (per 12 Factor) servers, it's just 1) get a VPS 2) install Docker+Disco using our curl|sh install script 3) authorize github 4) deploy a "project" (what we call an app), setting the env vars.
All in all ~10 minutes for a new machine.
[0] https://github.com/gregsadetsky/example-flask-site/blob/main...
We used to be on Heroku and the cost wasn't just the high monthly bill - it was asking "is this little utility app I just wrote really worth paying $15/month to host?" before working on it.
This year we moved to a self-hosted setup on Coolify and have about 300 services running on a single server for $300/month on Hetzner. For the most part, it's been great and let us ship a lot more code!
My biggest realization is that for an organization like us, we really only need 99% uptime on most of our services (not 99.99%). Most developer tools are around helping you reach 99.99% uptime. When you realize you only need 99%, the world opens up.
Disco looks really cool and I'm excited to check it out!
We know of two similar cases: a bootcamp/dev school in Puerto Rico that lets its students deploy all of their final projects to a single VPS, and a Raspberry Pi that we've set up at the Recurse Center [0] which is used to host (double checking now) ~75 web projects. On a single Pi!
(Just remember to take regular backups now, so that when this 5 year deal expires you don’t get into the same situation again :-)
> Even with all 6 environments and other projects running, the server's resource usage remained low. The average CPU load stayed under 10%, and memory usage sat at just ~14 GB of the available 32 GB.
This seems like a good way to have plentiful dev environments and avoid a bad pricing model. If your production instance is still on Heroku, you might still want a staging environment on Heroku, since a Hetzner server and your production instance might have subtle differences.
If you can fit them all on a 4 cpu / 32gb machine, you can easily forgo them and run the stack locally on a dev machine. IME staging environments are generally snowflakes that are hard to stand up (no automation).
$500/month each is a gross overpayment.
Not if you're running with external resources of a specific type, or want to share the ongoing work with others. Or need to set up 6 different projects with 3 different databases at the same time. It really depends on your setup and way of working. Sometimes you can do local staging easily, sometimes it's going to be a lot of pain.
Especially when I go look at the site in question (idealist.org) and it seems to be a pretty boring job board product.
As for the staging servers: each deployment was a mix of Performance-M dynos, multiple Standard dynos, RabbitMQ, a large enough database, etc. - it adds up quickly.
Finally, Idealist serves ~100k users per day - behind the product is a lot of boring tech that makes it reliable & fast. :-)
That's more than 1/3 of the cost of a developer there.
That will save you maybe a week of a person's work to set things up, and half a day every couple of months to keep it running. And that's rounding way up.
Not free, it became a productivity boost.
You now have a $35k annual budget for the maintenance, other overhead, and lost productivity. What do you spend it on?
> The team also took on responsibility for server monitoring, security updates, and handling any infrastructure issues themselves
For a place that’s paying devs $150k a year that might math out. It absolutely does not for places paying devs $250k+ a year.
One of the great frustrations of my mid career is how often people tried to bargain for more speed by throwing developers at my already late project when what would have actually helped almost immediately was more hardware and tooling. But that didn’t build my boss’ or his bosses’ empires. Don’t give me a $150k employee to train, give me $30k in servers.
Absolutely no surprise at all that devs were complicit in cloud migrations, because now you could ask forgiveness instead of permission for more hardware.
Just something to consider if you are in a professional environment before switching your entire infra: maintenance is expensive. I strongly suggest throwing man-days into your cost calculation.

To prevent security vulnerabilities, the team will need to write playbooks to regularly auto-update your machines, hoping for no breaking changes - or instead build a pipeline for immutable OS image updates. And it often means testing on an additional canary VM first.
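For the auto-update approach, on Debian/Ubuntu hosts the minimum viable version is something like this (a sketch, assuming apt-based machines):

    sudo apt-get install -y unattended-upgrades
    # run package-list updates and security upgrades daily
    printf '%s\n' \
      'APT::Periodic::Update-Package-Lists "1";' \
      'APT::Periodic::Unattended-Upgrade "1";' \
      | sudo tee /etc/apt/apt.conf.d/20auto-upgrades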
Scaling the VM up compute-wise isn't that straightforward either, and depending on the provider will require either downtime or migrating the entire deployment to a new instance.

Scaling disk size, you will need to play with filesystems.

And depending on your setup, you might have to manage Let's Encrypt, authentication and authorization, secrets vaults, etc. (here at least Disco manages the SSL certs for you).
If you are small enough, you are not going to be truly affected by downtime. If you are just a little bigger, a single hot spare is going to be sufficient.
The place where you get dinged is heavy growth in personnel and bandwidth. You end up needing to solve CPU bound activities quicker because it hurts the whole system. You need to start thinking about sticky round robin load balancing and other fun pieces.
This is where the cloud can allow you to trade money for velocity. Eventually, though, you will need to pay up.
That said, the average SaaS can go a long way with a single server per product.
Only if those man-days actually incur a marginal cost. If it's just employees you already have spending their time on things, then it's not worth factoring in because it's a cost you pay regardless.
For example, the "Bridging the Gap: Why Not Just Docker Compose?" section is a 1:1 copy of the points in the "Powerful simplicity" on the landing page - https://disco.cloud/
And this blog post is the (only) case study that they showcase on their main page.
- ...
I'm kidding :-)
Our library is open source, and we're very happy and proud that Idealist is using us to save a bit of cash. Is it marketing if you're proud of your work? :-) Cheers
Marketing should be marketing and clearly so. Tech blogs are about sharing information with the community (Netflix Tech blog is a good example) NOT selling something. Marketing masquerading as a tech blog is offputting to a lot of people. People don't like being fooled with embedded advertising and putting ad copy into such pieces is at best annoying.
Netflix giving away free water bottles is one thing (I hate them, but I use their fast.com super often to test speeds); pretending to be a blog post while actually being an ad (if that was the case here) is another. You just feel lied to. You can't take anything you read there seriously, as it will probably be super biased, and you can't get your time back.
I'm complaining about thinly veiled ad copy wearing the mask of shared technical notes. This is seen as a bad faith effort by the publisher of such notes and a dirty trick played on the reader. Advertising should announce itself for what it is.
I'm very clearly making a distinction, I like A, I don't like B.
You're taking that, saying I must actually hate both A and B, and by the way C through Z because nobody is 111% pure of heart and everybody must have at least some motivation for doing something and nobody is entirely altruistic.... which is just this crazy extreme that it's clear I don't believe at all.
I like the incentive structure that leads Netflix to produce objectively high quality articles sharing with the community in a way that really seems to be entirely untainted by the motivation.
Ad copy in tech notes does seem to taint the motivation and quality of them, it can be innocent but it doesn't seem like it and is generally irritating to a lot of people.
Dislike of a certain kind of advertising doesn't mean I'm sitting around miserable because nobody is truly altruistic, as you suggest - and that's the issue. My lines of thinking aren't taken to a silly extreme. A lot of disagreements these days are people reinterpreting their opposition as exclusively extremist, and that's a problem.
You say you like A and don't like B. You don't like B because it has X in it. But A also has X in it. So why do you like A but not B? It's not logically consistent. We disagree on how much X is in A. You want X to be clearly marked with red tape. It's not clear how reasonable and feasible that is or isn't.

I'm saying if you're looking for X, you're going to find trace amounts of it everywhere once you start looking for it. X isn't some previously unheard of chemical that's gonna give you cancer or leaky gut though, it's other people making money. It's been chosen for us, that money is how the world works. It's not how I would do it, but I'm not in charge of the world, so it's a moot point. Everyone is weird about money in their own special way. I am no exception.

What sticks in my craw is when people have problems with other people making money. How they make money is material. I'm not okay with making money off of sex trafficking or CSAM, for example, but advertising a product with an interesting bit of writing beforehand isn't that. So on the spectrum of your kid's painting, made for you in school with ethically sourced crayons on recycled paper, to the in-your-face red plastic Coca-Cola banner wrapped around the side of a bus that's gonna be fed to whales to choke and die on, where this particular blog post lies is for you to determine for yourself.

Where I'm really going with this: requiring X to stay below a certain level has the unintended consequence that only big corporations with giant bags of money can create content that passes this purity test of yours, which is, if we do some extrapolating, self-defeating.
Mine isn't, unless you stretch the meaning of that term so broadly that it essentially loses any meaning. (Intentionally meta.)
We're actually mostly talking to people (that "schedule a meeting") to see how we can help them migrate their stuff away (from Heroku, Vercel, etc.)
But we're not sure of the pricing model yet - probably Enterprise features like GitLab does, while remaining open source. It's a tough(er) balance than running a hosted service where you can "just" (over)charge people.
Why is that an issue? Is it forbidden by HN guidelines? Or would you like all marketing to be marked as such? Which articles aren't marketing, one way or another?
You can also enable zram to compress RAM, so you can over-provision like the pros. A lot of long-running software leaks memory that compresses pretty well.
Here is how I do it on my Hetzner bare-metal servers using Ansible: https://gist.github.com/fungiboletus/794a265cc186e79cd5eb2fe... It also works on VMs.
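For reference, the manual version of the same setup is only a few lines (a sketch; the size, priority, and compression algorithm are illustrative, and the sysfs writes need root):

    modprobe zram
    echo zstd > /sys/block/zram0/comp_algorithm   # set algorithm before disksize
    echo 8G > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0                      # higher priority than disk-backed swap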
For an algorithm using the whole memory, that’s a terrible idea.
I understand all of those words, but none of the meaning. Why would I reserve RAM in order to put fast swap on it?
This has a number of benefits: in practice, more "active" space is freed up, since unused pages are often highly compressible. Often that can be freed application memory that's reserved within application space but sitting in the allocator's free lists, especially if the allocator zeroes those pages in the background - but even active application memory benefits (e.g. with a browser, a lot of memory is probably duplicated many times across processes). So for a usually invisible cost you free up more system RAM. Additionally, the overhead of the swap is typically not much more than a memcpy even compressed, which means you get dedup, and if you compressed erroneously (data still needed), paging it back in is relatively cheap.

It also plays really well with disk swap, since the least frequently used pages of the compressed swap can be flushed to disk, leaving more space in the compressed RAM region for additional pages. And since you're flushing/retrieving compressed pages to/from disk, you're reducing writes on an SSD (longevity) and reducing read/write volume (less overhead than naive direct swap to disk).
Basically if you think of it as tiered memory, you’ve got registers, l1 cache, l2 cache, l3 cache, normal RAM, compressed swap RAM, disk swap - it’s an extra interim tier that makes the system more efficient.
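If you want to see how much that interim tier is actually buying you, util-linux ships a zramctl tool (sample output below is illustrative, not from a real box):

    zramctl
    # NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
    # /dev/zram0 zstd            8G  1.2G  300M  350M       8 [SWAP]

DATA vs COMPR is your effective compression ratio.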
To clarify OP's representation of the tool: it compresses swap space, not resident RAM. Outside of niche use-cases, compressing swap has little overall utility.
It has the benefit of absorbing memory leaks (which for whatever reason compress really well) and compressing stale memory pages.
Under actual memory pressure performance will degrade. But in many circumstances where your powerful CPU is not fully utilized you can 2x or even 3x your effective RAM (you can opt for zstd compression). zram also enables you to make the trade-off of picking a more powerful CPU for the express purpose of multiplying your RAM if the workload is compatible with the idea.
PS: On laptops/workstations, zram will not interfere with an SSD swap partition if you need it for hibernation. Though it will almost never be used for anything else if you configure your zram to be 2x your system memory.
And of course the overhead is zero when you don't page-out to swap.
Maybe back in the 90s, it was okay to wait 2-3 seconds for a button click, but today we just assume the thing is dead and reboot.
Nowadays when a program hits swap it's not going to fallback to a different memory usage profile that prioritises disk access. It's going to use swap as if it were actual ram, so you get to see the program choking the entire system.
If your GC is a moving collector, then absolutely this is something to watch out for.
There are, however, a number of runtimes that will leave memory in place. They are effectively just calling `malloc` for the objects and `free` when the GC algorithm detects an object is dead.
Go, the CLR, Ruby, Python, Swift, and I think node(?) all fit in this category. The JVM has a moving collector.
Tracing garbage collectors solve a single problem really really well - managing a complex, possibly cyclical reference graph, which is in fact inherent to some problems where GC is thus irreplaceable - and are just about terrible wrt. any other system-level or performance-related factor of evaluation.
There's a lot of "it depends" here.
For example, an RC garbage collector (Like swift and python?) doesn't ever trace through the graph.
The reason I brought up moving collectors is by their nature, they take up a lot more heap space, at least 2x what they need. The advantage of the non-moving collectors is they are much more prompt at returning memory to the OS. The JVM in particular has issues here because it has pretty chunky objects.
If the implementer cares about memory use it won't. There are ways to compact objects that are a lot less memory-intensive than copying the whole graph from A to B and then deleting A.
Even not so modern ones: have you heard of generational garbage collection?
But even in eg Python they introduced 'immortal objects' which the GC knows not to bother with.
It's that "touching" of all the pages controlled by the GC that ultimately wrecks swap performance. But also the fact that moving collectors like to hold onto memory, as downsizing is pretty hard to do efficiently.
Non-moving collectors are generally ultimately using C allocators which are fairly good at avoiding fragmentation. Not perfect and not as fast as a moving collector, but also fast enough for most use cases.
Java's G1 collector would be the worst example of this. It's constantly moving blocks of memory all over the place.
If swapping to SSD is 'extremely slow', what's your term for swapping to HDD?
I regularly use it on my Snapdragon 870 tablet (not exactly a top of the line CPU) to prevent OOM crashes (it's running an ancient kernel and the Android OOM killer basically crashes the whole thing) when running a load of tabs in Brave and a Linux environment (through Tmux) at the same time.
ZRAM won't save you if you do actually need to store and actively use more than the physical memory but if 60% of your physical memory is not actively used (think background tabs or servers that are running but not taking requests) it absolutely does wonders!
On most (web) app servers I happily leave it enabled to handle temporary spikes, memory leaks or applications that load a whole bunch of resources that they never ever use.
I'm also running it on my Kubernetes cluster. It allows me to set reasonable strict memory limits while still having the certainty that Pods can handle (short) spikes above my limit.
In the age of microservices and cattle servers, reboot/reinstall might be cheap, but in the long run it is not. A long-running server, albeit cattle, is always a better solution, because - especially with some excess RAM - the server "warms up" with all hot data cached and becomes a low-latency unit in your fleet, given you pay the required attention to your software development and service configuration.
Secondly, Kernel swaps out unused pages to SWAP, relieving pressure from RAM. So, SWAP is often used even if you fill 1% of your RAM. This allows for more hot data to be cached, allowing better resource utilization and performance in the long run.
So, eff it, we ball is never a good system administration strategy. Even if everything is ephemeral and can be rebooted in three seconds.
Sure, some things like Kubernetes force "no SWAP, period" policies, because it kills pods when pressure exceeds some value, but for more traditional setups it's still valuable.
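(How eagerly the kernel swaps out cold pages versus dropping cache is tunable, by the way - a minimal sketch, value illustrative:)

    sysctl vm.swappiness              # default is 60
    sudo sysctl -w vm.swappiness=100  # swap cold anonymous pages more eagerly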
It doesn't. SSDs came a long way, but so did memory dies and buses, and with that the way programs work also changed: more and more, they are able to fit their stacks and heaps in memory.
I have had a problem with shellcheck that for some reason eats up all my RAM when I open, I believe, .zshrc - and trust me, it's not invisible. The system crawls to a halt.
If we're talking about SATA SSDs, which top out at ~600 MB/s, then yes, an aggressive application can make itself known. However, if you have a modern NVMe, esp. a 4x4 (PCIe 4.0 x4) one like the Samsung 9x0 series, or if you're using a Mac, I bet you'll notice the problem much later, if ever. Remember the SSD thrashing problem on M1 Macs? People never noticed that the system used swap that heavily and trashed the SSD on board.
Then, if you're using a server with a couple of SAS or NVMe SSDs, you'll not notice the problem again, esp. if these are backed by RAID (even md counts).
( ) a 1% chance the system would crawl to a halt but would work
( ) a 1% chance the kernel would die and nothing would work
If your problem doesn't keep growing, and you just have more data that programs want to keep in memory than you have RAM, but the actual working set of what's accessed frequently still fits in RAM, then swap perfectly solves this.
Think lots of programs open in the background, or lots of open tabs in your browser, but you only ever rapidly switch between at most a handful at a time. Or you are starting a memory hungry game and you don't want to be bothered with closing all the existing memory hungry programs that idle in the background while you play.
> Doesn't swap just delay the fundamental issue?
The fundamental issue here is that Linux fanboys literally think killing a working process - most of the time the most important process[0] - is a good solution for not solving the fundamental problem of memory allocation in the Linux kernel.

Availability of swap allows you to avoid malloc failure in the rare case your processes request more memory than is physically (or 'physically', heh) present in the system. But in the minds of so-called Linux administrators, if even one byte of swap were used, the system would immediately crawl to a stop and never recover. Why it always has to be the worst and most idiotic scenario - instead of the sane "needed 100MB more, got it (while some junk in memory that hadn't been accessed since boot was swapped out), did what it needed to do and freed that 100MB" - is never explained by them.
[0] imagine a dedicated machine for *SQL server - which process would have the most memory usage on that system?
Also: When those processes that haven't been active since boot (and which may never be active again) are swapped out, more system RAM can become available for disk caching to help performance of things that are actively being used.
And that's... that's actually putting RAM to good use, instead of letting it sit idle. That's good.
(As many are always quick to point out: Swap can't fix a perpetual memory leak. But I don't think I've ever seen anyone claim that it could.)
Adding a couple of GB of swap means the image resizing is _slow_, but it completes without causing issues.
Eg Google used to (and perhaps still does?) run their servers without swap, because they had built fault tolerance in their fleet anyway, so were happier to deal with the occasional crash than with the occasional slowdown.
For your desktop at home, you'd probably rather deal with a slowdown that gives you a chance to close a few programs than with just crashing your system. After all, if you are standing physically in front of your computer, you can always just manually hit the reset button if the slowdown is too agonising.
I’m gonna guess you’re not old enough to remember computers with memory measured in MB and IDE hard disks? Swapping was absolutely brutal back then. I agree with the other poster, swap hitting an SSD is a barely noticeable in comparison.
This is not about belief, but lived experience. Setting up swap, to me, is a choice between an unresponsive system (with swap) or a responsive system with a few OOM kills or a downed system (without).
I mean, I manage some servers, and this is my experience.
> Setting up swap, to me, is a choice between an unresponsive system (with swap) or a responsive system with a few OOM kills or a downed system (without).
Sorry, but are you sure that you budgeted your system requirements correctly? A Linux system shall neither fill SWAP nor trigger OOM regularly.
With a good amount of swap, you don't have to worry about closing programs. As long as your 'working set' stays smaller than your RAM, your computer stays fast and responsive, regardless of what's open and idling in the background.
Without swap oom killer runs and things become responsive.
If the slowest drive on the machine is the SSD, how does caching to swap help?
This cache is evictable, but it'll be there eventually.
In the old days, Linux wouldn't touch unused pages in RAM if RAM was not under pressure, but now it swaps out pages that have been unused for a long time. This allows more cache space in RAM.
> how does caching to swap help?
I think I failed to convey what I tried to say. Let me retry:
The kernel doesn't cache to the SSD. It swaps out unused (not accessed) but otherwise unevictable pages to swap, assuming these pages will stay stale for a very long time, allowing more RAM to be used as cache.

When I look at my desktop system: in 12 days, the kernel moved 2592MB of my RAM to swap despite having ~20GB of free space. ~15GB of this free space is used as disk cache.

So, to get 2.5GB more disk cache, the kernel moved 2592MB of non-accessed pages to swap.
    wallstop@fridge:~$ free -m
                   total    used    free  shared  buff/cache  available
    Mem:           15838    9627    3939      26        2637        6210
    Swap:           4095       0    4095
    wallstop@fridge:~$ uptime
     00:43:54 up 37 days, 23:24,  1 user,  load average: 0.00, 0.00, 0.00
This is from another system I have close by:

                   total    used    free  shared  buff/cache  available
    Mem:           31881    1423    1042      10       29884       30457
    Swap:            976       2     974

2MB of swap used, 1423MB of RAM used, 29GB of cache, 1042MB free; total RAM 32GB.

I DON'T WANT THE KERNEL PRIORITIZING CACHE OVER NRU PAGES.
The easiest way to do this is to disable swap.
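For what it's worth, doing that is just two commands (a sketch; assumes swap is configured via /etc/fstab rather than systemd swap units):

    sudo swapoff -a                              # disable all active swap now
    sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # keep it off across reboots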
When you call malloc(), it requests a big chunk of memory from the OS, in units of pages. It then uses an allocator to divide it up into smaller, variable length chunks to form each malloc() request.
You may have heard of “heap” memory vs “stack” memory. The stack of course is the execution/call stack, and heap is called that because the “heap allocator” is the algorithm originally used for keeping track of unused chunks of these pages.
(This is beginner CS stuff so sorry if it came off as patronizing—I assume you’re either not a coder or self-taught, which is fine.)
I'm not an AWS guy. I can see and touch the servers I manage, and in my experience, SWAP works, and works well.
> Secondly, Kernel swaps out unused pages to SWAP, relieving pressure from RAM. So, SWAP is often used even if you fill 1% of your RAM. This allows for more hot data to be cached, allowing better resource utilization and performance in the long run.
Yes, and you can observe that even in your desktop at home (if you are running something like Linux).
> So, eff it, we ball is never a good system administration strategy. Even if everything is ephemeral and can be rebooted in three seconds.
I wouldn't be so quick. Google ran their servers without swap for ages. (I don't know if they still do it.) They decided that taking the slight inefficiency in memory usage, because they have to keep the 'leaked' pages around in actual RAM, is worth it to get predictability in performance.
For what it's worth, I add generous swap to all my personal machines, mostly so that the kernel can offload cold / leaked pages and keep more disk content cached in RAM. (As a secondary reason: I also like to have a generous amount of /tmp space that's backed by swap, if necessary.)
With swap files, instead of swap partitions, it's fairly easy to shrink and grow your swap space, depending on what your needs for free space on your disk are.
So no, my experience with swap isn't that it's invisible with SSD.
I've had good experience with linux's multi-generation LRU feature, specifically the /sys/kernel/mm/lru_gen/min_ttl_ms feature that triggers OOM-killer when the "working set of the last N ms doesn't fit in memory".
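For anyone who wants to try it, the knobs live in sysfs (a sketch; run as root, and the 1000 ms value is illustrative):

    echo y > /sys/kernel/mm/lru_gen/enabled        # turn on multi-gen LRU
    echo 1000 > /sys/kernel/mm/lru_gen/min_ttl_ms  # OOM-kill when the last-1000ms
                                                   # working set doesn't fit in RAM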
On my 8gb M1 Mac, I can have a ton of tabs open and it'll swap with minimal slowdown. On the other hand, running a 4k external display and a small (4gb) llm is at best horrible and will sometimes require a hard reset.
I've seen similar with different combinations of software/hardware.
I still don’t use it though.
I once used an Intel Optane drive as swap for a job that needed hundreds of gigabytes of ram (in a computer that maxed out at 64 gigs). The latency was so low that even while the task was running the machine was almost perfectly usable; in fact I could almost watch videos without dropping frames at the same time.
Yup, this is a thing. It happens because file-backed program text and read-only data eventually get evicted from RAM (to make room for process memory) so every access to code and/or data beyond the current 4K page can potentially involve a swap-in from disk. It would be nice if we had ways of setting up the system so that pages of code or data that are truly critical for real-time responsiveness (including parts of the UI) could not get evicted from RAM at all (except perhaps to make room for the OOM reaper itself to do its job) - but this is quite hard to do in practice.
Swap helps you use RAM more efficiently: you keep the hot stuff in RAM and let the rest fester on disk.
Sure if you overwhelm it, then you're gonna have a bad day, but thats the same without swap.
Seriously, swap is good, don't believe the noise.
Many won't enable swap. For some, swap wouldn't help anyway, but for others it could soak up spikes. The latter, in some cases, will upgrade to a larger instance without even evaluating whether swap could help - generating more money for AWS.
Either way it's far-fetched to derive intention from the fact.
That is my interpretation of what people are saying upthread, at least. To which posters such as yourself are saying “you still need swap.” Why?
It's a bit wasteful to provision your computers so that all the cold data lives in expensive RAM.
If you size your RAM and swap right, you get no service degradation, but still get away with using less RAM.
But when I was at Google (about a decade ago), they followed exactly the philosophy you were outlining and disabled swap.
But that's a job applications are already doing. They put data that's being actively worked on in RAM and leave all the rest in storage. Why would you need swap once you can already fit the entire working set in RAM?
In that case, and if you are only running these applications, the need for swap is much less.
Swapping to RAM by itself would be stupid, but everyone doing this is also turning on compression.
Example on my personal VPS:

    $ free -m
                   total    used    free  shared  buff/cache  available
    Mem:            3923    1225     328     217        2369        2185
    Swap:           1535    1335     200
Swap is good to have. The value is limited but real.
Also not having swap doesn't prevent thrashing, it just means that as memory gets completely full you start dropping and re-reading executable code over and over. The solution is the same in both cases, kill programs before performance falls off a cliff. But swap gives you more room before you reach the cliff.
Gives you some time to upgrade, or tune services before it goes ka-boom.
If your memory usage spikes suddenly, a nominal amount of swap isn't stopping anything from getting killed; you're at best buying yourself a few seconds, so unless you spend your time just staring at the server, it'll be dead anyways.
To enable a swap file in Linux, first create the swap file, then initialize and activate it:

    sudo dd if=/dev/zero of=/swapfile bs=1G count=1   # 1 GB swap file
    sudo chmod 600 /swapfile                          # swapon warns on looser permissions
    sudo mkswap /swapfile
    sudo swapon /swapfile

To make it permanent, add /swapfile swap swap defaults 0 0 to your /etc/fstab file.
And that was like... two years ago? 1GB of RAM, and actually ~700MB usable before I found the proper magic incantations to really disable kdump.
Also have used 1GB machines for literally years.
Strongly suggest you shouldn't strongly suggest.
You do understand what's being discussed... right?
Or you have a very peculiar understanding what 'VPS' means.
    fallocate -l 1G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile

Works really well with no problems that I've seen. Really helps give a bit more of a buffer before applications get killed. Like others have said, with SSD the performance hit isn't too bad.

Partly it's a money thing (they want to sell you RAM), partly it's so that the shared disk isn't getting thrashed by multiple VPSes.
One useful thing is to protect critical daemons from the OOM killer by lowering their oom_score_adj (here for nsd):

    # lower nsd's badness score so the OOM killer picks it last
    NSDJUST=$(pgrep -x nsd); echo -en '-378' > /proc/"${NSDJUST}"/oom_score_adj

Another useful thing to do is effectively disable over-commit on all staging and production servers (ratio 0 instead of memory mode 2 to fully disable, as these do different things; memory mode 0 still uses the formula):

    vm.overcommit_memory = 0
    vm.overcommit_ratio = 0

Also set min_free and the reserved-memory knobs based on installed memory, using a formula from Red Hat that I don't have handy; min_free can vary from 512KB to 16GB depending on installed memory:

    vm.admin_reserve_kbytes = 262144
    vm.user_reserve_kbytes = 262144
    vm.min_free_kbytes = 1024000

At least that worked for me on about 50,000 physical servers for over a decade - servers that were not permitted to have swap, with installed memory varying from 144GB to 4TB of RAM. OOM would only occur when the people configuring and pushing code would massively over-commit and not account for memory required by the kernel, not following best practices defined by Java (and that's a much longer story).

Another option is to limit memory per application in cgroups, but that requires more explaining than I am putting in an HN comment.

Another useful thing is to never OOM kill in the first place on servers that only do things in memory and need not commit anything to disk - so don't do this on a disked database; this is for ephemeral nodes that should self-heal. Wait 60 seconds so the DRAC/iLO can capture the crash message, and then earth-shattering kaboom...

    # cattle vs kittens, mooooo...
    kernel.panic = 60
    vm.panic_on_oom = 2

For a funny side note, those options can also be used as a holy hand grenade to intentionally unsafely reboot NFS diskless farms when failing over to entirely different NFS server clusters: set panic to 15 minutes, trigger an OOM panic by setting min_free to 16TB at the command line via Ansible (not in sysctl.conf), swap clusters, ARP storm, and reconverge.

Is earlyoom a better solution than that, to prevent an erratic process from making an instance unresponsive?
systemd-oomd and oomd use the kernel's PSI[2] information which makes them more efficient and responsive, while earlyoom is just polling.
earlyoom keeps getting suggested, even though we have PSI now, just because people are used to using it and recommending it from back before the kernel had cgroups v2.
[0]: https://www.freedesktop.org/software/systemd/man/latest/syst...
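If you want to poke at this yourself, the PSI data is plain text in /proc, and systemd-oomd is a one-liner to turn on (a sketch; needs systemd 247+, and slices still need ManagedOOMMemoryPressure= configured before oomd will act on them):

    cat /proc/pressure/memory            # PSI: avg10/avg60/avg300 stall percentages
    sudo systemctl enable --now systemd-oomd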
What's in it for Disco? What's the pricing?

How many work hours per month does keeping this thing stable take? If it takes over 15, Heroku is cheaper.

Hosting on bare metal is still expensive; you pay in other ways.
The key element here is the need to continuously exercise both processes (Heroku + your staging server) to maintain familiarity with both.
Depending on the amount of staff involved in the above, it might eclipse the compute savings, but only OP knows those details. I'm sure they are a smart bunch.
At least, the "fear" factor (will the new system work? what bugs will it introduce? how much time will I spend, etc.) pushes a lot of folks to accept a very big price differential aka known knowns versus unknowns...
It's understandable really. It's just that once you've migrated, you almost definitely never want to go back :-)
...but this CX33 "server" being discussed is a 6-bucks-a-month VPS [0]
Normally you build a prototype on a laptop and move it out to fat hardware when it outgrows that. Here they started with 3k infra and then later realized it runs on a toaster. Completely back to front.

Maybe they just never iterated on a local version and nobody developed an intuition for the requirements. They switched straight to iterating in a nebulous cloud where you can't tell how much horsepower is behind the cloud functions etc.
Presumably there is a perfectly reasonably explanation and it's just not spelled out, it just seems weird based on given info
From looking at your docs, it appears like using and connecting GitHub is a necessary prerequisite for using Disco. Is that correct? Can disco also deploy an existing Docker image in a registry of my choosing without a build step? (Something like this with Kamal: `kamal --skip-push --version latest`)
However, yes, you can ask Disco to fetch an existing Docker image (we use that to self-host RabbitMQ). An example of deploying Meilisearch's image is here [0] with the tutorial here [1].
Do you typically build your Docker images and push them to a registry? Curious to learn more about your deployment process.
[0] https://github.com/letsdiscodev/sample-meilisearch/blob/main...
It's a shame they don't just license their whole software stack at a reasonable price, with a model similar to Sidekiq's, and let you sort out actually decent hardware. It's insane that Heroku has, if anything, gotten more expensive and worse compared to a decade ago, while similarly priced server hardware has gotten WAY better over that decade. $50 for a dyno with 1 GB of RAM in 2025 is robbery. It's even worse considering that running a standard Rails app hasn't changed dramatically from a resources perspective and has, if anything, become more efficient. It's comical how many developers are shipping apps on Heroku for hundreds of dollars a month, on machines with worse performance/resources than the MacBook they develop on.

It's the standard playbook that damn near everything in society is following, though: jacking prices and targeting the wealthiest, least price-sensitive percentiles instead of making good products at fair prices for the masses.
We built and open sourced https://canine.sh for exactly that reason. There’s no reason PaaS providers should be charging such a giant markup over already marked up cloud providers.
Regardless, you're going to have a much easier time developing your app if your datastore access latency is submillisecond rather than tens of milliseconds.
So that extra trouble might be worth it...
You can also self host almost any open source service without any fuss, and perform internal networking with telepresence. (For example, if you want to run an internal metabase that is not available on public internet, you can just run `telepresence connect`, and then visit the private instance at metabase.svc.cluster.local).
Canine tries to leverage all the best practices and pre-existing tools that are already out there.
But agreed, business critical databases probably shouldn't belong on Kubernetes.
It's insane how much a restaurant charges for a decent steak, I can do it much cheaper myself!
...!
By the time the product is a success and reaches a scale where it becomes cost prohibitive, they have enough resources to expand or migrate away anyway.
I suppose for solo devs it might be cheaper to setup a box for fun, but even then, I would argue that not everyone enjoys doing devops and prefers spending their time elsewhere.
Or they didn't check. A business still existing is pretty weak evidence that the pricing is reasonable.
No one gets hurt if someone else chooses to waste their money on Heroku, so why are people complaining? Of course that only applies where there are plenty of competitors, but there are literally hundreds of different options for deploying applications, and at least a dozen of them are just as reliable as Heroku and cheaper.
Really? I mean oil changes are pretty cheap. You can get an oil change at walmart for like 40 bucks.
AWS isn't much better honestly.. $50/month gets you an m7a.medium which is 1 vCPU (not core) and 4GB of RAM. Yes that's more memory but any wonder why AWS is making money hand-over-fist..
To compare to Heroku's standard dynos (which are shared hosting) you want the t3a family which is also shared, and much cheaper.
If you reserve that instance you can get it for 40% cheaper, or get 4 cores instead.
Yes it's more expensive than OVH but you also get everything AWS to offer.
Heroku's pricing has _remained the same_ for at least seven years, while hardware has improved exponentially. So when you look at their pricing and see a scam, what you're actually doing is comparing a 2025 anchor to a mid-2010s price that exists to retain revenue. At the big cloud vendors, they differentiate customers by adding obstacles to unlocking new hardware performance in the form of reservations and updated SKUs. There's deliberate customer action that needs to take place. Heroku doesn't appear to have much competition, so they keep their prices locked and we get to read an article like this whenever a new engineer discovers just how capable modern hardware is.
Heroku has obviously stagnated now, but their stack is _very cool_ if you have a fairly simple system and still want all the nice parts of a more developed ops system. It almost lets you get away with not having an ops team for quite a while. I don't know any other provider that is low-effort "decent" ops (Fly seems to directionally want to be the new Heroku, but is still missing a _lot_ in my book, though it also has a lot).
Netlify sets the same prices.
Just throw it into a cloud bucket from CI and be done with it.
That said, they are emerging. I'm actually working on a drop-in Vercel competitor at https://www.sherpa.sh. We're 70% lower cost by running on EU based CDN and dedicated servers (Hetzner, etc). But we had to build the relationships to solve all the above challenges first.
Every other time I login to the admin site I get a Heroku error.
I even showed one customer that their elaborate cluster costing £10k a month could run on a £10 VPS, faster and with less headache (they set it up for "big data", thinking 50GB is massive; there was no expectation of the database growing substantially beyond that).
Their response? Investors said it must run on the cloud, because they don't want to lose their money if homegrown setup goes down.
So there is that.
How do you typically deploy this?
> Critically, all staging environments would share a single "good enough" Postgres instance directly on the server, eliminating the need for expensive managed database add-ons that, on Heroku, often cost more than the dynos themselves.
Heroku also has cheaper managed database add-ons, why not use something like that for staging? The move to self hosting might still make sense, my point is that perhaps the original staging costs of $500/mo could have been lower from the start.
It's like juniors who didn't receive proper training/education got hired into companies where someone told them to go serverless on Heroku or Vercel, or to use some incredibly expensive AWS service, because that's the "modern, correct way" to do it - except they've now been developers long enough to have "senior" in their job titles, and are in positions to model this architecture themselves.
The challenge I always face with homebrew PaaS solutions is that you always end up moving from managing your app to managing your PaaS.
This might not be true right now, but as the complexity of your app grows, it's almost always the eventual outcome.
They offer convenience
Genuine question.
The draw of a docker-compose-like interface for deployment is so alluring that I have spent the last year or so working on a tool called Defang that takes a compose file and deploys it to the cloud. We don't support Hetzner (yet), but we do support AWS, GCP, and DO. We provision networking, IAM, compute, database, secrets, etc in your cloud account, so you maintain full control, but you also get the ergonomics of compose.
If you are on a PaaS and you want to reduce cost without losing ergonomics and scalability, it might be interesting.
AMD Ryzen 7 3700X (8 cores / 16 threads @ 3.6 GHz, Zen 2 "Matisse")
64 GB DDR4 ECC RAM
4 x 22 TB HDD + 2 x 1 TB SSD

is only 104 euros a month on Hetzner.

The STORAGE alone would cost $1624 a month in most clouds.
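(Back-of-the-envelope: 4 x 22 TB = 88 TB; at a typical ~$0.02/GB-month for cloud block storage, that's roughly 88,000 GB x $0.02 ≈ $1,760/month, so the figure is plausible before even counting the two SSDs. The ~$0.02 rate is an assumption; actual prices vary by provider and storage tier.)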
I mean something like a list of moving parts so I can understand how it works. Perhaps something like this:
https://caprover.com/#:~:text=CapRover%20Architecture%20at%2...
Once everything is installed/running, a very tldr diagram would be:
GitHub (webhook on git push) -> Docker swarm running Caddy -> Disco Daemon REST API which will ask Docker to build the image, and then does a blue-green zero-time deployment swap
But yeah, a clearer/better diagram would be great. Thanks for the push!
And your description is a great macro view of it. Thanks!
If you're running something that's too expensive for your taste and can share more information, happy to brainstorm some options.
I did just this the other month using Coolify on Mythic Beasts, moving Django & Postgres off Google App Engine. Hilariously easy, even with my extremely rusty skills.
Either that or use a PaaS that deploys to VMs. Can't make recommendations here but you could start by looking at Semaphore, Dokku, Dokploy.
Bring back sanity to tech.
The lead who wrote it had never even profiled code before; after some changes we cut it down to ~$0.01 per, but that's still insane.
But glad we have a new product offering for this.
If you setup a server with the curl|sh install script on the homepage, you'll get a url at the end that directs you there. And you can use the CLI too of course.
But yeah, thanks for the reminder!
gregsadetsky•7h ago
Lots of conversation & discussion about self-hosting / cloud exits these days (pros, cons, etc.) Happy to engage :-)
Cheers!
alberth•7h ago
Would be great to have a comparison on the main page of Disco
ajayvk•1h ago
I am building https://github.com/openrundev/openrun/. The main difference is that OpenRun has a declarative interface: no manual CLI commands or UI operations are needed to manage apps. Another difference is that OpenRun is implemented as a proxy; it does not depend on Traefik/Nginx etc. This allows OpenRun to implement features like scaling down to zero, RBAC access control for app access, audit logs, etc.

The downside with OpenRun is that it does not plan to support deploying pre-packaged apps - no Docker Compose support. Streamlit/Gradio/FastHTML/Shiny/NiceGUI apps for teams are the target use case. Coolify has the best support and catalog of pre-packaged apps.
gregsadetsky•7h ago
I'd say the main differences are that we 1) offer a more streamlined CLI and UI rather than extensive app/installation options, and 2) have an API-key-based system that lets team members collaborate without having to manage SSH access/keys.
Generally speaking, I'd say our approach and tooling/UX tends to be more functional/pragmatic (like Heroku) than one with every possible option.
Onavo•7h ago
https://news.ycombinator.com/item?id=44292103
https://news.ycombinator.com/item?id=44873057
martinald•7h ago
The load average in htop is relative to the number of CPU cores. So if you have 8 CPU cores like in your screenshot, a load average of 0.1 is actually 1.25% (10% / 8) of total CPU capacity - even better :).
Cool blog! I've been having so much success with this type of pattern!