In the end, Hetzner is a provider of "cheap but not 100% uptime" infrastructure, which is probably why it's so cheap in the first place.
As with every other provider, if you want 100% uptime (or something close to it), you really need at least N+1 instances of everything, as every hosting provider ends up fucking something up sooner or later.
Sure, they’ll throw you some service credits. But it’ll always be orders of magnitude less than the cost of their disruption to you.
I've used Vultr for about the same amount of time, and I never got an email that some network switch had a hardware failure and it'll take a couple of hours to restore connectivity, but that has happened with Hetzner more than once in the same time-span. And again, I say this as a Hetzner-lover, and someone who prefers Hetzner over Vultr any day of the week.
We kept most smaller-scale, stateless services in AWS but migrated databases and high-scale / high-performance services to bare metal servers.
Backups are stored in S3 so we still benefit from their availability.
Performance is much higher thanks to physically attached SSDs and DDR5 RAM.
Costs are drastically lower, and for much larger server sizes, which means we are not getting stressed about eventually needing to scale up our RDS / EC2 costs.
It's literally an agency doing professional development for others, among other services. Clearly not "toys".
HN dismissals are going down in quality; they used to be well researched some years ago. Now people just spew out the first thing that comes to mind, with zero validation before hitting that "reply" button.
It's a rotten attitude, and judging a project's worth by an AWS bill is a very poor comparator. I could run up a massive AWS bill doing some pointless machine learning workloads; is that suddenly a valid project in your eyes?
Can you spin it up on an AWS competitor for a fraction of the cost? Absolutely, yes, I would be interested in reading about it!
When I've needed dedicated servers in the US I've used Vultr in the past: relatively nice pricing, only missing unmetered bandwidth for it to be my go-to. But in all those US-specific cases it's been others paying for it, so it hasn't bothered me, compared to the personal/community stuff I host at Hetzner and pay for myself.
This wasn't a consideration a few years ago, but with how quickly things are devolving south of the border it's now much more of a risk. If I were operating a company in Canada, I would want to be able to assure my customers that their data won't get expropriated to the US without first going through Canadian courts.
OVH Canada now has two Canadian locations, by the way - the original location in Beauharnois and a new location in Cambridge, so you can even have two zones for redundancy.
For example, I got a dedicated server from Hetzner earlier this year with a consumer Ryzen CPU that had unstable SIMD (ZFS checksums would randomly fail, and mprime also reported errors). Opened a ticket about it and they basically told me it wasn't an issue because their diagnostics couldn't detect it.
And based on our different experiences, the quality of care you receive could differ too :)
To be fair, they probably would've done the same for me if I'd pushed the issue further, but after over a week of trying to diagnose the issue and convince them that it wasn't a problem with the hard drives (they said one of the drives was likely faulty and insisted on replacing it and having me resilver the zpool to see if that fixed it - spoiler: it didn't), I just gave up, disabled SIMD in ZFS and moved on.
That sucks big time :( In the most recent case I can recall, I successfully got access, noticed weirdness, gathered data and sent an email, and had a new instance within 2-3 hours.
Overall, based on comments here on HN and elsewhere, the quality and speed of support is really uneven.
Can you name one tech company that's scaled past the point where the founders are closely involved with support that has consistently good tech support? I think this is just really hard to get right, as many customers are not as knowledgeable as they think they are.
The company most people have had any sort of consistency from is probably Stripe, I think. Of course, there are cases where they haven't been great, but if you ask me for a company with the best tech support, Stripe comes to mind first.
I'm not sure it's active anymore, but there used to be a somewhat hidden and unofficial support channel in #stripe@freenode back in the day, where a bunch of Stripe developers hung out and helped users in an unofficial capacity. That channel was a godsend more than once.
Too cool not to share: most of the providers listed there have dedicated servers too.
Edit: Ironically, that website doesn't have Hetzner in their index.
excellent website, thanks.
The article is worth the read.
https://dillonshook.com/postgres-cloud-benchmarks-for-indie-...
FWIW, Hetzner has two data centers in the US, in case you're just looking for "Hetzner quality but in the US", not for "American/Canadian companies similar to Hetzner".
Hetzner, OVH, Leaseweb, and Scaleway (EU locations only).
I've used other providers as well, but I won't mention them because they were either too small or had issues.
Years ago Broadberry had a similar thing with Supermicro, but not any more. You have to talk to a salesperson about how they can rip you off. Then they don't give you what you specced anyway -- I spec 8x8G sticks of RAM, they provide 2x32G, etc.
In a thread two days ago https://ioflood.com/ was recommended as a US-based alternative.
https://www.layerstack.com/en/dedicated-cloud
Clouvider is available in a lot of US DCs: 4GB RAM/2 CPU/80GB NVMe and a 10Gb port for like $6 a month.
We run modest operations on a European VPS provider where I work, and whenever we get a new hire (business or technical, doesn't matter) it's like Groundhog Day - I have to explain — WE ARE ALREADY IN THE CLOUD, NO YOU WILL NOT START A "MIGRATING TO THE CLOUD" PROJECT ON MY WATCH SO YOU CAN PAD YOUR CV AND MOVE TO ANOTHER COMPANY TO RUIN THEIR INFRA — or something along those lines, but run through ChatGPT for a friendlier tone.
Google doesn't even deploy most of its own code to run on VMs. Containers yes but not VMs.
I have run services on bare metal and VPSs, and I always got far better performance than I can get from AWS or GCP, for a small fraction of the cost. To me "cloud" means vendor lock-in, terrible performance, and wild costs.
People do not realize that the fancy infinite storage scaling means AWS etc. run network-based storage. And that, for example on a DB, can be a 10x performance hit.
In the best case scenario. In the worst, some cluster f-up will eat 10x that in engineering time.
The only benefit you get is reliability, temporary network issues on AWS are not a thing.
On DigitalOcean they are fairly bad (I lose thousands of requests almost every month, and I get pennies in credit back when I complain - while the users I lose to churn cost way more); on Hetzner I've heard mixed reviews.
Some people complain, some say it's extremely reliable.
I'm looking forward to trying Hetzner out!
Yeah, I remember when AWS first appeared, and the value proposition was basically "It's expensive, but you can press a button and a minute later you have a new instance, so you can scale really quickly". Companies that know more or less what workload they have during a week don't really get any benefits, just more expensive monthly bills.
But somewhere along the line, people started thinking it was easier to use AWS than the alternatives, and I even heard people saying it's cheaper...
The biggest innovation AWS delivered was to convince engineers they are cheap, while wresting control of provisioning away from the people with actual visibility into the costs.
But in general, if you don't need to scale like crazy, Hetzner is amazing; we still have a lot of stuff running on Hetzner but fan out to other services when we need to scale.
My point of people moving to Hetzner for the dedicated instances rather than the cloud still remains though, at least in my bubble.
I'm not sure this is a difference from other clouds; at least a few years ago this was a weekly or even daily problem in GCP. My experience is that if you request hundreds of VMs rapidly during peak hours, all the clouds struggle.
At the scale of providers like AWS and even the smaller GCP, “hundreds of VMs” is not a large amount.
Now maybe after the AI demand and the waves of purchases of systems appropriate for that, things have improved, but it definitely wasn't the case at the large-scale employer I worked at in 2023 (my current employer is much smaller and doesn't have those needs, so I can't comment).
So you have approx 1MM concurrent customers? That's a big number. You should definitely be able to get preferred pricing from AWS at that scale.
https://www.linkedin.com/posts/jeroen-jacobs-8209391_somethi...
I didn't know AWS and GCP also did it. Not surprised.
The problem is that European regulators do nothing about such anti-competitive dirty tricks. The big clouds hide behind "lots of spam coming from them", which is not true.
On the other hand, someone linked a report from last year[0]:
> 72% of BEC attacks in Q2 2024 used free webmail domains; within those, 72.4% used Gmail. Roughly ~52% of all BEC messages were sent from Gmail accounts that quarter.
[0] https://docs.apwg.org/reports/apwg_trends_report_q2_2024.pdf
And just deleting it and starting again is just going to give you the exact same IP again!
I ended up having to buy a dozen or so IPs until I found one that wasn't blocked, and then I could delete all the blocked ones.
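If you ever have to do that dance again, it can be worth scripting a check of each candidate IP against the common DNS blocklists before deciding to keep it. A rough Python sketch (zen.spamhaus.org is just one well-known list; some lists rate-limit or require registration, so treat this as illustrative):

    import socket

    def is_listed(ip: str, dnsbl: str = "zen.spamhaus.org") -> bool:
        """Check an IPv4 address against a DNS blocklist: reverse the octets,
        append the list's zone, and any A record means the IP is listed."""
        query = ".".join(reversed(ip.split("."))) + "." + dnsbl
        try:
            socket.gethostbyname(query)
            return True
        except socket.gaierror:
            return False

    for candidate in ["203.0.113.10", "203.0.113.11"]:  # placeholder IPs
        print(candidate, "listed" if is_listed(candidate) else "clean")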
He's also just released a book on hosting production Python apps at scale [3]. I haven't read it yet, though I'd assume it gets covered there in more detail too.
--
[1] https://talkpython.fm/blog/posts/we-have-moved-to-hetzner/
[2] https://talkpython.fm/blog/posts/update-on-hetzner-changes-p...
Yeah, even when you move to "EC2 Dedicated Instances" you end up sharing the hardware with other instances, unless you go for "EC2 Dedicated Hosts", and even then the performance seems worse than other providers.
Not sure how they managed to do so for even the dedicated stuff, would require some dedicated effort.
A good example is the big Lichess outage from last year [1]. Lichess is a non-profit that also has to serve a huge user base. Given their financials, they have to go the cheap dedicated server route (they host on OVH). They publish a spreadsheet somewhere with every resource they use to run the service, and last year I had fun calculating how much it would cost them if they were using a hyperscaler cloud offering instead. I don't remember exactly, but it was 5 or 6x the price they currently pay OVH.
The downside is that when you have an outage, your stuff is tied to physical servers and can't easily be migrated, whereas a cloud provider can easily move your workload around. In the case of the Lichess outage, it was some network device they had no control over that went bad, and Lichess was down until OVH could fix it - many hours.
So, yes, you get a great deal, but for a lot of businesses uptime is more important than cost optimization, and the physicality of dedicated servers is actually a serious liability.
[1]: https://lichess.org/@/Lichess/blog/post-mortem-of-our-longes...
Even hosting two of everything on dedicated servers still leaves you with a cheaper monthly bill than the same performance/$ would cost on AWS or whatever.
But Hetzner does seem a bit worse than other providers in that they have random failures in their own infrastructure, so you do need to take care if you wanna avoid downtime. I'm guessing that's how they can keep the prices so low.
> is that when you have an outage, your stuff is tied to physical servers and they can't easily be migrated
I think that's a problem in your design/architecture, if you don't have backups that live outside the actual servers you want to migrate away from, or at least replicate the data to some network drive you can attach to a new instance in an instant.
When you pay 1/4 for 3X the performance you can duplicate your servers and then be paying 1/2 for 3X the performance.
I find it baffling that people forget how things were done before the cloud.
So they could have had 100% redundant systems at OVH and still be under half the cost of a traditional "cloud" provider?
I would look at architecture and operations first. Their "main" node went down, and they did not have a way to bring another instance of it online fast on a fresh OVH machine (typically provisioned in a few minutes), assuming they had no hot standby. If the same happened to their "main" VM at a "hyperscaler", I would guess they would have been up the same creek. It is not the difference between 120 and 600 seconds to provision a new machine that caused their 10 hrs of downtime.
But I think "redundancy" is more like a spectrum, rather than a binary thing. You can be more or less redundant, even within the same VPS if you'd like, but that of course be less redundant than hosting things across multiple data centers.
While AWS is probably towards the safer end if you want to put all your eggs in one basket, people are still putting all their eggs in one basket if they have everything at AWS as well...
I don't see how that follows? Could you please explain?
I run my stuff on Hetzner physical servers. It's deployed/managed through ansible. I can deploy the same configuration on another Hetzner cluster (say, in a different country, which I actually do use for my staging cluster). I can also terraform a fully virtual cloud configuration and run the same ansible setup on that. Given that user data gets backed up regularly across locations, I don't see the problem you are describing?
This is a myth, created so cloud providers can sell more, and so those who overpay can feel better. I've been using dedicated servers since 2005, so for 20 years across different providers. I have machines at these providers with 1000-1300 days of uptime.
You did not say what system you use on them, but don't you need to reboot them to apply kernel upgrades, for instance?
I run most of the workloads in containers, but there are also some VMs (mostly Windows) and some workloads use Firecracker micro vms in containers. A small number of machines are rebooted more often because they occasionally need new kernel features, and their workloads aren't VM friendly, so they run on bare metal.
OVH offers a managed kubernetes solution which for a team experienced with Kubernetes and/or already using containers would be a fairly straightforward way to get a solid HA setup up and running. Kubernetes has its downsides and complexity but in general it does handle hardware failures very well.
Looking at Hetzner or Vultr as alternatives. A few folks mentioned to me that Infomaniak has great service and uptime, but I haven't heard much about them otherwise.
Anyone used Infomaniak in production? How do they compare to Hetzner/Vultr?
Both Vultr and Hetzner are solid options. I'd go for Hetzner if I know the users are in or around Europe, or if I want to run tiny CDN-like nodes myself across the globe. Also Hetzner if you don't wanna worry about bandwidth costs. Otherwise go for Vultr; they have a lot more locations.
The Lightsail instance sometimes just hangs and we have to reboot it when people perform simple actions like logging in or querying the API (we have a simple Express / Next.js app).
Just wondering if your limits apply only to Lightsail or to normal stuff too.
That said, for your use case, you might want the predictability and guarantee of having no "noisy neighbors" on an instance. While most VM providers don't offer that (you have to go to fully dedicated machine), AWS does, so keep that in mind as well.
For BYOL (bring your own hosting labor), Vultr is a lesser known but great choice.
Big fan of Vultr, I like them a lot, but for bare metal stuff Hetzner is going to be cheaper.
That's a risk if the CPU or another resource sits close to 100% for a couple of months. Hetzner likes customers who pay for what they don't use.
To label it an "anecdote" is to gaslight it. It is lived experience.
Speaking of their rules, those are a bit insane too. Speaking of "flouting rules", any prospective user should think about whether it's okay for a cloud vendor to keep spying on which processes the user is running, even without a court order; it is not okay.
If you keep moving the goalpost, then you will understand nothing. You might as well be an employee of Hetzner.
So let me revise that to say I haven't seen any reports I can 1) verify are first hand, and 2) know accurately reflect an actual unfair termination. That is also why I don't bother going around reading accounts on Reddit.
There are no "fair" terminations except without a court order. You will understand when it happens to you. Also, there is no way for you to determine if a report of a a termination is "unfair". In this way, you will continue reveling in your limited worldview.
I have seen this multiple times with German providers. They promise to serve, then when the user really genuinely exercises the service, they cancel the user.
That you're being evasive makes it very much sound like you used them in ways you should have expected would be treated accordingly.
If you've run into this multiple times, it very much sounds like a "you problem".
Even if it was a shared instance, people don't rent a 48-core server just to use 1 or 2 cores. It makes no sense to rent out a big shared server and then expect users not to use it. Someone would rent it only if they had exhausted smaller instances.
Something tells me that your idea of computing is communist computing, where someone shouldn't use too much even when paying for it. That's a mental roadblock for which there is no fix.
Someone with your communist mental model would be okay with a cloud provider spying on their activities very closely, but most people are not.
If you think about it, Hetzner had to be spying on my activities in very close detail to see what I was doing. Such unnecessary spying (without a court order) alone should deter anyone from using them. Assuming they copied my disk image and subjected it to a scan, it's very possible that they retained my confidential data without my permission. Is this the kind of cloud provider that anyone should use?
As for the type of server, it really shouldn't matter. The service exists to be used. People don't rent say 24 core or 48 core servers just to pass the time and pay money for nothing.
We ended up building a managed Postgres that runs directly on Hetzner. Same setup, but with HA, backups, and PITR handled for you. It’s open-source, runs close to the metal, and avoids the egress/I/O gotchas you get on AWS.
If anyone’s curious, here are some notes about our take [1], [2]. Always happy to talk about it if you have any questions.
[1] https://www.ubicloud.com/blog/difference-between-running-pos... [2] https://www.ubicloud.com/use-cases/postgresql
Not having an ops background I am nervous about:
* database backup+restore
* applying security patches on time (at OS and runtime levels)
* other security issues, like making sure access to prod machines is restricted correctly, access is logged, ports are locked down, abnormal access patterns are detected
* DoS and similar protections are not my responsibility
It feels like picking a popular cloud provider gives a lot of cover for these things - sometimes technically, and otherwise at least politically...
Most of the time you are good if you follow version updates for major releases as they come: you do regression testing and put it on prod in your planned time.
Most problems come from not updating at all and having 2 or 3 year old versions, because that's what automated scanners will be looking for, and after that much time someone is much more likely to have written and shared exploit code.
For example, one of my employers routinely tested DB restore by wiping an entire table in stage, and then having the on call restore from backup. This is trivial because you know it happened recently, you have low traffic in this instance, and you can cleanly copy over the missing table.
But the last actual production DB incident they had was a subtle data corruption bug that went unnoticed for several weeks - at which point restoring meant a painful merge of 10s of thousands of records, involving several related tables.
How come? The baseline for that comparison will also stay static, regardless of how many TPS or whatever is going on, meanwhile the AWS price they're comparing to would only increase the more people use whatever they deploy.
The hours they put into not wasting money on AWS today could pay off many times if it makes their SaaS economically viable for their target audience.
My hosting bill is a fraction of what people pay at AWS or other similar providers, and my servers are much faster. This lets me use a simpler architecture and fewer servers.
When I need to scale, I can always add servers. The only difference is that with physical servers you don't scale up/down on demand within minutes, you have to plan for hours/days. But that's perfectly fine.
I use a distributed database (RethinkDB, switching to FoundationDB) for fault tolerance.
The reason for FoundationDB specifically is mostly correctness, it is pretty much the only distributed database out there that gives you strict serializability and delivers on that promise. Performance is #2 on the list.
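For anyone curious what the FoundationDB programming model looks like, here is a minimal sketch using the official Python bindings - not their setup, just an illustration, assuming a reachable cluster with the default cluster file and a made-up key layout:

    import fdb

    fdb.api_version(710)  # must match the API version of the installed client
    db = fdb.open()       # uses the default fdb.cluster file

    @fdb.transactional
    def increment(tr, key):
        # Everything in this function runs as one strictly serializable
        # transaction; the decorator retries automatically on conflicts.
        current = tr[key]
        n = fdb.tuple.unpack(current)[0] if current.present() else 0
        tr[key] = fdb.tuple.pack((n + 1,))
        return n + 1

    print(increment(db, fdb.tuple.pack(("counters", "requests"))))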
Hetzner does provide free Private Networks, but they only work within a single region - I'm not aware of them providing anything (yet) to securely connect between regions.
In terms of networking many offer no-headache solutions with some kind of transit blend.
<rant>I recently had to switch away from Hetzner due to random dhclient failures causing connectivity loss once IPs expired, and a complete failure of the load balancer - it stopped forwarding traffic for around 6 hours. The worst part is that there was no acknowledgement from Hetzner about any of these issues, so at some point I was going insane trying to find the problem when in the end it was Hetzner. (US VA region)
Refurb servers will still blast AWS, and spares are easy to source.
I know HE.net does a rack for like $500/mo intro price and that comes with a 1G internet feed as well.
Most datacenters do offer remote hands, which is a bit pricey, but since they're only needed in emergencies, in a redundant setup it's just not required.
Full Rack = $100/month* with $500 install, Power (20A) = $350/month with $500 install, DIA (1Gbps) = $300/month
Total = $750/month plus $1,000 Install on 12 month term
A dedicated server or VPS from OVH, Hetzner, Scaleway, etc., or even Docker containers on Koyeb, will give you way more bang for your buck.
Call me a dinosaur, but I’ve never used any of the big cloud providers like AWS. They’re super expensive, and it’s hard to know what you’ll actually end up paying at the end of the month.
I'd love to hear more about how you use terraform and helm together.
Currently our major friction in ops is using tofu (Terraform) to manage K8s resources. Avoiding YAML is great - but with both Terraform and K8s maintaining state, deploying Helm from Terraform feels fragile; and vice-versa, depending on Helm directly in a mostly Terraform setup also feels fragile.
It was a wake-up moment for me about keeping billing in shape, but it also made me understand that a cloud provider is only as good as their support and communications when things go south. Like, an automated SMS would be great before you destroy my entire work. But because they are so cheap, they probably can't do that for every $100/month account.
I've had similar issues with AWS, but they will have much friendlier grace periods.
But if you do not pay and you do not check your e-mails, it's basically your fault. Who even uses SMS these days?
Add to that the declining experience of email with so much marketing and trash landing in the inbox (and sometimes Gmail categorizing important emails as "Updates")
That's why grace periods for these situations are important.
Who uses SMS? This might be a cultural difference, but in Europe it's still used a lot. And would you be ok if your utility company cut off your electricity with just an email warning? Or if you were summoned to court by email?
That period should definitely be longer than a few days.
Hetzner is great for cheap personal sites but I would never use them for any serious business use-cases. Other than failed payments, Hetzner also has very strict content policies and they use user reports to find offenders. This means that if just a few users report your website, everything is deleted and you're banned with zero warning or support, whether the reports are actually true or not. (This also means you can never use Hetzner for anything that has user uploaded content, it doesn't matter if you actively remove offending material because if it ever reaches their servers you're already SOL.)
This is also something under your control - you don't have to use Gmail as your email provider for important accounts and you can whitelist the domains of those service providers if you don't rely on a subpar email service.
> It was a wake up moment for me about keeping billing in shape
It should be a wake up moment about keeping backups as well.

1. How many nodes do you have? 2. Did you install anything to monitor your node(s) and the app deployed on these nodes? If so, which software?
2. Yes, TLDR: Prometheus + Grafana + AlertManager + ELK. I think it's a fairly common setup.
2. OpenTelemetry Collector installed on all nodes, sending data to a self-hosted OpenObserve instance. UI is a little clunky, but it's been an invaluable tool, and it handles everything in one place - logs, traces, metrics, alerts.
I got my account validation rejected despite having everything "in norm"; I tried 3 times and they wouldn't give me a reason why it was rejected.
I think it's better that way, I wouldn't like to get the surprise my account was terminated at some point after that.
Abstracted infrastructure like Kubernetes is expensive by default, so design has an impact.
Does anyone know if there is a VM vendor that sits somewhere in between a dedicated server host like Hetzner in terms of performance + cost-effectiveness and AWS/GCP in terms of security?
Basically TPM/vTPM + AMD SEV/SEV-SNP + UEFI Secure Boot support. I've scoured the internet and can't seem to find anyone who provides virtualised trusted computing other than AWS/GCP. Hetzner does not provide a TPM for their VMs, they do not mention any data-in-use encryption, and they explicitly state that they do not support UEFI secure boot - all of these are critical requirements for high-assurance use cases.
Software/virtualization is just helpless against such a threat model.
NixOS is used for declarative and, more importantly, deterministic OS state and runtime environment, layered with dm-verity to prevent tampering with the Nix store. The root partition, aside from whatever is explicitly configured in the Nix store, is wiped on every reboot. The ephemerality prevents persistence by any potential attacker, and the state of the machine is completely identical to whatever you have configured in your NixOS configuration, which is great for auditability. This OS image + boot loader is signed with organisation-private keys and deployed to machines preloaded with UEFI keys to guarantee boot integrity and prevent firmware-level attacks (UEFI Secure Boot).
At this point you need to trust the cloud provider to not tamper with the UEFI keys or otherwise compromise memory confidentiality through a malicious or insecure hypervisor, unless the provider supports memory encryption through something like AMD SEV-SNP. The processor provides an AMD-signed attestation that is provided to the guest OS that states "Yes, this guest is running in a trusted execution environment, and here are the TPM measurements for the boot" and you can use this attestation to determine whether or not the machine should join your network and that it is running the firmware, kernel, and initramfs that you expect AND on hardware that you expect.
I think I'll put together a write-up on this architecture once I launch the service. There is no such thing as perfect security, of course, but I think this security architecture prevents many classes of attacks. Bootkits and firmware-level attacks are exceedingly difficult or even impossible with this model, combine this with an ephemeral root filesystem and any attacker would be effectively unable to gain persistence in the system.
Disclaimer, just joined Oracle a few months ago. I'm using both Hetzner and OCI for my private stuff and my open-source services right now. I still personally think they've identified a clever market fit there.
https://github.com/vitobotta/hetzner-k3s
Or
https://github.com/kube-hetzner/terraform-hcloud-kube-hetzne...
For a K3S cluster? Would love to hear any experience. Thanks!
There just isn’t a compelling story to go “all in on AWS” anymore. For anything beyond raw storage and compute the experience elsewhere is consistently better, faster, cheaper.
It seems AWS leadership got caught up trying to have an answer for every possible computing use case and broadly ended up with a bloated mess of expensive below-bar products. The recent panicked flood of meh AI slop products as AWS tries to make up for its big miss on AI is one such example.
Would like to see AWS just focus on doing core infrastructure and doing it well. Others are simply better at everything that then layers on top of that.
Also, the first three lines of the new stack are a sure-shot way to get PTSD. You shouldn't manage databases in your plane unless you really know the internals of the tools you are using. Once you get off AWS you really start to see the value of things like documentation.
This is down to several things:
- Latency - having your own local network, rather than sharing some larger datacenter network fabric, gives around an order of magnitude lower latency
- Caches – right-sizing a deployment for the underlying hardware, and so actually allowing a modern CPU to do its job, makes a huge difference
- Disk IO – Dedicated NVMe access is _fast_.
And with it comes a whole bunch of other benefits:
- Auto-scalers become less important, partly because you have 10x the hardware for the same price, partly because everything runs at 2x the speed anyway, and partly because you have a fixed pool of hardware. This makes the whole system more stable and easier to reason about.
- No more sweating the S3 costs. Put a 15TB NVMe drive in each server and run your own MinIO/Garage cluster (alongside your other workloads) - see the boto3 sketch below. We're doing about 20GiB/s sustained on a 10-node cluster, 50k API calls per second (on S3 that is $20-$250 _per second_ on API calls!).
- You get the same bill every month.
- UPDATE: more benefits - cheap fast storage, run huge PostgreSQL instances at minimal cost, less engineering time spent working around hardware limitations and cloud vagaries.
And, if you choose to invest in the above, it all costs 10x less than AWS.
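Since most tooling already speaks the S3 API, pointing workloads at a self-hosted MinIO/Garage cluster is usually just a matter of changing the endpoint. A rough boto3 sketch (endpoint and credentials are placeholders):

    import boto3

    # Same S3 API, different endpoint: a self-hosted cluster instead of AWS.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://minio.internal.example:9000",  # placeholder endpoint
        aws_access_key_id="MINIO_ACCESS_KEY",                 # placeholder credentials
        aws_secret_access_key="MINIO_SECRET_KEY",
    )

    s3.create_bucket(Bucket="backups")
    s3.upload_file("db-dump.tar.zst", "backups", "2024-06-01/db-dump.tar.zst")

    # Existing S3-based code (listings, presigned URLs, multipart uploads, ...)
    # keeps working unchanged, minus the per-request charges.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "backups", "Key": "2024-06-01/db-dump.tar.zst"},
        ExpiresIn=3600,
    )
    print(url)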
Pitch: If you don't want to do this yourself, then we'll do it for you for half the price of AWS (and we'll be your DevOps team too):
Email: adam@ above domain
My employer is so conservative and slow that they are forerunning this Local Cloud Edge Our Basement thing by just not doing anything.
In a large enough org that experience doesn’t happen though - you have to go through and understand how the org’s infra-as-code repo works, where to make your change, and get approval for that.
I don't argue there aren't special cases for using fancy cloud vendors, though. But classical datacentre rentals almost always get you there for less.
Personally I like being able to touch and hear the computers I use.
you keep the usual BS to get hardware, plus now it's 10x more expensive and requires 5x the engineering!
Basically: some manager gets fed up with weeks/months of delays for bare-metal or VM access -> takes risks and gets cloud services -> successful projects in less time -> gets promoted -> more cloud in the org.
Over the years I tried occasionally to look into cloud, but it never made sense. A lot of complexity and significantly higher cost, for very low performance and a promise of "scalability". You virtually never need scalability so fast that you don't have time to add another server - and at baremetal costs, you're usually about a year ahead of the curve anyways.
This has nothing to do with cloud. Businesses have forever turned IT expenses from capex to opex. We called this "operating leases".
You are self-managing expensive dedicated hardware in the form of MacBooks, instead of renting Azure Windows VMs?!
Shame!
It actually takes a lot of time.
In fact, I'd wager a lot more people have used Linux than have set up a properly redundant SQL database.
Okay, I lied. The latter seems much more useful and sane.
Ohh idk if this is the best comparison, due to just how much nuance bubbles up.
If you have to manage those devices, Windows and Active Directory and especially Group Policy work well. If you just have to use the devices, then it depends on what you do - for some dev work, Linux distros are the best, hands down. Often, Windows will have the largest ecosystem and the widest software support (while also being a bit of a mess). In all the time I've had my MacBook I really haven't found what it excels at, aside from great build quality and battery life. It feels like one of those Linux distros that do things differently just for the sake of it - even the keyboard layout; the mouse acceleration feels the most sluggish (Linux distros feel the best, Windows is okay) even if the trackpad is fine; and I needed DiscreteScroll and Rectangle and some other stuff to make generic hardware feel okay (or even make multiple displays work). Maybe creative software is great there.
It’s the kind of comparison that derails itself in the mind of your average nerd.
But I get the point, the correct tool for the job and all that.
I'm not saying that you won't experience hardware failures, I am just saying that you also need to remember that if you want your product to keep working over the weekend then you must have someone ready to fix it over the weekend.
And they aren't...just passing those costs on to their customers?
How often is GitHub down? We are all just fine without it for a while.
There's a lot, a lot of websites where downtime just... doesn't matter. Yes it adds up eventually but if you go to twitter and its down again you just come back later.
AWS and DigitalOcean = $559.36 monthly vs. Hetzner = $132.96. The cost of an engineer to set up and maintain a bare metal k8s cluster is going to far exceed the roughly $400 monthly savings.
If you run things yourself and can invest sweat equity, this makes some sense. But for any company with a payroll this does not math out.
Anyway, it is not hard, and controlling upgrades saves so much time. Having a client's db force-upgraded when there is no budget for it sucks.
Anyway, I encourage you to learn/try it when you have the opportunity.
I've done on-prem highly available MySQL for years, and getting the whole master/slave thing to go just right during server upgrades was really challenging. On AWS, upgrading MySQL server ("Aurora") is really just a few clicks. It can even do a blue/green deployment for you, where you temporarily get the whole setup replicated and in sync so you can verify that everything went OK before switching over. Disaster recovery (regular backups off-site & the ability to restore quickly) is also hard to get right if you have to do it yourself.
https://github.com/percona/percona-xtradb-cluster-operator https://github.com/mariadb-operator/mariadb-operator or CNPG for Postgres needs. They all work reasonably well and cover all the basics (HA, replication, backups, recovery, etc.).
It's easy to look at a one-off deployment of a single server and remark on how much cheaper it is than RDS, and that's fine if that's all you need. But it completely skips past the reality of a real life resilient database server deployment: handling upgrades, disk failures, backups, hot standbys, encryption key management, keeping deployment scripts up to date, hardware support contracts and vendor management, the disaster recovery testing for the multi-site SAN fabric with fibre channel switches and redundant dedicated fibre, etc. Before the cloud, we actually had a staff member who was entirely dedicated to managing the database servers.
Plus as a bonus, not ever having to get up at 2AM and drive down to a data centre because there was a power failure due to a generator not kicking in, and it turns out the data centre hadn't adequately planned for the amount of remote hands techs they'd need in that scenario...
RDS is expensive on paper, but to get the same level of guarantees either yourself or through another provider always seems to end up costing about the same as RDS.
Essentially, all that pain of yonder years was storage: it was a f**ing nightmare running HA network storage before the days of SSDs. It was slower than RAID, 5X more expensive than RAID, and generally involved an extreme amount of pain and/or expense (usually both). But these days you only actually need SANs - or as we call it today, block storage - when you have data you care about, and again, you only have to care about backups when you have data you care about.
For absolutely all of us, the side effect of moving away from monolithic 'pets' is that we have made the app layer not require any long-term state itself. So today all you need is N x any random thing that might lose data or fail at any moment as your app servers, plus an external DB service (Neon, PlanetScale, RDS), and perhaps S3 for objects.
But with Postgres, even with HA, you can't do geographic/multi-DC of data nearly as well as something like Cassandra.
Last I checked, stack overflow and all of the stack exchange sites are hosted on a single server. The people who actually need to handle more traffic than that are in the 0.1% category, so I question your implicit assumption that you actually need a Postgres and Redis cluster, or that this represents any kind of typical need.
Also, databases can easily see a ton of internal traffic. Think internal logistics/operations/analytics. Even a medium size company can have a huge amount of data, such as tracking every item purchased and sold for a retail chain.
[1] https://www.datacenterdynamics.com/en/news/stack-overflow-st...
[2] https://stackoverflow.blog/2025/08/28/moving-the-public-stac...
Like, where do I go? Do I search for Postgres? If so, where? Does the IP of my cluster change? If so, how do I make it static? Also, can non-AWS servers connect to it? No? Then how do I open up the firewall to allow it? And what happens if it uses too many resources? Does it shut down by itself? What if I wanna fine-tune a config parameter? Do I ssh into it? Can I edit it in the UI?
Meanwhile, in all that time spent finding out, I could ssh into a server, code and run a simple bash script to download, compile, and run. Then another script to replicate. And I can check the logs, change any config parameter, restart, etc. No black box to debug if shit hits the fan.
Seriously I despise PostgreSQL in particular in how fucking annoying it is to upgrade.
If you don't care about HA, then sure everything becomes easy! Until you have a disaster to recover and realize that maybe you do care about HA. Or until you have an enterprise customer or compliance requirement that needs to understand your DR and continuity plans.
Yugabyte is the closest I’ve seen to achieving that simplicity with self host multi region and HA Postgres and it is still quite a bit more involved than the steps you describe and definitely more work than paying for their AWS service. (I just mention instead of Aurora because there’s no self host process to compare directly there as it’s proprietary.)
The things you describe involve a small learning curve, each different for each cloud environment, but then you never have to think about it again. You don't have to worry about downtime (if you set it up right), running a bash script ... literally nothing else has to be done.
Am I overpaying for Postgres compared to the alternatives? Hell yeah. Has it paid off? 100%, would never want to go back.
Yes. In your AWS console, right after logging in. And pretty much all of your other setup and config questions are answered by just filling out the web form right there. No sshing to change the parameters; they are all available right there.
> And what happens if it uses too much resources?
It can't. You've chosen how much in resources (CPU/memory/disk) to give it. Runaway cloud costs come from bill-by-usage stuff like Redshift, S3, Lambda, etc.
I'm a strong advocate for self (for some value of self) hosting over cloud, but you're making cloud out to be far more difficult than it is.
Anything you don't know how to do - or haven't even searched for - either sounds incredibly complex, or incredibly simple.
I hated having to deal with PostgreSQL on bare metal.
To answer your questions, in case someone else asks these as well and wants answers:
> Does the IP of my cluster change? If so how to make it static?
Use the DNS entry that AWS gives you as the "endpoint", done. I think you can pin a stable Elastic IP to RDS as well if you wish to expose your RDS DB to the Internet although I have really no idea why one would want that given potential security issues.
> Also can non-aws servers connect to it? No?
You can expose it to the Internet in the creation web UI. I think the default the assistant uses is to open it to 0.0.0.0/0, but the last time I did that was many years ago, so I hope that AWS asks you what you want these days.
>Then how to open up the firewall and allow it?
If the above does not, create a Security Group, assign the RDS server to that Security Group and create an Ingress rule that either only allows specific CIDRs or a blanket 0.0.0.0/0.
> And what happens if it uses too much resources? Does it shutdown by itself?
It just gets dog slow if your I/O quota is exhausted, and it goes into an error state when the disk is full. Expand your disk quota and the RDS database becomes accessible again.
> What if i wanna fine tune a config parameter? Do I ssh into it? Can i edit it in the UI?
No SSH at all, not even for manually unfucking something; for that you need the assistance of AWS support - but in about six years I never had a database FUBAR itself.
As for config parameters, there's a UI for this called "parameter/option groups"; you can set almost all config parameters there, and you can use these as templates for other servers you need as well.
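And if you'd rather not click through the console at all, the same setup is a handful of API calls. A rough boto3 sketch, not a recipe - identifiers, the CIDR and the parameter group name are placeholders, the security group lands in the default VPC, and the custom parameter group is assumed to already exist:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-central-1")
    rds = boto3.client("rds", region_name="eu-central-1")

    # Security group that only allows Postgres traffic from a known CIDR.
    sg = ec2.create_security_group(
        GroupName="app-postgres",
        Description="Postgres access for app servers",
    )
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
            "IpRanges": [{"CidrIp": "203.0.113.0/24"}],  # placeholder CIDR
        }],
    )

    rds.create_db_instance(
        DBInstanceIdentifier="app-db",
        Engine="postgres",
        DBInstanceClass="db.t4g.medium",
        AllocatedStorage=100,
        MasterUsername="app",
        MasterUserPassword="change-me",        # use Secrets Manager in practice
        VpcSecurityGroupIds=[sg["GroupId"]],
        PubliclyAccessible=False,              # keep it off the public internet
        DBParameterGroupName="custom-pg16",    # the "parameter group" mentioned above
    )

    # Block until it's available, then read the stable DNS endpoint.
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="app-db")
    ep = rds.describe_db_instances(DBInstanceIdentifier="app-db")["DBInstances"][0]["Endpoint"]
    print(ep["Address"], ep["Port"])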
Which is of course true, but it is true for all things. Provisioning a cluster in AWS takes a bit of research and learning, but so did learning how to set it up locally. I think most people who know how to do both will agree it is simpler to learn how to use the AWS version than learning how to self host it.
It would be so useful to have an EC2/S3/etc.-compatible API that maps to a homelab. Again, something that Claude should allegedly be able to vibecode given the breadth of documentation, examples, and discussions on the AWS API.
And before someone says Lightsail: it is not meant for high availability/infinite scale.
It's "only a few clicks" after you have spent a signficant amount of time learning AWS.
I haven't ever set up AWS Postgres and Redis, and I know it's more than a few clicks. There is simply basic information that you need to link between services, where it does not matter if it's cloud or hardware; you still need to do the same steps, be it from the CLI or a web interface.
And frankly, these days with LLMs, there's no excuse anymore. You can literally ask an LLM to do the steps and explain them to you, and you're off to the races.
> I don't have to worry about OS upgrades and patches
Single command and reboot...
> Or a highly available load balancer with infinite scale.
Unless you're Google, overrated ...
You can literally rent a load balancer from places like Hetzner for 10 bucks, and if you're old fashioned, you can even do DNS balancing.
Or you simply rent a server with 10x the performance of what Amazon gives you (for the same price or less), and you do not need a load balancer. I mean, for 200 bucks you rent a 48-core, 96-thread server at Hetzner... Who needs a load balancer again... You will do millions of requests on a single machine.
It costs people and automation.
> You virtually never need scalability so fast that you don't have time to add another server
What do you mean by “time to add another server?” Are you thinking about a minute or two to spin up some on-demand server using an API? Or are you talking about multiple business days to physically procure and install another server?
The former is fine, but I don’t know of any provider that gives me bare metal machines with beefy GPUs in a matter of minutes for low cost.
To someone like me, especially on solo projects, using infra that effectively isolates me from the concerns (and risks) of lower-level devops absolutely makes sense. But I welcome the choice because of my level of competence.
The trap is scaling an org by using that same shortcut until you're bound to it by built-up complexity or a persistent lack of skill/concern in the team. Then you're never really equipped to reevaluate the decision.
I've been in multiple cloud migrations, and it was always solving political problems that were completely self-inflicted. The decision was always reasonable if you looked just at the people in the org having to decide between the internal process and the cloud bill. But I have little doubt that if there was any goal alignment between the people managing the servers and those using them, most of those migrations would not have happened.
But we don't really need one-minute response times from the cloud. So something like Hetzner may be just all right: we'll get it to you within an hour. It's still light years ahead of where we used to be.
And if it makes the entire management and cost side better, with bare metal or close-to-bare-metal performance on the provider side, then that is all good.
And this doesn't even address the fact that, yeah, AWS has a lot of hidden costs, but a lot of those managed data center outsourcing contracts where you were subjected to those lead times for new servers... really weren't much cheaper than AWS back in the day.
A company should be able to offer 1-day service, or heck, 1-week, with their internal datacenters. Just keep a scheduled buffer of machines to power up and adapt the next week's/month's supply order based on requests.
The bureaucracy will always find a way.
But I definitely agree, it's usually a self-inflicted problem and the big gamble attempting to work around infrastructure teams. I've had similar issues with security teams when their out of the box testing scripts show a fail, and they just don't comprehend that their test itself is invalid for the architecture of your system.
The current “runners” are heading towards SaaS platforms like Salesforce, which is like the cloud but with ten times worse lock in.
We have a Service Now ticket that you can complete that spins the server up at completion. Kind of an easy way to do it.
Also, what network does the VM land in? With what firewall rules? What software will it be running? Exposed to the Internet? Updated regularly? Backed up? Scanned for malware or vulnerabilities? Etc…
Do you expect every Tom, Dick, and Harry to know the answers to these questions when they “just” want a server?
This is why IT teams invariably have to insert themselves into these processes, because the alternative is an expensive chaos that gets the org hacked by nation states.
The problem is that when interests aren’t forced to align — a failure of senior management — then the IT teams become an untenable overhead instead of a necessary and tolerable one.
The cloud is a technology often misapplied to solve a “people problem”, which is why it won’t ever work when misused in this way.
The first time you do it, you can do a consult with a cloud team member
And of course they get audited every quarter so usage is tracked
The cost for your first on-prem datacenter server is pretty steep...the cost for the second one? Not so much.
It's not, really. It just happens that when there is a huge bullshit hype out there, the people who fall for it regret it and come back to normal after a while.
Better things are still better. And this one was clearly only better for a few use-cases that most people should never have cared about in the first place.
I think there is a generational part as well. The ones of us that are now deep in our 40s or 50s grew up professionally in a self-hosted world, and some of us are now in decision-making positions, so we don't necessarily have to take the cloud pill anymore :)
Half-joking, half-serious.
On EBS it does at most 200MB/s of disk IO just because the EBS operation latency, even on io2, is about 0.5 ms - even though the disk can go much faster, and disk benchmarks can easily do multi-GB/s on nodes that have enough EBS throughput.
On an instance-local SSD on the same EC2 instance it will happily saturate whatever the instance can do (~2GB/s in my case).
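fio is the right tool for measuring this properly, but if you just want to see the latency effect, a quick-and-dirty Python sketch that fsyncs after every write (so per-operation latency rather than raw bandwidth dominates) makes the EBS vs. local-NVMe gap obvious. The mount points are placeholders:

    import os
    import time

    def sync_write_mib_per_s(path: str, total_mib: int = 256, chunk_kib: int = 128) -> float:
        """Write total_mib MiB in chunk_kib KiB chunks, fsync'ing after each
        write, so the result is dominated by per-operation latency."""
        chunk = os.urandom(chunk_kib * 1024)
        writes = (total_mib * 1024) // chunk_kib
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, chunk)
            os.fsync(fd)  # force the write down to the device before continuing
        elapsed = time.perf_counter() - start
        os.close(fd)
        os.unlink(path)
        return total_mib / elapsed

    print("EBS volume   :", sync_write_mib_per_s("/mnt/ebs/latency-test"), "MiB/s")
    print("instance NVMe:", sync_write_mib_per_s("/mnt/nvme/latency-test"), "MiB/s")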
We run a modest graph workload (relatively small dataset-wise but intense graph-edge-wise) on Neptune that costs us slightly under USD 600 per month – that is before the enterprise discount, so in reality we pay USD 450-500 a month. But we use Neptune Serverless, which bursts out from time to time, meaning the monthly charges are averaged out across the spikes/bursts. The monthly charges are for a serverless configuration of 3-16 NPUs.
Disk I/O stats are not available for Neptune, especially for serverless clusters, and they would not be insightful anyway. The transactions-per-second rate is what I look at.
My only “yes, but…” is that this:
> 50k API calls per second (on S3 that is $20-$250 _per second_ on API calls!).
kind of smells like abuse of S3. Without knowing the use case, maybe a different AWS service is a better answer?
Not advocating for AWS, just saying that maybe this is the wrong comparison.
Though I do want to learn about Hetzner.
But, yeah, there's certainly a solution to provide better performance for cheaper, using other settings/services on AWS.
In those cases, it is great to a) not get a shocking bill, and b) be able to somewhat support this atypical use until it can be remedied.
I'm honestly quite interested to learn more about the use case that required those 50k API calls!
I've seen a few cases of using S3 for things it was never intended for, but nothing close to this scale.
Then the CDN takes the beating. So this still sounds like S3 abuse to me.
But I leave room for being wrong here.
Edit: presumably if your site is big enough to serve 50k RPS it’s big enough for a cache?
You might not realize it, but you are actually strengthening the business case for AWS :-) Also, those hardware savings will be eaten away by two days of your hourly bill. I like to look at my project costs across all verticals...
Doubt it. I've personally seen AWS bills in the tens of thousands, he's probably not that costly for a day.
Biggest recent ones were ~200k and ~100k that we managed to lower to ~80k with a couple months of work (but it went back up again after I left).
I fondly remember lowering our Heroku bill from 5k to 2k back in 2016 after a day of work. Management was ecstatic.
That's where the real value lies. Not paying these usurious amounts.
I still think small-midsized orgs may be better off in cloud for security / operations cost optimization.
It is cheaper and easier for me to hire cloud-infrastructure-_capable_ people than a server _expert_. And a capable serverless cloud person is MUCH cheaper and easier to find.
You don't need to have 15 years of Linux experience to read a JSON/YAML blob about setting up a secure static website, or to figure out how to set up an S3 bucket and upload files... and another bucket for logging... And you have to go out of your way now to not be multi-AZ and to expose it to public read... I find most people can do this with minimal supervision and experience as long as they understand the syntax and can read the docs.
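To make that concrete, the whole "static site plus logging bucket" exercise is only a handful of calls. A rough boto3 sketch - bucket names and region are placeholders, and it leaves out the bucket policy / public-access-block changes needed for public read as well as the log-delivery permission the logging target bucket needs:

    import boto3

    s3 = boto3.client("s3", region_name="eu-west-1")

    # Content bucket with static-website hosting enabled.
    s3.create_bucket(
        Bucket="example-static-site",
        CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    )
    s3.put_bucket_website(
        Bucket="example-static-site",
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )

    # Separate bucket for access logs.
    s3.create_bucket(
        Bucket="example-static-site-logs",
        CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    )
    s3.put_bucket_logging(
        Bucket="example-static-site",
        BucketLoggingStatus={"LoggingEnabled": {
            "TargetBucket": "example-static-site-logs",
            "TargetPrefix": "access/",
        }},
    )

    # Upload the content.
    s3.upload_file("index.html", "example-static-site", "index.html",
                   ExtraArgs={"ContentType": "text/html"})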
The equivalent task of setting up a safe and secure server is a MUCH higher bar. What operating system will they pick? Will it be sized correctly? How are application logs offloaded? What are the firewall rules? What is the authentication / SSH setup? Why did we not do LDAP integration? What malware defense was installed? In the event of compromise, do we have backups? Did you set up an instance to gather offloaded system logs? What is the company policy going to be if this machine goes down at 3am? Do we have a backup? Did we configure failover?
I'm not trying to bash bare metal. I came from that space. I lead a team in the middle of nowhere (by comparison to most folks here) that doesn't have a huge pool of people with the skills for bare metal... but LOTS of people who can do competent serverless with just one highly technical supervisor.
This lets us hire competent coders, who are easier to find and can reasonably be expected to have or learn secure coding practices... When they need to interact with new serverless stuff, our technical person gets involved to do the templating necessary, and most minor changes are easy for coders to make (e.g. a line of JSON/YAML to toggle a feature).
As with everything, choose the right tool for the job.
If it feels expensive or risky, make a U-turn; you probably went off the rails somewhere, unless you're working on bleeding-edge stuff, and let's be honest, most of us are not.
This is why we decided to bundle engineering time with the infrastructure. We'll maintain the cluster as you say, and with the time left over (the majority) we'll help you with all your other DevOps needs too (CI/CD pipelines, containerising software, deploying HA Valkey, etc). And even after all that, it still costs less than AWS.
Edit: We also take on risk with the migration – our billing cycle doesn't start until we complete the migration. This keeps our incentives aligned.
They still use VMs, but as far as I know they have simple reserved instances, not “cloud”-like weather?
Is the performance better and more predictable on large VPSes?
(edit: I guess a big difference is that a VPS can have local NVMe that is persistent, whereas EC2 local disk is ephemeral?)
The rough part was that we had made hardware investments and spent almost a year setting up the system for HA and immediate (i.e. 'low-hanging fruit') performance tuning, and should then have turned to architectural and more subtle improvements. This was a huge achievement for a very small team that had neither the need nor the wish to go full clown.
I remember the point in my career when I moved from a cranky old .NET company, where we handled millions of users from a single cabinet's worth of beefy servers, to a cloud-based shop where we used every cloud buzzword tech under the sun (but mainly everything was containerized Node microservices).
I shudder thinking back to the eldritch horrors I saw on the cloud billing side, and the funny thing is, we were constantly fighting performance problems.
On the cloud it takes five seconds to get a new machine I can ssh into and I don't have to ask anyone for the budget.
You can save a lot of money with scaling, but you have to actually do that, and very few places do.
I totally agree about Azure being the worst of the three; they wanted us to commit to a certain usage before even buying hardware themselves. Crazy…
But I also had capacity issues with Google at large scales in many zones.
One of the places I worked that was on-prem enforced a "standard hardware profile" where the servers were all nearly the same except things that could be changed in house (like RAM sticks). When they ordered hardware, they'd order like 5% or 10% more than we thought we'd need to keep an "emergency pool".
If you ended up needing more hardware than you thought and could justify why you needed it right now, they'd dip into that pool to give you more hardware on a pretty rapid schedule.
It cost slightly more, but was invaluable. Need double the hardware for a week to do a migration? No problem. Product more popular than you thought? No problem.
Sure this is made worse by frugality, but I experienced this problem when virtualization was in its infancy, much less cloud anything even existing much less being popular.
There's also "cloud" as the API-driven world of managed services that drain your wallet faster than you can blink.
Excuse me CEO your budgeting process is inconvenient to my work, please change it all to be more responsive.
This is not how things work and not how changes are made. Unless you get into the C-suite, or at least become a direct report of someone who is, nobody cares and you're told to just deal with it. You can't negotiate because you're four levels of management away from being invited to the table.
An organization that can make agile capital expenditures in response to the needs of an individual contributor is either barely out of the founder's garage or a magical fairyland unicorn.
And once you're a customer you get to deal with sales channels full of quotes and useless meetings and specialists. You can't just order from the website.
Of course Node could not compete, and the cost had to be paid for each thinly sliced microservice carrying a heavy runtime alongside it.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
If anyone from oxide computer or similar is reading, maybe you should rebrand to BEEFY server inc...
idea for an ad campaign:
"mmm, beefy!" https://www.youtube.com/watch?v=j6ImwKMRq98&t=21s
I don't know how "worldwide" the distribution of Chef "Boyardee" canned spaghetti is (today it's not too good), but the founder's story is fairly interesting. He was a real Italian immigrant chef named Ettore Boiardi, and he gets a lot of credit for originally evangelizing and popularizing "spaghetti" and "spaghetti sauce" in America when most people had never tried it.
https://en.wikipedia.org/wiki/Ettore_Boiardi
you know, "spaghetti code..."?
If you suggested a word that would be a better substitute here, that could move the conversation forward, and perhaps you could improve the aesthetic quality of posts about leaving the cloud.
It also handled some databases and webservers on FreeBSD and Windows; I considered it better than Ansible.
Thousands and thousands of users depending on that hardware.
Extremely robust hardware.
I've found that it's almost impossible to even hire people who aren't terrified of the idea of self-hosting. This is deeply bizarre for someone who installed Linux from floppy disks in 1994, but most modern devs have fully swallowed the idea that cloud handles things for them that mere mortals cannot handle.
This, in turn, is a big reason why companies use cloud in spite of the insane markup: it's hard to staff for anything else. Cloud has utterly dominated the developer and IT mindset.
The answer to all of your questions is a hard "it depends". What are your engineering objectives? What are your business requirements? Uptime? Performance? Cost constraints and considerations? The cloud doesn't take away the need to answer these questions; it's just that self-hosting actually requires you to know what you are doing, versus clicking a button and just hoping for the best.
But that said, you can afford a lot more hardware if you’re not using RDS, so the tuning doesn’t need to be perfect.
Being a bit obtuse to tune doesn't really justify going all-in on cloud. It's all there in the documentation.
Are y'all hiring? [1]
I did 15 months at AWS and consider it the worst career move of my life. I much prefer working with self-hosting, where I can actually optimize the entire hardware stack I'm working with. Infrastructure is fun to tinker with. Cloud hosting feels like a miserable black box that you dump your software into and "hope".
Funny, I couldn't find a new job for a while because I had no cloud experience, finally and ironically I got hired at AWS. Every now and then these days I get headhunters unsure about my actual AWS experience because of my lack of certifications.
If they had migrated to a bare metal solution they would certainly have enjoyed an even larger increase in perf and decrease in costs, but it makes sense that they opted for the cloud offering instead given where they started from.
the devil is in the details, as they say.
It is worth pointing out that if you look beyond the nickel-and-diming US cloud providers, you will very quickly find many S3 providers who don't charge you for API calls, just for the actual data-shifting.
Ironically, I think one of them is Hetzner's very own S3 service. :)
Other names IIRC include Upcloud and Exoscale... but it's not hard to find with the help of Mr Google; most results for "EU S3 provider" will likely have a similar pricing model.
P.S. Please play nicely and remove the spam from the end of your post.
And now Nvidia is in the game for server CPUs, which means much faster time to market for PCIe in the future, and better x86 CPU implementations as well as ARM variants.
The AWS documents clarify this. When you get 1 vCPU in a Lambda you're only going to get up to 50% of the cycles. It improves as you move up the RAM:CPU tree but it's never the case that you get 100% of the vCPU cycles.
And you are still charging half of what AWS does, in which case I am just doing the work myself if I really think AWS is too expensive.
The topic of paying hefty amounts of money to AWS when other options are available has been discussed many times before.
My view of AWS is that you have bazillions of things you might never use but still need to learn about, you are tied to a company across the Atlantic that can basically shut you down anytime they want for whatever reason, and finally there's the cost.
Any advice on price / performance / availability is meaningless unless you explain where you're coming from. The reason we see people overcomplicating everything to do with the web is that they follow advice from people with radically different requirements.
TL;DR: Think of hosting providers like a pricing grid (DIY, Get Started, Pro, Team, Enterprise) and if YAGNI, don't choose it.
Or they've had cloud account managers sneaking into your C-suite's lunchtime meetings.
Other comments in this thread say they get directives to use AWS from the top.
Strangely that directive often comes with AWS's own architects embedded into your team and even more strangely they seem to recommend the most expensive server-less options available.
What they don't tell you is that you'll be rebuilding and redeploying your containerised app daily with new Docker OS base images to keep up with the security scanners, just like patching an OS on a bare metal server.
you don't need that in 99.9999% of cases.
Was trying to find a good one for 30B quants but there’s so many now and the pricing is all over the place.
A great deal of the work in cloud engineering is ensuring the abstractions meet the service guarantees. Similarly you can make a car much cheaper if you don't need to guarantee the driver will survive a collision. The cost of providing a safety guarantee is much higher than providing a hand-wavy "good enough" feeling.
If your business isn't critical then "good enough" vibes may be all you need, and you can save some money.
If you need it, you soon wish the lego blocks pulled IAM all the way through and worked with a common API.
You can add redundant machines with a failover. You then need to calculate how likely the failover is to fail, how likely the machines are to fail, etc. How likely is the switch to fail. You need engineers with pager rotations to give 24 hour coverage, etc.
What I'm saying is that the cloud providers give you strong guarantees and this is reflected in their pricing. The guarantees apply to every service you consume because with independent failures, the probability of not failing is multiplicative. If you want to build a reliable system out of N components then you need to have bounds on the reliability of each of the components.
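To make that multiplicative point concrete, here's a back-of-envelope sketch in Python (the availability figures are made up purely for illustration):

    # Independent components in series multiply their availabilities;
    # redundant replicas in parallel multiply their *failure* probabilities.
    def series(*avail):
        p = 1.0
        for a in avail:
            p *= a
        return p

    def parallel(*avail):
        fail = 1.0
        for a in avail:
            fail *= 1.0 - a
        return 1.0 - fail

    box = series(0.999, 0.9995, 0.999)        # machine, switch, disk: ~99.75%
    pair = series(parallel(box, box), 0.999)  # two boxes behind a 99.9% failover
    print(f"single box: {box:.5f}  redundant pair: {pair:.5f}")
    # ~0.99750 (roughly 22 hours/year down) vs ~0.99899: the failover itself
    # becomes the limiting factor, which is exactly why the guarantees cost money.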
Your business may not need this. Your side project almost certainly doesn't. But the money you save isn't free, it's compensation for taking on the increased risk of downtime, losing customer data, etc.
I would be interested to see a comparison of the costs of running a service on Hetzner with the same reliability guarantees as a corresponding cloud service. On the one hand we expect some cloud service markup for convenience. On the other hand they have economies of scale. So it's not obvious to me which is cheaper.
and yet, they go offline all the time.
This setup is probably also easier to reason about and easier to make secure than the messy garbage pushed by Amazon and other cloud providers.
People see Cloud providers with rose-colored glasses, but even something like RDS requires VPCs, subnets, route tables, security groups, Internet/NAT gateways, lots of IAM roles, and CloudWatch to be usable. And to make it properly secure (meaning: not just sharing the main DB password with the team) you need way more as well, and it's hard to orchestrate, it's not just an option in a CloudFormation script.
Sure securing a server is hard too, but people 1. actually share this info and 2. don't have illusions about it.
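To make the plumbing above concrete, here's a hedged sketch in boto3 of roughly what a "just give me Postgres" RDS instance drags in; all names, CIDRs and sizes are illustrative, and this still leaves out route tables, gateways, IAM roles and CloudWatch:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")
    rds = boto3.client("rds", region_name="eu-west-1")

    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
    subnet_ids = [
        ec2.create_subnet(VpcId=vpc_id, CidrBlock=cidr,
                          AvailabilityZone=az)["Subnet"]["SubnetId"]
        for cidr, az in [("10.0.1.0/24", "eu-west-1a"),
                         ("10.0.2.0/24", "eu-west-1b")]
    ]

    sg_id = ec2.create_security_group(
        GroupName="db-only", Description="Postgres from app subnet only",
        VpcId=vpc_id,
    )["GroupId"]
    ec2.authorize_security_group_ingress(
        GroupId=sg_id,
        IpPermissions=[{
            "IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
            "IpRanges": [{"CidrIp": "10.0.1.0/24"}],
        }],
    )

    rds.create_db_subnet_group(
        DBSubnetGroupName="app-db",
        DBSubnetGroupDescription="DB subnets across two AZs",
        SubnetIds=subnet_ids,
    )
    rds.create_db_instance(
        DBInstanceIdentifier="app-db",
        Engine="postgres",
        DBInstanceClass="db.t4g.medium",
        AllocatedStorage=50,
        MasterUsername="app",
        MasterUserPassword="change-me",   # in practice: Secrets Manager, so more IAM
        DBSubnetGroupName="app-db",
        VpcSecurityGroupIds=[sg_id],
        PubliclyAccessible=False,
    )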
Ability to do anything doesn't mean do everything.
It's straightforward to be simple on AWS, but if you have trouble denying yourself, consider Lightsail to start: https://aws.amazon.com/lightsail/
Low Cost                                             High Cost
==============================================================
FARM       WHOLESALER     GROCERY     RESTAURANT     DOORDASH
BUILD      CO-LOCATION    HETZNER     AWS            VERCEL
While it's not a perfect analogy, in principle it holds true. As such, it should come as no surprise that eating at a restaurant every day is going to be way more expensive.
There are different tiers of restaurants.
There are the luxury premium restaurants (Michelin-starred, like AWS), but there are also local diners that arguably have phenomenal food too (maybe someone like DigitalOcean/Linode).
I hadn't seen your comment when I wrote this, below: https://news.ycombinator.com/item?id=45616366
I love your farm-to-table grid: works for everyone not just HN commenters. And putting DOORDASH on the right is truer from cost perspective than the metaphor I'd used.
For HN, I'd compared to a pricing grid (DIY, Get Started, Pro, Team, Enterprise) with the bottom line that if YAGNI, don't choose it.
Your grid emphasizes my other point, it's about your own labor.
We are unfortunately moving away from self-hosted bare metal. I disagree with the transition to AWS. But it's been made several global layers above me.
It's funny our previous AWS spend was $800 per month and has been for almost 6 years.
We've just migrated some of our things to AWS and the spend is around $4,500 per month.
I've been told the company doesn't care until our monthly bill is in excess of five figures.
None of this makes sense to me.
The only thing that makes sense is our parent company is _huge_ and we have some really awesome TAMs and our entire AWS spend is probably in the region of a few million a month, so it really is pennies behind the sofa when global org is concerned.
- client confidence
- labor pool
OpEx good, CapEx bad.
What country is it that applies a 400% income tax to companies?
(Well, seriously, it makes sense at a tax rate higher than 80%. Not that impossibly high, but I doubt any country ever had it.)
Although for our latest app we've switched to using local PostgreSQL (i.e. app/RDBMS on the same server) with R2 backups, for its better feature set; the cost is the same since we only pay for the one Hetzner VM, and Cloudflare R2 storage is pretty cheap.
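For anyone curious what "R2 backups" can look like in practice, a minimal sketch; R2 speaks the S3 API, so boto3 with endpoint_url works, and the bucket, endpoint and credentials below are placeholders:

    import subprocess
    from datetime import datetime, timezone

    import boto3

    BUCKET = "app-backups"                                          # placeholder
    R2_ENDPOINT = "https://<account-id>.r2.cloudflarestorage.com"   # per-account endpoint

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_path = f"/tmp/app-{stamp}.dump"

    # Custom-format dump: compressed, and selectively restorable with pg_restore.
    subprocess.run(["pg_dump", "-Fc", "-f", dump_path, "app"], check=True)

    s3 = boto3.client(
        "s3",
        endpoint_url=R2_ENDPOINT,
        aws_access_key_id="R2_ACCESS_KEY_ID",          # placeholders
        aws_secret_access_key="R2_SECRET_ACCESS_KEY",
    )
    s3.upload_file(dump_path, BUCKET, f"postgres/app-{stamp}.dump")

Schedule it with cron or a systemd timer and you have the nightly dump side covered.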
Perspective: this difference is one hour of US fintech engineer time a month. If you have to self-build even a single thing on Hetzner that you get as "built-in" on AWS, are you ahead?
If this is your price range, and you're spending time thinking about how to save that $400/month (three Starbucks a day) instead of driving revenue or delivering client joy, you likely shouldn't be on AWS in the first place.
AWS is for when you need the most battle tested business continuity through automations driving distributed resilience, or if you have external requirements for security built into all infra, identity and access controls built into all infra at all layers, compliance and governance controls across all infra at all layers, interop with others using AWS (private links, direct connects, sure, but also permission-based data sharing instead of data movement, etc.). If your plans have those in your future, you should start on AWS and learn as you grow so you never have a "digital transformation" in your future.
Whether you're building a SaaS for others or a platform for yourself, “Enterprise” means more than just SSO tax and a call us button. There can be real requirements that you are not going to be able to meet reasonably without AWS's foundational building blocks that have this built in at the lego brick level. Combine that with "cost of delay" to your product and "opportunity cost" for your engineering (devs, SREs, users spending time doing undifferentiated heavy lifting) and those lego blocks can quickly turn out less expensive. Any blog comparing pricing not mentioning these things means someone didn't align their infra with their business model and engineering patterns.
Put another way, think of the enterprise column in the longest pricing grid you've ever seen – the AWS blocks have everything on the right-most column built in. If you don't want those, don't pick that column. Google and Azure are in the Team column second from right. Digital Ocean, CloudFlare, the Pro column third from right. Various Heroku-likes in the Getting Started column at the left, and SuperMicro and Hetzner in the Self-Host column, as in, you're buying or leasing the hardware either way, it's just whose smart hands you're using. ALL of these have their place, with the Getting Started and Pro columns serving most folks on HN, Team best for most SMB, and Enterprise best for actual enterprise but also Pro and Team that need to serve enterprise or intend to grow to that.
Note that if you don't yet need an enterprise column on your own pricing grid, K8s on whoever is a great way to Get Started and go Pro yourself while learning things needed for continuous delivery and system resilience engineering. Those same patterns can then be shifted onto the Team and Enterprise column offerings from the big three (Google, Azure, AWS).
Here's my TL;DR blog post distilling all this:
If YAGNI, don't choose it.
As for the rest of your comment, personally I see it more as a pitch to use AWS than a conversation about whether everyone really needs that enterprise tier. Me, I'd prefer to control as much of my infra as possible rather than offloading it to others for an insane price tag.
But really, if DIY, someone's got to actually have it meet SLOs and SLAs. So you need a person or two, which is when those hours add up.
These days housing and providing benefits for an employee can add 50% to 100% overhead, depending on firm efficiency. So $400/hr means $800k/yr (because 40 hrs x 50 weeks = 2,000 hours, times the rate), but half of that can be considered overhead (recruiting, real estate, benefits, training, vacation, "management" when some number of headcount requires adding a lead or manager who is expensive overhead), so it's really $400k a year, which is not out of line at firms with regulatory requirements.
Anyway, if your workload is critical, you can't have only one, so call it 2 at 200k. Point is, when all these things matter, GCP/Azure/AWS isn't the thing that stands out.
---
> As for the rest of your comment, personally I see it more as a pitch to use AWS
Re AWS, I thought I was clear:
If YAGNI, don't choose it.
I'd guesstimate a 2x increase in their operational complexity. So, if they previously required half of a full-time DevOps engineer, they'll now need one more DevOps full-timer just to handle the added complexity.
Does that make sense to you?
I get that it's their business and they can do as they please with it; however, maybe tell me before I create an account that you don't accept accounts from my continent.
It sucks for legitimate customers, but you can sometimes plead your case directly as long as you are willing to provide id and such, but ultimately like you say, it's their business.
I want to move our infra out of AWS but at the end of the day we have too much data there and it is a non starter.
So some Kubernetes experts migrated to AWS for $1k in credits. This is madness. That's weeks of migration work to save the equivalent of a day of contracting.
If you need it, use it, if you don't need it, don't use it. It's not the big revelation people seem to think it is.
(The obvious argument about how it might pay off more in the future are dependent on the startup surviving long enough for that future to arrive.)
My parent company (Healthcare) uses all on prem solutions, has 3 data centers and 10 sys admins just for the data centers. You still need DevOps too.
I don't know how much it would cost to migrate their infra to AWS, but ~ $1.3M (salary) in annual spend buys you a ton of reserved compute on AWS.
$1.3M buys 6,000 CPU cores and 10 TiB of RAM 24/7, with 100 TB of storage.
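For what it's worth, that figure roughly checks out if you assume an effective reserved rate of about $0.025 per vCPU-hour (my assumption, not the parent's; real rates vary by instance family, region and commitment term):

    vcpus = 6000
    rate_per_vcpu_hour = 0.025        # assumed effective reserved rate
    annual = vcpus * rate_per_vcpu_hour * 24 * 365
    print(f"${annual:,.0f}/year")     # ~$1,314,000, in line with the $1.3M above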
I know for a fact that due to redundancy they have nowhere near that, AND they have to pay for Avamar, VMware, etc. (~$500k).
There's no way it's cheaper than AWS, not even close.
So sure, someone's self-hosted phpBB forum doesn't need to be on AWS, but I challenge someone to run a 99.99% uptime infra significantly cheaper than the cloud.
But monetarily, even for a startup, $400/month savings is something you shouldn't be pouring the equivalent of $5000 (or more, just picking a reasonable concrete number to anchor the point) into. You really need to solve a $400/month problem by putting your time into something, anything that will promote revenue growth sooner and faster rather than optimizing that particular cost.
Also, 6000 CPU "cores" on the cloud is more like 3000 CPU cores. Which you can get in just 20-50 servers. This is in the range of something that could be taken care of as a part time job.
My point is, when people compare cloud to on prem they use a hypothetical on-prem installation vs a realistic actually working cloud deployment.
We only see these blog posts for things that are just 1-2 servers.
Very few companies are fully on-prem and saving a lot of money, they typically have very specific use cases like high bandwidth or IO usage.
But that cost difference is huge...
It is an interesting tradeoff to consider, I think (I'm not criticizing either Hetzner or AWS or any team's decision, provided they've thought the tradeoffs through).
You stop worrying about S3 vs EFS vs FSx, or Lambda cold starts, or EBS burst credits. You just deploy Docker stacks on a fast NVMe box and it flies. The trade-off is you need a bit more DevOps discipline: monitoring, backups, patching, etc. But that's the kind of stuff that's easy to automate and doesn't really change week to week.
At Elestio we leaned into that simplicity: we provide fully managed open-source stacks for nearly 400 pieces of software and also cover CI/CD (from Git push to production) on any provider, including Hetzner.
More info here if you're curious: https://elest.io
(Disclosure: I work at Elestio, where we run managed open-source services on any cloud provider including your own infra.)
We support Postgres, but also MySQL, Redis, OpenSearch, ClickHouse and many more.
As for backups, we offer differential snapshots and regular dumps that you can send to your own S3 bucket.
https://docs.elest.io/books/databases/page/deploy-a-new-clus...
One take in the article that felt not quite right, however, is the price-saving part, the claim of 1/4 of the price. I was expecting to see an AWS bill in the range of $10k/month or more, but it turned out to be just around ~$550, for a total saving of $420.
With the above said, it really makes me question whether it's worth the hassle of the migration, because, probably, one of the main reasons to move away from AWS is to save on cost.
Finally, let me conclude with this comment from /r/programminghumor:
You're not a real engineer until you've accidentally sponsored Amazon's quarterly earnings
The biggest downside to hetzner only is that it’s really annoying to wrangle shell scripts and GitHub actions to drive all the automations to deploy code.
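For what it's worth, the glue usually stays small. A sketch of the kind of deploy script (or the body of a GitHub Actions job) that tends to accumulate, assuming SSH access to the box and a docker compose project; the host and path are made up:

    import subprocess

    HOST = "deploy@my-hetzner-box"   # hypothetical SSH alias
    APP_DIR = "/opt/myapp"           # hypothetical path on the server

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Sync the working tree (minus junk) to the server, then rebuild and
    # restart the compose stack in place.
    run(["rsync", "-az", "--delete",
         "--exclude", ".git", "--exclude", "node_modules",
         "./", f"{HOST}:{APP_DIR}/"])
    run(["ssh", HOST,
         f"cd {APP_DIR} && docker compose pull && "
         "docker compose up -d --build --remove-orphans"])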
The Portainer team recently started sponsoring the project, so I've been able to dedicate a lot more time to it, close to full time.
Cloud was a reaction to overlong procurement timelines in company-managed DCs. This is still a thing; it can still take half a year to get a server into a DC!!
However, probably 99% of use cases don't need servers in your own DC; they work just fine on a rented server.
One thing though: a rented server can still have hardware failure, and it needs to be fixed, so deployment plans need to take that into account. Fargate will do that for you.
I wasted hours on this, and the moment RDS started to support the Postgres version we needed, everything was much easier.
I still remember staying up till 3:00 a.m. installing postgres, repeatedly.
While this article is nice, they only save a few hundred dollars a month. If a single engineer has to spend even an hour a month maintaining this, it's probably going to be a wash.
And that's assuming everything goes right; the moment something goes wrong you can easily wipe out a year's savings in a single day (if not an hour, depending on your use case).
This is probably best for situations where your time just isn't worth a whole lot. For example let's say you have a hobbyist project, and for some reason you need a very large capacity server.
This can easily cost hundreds of dollars a month on AWS, and since it's coming out of your own pocket it might be worth it to spend that extra time on bare metal.
But, at a certain point you're going to think about how much your time is really worth. For example, and forgive me for mixing up terms and situations, a Ghost blog is about $10 a month via their hosted solution. You can probably run multiple Ghost blogs on a single Hetzner instance.
But, and maybe it was just my luck, eventually it's just going to stop working. Do you feel like spending two or three hours fixing something over just spending the $20 a month to host your two blogs?
If it were only from AWS, they would probably also have mentioned a drastic reduction in API complexity.
Now #1 on HN. Destiny.
found this little bit buried near the end. all that glitters is not gold, i guess
I am in the US; I would use Hetzner just the same, but not to save a few bucks here and there.
The claim here is that their cloud bills went down. Nowhere is the cost of engineering and support mentioned, and that will increase their overall cost.
When you are a startup, you don't have a lot of headcount. You should be using your headcount to focus on your product. The more things you do that eat up your headcount's time, the less time you have to develop your product.
You need a balance between "dirt-cheap cost" and labor-conserving. This means it's a good idea to pay a little bit of a premium, to give you back time and headcount.
> How much does a 2x CPU, 4 GiB RAM container cost on AWS Fargate? Just over $70/month. We run two worker instances, which need these higher resources, along with smaller web instances and all the other infrastructure needed to host an application on AWS (load balancer, relational DB, NAT gateway, ...). All together, our total costs for two environments of tap grew to $449.50/month.
The classic AWS mistake: they're talking retail prices. AWS gives every customer multiple opportunities to lower costs below this retail price. You don't have to be an enterprise. In fact, you can ask them directly how to save money, and they'll tell you.
$70/month is the retail price of Fargate on-demand (for that configuration). Using a Compute Savings Plan[1], you can save 50% of this cost. Additionally, if you switch from x86 to ARM for your Fargate tasks, the cost lowers an additional 25%. So with Graviton and a Compute Savings Plan, their Fargate price would have been $26.25/month, a 62.5% savings.
And there's more to save. Spot instances are discounted at 70%. NAT Instances save you money over NAT Gateways. Scaling your tasks to zero when they're not needed cuts costs more. Choose the cheapest region to pay the least.
[1] https://repost.aws/questions/QUVKyRJcFPStiXYRVPq-KU9Q/saving...
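Replaying that arithmetic on the $70/month retail figure:

    retail = 70.00
    after_savings_plan = retail * (1 - 0.50)          # Compute Savings Plan: ~50% off
    after_graviton = after_savings_plan * (1 - 0.25)  # ARM/Graviton: ~25% further
    print(after_savings_plan, after_graviton)         # 35.0 and 26.25, ~62.5% below retail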
Been on the same shared hosting platform for 15+ years, and the hardware's load average dropped to ~16% on a 64-core Epyc w/ 512 GB RAM. Easily handles half-million-unique bursts without breaking a sweat.
You've gotta hand it to Amazon for their strategy.
Can you host your own open-source object storage, key vault, VPN, queue service, container registry, logging, your own Postgres/MySQL? Sure, but you will need to research what is best and what is supported, keep it maintained, make sure to update it and that those updates don't break anything, wake up in the middle of the night when it breaks, and make sure it's secure. And you would still need to handle access control across those services. And you would still need a third-party service for DDoS protection, likely a CDN too. And you would likely need some identity provider.
> minimum resource requirements for good performance to be around 2x CPUs and 4 GiB RAM
This is less compute than I regularly carry in my pocket? And significantly less than a Raspberry Pi? Why is Fargate that expensive?
Newsflash: no one has 100% uptime, and your over-equitied startup is just burning other people's money with no clue, as per usual.
Have been very happy with the setup. It also hosts a StackGres cluster that backs up to GCS. Plenty of compute for the price.
Been using this approach for the past few years, and if something gets bigger, I move the container to Fly.io or a different k8s cluster in a couple of hours max.
On my bigger k8s I can then easily add more nodes or scale up pods depending on need, and scale them back down when idle.
Still the main issue with any setup I see is the database. No matter what I use I’d either have a managed Postgres somewhere, or something like litestream, and if that’s not in the same data center it’s gonna add latency sadly
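The latency point is easy to measure for your own setup. A small sketch, assuming psycopg2 is installed; the DSN is a placeholder for whatever managed Postgres you're testing against:

    import time
    import psycopg2

    conn = psycopg2.connect("postgresql://app:secret@db.example.net:5432/app")
    cur = conn.cursor()

    # Time the round trip of a trivial query a few dozen times.
    samples = []
    for _ in range(50):
        t0 = time.perf_counter()
        cur.execute("SELECT 1")
        cur.fetchone()
        samples.append((time.perf_counter() - t0) * 1000)

    samples.sort()
    print(f"median round trip: {samples[len(samples) // 2]:.2f} ms")
    # Same-DC this is usually well under a millisecond; cross-DC it can be tens
    # of ms, which multiplies across chatty ORMs issuing many small queries.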
How many dedicated servers do you need to run to afford losing one of them to a hardware failure? What is the cost and lead time for a replacement? How much overprovisioning do you need to do, and how well ahead, in anticipation of seasonal spikes, or a wave of signups from a marketing campaign?
No servers, no VMs, no containers, just our code to focus on.
Amazon gets far too greedy, particularly when you need egress.
Also, an "Amazon core" is like 1/8th of a physical CPU core.
Clearly, when Amazon realised the enormous potential in AWS, they scrapped that principle. But the idea behind it, that an organisation used to fat margins will not be able to adapt in the face of a competitor built from the ground up to live off razor-thin margins, still applies.
AWS is ripe for the picking. They "can't" drop prices much, because their big competitors have similar margins, and a price war with them would devastate the earnings of all of them no matter how much extra market share they were to win.
The challenge is the enormous mindshare they have, and how many people are emotionally invested even in believing AWS is actually cost effective.
Yup, that phrase was running through my head as I skimmed the comments.
To that, an interesting observation I've made is that the frequency of their service price cuts has dropped in the past several years. And instances of price increases have started to trickle in (like the public IP cost).
If core compute and network keep getting cheaper faster than inflation, and they never drop their prices (or drop them by less relatively) the margins are growing.
If you're paying more than a few hundred k/year (worth starting to try below that; success rates will vary greatly) and are still paying the list prices, you might as well set fire to money.