AWS to bare metal two years later: Answering your questions about leaving AWS

https://oneuptime.com/blog/post/2025-10-29-aws-to-bare-metal-two-years-later/view

540•ndhandala•11h ago

Comments

alyxya•11h ago

With AI making it possible to use natural language to modify code, bare metal can make things easier to use with your own code and customization. Abstractions tend to be harder to reason about and have more limited functionality in exchange for being easier to get started on some standard setup.

JCM9•11h ago

For smaller operations I’d still go with a rent-a-server model with AWS. There’s a critical mass, though, where roll-your-own makes sense.

The long-term app model on the market is shifting much more towards buying services vs renting infrastructure. It’s here where the AWS case falls apart, with folks now buying PlanetScale vs RDS, buying Databricks over the mess that AWS puts out for data lakes, working with model providers directly vs the headaches of Bedrock. The real long-term threat is that AWS continues to whiff on all the other stuff and gets reduced to a boring rent-a-server shop that market forces will drive to be very low margin.

Yes a lot of those 3rd party services will run on AWS but the future looks like folks renting servers from AWS at 7% gross margin and selling their value-add service on top at 60% gross margin.

cmiles8•11h ago

A bunch written about this recently by analysts. That is the “bear” outlook on AWS

ramon156•11h ago

This doesn't really explain why you wouldn't just get a Hetzner. I don't have much experience with either, but if you know how to set up your infra then Hetzner seems like a no-brainer? I do not want to be tied to AWS where I have no idea what my bill will be.

JCM9•10h ago

Depending on the use case you very much could just use Hetzner. A simpler and more transparent customer experience than trying to navigate the mass complexity of AWS for basic stuff.

nik736•10h ago

It's an interesting article, thanks for that.

What people forget about the OVH or Hetzner comparison is that the entry servers they are known for (think the Advance line with OVH or AX with Hetzner) come with some drawbacks.

The OVH Advance line for example comes without ECC memory, in a server, that might host databases. It's a disaster waiting to happen. There is no option to add ECC memory with the Advance line, so you have to use Scale or High Grade servers, which are far from "affordable".

Hetzner per default comes with a single PSU, a single uplink. Yes, if nothing happens this is probably fine, but if you need a reliable private network or 10G this will cost extra.

jammo•10h ago

Yes, but there are dedicated server providers who offer dual PSU, ECC RAM, etc. It's more expensive though: e.g. a 24-core Epyc with 384GB RAM and dual 10G network is like $500/month (though there are smaller servers on serversearcher.com for other examples).

vjerancrnjak•10h ago

Is there software that works without ECC RAM? I think most popular databases just assume memory never corrupts.

torginus•8h ago

I'm pretty sure they keep internal checksums at various points to make sure the data on disk is intact, and so does the filesystem. I think they can catch when memory corruption occurs, and can roll back to a consistent state (you still get some data loss).

But imo, systems like these (like the ones handling bank transactions) should have a degree of resiliency to this kind of failure, as any hw or sw problem can cause something similar.

lossolo•9h ago

These concerns are exaggerated. I've been running on Hetzner, OVH and friends for 20 years. During that time I've had only two issues, one about 15 years ago when a PSU failed on one of the servers, and another a few years ago when an OVH data center caught fire and one of the servers went down. There have been no other hardware issues. YMMV.

hedora•7h ago

They matter at scale, where 1% issues end up happening on a daily or weekly basis.

For a startup with one rack in each of two data centers, it’s probably fine. You’ll end up testing failover a bit more, but you’ll need that if you scale anyway.

If it’s for some back office thing that will never have any load, and must not permanently fail (eg payroll), maybe just slap it on an EC2 VM and enable off-site backup / ransomware protection.

ghaff•5h ago

Wasn't my product as a product manager but my long-ago company came out with an under the desk minicomputer product for distributed sites. And they didn't use ECC memory in the design. The servers didn't fail very often but multiply that fairly low error rate by a large number of servers and a system failure was happening every few days or so. The customer wasn't happy.
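To put rough numbers on the "multiply a low error rate by a large number of servers" point, here is a minimal sketch. The fleet size and per-server failure rate are made-up illustrative values, not figures from this thread, and failures are assumed to be independent.

```python
def prob_at_least_one_failure(servers: int, p_fail_per_day: float) -> float:
    """Chance that at least one server in the fleet fails on a given day,
    assuming independent failures with the same daily probability."""
    return 1.0 - (1.0 - p_fail_per_day) ** servers


def mean_days_between_failures(servers: int, p_fail_per_day: float) -> float:
    """Average gap between failures anywhere in the fleet: the inverse of
    the expected number of failures per day."""
    return 1.0 / (servers * p_fail_per_day)


# Hypothetical inputs: each box fails about once every ten years,
# and there are 1,000 of them deployed.
FLEET = 1000
P_DAILY = 1 / 3650

if __name__ == "__main__":
    print(f"P(any failure today): {prob_at_least_one_failure(FLEET, P_DAILY):.3f}")
    print(f"Mean days between failures: {mean_days_between_failures(FLEET, P_DAILY):.2f}")
```

With those assumed inputs a box that fails once a decade still produces a fleet-wide failure every few days, which is the pattern described above.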

torginus•8h ago

I never understood the draw of 'server-grade hardware'. Consumer hardware fails rarely enough that you could 2x your infra and still be paying less.

hedora•7h ago

Their current advance offerings use AMD EPYC 4004 with on-die ECC. I can’t figure out if it’s “real” single correction double detection, or if the data lines between the processor and dimms are protected or not though.

nik736•6h ago

It's only on-die ECC not real ECC

montecarl•6h ago

I can't believe how affordable Hetzner is. I just rented a bare metal 48 core AMD EPYC 9454P with 256 GB of RAM and two 2 TB NVMe SSDs for $200/month (or $0.37 per hour). It's hard to directly compare with AWS, but I think it's about 10x cheaper.

titanomachy•5h ago

Wow. Probably performs better too, with a recent CPU and non-"elastic" disk. What about ingress/egress?

aetherspawn•10h ago

Ok but what about a dedicated OVH for example? Those are about 70% cheaper than AWS, so is it still worth it to colo?

bilekas•10h ago

Did you read the article? The main point of this and the prior article is that YES colocation/baremetal IS a better option for this company (and I would argue the majority of AWS users)

reference : https://news.ycombinator.com/item?id=38294569

bilekas•10h ago

I'm so surprised there is so much pushback against this.. AWS is extremely expensive. The use cases for setting up your system or service entirely in AWS are more rare than people seem to realise. Maybe I'm just the old man screaming at cloud (no pun intended) but when did people forget how to run a baremetal server ?

> We have 730+ days with 99.993% measured availability and we also escaped AWS region wide downtime that happened a week ago.

This is a very nice brag. Given they are using their DDoS protection ingress via Cloudflare there is that dependency, but in that case I can 100% agree that DNS and ingress can absolutely be a full-time job. Running some microservices and a database absolutely is not. If your teams are constantly monitoring and adjusting them, such as scaling, then the problem is the design. Not the hosting.

Unless you're a small company serving up billions of heavy requests an hour, I would put money on the bet AWS is overcharging you.
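For a sense of scale, the 99.993% over 730+ days quoted above can be converted into a downtime budget. This is a minimal sketch that assumes the figure is measured over exactly 730 full days of wall-clock time; the article may measure it differently.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_pct: float, days: float) -> float:
    """Total minutes of unavailability implied by an availability
    percentage over a period of the given length."""
    total_minutes = days * MINUTES_PER_DAY
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # 99.993% over 730 days is roughly 74 minutes in total.
    print(f"{downtime_minutes(99.993, 730):.1f} minutes of downtime")
```

That is a little over an hour of downtime spread across two years.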

JCM9•10h ago

As the author points out AWS can provide a few things that you wouldn’t want to try and replicate (like CloudFront) but for most other things you’re very much correct. AWS is ultimately very expensive for what it is. The complicated billing that’s full of surprises also makes cost management a head-banging experience.

tyingq•10h ago

Fair, though using AWS solely for CloudFront would mean you should compare to Cloudflare, Akamai, Fastly, etc. I'm not sure if the value prop for it looks so great if you don't include the "integrated with your other AWS stuff" benefit.

vidarh•9h ago

I mean, AWS egress is so expensive that I'd put something else in front of it for anyone who has any decent amount of traffic.

JCM9•8h ago

Agree, CloudFront isn’t super competitive with CDN focused vendors. It’s basically the “well you’re already on AWS so may as well just use this” play.

jagged-chisel•10h ago

Forget? You have to hire people for that. We are a software organization. We build software. If we rent in the cloud, there is less HR hassle - hiring, raises, bonuses, benefits, firing … none of that headache involved with the cloud.

Technically? Totally doable. But the owners prefer renting in the cloud over the people-related issues of hiring.

bilekas•10h ago

> Forget? You have to hire people for that. We are a software organization. We build software.

You don't need to hire dedicated people full time. It could even be outsourced and then a small contract for maintenance.

It's the same argument you could say for "accounting persons", or "HR persons" - "We are a software organisation!" - Personally I don't buy the argument.

Foobar8568•10h ago

Outsourcing and cloud cost are always underestimated.

theta_d•10h ago

> It could even be outsourced and then a small contract for maintenance.

Yeah, those people we outsourced to happen to work at AWS.

vidarh•9h ago

They don't though. You still need devops when you use AWS, and most organisations end up needing more time spent on devops when they use AWS.

chasd00•10h ago

Every company I’ve consulted for has hired a team dedicated to just setting up and monitoring AWS for the software devs. Hell, you’d probably reduce headcount running on bare metal.

hobs•10h ago

I have spent about 1 day waiting for every 5 days doing stuff at my last 3 jobs all of which were growing companies thinking that they needed the power of the cloud, but they sure as hell were not paying to make it fast or easy to use.

Pay some "devops" folks and then underfund them and give them a mandate of all ops but with less people and also you need to manage the constant churn of aws services and then deal with normal outages and dumb dev things.

papichulo2023•9h ago

Pretty much this. Most companies have the "devops" folks fully dedicated to maintaining the cloud stuff.

JackSlateur•7h ago

In more than 15 years of experience, in various companies, the number of people who can build and run an on-premise infrastructure sanely can be counted on the fingers of my right hand

These people exist, but we have far more stupid "admins" around here

When you are not in the infrastructure business (I work in retail at the moment), the public cloud is the sane way to go (which is sad, but anyway)

9cb14c1ec0•10h ago

This is the fallacy that Amazon sold everyone on: that the cloud has no headache or management needed. This is manifestly untrue. It's also untrue that bare metal takes lots of management time. I have multiple Dell rack servers colocated in several different datacenters, and I don't spend any time at all managing them. They just run.

al_borland•10h ago

> This is the fallacy that Amazon sold everyone on

I’ve been working at a place for a long time and we have our own data centers. Recently there has been a push to move to the public cloud and we were told to go through AWS training. It seems like the first thing AWS does in its training is spend a considerable amount of time on selling their model. As an employee who works in infrastructure, hearing Amazon sell so hard that the company doesn’t need me anymore is not exactly inspiring.

After that section they seem to spend a considerable amount of time on how to control costs. These are things no one really thinks about currently, as we manage our own infra. If I want to spin up a VM and write a bunch of data to it, no one really cares. The capacity already exists and is paid for, adding a VM here or there is inconsequential. In AWS I assume we’ll eventually need to have a business justification for every instance we stand up. Some servers I run today have value, but it would be impossible to financially justify in any real terms when running in AWS where everything has a very real cost assigned to it. What I do is too detached from profit generation, and the money we could save is mostly theoretical, until something happens. I don’t know how this will play out, but I’m not excited for it.

whstl•8h ago

I can confirm this.

The AWS mandatory training I did in the past was 100% marketing of their own solutions, and tests are even designed to make you memorize their entire product line.

The first two levels are not designed for engineers: they're designed for "internal salespeople". Even Product Managers were taking the certification, so they would be able to recommend AWS products to their teams.

snoman•3h ago

As a business owner that pays the hardware bill, what you see as the benefit of your current environment - or a downside of moving to the cloud - I see in a completely different light. To some extent I’d be upset with arbitrary amounts of paid-for capacity just lying around with zero accountability for that spend.

JackSlateur•7h ago

You miss the good time spent debugging a firmware issue, which leads to packet drop on the NIC (or data corruption on the nvme)

I do not miss that crap

tonypapousek•3h ago

> I don't spend any time at all managing them

Who does, then? Even with automatic updates, one can assume some level of maintenance is required for long-term deployments.

Don’t get me wrong, I love running stuff bare metal for my side projects, but scaling is difficult without any ops.

9cb14c1ec0•1h ago

No one. I have automatic backups with proxmox backup server. Updates are automatic and deployments are automated.

jsiepkes•10h ago

This is exactly the rhetoric Microsoft used in the 00's with its "Get the facts" marketing campaign against Linux and open-source: "Never mind the costs, think about the people hours you are saving!".

It wasn't as simple as that then, and it's still not as simple as that now.

noir_lord•10h ago

Nope and never has been but to (some of) both sides “it depends” means you are on the other side.

It’s become polarised (as everything seems to).

I’ve specced bare metal, I’ve specced AWS; which one is used is entirely a matter of the problem/costs and relative trade-offs.

That is all it is.

foldr•9h ago

In fairness to Microsoft, this argument should have been correct. It ought to be possible for Microsoft to offer products with better polish and better support than open source alternatives, and that ought to more than compensate for any licensing costs. Whether Microsoft actually managed to do this is debatable, but the principle is sound enough.

ghaff•9h ago

It sort of was especially with respect to desktop software. The licensing costs associated with Microsoft Office etc. were probably not really that much compared to the disruption with switching offices of people who just wanted to do their job to open source alternatives.

ecshafer•7h ago

This is true, but also really funny considering that even today the average windows sysadmin can still barely use powershell and relies on console clicking and batch scripts. A good unix admin can easily admin 10-100x the machines as a windows admin, and this was more true back in the early 00s. So the marketing on getting the facts was absolutely false.

bigstrat2003•3h ago

Citation needed on that one. I've only worked with a minority of Windows sysadmins who are as incompetent as you say. And yeah, of course a good unix admin can run circles around a bad windows one, but the converse is just as true. A good Windows admin can run circles around a bad unix one. It has nothing to do with the operating systems and everything to do with technical competence of the individual.

shishcat•10h ago

Don't you have cloud architects and similar figures already?

qaq•10h ago

Just because AWS abstracted something doesn't mean you don't need people who understand all the quirks of the black box you supposedly don't have to worry about. Guess what those people are expensive. You also have to deal with a ton of crap like hard resource account limits that on any meaningful size project will push complexity up by forcing you to use multiple accounts.

vidarh•9h ago

I help people run their systems.

Clients that use cloud consistently end up spending more on devops resources, because their setups tend to be vastly more complex and involve more people.

whstl•8h ago

I've worked on both kinds of companies in almost 25 years and I can confirm this is true.

The biggest ops teams I worked alongside were always dedicated to running AWS setups. The slowest too were dedicated to AWS. Proportionally, I mean, of course.

People here are comparing the worst possible of Bare Metal with "hosting my startup on AWS".

wredcoll•8h ago

> The biggest ops teams I worked alongside were always dedicated to running AWS setups. The slowest too were dedicated to AWS.

I wish I could come up with some kind of formalization of this issue. I think it has something to do with communication explosions across multiple people.

bryanlharris•7h ago

Increases in complexity exponentially increase mistakes + MS Teams meetings are just a glorified game of telephone.

Don't make perfect the enemy of the good.

Aldipower•9h ago

Forgot? Driving something on AWS also needs a lot of people. In my experience even more. The term SRE did not exist before.

embedding-shape•9h ago

> We build software

Right, doesn't that include figuring out the right and best way of running it, regardless if it runs on client machines or deployed on servers?

At least I take "software engineering" to mean the full end-to-end process, from "Figure out the right thing to build" to "runs great wherever it's meant to run". I'm not a monkey that builds software on my machine and then hands it off to some deployment engineer who doesn't understand what they're deploying. If I'm building server software, part of my job is ensuring it's deployed in the right environment and runs perfectly there too.

rcxdude•9h ago

I really dislike the fallacy that just because you're buying something it means that you're not building anything. In practice this is never true: there's always some people-in-your-org time cost of buying something just as much as there's some giving-money-to-other-orgs cost to building something. So often organisations wind up buying something and spending way more time in the process than it would cost for them to build it themselves.

With AWS I think this tradeoff is very weak in most cases: the tasks that you are paying AWS for are relatively cheap in time-of-people-in-your-org, and AWS also takes up a significant amount of that time with new tasks as well. Of the organisations I'm personally aware of, the ones who hosted on-prem spent less money on their compute and had smaller teams managing it, with more effective results than those who were cloud-based (to various degrees of egregiousness from 'well, I can kinda see how it's worth it because they're growing quickly' to 'holy shit they're setting money on fire and compromising their product because they can't just buy some used tower PCs and plug them in in a closet in the office')

base698•9h ago

Until you factor in the legions of devops writing terraform, iam, and cicd scripts.

canucktrash669•9h ago

Ultimately these owners hire me to cut their 6-figure AWS bill by 50%. It's mostly rearchitecting mistakes. Amongst them is taking AWS blog propaganda at face value. Those savings could be 80% if they chose managed bare metal (no racking and stacking).

ericd•57m ago

You can just set up your own cloud on leased machines, and pocket the huge difference in cost. Devops languages are pretty easy to learn, IME, and the infra stuff takes less maintenance than the AWS proponents seem to think. I guess it depends on your usage profile, but like bandwidth especially is ruinously expensive compared to what you get with leased machines.

fabian2k•10h ago

A large part of the different views on this topic are due to the way people estimate the amount of saved effort and money because you're pushing some admin duties to the cloud provider instead of doing this yourself. And people come to vastly different conclusions on this aspect.

It's also that the requirements vary a lot, discussions here on HN often seem to assume that you need HA and lots of scaling options. That isn't universally true.

nicce•10h ago

> A large part of the different views on this topic are due to the way people estimate the amount of saved effort and money because you're pushing some admin duties to the cloud provider instead of doing this yourself. And people come to vastly different conclusions on this aspect

This applies only if you had an extra customer that pays the difference. Basically the argument only holds if you can’t take on more customers because upkeep of the infrastructure takes too much time, or if you need to hire an extra person who costs more than the AWS bill difference.

tstrimple•3h ago

> discussions here on HN often seem to assume that you need HA and lots of scaling options.

Funny how our perceptions differ. I seem to mostly see people saying all you need is a cheap Hetzner instance and postgres to solve all technical problems. We clearly all have different working environments and requirements. That's why I roll my eyes at the suggestions in threads I see of going all in on colo. My last two major cloud migrations were due to colo facilities shutting down. They were getting kicked out and had a deadline. In one of the cases, the company I was working with was the second largest client at the colo but when the largest client decided to pull out the owners decided the economics of running the datacenter didn't make sense to them anymore. Switching colo facilities when you have a few servers isn't a big deal. It's annoying but manageable. When you have hundreds to thousands of servers, it becomes a major operational risk and is enormously disruptive to business as usual.

fulafel•10h ago

The direct cost is the easy part. The more insidious part is that you're now cultivating a growing staff of technologists whose careers depend on doing things the AWS way, getting AWS certified to ensure they build your systems the AWS Well Architected Way instead of thinking themselves, and can upsell you on AWS lock-in solutions using AWS provided soundbites and sales arguments.

("Shall we make the app very resilient to failure? Yes running on multiple regions makes the AWS bill bigger but you'll get much fewer outages, look at all this technobabble that proves it")

And of course AWS lock-in services are priced to look cheaper compared to their overpricing of standard stuff[1] - if you just spend the engineering effort and IaC coding effort to move onto them, this "savings" can be put to more AWS cloud engineering effort which again makes your cloud eng org bigger and more important.

[1] (For example moving your app off containers to Lambda, or the db off PostgreSQL to DynamoDB etc)

Hilift•10h ago

> The direct cost is the easy part

I don't think it is easy. I see most organizations struggle with the fact that everything is throttled in the cloud. CPU, storage, network. Tenants often discover large amounts of activity they were previously unaware of, that contributes to the usage and cost. And there may be individuals or teams creating new usages that are grossly impacting their allocation. Did you know there is a setting in MS SQL Server that impacts performance by an order of magnitude when sending/receiving data from the Cloud to your on-premises servers? It's the default in the ORM generated settings.

Then you can start adding in the Cloud value, such as incomprehensible networking diagrams that are probably non-compliant in some way (guess which ones!), and security? What is it?

m-gasser•8h ago

> Did you know there is a setting in MS SQL Server that impacts performance by an order of magnitude when sending/receiving data from the Cloud to your on-premises servers? It's the default in the ORM generated settings.

Sounds interesting, which setting is that?

infecto•8h ago

Would love to know as well.

Hilift•2h ago

Multiple Active Result Sets (MARS). During large query responses or bulk loads, "full" packets cause an additional packet to be sent over the wire with about five bytes to hold the MARS "wrapper". The net result is one full packet, and one empty packet on the wire, alternating. The performance impact in LAN latency is negligible. However on higher latency between AWS and your premises it has a terrible performance impact.

MARS isn't strictly needed for most things. Some features that require it are ORM (EF) proxies and lazy loading. If you need MARS, there are third party "accelerators" that work around this madness.

"MARS Acceleration significantly improves the performance of connections that use the Multiple Active Result Sets (MARS) connection option."

https://documentation.nitrosphere.com/resources/release-note...
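MARS is switched on through the client connection string, via the `MultipleActiveResultSets` keyword in ADO.NET-style strings. As a rough sketch of turning it off, the helper below rewrites that keyword in a semicolon-separated connection string. The function name and the sample server and database values are made up for illustration, and the parser is deliberately naive: it does not handle quoted values that contain semicolons.

```python
MARS_KEY = "MultipleActiveResultSets"


def set_mars(conn_str: str, enabled: bool) -> str:
    """Return a copy of an ADO.NET-style connection string with the
    MultipleActiveResultSets keyword forced to True or False."""
    parts = []
    found = False
    for item in conn_str.split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty segments such as a trailing ';'
        name, _, _value = item.partition("=")
        if name.strip().lower() == MARS_KEY.lower():
            parts.append(f"{MARS_KEY}={enabled}")
            found = True
        else:
            parts.append(item)
    if not found:
        # Keyword absent: add it explicitly so the setting is visible.
        parts.append(f"{MARS_KEY}={enabled}")
    return ";".join(parts)


if __name__ == "__main__":
    # Hypothetical connection string of the kind a scaffolding tool might emit.
    generated = ("Server=db.example.internal;Database=app;"
                 "MultipleActiveResultSets=True;Encrypt=True")
    print(set_mars(generated, False))
```

Whether disabling it is safe depends on the application: as noted above, features such as lazy loading may rely on MARS.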

infecto•2h ago

Is that not a client connection flag? MARS does not require a setting change on the server?

anonymars•1h ago

I think you may have misinterpreted what he said. I can see why it seems to imply a server setting but that isn't the case

> Did you know there is a setting in MS SQL Server that impacts performance by an order of magnitude when sending/receiving data from the Cloud to your on-premises servers? It's the default in the ORM generated settings

infecto•40m ago

You are right. For some reason when I initially sped through the post I read it as if RDS was doing something wrong.

_the_inflator•3h ago

Yes. Cloud sellers knew this: happy path for the flagship project, the shiny new object, and some additional services. After the point of no return, what usually happens is that the cloud becomes a replica of bare metal development.

As a Computer Science dude and former C64/Amiga coder in senior management of a large international bank, I saw first hand how costs balloon simply due to the fact that the bank recreates and replicates its bare metal environment in the cloud.

So increasing costs while nothing changed. Imagine that: fixed resources, no test environments, because virtualisation was out of the equation in the cloud due to policies and SDLC processes. And it goes on: releases on automation? Nope, request per email and attached scan of a paper document as sign-off.

Of course you can buy a Ferrari and use it as a farm tractor. I bet it is possible with a little modification here and there.

Another fact is that lock-in plays a huge role. Once you are in it, no matter what you subscribe to, magically everything suddenly slows down a bit. But since I am a guy who uses a time tracker to test and monitor apps, I could easily draw a line even without utilizing my math background: enforced throttling.

There is a difference between 100, 300 and 500ms for SaaS websites - people without prior knowledge of perceptual psychology feel it but cannot put their finger in the wound. But since we are in the cloud, suddenly a cloud manager will offer you a speed upgrade - just catered for your needs! Here, have a free trial period of 3 months and experience the difference for your business!

I am a bit opinionated here and really suppose that cloud metrics analysed the bank's traffic and service usage to willingly slow it down in a way only professionals could find out. Have you promised to be lightning fast in the first place? No, that's not what the contract says. We fed you with it, but a "normal" speed was agreed upon. It is like getting a Porsche as a rental car for free when you take your VW Beetle to the dealer for a checkup. Hooked, of course. A car is a car after all. How to boil a frog? Slowly.

Of course there will be more sales, and this is the Achilles' heel for every business and indifferent customers - easy prey.

It is a vicious cycle, almost like taxation. You cannot hide from it, no escape and it is always on the rise.

vidarh•10h ago

I was about to rage at you over the first sentence, because this is so often how people start trying to argue bare metal setups are expensive. But after reading the rest: 100% this. I see so many people push AWS setups not because it's the best thing - it can be if you're not cost sensitive - but because it is what they know and they push what they know instead of evaluating the actual requirements.

hibikir•9h ago

Well, they aren't wrong about the bare metal either: Every organization ends up tied to their staff, and said staff was hired to work on the stack you are using. People end up in quite the fights because their supposed experts are more fond of uniformity and learning nothing new.

Many a company was stuck with a datacenter unit that was unresponsive to the company's needs, and people migrated to AWS to avoid dealing with them. This straight out happened in front of my eyes multiple times. At the same time, you also end up in AWS, or even within AWS, using tools that are extremely expensive, because the cost-benefit analysis for the individuals making the decision, who often don't know very much other than what they use right now, is just wrong for the company. The executive on top is often either not much of a technologist or 20 years out of date, so they have no way to discern the quality of their staff. Technical disagreements? They might only know who they like to hang out with, but that's where it ends.

So for path dependent reasons, companies end up making a lot of decisions that in retrospect seem very poor. In startups it often just kills the company. Just don't assume the error is always in one direction.

baq•9h ago

> Many a company was stuck with a datacenter unit that was unresponsive to the company's needs

I'd like to +1 here - it's an understated risk if you've got datacenter-scale workloads. But! You can host a lot of compute on a couple racks nowadays, so IMHO it's a problem only if you're too successful and get complacent. In the datacenter, creative destruction is a must and crucially finance must be made to understand this, or they'll give you budget targets which can only mean ossification.

alemanek•5h ago

In orgs I have seen this it is usually a symptom of the data center unit being starved of resources. It’s like they have only been given the choice of on prem but ridiculous paperwork and long lead times or pay 20x for cloud.

Like can’t we just give the data center org more money and they can over provision hardware. Or can we not have them use that extra money to rent servers from OVH/Hetzner during the discovery phase to keep things going while we are waiting on things to get sized or arrive?

dumbledoren•19m ago

> Or can we not have them use that extra money to rent servers from OVH/Hetzner

Or just use Hetzner for major performance at low cost... Their APIs and stuff make it look like it's your datacenter.

vidarh•9h ago

It's simple enough to hire people with experience with both, or pay someone else to do it for you. These skills aren't that hard to find.

If you hire people that are not responsive to your needs, then, sure, that is a problem that will be a problem irrespective of what their pet stack is.

whstl•8h ago

Sure but I have seen the exact same thing happen with AWS.

In a large company I worked at, the Ops team that had the keys to AWS was taking literal months to push things to the cloud, causing problems with bonuses and promotions. Security measures were not in place so there were cyberattacks. Passwords of critical services lapsed because they were not paying attention.

At some point it got so bad that the entire team was demoted, lost privileges, and contractors had to jump in. The CTO was almost fired.

It took months to recover and even to get to an acceptable state, because nothing was really documented.

Edman274•8h ago

The entire value proposition of AWS vs running one's own server is basically this: is it easier to ask for permission, or forgiveness? You're asking for permission to get a million dollars worth of servers / hardware / power upgrades now, or you're asking for forgiveness for spending five million dollars in AWS after 10 months. Which will be easier: permission or forgiveness?

embedding-shape•8h ago

> said staff was hired to work on the stack you are using

Looking back at various hiring decisions at various levels of organizations, this is probably the single biggest mistake I've made multiple times: hiring specific people using specific technology because we were specifically using that.

You'll end up with a team unwilling to change, because "you hired me for this, even if it's best for the business with something else, this is what I do".

Once I and the organizations shifted our mindset to hiring people who are more flexible, even if they have expertise in one or two specific technologies, they won't put their head in the sand whenever changes come up, and everything became a lot easier.

vidarh•6h ago

Exactly. If someone has "Cloud Engineer" in the headline of their resume instead of "Devops Engineer" it's already a warning and worth probing. If someone has "AWS|VMWare Engineer" in their bio, it's a giant red flag to me. Sometimes it's people just being aware where they'll find demand, but often it's indicative of someone who will push their pet stack - and it doesn't matter if it's VMWare on-prem or AWS (both purely as examples; it doesn't matter which specific tech it is), it's equally bad if they identify with a specific stack irrespective of what the stack is.

I'll also tend to look closely at whether people have "gotten stuck" specialising in a single stack. It won't make me turn them down, but it will make me ask extra questions to determine how open they are to alternatives when suitable.

infecto•8h ago

Your comment also jogged my memory of how terrible bare metal days used to be. I think now with containers it can be better but the other reason so many switched to cloud is we don’t need to think about buying the bare metal ahead of time. We don’t need to justify it to a DevOps gatekeeper.

vidarh•7h ago

That so many people remember bare metal as of 20+ years ago is a large part of the problem.

A modern server can be power cycled remotely, can be reinstalled remotely over networked media, can have its console streamed remotely, can have fans etc. checked remotely without access to the OS it's running etc. It's not very different from managing a cloud - any reasonable server hardware has management boards. Even if you rent space in a colo, most of the time you don't need to set foot there other than for an initial setup (and you can rent people to do that too).

But for most people, bare metal will tend to mean renting bare metal servers already configured anyway.

When the first thing you then tend to do is to deploy a container runtime and an orchestrator, you're effectively usually left with something more or less (depending on your needs) like a private cloud.

As for "buying ahead of time", most managed server providers and some colo operators also offer cloud services, so that even if you don't want to deal with a multi-provider setup, you can still generally scale into cloud instances as needed if your provider can't bring new hardware up fast enough (but many managed server providers can do that in less than a day too).

I never think about buying ahead of time. It hasn't been a thing I've had to worry about for a decade or more.

infecto•4h ago

You are right but I just think people miss the history when we talk about moving to the cloud. It was not that long ago that, at a reasonably sized Bay Area company, I would need to justify new metal being provisioned to stand up a service I was tasked with.

mmarq•3h ago

> A modern server can be power cycled remotely, can be reinstalled remotely over networked media, can have its console streamed remotely, can have fans etc. checked remotely without access to the OS it's running etc. It's not very different from managing a cloud - any reasonable server hardware has management boards. Even if you rent space in a colo, most of the time you don't need to set foot there other than for an initial setup (and you can rent people to do that too).

All of this was already possible 20 years ago, with iLO and DRAC cards.

vidarh•29m ago

Yes, that's true, but 20 years ago a large proportion of lower end servers people were familiar with didn't have anything like it, and so a whole lot even of developers who remember "pre-cloud" servers have never experienced servers with them.

dumbledoren•5h ago

The catch is that bare metal is SO cheap and performant that you can buy legions of it and have it lying around. And datacenters, their APIs and whatnot advanced so much that you can even have automations that automatically provision and set up your bare metal servers. With containers, it gets even better.

And, let's face it - aren't you already overprovisioning on the cloud because you can't risk your users waiting 1-2 minutes until your new nodes and pods get up? So basically the 'autoscaling' of cloud has always been a myth.

torginus•8h ago

The weird thing is I'm old enough to have grown up in the pre-cloud world, and most of the stuff, like file servers, proxies, dbs, etc. isn't any more difficult to set up than AWS stuff, it's just that the skills are different.

Also there's a mindset difference - if I gave you a server with 32 cores you wouldn't design a microservice system on it, would you? After all there's nowhere to scale to.

But with AWS, you're sold the story of infinite compute you can just expect to be there, but you'll quickly find out just how stingy they can get with giving you more hardware automatically to scale to.

I don't dislike AWS, but I feel this promise of false abundance has driven the growth in complexity and resource use of the backend.

Reality tends to be you hit a bottleneck you have a hard time optimizing away - the more complex your architecture, the harder it is, then you can stew.

vidarh•5h ago

> But with AWS, you're sold the story of infinite compute you can just expect to be there, but you'll quickly find out just how stingy they can get with giving you more hardware automatically to scale to.

This is key.

Most people never scale to a size where they hit that limit, and in most organisations where that happens, someone else has to deal with it, and so most developers are totally unaware of just how fictional the "infinite scalability" actually is.

Yet it gets touted as a critical advantage.

At the same time, most developers have never ever tried to manage modern server hardware, and seem to think it is somewhat like managing the hardware they're using at home.

torginus•8m ago

But that limit is well below what you could get even in a gaming machine (AWS CPUs are SMT threads, so a 32 core machine is actually 64 CPUs by AWS) - you can get that in a high end workstation, and I'd guess that's way more power than most people end up using even in their large-ish scale AWS projects.

ApolloFortyNine•5h ago

>I see so many people push AWS setups not because it's the best thing - it can be if you're not cost sensitive - but because it is what they know and they push what they know instead of evaluating the actual requirements.

I kinda feel like this argument could be used against programming in essentially any language. Your company, or you yourself, likely chose to develop using (whatever language it is) because that's what you knew and what your developers knew. Maybe it would have been some percentage more efficient to use another language, but then you and everyone else has to learn it.

It's the same with the cloud vs bare metal, though at least in the cloud, if you're using the right services, if someone asked you tomorrow to scale 100x you likely could during the workday.

And generally speaking if your problem is at a scale where baremetal is trivial to implement, it's likely we're only talking about a few hundred dollars a month being 'wasted' in AWS. Which is nothing to most companies, especially when they'd have to consider developer/devops time.

vidarh•5h ago

> if someone asked you tomorrow to scale 100x you likely could during the workday.

I've never seen a cloud setup where that was true.

For starters: Most cloud providers will impose limits on you that often mean going 100x would involve pleading with account managers to have limits lifted and/or scrounging a new, previously untested, combination of instance sizes.

But secondly, you'll tend to run into unknown bottlenecks long before that.

And so, in fact, if that is a thing you actually want to be able to do, you need to actually test it.

But it's also generally not a real problem. I more often come across the opposite: Customers who've gotten hit with a crazy bill because of a problem rather than real use.

But it's also easy enough to set up a hybrid setup that will spin up cloud instances if/when you have a genuine need to be able to scale up faster than you can provision new bare metal instances. You'll typically run an orchestrator and run everything in containers on a bare metal setup too, so typically it only requires having an auto-scaling group scaled down to 0, and warm it up if load nears critical level on your bare metal environment, and then flip a switch in your load balancer to start directing traffic there. It's not a complicated thing to do.

Now, incidentally, your bare metal setup is even cheaper because you can get away with a higher load factor when you can scale into cloud to take spikes.
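A rough Python sketch of the burst logic described above. Everything in it is an assumption for illustration: the `BurstPolicy` thresholds, the per-instance capacity, and the `plan_burst` name are hypothetical, and the actual calls that resize an auto-scaling group or reweight a load balancer are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstPolicy:
    # All values are illustrative assumptions, not recommendations.
    warm_threshold: float = 0.70       # utilisation at which cloud instances start warming
    route_threshold: float = 0.85      # utilisation at which the load balancer sends traffic to them
    instance_capacity: float = 100.0   # requests/s one cloud instance is assumed to absorb


def plan_burst(load_rps: float, bare_metal_capacity_rps: float,
               policy: BurstPolicy = BurstPolicy()) -> dict:
    """Decide how many cloud instances to keep warm and whether to route traffic to them."""
    utilisation = load_rps / bare_metal_capacity_rps
    if utilisation < policy.warm_threshold:
        # Normal operation: the auto-scaling group stays scaled down to 0.
        return {"desired_cloud_instances": 0, "route_to_cloud": False}

    # Size the warm pool for the load above the warm threshold.
    overflow = load_rps - policy.warm_threshold * bare_metal_capacity_rps
    desired = max(1, math.ceil(overflow / policy.instance_capacity))
    return {
        "desired_cloud_instances": desired,
        "route_to_cloud": utilisation >= policy.route_threshold,
    }
```

With these made-up numbers, a bare metal pool rated for 1000 requests/s stays cloud-free up to 700 requests/s, warms one instance at 750, and only starts routing traffic to the cloud from 850 upwards. The gap between the two thresholds is what gives the cloud instances time to come up.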

> And generally speaking if your problem is at a scale where baremetal is trivial to implement, it's likely we're only talking about a few hundred dollars a month being 'wasted' in AWS. Which is nothing to most companies, especially when they'd have to consider developer/devops time.

Generally speaking, I only relatively rarely work on systems that cost less than in the tens of thousands per month and up, and what I consistently see with my customers is that the higher the cost, the bigger the bare-metal advantage tends to be as it allows you to readily amortise initial setup costs of more streamlined/advanced setups. The few places where cloud wins on cost is the very smallest systems, typically <$5k/month.

12_throw_away•4h ago

> if you're using the right services, if someone asked you tomorrow to scale 100x you likely could during the workday.

"The right services" is I think doing a lot of work here. Which services specifically are you thinking of?

- S3? sure, 100x, 1000x, whatever, it doesn't care about your scale at all (your bill is another matter).

- Lambdas? On their own sure you can scale arbitrarily, but they don't really do anything unless they're connected to other stuff both upstream and downstream. Can those services manage 100x the load?

- Managed K8s? Managed DBs? EC2 instances? Really anything where you need to think about networking? Nope, you are not scaling this 100x without a LOT of planning and prep work.

vidarh•24m ago

> Nope, you are not scaling this 100x without a LOT of planning and prep work.

You're not getting a 100x increase in instances without justifying it to your account manager, anyway, long before you figure out how to get it to work.

EC2 has limits on the number of instances you can request, and it certainly won't let you 100x unless you've done it before and already gone through the hassle to get them to raise your limits.

On top of that, it is not unusual to hit availability issues with less common instance types. Been there, done that, had to provision several different instance types to get enough.
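To make the "several different instance types" point concrete, here is a small Python sketch of filling a capacity target from whatever pools are available. The pool names, sizes and availability numbers are invented for the example, and `fill_from_pools` is a hypothetical helper; it does not call any cloud API or reflect how a provider reports availability.

```python
import math


def fill_from_pools(needed_vcpus: int, pools: list) -> tuple:
    """Greedily cover a vCPU target from pools listed in order of preference.

    pools is a list of (name, vcpus_per_instance, available_instances) tuples.
    Returns (plan, shortfall), where plan maps pool name to instance count and
    shortfall is the number of vCPUs still uncovered once every pool is used up.
    """
    plan = {}
    remaining = needed_vcpus
    for name, vcpus_per_instance, available in pools:
        if remaining <= 0:
            break
        # Take what is needed from this pool, capped by what it can supply.
        count = min(available, math.ceil(remaining / vcpus_per_instance))
        if count > 0:
            plan[name] = count
            remaining -= count * vcpus_per_instance
    return plan, max(0, remaining)
```

For a target of 100 vCPUs and pools `[("preferred", 8, 5), ("fallback", 16, 2), ("small", 4, 100)]`, this takes 5, 2 and 7 instances respectively. The result is a mix of three sizes that has probably never been load tested together.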

raw_anon_1111•1h ago

I only work at companies that are using cloud because I hate administering systems and I hate dealing with system administrators when I need resources.

anal_reactor•9h ago

My manager wants me to get this silly AWS certification.

Let me go on a tangent about trains. In Spain before you board a high-speed train you need to go through a full security check, like at an airport. In all other EU countries you just show up and board, but in Spain there's the security check. The problem is that even though the security check is an expensive, inefficient theatre, just in case something does blow up, nobody wants to be the politician that removed the security check. There will be no reward for a politician that makes life marginally easier for lots of people, but there will be severe punishment for a politician that is involved in a potential terrorist attack, even if the chance of that happening is ridiculously small.

This is exactly why so many companies love to be balls deep into AWS ecosystem, even if it's expensive.

mrits•9h ago

AWS doesn’t have to be expensive.

embedding-shape•8h ago

Sure, but you outgrow the free ("trial") resources in a blink, and then it starts being expensive compared to the alternatives.

rsav•8h ago

Nobody gets fired for buying IB^H^H AWS

kleiba•8h ago

How does Spain deal with trains that come in from a neighboring country?

hedora•8h ago

The security check has nothing to do with protecting trains or passengers, so your question is irrelevant.

kleiba•8h ago

Thanks for letting me know that my question is irrelevant. Sorry for taking up your time.

snovv_crash•8h ago

French trains come in without any security checks.

embedding-shape•8h ago

> In all other EU countries you just show up and board, but in Spain there's the security check

Just for curiosity's sake, did any other EU countries have any recent terrorist attacks involving bombs on trains in the capital, or is Spain so far alone with this experience?

gtr•8h ago

London had the tube bombings, but there is no security scanning there.

embedding-shape•8h ago

AFAIK, there is no security scanning on the metro/"tube" in Spain either, it's on the national train lines.

Edit: Also, after looking it up, it seems like London did add temporary security scanners at some locations in the wake of those bombings, although they weren't permanent.

Russia is the only other European country besides Spain that after train bombings added permanent security scanners. Belgium, France and a bunch of other countries have had train bombings, but none of them added permanent scanners like Spain or Russia did.

iberator•6h ago

Check out the Madrid 2004 terror attacks... So deadly that Spain left Afghanistan and Iraq AFAIK.

embedding-shape•5h ago

That's exactly the event I was alluding to, good detective work :)

freetanga•7h ago

https://en.wikipedia.org/wiki/2015_Thalys_train_attack

torginus•8h ago

Unfortunately it's not, and it gets more difficult the more cloud-y your app gets.

You can pay for EC2+EBS+network costs, or you can have a fancy cloud native solution where you pay for Lambda, ALBs, CloudWatch, Metrics, Secrets Manager (things you assume they would just give you; if you eat at a restaurant, you probably don't expect to pay for the parking, the toilet, or rent for the table and seats).

So cloud billing is its own science and art - and in most orgs devs don't even know how much the stuff they're building costs, until finance people start complaining about the monthly bills.

jmaker•7h ago

We run regular FinOps meetings within departments, so everyone’s aware. I think everyone should. But it’s a lot of overhead of course. So a dev is concerned not only with DevOps anymore but with DevSecFinOps. Not everyone can cope with so many aspects at once. There’s a lot of complexity creep in that.

torginus•5h ago

Yeah, AWS has the billing panel, that's where I usually discover that after I make a rough estimate on how much the thing I'm building should cost by studying the relevant tables, I end up with stuff costing twice as much, because on top of the expected items there's always a ton of miscellaneous stuff I never thought about.

UltraSane•6h ago

I have Claude, ChatGPT, and Gemini analyze our AWS bills and usage metrics once a month and they are surprisingly good at finding savings.

hinkley•7h ago

My last team decided to hand manage a Memcached cluster because it cost half as much as an unmanaged service versus AWS’s alternative. Don’t know how much we really saved versus opportunity cost on dev time though. But it’s close to negative.

jmaker•7h ago

One of the issues there is that picking a managed service deprives your people of gaining extra experience. There’s a synergy over time, the more you manage yourself. But it’s totally justified to pick a managed service if it checks out for your budget. The problem I often saw emerge was bad decision making and bad opportunity cost estimation. In other words, there’s an opportunity cost to picking the managed service, too, and they offset each other more or less.

hinkley•3h ago

I wonder if there’s enough space for a Do Well By Doing Good company out there to provide a ladder from cheap self managed up to fully automated rolling upgrades.

Because it was mostly fine at first, but later we had some close calls when there were changes that needed to be made on the servers. By the time we managed to mess up our hand managed incremental restart process, we had several layers of cache and so accidentally wiping one didn’t murder our backend, but did throw enough alerts to cause a P2. And because we were doing manual bucketing of caches instead of consistent hashing we hit the OOMKiller a couple times while dialing in.

But at this point it was difficult to move back to managed.

This feels closest to digital ocean’s business model.
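For readers who haven't met the term, the difference between manual bucketing and consistent hashing mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not what that team ran: `HashRing`, the virtual node count and the choice of SHA-256 are all picked for the example, and real Memcached clients ship their own implementations.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to even out the spread.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

The property that matters operationally: when a node is added, the only keys that change owner are the ones that move to the new node, so the other caches stay warm. With hand-maintained buckets, resizing tends to reshuffle far more keys, and the load per server depends on how well the buckets were guessed.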

jmaker•7h ago

It’s a marketing trap. But also a job guarantee since everyone’s in the same trap. You got a couple cloud engineers or "DevOps" that lobby for AWS or any other hyperscaler, naive managers that write down some decision report littered with logical fallacies, and a few years in the sunk cost is so high you can’t get off of it, and instead of doing productive work you’re sitting in myriads of FinOps meetings, where even fewer understand what’s going on.

Engineering managers are promised cost savings on the HR level. Corporate finance managers are promised OpEx for CapEx trade-off, the books look better immediately. Cloud engineers are embarking on their AWS journey of certification being promised an uptick to their salaries. It’s a win/win for everyone, in isolation, a local optimum for everyone, but the organization now has to pay way more than it, hypothetically, would have been paying for bare metal ops. And hypothetical arguments are futile.

And it lends itself well to overengineering and the microservices cargo cult. Your company ends up with a system distributed around the globe across multiple AZs per region of business operations, striving to shave off those 100ms latency off your clients’ RTT. But it’s outgrown your comprehension, and it’s slow anyway, and you can’t scale up because it’s expensive. And instead of having one problem, you now have 99 and your bill is one.

geodel•2h ago

All great points. I have seen, in a company of smart people, the CIO/CTO freely admit: "Look, we know cloud may not be cheap or easier to manage but this is the direction we have taken since we are getting out of owning or managing hardware/datacenter"

So it is not like one can dazzle decision makers with any logic or hard data. They are just announcing the decision while calling it a robust discussion over pros and cons of on-prem vs cloud placement.

jmaker•1h ago

Yep. I’ve also seen managerial people worship AWS sales reps as oracles, misconstruing ordinary sales meetings with them as something divine, in which they would disclose a lot of company’s IP in awe for them, just to listen to some blabbing superficial truisms. I mean, ChatGPT could tell you more. To add insult to that, the managerial people wouldn’t listen to their own senior, staff, principal engineers, and prefer to follow what the AWS reps told them.

It’s really disturbing how the human factor controls decision making in corporations.

For my peace of mind, I chose a sane path - if the company as an entity decides to do AWS, I will do my best to meet its goals. I’ve got all Professional and Specialty certs. It’s the human nature. No purpose in tilting at windmills.

ownagefool•10h ago

The consequence of running ingress and DNS poorly is downtime.

The consequence of running a database poorly is lost data.

At the end of the day they're all just processes on a machine somewhere, none of it is particularly difficult, but storing, protecting, and traversing state is pretty much _the_ job and I can't really see how you'd think ingress and DNS would be more work than the datastores done right.

Now with AWS, I have a SaaS that makes 6 figures and the AWS bill is <$1000 a month. I'm entirely capable of doing this on-prem, but the vast majority of the bill is s3 state, so what we're actually talking about is me being on-call for an object store and a database, and the potential consequences of doing so.

With all that said, there's definitely a price point and staffing point where I will consider doing that, and I'm pretty down for the whole on-prem movement generally.

vidarh•9h ago

I'm generally strongly in favour of bare metal (not so much actually on prem) but your case is one of the rare cases where AWS makes sense. Even for cheap setups like that, bare metal could likely be cheaper even factoring in someone on call to handle issues for you, but the amounts are so small it's a perfectly reasonable choice to just pick whatever you're comfortable with.

That's the sweet spot for AWS customers. Not so much for AWS.

The key thing for AWS is trying to get you locked in by "helping you" depend on services that are hard to replicate elsewhere, so that if your costs grow to a point where moving elsewhere is worth it, it's hard for you to do so.

neves•10h ago

It's always nice to remember that AWS is responsible for 70% of Amazon profits.

vidarh•9h ago

As Jeff Bezos has been quoted as saying "your margin is my opportunity"...

The biggest difficulty in eating into AWS market share is that believing it is cheap has become religion.

mberning•10h ago

It’s expensive and the “design” of the services, if you could call it that, is such that you are forced to pay a lot, or play a lot of games to get around it. If you are going to spend your engineering time working around their ridiculous pricing schemes, you might as well spend the money on building things out yourself.

Perfect example - MSK. The brokers are config locked at certain partition counts, even if your CPU is 5%. But their MSK replicator is capped on topic count. So now I have to work around topic counts at the cluster level, and partition counts at the broker level. Neither of which are inherent limits in the underlying technologies (kafka and mirrormaker)

vb-8448•10h ago

> Maybe I'm just the old man screaming at cloud (no pun intended) but when did people forget how to run a baremetal server ?

It's a way to "commoditize" engineers. You can run on premise or mixed infra better and cheaper, but only if you know what you are doing. This requires experienced guys and doesn't work with new grads hired by big cons and sold as "cloud experts".

calgoo•10h ago

Also, when something breaks, you are responsible. If you put it in AWS like everyone else and it breaks, then it's their problem not yours. We will still implement workarounds and fixes when it happens, but we are not responsible. The basic enterprise rule these days is to always pay someone else to be responsible.

vidarh•10h ago

Unless you put someone on retainer to be responsible, which you can do cheaper than to keep your AWS setup from breaking...

(I do that for people; my AWS using customers consistently end up needing more help)

wredcoll•8h ago

The point isn't cost, it's dodging responsibility.

vidarh•7h ago

You can dodge responsibility equally well by outsourcing to people who'll run your bare metal setup for you. We exist from small consultancies like mine to huge multinationals.

vb-8448•9h ago

Actually nothing new here, this was the same in the pre-cloud era where everyone in enterprises preferred big names (IBM, Microsoft, Oracle, etc.) to pass the responsibility to them in case of failures ... aka "nobody gets fired for buying IBM"

marcosdumay•8h ago

And the big name companies always refuse to take responsibility, and have worse reliability metrics than the lean alternatives...

but somehow that is never a problem.

nickstinemates•7h ago

Reality matters less than perception.

iso1631•4h ago

The only metric that's important is the CTO's bonus

When everyone is suffering because AWS is having its bi-yearly 8 hour outage, the CTO isn't blamed, bonus all round, and maybe the AWS sales team takes him for an apology lunch

When the CTO is up for 1500 days straight then has a 2 hour downtime when nobody else does, the CTO is blamed, no bonus, and more likely to get fired

snoman•3h ago

This fired off some warning bells in my head. Is the data available to actually make a verifiable claim regarding those reliability metrics like you are?

marcosdumay•47m ago

Microsoft and Oracle were on the vanguard of suing people that published metrics about them into bankruptcy... So, do you trust the metrics they publish?

IBM is older, and it's incredibly well documented how mainframes are more expensive to run than normal servers.

chasd00•9h ago

> then it's their problem not yours

this is the main advantage of cloud, no one cares if the site/service/app is down as long as it's someone else's fault and responsibility.

bbarnett•6h ago

It's always your problem. The difference is, if you control things, you can fix it, work around it, resolve it.

If not, you're at the mercy of others.

esskay•10h ago

> I'm so surprised there is so much pushback against this

I'm not. It seems to be happening a lot. Any time a topic about not using AWS comes up here, or on Reddit, there's a sudden surge of people appearing out of nowhere shouting down anyone who suggests other options. It's honestly starting to feel like paid shilling.

7thaccount•10h ago

I think some of that is a certain group of people will do anything to play with the new shiny stuff. In my org it's cloud and now GPU.

The cloud stuff is extremely expensive and doesn't work any better than our existing solutions. Like a commentator said below, it's insidious as your entire organization later becomes dependent on that. If you buy a cloud solution, you're also stuck with the vendor deciding to double the cost of the product once you're locked in.

The GPU stuff is annoying as all of our needs are fine with normal CPU workloads today. There are no performance issues, so again...what's the point? Well... somebody wants to play with GPUs I guess.

ghaff•9h ago

Resume-driven development. It's probably pretty much always been a thing.

TheCondor•10h ago

It’s the current version of CCIE or some of the other certs. People pay money to learn how to operate AWS, and other things erode the value of their investment.

Spooky23•9h ago

I don’t think it’s paid shilling, it’s dogma that reflects where people are working here. The individual engineers are hammers and AWS is the nail.

AWS/Azure/GCP is great, but like any tool or platform you need to do some financial/process engineering to make an optimal choice. For small companies, time to market is often key, hence AWS.

Once you’re a little bigger, you may develop frameworks to operate efficiently. I have apps that I run in a data center because they’d cost 10-20x at a cloud provider. Conversely, I have apps that get more favorable licensing terms in AWS that I run there, even though the compute is slower and less efficient.

You also have people who treat AWS with the old “nobody gets fired for buying IBM” mentality.

dangus•9h ago

I think a lot of engineers who remember the bare metal days have legitimate qualms about going back to the way that world used to work especially before containerization/Kubernetes.

I imagine a lot of people who use Linux/AWS now started out with bare metal Microsoft/VMWare/Oracle type of environments where AWS services seemed like a massive breath of fresh air.

baq•9h ago

I remember having to put in orders for pallets of servers which then ended up in storage somewhere because there were not enough people to carry and wire them up and/or there wasn't enough rack space to install them.

Having an ability to spin up a server or a vm when you need it without having to ask a single question is very liberating. Sometimes such elasticity is exactly what's needed. OTOH other people's servers aren't always the wise choice, but you have to know both environments to make the right choice, and nowadays I feel most people don't really know anything about bare metal.

lazyfanatic42•9h ago

the best is having rackspace & power but not enough cooling, hahaha murder me

snark42•7h ago

That only happens when you have your own data center. That's a whole different issue and most people with their own hardware don't have their own data centers as it's not particularly cost efficient except at incredibly large scale.

kijin•8h ago

That's the beauty of VMs.

Luckily, Amazon is far from the only VM provider out there, so this discussion doesn't need to be polarized between "AWS everything" and "on-premise everything". You can rent VMs elsewhere for a fraction of the cost. There are many places that will rent you bare metal servers by the hour, just as if they were VMs. You can even mix VMs and bare metal servers in the same datacenter.

iso1631•7h ago

I spin up a VM on my Xen VM estate whenever I want it with just some clickops or Terraform (depending on the environment)

baq•3h ago

What do you think the pallets of servers were intended for?

tayo42•7h ago

Containers with k8s and bare metal aren't mutually exclusive.

If anything it enables a hybrid environment

Spooky23•7h ago

No doubt -- there are plenty of downsides to running your own stuff. I'm not anti-AWS. I'm pro-efficiency, and pro making deliberate choices. If the choice is to spend $10M extra on AWS because the engineers get a good vibe -- there should be a compelling reason why that vibe is worth $10M. (And there may well be)

Look at what Amazon/Google/Microsoft does. If you told me you advocate running your own power plants, I'd eyeroll. But... if you're as large a power consumer as a hyper-scaler, totally different story. Google and Microsoft are investing in lighting up old nuclear plants.

array_key_first•4h ago

My company runs all their own bare metal data centers but it's containerized, and it's basically magic.

briffle•4h ago

The tooling should be getting close to manage this on-prem now, with VMs, K8s clusters, networking, storage, etc. I know that Oxide Computer exists, and they look fantastic, but there has got to be more 'open' ways to run things on your own Dell/HP/Supermicro servers with NVMe drives. Especially since VMware has jacked up their prices since being acquired.

Talos OS looks really interesting. But I also need the storage parts, networking parts, etc.

glitchcrab•3h ago

I run several Talos clusters (provisioned by Cluster API) on commodity hardware which is part of a Proxmox cluster in my homelab

BirAdam•9h ago

I'm not either. I used to do fully managed hosting solutions at a datacenter. I had to do everything from hardware through debugging customer applications. Now, people pay me to do the same but on cloud platforms and the occasional on-prem stuff. In general, the younger people I've come across have no idea how to set anything up. They've always just used awscli, the AWS Console, or terraform. I've even been ridiculed for suggesting people not use AWS. Thing is, public cloud really killed my passion for the industry in general.

Beyond public cloud being bad for the planet, I also hate that it drains companies of money, centralizes everyone's risk, and helps to entrench Amazon as yet another tech oligarchic fiefdom. For most people, these things just don't matter apparently.

palata•8h ago

> Thing is, public cloud really killed my passion for the industry in general.

Similar here, I think. I got into Computer Science because I liked software... the way it was. Now I truly think that most software completely sucks.

The thing is that it has grown so much since then, that most developers come from a different angle.

ecshafer•7h ago

I think in 5-10 years there is going to be very profitable consulting on setting up data center infrastructure, and de-clouding for companies.

alphager•7h ago

Why do you think public cloud is worse for the environment than a private dc? I'd expect the larger dcs to be more energy efficient.

mrits•9h ago

I think people who lived through the time when their servers were down because the admin forgot to turn them back on, and only noticed after he drove 50 miles back from the colo, might not want to live through that again.

indymike•6h ago

A lot of careers here were made by moving into AWS. A lot of future careers will be made by moving out of AWS. That's just the tech treadmill in action.

Do what works best for your situation.

sneak•6h ago

If your spend is less than a few thousand per month, using cloud services is a no-brainer. For most startups starting up, their spend is minimal, so launching on the cloud is the default (and correct!) option.

Migrating to lower cost options thereafter when scaling is prudent, but you "build one to throw away", as it were.

red-iron-pine•5h ago

> It's honestly starting to feel like paid shilling.

the companies selling Cloud are also massive IT giants with unlimited compute resources and extensive online marketing operations.

like of fucking course they're using shillbots, they run the backend shillbot infrastructure.

they literally have LLM chatbot agents as an offering, and it's trivially easy to create fake users and repost / retweet last week's comments to create realistic-looking accounts, which then shill hard for whatever their goals are.

dumbledoren•4h ago

Possible. However, what is more likely is that a lot of long-time tech workers have vested stocks or investments in Amazon and they don't want the cash cow (AWS) to get hampered. And similarly a lot of tech workers have invested in AWS skills, so they can't risk those skills becoming less valued in the marketplace due to alternatives.

parliament32•4h ago

I don't think it's paid shilling, I think it's people who got bamboozled into learning cloud-provider-clickops over actual systems work and feel threatened when you suggest hyperscalers aren't the future.

realitysballs•10h ago

For my org, I don’t have the budget for a dedicated in-house opsec team, so going on-prem triggers an additional salary burden for security. How would I overcome this?

Msurrow•10h ago

Familiarize yourself with your company’s decision process on strategic decisions like this. Ensure you have a way to submit a proposal for a decision on making the change (or find someone who has that access to sponsor your proposal), build a business case that shows cost of opsec team, hardware and everything else is lower than AWS (or if cost is higher then some other business value is gained from making the change — currently digital sovereignty could be a strong argument if you are EU based).

If you can't build a positive business case then it's not the correct move. Cash is king. Sadly.

Ensorceled•9h ago

You can't. That's the use case FOR AWS/GCP. You make the switch once the differential between having an in-house team and the AWS premium becomes positive.

A lot of the discussion here is that the cost of the in-house team is less than people think.

For instance: at a former gig, we used a service in the EU that handled weekends, holidays and night time issues and escalated to our team as needed. It was pretty cheap, approximately $10K monthly fee for availability and hourly rate when there were any issues to be resolved. There were a few mornings I had an email with a post-mortem report and an invoice for a hundred euros or so. We came pretty close to 5 9's uptime but we didn't have to worry about SLA's or anything.
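
The break-even rule described above can be written down as a small calculation. This is only a minimal sketch: `monthly_differential` is an illustrative name, and every figure except the roughly $10K monthly on-call fee mentioned in the comment is a hypothetical placeholder.

```python
def monthly_differential(cloud_bill, metal_hosting, team_cost, on_call_fee=10_000):
    """Return cloud spend minus the all-in cost of running it yourself.

    A positive result means self-hosting is cheaper that month.
    All inputs are monthly amounts in the same currency.
    """
    self_hosted_total = metal_hosting + team_cost + on_call_fee
    return cloud_bill - self_hosted_total


# Hypothetical figures, for illustration only.
small_shop = monthly_differential(cloud_bill=8_000, metal_hosting=1_500, team_cost=15_000)
larger_shop = monthly_differential(cloud_bill=90_000, metal_hosting=12_000, team_cost=45_000)

print(small_shop)   # negative: stay on the cloud
print(larger_shop)  # positive: the switch starts to pay off
```

The point of the sketch is that the team and on-call costs are fixed, so the sign flips only once the cloud bill has grown past them.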

spwa4•9h ago

There is also the factor that the idea that you don't need administrators for AWS is bullshit. Cool idea, bro. Go to your favorite jobs portal. Search for "devops" ... 1000s of jobs. I click on the first link.

Well, well, they have a whole team doing "devops administration" on AWS and require extra people. So not having the money for an in-house team ... no AWS for you.

I've worked for 2 large-ish firms in the past 3 years. One huge telco, one "medium" telco (still 100s of people). BOTH had a team just for AWS IAM administration. Only for that one thing, because that was company-wide (and was regularly demonstrated to be a single point of failure). And they had AWS administrator teams, yes teams, for every department. Even HR had one, though in the medium telco all management had a shared team, but the networking and development departments still had their own AWS teams, who, btw, also did IAM. The company-wide IAM team maintained AWS IAM plus some solution they'd bought that also worked for their Windows domain, ticketing system (I hate you IBM remedy), equipment ordering portal and ...

AND there were "devops" positions on every development team, and on the network engineering team, and even a small one for the building "technics" team.

Oh and they both had an internal cluster on top of AWS, part on-premise, part rented DC space, which did at least half the compute work (but presumably a lot less of the weird edge-cases), that one ran the company services that are just insane on AWS like any kind of video.

1oooqooq•8h ago

Exactly. this is the margin aws thrives on.

they sell "you don't need a team"... which is true in your prototype and mvp phase. and you know when you grow you will have an ops team and maybe move out.

but in the very long middle time... you will be supporting clients and sla etc, and will end up paying both aws AND an ops team without even realizing.

Ensorceled•7h ago

Yeah, you need less admin, depending on the setup, but not none. And AWS pushes you towards devops-heavy solutions.

vidarh•9h ago

If you don't have budget for someone to handle this for you, you can't afford AWS either, as you still need to handle the same things and they're generally more complex when you use AWS.

izacus•9h ago

Use the same people who are now maintaining your complex AWS setup. It's not like that doesn't need maintenance or oncall.

yomismoaqui•10h ago

> Maybe I'm just the old man screaming at cloud (no pun intended) but when did people forget how to run a baremetal server ?

We should coin the term "Cloud Learned Helplessness"

vidarh•10h ago

There is this belief that it is not extremely expensive and/or that the ops cost of bare metal will outpace it. It is a belief, and it is very rarely supported by facts.

Having done consulting in this space for a decade, and worked with containerised systems since before AWS existed, my experience is that managing an AWS system is consistently more expensive and that in fact the devops cost is part of what makes AWS an expensive option.

steelegbr•9h ago

AWS may be overcharging but it's a balancing act. Going on-prem (well, shared DC) will be cheaper but comes with requirements for either jack of all trades sysadmins or a bunch of specialists. It can work well if your product is simple and scalable. A lot of places quietly achieve this.

That said, I've seen real world scenarios where complexity is up the wazoo and an opex cost focus means you're hiring under skilled staff to manage offerings built on components with low sticker prices. Throw in a bit of the old NIH mindset (DIY all the things!) and it's large blast radii with expensive service credits being dished out to customers regularly. On a human factors front your team will be seeing countless middle of the night conference calls.

While I'm not 100% happy with the AWS/Azure/GCP world, the reality is that on-prem skillsets are becoming rarer and more specialist. Hiring good people can be either really expensive or a bit of a unicorn hunt.

PenguinCoder•9h ago

I'm proudly 100% on prem Linux sys admin. There are not openings for my skills and they do not pay as well as whatever cloud hotness is "needed".

whstl•8h ago

That's the crazy thing.

Most AWS-only Ops engineers I know are making bank and in high demand, and Ops teams are always HUGE in terms of headcount outside of startups.

The "AWS is cheaper" thing is the biggest grift in our industry.

hedora•8h ago

I wonder how vibe coding will impact this.

You can easily get your service up by asking claude code or whatever to just do it.

It produces aws yaml that’s better than many devops people I’ve worked with. In other words, it absolutely should not be trusted with trivial tasks, but you could easily blow $100K’s per year for worse.

throwforfeds•7h ago

I've been contemplating this a lot lately, as I just did code review on a system that was moving all the AWS infrastructure into CDK, and it was very clear the person doing it was using an LLM which created a really complicated, over engineered solution to everything. I basically rewrote the entire thing (still pairing with Claude), and it's now much simpler and easier to follow.

So I think for developers that have deep experience with systems LLMs are great -- I did a huge migration in a few weeks that probably would have taken many months or even half a year before. But I worry that people that don't really know what's going on will end up with a horrible mess of infra code.

whstl•6h ago

To me it's clear that most Ops engineers are vibe coding their scripts/yamls today.

The time it takes to have a script ready has decreased dramatically in the last 3 years. The number of problems when deploying the first time has also increased in the same period.

The difference between the ones who actually know what they're doing and the ones who don't is whether they will refactor and test.

haik90•8h ago

I think this is driven by the market itself and the way cloud promotes their product.

After being fully in the cloud for some time, we’re moving to hybrid solutions. Upper management is happy with the costs and the cloud engineers got new toys.

devnullbrain•32m ago

1. large, homogenous domain where the budget for your department is large

2. niche, bespoke domain primarily occupied by companies looking to cut costs

marcosdumay•8h ago

Nobody is hiring generalists nowadays.

At the same time, the incredible complexity of the software infrastructure is making specialists more and more useless. To the point that almost every successful specialist out there is just some disguised generalist that decided to focus their presentation in a single area.

zer00eyz•8h ago

> Nobody is hiring generalists nowadays.

What?

I throw up in my mouth every time I see "full stack" in a job listing.

We got rid of roles... DBA's, QA teams, Sysadmins, then front and back end. Full Stack is the "webmaster" of the modern era. It might mean front and back end, it might mean sysadmin and DBA as well.

marcosdumay•4h ago

Even full stack listings come with a list of technologies that the candidate must have deep knowledge of.

> We got rid of roles... DBA's, QA teams, Sysadmins, then front and back end.

On a first approximation, those roles were all wrong. If your people don't wear many of those hats at the same time, they won't be able to create software.

But yeah, we did get rid of roles. And still require people to be specialized to the point it's close to impossible to match the requirements of a random job.

NDizzle•7h ago

Maybe everyone is retaining generalists. I keep being given retention bonuses every year, without asking for a single one so far.

As mentioned below, never labeled "full stack", never plan on it. "Generalist" is what my actual title became back in the mid 2000s. My career has been all over the place... the key is being stubborn when confronted with challenges and being able to scale up (mentally and sometimes physically) to meet the needs, when needed. And chill out when it's not.

hibikir•9h ago

And don't forget the real crux of the problem: Do I even know whether a specialist is good or not? Hiring experts is really difficult if you don't have the skill in the topic, and if you do, you either don't need an expert, or you will be biased towards those that agree with you.

It's not even limited to sysadmins, or in tech. How do you know whether a mechanic is very good, or iffy? Is a financial advisor giving you good advice, or basically robbing you? It's not as if many companies are going to hire 4 business units worth of on prem admins, and then decide which one does better after running for 3 years, or something empirical like that. You might be the poor sob that hires the very expensive, yet incompetent and out of date specialist, whose only remaining good skill is selling confidence to employers.

everfrustrated•9h ago

This only gets worse as you go higher in management. How does a technical founder know what good sales or marketing looks like? They are often swayed by people who can talk a good talk and deliver nothing.

ambicapter•9h ago

The good news with marketing and sales is that you want the people who talk a good talk, so you're halfway there, you just gotta direct them towards the market and away from bilking you.

dns_snek•9h ago

> Do I even know whether a specialist is good or not?

Of course but unless I misunderstood what you meant to say, you don't escape that by buying from AWS. It's just that instead of "sysadmin specialists" you need "AWS specialists".

If you want to outsource the job then you need to go up at least 1 more layer of abstraction (and likely an order of magnitude in price) and buy fully managed services.

canucktrash669•9h ago

Managed servers reduce the on-prem skillset requirement and can also deliver a lot of value.

The most frustrating part of hyperscalers is that it's so easy to make mistakes. Active tracking of your bill is a must, but the data is 24-48h late in some cases. So a single engineer can cause 5-figure regrettable spend very quickly.
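
To see why the reporting lag matters, here is a rough sketch of how much unbudgeted spend can pile up before anyone sees it. The function name, the daily figures, and the budget are all hypothetical, and it assumes the latest reported rate simply continued during the lag.

```python
def projected_overrun(daily_spend, budget_per_day, lag_days=2):
    """Estimate how much unbudgeted spend may already have accrued.

    daily_spend: the most recent *reported* daily totals, oldest first.
    lag_days: how stale the newest report is (24-48h in the discussion above).
    Assumes the latest reported rate continued unchanged during the lag.
    """
    if not daily_spend:
        return 0.0
    latest = daily_spend[-1]
    excess_per_day = max(0.0, latest - budget_per_day)
    # One reported day over budget plus the days not yet visible in the bill.
    return excess_per_day * (1 + lag_days)


# Hypothetical: a quiet account where someone left something expensive running.
reported = [400.0, 410.0, 395.0, 4_200.0]
print(projected_overrun(reported, budget_per_day=500.0))
```

With these made-up numbers the first alert already corresponds to a five-figure overrun, which is the scenario the comment describes.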

tayo42•7h ago

What size companies are we talking about?

mhitza•8h ago

It's a chicken and egg problem. If the cloud hadn't become such a prominent thing, the last decade and a half would have seen the rise of much better tools to manage on-premise servers (= requiring less in-depth sysadmin expertise). I think we're starting to see such tools appear in the last few years after enough people got burned by cloud bills and lock-in.

bcrosby95•6h ago

It depends upon how many resources your software needs. At 20 servers we spend almost zero time managing our servers, and with modern hardware 20 servers can get you a lot.

It's easier than ever to do this but people are doing it less and less.

dumbledoren•4h ago

> AWS may be overcharging but it's a balancing act. Going on-prem (well, shared DC) will be cheaper but comes with requirements for either jack of all trades sysadmins or a bunch of specialists

Much easier to find. Even more, they are skills much easier to learn for existing engineers. What's better, they are fundamental skills that will never lose their value as those systems are what everything else is built on.

speleding•9h ago

The complexity of AWS versus bare metal depends on what you are doing. Setting up an apache app server: just as easy on bare metal. Setting up high availability MySQL with hot failover: much easier on AWS. And a lot of businesses need a highly available database.

spwa4•9h ago

A high availability MySQL server on AWS is about the same difficulty as on your own kubernetes instance (I've got a play one on one of those $100 N100 machines, got one with 16G mem). Then:

    helm repo add mariadb-operator https://mariadb-operator.github.io/mariadb-operator
    helm install mariadb-operator mariadb-operator/mariadb-operator

And then you can just provision MariaDB "kind", ie. you kubectl apply with something specifying database name, maximum memory, type of high availability (single primary or multimaster) and secret reference and there you go: new database, ready to be plugged into other pods.
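
As a rough illustration of what such a resource might look like, the sketch below builds one as a Python dict and prints it as JSON. The API group/version and the field names are assumptions modelled on the operator's CRD and can differ between operator versions, so check them against the documentation for the version you installed. The resource name, secret name, and sizes are hypothetical.

```python
import json


def mariadb_manifest(name, database, memory_limit, replicas, secret_name, multi_master=False):
    """Build a MariaDB custom resource as a plain dict.

    Field names are assumptions modelled on the operator's CRD; verify them
    against the documentation for the operator version you installed.
    """
    spec = {
        "database": database,
        "replicas": replicas,
        "rootPasswordSecretKeyRef": {"name": secret_name, "key": "root-password"},
        "resources": {"limits": {"memory": memory_limit}},
        "storage": {"size": "10Gi"},  # hypothetical volume size
    }
    # Pick one HA mode: multi-master (Galera) or single-primary replication.
    ha_mode = "galera" if multi_master else "replication"
    spec[ha_mode] = {"enabled": True}
    return {
        "apiVersion": "k8s.mariadb.com/v1alpha1",  # assumed API group/version
        "kind": "MariaDB",
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = mariadb_manifest("shop-db", "shop", "1Gi", replicas=3, secret_name="shop-db-secret")
print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f -` accepts JSON on standard input, the printed output could be piped straight into it once the fields have been verified.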

papichulo2023•9h ago

Don't you need ECC in your db nodes?

dd_xplore•8h ago

N100 supports DDR5 memory (although 1 channel) but I believe DDR5 has some error correction... May not be full ECC

1oooqooq•8h ago

amazing how nobody even knows about ECC these days.

see so many series B+ companies running DB and storage without a care in the world.

spwa4•1h ago

N100 is my homelab, for playing. For instance I have a kubernetes cluster running KubeVirt, which runs 5 VMs, which ... have a kubernetes installation. My production servers are generally older Xeons with ECC ram, which are also running kubernetes.

PenguinCoder•9h ago

Most businesses really don't need that complexity. They think they do. Premature optimization.

speleding•9h ago

If your database has a hardware failure then you could lose all sales and customer data since your last backup, plus the cost of the downtime while you restore. I struggle to think of a business where that is acceptable.

wredcoll•8h ago

That's not the same as a "high availability hot swap redundant multi region database".

Running mysqldump to a usb disk in the office once a day is pretty cheap.
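
A daily dump like that fits in a few lines. The sketch below only builds and prints the command; the mount point `/mnt/usb-backup` and the file naming are hypothetical, and it assumes credentials live in `~/.my.cnf` rather than on the command line.

```python
import datetime
import pathlib
import subprocess


def dump_command(target_dir, today=None):
    """Return (argv, output_path) for a full logical backup.

    --single-transaction takes a consistent snapshot of InnoDB tables
    without locking them; --all-databases includes every schema.
    Credentials are expected in ~/.my.cnf rather than on the command line.
    """
    today = today or datetime.date.today()
    output = pathlib.Path(target_dir) / f"backup-{today.isoformat()}.sql"
    argv = ["mysqldump", "--single-transaction", "--all-databases",
            f"--result-file={output}"]
    return argv, output


def run_backup(target_dir):
    """Run the dump; raises CalledProcessError if mysqldump fails."""
    argv, output = dump_command(target_dir)
    subprocess.run(argv, check=True)
    return output


# /mnt/usb-backup is a hypothetical mount point for the USB disk.
argv, output = dump_command("/mnt/usb-backup", today=datetime.date(2025, 10, 29))
print(" ".join(argv))
```

Calling `run_backup` from a daily cron job would complete the picture. The trade-off is the one under discussion: a restore from such a dump can lose up to a day of writes.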

danhor•8h ago

My "Homeserver" with its database running on an old laptop has less downtime than AWS.

I expect most, if not 99%, of all businesses can cope with a hardware failure and the associated downtime while restoring to a different server, judging from the impact of the recent AWS outage and the collective shrug in response. With a proper raid setup, data loss should be quite rare, if more is required a primary + secondary setup with a manual failover isn't hard.

evanelias•8h ago

Why are you ignoring the huge middle ground between "HA with fully automated failover" and "no replication at all"?

Basic async logical replication in MySQL/MariaDB is extremely easy to set up, literally just a few commands to type.

Ditto for doing failover manually the rare times it is needed. Sure, you'll have a few minutes of downtime until a human can respond to the "db is down" alert and initiates failover, but that's tolerable for many small to medium sized businesses with relatively small databases.

That approach was extremely common ~10-15 years ago, and online businesses didn't have much worse availability than they do today.
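
The "few minutes of downtime" argument can be checked with simple arithmetic. The sketch below converts a number of nines into a yearly downtime budget and computes the availability left after a given number of manual failovers; the failure count and failover duration are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(nines):
    """Minutes of downtime per year allowed at e.g. 3 nines (99.9%)."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


def availability(failovers_per_year, minutes_per_failover):
    """Fraction of the year the database is up, given manual failovers."""
    down = failovers_per_year * minutes_per_failover
    return 1 - down / MINUTES_PER_YEAR


for n in (3, 4, 5):
    print(n, round(downtime_budget_minutes(n), 1))

# Hypothetical: two hardware failures a year, 15 minutes each to fail over by hand.
print(f"{availability(2, 15):.5%}")
```

Four nines allows about 52.56 minutes of downtime a year, so two 15-minute manual failovers still fit inside that budget. Five nines allows only about 5.26 minutes, which is where automated failover starts to be needed.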

speleding•7h ago

I've done quite a few MySQL setups with replication. I would not call setup "extremely easy", but then, I'm not a full time DB admin. MySQL upgrades and general troubleshooting are so much more painful than AWS Aurora where everything just takes a few clicks. And things like blue/green deployment, where you replicate your entire setup to try out a DB upgrade, are really hard to do on-prem.

evanelias•7h ago

Without specifics it's hard to respond. But speaking as a software engineer who has been using MySQL for 22 years and learned administrative tasks as-needed over the years, personally I can't relate to anything you are saying here! What part of async replication setup did you find painful? How does Aurora help with troubleshooting? Why use blue/green for upgrade testing when there are much simpler and less expensive approaches using open source tools?

izacus•9h ago

A lot of people here have built their whole professional careers around knowing AWS and deploying to it.

Moving away is an existential issue for them - this is why there's such pushback. A huge % of the new generation of developers and devops people doesn't know anything about deploying software on bare metal or even other clouds and they're terrified about being unemployed.

goalieca•9h ago

Meanwhile, skills in operating systems, networking, and optimization are declining. Every system I've seen in the last 10 years or so has left huge cash on the table by not being aware of the basics.

snoman•3h ago

That could have more to do with containerization than the cloud - and that was a goal if I recall.

Aurornis•9h ago

> I'm so surprised there is so much pushback against this.. AWS is extremely expensive.

I see more comments in favor than pushing back.

The problem I have with these stories is the confirmation bias that comes with them. Going self-hosted or on-premises does make sense in some carefully selected use cases, but I have dozens of stories of startup teams spinning their wheels with self-hosting strategies that turn into a big waste of time and headcount that they should have been using to grow their businesses instead.

The shared theme of all of the failure stories is missing the true cost of self-hosting: The hours spent getting the servers just right, managing the hosting, debating the best way to run things, and dealing with little issues add up but are easily lost in the noise if you’re not looking closely. Everyone goes through a honeymoon phase where the servers arrive and your software is up and running and you’re busy patting yourselves on the back about how you’re saving money. The real test comes 12 months later when the person who last set up the servers has left for a new job and the team is trying to do forensics to understand why the documentation they wrote doesn’t actually match what’s happening on the servers, or your project managers look back at the sprints and realize that the average time spent on self-hosting related tasks and ideas has added up to a lot more than anyone would have guessed.

Those stories aren’t shared as often. When they are, they’re not upvoted. A lot of people in my local startup scene have sheepish stories about how they finally threw in the towel on self-hosting and went to AWS and got back to focusing on their core product. Few people are writing blog posts about that because it’s not a story people want to hear. We like the heroic stories where someone sets up some servers and everything just works perfectly and there are no downsides.

You really need to weigh the tradeoffs, but many people are not equipped to do that. They just think their chosen solution will be perfect and the other side will be the bad one.

DrewADesign•8h ago

> The shared theme of all of the failure stories is missing the true cost of self-hosting: The hours spent getting the servers just right, managing the hosting, debating the best way to run things, and dealing with little issues add up but are easily lost in the noise if you’re not looking closely.

What the modern software business seems to have lost is the understanding that ops and dev are two different universes. DevOps was a reaction to the fact that even outsourcing ops to AWS doesn’t entirely solve all of your ops problems and the role is absolutely no substitute for a systems administrator. Having someone that helps derive the requirements for your infrastructure, then designs it, builds it , backs it up, maintains it, troubleshoots it, monitors performance, determines appropriate redundancy, etc. etc. etc. and then tells the developers how to work with it is the missing link. Hit-by-a-bus documentation, support and update procedures, security incident response… these are all problems we solved a long time ago, but sort of forgot about moving everything to cloud architecture.

wredcoll•8h ago

> What the modern software business seems to have lost is the understanding that ops and dev are two different universes.

This is a fascinating take, if you ask me, treating them as separate is the whole problem!

The point of being an engineer is to solve real world problems, not to live inside your own little specialist world.

Obviously there's a lot to be said for being really good at a specialized set of skills, but that's only relevant to the part where you're actually solving problems.

hndc•8h ago

> DevOps was a reaction to the fact that even outsourcing ops to AWS doesn’t entirely solve all of your ops problems

DevOps, conceptually, goes back to the 90s. I was using the term in 2001. If memory serves, AWS didn't really start to take off until the mid/late aughts, or at least not until they launched S3.

DevOps was a reaction to the software lifecycle problem and didn't have anything to do with AWS. If anything it's the other way around: AWS and cloud hosting gained popularity in part due to DevOps culture.

mjr00•7h ago

> DevOps was a reaction to the fact that even outsourcing ops to AWS doesn’t entirely solve all of your ops problems and the role is absolutely no substitute for a systems administrator.

This is revisionist history. DevOps was a reaction to the fact that many/most software development organizations had a clear separation between "developers" and "sysadmins". Developers' responsibility ended when they compiled an EXE/JAR file/whatever, then they tossed it over the fence to the sysadmins who were responsible for running it. DevOps was the realization that, huh, software works better when the people responsible for building the software ("Dev") are also the same people responsible for keeping it running ("Ops").

tstrimple•3h ago

It was very much this for me. I knew the hosting side of things because my second job as a programmer was at a small ISP that hosted custom websites. I got used to maintaining Linux web and email servers by hand over SSH. There were some common scripts, but for the most part the pattern was SSH into the server and make the changes you need to make. Most of my early startup career was like this. Closely working with hardware, the server installs, hosting configs as well as the code that actually powered things.

Jump to my first "enterprise" job and suddenly I can't fix things anymore. I have to submit tickets to other teams to look at why the thing I built isn't running as expected. That, to me, was pure insanity. The sysadmins knew fuck all about my app and as far as I was concerned barely knew how to admin systems. I knew a lot more in my 20's after all. But the friction of not running what I wrote was absolutely real and one of the main killers of productivity versus my startup days.

I also have seen this from most of the "enterprise" companies that do "DevOps" when really they just mean they have a sysadmin team who uses modern tools and IaC. The same exact friction and issues exist between dev and ops as before DevOps days. Those companies are explicitly doing DevOps wrong. When you look at the troubleshooting steps during an incident, it's identical. Bring in the devs and the ops team so we can figure out what's going on. I do think startups are more likely to get DevOps right because they aren't trying to force it on the only mental model they seem to be able to understand.

I've also found that dev teams who run and maintain their own stacks are better about automatic failure recovery and overall more reliable solutions. Whether that's due to better alignment between the app code and the app stack during development or because the dev team is now the first call when things aren't working I'm not entirely sure. Likely a mix of both.

mjr00•7h ago

> I have dozens of stories of startup teams spinning their wheels with self-hosting strategies that turn into a big waste of time and headcount that they should have been using to grow their businesses instead.

Funnily enough, the article even affirms this, though most people seem to have skimmed over it (or not read it at all).

> Cloud-first was the right call for our first five years. Bare metal became the right call once our compute footprint, data gravity, and independence requirements stabilised.

Unless you've got uncommon data egress requirements, if you're worried about optimizing cloud spend instead of growing your business in the first 5 years you're almost certainly focusing on the wrong problem.

> You really need to weigh the tradeoffs, but many people are not equipped to do that. They just think their chosen solution will be perfect and the other side will be the bad one.

This too. Most of the massive AWS savings articles in the past few days have been from companies that do a massive amount of data egress i.e. video transfer, or in this case log data. If your product is sending out multiple terabytes of data monthly, hosting everything on AWS is certainly not the right choice. If your product is a typical n-tier webapp with database, web servers, load balancer, and some static assets, you're going to be wasting tons of time reinventing the wheel when you can spin up everything with redundancy & backups on AWS (or GCP, or Azure) in 30 minutes.

tetha•3h ago

To me it feels like nuance has been lost.

Personally, I would never self-host some B2C or B2B application if you have less than 50 - 100 techies in a healthy org. You can get just too much from a few VMs and/or a few dedicated servers at like Hetzner, OVH, or AWS managed services. At least for the average web rest thingy with a DB and some file storage. I'm sure it's possible to find counter-examples.

On the other hand, we are about 120 devs at work now, couple thousand B2B customers, 10 Platform Ops, 7 HW & DC Ops. I guess we have more ops-people than a startup may have people. Once we get rid of VMware licensing, our colos are ridiculously cheap when amortized across 5 years compared to AWS or cloud hosting. Once EOL, they'll also reduce cloud-costs on cheaper providers for test systems and provide spontaneous failover and disaster recovery tests.

We're now also getting good cross-team scaling processes going and at this point the big barriers are actually getting enough power and cooling, not buying/racking/maintaining systems. That will be a big price tag next year, but we've not paid that money to AWS the last two years, so it's fine.

As I keep saying internally, self-hosting is like buying a 40 ton excavator, like Large Marge or a 40 ton truck. If you have enough stuff to utilize a 40 ton truck, it's good. If you need to move food around in an urban environment, or need to move an organ transplant between hospitals, a 40 ton truck tends to be rather inefficient and very expensive to maintain and run.

chickensong•2h ago

All valid and important points, but missing a painful one, also rarely represented in threads like this: flaky hardware.

Almost every bare metal success story paints a rosy picture of perfect hardware (which thankfully is often the case), or basic hard failures which are easily dealt with. Disk replacement or swapping 1u compute nodes is expected and you probably have spares on hand. But it's a special feeling to debug the more critical parts that likely don't have idle spares just sitting around. The raid controller that corrupts its memory, reboots, and rolls back to its previous known-good state. The network equipment that locks up with no explanation. Critical components that worked flawlessly for months or years, then shit the bed, but reboot cleanly.

Of course everyone built a secure management vlan and has remote serial consoles hooked up to all such devices right? Right? Oh good, they captured some garbled symbols. The vendor's first tier of support will surely not be outsourced offshore or read from a script, and will have a quick answer that explains and fixes everything. Right?

The cloud isn't always the right choice, but if you can make it work, it sure is nice to not deal with entire categories of problems when using it.

zjaffee•8h ago

AWS (along with the vast majority of B2B services in the software development industry) is good because it allows you to focus on building your product or business without needing to worry about managing servers nearly as much.

The problems here are no different from using SaaS anywhere else in a business. You can also run all your sales tracking through Excel; it's just that once you have more than a few people doing sales, that becomes a major bottleneck, the same way not having an easier-to-manage infrastructure system does.

maccard•8h ago

I work for a small company owned by a huge company. We are entirely independent except for purchasing, IT, and budget approval. We run our CI on AWS, and it’s slow and flaky for a variety of reasons (compiling large c++ projects combined with instance type pressure). It’s also expensive.

We planned a migration to move from 4 OD (on-demand) instances to one on-prem machine, and we guessed we’d save $1000/mo, our builds would be faster, and we’d have fewer failures due to capacity issues. We even had a spare workstation and a rack in the office, so the capex was 0.

I plugged the machine into the rack and no internet connectivity. Put in an IT ticket which took 2 days for a reply, only to be told that this was an unauthorised machine and needed to be imaged by IT. The back and forth took 4 weeks, multiple meetings and multiple approvals. My guess is that 4 people spent probably 10 hours arguing whether we should do this or not.

On AWS I can write a Python script and have a running Windows instance in 15 minutes.
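
For anyone wondering what such a script roughly looks like, here is a minimal sketch using boto3. To be clear about what is assumed: the AMI ID you pass in, the instance type, the region and the `ci-windows` name tag are all placeholders invented for illustration, not details of the setup described above, and it assumes AWS credentials are already configured.

```python
def build_launch_request(ami_id, instance_type="m5.xlarge", name="ci-windows"):
    """Build the keyword arguments for EC2 run_instances (makes no AWS call)."""
    return {
        "ImageId": ami_id,              # placeholder: a Windows AMI ID for your region
        "InstanceType": instance_type,  # assumed size, pick what your builds need
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    }


def launch_and_wait(ami_id, region="us-east-1"):
    """Launch one instance and block until EC2 reports it as running."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_request(ami_id))
    instance_id = response["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id
```

"Running" here only means EC2 reports the instance state as running; Windows still needs a few more minutes to boot and become reachable.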

wredcoll•8h ago

This is the root success of AWS: it lets internal teams bypass sysadmin departments.

ghaff•8h ago

Working around official IT was certainly a significant factor early on. I'm less convinced it is nearly as big a driver (or a downside depending on your perspective) today.

whstl•6h ago

Especially considering that outside of startups (where approval would be fast with or without cloud), virtual infrastructure also got its own bureaucratic process.

ghaff•6h ago

A lot of people forget that, when server virtualization was still gaining momentum in a lot of circles, it wasn't uncommon at less technically savvy customers--say a regional bank at the time--to be told that it might take 2 months to provision a new server.

whstl•6h ago

I don't think anyone is forgetting that in this thread, as there are dozens of answers mentioning this.

But as an example: It took about 3 months to provision an AWS server in a recent company I consulted for due to their own bureaucracy and ineptitude of the Ops team.

On the other hand, when I needed a few CI servers for a startup I worked at, I just collected them from the Apple Store during lunch hour.

Now this above is what people are "forgetting" and don't want to listen to.

maccard•1h ago

For us the problem is every device that gets plugged into our network is disabled by default, IT need to enable the port and they'll only enable it on machines that they've imaged.

But because AWS isn't in the office, it's fine. We could probably use Hetzner or OVH, but then we have to go through procurement, which is as much of a hassle as going through IT.

mrktf•4h ago

It depends on organization size. Just my anecdotal example: I would say the moment the IT department becomes its own island (for example, it can totally ignore requests with excuses like staff overbooked / we need extra planning / 6 months of extra meetings; or even worse, it processes the request, but only up to the point where it can show it to upper management and blame you for wasting resources), you can go full cloud. At least there it is possible to get something working in a reasonable time.

maccard•6h ago

The same story applies for software. If I want to buy a license of X for someone, I have to go through procurement, and it takes weeks even for <$50 purchases. Yet if it’s on the AWS Marketplace, it’s pre-approved as long as it doesn’t breach the AWS budget.

UltraSane•6h ago

I'm not going to argue that AWS can be expensive, but in my experience its biggest advantage is SPEED. In every company I worked for that ran their own data centers, every damn thing took FOREVER. New servers took months to buy and rack. Any network change like a new VLAN took days to weeks. It was so annoying. But in AWS almost anything is just an API call and a few minutes at most from being enabled. It is so much more productive.

SJC_Hacker•6h ago

> I'm so surprised there is so much pushback against this.. AWS is extremely expensive. The use cases for setting up your system or service entirely in AWS are more rare than people seem to realise. Maybe I'm just the old man screaming at cloud (no pun intended) but when did people forget how to run a baremetal server ?

Long term yes you can save money rolling your own.

But with cloud you can get something up and running within maybe a few days, sometimes even faster. Often with built in scalability.

This is a much easier sell to the non-tech (i.e., money) people.

If the project continues, the path of least resistance is often to just continue with the cloud solution. At a certain point, there will be so much tech debt that any long-term savings from traditional on-premises, co-location or managed hosting are vastly outweighed by the cost of migration.

comprev•6h ago

I'm on a Platform team of <8 people and only 3 of us (most experienced too) come from sysadmin backgrounds. The rest have only ever known containers/cloud and never touched (both figuratively and literally :-) bare metal servers in their careers.

They've never used tools like Ansible (or Anaconda) or been in situations where they couldn't destroy the container and start afresh instantly.

eek2121•5h ago

I once moved a small site from AWS to Digital Ocean + Cloudflare.

$100-$300 on AWS -> $35/mo for DO + CF. Coincidentally, AWS had an outage soon after, which was avoided thanks to the move.

I have used DO for both clients and myself, and have not had any huge problems with them.

rdtsc•5h ago

> I'm so surprised there is so much pushback against this.. AWS is extremely expensive.

Basic rationalization. People will go to extraordinary lengths to justify and defend the choices they made. It's a defense mechanism: if they spent millions on AWS they are not going to sit idly while HN discusses saving hundreds of thousands with everyone nodding and agreeing. It's important for their own sanity to defend the choice they made.

j45•5h ago

The cloud is incredibly profitable for the efficiencies and improvements it's introduced and held onto.

Easy to push back against what is now the unknown (bare metal), when the layers extending bare metal to cloud service have become better and better, as well as more accessible.

dumbledoren•5h ago

> when did people forget how to run a baremetal server ?

Bigger question: When did people forget that doing that is much easier than AWS...

citizenpaul•5h ago

The "value add" of AWS has never been what it can do or does. It has always appealed to weak/incompetent/sociopathic managers' and execs' desire to not have to deal with capable employees.

As far as they are concerned AWS is taking care of computing AND hiring for them.

I've never worked anywhere where at least some sort of power holder wouldn't instantly go to consultants or outsourcing rather than in-house, because they believe that if you work for the company you must be incompetent, dumb or below average. If you don't work for them you must be exceptional.

jsight•2h ago

> I'm so surprised there is so much pushback against this..

Same, this trend towards "AWS all the things" has really amazed me.

We've all mocked small companies copying big companies by trying to make their app super-duper scalable from the very start. After all, everyone thinks they are the next Google, despite their 5 total users right now.

But this is really the opposite. AWS is phenomenal for the startup that would readily trade high opex for lower capex. Servers aren't the cheapest things in the world to buy and they depreciate. It makes total sense for startups to start this way.

But why are big companies, with an actual budget for staff, copying the behavior of their favorite startups?

guax•1h ago

Opex looks nicer on the sheets than capex for large deployments. Incredibly high investment from AWS in luring in the C-level with "white-papers" and promises of magical cost and governance revolutions. I've heard the promise of cheaper, faster, where you can focus on "innovation". I am yet to see any of it become a reality.

mk89•1h ago

How would you do multi-region deployments with your own DC?

This is an issue for several companies that start small and within 5 years they find the need to expand abroad. Be it for data sovereignty or so, which is becoming more important than ever in the last 10 years.

Duplicating a region is "a few clicks away" on AWS. This is what the provider enables you to do.

This and a lot of other things. And for such things, yes, you gotta pay.

Hikikomori•1h ago

I mean it's not that complicated. Rent space in another location, get separate fibers/wavelengths between them, redundant internet connection.

But if you're in a growth/startup phase it doesn't make much sense to spend engineering time on this, not that multi-region setups in AWS are one button either. Once you're past that and paying AWS a million per week or so I think it can make sense to offload expensive services to your own hardware.

axegon_•1h ago

I am not - I hate AWS (and cloud in general) with a passion - overpriced, you are getting locked in by a closed ecosystem the moment you say "hey this feature is neat it will save me so much work", only to realize that you are stuck paying for it for years if you decide to move away from it. But people are inclined to jump on a hype train and become evangelists for life. Truth is AWS (or GCP or Azure or anything else) is a viable option in two cases:

1. You are making a product with 3 friends on evenings and you want to ship asap without having the capacity to invest and set up infrastructure.

2. You are a huge corporation with tens of thousands of employees and hardware needs that you simply cannot source yourself easily or sort out the colocation of the hardware.

Everyone else - get a dozen second-hand servers, shove them in a rack in a data center and you will own the hardware and everything associated with it at half the price of what you'd be paying AWS in a year.

seidleroni•10h ago

As someone who works with firmware, it is funny how different our definitions of "bare metal" are.

embedding-shape•10h ago

As someone who does material science, it's funny how our definition of "bare metal" is so different.

onionisafruit•9h ago

As someone who listens to loud rock and roll music …

amluto•6h ago

Ask an astronomer what a “metal” is.

andrewl-hn•10h ago

In similar way I once worked on a financial system, where a COBOL-powered mainframe was referred to as "Backend", and all other systems around it written in C++, Java, .NET, etc. since early 80s - as "Frontend".

embedding-shape•9h ago

Had somewhat similar experience, the first "frontend" I worked on was a sort of proxy server that sat in front of a database basically, meant as a barrier for other applications to communicate via. At one point we called the client side web application "frontend-frontend" as it was the frontend for the frontend.

pgwhalen•10h ago

I don't work in firmware at all, but I'm working next to a team now migrating an application from VMs to K8S, and they refer to the VMs as "bare metal" which I find slightly cringeworthy - but hey, whatever language works to communicate an idea.

ghaff•5h ago

I'm not sure I've ever heard bare metal used to refer to virtualized instances. (There were debates around Type 1 and Type 2 (hosted) hypervisors at one point but I haven't heard that come up in years.)

Joeboy•9h ago

Wikipedia still thinks it means the thing I (and presumably you) do.

https://en.wikipedia.org/wiki/Bare_metal

Edit: For clarity, wikipedia does also have pages with other meanings of "bare metal", including "bare metal server". The above link is what you get if you just look up "bare metal".

I do aim to be some combination of clear, accurate and succinct, but I very often seem to end up in these HN pissing matches so I suppose I'm doing something wrong. Possibly the mistake is just commenting on HN in itself.

embedding-shape•9h ago

Seems there is a difference between "Bare Metal" and "Bare Machine".

I'm not sure what you did, but when you go to that Wikipedia article, it redirects to "Bare Machine", and the article contents is about "Bare Machine". Clicking the link you have sends you to https://en.wikipedia.org/wiki/Bare_machine

So it seems like you almost intentionally shared the article that redirects, instead of linking to the proper page?

Joeboy•8h ago

I indeed deliberately pasted a link that shows what happens when you try to go to the Wikipedia page for "bare metal".

embedding-shape•8h ago

Right, slightly misleading though, as https://en.wikipedia.org/wiki/Bare-metal_server is a separate page.

Joeboy•8h ago

Yes, but if you look up "bare metal" it goes to the page about actual bare metal (aka "bare machines" or whatever).

Can we stop this now? Please?

embedding-shape•8h ago

> Yes, but if you look up "bare metal" it goes to the page about actual bare metal (or bare machines or whatever).

Fix it then, if you think it's incorrect. Otherwise, link to https://en.wikipedia.org/wiki/Bare_metal_(disambiguation) like any normal and charitable commentator would do.

> Can we stop this now? Please?

Sure, feel free to stop at any point you want to.

Joeboy•5h ago

There is nothing that needs fixing? Both my link and yours give the same "primary" definition for "bare metal". Which is not unequivocally the correct definition, but it's the one I and the person I was replying to favour.

I thought my link made the point a bit better. I think maybe you've misunderstood something about how Wikipedia works, or about what I'm saying, or something. Which is OK, but maybe you could try to be a bit more polite about it? Or charitable, to use your own word?

Edit: In case this part isn't obvious, Wikipedia redirects are managed by Wikipedia editors, just like the rest of Wikipedia. Where the redirect goes is as much an indication of the collective will of Wikipedia editors as eg. a disambiguation page. I don't decide where a request for the "bare metal" page goes, that's Wikipedia.

Edit2: Unless you're suggesting I edited the redirect page? The redirect looks to have been created in 2013, and hasn't been changed since.

cs702•10h ago

In the early days of cloud service providers, they offered a handful of high-value services, all at great prices, making them cost-competitive with bare metal but much easier. That was then.

Things today are different. As cloud service providers have grown to become dominant, they now offer a vast, complicated tangle of services, microservices, control panels, etc., at prices that can spiral out of control if you are not constantly on top of them, making bare metal cheaper for many use cases.

embedding-shape•10h ago

> they offered a handful of high-value services, all at great prices, making them cost-competitive with bare metal but much easier

That was never the case for AWS, the point was never "We're cheap" but "We let you scale faster for a premium".

I first came across cloud services around 2010-2011 I think, when the company I worked at at the time started growing and we needed something better than shared hosting. AWS was brought up as a "fresh but expensive" alternative, and the CTO managed to convince the management that we needed AWS even if it was expensive, because it'll be a lot easier to tear up/down servers as we need it. Bandwidth costs I think was the most expensive part of the package, at least back then.

When I look at what performance per $ you get with AWS et al today, it looks the same, incredibly expensive for the performance you (don't) get. Better off with dedicated instances unless your team is lacking the basic skills of server management, or until the company has really grown so that dealing with the infrastructure keeps being difficult, then hire a dedicated person and let them make the calls for what's next.

everfrustrated•9h ago

I'd agree that AWS never sold on being cheaper, but there is one particular way AWS could be cheaper and that is their approach to billing-by-the-unit with no fixed costs or minimum charges.

Being able to start small from a $1/mth bill without any fixed cost overheads is incredibly powerful for small startups.

If I wanted to store bytes in a DC it would cost $10k/mth by the time I was paying colo/ servers/ disks before I stored my first byte. Sure there wouldn't be any incremental costs for the second byte but that's a steep jump. S3 would have cost me $0.02. Being able to try technology and prove concepts at the product development stage is very powerful and why AWS became not just a vendor but a _technology partner_ for many companies.
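
The fixed-versus-metered trade-off can be put into numbers with a rough sketch like the one below. The $10k/mth figure is the one from this comment; reading the $0.02 as a per-GB-month rate is my assumption (the comment doesn't give a unit), and egress, request fees and staff time are ignored entirely.

```python
COLO_FIXED_PER_MONTH = 10_000.0   # fixed colo/servers/disks cost from the comment
METERED_PER_GB_MONTH = 0.02       # assumption: $0.02 treated as a per-GB-month rate


def metered_cost(gb_stored, price_per_gb_month=METERED_PER_GB_MONTH):
    """Monthly bill when you pay only for what you store."""
    return gb_stored * price_per_gb_month


def break_even_gb(fixed_per_month=COLO_FIXED_PER_MONTH,
                  price_per_gb_month=METERED_PER_GB_MONTH):
    """Stored volume at which the metered bill equals the fixed monthly cost."""
    return fixed_per_month / price_per_gb_month


if __name__ == "__main__":
    for gb in (1, 1_000, 100_000, 1_000_000):
        print(f"{gb:>9} GB: metered ${metered_cost(gb):>10,.2f} "
              f"vs fixed ${COLO_FIXED_PER_MONTH:,.2f}")
    print(f"break-even at about {break_even_gb() / 1000:,.0f} TB")
```

Under those assumptions the metered option stays cheaper until you are storing on the order of hundreds of terabytes, which is the "start small" advantage in a nutshell; past that point the fixed-cost option starts to win.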

embedding-shape•8h ago

> Being able to start small from a $1/mth bill without any fixed cost overheads is incredibly powerful for small startups.

Yes, no doubt about it. Initially AWS was mostly sold as "You never know when you might want to scale fast, imagine being featured in a newspaper and your servers can't handle the load, you need cloud for that!" to growing startups, and in that context it kind of makes sense, pay extra but at least be online.

But initially when you're small, or later when you're big and established, other things make more sense. But yes, I agree that if you need to aggressively be able to scale up or down, cloud resources make sense to use for that, in addition to your base infrastructure.

torginus•8h ago

But if AWS didn't have that anti-competitive data transfer fee that gets waived if your traffic goes to an internal server, why would you choose S3 vs a white-label storage vendor's similar offering?

cs702•7h ago

> the point was never "We're cheap" but "We let you scale faster for a premium"

Actually, it was more like "Scale faster, easier, more reliably, with proven hardware and software infrastructure, operated by a proven organization, at a price point that is competitive with the investment you'd have to make to get comparable hardware, software, and organizational infrastructure." But that was then. Today, things are different. Cloud services have become giant hairballs of complexity, with plenty of shoot-yourself-in-the-foot-by-default traps, at prices that can quickly spiral out of control if you're not on top of them.

JCM9•10h ago

This. When AWS was 10 solid core services it made sense and was exciting. It’s now a bloated mess of 200+ services (many of which almost nobody uses) with all that complexity starting to create headaches and cracks.

AWS needs to stop trying to have a half-arsed solution to every possible use case and instead focus on doing a few basic things really well.

genidoi•10h ago

Imo the fact that an "AWS Certified Solutions Architect" is yet another AWS service/thing that is attainable, via an actual exam[0] for $300, is indicative of just how intentionally bloated the entire system has become.

[0] https://aws.amazon.com/certification/certified-solutions-arc...

cmiles8•10h ago

Word on the street is that Amazon leadership basically agrees with this and recognizes things have gotten off course. AWS is a small number of things that make money and then a whole bunch of slop and bloat.

AWS was mostly spared from yesterday’s big cuts but have been told to “watch this space” in the new year after re:Invent.

jrochkind1•9h ago

(Real question, not meant to be sarcastic or challenging!) -- What are the challenges in trying to use just the ~10 core services you want/need and ignoring the others? What problems do the others you don't use cause with this use case?

whstl•8h ago

The early services were mostly self-contained.

A lot of newer stuff that actually scales (so Lightsail doesn't count) is entangled with "security", "observability" and "network" services. So if you just want to run EC2 + RDS today, you also have to deal with VPC, Subnets, IAM, KMS, CloudWatch, CloudTrail, etc.

Since security and logs are not optional, you have very limited choice.

Having that many required additional services means lots of hidden charges, complexity and problems. And you need a team if you're not doing small-scale stuff.

aaronax•7h ago

Costs have not dropped. Computing becomes cheaper over time, but AWS largely does not.

hinkley•7h ago

They used to release new ec2 sizes at the same price as the previous gen which made upgrading a no brainer. That stopped with m7 and doesn’t seem to be coming back.

Not sure what Amazon plans to do when the m6 hardware starts wearing out.

rossdavidh•8h ago

"Embrace, extend, extinguish". It was a Microsoft saying, but it explains Amazon's approach to Linux. Once your customers are skilled in how to do things on your platform, using your specialized products, they won't price-comparison (or compare in any other way) to competing options. Whether those countless other "half-arsed solutions" actually make money is beside the point; as long as the customer has baked at least one into their tech stack, they can't easily leave.

dumbledoren•30m ago

Likely the best comment in the thread: Microsoft couldn't kill Linux. But AWS did it by adding itself as a layer on top of Linux and literally taking control of the web that Linux liberated by taking over the entire server space in the mid-2000s.

hinkley•7h ago

I don’t think I’ve seen a menu as hilariously bad as the AWS dashboard menu. No popup menu should consume the entire screen edge to edge. Just a wall of cryptic service names with ambiguous icons.

rco8786•10h ago

Anytime I have to go into the AWS control panel (which is often) I am immediately overwhelmed with a sense of dread. It's just the most bloated overcomplicated thing I could possibly imagine.

antonkochubey•10h ago

You're lucky not to have dealt with Azure and GCP control panels, in that case :-)

the_duke•6h ago

GCP is pretty good though, considering the complexity.

Azure is ... a different story...

rob74•10h ago

...while on the other side, the "traditional" hosting/colocation providers feel the squeeze and have to offer more competitive prices to stay in business?

__alexs•9h ago

AFAICT no AWS service has ever had a price increase. This is nonsense.

raincole•9h ago

Cloud has been generally getting cheaper if you take inflation into account. But hating AWS is the fad so...

array_key_first•4h ago

Cloud is literally never cheaper, especially if you perform benchmarks.

Yes, EC2 might seem to be only 2.5 times the cost of storage... Except that, even if you buy the high speed storage, it's going to be 10x - 100x slower than bare metal. Which then means you can buy much slower drives, if you wanted to, and save a shit ton of money.

torginus•8h ago

Considering you get exponentially more compute/hardware for the same money every 2 years or so, they haven't been getting that much cheaper.

__alexs•6h ago

Every generation of CPU has cost more than the last one for years now.

dgemm•3h ago

This is the right take - there is a huge variation in "value per dollar" across AWS services. The base ones that solve hard problems like durable persistent state can be very much worth it. They tend to be the older ones.

doctorpangloss•10h ago

Microk8s has common, catastrophic performance bugs. There are also catastrophic problems with microk8s Ceph addons. So is this post true? Microk8s, for people who know stuff, is a canary for clusters / applications that don’t really work.

ndhandala•10h ago

We haven't found those bugs in our cluster, but we're also moving to Talos (but for different reasons).

acejam•5h ago

Source? Links?

jammo•10h ago

Equinix Metal is now EOL, so worth bearing that in mind..

darkwater•10h ago

The core of this success is this, IMO:

  > Our workload is 24/7 steady. We were already at >90% reservation coverage; there was no idle burst capacity to “right size” away. If we had the kind of bursty compute profile many commenters referenced, the choice would be different.

Which TBH applies to many, many places, even if they are not aware of it.

marcinzm•10h ago

I'd say the core of their success is running everything in a single rack in a single datacenter at first (for months? a year?) and getting lucky. Life is simple when you don't need the costs and effort of reliability upfront.

darkwater•9h ago

They mention having a second half-rack in a different DC.

In any case, not everyone needs five nines, and usually it's just much easier to bring down a platform due to some bug in your own software rather than the core infrastructure going down at a rack level.

sceptic123•7h ago

The point is valid, they mention adding that, so at one point they didn't have that. They're also only storing monitoring & observability data, that's never going to be mission critical for their customers.

It's probably the main reason why they were able to get away with this and why their application does not need scalability. I see they themselves are only offering two 9s of uptime.
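
To put "two 9s" next to the "five nines" mentioned upthread, the allowed downtime per year is simple arithmetic. A small sketch, assuming a 365-day year and that availability is measured over the whole year:

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(availability_pct):
    """Hours of downtime per year permitted by a given availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Two nines allows roughly 87.6 hours (about 3.65 days) of downtime a year, while five nines allows a little over 5 minutes. That gap is why a single rack can be a reasonable bet for one target and not the other.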

Hardwired8976•5h ago

They mentioned having a backup AWS cluster that would spin up when something happens.

jdsully•2h ago

Even if you have that you'll find AWS is "out of stock" and wants you to create reservations that essentially cost the same as just having the machine 24/7.

Nemo_bis•1h ago

Reminds me of https://www.specbranch.com/posts/one-big-server/

dumbledoren•24m ago

Nah. They could have just overprovisioned to hell for much cheaper. Boxes at Hetzner cost up to 10 times less than equal level of AWS compute. Just overprovision for cheaper. You have to overprovision on the cloud anyway - you can't risk your users waiting 1-2 minutes until your new nodes/pods come up. So 'cloud is good for spiky load' argument is just a lie we tell ourselves.

cornfieldlabs•10h ago

> Equinix Metal got the closest, but bare metal on-demand still carried a 25-30% premium over our CapEx plan. Their global footprint is tempting; we may still use them for short-lived expansion.

> The Equinix Metal service will be sunset on June 30, 2026.

https://docs.equinix.com/metal/

mythz•10h ago

Several years off AWS, the only thing I still prefer AWS for is SES, otherwise Cloudflare has the more cost effective managed services. For everything else we use Hetzner US Cloud VMs for hosting all App Servers and Server Software.

Our .NET Apps are still deployed as Docker Compose Apps which we use GitHub Actions and Kamal [1] to deploy. Most Apps use SQLite + Litestream with real-time replication to R2, but have switched to a local PostgreSQL for our Latest App with regular backups to R2.

Thanks to AI that can walk you through any hurdle and create whatever deployment, backup and automation scripts you need, it's never been easier to self-host.

[1] https://docs.servicestack.net/kamal-deploy

sondr3•10h ago

> Cloud makes sense when elasticity matters; bare metal wins when baseload dominates.

This really is the crux of the matter in my opinion, at least for applications (databases and so on are, in my opinion, more nuanced). I've only worked at one place where using cloud functions made sense (keeping it somewhat vague here): data ingestion from stations that could be EXTREMELY bursty. Usually we got data from the stations at roughly midnight every day, nothing a regular server couldn't handle, but occasionally a station would come back online after weeks or new stations got connected etc., which produced incredible load for a very short amount of time when we fetched, parsed and handled each packet. Instead of queuing things for ages we could instead just horizontally scale it out to handle the pressure.
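
The queue-versus-scale-out trade-off can be sketched with a back-of-the-envelope calculation. Every number below (backlog size, per-worker throughput, worker counts) is hypothetical and chosen only to show the shape of the argument; it also assumes the work splits evenly across workers with no coordination overhead.

```python
def drain_seconds(backlog_packets, workers, packets_per_worker_per_sec):
    """Time to clear a backlog if work splits evenly across identical workers."""
    return backlog_packets / (workers * packets_per_worker_per_sec)


if __name__ == "__main__":
    backlog = 1_000_000   # hypothetical: packets from a station offline for weeks
    rate = 50             # hypothetical: packets one worker handles per second
    for workers in (1, 10, 100):
        secs = drain_seconds(backlog, workers, rate)
        print(f"{workers:>3} workers: {secs:>8.0f} s ({secs / 3600:.2f} h)")
```

With these made-up numbers a single server needs about five and a half hours to catch up, while a hundred short-lived workers finish in a few minutes, and you only pay for them during the burst.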

marcinzm•10h ago

They were running for a long time (months? over a year?) on a single rack in a single datacenter. Eventually they scaled out but the word is eventually. I think that summarizes both sides of this debate in a nutshell. You can move off of AWS but unless you invest a lot you will take on increased risk. Maybe you'll get lucky and your one rack won't burn down. Maybe you won't. They did get lucky.

athrowaway3z•9h ago

From the story, they seem to have kept the option to fallback on AWS.

gizzlon•8h ago

Hm.. I wonder what the risk of a rack going offline is? Maybe 5% in a given year? Less? More?

Compared to all the other things that can and will go wrong, this risk seems pretty small, but I have no data to back that up.
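
For what it's worth, here is how that guess compounds over time. The 5% annual figure is just the guess from this comment, not measured data, and the two-rack line assumes the racks fail independently (different DCs, no shared cause), which is optimistic.

```python
def prob_at_least_one_outage(annual_prob, years):
    """Chance of one or more rack-offline events over a number of years."""
    return 1 - (1 - annual_prob) ** years


def prob_both_racks_down_same_year(annual_prob):
    """Chance both racks have an outage in the same year, assuming independence."""
    return annual_prob ** 2


if __name__ == "__main__":
    p = 0.05  # the guessed annual probability from the comment above
    for years in (1, 3, 5):
        print(f"{years} year(s): {prob_at_least_one_outage(p, years):.1%}")
    print(f"two independent racks, same year: {prob_both_racks_down_same_year(p):.2%}")
```

At a guessed 5% per year, the chance of at least one incident over three years is about 14%, so "got lucky" and "took a reasonable bet" are both defensible readings depending on how costly an outage is.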

shakow•6h ago

> Maybe you'll get lucky and your one rack won't burn down

Given the rates of fires in DCs, you'd rather need to be quite unlucky for it to happen to you.

cornfieldlabs•10h ago

Managed DB costs a lot.

Is there a simple safe setup that we can run on an Ubuntu server?

We self-host the Postgres db with frequent backups to s3 but just in case the site takes off, we need an affordable reliable solution.

Does anyone here run their own db servers? Any advice?

Backups, security, upgrades etc

lofties•10h ago

I love the argument that Managed DBs cost a lot, but they're supposedly safer. Meanwhile people can't figure out the IAM permission models so they give the entire world access with root:root.

ndhandala•10h ago

If you're running a k8s cluster, check out CloudNativePG. That thing is a beast.

cornfieldlabs•9h ago

We have hosted everything on a tiny Hetzner. The site barely has any users apart from our friends:) :(

Info noted

vpShane•8h ago

Worth checking out the different server hosts. You can get a cheap OVH server with 64GB of RAM, 4-6 cores and 2TB of disk space for $30, or better servers for $70 with 1-2 Gbps of bandwidth.

Setting up a DB isn't hard, using an LLM to ask questions will guide you to the right places. I'm always talking with Gemini because I switched from Ubuntu to Fedora 42 server and things are slightly different here and there.

But, different server hosts offer DB-ready OS's so all you have to do is load the OS on the server and you'll be ready to go.

The joy of Linux is getting everything _just right_ and so much _just right_ that you can launch a second server and set it up that way _just right_ within minutes.

film42•3h ago

Maybe look at R2 or Wasabi instead of S3. That would cut your storage bill by 3x and take your cloud network bill to zero. IMO self-managing DBs always sucks no matter what you do.

blindriver•10h ago

Have they done a complete failover to their second data center? It wasn’t clear how committed of a failover it was during the tests.

aeve890•9h ago

>We're now moving to Talos. We PXE boot with Tinkerbell, image with Talos, manage configs through Flux and Terraform, and run conformance suites before each Kubernetes upgrade.

Gee, how hard is it to find SE experts in that particular combination of available ops tools? While in AWS every AWS certified engineer would speak the same language, the DIY approach surely suffers from the lack of "one way" to do things. Change Flux with Argo for example (assuming the post is talking about that Flux and not another tool with the same name), and you have an almost completely different gitops workflow. How do they manage to settle with a specific set of tools?

zppln•9h ago

If you're that much of a slave to your tool chain you don't get to call yourself an engineer.

film42•3h ago

Or you have PTSD after 10 years of being on-call 24/7 for your company's stack. I've built my next chapter around offloading the pager. Worth every penny.

63stack•9h ago

Argocd and flux are "almost completely different"? The last time I looked was about a year ago, and there seemed to be only minor differences.

What are the major differences?

amluto•6h ago

I would not want to hire an engineer who claimed to be proficient with any cloud Kubernetes stack but couldn’t learn Talos in a week.

ecshafer•9h ago

AWS is extremely expensive, and I think I have to agree with DHH's assessment that many developers are afraid of computers. AWS is taking advantage of that fear of actually just setting up linux and configuring a computer.

However, to steelman AWS use: many businesses are STILL running mainframes. Many run terrible setups like Access as a production database. In 2025 there are large companies with no CICD platforms or IAC, and some companies where even VC is still a new concept or a dark art. So not every company is in the position to actually hire competent system administrators and system engineers to set up some bare metal machines and configure Ceph, much less Hadoop or Kubernetes. So AWS lets these companies just buy these capabilities while forcing the software stack to modernize.

faxmeyourcode•8h ago

I worked at a company like this, I was an intern with wide eyes seeing the migration to git via bitbucket in the year ... 2018? What a sight to see.

That company had its own data center, tape archives, etc. It had been running largely the same way continuously since the 90s. When I left for a better job, the company had split into two camps. The old curmudgeonly on-prem activists and the over-optimistic cloud native AWS/GCP certified evangelists with no real experience in the cloud (because they worked at a company with no cloud presence). I'm humble enough to admit that I was part of the second camp and I didn't know shit, I was cargo culting.

This migration is still not complete as far as I'm aware. Hopefully the teams that resisted this long and never left for the cloud get to settle in for another decade of on-prem superiority lol.

ecshafer•7h ago

I was at a company that was doing their SVN/Jenkins migration to Git/Bitbucket/Bamboo around 2016/2018. But they were using source control and a build system already, so you have to hand it to them. But I have an associate that was at one of the large health insurance companies in 2024, complaining that he couldn't get them to use git and stop deploying via FTP to a server. There is danger with being too much on the cargo cult side, but also danger with being too resistant to change. I don't know how you can look at source control, a CICD pipeline, artifacts, IaC, and say "This looks like a bad idea".

iLoveOncall•9h ago

This is a completely meaningless article if they don't provide information about their technical stack, which AWS services they used to use, what TPS they are hitting, what storage size they're using, etc.

The story will be different for every business because every business has different needs.

Given the answer to "How much did migration and ongoing ops really cost?" it seems like they had an incredibly simple infrastructure on AWS, and it was really easy to move out. If you use a wider range of services, the cost savings are much more likely to cancel out.

globular-toast•8h ago

TFA begins with a link to the original article with those details.

iLoveOncall•7h ago

If you call "We used EKS" details, then yeah they provide those details.

Assuming this is indeed all they used, this was admittedly nonsense, they were essentially using cloud-based bare-metal.

tuhgdetzhh•9h ago

Quite recently I made a TCO analysis between AWS and bare metal Hetzner including salary. https://beuke.org/hetzner-aws/

TYPE_FASTER•9h ago

> It depends on your workload.

Very much this.

Small team in a large company who has an enterprise agreement (discount) with a cloud provider? The cloud can be very empowering, in that teams who own their infra in the cloud can make changes that benefit the product in a fraction of the time it would take to work those changes through the org on prem. This depends on having a team that has enough of an understanding of database, network and systems administration to own their infrastructure. If you have more than one team like this, it also pays to have a central cloud enablement team who provides common config and controls to make sure teams have room to work without accidentally overrunning a budget or creating a potential security vulnerability.

Startup who wants to be able to scale? You can start in the cloud without tying yourself to the cloud or a provider if you are really careful. Or, at least design your system architecture in such a way that you can migrate in the future if/when it makes sense.

mr_toad•9h ago

This is a tech company and it’s adjacent to their core competency. Most companies wouldn’t know MicroK8s from a brand of cereal, they’d only create a mess if they tried this themselves.

gizzlon•8h ago

Sure, but they also create a mess in AWS

stuff4ben•9h ago

Never heard of Talos before now. That looks pretty cool and I might start playing with that on my home lab. Can't use it at work for reasons, but good to keep on top of tech (even if I am a little behind)

globular-toast•8h ago

This dude did a complete walkthrough setting up a Talos cluster on bare metal: https://datavirke.dk/posts/bare-metal-kubernetes-part-1-talo... It's a nice read. I have my own Talos cluster running in my homelab now for over a year with similar stuff (but no Ceph).

roschdal•9h ago

Bare metal is the best metal.

Aldipower•7h ago

Never ever. True metal it is!

ksec•9h ago

Many other points. When the cloud started, it offered great value in adjacent products and services. Scaling was painful, bare metal hardware had long lead times, and provisioning took time. DCs were not of as high quality, and networks weren't as redundant. A lot of these are much less of an issue today.

In 2010 you could only get 64 Xeon cores across 8 sockets, or a maximum of 8 cores per socket. And that is ignoring NUMA issues. Today you can get 256 cores per socket, each at least twice as fast. What used to be 64 servers can now fit into 1. And by 2030, it will be closer to a 100 to 1 ratio. Not to mention software on servers has gotten a lot faster compared to 2010: PHP, Python, Ruby, Java, ASP or even Perl. If we added up everything, I wouldn't be surprised if we are at a 200 or 300 to 1 ratio compared to 2010.

I am pretty sure there is some version of Oxide in the pipeline that will catch up to the latest Zen CPU core. If a server isn't enough, a few Oxide racks should fit 99% of Internet companies' usage.
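One way to arrive at the 64-to-1 figure above is to compare a typical two-socket server from each era. This is only a sketch of that reading: the two-socket assumption and the helper name `core_equivalents` are mine, not the commenter's, while the 8 and 256 cores per socket and the 2x per-core speedup come from the comment.

```python
def core_equivalents(sockets: int, cores_per_socket: int,
                     per_core_speedup: float = 1.0) -> float:
    """Throughput of one server, measured in 2010-era core equivalents."""
    return sockets * cores_per_socket * per_core_speedup

# Assumption: a common two-socket box in both eras.
server_2010 = core_equivalents(sockets=2, cores_per_socket=8)
server_today = core_equivalents(sockets=2, cores_per_socket=256,
                                per_core_speedup=2.0)

# How many 2010 servers one current server could replace under this reading.
consolidation_ratio = server_today / server_2010
print(f"{consolidation_ratio:.0f} to 1")
```

This ignores memory, I/O and NUMA effects, so treat it as an upper bound on consolidation rather than a sizing guide.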

submeta•9h ago

There is so much hidden cost in maintaining your own bare metal infrastructure. I am always astounded by how people overlook the massive opportunity cost involved in not only setting up, securing, and maintaining your bare metal infrastructure, but also making it state of the art, including best practices, making sure you have required uptime, monitoring and intervening if necessary. - I work in a highly regulated market with 700 coworkers, our IT maintains an endless amount of VMs. And you cannot imagine how much more work they have to do compared to a setup where you spin up services in AWS or Azure. And destroy it when you don’t need it. No updates, no patches. No misconfiguration. Not every company uses automation either (chef, ansible and whatnot)

saxenaabhi•8h ago

I agree, I have a restaurant POS system and I think self-hosting would easily kill the product velocity, and if we screw up bad, even the company.

However, I do get the point about cost-premium and more importantly vendor-risk that's paid when using managed services.

We are hosted on cloudflare workers which is very cheap, but to mitigate the vendor risk we have also set up replicas of our api servers on bunny.net and render.com.

pingoo101010•9h ago

Many startups and companies couldn't exist if there was only AWS (or GCP / Azure) due to how much they overcharge.

For example, we couldn't offer free GeoIP downloads[0] if we were charged the outrageous $0.09 / GB, and the same is true for companies serving AI models or game assets.

But what makes me almost sick is how slow the cloud is. From network-attached disks to overcrowded CPUs, everything is so slooooow.

My experience is that the cloud is a good thing between 0-10,000 $ / month. But you should seriously consider renting bare-metal servers or owning your own after that. You can "over-provision" as much as you want when you get 10-20x (real numbers) the performance for 25% of the price.

[0] https://downloads.pingoo.io

hedora•7h ago

I’ve seen cloud slowness create weird Stockholm syndrome effects, especially around disk latency.

It always makes sense to compare to back of the envelope bare metal numbers before rearchitecting your stack to work around some dumb cloud performance issue.

ed_mercer•9h ago

Talos is great until it's not. We ran into Ceph IO speed bottlenecks and found it was impossible to debug ("talosctl cgroups --preset=io" is a mess) because the devs didn't want to add an SSH escape hatch into their black box OS. Our Talos nodes would also randomly become unhealthy and you have no way of knowing why. Switched to PXE booted Alpine Linux with vanilla k8s, and we had a much more stable experience with no surprises, and the ability to SSH whenever we want has been hugely helpful.

dev_l1x_be•9h ago

> AWS is extremely expensive.

I really like how people throw around these baseless accusations.

S3 is one of the cheapest storage solutions ever created. The last 10 years I have migrated roughly 10-20PB worth of data to AWS S3 and it resulted in significant cost saving every single time.

If you do not know how to use cloud computing then yes, AWS can be really expensive.

Aurornis•8h ago

The implicit claims are more misleading, in my opinion: The claim that self-hosting is free or nearly free in terms of time and engineering brain drain.

The real cost of self-hosting, in my direct experience with multiple startup teams trying it, is the endless small tasks, decisions, debates, and little changes that add up over time to more overhead than anyone would have expected. Everyone thinks it’s going to be as simple as having the colo put the boxes in the rack and then doing some SSH stuff, then you’re free of those AWS bills. In my experience it’s a Pandora’s box of tiny little tasks, decisions, debates, and “one more thing” small changes and overhauls that add up to a drain on the team after the honeymoon period is over.

If you’re a stable business with engineers sitting idle that could be the right choice. For most startups who just need to get a product out there and get customers, pulling limited headcount away from the core product to save pennies (relatively speaking) on a potential AWS bill can be a trap.

marcosdumay•7h ago

> The claim that self-hosting is free or nearly free in terms of time and engineering brain drain.

Free? No, it's not free. It only costs less engineering time than AWS.

MontyCarloHall•8h ago

Assuming those 20PB are hot/warm storage, S3 costs roughly $0.015/GB/month (50:50 average of S3 standard/infrequent access). That comes out to roughly $3.6M/year, before taking into account egress/retrieval costs. Does it really cost that much to maintain your own 20PB storage cluster?

If those 20PB are deep archive, the S3 Glacier bill comes out to around $235k/year, which also seems ludicrous: it does not cost six figures a year to maintain your own tape archive. That's the equivalent of a full-time sysadmin (~$150k/year) plus $100k in hardware amortization/overhead.

The real advantage of S3 here is flexibility and ease-of-use. It's trivial to migrate objects between storage classes, and trivial to get efficient access to any S3 object anywhere in the world. Avoiding the headache of rolling this functionality yourself could well be worth $3.6M/year, but if this flexibility is not necessary, I doubt S3 is cheaper in any sense of the word.
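The arithmetic behind those two figures is easy to check. A minimal sketch, assuming decimal units (1 PB = 1,000,000 GB) and storage charges only; the $0.015/GB/month blended rate is the one quoted above, while the deep-archive rate is an assumed list price that should be verified against current regional pricing:

```python
GB_PER_PB = 1_000_000  # decimal units, as cloud price lists typically use

def annual_storage_cost(petabytes: float, usd_per_gb_month: float) -> float:
    """Yearly storage cost in USD, ignoring egress, retrieval and request fees."""
    return petabytes * GB_PER_PB * usd_per_gb_month * 12

# Blended hot/warm rate taken from the comment above.
hot_warm = annual_storage_cost(20, 0.015)

# Assumption: deep-archive list price per GB-month; check before relying on it.
ASSUMED_DEEP_ARCHIVE_RATE = 0.00099
deep_archive = annual_storage_cost(20, ASSUMED_DEEP_ARCHIVE_RATE)

print(f"hot/warm:     ${hot_warm:,.0f}/year")
print(f"deep archive: ${deep_archive:,.0f}/year")
```

Plugging in your own capacity and a quote for hardware plus staff gives a quick first-pass comparison before doing a full TCO analysis.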

torginus•7h ago

How the heck does anyone have that much data? I once built myself a compressed plaintext library from one of those data-hoarder sources that had almost every fiction book in existence, and that was like 4TB compressed (but would've been much less if I bothered hunting for duplicates and dropped non-English).

I suspect the only way you could have 20PB is if you have metrics you don't aggregate or keep ancient logs (why do you need to know your auth service had a transient timeout a year ago?)

MontyCarloHall•7h ago

Lots of things can get to that much data, especially in aggregate. Off the top of my head: video/image hosting, scientific applications (genomics, high energy physics, the latter of which can generate PBs of data in a single experiment), finance (granular historic market/order data), etc.

geoka9•7h ago

In addition to what others have mentioned, before the "AI bubble", there was a "data science bubble" where every little signal about your users/everything had to be saved so that it could be analyzed later.

matwood•7h ago

Like most of AWS, it depends if you need what it provides. A 20PB tape system will have an initial cost in the low to mid 6 figures for the hardware and initial set of tapes. Do the copies need to be replicated geographically? What about completely offline copies? Reminds me of conversations with archivists where there's preservation and then there's real preservation.

spprashant•9h ago

The thing I find counter intuitive about AWS and hyper-scalers in general is, they make so much sense when you are starting out a new project. A few VMs, some gigs of data storage, you are off to the races in a day or two.

As soon as you start talking about any kind of serious data storage and data transfer the costs start piling up like crazy.

Like in my mind, the cost curve should flatten out over time. But that just doesn't seem to be the reality.

shadowgovt•8h ago

Sounds like they did the right thing for their business model.

I think as AWS grows and changes the curve of the target audience is changing too. The value proposition is "You can get Cloud service without having a dedicated Cloud team," but there are caveats:

- AWS is complicated enough that you will still need a team to integrate against it. The abstractions are not free and the ones that are leaky will bite you without dedicated systems engineers to specialize in making it work with your company's goals.

- For small companies with little compute need, AWS is a good option. Beyond a certain scale... It is worth noting that big companies build their own datacenters, they don't rely on someone else's Cloud. Amazon, Google, and Microsoft don't run on each other.

- Recently, the cost model has likely changed if a company pokes their head up and runs the numbers, there's, uh, quite a few engineers with deep knowledge of how to build a scalable cloud infrastructure available to hire now for some reason. In fact, a savvy company keeping its ear to the ground can probably snap up some high-tier talent very soon (https://www.reuters.com/business/world-at-work/amazon-target...).

It really depends on where your company's risk and cost models are. Running on someone else's cloud just isn't the only option.

yanslookup•8h ago

FD: I work at Amazon, I also started my career in a time where I had to submit paper requests for servers that had turn around times measured in months.

I just don't see it. Given the nature of the services they offer it's just too risky not to use as much managed stuff with SLAs as possible. k8s alone is a very complicated control plane + a freaking database that is hard to keep happy if it's not completely static. In a prior life I went very deep on k8s, including self managing clusters and it's just too fragile, I literally had to contribute patches to etcd and I'm not a db engineer. I kept reading the post and seeing future failure point after future failure point.

The other aspect is there doesn't seem to be an honest assessment of the tradeoffs. It's all peaches and cream, no downsides, no tradeoffs, no risk assessment etc.

AndroTux•8h ago

Managing a complex environment is hard, no matter whether that’s deployed on AWS or on prem. You always need skilled workers. On one platform you need k8s experts. On the other platform you need AWS experts. Let’s not pretend like AWS is a simple one-click fire and forget solution.

And let’s be very real here: if your cloud service goes down for a few hours because you screwed something up, or because AWS deployed some bad DNS rules again, the world moves on. At the end of the day, nobody gives a shit.

yanslookup•2h ago

Maybe I've drank the koolaid but I've done both a lot of systems level work and AWS work (I don't actually use any AWS stuff in my role here interestingly) and I think for a business that needs a handful of hosts in 2 AZs I can't imagine the ROI and risk profile being better to self host.

AWS truly does let you focus on your business logic and abstracts a TON of undifferentiated work and well beyond the low hanging fruit of system updates and load balancing.

I guess put another way, providing a SaaS you need to have an SLA, those SLAs flow from SLO and SLIs and ultimately a risk profile of your hw and sw. The risk of a bad HBA alone probably means a day of downtime if you don't do things perfectly. AWS has bad HBAs, CPUs, memory, disks etc all day long every day and it's not even a blip for customers, never mind downtime. And if you don't model bad HBAs in your SLAs then your board is going to be pissed when that outage inevitably happens.

Now if you don't have SLAs and you like sysops, networkops, clusterops, dbops work then sure, YOLO.
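To put the "day of downtime" in SLA terms, here is a rough sketch of the conversion between outage hours and yearly availability. The function names are made up for illustration, and the 99.9% target is just an example figure, not one from the article:

```python
HOURS_PER_YEAR = 365 * 24

def availability(downtime_hours: float) -> float:
    """Fraction of the year the service was up, given total downtime."""
    return 1 - downtime_hours / HOURS_PER_YEAR

def downtime_budget_hours(sla: float) -> float:
    """Hours of downtime per year that a given availability target allows."""
    return (1 - sla) * HOURS_PER_YEAR

# A single one-day outage (e.g. the bad-HBA scenario) and nothing else all year.
one_day_outage = availability(24)

# Example target: "three nines".
budget_999 = downtime_budget_hours(0.999)

print(f"one day down -> {one_day_outage:.4%} availability")
print(f"99.9% SLA    -> {budget_999:.2f} hours of downtime allowed per year")
```

Under these numbers a single day-long hardware outage already blows through a three-nines budget, which is the risk being described.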

hedora•7h ago

At another big-4 hyperscaler, we ended up with substantial downtime and a lossy migration because they didn’t know how to manage kubernetes.

Microk8s doesn’t use etcd (they have their own, simpler thing), which seems like a good tradeoff at single rack scale: https://benbrougher.tech/posts/microk8s-6-months-later/

The article’s deployment has a spare rack in a second DC and they do a monthly cutover to AWS in case the colo provider has a two site issue.

Spending time on that would make me sleep much better than hardening a deployment of etcd running inside a single point of failure.

What other problems do you see with the article? (Their monthly time estimates seem too low to me - they’re all 10x better than I’ve seen for well-run public cloud infrastructure that is comparable to their setup).

yearolinuxdsktp•6h ago

I agree that a business should use Kubernetes only if there is a clear need for that level of infrastructure automation. It's a time and money mistake to use K8s by default.

dumbledoren•22m ago

Variants like k3s are not as complicated and problematic as k8s.

thelastgallon•8h ago

These are the features that AWS provides

(1) Massive expansion of budget (100 - 1000x) to support empire building. Instead of one minimum-wage sysadmin with 2 high-availability, maxed-out servers for 20K - 40K (and 4-hour response time from Dell/HPE), you can have 100M multi-cloud Kubernetes + Lambda + a mix-and-match of various locked-in cloud services (DB, etc.). And you can have a large army of SRE/DevOps. You get power and influence as a VP of Cloud this and that and 300 - 1000 people reporting to you.

(2) OpEx instead of CapEx

(3) All leaders are completely clueless about hiring the right people in tech. They hire their incompetent buddies who hire their cronies. Data centers can run at scale with 5-10 good people. However, they hire 3000 horrible, incompetent, and toxic people, and they build lots of paperwork, bureaucracy, and approvals around it. Before AWS, it was VMware's internal cloud that ran most companies. Getting bare metal or a VM would take months to years, and many, many meetings and escalations. With AWS, "here is my credit card, pls gimme 2 VMs" is the biggest feature.

torginus•8h ago

The problem with those 5 people, is you can't hire a 6th - your stack is custom and probably even if you find the guy, he'll need months of ramp-up.

In contrast, you could throw a stone into a bush and hit an AWS guy.

rikafurude21•6h ago

If your 6th needs months to understand how the basic blocks in your system are arranged then he might not be one of the "good" guys

torginus•5h ago

Not really a hardcore infra guy, but on the coding side, I know companies with products that have codebases in the multi million LoC range written over decades, one of my friends interned there and told me they didn't even let him work on the core product for months, they put him on some custom testing framework they had for it, just so he could get familiar enough with the core code to be able to contribute meaningfully.

He told me that before they started doing that, there were incidents like teams writing entire modules they didn't know already existed - now there were 2 pieces of code doing basically the same thing, that were just incompatible enough to not be possible to merge them.

thelastgallon•4h ago

And how does AWS help with this?

torginus•12m ago

On the infra side - by standardizing things.

One time, in on prem, we had a custom setup with a machine running half the services we used, including a reverse proxy using haproxy with some custom Lua scripts for routing, a fileserver using lighttpd, some docker compose stuff, a stateless query thingy running on nodejs, etc.

We needed to change something, and the guy who wrote it left a year ago and we had to reverse engineer the stuff he did (some of it was quite questionable).

We weren't entirely successful and had to rewrite some stuff. I'm not saying how it was done wasn't clever or cost efficient, but damn if it was done on AWS, I probably would've known where to look for stuff (and so would've most of my colleagues).

thelastgallon•1h ago

Why would you need more people? Don't treat the 5 people like shit.

dilyevsky•5h ago

You say cloud allows massive expansion like it's a negative but it can be a boon for a pre-pmf startup or a scaleup. You simply don't have to worry much about capacity planning in cloud and that can be a huge time/effort saver.

Sure, if you're only growing <30% YoY and already paying several millions for cloud and bandwidth/storage are a large fraction of that, by staying in cloud you're proving your incompetence as an engineering org.

Slothrop99•2h ago

> one minimum-wage sysadmin

The internet assures me there are loads of these underemployed Unix/networking experts just sitting around waiting to set up your infrastructure. But in my experience, these people are actually really difficult to hire, and not at all cheap. (Possibly the sharp ones have 'sold out' and gone the SRE route and are now one of those '3000' people.)

So I wonder if there's a certain amount of wishful thinking on both sides here, like "I wish a 'clueful' company would hire me to be their head sysadmin...", while companies who have tried to do this on the cheap usually just have terrible ops. ("Whoops, the backups haven't worked in 2 years...")

agoodusername63•1h ago

Yeah I'm one of them. Started as an on-site Linux sysadmin. Moved to cloud SRE because remote is plentiful and it pays better.

I get crap recruiters in my inbox and LinkedIn every other week with the worst offers to go back to on-site bare metal admin. 30% less pay, on-site requirements, and it's a contracted position?

I need that Futurama "oh you're serious, let me laugh harder" gif

If companies want to whine that good Linux datacenter ops doesn't exist anymore, laugh in their faces.

Sohcahtoa82•1h ago

> (2) OpEx instead of CapEx

Someone please explain to me why this matters. I'd think that expenditures are expenditures, and that if the outright purchase of hardware would see an RoI compared to renting it in the cloud in under a year, it'd be a no-brainer to just buy the hardware.
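The "RoI in under a year" framing can be written down as a simple break-even calculation. This is a sketch with entirely hypothetical numbers; it ignores depreciation schedules, tax treatment and financing, which are exactly the things that make finance teams care about the OpEx/CapEx split:

```python
import math
from typing import Optional

def breakeven_months(hardware_usd: float,
                     cloud_usd_per_month: float,
                     colo_usd_per_month: float) -> Optional[int]:
    """Months until the up-front purchase is repaid by lower monthly spend.

    Returns None when running your own hardware is not cheaper per month.
    """
    monthly_saving = cloud_usd_per_month - colo_usd_per_month
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_usd / monthly_saving)

# Hypothetical figures: $60k of servers, $10k/month cloud bill,
# $3k/month for colocation, power and remote hands.
months = breakeven_months(60_000, 10_000, 3_000)
print(f"break-even after {months} months")
```

If the result is well under the hardware's useful life, buying looks attractive on cash terms alone; the counterarguments tend to be about flexibility and approval overhead rather than the arithmetic.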

guax•49m ago

OpEx means that if demand for your service goes down, cost goes down, and you aren't stuck with hardware that becomes a capital liability because it depreciates fast. Way easier to justify changes to it too: you don't need a purchase project to get new instances, you're already "approved" and the contract was already signed with fluctuating costs. Need more hardware? Press a button, no need to research vendors or get contract negotiations in place.

AWS makes the life of finance and leadership a lot easier because they spend a lot of money justifying their superiority in ways that you don't have to think too hard to use and be taken seriously. They're to CTOs what think tanks and lobbyists are for lawmakers.

"No one got fired for buying IBM" for the new era.

There is a lot of truth in AWS propaganda, they're great for many things. But some of it is built on lies, cost being one, performance another.

rglover•8h ago

I really dislike how this industry oscillates between various states of epiphany that things that are overcomplicated and expensive are overcomplicated and expensive. As an industry, we must look like utter clowns to the world. It's really sad that saying "own or control your own servers" seems to be a sword in the stone moment for far more people than it should. Things that used to be a "duh" are now a "wow" and it's deeply unsettling to watch.

dimitrios1•8h ago

One thing I can say definitively, as someone who is definitely not an AI zealot (more of an AI pragmatist): GPT language models have reduced the barrier of running your own bare metal server. AWS salesfolk have long often used the boogeyman of the costs (opportunity, actual, maintenance) of running your own server as the reason you should pick AWS (not realizing you are trading one set of boogeymen for another), but AI has reduced a lot of that burden.

sema4hacker•8h ago

Anycast, Argo Rollouts, Aurora Serverless, AWS, BGP, Ceph, ClickHouse, Cloudflare, CloudFront, DWDM, Flux, Frankfurt, Glacier, Helm, Kinesis, Kubernetes, Metabase, MicroK8s, NVMe, OneUptime, OpenTelemetry Collector, Paris, Postgres, Posthog, PXE, Redis, Step Functions, Supermicro, Talos, Terraform, Tinkerbell, VM's.

I wish you started out by telling me how many customers you have to serve, how many transactions they generate, how much I/O there is.

rossdavidh•8h ago

I had a problem figuring out why the place I was working wanted to move from in-house to AWS; their workload was easily handled by a few servers, they had no big bursts of traffic, and they didn't need any of the specialized features of AWS.

Eventually, I realized that it was because the devs wanted to put "AWS" on their resumes. I wondered how long it would take management to catch on that they were being used as a place to spruce up your resume before moving on to catch bigger fish.

But not long after, I realized that the management was doing the same thing. "Led a team migration to AWS" looked good on their resume, also, and they also intended to move on/up. Shortly after I left, the place got bought and the building it was in is empty now.

I wonder, now that Amazon is having layoffs and Big Tech generally is not as many people's target employer, will "migrated off of AWS to in-house servers" be what devs (and management) want on their resume?

whstl•7h ago

Devs wanting to put AWS on their resume push for it, then the next wave you hire only knows AWS.

And then discussions on how to move forward are held between people that only know AWS and people who want to use other stuff, but only one side is transparent about it.

ahel•6h ago

with "dev wanting X" nothing happens. "leadership deciding X" then it needs to get done.

hedora•8h ago

Reason to use AWS from the article:

> You do not have the appetite to build a platform team comfortable with Kubernetes, Ceph, observability, and incident response.

Has work been using AWS wrong? Other than Ceph, all those things add up to onerous half time jobs for rotating software engineers.

Before gp3 came out, working around EBS price/performance terribleness was also on the list.

electroly•7h ago

I put our company onto a hybrid AWS-colocation setup to attempt to get the best of both worlds. We have cheap fiddly/bursty things and expensive stable things and nothing in between. Obviously, put the fiddly/bursty things in AWS and put the stable things in colocation. Direct Connect keeps latency and egress costs down; we are 1 millisecond away from us-east-1 and for egress we pay 2¢/GB instead of the regular 9¢/GB. The database is on the colo side so database-to-AWS reads are all free ingress instead of egress, and database-to-server traffic on the colo side doesn't transit to AWS at all. The savings on the HA pair of SQL Server instances is shocking and pays for the entire colo setup, and then some. I'm surprised hybrids are not more common. We are able to manage it with our existing (small) staff, and in absolute terms we don't spend much time on it--that was the point of putting the fiddly stuff in AWS.

The biggest downside I see? We had to sign a 3 year contract with the colocation facility up front, and any time we want to change something they want a new commitment. On AWS you don't commit to spending until after you've got it working, and even then it's your choice.
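For a sense of scale on the egress difference, a quick sketch using the two per-GB rates quoted above. The monthly volume is a made-up example, and Direct Connect port and cross-connect fees are deliberately left out:

```python
REGULAR_EGRESS_USD_PER_GB = 0.09         # internet egress rate quoted above
DIRECT_CONNECT_EGRESS_USD_PER_GB = 0.02  # Direct Connect rate quoted above

def monthly_egress_savings(gb_per_month: float) -> float:
    """USD saved per month by sending egress over Direct Connect."""
    per_gb_saving = REGULAR_EGRESS_USD_PER_GB - DIRECT_CONNECT_EGRESS_USD_PER_GB
    return gb_per_month * per_gb_saving

# Hypothetical workload: 50 TB of egress per month.
HYPOTHETICAL_GB_PER_MONTH = 50_000
savings = monthly_egress_savings(HYPOTHETICAL_GB_PER_MONTH)
print(f"~${savings:,.0f}/month saved on {HYPOTHETICAL_GB_PER_MONTH:,} GB")
```

Whether that covers the fixed cost of the link depends on port speed and colo pricing, so the volume at which this pays off will differ per setup.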

jcalvinowens•7h ago

I have seen multiple startups paying thousands of dollars a month in AWS bills to run a tiny service which could trivially run on an $800 desktop on a residential internet connection. It's absolutely tragic.

hedora•7h ago

That’s like $24K a year. Assuming they have working failover and business continuity plans, it’s actually a really good deal (vs having a 10-20% time employee deal with it).
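Roughly, the comparison looks like this. It is a sketch only: the $2,000/month bill is inferred from the $24K figure, and the $150k fully loaded salary is an assumption borrowed from elsewhere in the thread:

```python
def annual_cost(monthly_usd: float) -> float:
    """Convert a monthly bill to a yearly one."""
    return monthly_usd * 12

# Assumption: "thousands of dollars a month" taken as $2,000.
ASSUMED_AWS_BILL_PER_MONTH = 2_000
# Assumption: fully loaded yearly cost of one engineer.
ASSUMED_LOADED_SALARY = 150_000

aws_per_year = annual_cost(ASSUMED_AWS_BILL_PER_MONTH)
staff_low = 0.10 * ASSUMED_LOADED_SALARY   # 10% of one person's time
staff_high = 0.20 * ASSUMED_LOADED_SALARY  # 20% of one person's time

print(f"AWS bill:        ${aws_per_year:,.0f}/year")
print(f"10-20% engineer: ${staff_low:,.0f} to ${staff_high:,.0f}/year")
```

With these inputs the AWS bill lands inside the staff-time range, so the conclusion flips easily depending on salary and on how much attention the self-hosted box really needs.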

whstl•6h ago

AWS doesn't get magically expensive just because you put your website there.

You don't get to an overcomplicated AWS madness without having a few engineers already pushing complexity.

And an overcomplicated setup also means it needs maintenance. There are no personnel savings there.

hedora•4h ago

For one VM, EBS with backups gives you business continuity.

You could get manual failover with a single writer replicated managed Postgres setup and a warm VM.

That’s on the order of a thousand a month for a medium workload. It’s probably a 10x markup vs buying the servers, but it doesn’t matter if it saves an employee.

whstl•4h ago

It doesn’t save employees. Over-complicated infrastructure doesn’t magically appear out of nowhere. Someone has to setup and maintain. It’s expensive.

debarshri•7h ago

Recently I learned that orgs these days want to show software and infrastructure spend as capex, since they can show it as a depreciating asset for tax purposes.

I understand that with AWS you cannot do that as it is often seen as opex.

I guess that's a good enough motivation to move out of AWS at scale.

kyledrake•7h ago

The article mentions Equinix Metal but if you look it up they are shutting down the service https://docs.equinix.com/metal/hardware/standard-servers

Doesn't make me want to be an Equinix customer when they just randomly shut down critical hosting services.

I'm pretty sure that it's just the post-merger name for Packet which was an incredible provider that even had BYO IP with an anycast community. Really a shame that it went away, it was a solid alternative to both AWS and bare metal and prices were pretty good.

There's a missing middle between ultra expensive/weird cloud and cheap junk servers that I would really love to see get filled.

dilyevsky•6h ago

Fwiw Equinix Metal was an acquisition (Packet). Seems like it didn't go too well

jameson•6h ago

Curious to know how the development experience has been post-migration? Was there additional friction due to a lack of tooling on-prem that would otherwise be available in the cloud env, for example?

carlgreene•6h ago

Ok so this may be a dumb question...but how do you handle ISP outages due to storms and stuff with on prem solutions? I'd imagine large datacenters have much more sophisticated and reliable internet connections than say an Xfinity business customer, but maybe that's wrong.

neuronflux•6h ago

Much more sophisticated and reliable than Xfinity.

Good datacenters have redundant and physically separated power and communication from different providers.

Also, in case something catastrophic happens at one datacenter, the author mentions they are peered to another datacenter in a different country, as another layer of redundancy. Cloudflare handles their ingress, so such a catastrophic event would be unlikely to be noticed by their customers.

flufluflufluffy•6h ago

Right! I can’t believe they decided to ditch the OS entirely and maintained availability like that!

yearolinuxdsktp•6h ago

Running EKS on AWS was their problem. If they didn't run EKS on AWS, they would've had a considerably simpler setup running Amazon Linux, not having to upgrade Kubernetes every 3 quarters, managing network security using security groups instead of having open internal networking, and running in a single AZ would've eliminated inter-AZ costs. In large regions like us-east-1, an individual AZ is actually internally striped for extra redundancy, and you are much more likely to experience regional downtime than single AZ downtime, especially if you have a stable workload and do not rely on tech beyond rock-solid basics (EC2, VPC, ELB, S3, EBS). If you're willing to operate a single bare metal rack in a DC, you should be willing to run in a single AWS AZ.

I don't know how much time they spend configuring/dealing with Kubernetes, but I bet it's a large chunk of the 24 engineer-hours per quarter. But this is not a required expense: "EKS had an extra $1,260/month control-plane fee". Running EKS adds a massive IAM policy maintenance overhead, whereas a non-EKS (EC2 w/ golden AMIs) setup results in drastically simpler IAM policies.

NAT gateways are ~$50 a month, plus data transfer. Setting up a gateway VPC endpoint to S3 will avoid having to pay transfer charges to S3.
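To put rough numbers on this, here is a minimal sketch of the NAT gateway arithmetic. The hourly and per-GB rates are assumptions for illustration (check current pricing for your region), and the function names are hypothetical. It also assumes the S3 gateway endpoint itself carries no hourly or per-GB charge.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def nat_gateway_monthly_cost(gb_processed, hourly_rate=0.045, per_gb_rate=0.045):
    """Estimate the monthly USD cost of one NAT gateway.

    hourly_rate and per_gb_rate are assumed example rates, not quoted prices.
    """
    return HOURS_PER_MONTH * hourly_rate + gb_processed * per_gb_rate


def s3_endpoint_savings(s3_gb_per_month, per_gb_rate=0.045):
    """Estimate NAT data-processing fees avoided by sending S3 traffic
    through a gateway VPC endpoint instead of the NAT gateway."""
    return s3_gb_per_month * per_gb_rate


if __name__ == "__main__":
    # Hypothetical workload: 10 TB/month through the NAT, all of it S3 traffic.
    print(f"NAT gateway: ${nat_gateway_monthly_cost(10_000):,.2f}/month")
    print(f"Avoided with S3 endpoint: ${s3_endpoint_savings(10_000):,.2f}/month")
```

Under these assumed rates the fixed hourly portion is small, and the per-GB processing fee dominates once traffic volume grows.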

They were at 90% reservation capacity, so they should be using reservations for greater savings and in fact, running stable workloads with reservations is something that AWS excels at. Reservation means that you will be able to terminate and re-launch instances even when there's a spike in demand from other users--your instance capacity is guaranteed.

Running the basics on VMs also effectively avoids vendor lock-in. Every cloud provider supports VMs with a RedHat clone, VPCs, load balancing, networked storage, access controls, object storage and a fixed size fleet with auto-relaunch on instance failure.

With a consistent workload, they would have very likely escaped the downtime from AWS a week ago as well, because, as per AWS, "existing EC2 instances that had been launched prior to the start of the event remained healthy and did not experience any impact for the duration of the event".

With Terraform and automation for building launchable images, you can stand up a cluster quickly in any region with secure networking, including in a separate AWS account, in the same region, for the sake of testing.

With AWS, you can set up automatic EBS backups of all your data to snapshots trivially, and even send them to a 3rd locked-down account, so they can't be accidentally wiped.

ZebusJesus•5h ago

Thank you for the share, this is really good information for making expensive decisions!

nemothekid•4h ago

>We now save over $1.2M / yr and we expect this to grow, as we grow as a business.

Am I just naive? How is an uptime SaaS product saving over a million a year on managed colo vs AWS? Was every API route in its own EC2 instance?

AWS is expensive, sure, but over a million dollars a year? For this product specifically?

I got some clarification from their earlier posts and it looks like they were intentionally avoiding any AWS platform features:

>Our goal was to avoid reliance on AWS or any proprietary cloud technology.

>When we were utilizing AWS, our setup consisted of a 28-node managed Kubernetes cluster. Each of these nodes was an m7a EC2 instance. With block storage and network fees included, our monthly bills amounted to $38,000+. This brought our annual expenditure to over $456,000+.

I just think if you are going to deploy on AWS and then treat AWS like managed colo, your bill is going to be high. I understand how that seems unfair, but AWS isn't really in the business of selling virtual machines. If you sit down and ask yourself how you got here, it just seems like you committed yourself to wasting money. If I knew I just needed some Linux boxes from the start, there are better choices than AWS.
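For reference, the figures quoted from the earlier post can be sanity-checked with a few lines. This is only arithmetic on the quoted numbers ($38,000 per month, 28 nodes); the function names are hypothetical, and the per-node figure lumps block storage and network fees in with compute.

```python
def annual_spend(monthly_bill_usd):
    """Annualize a monthly bill."""
    return monthly_bill_usd * 12


def per_node_monthly(monthly_bill_usd, node_count):
    """Average monthly cost per node, with storage and network fees included."""
    return monthly_bill_usd / node_count


if __name__ == "__main__":
    monthly_bill = 38_000  # quoted as "$38,000+"
    nodes = 28             # quoted 28-node cluster
    print(f"Annual: ${annual_spend(monthly_bill):,}")
    print(f"Per node: ${per_node_monthly(monthly_bill, nodes):,.2f}/month")
```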

pjdesno•3h ago

I'm involved in a fairly large academic cloud deployment, sited in a 15MW data center built and shared by a few large universities.

There are huge advantages of scale to computer operations in a few areas:

- facility: the capital and running cost of a purpose-built datacenter is far cheaper per rack than putting machines in existing office-class buildings, as long as it's a reasonable size - ours is ~1000 racks, but you might get decent scale at a quarter of that. (also one fat network pipe instead of a bunch of slow ones)

- purchasing: unlike consumer PCs, low-volume prices for major vendor servers are wildly inflated, and you don't get decent prices until you buy quite a few of them.

- operations: people come in integer units, and (assuming your salary ranges are bounded) are only competent in a small number of technical areas each. Whether you have one machine or 1000s you need someone who can handle each technology your deployment depends on, from Kubernetes to network ops; multiply 4x for those requiring 24/7 coverage, or accept long response times for off-hours failures.

That last one is probably the kicker. To keep salary costs below 50% of your total, assuming US pay rates and 5-year depreciation since machines aren't getting faster as quickly as they used to, you probably need to be running tens of millions of dollars in hardware.
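A back-of-the-envelope sketch of that break-even, under loud assumptions: yearly cost is treated as only salaries plus straight-line hardware depreciation (facility, power, and network are ignored), and the staff count and loaded salary in the example are hypothetical, not figures from this deployment.

```python
def min_hardware_capex(staff_count, loaded_salary_usd,
                       depreciation_years=5, max_salary_share=0.5):
    """Smallest hardware purchase (USD) that keeps salaries at or below
    max_salary_share of yearly (salaries + depreciation) cost."""
    annual_salaries = staff_count * loaded_salary_usd
    # salaries / (salaries + depreciation) <= share
    #   =>  depreciation >= salaries * (1 - share) / share
    min_annual_depreciation = annual_salaries * (1 - max_salary_share) / max_salary_share
    # Straight-line depreciation: capex = yearly depreciation * lifetime in years.
    return min_annual_depreciation * depreciation_years


if __name__ == "__main__":
    # Hypothetical team: 8 people at a $250k fully loaded cost each.
    capex = min_hardware_capex(staff_count=8, loaded_salary_usd=250_000)
    print(f"Minimum hardware spend: ${capex:,.0f}")
```

With a 50% cap and 5-year depreciation, the required hardware spend works out to five times the annual salary bill, so every extra specialist or on-call rotation pushes the threshold up quickly.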

Note that a tiny deployment of a few machines in a tech company is an exception, since you have existing technical staff who can run them in their spare time. (and you have other interesting work for them to do, so recruiting and retention isn't the same problem as if their only job was to babysit a micro-deployment)

That's why it can be simultaneously true that (a) profit margins on AWS-like services are very high, and (b) AWS is cheaper than running your own machines for a large number of companies.

kshacker•2h ago

> the capital and running cost of a purpose-built datacenter is far cheaper per rack than putting machines in existing office-class buildings, as long as it's a reasonable size - ours is ~1000 racks, but you might get decent scale at a quarter of that.

Just want to confirm what I am reading. You are talking about ~1000 racks as the facility size, not what a typical university requires.

Frannky•3h ago

I only use bare metal—super cheap and very easy to switch. No worries about crazy bills or handling the crazy complexity of their systems. So far, so good. When/if problems start, I'll try them

bhewes•2h ago

Yes to this: keep core base load on your own bare metal systems, and use the clouds for what they do best.

unixhero•1h ago

I went bare metal too. Not because of AWS, but because of being frozen out by Hetzner because of a debt of 0.02eur with no way of paying it.

StratusBen•1h ago

Co-Founder and CEO of https://vantage.sh/ here - I've been pretty impressed by the rate at which repatriation off of public cloud is happening. It rarely came up before, and in the last year it's been popping up more and more -- especially just for getting access to GPUs.

I thought there would be a greater unbundling to AWS or to cheaper providers but it seems like a good-sized portion of the market is just going back to managing their own hardware.

Naklin•45m ago

> We spent a week of engineers time (and that is the worst case estimate) on the initial migration, spread across SRE, platform, and database owners.

I’m sorry but I don’t believe this for one second.

And unfortunately that makes me distrust the entirety of the article.
