The long-term app model in the market is shifting much more towards buying services vs renting infrastructure. It's here where the AWS case falls apart, with folks now buying PlanetScale over RDS, buying Databricks over the mess AWS puts forward for data lakes, and working with model providers directly rather than dealing with the headaches of Bedrock. The real long-term threat is that AWS continues to whiff on all the other stuff and gets reduced to a boring rent-a-server shop that market forces will drive to very low margins.
Yes a lot of those 3rd party services will run on AWS but the future looks like folks renting servers from AWS at 7% gross margin and selling their value-add service on top at 60% gross margin.
What people forget about the OVH or Hetzner comparison is that the entry servers they are known for, think the Advance line at OVH or the AX line at Hetzner, come with some drawbacks.
The OVH Advance line, for example, comes without ECC memory, in a server that might host databases. That's a disaster waiting to happen. There is no option to add ECC memory to the Advance line, so you have to use Scale or High Grade servers, which are far from "affordable".
Hetzner by default comes with a single PSU and a single uplink. Yes, if nothing goes wrong this is probably fine, but if you need a reliable private network or 10G it will cost extra.
But imo, systems like these (the ones handling bank transactions) should have a degree of resiliency to this kind of failure, as any hardware or software problem can cause something similar.
reference : https://news.ycombinator.com/item?id=38294569
> We have 730+ days with 99.993% measured availability and we also escaped AWS region wide downtime that happened a week ago.
This is a very nice brag. Given they are doing their DDoS protection and ingress via Cloudflare there is that dependency, but in that case I can 100% agree that DNS and ingress can absolutely be a full-time job. Running some microservices and a database absolutely is not. If your teams are constantly monitoring and adjusting them, such as scaling, then the problem is the design, not the hosting.
Unless you're a small company serving up billions of heavy requests an hour, I would put money on the bet AWS is overcharging you.
Technically? Totally doable. But the owners prefer renting in the cloud over the people-related issues of hiring.
You don't need to hire dedicated people full time. It could even be outsourced, with a small contract for ongoing maintenance.
It's the same argument you could make for "accounting persons" or "HR persons" - "We are a software organisation!" - and personally I don't buy it.
Yeah, those people we outsourced to happen to work at AWS.
Pay some "devops" folks, then underfund them and give them a mandate of all ops with fewer people, plus managing the constant churn of AWS services, plus dealing with normal outages and dumb dev things.
I’ve been working at a place for a long time and we have our own data centers. Recently there has been a push to move to the public cloud and we were told to go through AWS training. It seems like the first thing AWS does in its training is spend a considerable amount of time selling their model. As an employee who works in infrastructure, hearing Amazon sell so hard that the company doesn’t need me anymore is not exactly inspiring.
After that section they seem to spend a considerable amount of time on how to control costs. These are things no one really thinks about currently, as we manage our own infra. If I want to spin up a VM and write a bunch of data to it, no one really cares. The capacity already exists and is paid for, adding a VM here or there is inconsequential. In AWS I assume we’ll eventually need to have a business justification for every instance we stand up. Some servers I run today have value, but it would be impossible to financially justify in any real terms when running in AWS where everything has a very real cost assigned to it. What I do is too detached from profit generation, and the money we could save is mostly theoretical, until something happens. I don’t know how this will play out, but I’m not excited for it.
The AWS mandatory training I did in the past was 100% marketing of their own solutions, and tests are even designed to make you memorize their entire product line.
The first two levels are not designed for engineers: they're designed for "internal salespeople". Even Product Managers were taking the certification, so they would be able to recommend AWS products to their teams.
It wasn't as simple as that then, and it's still not as simple as that now.
It’s become polarised (as everything seems to).
I’ve specced bare metal and I’ve specced AWS; which gets used is entirely a matter of the problem, the costs, and the relative trade-offs.
That is all it is.
Clients that use cloud consistently end up spending more on devops resources, because their setups tend to be vastly more complex and involve more people.
The biggest ops teams I worked alongside were always dedicated to running AWS setups. The slowest too were dedicated to AWS.
People here are comparing the worst possible version of bare metal with "hosting my startup on AWS".
I wish I could come up with some kind of formalization of this issue. I think it has something to do with communication explosions across multiple people.
Right, doesn't that include figuring out the right and best way of running it, regardless if it runs on client machines or deployed on servers?
At least I take "software engineering" to mean the full end-to-end process, from "Figure out the right thing to build" to "runs great wherever it's meant to run". I'm not a monkey that builds software on my machine and then hands it off to some deployment engineer who doesn't understand what they're deploying. If I'm building server software, part of my job is ensuring it's deployed in the right environment and runs perfectly there too.
With AWS I think this tradeoff is very weak in most cases: the tasks that you are paying AWS for are relatively cheap in time-of-people-in-your-org, and AWS also takes up a significant amount of that time with new tasks as well. Of the organisations I'm personally aware of, the ones who hosted on-prem spent less money on their compute and had smaller teams managing it, with more effective results than those who were cloud-based (to varying degrees of egregiousness, from 'well, I can kinda see how it's worth it because they're growing quickly' to 'holy shit, they're setting money on fire and compromising their product because they can't just buy some used tower PCs and plug them in in a closet in the office').
It's also that the requirements vary a lot, discussions here on HN often seem to assume that you need HA and lots of scaling options. That isn't universally true.
This applies only if you have an extra customer that pays the difference. Basically, the argument only holds if you can't take on more customers because keeping up the infrastructure takes too much time, or because you'd need to hire an extra person who costs more than the AWS bill difference.
("Shall we make the app very resilient to failure? Yes running on multiple regions makes the AWS bill bigger but you'll get much fewer outages, look at all this technobabble that proves it")
And of course AWS lock-in services are priced to look cheaper compared to their overpricing of standard stuff[1] - if you just spend the engineering effort and IaC coding effort to move onto them, the "savings" can be put toward more AWS cloud engineering effort, which again makes your cloud eng org bigger and more important.
[1] For example moving your app off containers onto Lambda, or the DB off PostgreSQL onto DynamoDB, etc.
I don't think it is easy. I see most organizations struggle with the fact that everything is throttled in the cloud. CPU, storage, network. Tenants often discover large amounts of activity they were previously unaware of, that contributes to the usage and cost. And there may be individuals or teams creating new usages that are grossly impacting their allocation. Did you know there is a setting in MS SQL Server that impacts performance by an order of magnitude when sending/receiving data from the Cloud to your on-premises servers? It's the default in the ORM generated settings.
Then you can start adding in the Cloud value, such as incomprehensible networking diagrams that are probably non-compliant in some way (guess which ones!), and security? What is it?
Sounds interesting, which setting is that?
Many a company was stuck with a datacenter unit that was unresponsive to the company's needs, and people migrated to AWS to avoid dealing with them. This straight out happened in front of my eyes multiple times. At the same time, you also end up in AWS, or even within AWS, using tools that are extremely expensive, because the cost-benefit analysis for the individuals making the decision, who often don't know very much other than what they use right now, are just wrong for the company. The executive on top is often either not much of a technologist or 20 years out of date, so they have no way to discern the quality of their staff. Technical disagreements? They might only know who they like to hang out with, but that's where it ends.
So for path-dependent reasons, companies end up making a lot of decisions that in retrospect seem very poor. In startups it often just kills the company. Just don't assume the error is always in one direction.
I'd like to +1 here - it's an understated risk if you've got datacenter-scale workloads. But! You can host a lot of compute on a couple racks nowadays, so IMHO it's a problem only if you're too successful and get complacent. In the datacenter, creative destruction is a must and crucially finance must be made to understand this, or they'll give you budget targets which can only mean ossification.
If you hire people that are not responsive to your needs, then, sure, that is a problem that will be a problem irrespective of what their pet stack is.
In a large company I worked at, the Ops team that had the keys to AWS was taking literal months to push things to the cloud, causing problems with bonuses and promotions. Security measures were not in place, so there were cyberattacks. Passwords of critical services lapsed because nobody was paying attention.
At some point it got so bad that the entire team was demoted, lost privileges, and contractors had to jump in. The CTO was almost fired.
It took months to recover and even to get to an acceptable state, because nothing was really documented.
Looking back at hiring decisions I've made at various levels of organizations, this is probably the single biggest mistake I've made, multiple times: hiring people for a specific technology because that's specifically what we were using.
You'll end up with a team unwilling to change, because "you hired me for this; even if something else would be best for the business, this is what I do".
Once I and the organizations shifted our mindset to hiring people who are more flexible, people who, even if they have expertise in one or two specific technologies, won't put their heads in the sand whenever change comes up, everything became a lot easier.
Also there's a mindset difference - if I gave you a server with 32 cores you wouldn't design a microservice system on it, would you? After all there's nowhere to scale to.
But with AWS, you're sold the story of infinite compute you can just expect to be there, but you'll quickly find out just how stingy they can get with giving you more hardware automatically to scale to.
I don't dislike AWS, but I feel this promise of false abundance has driven the growth in complexity and resource use of the backend.
Reality tends to be that you hit a bottleneck you have a hard time optimizing away - the more complex your architecture, the harder it is - and then you're left to stew.
Let me go on a tangent about trains. In Spain, before you board a high-speed train you need to go through a full security check, like at an airport. In all other EU countries you just show up and board, but in Spain there's the security check. The problem is that even though the security check is expensive, inefficient theatre, nobody wants to be the politician who removed it just in case something does blow up. There will be no reward for a politician who makes life marginally easier for lots of people, but there will be severe punishment for a politician who is involved in a potential terrorist attack, even if the chance of that happening is ridiculously small.
This is exactly why so many companies love to be balls deep into AWS ecosystem, even if it's expensive.
Just for curiosity's sake, did any other EU countries have any recent terrorist attacks involving bombs on trains in the capital, or is Spain so far alone with this experience?
You can pay for EC2 + EBS + network costs, or you can have a fancy cloud-native solution where you pay for Lambda, ALBs, CloudWatch, Metrics, Secrets Manager - things you'd assume they would just give you; if you eat at a restaurant, you don't expect to pay separately for the parking, the toilet, or rent for the table and seats.
So cloud billing is its own science and art - and in most orgs devs don't even know how much the stuff they're building costs, until finance people start complaining about the monthly bills.
The consequence of running a database poorly is lost data.
At the end of the day they're all just processes on a machine somewhere, none of it is particularly difficult, but storing, protecting, and traversing state is pretty much _the_ job and I can't really see how you'd think ingress and DNS would be more work than the datastores done right.
Now with AWS, I have a SaaS that makes 6 figures and the AWS bill is <$1000 a month. I'm entirely capable of doing this on-prem, but the vast majority of the bill is s3 state, so what we're actually talking about is me being on-call for an object store and a database, and the potential consequences of doing so.
With all that said, there's definitely a price point and staffing point where I will consider doing that, and I'm pretty down for the whole on-prem movement generally.
That's the sweet spot for AWS customers. Not so much for AWS.
The key thing for AWS is trying to get you locked in by "helping you" depend on services that are hard to replicate elsewhere, so that if your costs grow to a point where moving elsewhere is worth it, it's hard for you to do so.
The biggest difficulty in eating into AWS market share is that believing it is cheap has become religion.
Perfect example - MSK. The brokers are config locked at certain partition counts, even if your CPU is 5%. But their MSK replicator is capped on topic count. So now I have to work around topic counts at the cluster level, and partition counts at the broker level. Neither of which are inherent limits in the underlying technologies (kafka and mirrormaker)
It's a way to "commoditize" engineers. You can run on-premise or mixed infra better and cheaper, but only if you know what you are doing. This requires experienced people and doesn't work with new grads hired by big consultancies and sold as "cloud experts".
(I do that for people; my AWS using customers consistently end up needing more help)
but somehow that is never a problem.
this is the main advantage of cloud, no one cares if the site/service/app is down as long as it's someone else's fault and responsibility.
I'm not. It seems to be happening a lot. Any time a topic about not using AWS comes up here or on Reddit, there's a sudden surge of people appearing out of nowhere shouting down anyone who suggests other options. It's honestly starting to feel like paid shilling.
The cloud stuff is extremely expensive and doesn't work any better than our existing solutions. Like a commentator said below, it's insidious as your entire organization later becomes dependent on that. If you buy a cloud solution, you're also stuck with the vendor deciding to double the cost of the product once you're locked in.
The GPU stuff is annoying as all of our needs are fine with normal CPU workloads today. There are no performance issues, so again...what's the point? Well... somebody wants to play with GPUs I guess.
AWS/Azure/GCP is great, but like any tool or platform you need to do some financial/process engineering to make an optimal choice. For small companies, time to market is often key, hence AWS.
Once you’re a little bigger, you may develop frameworks to operate efficiently. I have apps that I run in a data center because they’d cost 10-20x at a cloud provider. Conversely, I have apps that get more favorable licensing terms in AWS that I run there, even though the compute is slower and less efficient.
You also have people who treat AWS with the old “nobody gets fired for buying IBM” mentality.
I imagine a lot of people who use Linux/AWS now started out with bare metal Microsoft/VMWare/Oracle type of environments where AWS services seemed like a massive breath of fresh air.
Having an ability to spin up a server or a vm when you need it without having to ask a single question is very liberating. Sometimes such elasticity is exactly what's needed. OTOH other people's servers aren't always the wise choice, but you have to know both environments to make the right choice, and nowadays I feel most people don't really know anything about bare metal.
Luckily, Amazon is far from the only VM provider out there, so this discussion doesn't need to be polarized between "AWS everything" and "on-premise everything". You can rent VMs elsewhere for a fraction of the cost. There are many places that will rent you bare metal servers by the hour, just as if they were VMs. You can even mix VMs and bare metal servers in the same datacenter.
Beyond public cloud being bad for the planet, I also hate that it drains companies of money, centralizes everyone's risk, and helps to entrench Amazon as yet another tech oligarchic fiefdom. For most people, these things just don't matter apparently.
Similar here, I think. I got into Computer Science because I liked software... the way it was. Now I truly think that most software completely sucks.
The thing is that it has grown so much since then, that most developers come from a different angle.
If you can't build a positive business case then it's not the correct move. Cash is king. Sadly.
A lot of the discussion here is that the cost of the in-house team is less than people think.
For instance: at a former gig, we used a service in the EU that handled weekends, holidays and night time issues and escalated to our team as needed. It was pretty cheap, approximately $10K monthly fee for availability and hourly rate when there were any issues to be resolved. There were a few mornings I had an email with a post-mortem report and an invoice for a hundred euros or so. We came pretty close to 5 9's uptime but we didn't have to worry about SLA's or anything.
Well, well, they have a whole team doing "devops administration" on AWS and require extra people. So not having the money for an in-house team ... no AWS for you.
I've worked for 2 large-ish firms in the past 3 years: one huge telco, one "medium" telco (still hundreds of people). BOTH had a team just for AWS IAM administration. Only for that one thing, because it was company-wide (and was regularly demonstrated to be a single point of failure). And they had AWS administrator teams, yes teams, for every department. Even HR had one; in the medium telco all of management shared a team, but the networking and development departments still had their own AWS teams, who, btw, also did IAM. The company-wide IAM team maintained AWS IAM plus some solution they'd bought that also covered their Windows domain, their ticketing system (I hate you, IBM Remedy), their equipment-ordering portal, and so on.
AND there were "devops" positions on every development team, and on the network engineering team, and even a small one for the building "technics" team.
Oh, and they both had an internal cluster alongside AWS, part on-premise, part rented DC space, which did at least half the compute work (but presumably a lot less of the weird edge cases). That one ran the company services that are just insane to run on AWS, like anything involving video.
they sell "you don't need a team"... which is true in your prototype and MVP phase. and you figure that when you grow you will have an ops team and maybe move out.
but in the very long middle period... you will be supporting clients and SLAs etc, and will end up paying both AWS AND an ops team without even realizing it.
We should coin the term "Cloud Learned Helplessness"
Having done consulting in this space for a decade, and worked with containerised systems since before AWS existed, my experience is that managing an AWS system is consistently more expensive and that in fact the devops cost is part of what makes AWS an expensive option.
That said, I've seen real world scenarios where complexity is up the wazoo and an opex cost focus means you're hiring under skilled staff to manage offerings built on components with low sticker prices. Throw in a bit of the old NIH mindset (DIY all the things!) and it's large blast radii with expensive service credits being dished out to customers regularly. On a human factors front your team will be seeing countless middle of the night conference calls.
While I'm not 100% happy with the AWS/Azure/GCP world, the reality is that on-prem skillsets are becoming rarer and more specialist. Hiring good people can be either really expensive or a bit of a unicorn hunt.
Most AWS-only Ops engineers I know are making bank and in high demand, and Ops teams are always HUGE in terms of headcount outside of startups.
The "AWS is cheaper" thing is the biggest grift in our industry.
You can easily get your service up by asking claude code or whatever to just do it
It produces aws yaml that’s better than many devops people I’ve worked with. In other words, it absolutely should not be trusted with trivial tasks, but you could easily blow $100K’s per year for worse.
After being fully in the cloud for some time, we’re moving to a hybrid solution. Upper management is happy with the costs, and the cloud engineers have new toys.
At the same time, the incredible complexity of the software infrastructure is making specialists more and more useless. To the point that almost every successful specialist out there is just some disguised generalist that decided to focus their presentation in a single area.
What?
I throw up in my mouth every time I see "full stack" in a job listing.
We got rid of roles... DBA's, QA teams, Sysadmins, then front and back end. Full Stack is the "webmaster" of the modern era. It might mean front and back end, it might mean sysadmin and DBA as well.
It's not even limited to sysadmins, or in tech. How do you know whether a mechanic is very good, or iffy? Is a financial advisor giving you good advice, or basically robbing you? It's not as if many companies are going to hire 4 business units worth of on prem admins, and then decide which one does better after running for 3 years, or something empirical like that. You might be the poor sob that hires the very expensive, yet incompetent and out of date specialist, whose only remaining good skill is selling confidence to employers.
Of course but unless I misunderstood what you meant to say, you don't escape that by buying from AWS. It's just that instead of "sysadmin specialists" you need "AWS specialists".
If you want to outsource the job then you need to go up at least 1 more layer of abstraction (and likely an order of magnitude in price) and buy fully managed services.
The most frustrating part of hyperscalers is that it's so easy to make mistakes. Active tracking of your bill is a must, but the data is 24-48h late in some cases. So a single engineer can cause five-figure regrettable spend very quickly.
helm repo add mariadb-operator https://mariadb-operator.github.io/mariadb-operator
helm install mariadb-operator mariadb-operator/mariadb-operator
And then you can just provision the MariaDB "kind", i.e. you kubectl apply something specifying the database name, maximum memory, type of high availability (single primary or multi-master) and a secret reference, and there you go: a new database, ready to be plugged into other pods.
I see so many series B+ companies running DB and storage without a care in the world.
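For the kubectl apply step above, a rough sketch of what such a manifest can look like. The exact apiVersion and field names depend on your mariadb-operator version, so treat the names here as assumptions and check the operator's CRD reference:

kubectl apply -f - <<'EOF'
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: appdb
spec:
  rootPasswordSecretKeyRef:   # secret reference
    name: appdb-credentials
    key: root-password
  database: appdb             # database name
  replicas: 3
  replication:
    enabled: true             # single-primary HA; galera for multi-master
  storage:
    size: 10Gi
  resources:
    limits:
      memory: 2Gi             # maximum memory
EOF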
Running mysqldump to a usb disk in the office once a day is pretty cheap.
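A minimal sketch of that, assuming the USB disk is mounted at /mnt/usb-backup (paths and schedule are placeholders):

# crontab entry: nightly dump of all databases, gzipped onto the USB disk
# (% has to be escaped inside cron)
0 2 * * * mysqldump --all-databases --single-transaction | gzip > /mnt/usb-backup/all-dbs-$(date +\%F).sql.gz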
I expect most, if not 99%, of all businesses can cope with a hardware failure and the associated downtime while restoring to a different server, judging from the impact of the recent AWS outage and the collective shrug in response. With a proper RAID setup, data loss should be quite rare; if more is required, a primary + secondary setup with manual failover isn't hard.
Basic async logical replication in MySQL/MariaDB is extremely easy to set up, literally just a few commands to type.
Ditto for doing failover manually the rare times it is needed. Sure, you'll have a few minutes of downtime until a human can respond to the "db is down" alert and initiates failover, but that's tolerable for many small to medium sized businesses with relatively small databases.
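Roughly, using MariaDB / older MySQL syntax (newer MySQL renames these to CHANGE REPLICATION SOURCE TO and START REPLICA). Host names and passwords below are placeholders, and it assumes binary logging and distinct server_id values are already configured in my.cnf:

# on the primary: create a replication user and note the binlog file/position
mysql -e "CREATE USER 'repl'@'%' IDENTIFIED BY 'changeme';
          GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
          SHOW MASTER STATUS;"

# on the replica: point it at the primary using the file/position from above
mysql -e "CHANGE MASTER TO MASTER_HOST='10.0.0.1', MASTER_USER='repl', MASTER_PASSWORD='changeme',
          MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4; START SLAVE;"

# manual failover, the rare time it's needed: stop replication, make the replica
# writable, then repoint the application at it
mysql -e "STOP SLAVE; RESET SLAVE ALL; SET GLOBAL read_only = OFF;"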
That approach was extremely common ~10-15 years ago, and online businesses didn't have much worse availability than they do today.
Moving away is an existential issue for them - this is why there's such pushback. A huge % of new developer and devops generation doesn't know anything about deploying software on bare metal or even other clouds and they're terrified about being unemployed.
I see more comments in favor than pushing back.
The problem I have with these stories is the confirmation bias that comes with them. Going self-hosted or on-premises does make sense in some carefully selected use cases, but I have dozens of stories of startup teams spinning their wheels with self-hosting strategies that turn into a big waste of time and headcount that they should have been using to grow their businesses instead.
The shared theme of all of the failure stories is missing the true cost of self-hosting: The hours spent getting the servers just right, managing the hosting, debating the best way to run things, and dealing with little issues add up but are easily lost in the noise if you’re not looking closely. Everyone goes through a honeymoon phase where the servers arrive and your software is up and running and you’re busy patting yourselves on the back about how you’re saving money. The real test comes 12 months later when the person who last set up the servers has left for a new job and the team is trying to do forensics to understand why the documentation they wrote doesn’t actually match what’s happening on the servers, or your project managers look back at the sprints and realize that the average time spent on self-hosting related tasks and ideas has added up to a lot more than anyone would have guessed.
Those stories aren’t shared as often. When they are, they’re not upvoted. A lot of people in my local startup scene have sheepish stories about how they finally threw in the towel on self-hosting and went to AWS and got back to focusing on their core product. Few people are writing blog posts about that because it’s not a story people want to hear. We like the heroic stories where someone sets up some servers and everything just works perfectly and there are no downsides.
You really need to weigh the tradeoffs, but many people are not equipped to do that. They just think their chosen solution will be perfect and the other side will be the bad one.
What the modern software business seems to have lost is the understanding that ops and dev are two different universes. DevOps was a reaction to the fact that even outsourcing ops to AWS doesn’t entirely solve all of your ops problems and the role is absolutely no substitute for a systems administrator. Having someone that helps derive the requirements for your infrastructure, then designs it, builds it , backs it up, maintains it, troubleshoots it, monitors performance, determines appropriate redundancy, etc. etc. etc. and then tells the developers how to work with it is the missing link. Hit-by-a-bus documentation, support and update procedures, security incident response… these are all problems we solved a long time ago, but sort of forgot about moving everything to cloud architecture.
This is a fascinating take; if you ask me, treating them as separate is the whole problem!
The point of being an engineer is to solve real world problems, not to live inside your own little specialist world.
Obviously there's a lot to be said for being really good at a specialized set of skills, but that's only relevant to the part where you're actually solving problems.
DevOps, conceptually, goes back to the 90s. I was using the term in 2001. If memory serves, AWS didn't really start to take off until the mid/late aughts, or at least not until they launched S3.
DevOps was a reaction to the software lifecycle problem and didn't have anything to do with AWS. If anything it's the other way around: AWS and cloud hosting gained popularity in part due to DevOps culture.
The problems here are no different from using SaaS anywhere else in a business. You could also run all your sales tracking through Excel; it's just that once you have more than a few people doing sales, that becomes a major bottleneck, the same way not having an easier-to-manage infrastructure system does.
We planned a migration to move from 4 on-demand instances to one on-prem machine, and we guessed we’d save $1000/mo, our builds would be faster, and we’d have fewer failures due to capacity issues. We even had a spare workstation and a rack in the office, so the capex was 0.
I plugged the machine into the rack and no internet connectivity. Put in an IT ticket which took 2 days for a reply, only to be told that this was an unauthorised machine and needed to be imaged by IT. The back and forth took 4 weeks, multiple meetings and multiple approvals. My guess is that 4 people spent probably 10 hours arguing whether we should do this or not.
On AWS I can write a python script and have a running windows instance in 15 minutes.
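(For reference, the CLI version is a one-liner too; the AMI ID below is just a placeholder for whatever Windows Server image you'd actually pick:)

# launch a single Windows instance; ami-xxxxxxxx stands in for a real Windows Server AMI
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t3.large --count 1 --key-name my-key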
https://en.wikipedia.org/wiki/Bare_metal
Edit: For clarity, wikipedia does also have pages with other meanings of "bare metal", including "bare metal server". The above link is what you get if you just look up "bare metal".
I do aim to be some combination of clear, accurate and succinct, but I very often seem to end up in these HN pissing matches so I suppose I'm doing something wrong. Possibly the mistake is just commenting on HN in itself.
I'm not sure what you did, but when you go to that Wikipedia article, it redirects to "Bare Machine", and the article content is about "Bare Machine". Clicking the link you have sends you to https://en.wikipedia.org/wiki/Bare_machine
So it seems like you almost intentionally shared the article that redirects, instead of linking to the proper page?
Can we stop this now? Please?
Fix it then, if you think it's incorrect. Otherwise, link to https://en.wikipedia.org/wiki/Bare_metal_(disambiguation) like any normal and charitable commentator would do.
> Can we stop this now? Please?
Sure, feel free to stop at any point you want to.
Things today are different. As cloud service providers have grown to become dominant, they now offer a vast, complicated tangle of services, microservices, control panels, etc., at prices that can spiral out of control if you are not constantly on top of them, making bare metal cheaper for many use cases.
That was never the case for AWS, the point was never "We're cheap" but "We let you scale faster for a premium".
I first came across cloud services around 2010-2011, I think, when the company I worked at at the time started growing and we needed something better than shared hosting. AWS was brought up as a "fresh but expensive" alternative, and the CTO managed to convince management that we needed AWS even if it was expensive, because it would be a lot easier to spin servers up and down as we needed. Bandwidth costs, I think, were the most expensive part of the package, at least back then.
When I look at what performance per $ you get with AWS et al today, it looks the same: incredibly expensive for the performance you (don't) get. You're better off with dedicated instances unless your team is lacking the basic skills of server management, or until the company has grown enough that dealing with the infrastructure keeps being difficult; then hire a dedicated person and let them make the calls on what's next.
Being able to start small from a $1/mth bill without any fixed cost overheads is incredibly powerful for small startups.
If I wanted to store bytes in a DC it would cost $10k/mth by the time I was paying for colo/servers/disks, before I stored my first byte. Sure, there wouldn't be any incremental cost for the second byte, but that's a steep jump. S3 would have cost me $0.02. Being able to try technology and prove concepts at the product development stage is very powerful, and it's why AWS became not just a vendor but a _technology partner_ for many companies.
Yes, no doubt about it. Initially AWS was mostly sold as "You never know when you might want to scale fast, imagine being featured in a newspaper and your servers can't handle the load, you need cloud for that!" to growing startups, and in that context it kind of makes sense, pay extra but at least be online.
But initially when you're small, or later when you're big and established, other things make more sense. But yes, I agree that if you need to be able to scale up or down aggressively, cloud resources make sense for that, in addition to your base infrastructure.
AWS needs to stop trying to have a half-arsed solution to every possible use case and instead focus on doing a few basic things really well.
[0] https://aws.amazon.com/certification/certified-solutions-arc...
AWS was mostly spared from yesterday’s big cuts but have been told to “watch this space” in the new year after re:Invent.
A lot of newer stuff that actually scales (so Lightsail doesn't count) is entangled with "security", "observability" and "network" services. So if you just want to run EC2 + RDS today, you also have to deal with VPC, Subnets, IAM, KMS, CloudWatch, CloudTrail, etc.
Since security and logs are not optional, you have very limited choice.
Having that many required additional services means lots of hidden charges, complexity and problems. And you need a team if you're not doing small-scale stuff.
> Our workload is 24/7 steady. We were already at >90% reservation coverage; there was no idle burst capacity to “right size” away. If we had the kind of bursty compute profile many commenters referenced, the choice would be different.
Which TBH applies to many, many places, even if they are not aware of it. In any case, not everyone needs five nines, and usually it's much easier to bring down a platform through some bug in your own software than for the core infrastructure to go down at the rack level.
> The Equinix Metal service will be sunset on June 30, 2026.
Our .NET apps are still deployed as Docker Compose apps, using GitHub Actions and Kamal [1] to deploy them. Most apps use SQLite + Litestream with real-time replication to R2, but we have switched to a local PostgreSQL for our latest app, with regular backups to R2.
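For anyone curious what the Litestream side of that looks like, a minimal sketch (bucket name, paths and the R2 account ID are placeholders; credentials go in the LITESTREAM_ACCESS_KEY_ID / LITESTREAM_SECRET_ACCESS_KEY environment variables):

# write a minimal Litestream config replicating one SQLite db to Cloudflare R2
cat > /etc/litestream.yml <<'EOF'
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups
        path: app.db
        endpoint: https://<ACCOUNT_ID>.r2.cloudflarestorage.com
        region: auto
EOF

# run continuous replication against that config
litestream replicate -config /etc/litestream.yml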
Thanks to AI that can walk you through any hurdle and create whatever deployment, backup and automation scripts you need, it's never been easier to self-host.
This really is the crux of the matter in my opinion, at least for applications (databases and so on is in my opinion more nuanced). I've only worked at one place where using cloud functions made sense (keeping it somewhat vague here): data ingestion from stations that could be EXTREMELY bursty. Usually we got data from the stations at roughly midnight every day, nothing a regular server couldn't handle, but occasionally a station would come back online after weeks or new stations got connected etc which produced incredible load for a very short amount of time when we fetched, parsed and handled each packet. Instead of queuing things for ages we could instead just horizontally scale it out to handle the pressure.
Compared to all the other things that can and will go wrong, this risk seems pretty small, but I have no data to back that up.
Is there a simple safe setup that we can run on an Ubuntu server?
We self-host the Postgres db with frequent backups to s3 but just in case the site takes off, we need an affordable reliable solution.
Does anyone here run their own DB servers? Any advice?
Backups, security, upgrades etc
Info noted
Setting up a DB isn't hard, using an LLM to ask questions will guide you to the right places. I'm always talking with Gemini because I switched from Ubuntu to Fedora 42 server and things are slightly different here and there.
But, different server hosts offer DB-ready OS's so all you have to do is load the OS on the server and you'll be ready to go.
The joy of Linux is getting everything _just right_ and so much _just right_ that you can launch a second server and set it up that way _just right_ within minutes.
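For the backup part of the question above, one minimal approach is a nightly logical dump streamed straight to S3-compatible storage (the database and bucket names here are placeholders; tools like pgBackRest or wal-g are the step up when you need point-in-time recovery):

# nightly logical backup of one database, streamed to an S3-compatible bucket
pg_dump -Fc mydb | aws s3 cp - s3://my-db-backups/mydb-$(date +%F).dump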
Gee, how hard is it to find SE experts in that particular combination of ops tools? While in AWS every AWS-certified engineer speaks the same language, the DIY approach surely suffers from the lack of "one way" to do things. Swap Flux for Argo, for example (assuming the post is talking about that Flux and not another tool with the same name), and you have an almost completely different GitOps workflow. How do they manage to settle on a specific set of tools?
What are the major differences?
However, to steelman AWS use: many businesses are STILL running mainframes. Many run terrible setups like Access as a production database. In 2025 there are large companies with no CI/CD platforms or IaC, and some companies where even version control is still a new concept or a dark art. So not every company is in a position to hire competent system administrators and system engineers to set up some bare metal machines and configure Ceph, much less Hadoop or Kubernetes. AWS lets these companies just buy those capabilities while forcing the software stack to modernize.
That company had its own data center, tape archives, etc. It had been running largely the same way continuously since the 90s. When I left for a better job, the company had split into two camps. The old curmudgeonly on-prem activists and the over-optimistic cloud native AWS/GCP certified evangelist with no real experience in the cloud (because they worked at a company with no cloud presence). I'm humble enough to admit that I was part of the second camp and I didn't know shit, I was cargo culting.
This migration is still not complete as far as I'm aware. Hopefully the teams that resisted this long and never left for the cloud get to settle in for another decade of on-prem superiority lol.
The story will be different for every business because every business has different needs.
Given the answer to "How much did migration and ongoing ops really cost?" it seems like they had an incredibly simple infrastructure on AWS, and it was really easy to move out. If you use a wider range of services, the cost savings are much more likely to cancel out.
Very much this.
Small team in a large company who has an enterprise agreement (discount) with a cloud provider? The cloud can be very empowering, in that teams who own their infra in the cloud can make changes that benefit the product in a fraction of the time it would take to work those changes through the org on prem. This depends on having a team that has enough of an understanding of database, network and systems administration to own their infrastructure. If you have more than one team like this, it also pays to have a central cloud enablement team who provides common config and controls to make sure teams have room to work without accidentally overrunning a budget or creating a potential security vulnerability.
Startup who wants to be able to scale? You can start in the cloud without tying yourself to the cloud or a provider if you are really careful. Or, at least design your system architecture in such a way that you can migrate in the future if/when it makes sense.
In 2010 you could only get 64 Xeon cores by using 8 sockets, i.e. a maximum of 8 cores per socket, and that is ignoring NUMA issues. Today you can get 256 cores per socket, each at least twice as fast. What used to be 64 servers can now be fitted into 1, and by 2030 it will be closer to a 100-to-1 ratio. Not to mention that server software has gotten a lot faster compared to 2010: PHP, Python, Ruby, Java, ASP, or even Perl. If we add everything up, I wouldn't be surprised if we are at a 200- or 300-to-1 ratio compared to 2010.
I am pretty sure there is some version of Oxide in the pipeline that will catch up to the latest Zen CPU cores. If a single server isn't enough, a few Oxide racks should cover 99% of internet companies' usage.
However, I do get the point about cost-premium and more importantly vendor-risk that's paid when using managed services.
We are hosted on Cloudflare Workers, which is very cheap, but to mitigate the vendor risk we have also set up replicas of our API servers on bunny.net and render.com.
For example, we couldn't offer free GeoIP downloads[0] if we were charged the outrageous $0.09 / GB, and the same is true for companies serving AI models or game assets.
But what makes me almost sick is how slow is the cloud. From network-attached disks to overcrowded CPUs, everything is so slooooow.
My experience is that the cloud is a good thing between 0-10,000 $ / month. But you should seriously consider renting bare-metal servers or owning your own after that. You can "over-provision" as much as you want when you get 10-20x (real numbers) the performance for 25% of the price.
I really like how people throw around these baseless accusations.
S3 is one of the cheapest storage solutions ever created. The last 10 years I have migrated roughly 10-20PB worth of data to AWS S3 and it resulted in significant cost saving every single time.
If you do not know how to use cloud computing, then yes, AWS can be really expensive.
The real cost of self-hosting, in my direct experience with multiple startup teams trying it, is the endless small tasks, decisions, debates, and little changes that add up over time to more overhead than anyone would have expected. Everyone thinks it’s going to be as simple as having the colo put the boxes in the rack and then doing some SSH stuff, then you’re free of those AWS bills. In my experience it’s a Pandora’s box of tiny little tasks, decisions, debates, and “one more thing” small changes and overhauls that add up to a drain on the team after the honeymoon period is over.
If you’re a stable business with engineers sitting idle that could be the right choice. For most startups who just need to get a product out there and get customers, pulling limited headcount away from the core product to save pennies (relatively speaking) on a potential AWS bill can be a trap.
If those 20PB are deep archive, the S3 Glacier bill comes out to around $235k/year, which also seems ludicrous: it does not cost six figures a year to maintain your own tape archive. That's the equivalent of a full-time sysadmin (~$150k/year) plus $100k in hardware amortization/overhead.
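(Back-of-the-envelope, using the published ~$0.00099/GB-month Deep Archive rate: 20 PB ≈ 20,000,000 GB × $0.00099 ≈ $19,800/month, i.e. roughly $237k/year, in line with the $235k figure above.)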
The real advantage of S3 here is flexibility and ease-of-use. It's trivial to migrate objects between storage classes, and trivial to get efficient access to any S3 object anywhere in the world. Avoiding the headache of rolling this functionality yourself could well be worth $3.6M/year, but if this flexibility is not necessary, I doubt S3 is cheaper in any sense of the word.
As soon as you start talking about any kind of serious data storage and data transfer the costs start piling up like crazy.
Like in my mind, the cost curve should flatten out over time. But that just doesn't seem to be the reality.
I think as AWS grows and changes the curve of the target audience is changing too. The value proposition is "You can get Cloud service without having a dedicated Cloud team," but there are caveats:
- AWS is complicated enough that you will still need a team to integrate against it. The abstractions are not free and the ones that are leaky will bite you without dedicated systems engineers to specialize in making it work with your company's goals.
- For small companies with little compute need, AWS is a good option. Beyond a certain scale... It is worth noting that big companies build their own datacenters, they don't rely on someone else's Cloud. Amazon, Google, and Microsoft don't run on each other.
- Recently, the cost model has likely changed if a company pokes their head up and runs the numbers, there's, uh, quite a few engineers with deep knowledge of how to build a scalable cloud infrastructure available to hire now for some reason. In fact, a savvy company keeping its ear to the ground can probably snap up some high-tier talent very soon (https://www.reuters.com/business/world-at-work/amazon-target...).
It really depends on where your company's risk and cost models are. Running on someone else's cloud just isn't the only option.
I just don't see it. Given the nature of the services they offer it's just too risky not to use as much managed stuff with SLAs as possible. k8s alone is a very complicated control plane + a freaking database that is hard to keep happy if it's not completely static. In a prior life I went very deep on k8s, including self managing clusters and it's just too fragile, I literally had to contribute patches to etcd and I'm not a db engineer. I kept reading the post and seeing future failure point after future failure point.
The other aspect is there doesn't seem to be an honest assessment of the tradeoffs. It's all peaches and cream, no downsides, no tradeoffs, no risk assessment etc.
And let’s be very real here: if your cloud service goes down for a few hours because you screwed something up, or because AWS deployed some bad DNS rules again, the world moves on. At the end of the day, nobody gives a shit.
(1) Massive expansion of budget (100-1000x) to support empire building. Instead of one minimum-wage sysadmin with 2 high-availability, maxed-out servers for 20K-40K (and 4-hour response time from Dell/HPE), you can have a $100M multi-cloud Kubernetes + Lambda + a mix-and-match of various locked-in cloud services (DB, etc.). And you can have a large army of SRE/DevOps. You get power and influence as a VP of Cloud this-and-that with 300-1000 people reporting to you.
(2) OpEx instead of CapEx
(3) All leaders are completely clueless about hiring the right people in tech. They hire their incompetent buddies, who hire their cronies. Data centers can run at scale with 5-10 good people. However, they hire 3000 horrible, incompetent, and toxic people, and they build lots of paperwork, bureaucracy, and approvals around it. Before AWS, it was VMware's internal cloud that ran most companies. Getting bare metal or a VM would take months to years, and many, many meetings and escalations. With AWS, "here is my credit card, pls gimme 2 VMs" is the biggest feature.
In contrast, you could throw a stone into a bush and hit an AWS guy.
I wish you started out by telling me how many customers you have to serve, how many transactions they generate, how much I/O there is.
Eventually, I realized that it was because the devs wanted to put "AWS" on their resumes. I wondered how long it would take management to catch on that they were being used as a place to spruce up your resume before moving on to catch bigger fish.
But not long after, I realized that the management was doing the same thing. "Led a team migration to AWS" looked good on their resume, also, and they also intended to move on/up. Shortly after I left, the place got bought and the building it was in is empty now.
I wonder, now that Amazon is having layoffs and Big Tech generally is not as many people's target employer, will "migrated off of AWS to in-house servers" be what devs (and management) want on their resume?
And then discussions on how to move forward are held between people that only know AWS and people who want to use other stuff, but only one side is transparent about it.
> You do not have the appetite to build a platform team comfortable with Kubernetes, Ceph, observability, and incident response.
Has work been using AWS wrong? Other than Ceph, all those things add up to onerous half time jobs for rotating software engineers.
Before gp3 came out, working around EBS price/performance terribleness was also on the list.