We moved from AWS to Hetzner, saved 90%, kept ISO 27001 with Ansible

https://medium.com/@accounts_73078/goodbye-aws-how-we-kept-iso-27001-slashed-costs-by-90-914ccb4b89fc

182•sksjvsla•6h ago

Earlier this year I led our migration off AWS to European cloud (Hetzner + OVHcloud), driven by cost (we cut 90%) and data sovereignty (GDPR + CLOUD Act concerns).

We rebuilt key AWS features ourselves using Terraform for VPS provisioning, and Ansible for everything from hardening (auditd, ufw, SSH policies) to rolling deployments (with Cloudflare integration). Our Prometheus + Alertmanager + Blackbox setup monitors infra, apps, and SSL expiry, with ISO 27001-aligned alerts. Loki + Grafana Agent handle logs to S3-compatible object storage.

The stack includes: • Ansible roles for PostgreSQL (with automated s3cmd backups + Prometheus metrics) • Hardening tasks (auditd rules, ufw, SSH lockdown, chrony for clock sync) • Rolling web app deploys with rollback + Cloudflare draining • Full monitoring with Prometheus, Alertmanager, Grafana Agent, Loki, and exporters • TLS automation via Certbot in Docker + Ansible

I wrote up the architecture, challenges, and lessons learned: https://medium.com/@accounts_73078/goodbye-aws-how-we-kept-i...

I’m happy to share insights, diagrams, or snippets if people are interested — or answer questions on pitfalls, compliance, or cost modeling.

Comments

Keyframe•5h ago

I think the most often mentioned problems mentioned are pollution of Hetzner addresses by shady people (might be addressed with "exits" from AWS / Cloudflare) and you are running on hardware which does tend to fail / needs upgrades. Were there some concerns on those from you?

Also, Loki! How do you handle memory hunger on loki reader for those pesky long range queries, and are there alternatives?

sksjvsla•5h ago

Pollution: We front everything user-facing through Cloudflare, so external users (and bots) don’t interact directly with our Hetzner/OVH IPs. We lock down our IPs at the firewall (ufw + Cloudflare IP allowlisting) so only trusted sources can even connect at L4.

Failures/upgrades: We provision with Terraform, so spinning up replacements or adding capacity is fast and deterministic.

We monitor hardware metrics via Prometheus and node exporter to get early warnings. So far (9 months in) no hardware failure, but it’s a risk we offset through this automation + design.

Apps are mostly data-less and we have (frequently tested) disaster recovery for the database.

Loki: We’re handling the memory hunger by

• Distinguishing retention limits and index retention

• Tuning query concurrency and max memory usage via Loki'’'s config + systemd resource limits.

• Use Promtail-style labels + structured logging so queries can filter early rather than regex the whole log content.

• Where we need true deep history search, we offload to object store access tools or simple grep of backups — we treat Loki as operational logs + nearline, not as an archive search engine.

Keyframe•5h ago

Thanks for thorough answer! Seems like you've platformized(!) yourself to an extent, have you considered going full on with k8s on top of metal (their machines) to offset some of the concerns about hardware?

sksjvsla•4h ago

Thanks for the compliment.

We used AWS EKS in the old days and we never liked the extreme complexity of it.

With two Spring Boot apps, a database and Redis running across Ubuntu servers, we found simpler tools to distribute and scale workloads.

Since compute is dirt cheap, we over-provision and sleep well.

We have live alerts and quarterly reviews (just looking at a dashboard!) to assess if we balance things well.

K8s on EKS was not pleasant, I wanna make sure I never learn how much worse it can get across European VPS providers.

sksjvsla•5h ago

A good alternatives for Loki is Victoria. Popular, way more performant and reputable but we went with Loki because of the relative size and diversity of maintainers between the two projects. Your points are super valid and we worked around it as mentioned above.

chuckadams•1h ago

Quickwit is also worth a look, along with its log collector companion Vector. I think at least Vector was a YC company before they got shlorped up by Datadog, but they're still both actively maintained open source.

TZubiri•5h ago

https://en.wikipedia.org/wiki/Sybil_attack

One of the advantages of more expensive providers seems to be that they have good reputation due to a de facto PoW mechanism.

sksjvsla•5h ago

Depends on the use case, right? I don’t accept traffic from random Hetzner IPs — only Cloudflare’s IPs are allowed.

The only potential indirect risks is if your Hetzner VPS IP range gets blacklisted (because some Hetzner clients abuse it for Sybil attacks or spam).

Or if Hetzner infrastructure was heavily abused, their upstream or internal networking could (in theory) experience congestion or IP reputation problems — but this is very unlikely to affect your individual VPS performance.

This depends on what you are doing on Hetzner and how you restrict access but for an ISO-27001 certified enterprise app, I believe this is extremely unlikely.

liampulles•1h ago

(Not OP): On the loki question: yeah our project had a similar issue. I did a lot of playing around with the loki configuration, and what you'll discover by reading their blogs on Loki performance is that the indexing settings they recommend are not the ones that are used by default in helm (and probably other deployment configurations). Once I did some reconfiguration, added read specific instances, and implemented their other recommendations - we did see much better performance.

Just remember: their interest is that you buy their cloud service, not in giving an out-of-the-box great experience on their open source stuff.

jordanbeiber•5h ago

Same here, but Azure. About 90% saved, with a very similar stack.

It is a great big cloud play to make enterprises reliant on the competency in their weird service abstractions, which is slowly draining the quite simple ops story an enterprise usually needs.

ed_mercer•4h ago

Can you please elaborate how Azure is cheaper?

miyuru•4h ago

I think the parent meant that they moved from Azure to Hetzner.

jordanbeiber•4h ago

”Same here” meaning moving to Hetzner, but from Azure - could’ve made it less ambiguous!

Might throw together a post on it eventually:

https://news.ycombinator.com/context?id=43216847

sokoloff•5h ago

Might be interesting, but doesn’t seem to be a valid “Show HN”

* - https://news.ycombinator.com/showhn.html

nopakos•5h ago

I think a European CloudFlare would be nice to exist.

sksjvsla•5h ago

Yes, it would be nice. Given Cloudflare's dev-friendly branding for some reason, I did not mind keeping it.

abc123abc123•4h ago

No problem! https://bunny.net/about/ Enjoy!

miyuru•4h ago

bunny still don't support IPv6 to origin, or else I would have switched.

ToJans•3h ago

We're in the process of migrating away from azure. Currently lots of cloudflare, but also some stuff runs on Hetzner.

If I manage to get https://uncloud.run/ or something similar up & running, the platform will no longer matter, whether it's OVH, Hetzner, Azure, AWS, GCP, ... It should all be possible & easy to switch... #FamousLastWords

saltysalt•4h ago

I love Hetzner, I run my Internet search engine from there: bare metal FTW.

louwrentius•4h ago

I'm involved with a cloud migration myself so I like the topic, but the Medium article contains less information than this "Shown HN" post.

The Medium post is mostly fluff and a lead generator.

sksjvsla•4h ago

The Medium post is more of a high-level case study for a mixed audience (including non-technical decision makers). I intentionally kept the details lighter there, partly to avoid overwhelming readers and partly because the real “meat” (like our Ansible/Terraform patterns, Prometheus config, etc.) is harder to convey in that format without turning it into a giant technical appendix.

I’m happy to share specific configs, diagrams, or lessons learned here on HN if people want — and actually I’m finding this thread a much better forum for that kind of deep dive.

I'll dive into other aspects elsewhere: You can't doubt that given what I am sharing here.

Any particular area you’d like me to expand on? (e.g. how we structured Terraform modules, Ansible hardening, Prometheus alerting, Loki tuning?)

clcaev•2h ago

More detail how you tie ISO and Terraform/Ansible would be welcome.

sksjvsla•1h ago

Here you go. Write me at jk@datapult.dk, if you need more.

A.5.25 Security in development and support processes:

Safe rolling deploy, rollback mechanisms, NGINX health checks, code versioning, Prometheus alerting for deployment issues

A.6.1.2 Segregation of duties:

Separate roles for database, monitoring, web apps; distinct system users

A.8.1.1 Inventory of assets:

Inventory management through Ansible inventory.ini and groups

A.8.2.3 Handling of assets:

Backup management with OVH S3 storage; retention policy for backups

A.8.16 Monitoring activities (audit logging, monitoring):

auditd installed with specific rule sets; Prometheus + Grafana Agent + Loki for system/application/audit log monitoring

A.9.2.1 User registration and de-registration:

ansible_user, restricted SSH access (no root login, pubkey auth), AllowUsers, DenyUsers enforced

A.9.2.3 Management of privileged access rights:

Controlled sudo, audit rules track use of sudo/su; no direct root access

A.9.4.2 Secure log-on procedures:

SSH hardening (no password login, no root, key-based access)

A.9.4.3 Password management system:

Uses Ansible Vault and variables;

A.10.1.1 Cryptographic controls policy:

SSL/TLS certificate generation with Cloudflare DNS-01 challenge, enforced TLS on Loki, Prometheus

A.12.1.1 Security requirements analysis and specification:

Tasks assert required variables and configurations before proceeding

A.12.4.1 Event logging:

auditd, Prometheus metrics, Grafana Agent shipping logs to Loki

A.12.4.2 Protection of log information:

Logs shipped securely via TLS to Loki, audit logs with controlled permissions

A.12.4.3 Administrator and operator logs:

auditd rules monitor privileged command usage, config changes, login records

A.12.4.4 Clock synchronization:

chrony installed and enforced on all hosts

A.12.6.1 Technical vulnerability management:

Lynis, Wazuh, vulnerability scans for Prometheus metrics

A.13.1.1 Network controls:

UFW with strict defaults, Cloudflare whitelisting, inter-server TCP port controls

A.13.1.2 Security of network services:

SSH hardening, NGINX SSL, Prometheus/Alertmanager access control

A.13.2.1 Information transfer policies and procedures:

Secure database backups to OVH S3 (HTTPS/S3 API)

A.14.2.1 Secure development policy:

Playbooks enforce strict hardening as part of deploy processes

A.15.1.1 Information security policy for supplier relationships:

OVH S3, Cloudflare services usage with access key/secret controls; external endpoint defined

A.16.1.4 Assessment of and decision on information security events:

Prometheus alert rules (e.g., high CPU, low disk, instance down, SSL expiry, failed backups)

A.16.1.5 Response to information security incidents:

Alertmanager routes critical/security alerts to email/webhook; plans for security incident log webhook

A.17.1.2 Implementing information security continuity:

Automated DB backups, Prometheus backup job monitoring, retention enforcement

A.18.1.3 Protection of records:

Loki retention policy, S3 bucket storage with rotation; audit logs secured on disk

ArtTimeInvestor•4h ago

How did you decide on Hetzner and OVH and why do you need both?

Have you looked into others as well, like IONOS and Scaleway?

sksjvsla•4h ago

Great question. Technically speaking I might not need both, but I have a gut feeling that one of these cloud providers might not be as hardened as the hyperscalers, and that Russia is just waiting to put one of these two services down. So for maximal resiliency I chose to design from a multi-cloud setup from the beginning.

Scaleway came up but is more expensive. IONOS did not come up in our research.

Part of what we tried to do was to make ourselves independent from traditional cloud services and be really good at doing stuff on a VPS. Once you start doing that, you can actually allow yourself to look more at uptimes and at costs. Also, since we wanted everything to be fully automated, Terraform support was important for us, and OVHcloud and Hetzner had that.

I'm sure there's many great cloud providers out in Europe, but it's hard to vet them to understand if they can meet demand and if they are financially stable. We would want not to keep switching cloud providers. So picking two of the major ones seemed like a safe choice.

handfuloflight•3h ago

What would Russia's interests be in putting these ISPs down, specifically?

sksjvsla•3h ago

Without making it too political and speculating on things I don't know, I, like many other Europeans, have seen plenty of cases of Russia ruining infrastructure projects in Europe, everything from internet cables on the ocean bed, telcos, water supplies, railways and more. Authorities are asking civilians in Scandinavia to be prepare their hiused with. Good and water and are actively hardening security around critical infrastructure, including their software. I won't comment more on this because it's gonna derail this discussion.

anticodon•3h ago

Is there a single proof? Like some Russian citizen was caught ruinining infrastructure project and it was proved that a) he is a citizen of Russia or was paid by Russian authorities, b) that the person in question had indeed done some damage to the infrastructure project.

I don't remember a single such case. I remember reading a lot of speculations like "it's highly likely that it was done by Russians" every single time without a trace of evidence.

hbnjgf•3h ago

Does it matter for the average business, if an infrastructure was brought down buy the Russian state or someone blaming it on the Russians?

It's undeniable that core European infrastructure is targeted currently

rsynnott•2h ago

Okay, I mean, if you want to give poor ol’ Putin the benefit of the doubt, something that looks like a state actor but might theoretically not be Russia is doing a lot of minor to moderate economic sabotage in Europe.

Personally I think the amount of special pleading required to imagine that it is _not_ Russia is a bit much (particularly around the deep sea cable cuts; at that point you’re really claiming that Russia is deniably pretending that it is them, but really it’s someone else), but you do you. It doesn’t change the overarching point; both Hetzner and OVH would be obvious targets for, ah, whoever it is.

rsynnott•2h ago

Russia likes causing trouble in Europe generally (and elsewhere; the Internet Research Agency was largely targeted at the US, say).

jillesvangurp•3h ago

> We rebuilt key AWS features ourselves

At what cost? People usually exclude the cost of DIY style hosting. Which usually is the most expensive part. Providing 24x7 support for the stuff that you've home grown alone is probably going to make large dent into any savings you got by not outsourcing that to amazon.

> $24,000 annual bill felt disproportionate

That's around 1-2 months of time for a decent devops freelancer. If you underpay your devs, about 1/3rd of an FTE per year. And you are not going to get 24x7 support with such a budget.

This still could make sense. But you aren't telling the full story here. And I bet it's a lot less glamorous when you factor in development time for this.

Don't get me wrong; I'm actually considering making a similar move but more for business reasons (some of our German customers really don't like US hosting companies) than for cost savings. But this will raise cost and hassle for us and I probably will need some re-enforcements on my team. As the CTO, my time is a very scarce commodity. So, the absolute worst use of my time would be doing this myself. My focus should be making our company and product better. Your techstack is fine. Been there done that. IMHO Terraform is overkill for small setups like this; fits solidly in the YAGNI category. But I like Ansible.

hiAndrewQuinn•3h ago

This is what I'm wondering too. 90% is a lovely number to throw around but what is the opportunity cost?

sksjvsla•3h ago

> Cost of DIY and support: You’re absolutely right that 24x7 ops could eat up any savings if you built everything from scratch without automation or if you needed dedicated staff watching dashboards all night. In our case:

• We heavily invested upfront in infrastructure-as-code (Terraform + Ansible) so that infra is deterministic, repeatable, and self-healing where possible (e.g. auto-provisioning, automated backup/restore, rolling updates).

• Monitoring + alerting (Prometheus + Alertmanager) means we don’t need to watch screens — we get woken up only if there’s truly a critical issue.

• We don’t try to match AWS’s service level (e.g. RTO of minutes for every scenario) — we sized our setup to our risk profile and customers’ SLAs.

> True cost comparison:

• The migration was done as part of my CTO role, so no external consulting costs. The time investment paid back within months because the ongoing cost to operate the infra is low (we’re not constantly firefighting).

• I agree that if you had to hire more people just to manage this, it could negate the savings. That’s why for some teams, AWS is still a better fit.

> Business vs. cost drivers: Honestly, our primary driver was sovereignty and compliance — cost savings just made the business case easier to sell internally. Like you, our European customers were increasingly skeptical of US cloud providers, so this aligned with both compliance and go-to-market.

> Terraform / YAGNI: Fair point! Terraform probably is more than we need for the current scale. I went with it partly because it fits our team’s skillset and lets us keep options open as we grow (multi-provider, DR regions, etc).

And, finally, because this, I am posting about it. I am sharing as much as I can, and just spread the work about it. I just sharing my experience and knowledge. If you have any questions or want to discuss further, feel free to reach out at jk@datapult.dk!

sksjvsla•3h ago

https://news.ycombinator.com/item?id=44335920#44336757

randomtoast•3h ago

> Don't get me wrong; I'm actually considering making a similar move but more for business reasons (some of our German customers really don't like US hosting companies) than for cost savings

There will be a new AWS European Sovereign Cloud[1] with the goal of being completely US independent and 100% compliant with EU law and regulations.

[1]: https://www.aboutamazon.eu/news/aws/aws-plans-to-invest-7-8-...

jjani•3h ago

> There will be a new AWS European Sovereign Cloud[1] with the goal of being completely US independent

The idea that anything branded AWS can possibly be US independent when push comes to shove is of course pure fantasy.

randomtoast•3h ago

I don't know, with that argument you can argue that everything is dependent on everything, for instance, the EU automobile industry is hugely dependent on materials and chips from all over the world including US and thus real independence is a pipe dream.

BrandoElFollito•3h ago

This is one of the reasons we were wondering if the US can switch off our fighter jets. The ones we own, brought from the US.

The US clearly state that extraterritoriality is fine with them. Depending on the company, one gag order is enough to sabotage a whole company.

owebmaster•3h ago

Of course you know, the US government can use many methods to enforce their demand, it makes no sense to use an Amazon alternative to Amazon, it's such a nonsense to join a conversation about migrating away from Amazon suggesting that.

ozim•2h ago

But it will check boxes on compliance checklist.

wqaatwt•1h ago

Not on the “political concerns” checklist which is getting more and more important

zius•3h ago

Our customers across EU (hospitals) are not impressed or interested (n=175). Such a delusional project.

The ICC move by MS made hospitals go in an even higher gear to prepare off-ramp plans. From private Azure cloud to “let’s get out”

awongh•3h ago

90% sounds good but the real dollar amount feels low.

Two reasons for this stick out:

- Are the multi-million dollar SV seed rounds distorting what real business costs are? Counting dev salaries etc. (if there is at least one employee) it doesn't seem worth the effort to save $20k - i.e., 1/5 of a dev salary? But for a bootstrapped business $20k could definitely be existential.

- The important number would be the savings as percent of net revenue. Is the business suddenly 50% more profitable? Then it's definitely worth it. But in terms of thinking about positively growing ARR doing cost/benefit on dropping AWS vs. building a new (profitable) feature I could see why it might not make sense.

Edit to add: it's easy to offhand say "oh yeah easy, just get to $2M ARR instead of saving $20k- not a big deal" but of course in the real world it's not so simple and $20k is $20k. The prevalent SV mindset of just spending without thinking too much about profitability is totally delusional except for like 1 out of 10000 startups.

layer8•2h ago

From the blog post: "We are a Danish workforce management company doing employee scheduling." Definitely not a VC-funded SV startup. Probably bootstrapped.

sksjvsla•1h ago

Yes, bootstraped for our own money. It makes a difference.

If I generalize, I see two kinds of groups for whom this reduction of cost does not matter. The first group are VC-funded, and the second group are in charge of +million AWS bill. We do not have anything in common with these companies, but we have something in common with 80% of readers on this forum and 80% of AWS clients.

physix•44m ago

It was cool reading your article.

We're also bootstrapped and use Hetzner, not AWS (except for the occasional test), for very much the same reasons as you.

And we are also fully infrastructure as code using Ansible.

We used to be a pure software vendor, but are bringing out a devtool where the free tier runs on Hetzner. But with traction, as we build out higher tier services, it's an open question on what infrastructure to host it on.

There are a kazillion things to consider, not the least of which is where the user wants us to be.

randomtoast•3h ago

> $24,000 annual bill felt disproportionate

>> That's around 1-2 months of time for a decent devops freelancer. If you underpay your devs, about 1/3rd of an FTE per year. And you are not going to get 24x7 support with such a budget.

In terms of absolute savings, we’re talking about 90% of 24k, that’s about 21.6k saved per year. A good amount, but you cannot hire an SRE/DevOps Engineer for that price; even in Europe, such engineers are paid north of 70k per year.

I personally think the TCO (total cost of ownership) will be higher in the long run, because now every little bit of the software stack has to be managed by their infra team/person, and things are getting more and more complex over time, with updates and breaking changes to come. But I wish them well.

mk89•2h ago

In mid sized companies, creating/using/maintaining AWS resources requires nevertheless one or more teams of devops/sre.

Out of experience, in the long run, this "managed aws saved us because we didn't need people" feels always like the typical argument made by saas sales people. In reality, many services/saas are really expensive, and you probably will only need a few features which sometimes you can rollout yourself.

The initial investment might be higher, but in the long run I think it's worth it. It's a lot like Heroku vs AWS. Superexpensive, but it allows you with little knowledge to push a POC in production. In this case, it's AWS vs self hosted or whatever.

Finally, can we quantify the cost of data/information? This company seems to be really "using" this strategy (= everything home made, you're safe with us) for sales purposes. And it might work, although for the final consumer this might have a higher price, which finally pays the additional devops to maintain the system. So who cares?

How important is for companies to not be subject to CLOUD act or funny stuff like that?

Elinvynia•2h ago

70k? Just hire in Poland/Czechia/Slovakia for 50% off!

Unless by Europe you mean the Apple feature availability special of UK/Germany/France/Spain/Italy

renw0rp•2h ago

My colleagues were talking about salaries in the range of $40-60k... About 8-10 years ago. And I don't think it got any cheaper

mbmjertan•2h ago

Still, it’s highly location-dependent, and mileage varies drastically between countries.

I’m an SWE with a background in maths and CS in Croatia, and my annual comp is less than what you claim here. Not drastically, but comparing my comp to the rest of the EU it’s disappointing, although I am very well paid compared to my fellow citizens. My SRE/devops friends are in a similar situation.

I am always surprised to see such a lack of understanding of economic differences between countries. Looking through Indeed, a McDonald’s manager in the US makes noticeably more than anyone in software in southeast Europe.

debugnik•53m ago

Spain and Italy are closer to the Poland bracket than the UK/Germany one, possibly even lower for some roles.

lossolo•3m ago

You won’t find anyone competent for that kind of money there.

o_m•2h ago

It makes sense if you consider there is a risk you might get kicked out by AWS because the US government force Amazon to close your account. The US is also hinting about going to war against Europe (Greenland), which makes a bad idea to have any connection to the US.

StopDisinfo910•2h ago

> Providing 24x7 support for the stuff that you've home grown alone is probably going to make large dent into any savings you got by not outsourcing that to amazon.

I don’t understand why people keep propagating this myth which is mostly pushed by the marketing department of Azure, AWS and GCP.

The truth is cloud provider doesn’t actually provide 24/7 support to your app. They only ensure that their infrastructure is mostly running for a very loose definition of 24/7.

You still need an expert on board to ensure you are using them correctly and are not going to be billed a ton of money. You still need people to ensure that your integration with them doesn’t break on you and that’s the part which contains your logic and is more likely to break anyway.

The idea that your cloud bill is your TCO is a complete fabrication and that’s despite said bill often being extremely costly for what it is.

steveBK123•1h ago

I think both things are true - people overestimate the level of support provided by AWS, but also re-building the laundry list of stuff OP did in-house to save $24k/year seems onerous.

But the idea that AWS provides some sort of white glove 24/7 support is laughable for anyone that's ever run into issues with one of their products...

hluska•1h ago

Why would cloud providers support anything more than their infrastructure?

kiney•2h ago

your implicit assumption that AWS requires less (exoensive) labour is just not true

sksjvsla•1h ago

Exactly our insight having maintained the same app both places.

wqaatwt•1h ago

> That's around 1-2 months of time for a decent

Presumably they are in Europe? so labour is a few times cheaper here.

> Providing 24x7 support

They are not maintaining the hardware itself and it’s not like Amazon is doing providing devops for free. Unless you are using mainly serverless stuff the difference might not be that significant

rz2k•1h ago

Isn’t $24k also a naive accounting of the annual cost of AWS in this case? What FTE-equivalent was required to set up the services they use at AWS? What FTE-equivalent is required to keep the annual bill down to $24k from say $48k or $100k?

Garlef•1h ago

My last contact with AWS support (100€/month tier) was someone feeding me LLM generated slop that contained hallucinations about nonexistent features and configuration options.

heisenbit•1h ago

AWS features may be expensive to replicate 100% but what if one only needs 80%. One also needs to consider the effort involved in configuring AWS and maintaining the skills for that. Then there are opportunity costs of using e.g. AWS dashboards vs. better ones with grafana etc..

I guess a lot depends on size, diversity and dynamics of the demand. Not every nail benefits from contact with the biggest hammer in the toolbox.

hk1337•3h ago

Does anybody care, besides you, that you’re ISO 27001 compliant? I thought SSAE 16 and other SSAE standards were the main things people were concerned with having?

Freak_NL•3h ago

Pff… You wish. Depending on the sector you are in, ISO 27001 can either be a hard requirement (either directly or through national standards built upon it, like the Dutch healthcare NEN 7510) or completely irrelevant. If this company needs it, you can bet their customers need it — usually because they in turn are required to do so because of regulations.

hbnjgf•3h ago

Seems to depend on industry and/or region.

Most of our customers have a hard requirement on ISO 9001. Many on ISO 27001, too. The rest strongly prefers a partner having a plan to get ISO 27001

folmar•55m ago

You must be looking from US perspective. In EU I don't think I've seen any SSAE provided or wanted, and I've seen a bit of medium to big industry.

anticodon•3h ago

I'm not surprised about 90% of savings. I remember that initially AWS was promoted everywhere as being "cheaper" than your own hardware, colocation or VPS/VDS hosting.

Once I was working in a quite small company (around 100 employees) that hosted everything on AWS. Due to high bills (it's a small company that resided in Asia) and other problems, I migrated everything to DigitalOcean (we still used AWS for things like SES), and the monthly bill for hosting became like 10 times lower. With no other consequences (in other words, it haven't become less reliable).

I still wonder who calculated that AWS is cheaper than everything else. It's definitely one of the most expensive providers.

jjani•3h ago

Interesting, comparing commodity services (VMs, storage etc) like-for-like, DO has always seemed more expensive than AWS. Do you remember what was the main source of savings?

peer2pay•3h ago

My memory might be off here but wasn’t the initial AWS "cheaper" promise only made vs buying & maintaining your own hardware?

sam_lowry_•3h ago

I did a successful AWS to Hetzner migration myself once, and I'd like to make a business of "back-to-earth migrations" but clients are hard to find.

Everyone talks about it but none wants to be the first mover.

dijit•3h ago

When the money really starts drying up it will be on the table again, for now it's "someone elses money" at worst and "less profits for the shareholders" at best (which is not an incentive for an engineer on the ground).

There's also a lot of FUD regarding hiring more staff, my observed experience is that hyperscalers need an equivalent number of people on rotation- it's just different skills (learning the intricacies/quirks of different product offerings on the hyperscaler vs CS/Operational fundamentals) - so everyone is scared to overload their teams with work and potentially need to hire people -- you can couple this with the fact that all migrations are up-front expensive and change is bad by default.

There will come a day where there simply isn't enough money to spend 10x the cost on these systems. It will be a sad day for everyone because salaries will be depressed too, and we will opine the days of shiny tools where we could make lots of work disappear by saying that our time is too expensive to work with such peasant issues.

sgt•3h ago

For those wondering about ISO 27001 - it's a standard for international security management, and popular in Europe.

However in the US it's not very relevant or even interesting to companies, and some European companies fail to understand that.

SOC 2 is the default and the preferred standard in the US - it's more domestic and less rigid than ISO 27001.

ozim•2h ago

ISO27001 I wouldn’t call rigid, most of the stuff you should be doing anyway if you use any software.

checking for evidence that you are doing those things I would call ridgit. SOC2 as attestation doesn’t require so much documentation.

sgt•2h ago

Sure, it depends on the implementation of your ISMS. Ideally you want to follow the control guidance in 27002. They've done a lot of thinking on this.

candiddevmike•1h ago

Having been through both, I much prefer the "rigid" ISO 27001 as the SOC2 audits seem to be based on how well you vibed with the auditor and the auditors competency more than anything. The things they are auditing seem overly broad and open to interpretation, and the auditors descriptions of your controls can easily be twisted.

cataflam•3h ago

Happy for you, don't get me wrong, but your post is not particularly news, I'm guessing everyone on HN knows bare metal/VPS providers are cheaper than AWS/Azure/GCP.

And also lacking a bit in details:

- both technical (e.g. how are you dealing with upgrades or multi-data center fallback for your postgresql), and

- especially business, e.g. what's the total cost analysis including the supplemental labor cost to set this up but mostly to maintain it.

Maybe if you shared your scripts and your full cost analysis, that would be quite interesting.

sksjvsla•1h ago

> Techincal

I'm trying to share as much technical across this thread as for your two examples:

System upgrades:

Keep in mind that as per the ISO specification, system upgrades should be applied but in a controlled manner. This lends itself perfectly to the following case that is manually triggered.

Since we take steps to make applications stateless, and Ansible scripts are immutable:

We spin up a new machine with the latest packages and once ready it join the Cloudflare load balancer. The old machines are drained and deprovisioned.

we spin up a new machine We have a playbook that iterates through our machines and does it per machine before proceeding. Since we have redundancy on components, this creates no downtime. The redundancy in the web application is easy to achieve using the load balancer in Cloudflare. For the Postgres database, it does require that we switch the read-only replica to become the main database.

DB failover:

The database is only written and read from by our web applications. We have a second VM on a different cloud that has a streaming replication of the Postgres database. It is a hot standby that can be promoted. You can use something like PG Bouncer or HAProxy to route traffic from your apps. But our web framework allows for changing the database at runtime.

> Business

Before migration (AWS): We had about 0.1 FTE on infra — most of the time went into deployment pipelines and occasional fine-tuning (the usual AWS dance). After migration (Hetzner + OVHCloud + DIY stack): After stabilizing it is still 0.1 FTE (but I was 0.5 FTE for 3-4 months), but now it rests with one person. We didn’t hire a dedicated ops person. On scaling — if we grew 5-10×: * For stateless services, we’re confident we’d stay DIY — Hetzner + OVHCloud + automation scales beautifully. * For stateful services, especially the Postgres database, I think we'd investigate servicing clients out of their own DBs in a multi-tenant setup, and if too cumbersome (we would need tenant-specific disaster recovery playbooks), we'd go back to a managed solution quickly.

I can't speak for cloud FTE toll vs a series of VPS servers in the big boys league ($ million in monthly consumption) and in the tiny league but at our league it turns out that it is the same FTE requirement.

Anyone want to see my scripts, hit me up at jk@datapult.dk. I'm not sure it'd be great security posture to hand it out on a public forum.

BrandoElFollito•3h ago

Any reasons to go for certbot instead of Traefik or Caddy?

sksjvsla•2h ago

We use cloudflare as the WAF and loadbalancer, which makes traefik less relevant and Certbort easier to couple.

nnurmanov•2h ago

I moved from managed AWS to unmanaged AWS (lightsail), decreasing the cost significantly and still staying in AWS ecosystem. I use S3, Route53, SES and other cheap services, you could consider this path

martypitt•2h ago

> A combination of Prometheus, Grafana, and Loki allowed us to replicate — and in some ways exceed — the visibility we had on AWS

Given these existence of these tools, which are fantastic, I'm often stunned at how sluggish, expensive and how lacklustre the UX is of the AWS monitoring stack.

Monitoring quickly became the most expensive, and most unpleasant part of our AWS experience.

mystifyingpoi•1h ago

When I discovered that Live Tail (an equivalent of looking at the logs with `tail -f ...`) is paid, I laughed out loud. The most obvious functionality for everyday looking at logs is not free. CW is pain.

yread•1h ago

I don't get the numbers. It used to be 24000$/year. You saved 90%. So you're spending 200$ a month at Hetzner? That's literally one EPYC server. You really don't need distributed systems for that. Can you talk a bit more about requests per second or number of users?

robin_reala•1h ago

$2400, not $200.

enronmusk•1h ago

$2400/year is $200/month. He is correct.

sksjvsla•1h ago

You can't do single server setup for your workloads if you are ISO 27001 compliant and, further, you must have a separate server for logging and monitoring.

No matter load, there is a need for complexity for this certification.

Not all employees log in daily. For a scheduling app, most people check a few times a week, but not every day.

Daily active users (DAU) = around 10,000 to 20,000

Peak concurrency (users on at the exact same time) = generally between 1,500 to 2,000 at busy times (like when new schedules drop or at shift start/end times)

Average concurrent users at any random time = maybe 50 to 150

Why cloud costs can add up even for us:

Extensive use of real-time features and complex labour rules mean the app needs to handle a lot of data processing and ultimately sync into salary systems.

An example:

Being assigned to a shift has different implications for every user. It may trigger a nuisance bonus, and such a bonus could further only be triggered in certain cases, such as when you had the shifts assigned compared to when it start time.

Lastly, there is the optimizing of a schedule why is computationally expensive.

liampulles•1h ago

Part of what I expect to get when I pay AWS is that it reduces my operational burden, and this has been true in my experience. I've almost forgotten about all the prep, the stress, etc. that comes from upgrading deprecated mysql clusters now that I've gotten used to using the AWS managed equivalents.

That is not to say that this aspect alone justifies huge fees, but it does have significant value.

sksjvsla•49m ago

This comment is great. Upgrade processes should be part of your internal processes if you want to get ISO 27001 certification that is not just checking the boxes but actually something you use for more control of your development and release cycle.

AWS RDS does not upgrade major or minor versions of Postgres or, as you mentioned, MySQL. In that case, they might patch update it. But these patch updates are easy to do yourself and does not take long to be reminded of in your ISMS and then subsequently carry out.

The purpose of this post is not to justify cloud hyperscalers versus European servers. It is actually a post on how to manage a highly regulated, compliant, and certified server setup yourself outside AWS because so many people just have their ISO certification on AWS infrastructure and once they got that they are never able to leave AWS again.

If you have no client demand and no real need to work on updating your infrastructure yourself, then you can go ahead and not go for an ISO 27001 certification and let AWS RDS update as it pleases. But if you operate a complex beast in a regulated industry such as employment law, finance, and such, then you get some more fun challenges and higher need for control.

0xjunhao•1h ago

With the rise of Agentic AI, this increasingly feels like the right move, unless AWS drastically lowers their prices.

sksjvsla•48m ago

Agreed, LLMs helped us with this.

ksec•44m ago

I know OVH and Hetzner gets mentioned a lot as European Cloud, but I thought I should bring UpCloud [1] for HN's attention. I also sometimes think OVH and Hetzner are not a fair comparison as much as I want competition to HyperScaler. Hetzner uses consumer grade component with a few server grade selections.

[1] https://upcloud.com

Behind the scenes: Redpanda Cloud's response to the GCP outage

Cosmoe: BeOS Class Library on Top of Wayland

Scaling our observability platform by embracing wide events and replacing OTel

Using Microsoft's New CLI Text Editor on Ubuntu

Samsung embeds IronSource spyware app on phones across WANA

Delta Chat is a decentralized and secure messenger app

A new blood type discovered in France: "Gwada negative", a global exception

Fundamental Problems of Lisp, the Cons Cell (2024)

Phoenix.new – Remote AI Runtime for Phoenix

Life as Slime

Show HN: EchoStream – A Local AI Agent That Lives on Your iPhone

AbsenceBench: Language models can't tell what's missing

Harper – an open-source alternative to Grammarly

YouTube's new anti-adblock measures

Unexpected security footguns in Go's parsers

Captain Cook's missing ship found after sinking 250 years ago

Record DDoS pummels site with once-unimaginable 7.3Tbps of junk traffic

Airpass – easily overcome WiFi time limits

Plastic bag bans and fees reduce harmful bag litter on shorelines

Show HN: A color name API that maps hex to the closest human-readable name

Sega mistakenly reveals sales numbers of popular games

Mathematicians hunting prime numbers discover infinite new pattern

Microsoft suspended the email account of an ICC prosecutor at The Hague

The Unreasonable Effectiveness of Fuzzing for Porting Programs

Visualizing environmental costs of war in Hayao Miyazaki's Nausicaä

Astronomers locate universe's 'missing' matter in the largest cosmic structures

Learn You Galois Fields for Great Good (00)

Show HN: Nxtscape – an open-source agentic browser

Augmented Vertex Block Descent (AVBD)

Tiny Undervalued Hardware Companions (2024)

We moved from AWS to Hetzner, saved 90%, kept ISO 27001 with Ansible

Comments

Behind the scenes: Redpanda Cloud's response to the GCP outage

Cosmoe: BeOS Class Library on Top of Wayland

Scaling our observability platform by embracing wide events and replacing OTel

Using Microsoft's New CLI Text Editor on Ubuntu

Samsung embeds IronSource spyware app on phones across WANA

Delta Chat is a decentralized and secure messenger app

A new blood type discovered in France: "Gwada negative", a global exception

Fundamental Problems of Lisp, the Cons Cell (2024)

Phoenix.new – Remote AI Runtime for Phoenix

Life as Slime

Show HN: EchoStream – A Local AI Agent That Lives on Your iPhone

AbsenceBench: Language models can't tell what's missing

Harper – an open-source alternative to Grammarly

YouTube's new anti-adblock measures

Unexpected security footguns in Go's parsers

Captain Cook's missing ship found after sinking 250 years ago

Record DDoS pummels site with once-unimaginable 7.3Tbps of junk traffic

Airpass – easily overcome WiFi time limits

Plastic bag bans and fees reduce harmful bag litter on shorelines

Show HN: A color name API that maps hex to the closest human-readable name

Sega mistakenly reveals sales numbers of popular games

Mathematicians hunting prime numbers discover infinite new pattern

Microsoft suspended the email account of an ICC prosecutor at The Hague

The Unreasonable Effectiveness of Fuzzing for Porting Programs

Visualizing environmental costs of war in Hayao Miyazaki's Nausicaä

Astronomers locate universe's 'missing' matter in the largest cosmic structures

Learn You Galois Fields for Great Good (00)

Show HN: Nxtscape – an open-source agentic browser

Augmented Vertex Block Descent (AVBD)

Tiny Undervalued Hardware Companions (2024)