However: Don’t underestimate community support (in the areas you’re likely to want it) when comparing development stacks.
What’s the point in having 64 GB of DDR5 and 16 cores @ 4.2 GHz if not to have a couple of Electron apps sitting at idle, yet somehow still using the equivalent computational resources of the most powerful supercomputer on Earth in the mid-1990s?
Oh, and put everything behind the strictest Cloudflare settings you can, so that even a whiff of anything that’s not a Windows 11 laptop or an iPhone on a major U.S. residential or mobile network IP gets non-stop bot checks!
Yeah, I'm not worried about being targeted in an RCA and pointedly asked why I chose a region with way better uptime than `us-tirefire-1`.
What _is_ worth considering is whether your more carefully chosen region will actually perform better during an outage in which some critical AWS resource goes down in Virginia and takes your region with it anyway.
AWS Organizations/account management lives in us-east-1.
And if you want a CDN with a custom hostname and want TLS…you have to use us-east-1.
CloudFront CDN has a similar setup. The SSL certificate and key have to be hosted in us-east-1 for control-plane operations, but once deployed, the public data plane is globally or regionally dispersed. There is no automatic failover for the certificate dependency yet, the SLA is only three nines, and it also depends on Route 53.
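For anyone who hasn’t hit this in practice, the us-east-1 pinning shows up the moment you script it. A minimal boto3 sketch (the domain and distribution settings are placeholders, not anything from this thread):

```python
# The ACM certificate CloudFront serves a custom hostname with must be
# requested in us-east-1, no matter where the rest of the stack lives.
import boto3

acm = boto3.client("acm", region_name="us-east-1")  # pinned region

cert = acm.request_certificate(
    DomainName="www.example.com",   # placeholder custom hostname
    ValidationMethod="DNS",         # validated via a DNS (e.g. Route 53) record
)
cert_arn = cert["CertificateArn"]
print("Requested:", cert_arn)

# Once the certificate is ISSUED, it gets referenced from the distribution's
# ViewerCertificate config. CloudFront's data plane is global, but this
# control-plane dependency stays in us-east-1:
#
#   cloudfront = boto3.client("cloudfront")
#   cloudfront.create_distribution(DistributionConfig={
#       ...,
#       "ViewerCertificate": {
#           "ACMCertificateArn": cert_arn,
#           "SSLSupportMethod": "sni-only",
#       },
#   })
```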
The elephant in the room for hyperscalers is the potential for rogue employees or a cyber attack on a control plane. Considering the high stakes and economic criticality of these platforms, both are inevitable and both have likely already happened.
Separately from that, if you are trying to move certain types of non-mainstream IBM workloads to the cloud (AIX, IBM i, z/OS), then IBM is tier 1 in that case.
us-east-2 is objectively a better region to pick if you want US east, yet you feel safer picking use1 because “I’m safer making a worse decision that everyone understands is worse, as long as everyone else does it as well.”
Never getting blamed for a us-east-1 outage is better than picking us-east-2, if the latter could get you blamed the 0.5% of the time it goes down while us-east-1 stays up.
I can’t tell if it’s you thinking this way, or if your company is set up to incentivize this. But either way, I think it’s suboptimal.
That’s not about “risk profile” of the business or making the right decision for the customer, that’s about risk profile of saving your own tail in the organizational gamesmanship sense. Which is a shame, tbh. For both the customer and for people making tech decisions.
I fully appreciate that some companies may encourage this behavior, and we all need a job so we have to work somewhere, but this type of thinking objectively leads to worse technology decisions and I hope I never have to work for a company that encourages this.
Edit: addressing blame when things go wrong… don’t you think it would be a better story to tell your boss that you did the right thing for the customer, rather than “I did this because everyone else does it, even though most of us agree it’s worse for the customer in general”? I would assume I’d get more blame for the second decision than the first.
See any companies getting credit for it in the last AWS outage? I didn't. My employers didn't reward vendors who stayed up during it.
Shame about your employer, though.
US-East-2 staying up isn’t my responsibility. If I need my own failover, I’m going to select a different region anyway.
And it’s not like US-East-2 isn’t already huge and growing. It’s effectively becoming another US-East-1.
No, but you can be blamed if other things are up and yours is not. If everyone's stuff is down, it is just a natural disaster.
If my cloud provider goes down and also takes down Spotify, Snapchat, Venmo, Reddit, and a ton of other major services that my customers and my boss use daily, they will be much more understanding that there is a third party issue that we can more or less wait out.
Every provider has outages. US-east-2 will sometimes go down. If I'm not going to make a system that can fail over from one provider to another (which is a lot of work and can be expensive, and really won't be actively used often), it might be better to just use the popular one and go with the group.
The regions provide the same functionality, so I see genuinely no downside or additional work in picking one region over the other.
It seems like one of those no brainer decisions to me. I take pride in being up when everyone else is down. 5 9s or bust, baby!
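(For scale, here’s a quick back-of-the-envelope on what each extra nine actually allows per year; this is just the standard downtime arithmetic, nothing specific to any region or SLA.)

```python
# Downtime budget per year for each "number of nines" of availability.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for nines in range(2, 6):
    availability = 1 - 10 ** -nines              # e.g. 3 nines -> 0.999
    downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{nines} nines ({availability:.3%}): ~{downtime:,.1f} min/year of downtime")
```

Five nines works out to roughly five minutes of downtime a year, which is a very different engineering problem than three nines’ roughly nine hours.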
There's just not much motivation left to do better systems.
“Duh, because there’s an AZ in us-east-1 where you can’t configure EBS volumes for attachment to fargate launch type ECS tasks, of course. Everybody knows that…”
:p
So if you tried to be "smart" and set up in Ohio, you got crushed by the thundering herd coming out of Virginia, and then got bitten again because AWS barely cares about your region and neither does anyone else.
The truth is Amazon doesn't have any real backup for Virginia. They don't have the capacity anywhere else and the whole geographic distribution scheme is a chimera.
Makes one wonder, does us-west-2 have the capacity to take on this surge?
Is this from real experience of something that actually happened, or just imagined?
The only things that matter in a decision are:
* Services that are available in the region
* (if relevant and critical) Latency to other services (see the probe sketch below)
* SLAs for the region
Everything else is irrelevant.
If you think AWS is so bad that their SLAs are not trustworthy, that's a different problem to solve.
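For the latency item above, a crude probe is often enough for a first cut. A sketch (the regions and the DynamoDB endpoint are just examples; this measures a full TLS + HTTP round trip from wherever you run it, not a proper benchmark):

```python
# Rough latency probe against a few regional AWS endpoints.
import time
import urllib.request

REGIONS = ["us-east-1", "us-east-2", "us-west-2"]   # candidate regions (examples)

for region in REGIONS:
    url = f"https://dynamodb.{region}.amazonaws.com/"   # public regional endpoint
    start = time.perf_counter()
    try:
        urllib.request.urlopen(url, timeout=5)
    except Exception:
        pass  # an unauthenticated request may be rejected; we only want the timing
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{region}: ~{elapsed_ms:.0f} ms")
```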
Big fail.
I have said for years, never ascribe to terrorism what can be attributed to some backhoe operator in Ashburn, Virginia.
We got a lotta backhoes in northern Virginia.
We've started to see some rather interesting consequences for grid reliability: https://blog.gridstatus.io/byte-blackouts-large-data-center-...
At this point my garage is tied with us-east-1 for reliability, largely because it got flooded 8 months ago.
If you are using Hetzner: avoid everything other than the FRA region, and ideally pray that you are placed in the newer part of the datacenter, since it has the upgraded switching spine. I haven't seen the old one in a bit, so they might have deprecated it entirely.
In the early days of cross-region inference, fewer people were using it, and there was basically no monitoring (and/or alerting) on Amazon's side.
The cross-region and global inference routing is... odd at times.
- Is X region and its services covered by a suitable SLA? https://aws.amazon.com/legal/service-level-agreements/
- Does X region have all the explicit services you need? (note that things like certs and IAM are "global", so often implicitly us-east-1; see the sketch after this list)
- What are your PoP latency requirements?
- Do you have concerns about sovereign data: hosting, ingress, and egress? https://pages.awscloud.com/rs/112-TZM-766/images/AWS_Public_...
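The service-coverage question can be checked mechanically using the public SSM parameters AWS publishes about its own infrastructure under `/aws/service/global-infrastructure/`. A rough sketch, assuming that parameter layout (the candidate region and service keys are just examples):

```python
# Diff the services advertised for a candidate region against what you need.
import boto3

REGION = "us-east-2"                       # candidate region (example)
NEEDED = {"lambda", "eks", "athena"}       # example service keys

# The public parameters can be read from any region.
ssm = boto3.client("ssm", region_name="us-east-1")
path = f"/aws/service/global-infrastructure/regions/{REGION}/services"

available = set()
for page in ssm.get_paginator("get_parameters_by_path").paginate(Path=path):
    for param in page["Parameters"]:
        # Parameter names end in the service key, e.g. ".../services/lambda"
        available.add(param["Name"].rsplit("/", 1)[-1])

missing = NEEDED - available
print(f"{REGION} is missing: {sorted(missing) or 'nothing you asked for'}")
```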
This analysis is skewed by the major incident in 2025. What was the data for 2024, and over the last, say, 5 years? The proclamation that us-east-1 is the least reliable is based on one year of data, and it’s probably fair to say that at least the last three years, if not five, are a better predictor of reliability.
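To make that concrete, here’s a toy calculation with entirely made-up outage minutes (not real AWS data) showing how one bad year can dominate a single-year ranking while mostly washing out over five:

```python
# Hypothetical outage minutes per region per year -- illustrative only.
MINUTES_PER_YEAR = 365.25 * 24 * 60

outages = {
    "region-A": {2021: 45, 2022: 30, 2023: 20, 2024: 25, 2025: 840},
    "region-B": {2021: 60, 2022: 55, 2023: 50, 2024: 40, 2025: 35},
}

for region, by_year in outages.items():
    latest = max(by_year)
    one_year = 1 - by_year[latest] / MINUTES_PER_YEAR
    five_year = 1 - sum(by_year.values()) / (len(by_year) * MINUTES_PER_YEAR)
    print(f"{region}: {latest} only {one_year:.4%}, five-year {five_year:.4%}")
```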
us-east-1 also hosts some special things, so it will have more services to lose.
There aren't that many businesses that truly can't handle the worst case (so far) AWS outage. Payment processing is the strongest example I can come up with that is incompatible with the SLA that a typical cloud provider can offer. Visa going down globally for even a few minutes might be worse than a small town losing its power grid for an entire week.
It's a hell of a lot easier to just go down with everyone else, apologize on Twitter, and enjoy a forced snow day. Don't let it frustrate you. Stay focused on the business and customer experience. It's not ideal to be down, but there are usually much bigger problems to solve. Chasing an extra x% of uptime per year is usually not worth a multicloud/region clusterfuck. These tend to be even less resilient on average.
It’s kind of amazing that after nearly 20 years of “cloud”, the worst case so far still hasn’t been all that bad. Outages are the mildest type of incident. A true cloud disaster would be something like a major S3 data loss event, or a compromise of the IAM control plane. That’s what it would take for people to take multi-region/multi-cloud seriously.
So like the OVH data center fire back in 2021?
(No shade on OVH, but they are ~1% market share player)
https://arstechnica.com/information-technology/2011/04/amazo...
You mean like stealing the master keys for Azure? Oh wait a minute...
You forget things like emergency services. If we were to rely on AWS (even with a backup/DR zone in another region) and were to go down with everyone else and twiddle our thumbs, houses burn down, people die, and our company has to pay abatements to the government.
It's often seen as the "standard" or "default" region to use when spinning up new US-based AWS services; it's the oldest AWS region, has the most interconnected systems, and likely carries the highest average load.
It makes sense that us-east-1 has reliability problems, but I wish Amazon were a little more upfront about the risks of choosing that region.