I am now inspired to dump my databases and rsync the content on a schedule to three locations (rough sketch below):
1. A hard drive in a fire safe.
2. An S3 bucket, mediated by Wasabi.
3. My friend's server that lives at his house half a continent away.
It would be nice to have a fourth location: a physical hard drive that lives outside my house but is close enough to drive to for pick-up. That would mean either paying for a safety deposit box, as you mentioned, or hassling a friend once a week as I come to pick it up and deposit it again.
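For the schedule itself, here's roughly what I have in mind, as a minimal Python sketch meant to run from cron. It assumes PostgreSQL; the paths, the rclone remote name "wasabi", and the friend-server hostname are all placeholders, not a finished implementation.

```python
#!/usr/bin/env python3
"""Nightly backup sketch: dump all databases, then copy the dump directory
to several destinations. Assumes PostgreSQL, passwordless SSH to the
friend's server, and an rclone remote named "wasabi" -- all placeholders."""
import datetime
import pathlib
import subprocess

DUMP_DIR = pathlib.Path("/var/backups/db")
DESTINATIONS = [
    ["rsync", "-a", "--delete", f"{DUMP_DIR}/", "/mnt/firesafe-drive/db/"],
    ["rclone", "sync", str(DUMP_DIR), "wasabi:my-backup-bucket/db"],
    ["rsync", "-a", "--delete", f"{DUMP_DIR}/", "friend-server:/backups/db/"],
]

def main() -> None:
    DUMP_DIR.mkdir(parents=True, exist_ok=True)
    dump_file = DUMP_DIR / f"all-databases-{datetime.date.today().isoformat()}.sql.gz"
    # pg_dumpall emits plain SQL; compress it on the way to disk.
    with open(dump_file, "wb") as out:
        dump = subprocess.Popen(["pg_dumpall"], stdout=subprocess.PIPE)
        subprocess.run(["gzip"], stdin=dump.stdout, stdout=out, check=True)
        dump.wait()
    # Copy to each destination independently, so one unreachable target
    # doesn't block the others.
    for cmd in DESTINATIONS:
        try:
            subprocess.run(cmd, check=True)
        except subprocess.CalledProcessError as err:
            print(f"copy to {cmd[-1]} failed: {err}")

if __name__ == "__main__":
    main()
```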
Re: "The Support Agents Who Became LLMs"; yes, institutionalized support is terrible almost everywhere. Partly because it costs real money to pay real humans to do it properly, so it ends up as a squeezed cost centre.
Translation: "someone noticed it trending on HN, decided it was bad publicity, and that they should do something about it"
Implication: what mattered was the bad publicity, not the poor support infrastructure. The latter won't change, and the next person with similar problems will get the same runaround, and probably lose their data.
/c (cynic, but I suspect realist)
Disabling a legitimate in-use account is one of our absolute nightmares. I don't care if it was an account paying $3/month; we would be having a review of that with our top-level management (including our CEO, Matt Garman) no matter how we found out about it. For us, there is not some acceptable rate of this as a cost of doing business.
And disabling an in-use account was not the issue here. There not being a way to get the account re-enabled is the issue.
At least one layer of human support needs to have the ability -- not just the ability, but the obligation! -- to escalate to your team when a customer service problem occurs that doesn't fit a pattern of known/active scams and they are unable to help the customer themselves. Sounds like that's not currently the case.
In these cases, it's also really important that customer support stick to a script and can't be abused as part of social engineering, hijacking, or fraud-check bypass. "No we can't reset your account" is a very important protection too. I agree that there is an obligation to escalate, but I suspect the focus of the COE will be on how we could have detected this without human judgement. There's got to be a way.
It might be your nightmare, but at the same time there is no way for your customers to report it, or for your own support agents to escalate that something wrong might have happened and someone should look again ...
The various teams (anti-fraud and support) are investigating how we failed this customer so we can improve and hopefully keep this from happening again. (This is the ‘Correction of Error’ process that’s being worked on. CoEs aren’t a punitive ‘blame session’; it’s about figuring out how a problem happened and how we can fix or avoid it systemically going forward.)
To be fair, the publicity did mean that multiple people were flagging this and driving escalations around it.
>My data is back. Not because of viral pressure. Not because of bad PR. [...]
>“I am devastated to read on your blog about the deletion of your AWS data. I did want to reach out to let you know that people in positions of leadership, such as my boss, are aware of your blog post and I’ve been tasked with finding out what I can, and to at least prevent this from happening in the future.”
So, yes, because of bad PR. Or, at least the possibility of the blog blowing up into a bad PR storm. I'm guessing that if there was no blog, the outcome would be different.
But here’s what I learned from this experience: If you are stuck in a room full of deaf people, stop screaming, just open the door and go find someone who can hear you.
The 20 days of pain I went through weren't because AWS couldn't fix it.
They were because I believed that one of the 9 support agents would eventually break script and act like a human. Or that they were being monitored by another team.
Turns out, that never happened.
It took someone from outside the ticketing system to actually listen and say: Wait. This makes no sense.
At my small business, we proactively monitor blogs and forums for mentions of our company name so that we can head off problems before they become big. I'm extremely confident that is what happened here.
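The monitoring side doesn't need to be fancy. As an illustration (not our actual tooling), here's a minimal Python sketch that polls the public HN Algolia search API for a company name; the company name and the "alert" are placeholders.

```python
#!/usr/bin/env python3
"""Poll the public HN Algolia search API for recent stories that mention a
company name. The company name and the "alert" (a print) are placeholders."""
import json
import time
import urllib.parse
import urllib.request

COMPANY = "ExampleCo"      # placeholder company name
SEEN: set[str] = set()     # story IDs already reported

def check_mentions() -> None:
    query = urllib.parse.urlencode({"query": COMPANY, "tags": "story"})
    url = f"https://hn.algolia.com/api/v1/search_by_date?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        hits = json.load(resp).get("hits", [])
    for hit in hits:
        story_id = hit["objectID"]
        if story_id not in SEEN:
            SEEN.add(story_id)
            # In practice this would page someone or post to a chat channel.
            print(f"New mention: {hit.get('title')} "
                  f"https://news.ycombinator.com/item?id={story_id}")

if __name__ == "__main__":
    while True:
        check_mentions()
        time.sleep(15 * 60)    # check every 15 minutes
```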
It was PR-driven in the proactive sense. Which is still PR-driven. (which, by the way, I have no problem with! the problem is the shitty support when it isn't PR-driven)
Regardless, I 100% feel your pain with dealing with support agents that won't break script, and I am legitimately happy that you both got to reach someone that was high enough up the ladder to act human and that they were able to restore your data.
Yes, it is totally possible that AWS monitors blogs and forums for early damage control, like your company does.
But we shouldn’t paint it like I was bailed out by some algorithmic PR radar and nothing else.
Let’s not fall into the “Fuk the police” style of thinking where every action is assumed to be manipulation. Tarus didn’t reach out like a Scientology agent demanding I take the post down or warning me of consequences.
He came with empathy, internal leverage, and actually made things move.
Before I read Tarus's email, I wrote in Slack to Nate Berkopec (the Puma maintainer): `Hi. AWS destroyed me, i'm going to take a big break.`
Then his email reset my cortisol levels to an acceptable level.
Most importantly, this incident triggered a CoE (Correction of Error) process inside AWS.
That means internal systems and defaults are being reviewed, and that's more than I expected. We're getting a real update that will affect cases like mine in the future.
So yeah, it may have started in the visibility layer, but what matters is that someone human got involved, and actual change is now happening.
>[...] assumed to be manipulation
I think you're reading way more negativity into "PR" than I'm intending (which is no negativity).
It's very clear Tarus is a caring person who really did empathize with your situation and did their best to rectify the situation. It's not a bad thing that your issue may (most likely) have been brought to his attention because of "PR radar" or whatever.
The bad part, on Amazon and other similar companies, is how they typically respond when a potential PR hit isn't on the line. Which, as I'm sure you know because you experienced it prior to posting your blog, is often a brick wall.
The overwhelming issue is that you often require some sort of threat of damage to their PR to be assisted. That doesn't make the PR itself a bad thing. And that fact implies nothing about the individuals like Tarus who care. Often the lowly tier 1 support empathizes, they just aren't allowed to do anything or say anything.
Customer service was great and refunded my money without me blogging about it. We messaged back and forth about what I was trying to do and what I thought I was signing up for. I think it helped to have a long history of tiny AWS instances, because they mentioned reviewing my customer history.
I want to hate Amazon, but they provided surprisingly pleasant and personable service to a small fry like me. That exchange alone probably cost Amazon more money than I've spent on AWS. It won my probably misguided customer loyalty.
Being a PR move isn't inherently a bad thing.
The bad thing is the lack of support when PR isn't at risk.
>It's fair to say that without the blog post this issue wouldn't have been noticed or fixed, but anything past that is really just speculating about people's motives.
My only (minor) issue with the blog post is that it starts by saying "Not because of PR" when the opening email from the human at Amazon was "saw your blog". I think it is evident that Tarus Balog did indeed actually care!
"If you want your paperwork processed in Morocco, make sure you know someone at the commune, and ideally have tea with their cousin."
Yes, it works, but it shouldn’t be the system.
What happened with AWS isn’t a clever survival tip, it’s proof that without an account manager, you are just noise in a ticket queue, unless you bring social proof or online visibility.
This should never have come down to 'who you know' or 'how loud you can be online'.
It's sheer luck that I speak English and have an online presence. What if I had been ranting in French, Arabic, or even Darija on Facebook? Tarus would never have noticed.
I recently opened a DigitalOcean account and it was locked for a few days after I had moved workloads in. They took four days to unlock the account, and for my trouble they continued to charge me for my resources during the time the account was locked when I couldn't log in to delete them. I didn't have any recourse at all. They did issue a credit because I asked nicely, but if they said no, that would have been it.
Well, not normally, no. But it does happen. Not often enough to be a meaningful statistical issue, but if it were to happen to you, then a little forethought can turn a complete disaster into a survivable event. If you store all your data 'in the cloud', realize that your account could be compromised, used to store illegal data, or targeted by social engineering, along with lots of other scenarios that could lead a cloud services provider to protect their brand rather than your data. If - like the author - you are lucky, you'll only be down for a couple of days. But for most businesses that's the end of the line, especially if you run a multi-tenant SaaS or something like that. So plan for the worst and hope for the best.
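One concrete form of "plan for the worst" is keeping a periodically refreshed copy of your cloud data somewhere the provider cannot lock you out of. As a rough illustration, here's a minimal boto3 sketch that mirrors a bucket to local disk; the bucket name and target path are hypothetical.

```python
#!/usr/bin/env python3
"""Mirror an S3 bucket to local disk so a sudden account lockout doesn't
take the only copy with it. Bucket name and target path are hypothetical."""
import pathlib

import boto3  # pip install boto3

BUCKET = "my-production-bucket"
TARGET = pathlib.Path("/mnt/offsite-copy") / BUCKET

def mirror_bucket() -> None:
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):   # skip folder placeholder objects
                continue
            dest = TARGET / obj["Key"]
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Skip objects we already have at the same size -- good enough
            # for a nightly mirror, not a substitute for real versioning.
            if dest.exists() and dest.stat().st_size == obj["Size"]:
                continue
            s3.download_file(BUCKET, obj["Key"], str(dest))

if __name__ == "__main__":
    mirror_bucket()
```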
Surprising. In my time, things always got pretty serious if your service could not recover from loss due to regrettable events.
TFA alluded to a possible but "undocumented" way to restore terminated infrastructure. I don't think all AWS services nuke everything on deletion, but if it is not in writing ...
Doesn't everyone have this?
If wrong data gets deleted, and that gets replicated, now you simply have two copies of bad data.
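Which is why the off-site copies should be dated snapshots with a retention window, not a live mirror. A minimal Python sketch of the idea, with placeholder paths:

```python
#!/usr/bin/env python3
"""Keep dated snapshots of a backup directory instead of a live mirror, so a
bad deletion today doesn't overwrite yesterday's good copy. Paths are
placeholders."""
import datetime
import pathlib
import shutil

SOURCE = pathlib.Path("/var/backups/db")
ARCHIVE = pathlib.Path("/mnt/offsite-copy/snapshots")
KEEP_DAYS = 30   # retention window

def snapshot_and_prune() -> None:
    ARCHIVE.mkdir(parents=True, exist_ok=True)
    today = datetime.date.today()
    dest = ARCHIVE / today.isoformat()
    if not dest.exists():
        shutil.copytree(SOURCE, dest)   # today's frozen copy
    # Prune snapshots older than the retention window; everything newer stays
    # untouched even if the source is wiped tomorrow.
    cutoff = today - datetime.timedelta(days=KEEP_DAYS)
    for snap in ARCHIVE.iterdir():
        try:
            if datetime.date.fromisoformat(snap.name) < cutoff:
                shutil.rmtree(snap)
        except ValueError:
            continue   # ignore anything that isn't a dated snapshot

if __name__ == "__main__":
    snapshot_and_prune()
```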
Yes, I had backups everywhere. Across providers, in different countries. But I built a system tied to my AWS account number, my instances, my IDs, my workflows.
When that account went down, all those “other” backups were just dead noise, encrypted forever. Bringing them up in the story only invites the 'just use your other backups' fallback, and it ignores the real fragility of centralized dependencies.
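If there is a practical takeaway for anyone building something similar, it's to keep the encryption key for your off-site copies somewhere the provider can't take it away. A minimal sketch using the `cryptography` package's Fernet, with made-up paths rather than my actual setup:

```python
#!/usr/bin/env python3
"""Encrypt a backup with a key that lives on offline media (e.g. a drive in
a fire safe), so losing a cloud account doesn't also mean losing the ability
to decrypt. Paths are made up for illustration."""
import pathlib

from cryptography.fernet import Fernet  # pip install cryptography

KEY_FILE = pathlib.Path("/mnt/firesafe-drive/backup.key")

def load_or_create_key() -> bytes:
    if KEY_FILE.exists():
        return KEY_FILE.read_bytes()
    key = Fernet.generate_key()
    KEY_FILE.write_bytes(key)   # stored only on the offline drive
    return key

def encrypt_file(path: pathlib.Path) -> pathlib.Path:
    fernet = Fernet(load_or_create_key())
    out = path.parent / (path.name + ".enc")
    # Fine for dumps that fit in memory; stream in chunks for huge files.
    out.write_bytes(fernet.encrypt(path.read_bytes()))
    return out

if __name__ == "__main__":
    print(encrypt_file(pathlib.Path("/var/backups/db/all-databases.sql.gz")))
```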
It is like this: the UK still maintains BBC Radio 4's analogue emergency broadcast, a signal considered so vital that if it's cut, UK nuclear submarines and missile silos automatically trigger retaliation. No questions asked. That's how much weight they place on a reliable signal.
If your primary analogue link fails, the world ends. That's precisely how I felt when AWS pulled my account, because I'd tied my critical system to a single point of failure. If the account had just been made read-only, I would have waited, because I would still have had access to my data and could have rotated keys.
AWS is the apex cloud provider on the planet. This isn't about redundancy or best practices; it's about how much trust and infrastructure we willingly lend to one system.
Remember: if the BBC Radio 4 signal ever fails, the world gets nuked and only cockroaches will survive… along with your RDS and EC2 billing fees.
https://www.tomshardware.com/software/cloud-storage/aws-accu...
> Update: August 5 7:30am (ET): In a statement, an AWS spokesperson told Tom's Hardware "We always strive to work with customers to resolve account issues and provided an advance warning of the potential account suspension. The account was suspended as part of AWS’s standard security protocols for accounts that fail the required verification, and it is incorrect to claim this was because of a system error or accident."
This shows a bigger part of this problem.
When these mistakes do happen, they're invariably treated as standard operating procedures.
They're NEVER treated as errors.
It would appear that the entire support personnel chain and PR literally have no escalation path to treat any of these things as errors.
Instead, they simply double down that it's NOT an error that the account was terminated on insufficient notice over bogus claims and broken policies.