This is not, like, super trivial. People needed to figure out power, hardware, networking, VPNs, etc., etc., etc., and staffing. A lot of that had absolutely nothing to do with IT, but some of it did.
These are the kinds of things that fall through the cracks at smaller companies, and there's no expert to build this disaster recovery plan because there is no risk or compliance department.
It falls into the lap of whoever is dealing with the audits, or whoever has a reputation for getting things done and unblocking people.
A copy-pasted plan is actually fine, as long as whoever's responsible for following it can actually follow it and it works.
A plan that nobody's even looked at, much less tried to follow, isn't fine. Even if it's word-for-word identical to one that is fine.
~ Lots of people
This guy is quite possibly going to end up looking stupid when something goes wrong and it turns out he lied about having thought about it. I hope he is as clever as he thinks he is at anticipating what will go wrong in the future. Fires and whatnot do happen. Even AWS us-east-1 has experienced outages.
I really don't understand what the point of EMs is.
byoung2•7h ago
david38•6h ago
hamburga•6h ago
daxfohl•5h ago
jacob_rezi•4h ago
Terr_•5h ago
> Ultimately a lot of this generative tech stuff is just counterfeiting extra signals people were using to try to guess at interest, attentiveness, intelligence, etc.
> So yeah, as those indicators become debased, maybe we'll go back to sending something [...] all boiled down to bullet points.
ctkhn•4h ago
hliyan•4h ago
"Recent outage was due to a retry loop for the Foo API exceeding rate limits. We're implementing a backoff algo"
Sender, via ChatGPT:
Hi,
I wanted to provide more context regarding the recent outage.
The issue was triggered by a retry loop in the Foo API integration. When the API began returning errors, our system initiated repeated retry attempts without sufficient delay, which quickly exceeded the rate limits imposed by the API provider. As a result, requests were throttled, leading to degraded service availability.
To address this, we are implementing an exponential backoff algorithm with jitter. This approach will ensure that retries are spaced out appropriately and reduce the likelihood of breaching rate limits in the future. We are also reviewing our monitoring and alerting thresholds to detect similar patterns earlier.
We’ll continue to monitor the system closely and share further updates as improvements are rolled out.
Best regards,
Receiver, via ChatGPT:
"The outage was caused by excessive retry attempts to the Foo API, which triggered rate limiting and degraded service. To prevent recurrence, exponential backoff with jitter is being implemented"
nullc•2h ago
whiplash451•3h ago