Four nines is not what I would be citing at this point. (That's less than an hour of downtime per year, so they've burned through that budget for the next three decades.)
Maybe aim for 99% first.
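For anyone who wants to check the "less than an hour per year" arithmetic, here is a rough sketch. The function name is made up for illustration, and it assumes a plain 365-day year with no maintenance-window exclusions, which real SLAs often carve out.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime allowed per year at a given availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR * 60


# Compare the two targets mentioned above, plus three nines in between.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Four nines works out to roughly 52.6 minutes per year, while 99% allows about 87.6 hours.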
Otherwise a pretty honest and solid response, kudos for that!
I always strive for 7 9s myself, just not necessarily consecutive digits.
This reads as if overall performance was an afterthought, which doesn't seem practical; it should be a business metric, since it matters to the users after all.
Then again, it’s easy to comment like this in hindsight. We’ll see what happens long term.
We're out of credits, create a new account. We've been API rate limited? When did that start happening? When are we going to get access again?
Good luck engineers of the future!
What We’re Doing:
- We are making ongoing adjustments to our infrastructure to improve stability and ensure reliable scaling under elevated load
- Analyzing system patterns and optimizing backend processes where resource contention is highest
- Implementing protective measures to safeguard platform integrity
It's not going to get better in any way.
My guess is the reason they've been down so long is that they don't have a good rollback path, so they're attempting to fix forward with limited success.
My guess: they use AWS-hosted PostgreSQL, autovacuuming fell permanently behind without them noticing, it can't keep up with organic growth, and they can't scale vertically because they already maxed that out. So they have to do crash migrations of data off their core DB, which is why it's taking so long.
Yet another SaaS that really does not need to be online 24/7. It could have been a simple app where you "no code" on your local machine and sync state asynchronously with Webflow servers.
You're like the person complaining that the hammer isn't very useful for driving in the screw. You need a different tool/app if you want to make a site you host yourself.
Edit: an outage of this length smells of bad systems architecture...
And most non-tech (and many in tech) have never heard about OVH/Hetzner.
Plus, despite marketing begging for the WYSIWYG interface they actually weren't creative enough to generate new content at a pace that required it.
We massively increased conversion rates by going full native and having 1 Engineer churn out parts kits/kitbash LPs from said kits.
Scale for reference: ~$10M/month
It's not great by our standards, but I bet many of us drink the house wine, not something more sophisticated, right? :)
It's not a very interesting thing to do however.
It's actually the job of the CEO to keep all of the C-suite people doing their jobs. Doesn't seem to stop the CEO salary explosions.
Companies, after a disaster, focus lots of effort on that particular disaster, leaving all the other potential disasters unplanned for.
If you work at Webflow, you can anticipate LOTS of work in disaster recovery in the next 12 months. This has magically become a high priority for the CEO, who previously wanted features more than disaster recovery planning.
They will wait to focus massive resources on their security until after they get hacked.
(And also why security is always a losing battle)