If their triage system is good, then overwhelming them with duplicate, non-specific "things are wonky" error reports might not hurt, but it definitely doesn't help.
https://status.heroku.com/incidents/2822
Update
Heroku continues to investigate and remediate an issue with intermittent outages.
Posted 3 hours ago, Jun 10, 2025 14:20 UTC
Issue
Beginning at 06:03 UTC, Heroku is having intermittent outages which are currently being investigated.
Posted 4 hours ago, Jun 10, 2025 13:07 UTC
Investigating
Engineers are continuing to investigate an issue accessing Heroku services.
Posted 4 hours ago, Jun 10, 2025 12:58 UTC
Investigating
Engineers are continuing to investigate an issue accessing Heroku services.
Posted 8 hours ago, Jun 10, 2025 09:19 UTC
Investigating
Engineers are investigating an issue with the Heroku platform.
Posted 9 hours ago, Jun 10, 2025 08:04 UTC
And the official Heroku status page says it's all fine.
I feel bad for companies with a lot of users
Access to check logs or perform other Heroku CLI functions is very intermittent. Some of our services work sometimes. Others don't. When I can occasionally access the logs, it seems like there are issues pinging other services. The apps aren't able to push their logs to DataDog.
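(For reference, "check logs" here just means the usual CLI tail, with a placeholder app name:

  # Stream live log output for one of our apps; "our-app" stands in for the real name
  heroku logs --tail --app our-app

and even that times out or errors out more often than not right now.)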
It's a mess, and there's not much we can do right now, which is frustrating. And the fact that the Heroku status page itself is broken is embarrassing.
The status page itself either says nothing is wrong or points to an error page[0]. The incident itself[1] hasn't been updated, which is pretty frustrating.
We can't submit a support ticket because, well, it requires the authentication procedure as well.
We use worker queues, and the queues are getting blown out because Heroku can't action anything. Our microservices are yo-yoing now, which suggests things are getting worse, not better.
I've always been a huge Heroku advocate, but the last 5 years have been death by a thousand cuts.
0: https://status.heroku.com/error
1: https://status.heroku.com/incidents/2822
Which is odd; I'd have thought Heroku would be pretty good at keeping its status page infrastructure separate enough to stay up. Must mean something pretty fundamental in their architecture is malfunctioning. :(
But when I was able to load the page, it did say "Heroku continues to investigate and remediate an issue with intermittent outages" -- I would say it is acknowledged. Yes, that message is 3 hours old. The fact that it's taking them over 3 hours to fix is disturbing, but constant progress communication isn't really urgent for me -- I know they know about it, I know they are working to fix it, and I'd like them to fix it _quicker_, but I don't need a play-by-play. "Can't even acknowledge an incident" is NOT the problem being exhibited here; it's acknowledged.
We'll wait and see what it was. A good retrospective write-up goes a long way to increasing many people's confidence, including mine.
Stability definitely still matters though, of course.
* Heroku and most of Salesforce
* Pipedrive: https://news.ycombinator.com/item?id=44234098
* OpenAI: https://status.openai.com/incidents/01JXCAW3K3JAE0EP56AEZ7CB...
* Lobsters: https://news.ycombinator.com/item?id=44234075
What's the common denominator?
From the last archived version of lobste.rs/about: https://web.archive.org/web/20250601002505/https://lobste.rs...
"Lobsters is hosted on three VPSs at DigitalOcean: a s-4vcpu-8gb for the web server, a s-4vcpu-8gb for the mariadb server, and a s-1vcpu-1gb for the IRC bot. DNS is hosted at DNSimple and we use restic for backups to b2. (Setup details are available in our ansible repo.) Lobsters is cheap to run, so we don't take donations."
My best guess: The authentication issue affects them more than it does us.
1. Our platform can't access any external API services from that dyno specifically.
2. We can't get into dashboard.heroku.com to log in to our services/infrastructure.
3. heroku-cli doesn't work, so we can't access any logs from our dyno.
4. We're an enterprise client and have had next to no updates and no response from our account manager. Granted, we're right at the bottom of the pecking order, but that's no excuse.
> The Salesforce Trust site is currently experiencing a service disruption. During this time, users will be unable to access the Trust site or receive notifications about service-impacting incidents or maintenances. We will continue to provide updates here until the issue is resolved.
And the Heroku status page actually has recent updates, although they are just repeating this every hour:
> Heroku continues to investigate and remediate an issue with intermittent outages.
My app is still up, but the Heroku dashboard is down for me now -- and (re)starting/stopping dynos may not be working (HireFire is complaining on and off that it can't scale workers).
In the past, with this kind of outage, apps have often stayed up as long as they don’t try to redeploy, which sometimes takes them irreparably down. So... recommend you don't try to deploy!
It has been there for over 9 hours now.
I would definitely shame them -- do they not even bother looking at status pages before giving you information? How has Heroku not distributed this information to any customer-contacting staff already?
It's crap like this that disturbs me even more than the outage itself, in fact. I know bad outages can happen even to the best of us, but your account manager not knowing about it, and not even bothering to try to find out before giving you bad information, is the mark of mediocrity at best.
"Update 6: Posted Tue, 10 Jun 2025 11:27:51 UTC
The third-party provider disabled a single server, which did not yield a positive outcome. We are continuing our investigations with the provider."
I wonder, do they mean this whole shitshow was caused by just one fucking third-party server?
Even their status page (https://status.heroku.com/) doesn't work normally. This is incredible.
> 18:22:11 UTC
We’ve identified the likely trigger of the issue and are making strong progress toward a resolution. A recent automated update disrupted network connectivity on affected instances.
But then, about 7 hours in (~1:40 PM EDT), all web requests failed with a 503 for about 30 minutes, logged as H99 "Platform error". No requests made it to our dyno during that period.
We can't failover to contingency without making API calls - the read replica has to be forked and promoted. I've always comforted myself by saying "well, the data plane and control plane have always been separate and aren't impacted by the same events". No longer.
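For reference, the failover itself is only a couple of CLI calls -- this is a rough sketch for a follower-style replica, and the attachment name is made up (Heroku assigns its own):

  # Stop the replica following its primary so it becomes a standalone, writable database
  heroku pg:unfollow HEROKU_POSTGRESQL_SILVER_URL -a our-app
  # Repoint DATABASE_URL at the newly writable database
  heroku pg:promote HEROKU_POSTGRESQL_SILVER_URL -a our-app

But every one of those calls goes through the same control plane that's currently unreachable.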
My guess is that some kind of troubleshooting caused this issue, but it doesn't matter. The point is that I can no longer ignore this global single point of failure, and really the only answer is to migrate away. Which I am reluctant to do, and I don't know that it will increase reliability.
We had never had application downtime as a result of a Heroku issue in the 8 years this has been in production. 30 minutes in 8 years isn't bad. But if it were down for 3 days, I'd have the same options: fuck all.
I feared this might be hackers or something much worse than Heroku/Salesforce is letting on, given the complete lack of transparency.
But this line in the salesforce status page makes me think it isn’t something of that nature:
> Our efforts remain focused on internal testing and validation as we continue to see incremental improvements
“Testing and validation” isn’t something you’d do (or say you’re doing) during a hack.
Why don’t companies learn to communicate outages in a way that helps their customers? Even the Heroku status Twitter account has only posted twice in twelve hours. There’s no excuse for such opaqueness.
The Heroku dashboard is now back online. Customers can now attempt to recycle their dynos on the dashboard.
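If the dashboard flakes out again, the CLI equivalent (assuming the API is answering at all) is just:

  # Restart every dyno in the app; "our-app" stands in for the real app name
  heroku ps:restart -a our-app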
cornfieldlabs•20h ago
Even the status page is failing
cornfieldlabs•20h ago
Our servers are still up so far.