They need to get a grip on this.
myrepo git:(fix/context-types-settings) gp
ERROR: user:1234567:user
fatal: Could not read from remote repository.
myrepo git:(fix/context-types-settings) ssh -o ProxyCommand=none git@github.com
PTY allocation request failed on channel 0
Hi user! You've successfully authenticated, but GitHub does not provide shell access.
Connection to github.com closed.
Remember hotmail :)
We need more sovereignty and decentralization.
The underlying tech is still decentralized, but what good does that do when we've made everything that uses it dependent on a few centralized services?
You lost me there
So I don't get why the project has "lost you", but I also suspect you're the kind of person any project could readily afford to lose as a user.
Edit: ugh... if you rely on GH Actions for workflows, though, actions/checkout@v4 is also currently experiencing the git issues, so no dice if you depend on that.
Where is your god now, proponents of immutable filesystems?!
I started packing things into docker containers because of that. Makes it a bit more of a hassle to change things in production.
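Roughly the shape of it, as a sketch (image name, registry, and tag here are all made up):

    # build and push an immutable, pinned image
    docker build -t registry.example.com/myapp:1.4.2 .
    docker push registry.example.com/myapp:1.4.2
    # production only ever runs the pinned tag, read-only, so nobody can poke at it in place
    docker run -d --read-only --name myapp registry.example.com/myapp:1.4.2

Changing anything means building and shipping a new tag, which is exactly the hassle being described.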
At the largest place I did have prod creds for everything, because sometimes they are necessary and I had the seniority (sometimes you do need them in an "oh crap" scenario).
They were all set up on a second account on my work Mac, which had a "Danger, Will Robinson" wallpaper, because I know myself: it's far, far too easy to mentally fat-finger when you have two sets of creds.
I actually had the privilege of being sent to the server.
Because my suggestion that they have a spare ADSL connection for out-of-channel stuff was an unnecessary expense... 'til he broke the firewall, knocked a bunch of folks offline across a huge physical site, and locked himself out of everything.
The spare line got fitted the next month.
They done borked it good.
ERROR: no healthy upstream
fatal: Could not read from remote repository.
"Why do we need so many people to keep things running!?! We never have downtime!!"
The funny thing is that the over-hiring during the pandemic also had the predictable result of mass layoffs.
Whoever manages HR should be the one fired after two back-to-back disasters like this.
I utterly hate being at the mercy of a third party with an afterthought of a "status page" to stare at.
We self-host GitLab, but the team owning it is having a hard time scaling it. From my understanding after talking to them, the design of Gitaly makes it very hard to scale beyond a certain repo size and number of pushes per day (for reference: our repos are GBs in size, ~1M commits, hundreds of merges per day).
Self-hosted Gitlab periodically blocks access for auto-upgrades. Github.com upgrades are usually invisible.
Github.com is periodically hit with the broad/systemic cloud-outage. Self-hosted Gitlab is more decentralized infra, so you don't have the systemic outages.
With self-hosted Gitlab, you're likely to have to deal with rude bots on your own. Github.com has an ops team that deals with the rude bots.
I'm sure the list goes on. (shrug)
Microsoft CEO says up to 30% of the company’s code was written by AI https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-3...
Time to leak that.
The enterprise cloud in EU, US, and Australia has no issues.
If you look at the incident history, disruptions have been happening often in the public cloud for years already, since before AI wrote code for them.
It's not just HTTPS, I can't push via SSH either.
I'm not convinced it's just "some" operations either; every single one I've tried fails.
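If you want to check the transports yourself, something like this separates an HTTPS-only problem from a full outage (OWNER/REPO is a placeholder for your own repo):

    # try the HTTPS transport directly
    git push https://github.com/OWNER/REPO.git HEAD
    # try the SSH transport, with verbose output to see where it fails
    GIT_SSH_COMMAND="ssh -v" git push git@github.com:OWNER/REPO.git HEAD
    # SSH auth check only, no repo involved
    ssh -T git@github.com

During this outage both transports seem to fail the same way, which matches the above.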
A. Are these major issues with cloud/SaaS tools becoming more common, or is it just that they get a lot more coverage now? It seems like we see major issues across AWS, GCP, Azure, Github, etc. at least monthly now and I don't remember that being the case in the past.
B. If it's becoming more common, what are the reasons? I can think of a few, but I don't know the answer, so if anyone in-the-know has insight I'd appreciate it.
Operations budget cuts/layoffs? Replacing critical components/workflows with AI? Just overall growing pains, where a service has outgrown what it was engineered for?
Thanks
Someone answered this morning, during the Cloudflare outage, that it's AI vibe coding, and I tend to think there is something true in that. At some point there might be some tiny grain of AI involved that starts the avalanche ending like this.
However, this is an unexpected bell curve. I wonder if GitHub is seeing more frequent adversarial action lately. Alternatively, perhaps there is a premature reliance on new technology at play.
I was trying to do a 1.0 release today. Codeberg went down for "10 minutes maintenance" multiple times while I was running my CI actions.
And then github went down.
Cursed.
What has changed is how many services are affected when GitHub is having issues.
A lot of people are pointing to AI vibe coding as the cause, but I think more often than not, incidents happen due to poor maintenance of legacy code. But I guess this may be changing soon as AI written code starts to become "legacy" faster than regular code.
FWIW Microsoft is convinced moving Github to Azure will fix these outages
Your second point is a little disingenuous. Yes, Microsoft and Windows have been wildly successful from a cultural adoption standpoint. But that's not the point I was trying to argue.
So many weird paths we could have gone down; it's almost strange Microsoft won.
1.) It's already a miracle Xerox PARC escaped their parent company's management for as long as they did.
3.) IBM was playing catch-up on the supercomputer front since the CDC 6400 in 1964. Arguably, they did finally catch up in the mid-late 80's with the 3090.
Gary was on a flight when IBM called up Digital Research looking for an OS for the IBM PC. Gary's wife, Dorothy, wouldn't sign an NDA without it going through Gary, and supposedly they never got negotiations back on track.
https://www.zdnet.com/article/ms-moving-hotmail-to-win2000-s...
https://techrights.org/n/2025/08/12/Microsoft_Can_Now_Stop_R...
Ever since Musk greenlighted firing people again... CEOs can't wait to pull the trigger.
2/ Then we cannot expect big tech to stay as sharp as in the 2000s and 2010s.
There was a time when banks had all the smart people, then the telcos had them, etc. But people get older and too comfortable, layers of bad incentives and politics accumulate, and you just become a dysfunctional big mess.
I suspect (although have not researched) that global traffic is up, by throughput but also by session count.
This contributes to a lot more awareness. Slack being down wasn't impactful when most tech companies didn't use Slack. An AWS outage was less relevant when the 10 apps (they used to be websites) you use most didn't rely on a single AZ in AWS, or when you were on your phone less.
I think as a society it just has more impact than it used to.
Be good to your site reliability engineers for the next few months... it's downtime season!
Among other mentioned factors like AI and layoffs: mass brain damage caused by never-ending COVID re-infections.
Since vaccines don't prevent transmission, and each re-infection increases the chances of long COVID complications, the only real protection right now is wearing a proper respirator everywhere you go, and basically nobody is doing that anymore.
There are tons of studies to back this line of reasoning.
It's more work and slower. I'm convinced half of the reason they keep it that way is because the barrier to entry is higher and it scares contributors away.
You mean, assuming everyone in the conversation is using different email providers. (i.e., not the company-wide one, and not Gmail... I think that covers 90% of all email accounts in the company...)
Well you can with some effort. But there's certainly some inconvenience.
But yes, SSH pushing was down; that was my first clue.
My work laptop had just been rebooted (it froze...) and the CPU was pegged by security software doing a scan (insert :clown: emoji), so I just wandered over to HN and learned of the outage at that point :)
The downtime we do have each year is typically also on our terms, not in the middle of a work day or at a critical moment.
The reason for buying centralized cloud solutions is not uptime, it's to save yourself the headache of developing and maintaining the thing.
I've seen that work firsthand to keep critical stuff deployable through several CI outages, and it also has the upside of making it trivial to debug "CI issues", since it's trivial to run the same target locally.
However, since we use github.com for more than just git hosting, it is a SPOF in most cases, and we treat it as a snow day.
It would be great to also have the continuous build and test and whatever else you "need" to keep the project going available as local alternatives. Of course.
[1] Or maybe there is just that much downtime on GitHub now that it can’t be shrugged off
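To make the "same target locally" idea concrete, here's a minimal sketch; the script name and make targets are hypothetical, so substitute whatever the project actually uses:

    #!/usr/bin/env sh
    # ci.sh -- single entry point; the hosted CI workflow runs exactly this, and so can any laptop
    set -eu
    make build
    make test
    make package

Because the workflow file does nothing but call the script, a CI outage only takes away the automation, not the ability to produce the same artifact by hand.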
Maybe this will push more places towards self-hosting?
* jobs not being picked up
* jobs not being able to be cancelled
* jobs running but showing up as failed
* jobs showing up as failed but not running
* jobs showing containers as pushed successfully to GitHub's registry, but then we get errors while pulling them
* ID token failures (E_FAIL) and timeouts.
I don't know if this is related to GitHub moving to Azure, or because they're allowing more AI-generated code to pass through without proper reviews, or something else, but as a paying customer I am not happy.
I have been thinking about this a lot lately. What would be a tweak that might improve this situation?
At least Microsoft decided we all deserve a couple-hour break from work.
This has broken a few pipeline jobs for me; seems like they're underplaying this incident.
My guess is that it has to do with the Cloudflare outage this morning.
These companies are supposed to have the top people on site reliability. That these things keep happening and no one really knows why makes me doubt them.
Alternatively,
The takeaway for today: clearly, Man was not meant to have networked, distributed computing resources.
We thought we could gather our knowledge and become omniscient, to be as the Almighty in our faculties.
The folly.
The hubris.
The arrogance.
Our git server is hosted by Atlassian. I think we've had one outage in several years?
Our self-hosted Jenkins setup is similarly robust; we've had a handful of hours of "can't build" in, again, several years.
We are not a company made up of rockstars. We are not especially competent at infrastructure. None of the dev teams have ever had to care about our infrastructure (occasionally we read a wiki or ask someone a question).
You don't have to live in this broken world. It's pretty easy not to. We had self hosted Mercurial and jenkins before we were bought by the megacorp, and the megacorp's version was even better and more reliable.
Self host. Stop pretending that ignoring complexity is somehow better.
Gemini 3 Pro, after 3 random things, announced GitHub was the issue.
the problem isn’t with centralized internet services, the problem is a fundamental flaw with http and our centralized client server model. the solution doesn’t exist. i’ll build it in a few years if nobody else does.
How many more outages until people start to see that farming out every aspect of their operations maybe, might, could have a big effect on their overall business? What's the breaking point?
Then again, the skills to run this stuff properly are getting more and more rare so we'll probably see more and more big incidents popping up more frequently like this as time goes on.
The VCs look at stars before deciding which open-core startup to invest in.
The 4 or 5 9s of reliability simply do not matter as much.
it's up now (the incident, not the outage)