Or maybe your IP/browser is questionable.
Personally, I like sourcehut (sr.ht)
(Joke's on Reddit, though, because Reddit content became pretty worthless since they did this, and everything before they did this was already publicly archived)
Since Microsoft is struggling to make ends meet, maybe they could throw up a captcha or a proof-of-work check like Anubis by Xe Iaso.
They already disabled code search for unauthenticated users. It's totally plausible they will disable code browsing as well.
I'm using Firefox and Brave on Linux from a residential internet provider in Europe, and the 429 error triggers consistently in both browsers. Not sure I'd call my setup questionable given their target audience.
The future is a .txt file of John Carmack pointing out how efficient software used to be, locked behind a repeating WAF captcha, forever.
I encountered this too once, but thought it was a glitch. Worrying if they can't sort it.
And not just generally degenerate bots? Or just one evil bot network?
Collateral damage of AI I guess
From a long-term, clean network I have been consistently seeing these “whoa there!” secondary rate limit errors for over a month when browsing more than 2-3 files in a repo.
My experience has been that once they’ve throttled your IP under this policy, you cannot even reach a login page to authenticate. The docs direct you to file a ticket (if you’re a paying customer, which I am) if you consistently get that error.
I was never able to file a ticket when this happened because their rate limiter also applies to one of the required backend services that the ticketing system calls from the browser. Clearly they don’t test that experience end to end.
That’s what I run on my personal server now.
Almost went with Gitea, but the ownership structure is murky, feature development seems to have plateaued, and they haven’t even figured out how to host their own code. It’s still all on GitHub.
I’ve been impressed by Forgejo. It’s so much faster than GitHub at performing operations, I can actually back up my entire corpus of data in a format that’s restorable/usable, and there aren’t useless (AI) upsells cluttering my UX.
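On the backup point: if I remember right, Forgejo kept Gitea's dump subcommand, so something along these lines produces a single restorable archive of the repos, database, and attachments (treat the exact invocation as from memory rather than gospel):

# run as the forgejo user on the server; IIRC it writes a timestamped zip in the current directory
forgejo dump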
For listeners at home wondering why you'd want that at all:
I want a centralized Git repo where I can sync config files from my various machines. I have a VPS so I just create a .git directory and start using SSH to push/pull against it. Everything works!
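Concretely, it's about this much work (hostname and paths made up):

# one time, on the VPS: create a bare repo to push into
ssh me@vps.example.com 'git init --bare ~/configs.git'

# on each machine: add it as a remote and sync over SSH
git remote add vps me@vps.example.com:configs.git
git push vps main
git pull vps main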
But then, my buddy wants to see some of my config files. Hmm. I can create an SSH user for him and then set the permissions on that .git to give him read-only access. Fine. That works.
Until he improves some of them. Hey, can I give him a read-write repo he can push a branch to? Um, sure, give me a bit to think this through...
And one of his coworkers thinks this is fascinating and wants to look, too. Do I create an SSH account for this person I don't know well at all?
At this point, I've done more work than just installing something like Forgejo and letting my friend and his FOAF create accounts on it. There's a nice UI for configuring their permissions. They don't have SSH access directly into my server. It's all the convenience of something like GitHub, except entirely under my control and I don't have to pay for private repos.
Forgejo promised interesting features like federation, but has yet to deliver any of them; meanwhile, the real features they've been shipping are cosmetic changes like being able to set pronouns in your profile (and then another 10 commits to improve that...)
If you judge by very superficial metrics like commit counts, Forgejo's count is heavily inflated by merges (which Gitea's development process doesn't use, preferring rebase) and by frequent dependency upgrades. When you strip those out, the remaining commits represent maybe half of Gitea's development activity.
So I expect to observe both for another year before deciding where to upgrade. They're too similar at the moment.
FWIW, one of Gitea's larger users, Blender, continues to use and sponsor Gitea and has no plans to switch AFAIK.
I ran

git log --since="1 year ago" --format="%an" | sort | uniq -c | sort -n | wc -l
to get an overview of things. That showed 153 people (including a small handful of bots) contributing to Gitea, and 232 people (and a couple of bots) contributing to Forgejo. There are some dupes in each list, showing separate accounts for "John Doe" and "johndoe", that kind of thing, but the numbers look small and similar to me, so I think they can be safely ignored.

And it looks to me like Forgejo is using a similar process of combining lots of smaller PR commits into a single merge commit. The vast majority of its commits since last June or so seem to be 1-commit-per-PR. Changing the above command to `--since="2024-07-1"` reduces the number of unique contributors to 136 for Gitea and 217 for Forgejo. It also shows 1228 commits for Gitea and 3039 for Forgejo, and I do think that's a legitimately apples-to-apples comparison.
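If you wanted to squash those dupes, one rough way (assuming people at least reuse their email address across accounts) is to count normalized author emails instead of display names:

git log --since="1 year ago" --format="%ae" | tr 'A-Z' 'a-z' | sort -u | wc -l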
If we brute force it and run
git log --since="1 year ago" | rg '\(\#\d{4,5}\)' | wc -l
to match lines that mention a PR (like "Simplify review UI (#31062)" or "Remove `title` from email heads (#3810)"), then I'm seeing 1256 PR-like Gitea commits and 2181 Forgejo commits.

And finally, their respective activity pages (https://github.com/go-gitea/gitea/pulse/monthly and https://codeberg.org/forgejo/forgejo/activity/monthly) show a similar story.
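To sanity-check the earlier claim about merges inflating the totals, you can also split the counts directly in each repo (same methodological caveats apply):

git log --since="1 year ago" --oneline --merges | wc -l     # merge commits only
git log --since="1 year ago" --oneline --no-merges | wc -l  # everything else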
I'm not an expert in methodology here, but from my initial poking around, it would seem to me that Forgejo has a lot more activity and variety of contributors than Gitea does.
Or randomly when clicking through a repository file tree. The first time I hit a rate limit was when I was skimming through a repository on my phone, and around the 5th file I clicked I was denied and locked out. Not for a few seconds either; it lasted long enough that I gave up on waiting and refreshing every ~10 seconds.
Yes, it doesn't look like intended usage of the service, but I used it for a demo: https://github.com/ClickHouse/web-tables-demo/
Anyway, will try to do the same with GitHub pages :)
Those of us who self-host git repos know that this is not true. Over at ardour.org, we've passed 1M unique IPs banned due to AI trawlers sucking our repository one commit at a time. It was killing our server before we put fail2ban to work.
I'm not arguing that the specific steps GitHub has taken are the right ones. They might be, they might not, but they do help to address the problem. Our choice for now has been based on noticing that the trawlers are always fetching commits, so we tweaked things so that the overall HTTP-facing git repo works, but you cannot access commit-based URLs. If you want that, you need to use our GitHub mirror :)
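For anyone who wants to copy the idea, the gist is a single web-server rule along these lines (a simplified nginx sketch, not our actual config, and your commit URL layout will differ):

# let normal browsing and clones through, but refuse the per-commit pages the trawlers hammer
location ~ /commit/ {
    return 403;
}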
They've pretty widely chosen to not do work and just slam websites from proxy IPs instead.
You would think they'd be using their own products to do that work, if those worked as well as advertised...
I would more readily assume a large social networking company filled with bright minds would have worked out some kind of agreement on, say, a large corpus of copyrighted training data before using it.
It's the wild wild west right now. Data is king for AI training.
This is egregious behavior because Microsoft hasn't been upfront about it while rolling it out. Many open source projects are probably unaware that their issue trackers have been walled off, creating headaches unbeknownst to them.
> You have exceeded a secondary rate limit.
Edit and self-answer:
> In addition to primary rate limits, GitHub enforces secondary rate limits
(…)
> These secondary rate limits are subject to change without notice. You may also encounter a secondary rate limit for undisclosed reasons.
https://docs.github.com/en/rest/using-the-rest-api/rate-limi...
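For the API side at least you can see where you stand against the documented limits; as far as I can tell the secondary limits never show up here, which is rather the point:

# checking this endpoint doesn't count against your quota; the token is optional but changes the numbers
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/rate_limit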
This rather significantly changes the place of github hosted code in the ecosystem.
I understand it is probably a response to the ill-behaved decentralized botnets doing mass scraping with cloaked user agents (everyone assumes it's AI-related, but I think that's all just speculation and it remains quite mysterious), which is affecting most of us.
The mystery botnet(s) are kind of destroying the open web, through the countermeasures being chosen against them.
Also, neither the new nor the old rate limits are mentioned.
At this point, knowledge seems to be gathered and replicated to great effect, and sites that either want to monetize their content OR prevent bot traffic from wasting resources seem to have one easy option.
AI not caching things is a real issue. Sites being difficult TO cache / failing the 'wget mirror test' is the other side of the issue.
All of public github is only 21TB. Can't they just host that on a dumb cache and let the bots crawl to their heart's content?
gnabgib•5d ago
5000 req/hour for authenticated - personal
15000 req/hour for authenticated - enterprise org
According to https://docs.github.com/en/rest/using-the-rest-api/rate-limi...
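You can watch those numbers in the response headers, e.g. unauthenticated against a public repo (this is the API; the web UI's secondary limits discussed in this thread don't seem to be documented anywhere):

curl -sI https://api.github.com/repos/go-gitea/gitea | grep -i '^x-ratelimit'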
I bump into this just browsing a repo's code (unauthenticated). Seems like one of the side effects of the AI rush.
mijoharas•12h ago
I thought I was just misreading it and failing to see where they stated what the new rate limits were, since that's what anyone would care about when reading it.
1oooqooq•2h ago
they already have all your code. they've won.
blinker21•7h ago
I really wish github would stop logging me out.
1oooqooq•2h ago
GH now uses the publisher business model, and as such, they lose money when you're logged out. Same reason Google, FB, etc. won't ask you for a password for decades.
zarzavat•2h ago
If they would let me stay logged in for a year then I wouldn't care so much.
out-of-ideas•6h ago
thanks github for the worse experience
zarzavat•2h ago
A normal rate limit to separate humans and bots would be something like 60 per minute. So it's about an order of magnitude too low.
mjevans•1h ago
Something on the order of 6 seconds per page doesn't sound TOO far outside human viewing range, depending on how quickly things load and how fast rejects are identified.
I could see ~10 pages/min, which is 600 pages/hour. I could also see the argument that a human would get tired at that rate and that something closer to 200-300/hr is reasonable.