I have no idea if it's possible to calculate the rate at which repos are being created and time your repo creation to hit vanity numbers.
(I understand many companies default to not exposing any information unless forced otherwise.)
Also, having a sequence implies at least a global lock on that sequence during repo creation; otherwise repo creation could get by with a more narrowly scoped lock. OTOH, it's not necessarily handled that way: they could hand out ranges of the sequence to different servers/regions, and the repo ID may not actually be sequential.
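One common shape for that "hand out ranges" idea is hi/lo-style block allocation: a central counter is only touched when a server leases a new block, and everything else is local. A toy sketch in TypeScript (all names and numbers here are made up, not how GitHub actually does it):

    // A central allocator hands each server a block of IDs; servers then assign
    // IDs locally, with no cross-server coordination until the block runs out.
    const BLOCK_SIZE = 10_000;
    let nextBlockStart = 1; // in reality this lives in a durable store (a DB row, etcd, ...)

    // Called rarely, under a short-lived global lock / atomic increment.
    function leaseBlock(): { next: number; end: number } {
      const start = nextBlockStart;
      nextBlockStart += BLOCK_SIZE;
      return { next: start, end: start + BLOCK_SIZE };
    }

    // Each server/region holds its own lease and allocates from it locally.
    class RepoIdAllocator {
      private lease = leaseBlock();
      nextId(): number {
        if (this.lease.next >= this.lease.end) this.lease = leaseBlock();
        return this.lease.next++;
      }
    }

    // Two servers interleave their blocks: IDs stay globally unique,
    // but are no longer in strict creation order.
    const a = new RepoIdAllocator();
    const b = new RepoIdAllocator();
    console.log(a.nextId(), b.nextId(), a.nextId()); // e.g. 1, 10001, 2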
People do this. GitHub started doing it too so now you get a nice email from them first instead of another kind of surprise.
https://docs.github.com/en/code-security/secret-scanning/sec...
and the list is way bigger than I recalled: https://docs.github.com/en/code-security/secret-scanning/int...
> quite a few providers they actually have an integration to revoke them if found in a public repo, which I think is way more handy
Yes, I've also gotten an email from Amazon saying they revoked a key someone inadvertently leaked (but so long ago that I only remember that it happened). I read my AWS emails, at least.
Yes: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-... (you can filter to only show repositories created since a given date).
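If you wanted to act on the rate question from the top of the thread, the repository search endpoint (a different endpoint from the one linked above, but it supports a created: qualifier) gives a quick estimate. A rough TypeScript sketch; unauthenticated search is limited to roughly 10 requests per minute, so treat it as illustrative only:

    // Count public repos created in the last hour via the search API's
    // total_count, then turn that into a creation rate.
    async function reposCreatedSince(isoTimestamp: string): Promise<number> {
      const url =
        `https://api.github.com/search/repositories?q=created:%3E${isoTimestamp}&per_page=1`;
      const res = await fetch(url, { headers: { Accept: "application/vnd.github+json" } });
      const body = (await res.json()) as { total_count: number };
      return body.total_count;
    }

    async function main() {
      // Search wants a plain ISO timestamp without fractional seconds.
      const oneHourAgo = new Date(Date.now() - 3_600_000).toISOString().slice(0, 19) + "Z";
      const count = await reposCreatedSince(oneHourAgo);
      console.log(`${count} public repos created in the last hour (~${(count / 3600).toFixed(1)}/s)`);
    }

    main();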
They have some secret-leak scanning infra for enterprise.
Which is arguably even more interesting…
The 1000th is missing.
Did anyone make a search engine for these yet, so we'd be able to get an estimate by searching for the word "a" or so?
(This always seemed like the big upside of centralised GitHub to me: people can actually find your code. I've been thinking of building a search engine since MS bought GH, but I didn't think I could handle the marketing side, so it seemed like it would be a waste of effort and I never did it. Recently I was considering whether it would be worth revisiting, given the various projects I'm putting on Codeberg, but maybe someone beat me to the punch.)
https://docs.gitlab.com/api/projects/#list-all-projects (for dumb reasons it seems GL calls them Projects, not Repositories)
https://codeberg.org/api/swagger#/repository/repoGetByID (that was linked from the Forgejo.org site, so presumably the API is the same for Forgejo and Codeberg) and its friend https://gitea.com/api/swagger#/repository/repoGetByID
Heptapod is a "friendly fork" of GitLab CE so its API works the same: https://heptapod.net/pages/faq#api-hgrc
and then I'd guess one would need to index the per-project GitLab instances: GNOME, GNU (if they ever open theirs back up), whatever's going on with Savannah, probably SourceForge, maybe sourcehut (assuming he doesn't have some political reason to block you), etc.
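For what it's worth, the Gitea/Forgejo "get by ID" route from those swagger pages is easy to poke at; a small sketch (assuming the path is /api/v1/repositories/{id}, and 12345 is just a placeholder ID):

    // Look up a Codeberg repository by its numeric ID; deleted or private
    // repos come back as 404, which is how you notice gaps in the sequence.
    async function codebergRepoById(id: number) {
      const res = await fetch(`https://codeberg.org/api/v1/repositories/${id}`);
      if (!res.ok) return null;
      return (await res.json()) as { id: number; full_name: string; created_at: string };
    }

    codebergRepoById(12345).then((repo) => console.log(repo?.full_name ?? "no such repo"));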
If I won the lottery, I'd probably bankroll a sourcegraph instance (from back when they were Apache) across everything I could get my hands upon, and donate snapshots of it to the Internet Archive
Repository search is pretty limited so far: only full-text search on URLs or in a small list of metadata files like package.json.
or incredible commentary on most github repos,
having no purpose, never being realized, and even having given up dreaming.
When I was a student I had to manually set up CVS pserver and CVSWeb to collaborate with other students on assignments.
This is a bit easier by at least a few orders of magnitude.
I mean maybe he made that repo because he'd given up on his coding dreams. Almost.
That would be some interesting accidental meta commentary on the state of things.
And what is an "analytics db" in this context?
For one, if your IDs are approaching the 2^31 signed integer limit, then by definition, you have nearly two billion rows, which is a very big DB table! There are only a handful of systems that can handle any kind of change to that volume of data quickly. Everything you do to it will either need hours of downtime or careful orchestration of incremental/rolling changes. This issue tends to manifest first on the "biggest" and hence most important table in the business such as "sales entries" or "user comments". It's never some peripheral thing that nobody cares about.
Second, if you're using small integer IDs, that decision was probably motivated in part by using those integers as foreign keys and keeping your secondary indexes efficient. GUIDs are "simpler" in some ways but need 4x the storage (especially if your database clusters rows on the primary key, as MySQL and SQL Server do by default). Even just going from 32 bits to 64 bits doubles the storage in a lot of places. For 2 billion rows that's 8 GB more data at minimum, but almost certainly north of 100 GB across all tables and indexes.
Third, many database engines will refuse to establish foreign key constraints if the types don't match. This can force big-bang changes or very complex duplication of data during the migration phase.
Fourth, this is a breaking change to all of your APIs, both internal and external. Every ORM, REST endpoint, etc. will have to be updated with a new major version. There's a chance that all of your analytics, ETL jobs, etc. will also need to be touched.
Fun times.
Just wanted to nitpick this: it's not actually definitively true. A failed insert in some systems will increment the counter, and deleting rows usually does not allow the deleted ID to be re-used (new inserts use the current counter). Of course, that's beside the point: the typical case of a table approaching this limit is a very large table.
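A tiny demonstration of the first point, assuming Postgres and node-postgres (sequence values are non-transactional, so a rolled-back insert burns an ID for good):

    import { Client } from "pg";

    async function main() {
      const db = new Client(); // connection details come from the usual PG* env vars
      await db.connect();
      await db.query("CREATE TEMP TABLE t (id SERIAL PRIMARY KEY, v TEXT)");

      await db.query("INSERT INTO t (v) VALUES ('first')");   // gets id 1
      await db.query("BEGIN");
      await db.query("INSERT INTO t (v) VALUES ('oops')");    // consumes id 2...
      await db.query("ROLLBACK");                             // ...which is never handed back
      await db.query("INSERT INTO t (v) VALUES ('second')");  // gets id 3

      console.log((await db.query("SELECT id, v FROM t ORDER BY id")).rows); // ids 1 and 3
      await db.end();
    }

    main();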
Our hosting environment at that time was a data centre, so we were limited on storage, which complicated matters a bit. Ideally you'd create a copy of the table with a wider PK column, write to both tables, then migrate your reads, etc., but we couldn't do that because the table was massive and we didn't have enough space. Procuring more drives was possible but sometimes took weeks; there was no just dragging a slider in a cloud portal. And then of course you'd have to schedule a maintenance window for somebody to plug them in. It was absolutely archaic, especially when you consider this was late 2017/early 2018.
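For anyone who hasn't done one of these: a shadow-column variant of that dual-write idea looks roughly like this (a sketch assuming Postgres and node-postgres; the table and column names are invented):

    import { Client } from "pg";

    async function widenOrdersPk(db: Client) {
      // 1. Add the wider column; cheap, no table rewrite yet.
      await db.query("ALTER TABLE orders ADD COLUMN id_big BIGINT");

      // 2. Keep new writes in sync while the backfill runs.
      await db.query(`
        CREATE OR REPLACE FUNCTION orders_sync_id() RETURNS trigger AS $$
        BEGIN NEW.id_big := NEW.id; RETURN NEW; END $$ LANGUAGE plpgsql;
      `);
      await db.query(`
        CREATE TRIGGER orders_sync_id BEFORE INSERT OR UPDATE ON orders
        FOR EACH ROW EXECUTE FUNCTION orders_sync_id();
      `);

      // 3. Backfill old rows in small batches so no single statement holds
      //    locks (or bloats the WAL) for very long.
      let updated = 0;
      do {
        const res = await db.query(`
          UPDATE orders SET id_big = id
          WHERE id IN (SELECT id FROM orders WHERE id_big IS NULL LIMIT 10000)
        `);
        updated = res.rowCount ?? 0;
      } while (updated > 0);

      // 4. The actual swap (constraints, rename id_big -> id, repoint FKs and
      //    the sequence) still needs a short maintenance window.
    }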
You need multiple environments so you can do thorough testing, and we barely had those at that point; and because every major system component was impacted, we had to redeploy our entire platform. Also, because it was the PK column that was affected, we couldn't do any kind of staged migration or rollback without the project becoming much more complex and taking a lot longer, and time was something we didn't have given the rate at which we were consuming 32-bit integer values.
In the end it went off without a hitch, but pushing it live was still a bit of a white-knuckle moment.
The alternative is to monkey-patch the database driver to parse the i64 ID as an IEEE 754 number anyway and deal with the problem later, when you overflow JavaScript's max safe integer (2^53). Except when that happens it manifests in some really wacky ways, rather than the DB just refusing to insert a new row.
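Concretely, with node-postgres that monkey-patch is a one-liner, and the failure mode is silent ID collisions rather than an error; a sketch:

    import { types } from "pg";

    // By default pg returns BIGINT (OID 20) as a string; this overrides the
    // parser so your code keeps seeing plain JS numbers.
    types.setTypeParser(20, (value) => Number(value));

    // ...which works right up until the IDs pass 2^53:
    const a = Number("9007199254740993"); // 2^53 + 1
    const b = Number("9007199254740992"); // 2^53
    console.log(a === b); // true -- two distinct IDs become the same number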
To speed things up we decided to correct the ID types in the server responses, which was key since they were generated from protobuf. But we kept number-typed IDs everywhere else, even though the values would actually be strings; that didn't cause many issues, because there ain't much reason to be doing numeric operations on an ID, except the odd sort function.
I remember the smirk on my face when I suggested it to my colleague; at the time we knew it was what made sense. It must have been one of the dumbest solutions I've ever come up with, but it allowed us to switch the type to string gradually as we changed code, instead of converting entire repos at once. Such a JavaScript memory, that one :)
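In miniature, the hack looks something like this (field names invented); arithmetic comparators keep working by accident, and relational ones are the "odd sort function" that bites:

    // Declared types stay `number` so existing client code compiles,
    // but the values coming off the wire are now strings.
    interface Issue { id: number; title: string }

    const issues = [{ id: "123", title: "a" }, { id: "45", title: "b" }] as unknown as Issue[];

    issues.sort((x, y) => x.id - y.id);            // "123" - "45" coerces: 45, 123 -- fine
    issues.sort((x, y) => (x.id < y.id ? -1 : 1)); // lexicographic: "123" < "45" -- wrong order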
There was a conflict between this and the LuaRocks implementation under LuaJIT [1] [2], inflicting pain on a narrow set of users as their CI/CD pipelines and personal workflows failed.
It was resolved pretty quickly, but interesting!
[1] https://github.com/luarocks/luarocks/issues/1797
[2] https://github.com/openresty/docker-openresty/issues/276
In case anyone cares to read more about the OSM milestone, the official blog entry: https://blog.openstreetmap.org/2021/02/25/100-million-edits-... My write-up of changeset activity around the event: https://www.openstreetmap.org/user/LucGommans/diary/395954
This was probably fifteen years ago. I feel like working in tech was more fun back then.
We were being onboarded; they were just for demo purposes and were promptly deleted. No one cared about the Cool Numbers.
Does anyone remember D666666 from Facebook? It was a massive codemod; the author used a technique similar to this one to get that particular number.
But given that this user doesn't have activity very often, and created two repositories as the number was getting close, it feels likely that it was deliberate. I could be wrong!
Radars that were bug #1,000,000, etc. were kind of special. Unless someone screwed up (and let down the whole team) they were usually faux-Radars with lots of inside jokes, etc.
Pulling one up was enough, since the Radar could reference other Radars ... and generally you would go down the rabbit hole at that point, enjoying the ride.
I was a dumbass not to capture (heck, even print) a few of those when I had the opportunity.
On the other hand, given how Apple deals with confidential data, you probably wouldn't want to be caught exfiltrating internal documents, however benign they are.
I think there's probably a lesson in there about schema design...
Lame. :-(