frontpage.

Made with ♥ by @iamnishanth

Open Source @Github

Show HN: Similarity = cosine(your_GitHub_stars, Karpathy) Client-side

https://puzer.github.io/github_recommender/
127•puzer•3d ago
GitHub profile analysis:
- Build your embedding from your Stars
- Compare and discover popular people with similar interests and share yours
- Generate a Skill Radar
- Recommend repositories you might like

Comments

puzer•3d ago
TL;DR

- The Idea: People use GitHub Stars as bookmarks. This is an excellent signal for understanding which repositories are semantically similar.

- The Data: Processed ~1TB of raw data from GitHub Archive (BigQuery) to build an interest matrix of 4 million developers.

- The ML: Trained embeddings for 300k+ repositories using Metric Learning (EmbeddingBag + MultiSimilarityLoss).

- The Frontend: Built a client-only demo that runs vector search (KNN) directly in the browser via WASM, with no backend involved.

- The Result: The system finds non-obvious library alternatives and allows for semantic comparison of developer profiles.
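The post does not spell out how a profile vector is built or how the in-browser KNN scores results. The following is only a minimal pure-Python sketch under two assumptions: a user's embedding is the renormalized mean of the unit vectors of their starred repos, and search is brute-force cosine top-k. The repo names and 3-dimensional vectors are hypothetical. The real system uses trained embeddings for 300k+ repos and a WASM vector search.

```python
import math
from typing import AbstractSet, Dict, Iterable, List, Sequence, Tuple

Vector = Sequence[float]

def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def user_embedding(stars: Iterable[str], repo_vecs: Dict[str, Vector]) -> List[float]:
    """Assumed scheme: average the unit vectors of starred repos, then renormalize."""
    known = [normalize(repo_vecs[r]) for r in stars if r in repo_vecs]
    if not known:
        raise ValueError("none of the starred repos has an embedding")
    mean = [sum(col) / len(known) for col in zip(*known)]
    return normalize(mean)

def top_k(query: Vector, repo_vecs: Dict[str, Vector], k: int = 3,
          exclude: AbstractSet[str] = frozenset()) -> List[Tuple[str, float]]:
    """Brute-force KNN: cosine similarity is the dot product of unit vectors."""
    q = normalize(query)
    scored = [
        (name, sum(a * b for a, b in zip(q, normalize(v))))
        for name, v in repo_vecs.items()
        if name not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-d embeddings; real ones would come from the trained model.
repos = {
    "org/tensor-lib": [0.9, 0.1, 0.0],
    "org/autograd-toy": [0.8, 0.2, 0.1],
    "org/web-framework": [0.0, 0.9, 0.3],
    "org/css-reset": [0.1, 0.8, 0.5],
    "org/vector-index": [0.7, 0.0, 0.6],
}
starred = ["org/tensor-lib", "org/autograd-toy"]

me = user_embedding(starred, repos)
recs = top_k(me, repos, k=3, exclude=set(starred))
for name, score in recs:
    print(f"{name}: {score:.3f}")
```

Without `exclude`, the two starred repos come back first with near-perfect scores, which is why filtering already-starred repos out of the recommendations is usually worth doing.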

amelius•10h ago
This reminds me of the Netflix prize.

https://en.wikipedia.org/wiki/Netflix_Prize

ashvardanian•9h ago
Cool project! And thanks for mentioning "unum-cloud/USearch" among repo examples :)
jrockway•10h ago
That's actually really neat. It suggested regclient/regclient as a repository I'd like. I looked and, yup, I had no idea that existed and it is a sort of thing I like.

People complain about The Algorithm but it can be useful...

embedding-shape•9h ago
When people talk about "The Algorithm", they're not talking about just some function that sorts stuff by X or Y, but about a feed optimized for "evil X", usually trying to drive longer attention or push up engagement.

If GitHub started using the approach from this submission (GitHub stars) to recommend repos in people's GitHub feeds, I don't think people would get their pitchforks out about "The Algorithm". But if GitHub started shaping the feed so you spend as much time there as possible, by whatever means and with potentially irrelevant stuff, then my guess is that many would start considering the GitHub feed one of "The Algorithms".

m00dy•10h ago
lol https://puzer.github.io/github_recommender/#p=eyJ0IjoicHJvZm...
mkehrt•10h ago
Fun fact: cosine similarity's first use in recommendation systems was to recommend Usenet groups.

(https://dl.acm.org/doi/epdf/10.1145/192844.192905 although they don't call it cosine similarity; they do compute a "correlation coefficient" between two people by adding together the products of scores each gave to a post)

zahlman•9h ago
> they do compute a "correlation coefficient" between two people by adding together the products of scores each gave to a post

I've heard the term "cosine similarity" before but not really looked into it. What does this computation have to do with trigonometry?

Edwinr95•9h ago
The dot product is computed between two vectors. For these use cases, where the vectors are normalized to unit length, that dot product is equal to the cosine of the angle between the two vectors.

(Strictly speaking we have that the angle is actually defined in terms of the dot/inner product in more abstract spaces like function spaces or L^p/l^p)
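To connect the dot product to trigonometry for ordinary vectors in R^n: comparing the law of cosines with the algebraic expansion of ||a - b||^2 = (a - b)·(a - b) gives the usual formula, which reduces to the plain dot product when both vectors have unit length.

```latex
\begin{aligned}
\lVert \mathbf a - \mathbf b \rVert^2 &= \lVert \mathbf a \rVert^2 + \lVert \mathbf b \rVert^2 - 2\,\lVert \mathbf a \rVert\,\lVert \mathbf b \rVert \cos\theta && \text{(law of cosines)}\\
\lVert \mathbf a - \mathbf b \rVert^2 &= \lVert \mathbf a \rVert^2 + \lVert \mathbf b \rVert^2 - 2\,\mathbf a \cdot \mathbf b && \text{(expand the dot product)}\\
\Rightarrow\quad \cos\theta &= \frac{\mathbf a \cdot \mathbf b}{\lVert \mathbf a \rVert\,\lVert \mathbf b \rVert} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}
\end{aligned}
```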

armcat•9h ago
It's grounded in basic trigonometry, i.e. it calculates the angle `theta` between two entities/vectors, `a` and `b`. If `theta` is close to 180 degrees, cos(theta) is close to -1, and cosine similarity dictates these are opposite concepts; if `theta` is close to 90 degrees, cos(theta) is close to 0, i.e. the vectors are unrelated (orthogonal).
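A small Python check of the angle interpretation, with arbitrary example vectors. It shows that magnitude is ignored and that the score is +1 for the same direction, 0 for orthogonal vectors (90 degrees, the usual reading of "unrelated"), and -1 for opposite ones (180 degrees).

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 0.0]
for label, b in [("same direction", [2.0, 0.0]),   # longer, but same angle
                 ("orthogonal", [0.0, 3.0]),       # 90 degrees apart
                 ("opposite", [-1.0, 0.0])]:       # 180 degrees apart
    print(f"{label}: {cosine_similarity(a, b):+.1f}")
# same direction: +1.0
# orthogonal: +0.0
# opposite: -1.0
```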
yobbo•9h ago
The Pearson correlation coefficient is covariance normalised to the range [-1, 1] by dividing by the standard deviations (https://en.wikipedia.org/wiki/Pearson_correlation_coefficien...). So not quite the same as the normalised scalar product, even though the formulas look related.
mkehrt•9h ago
That makes sense; I don't actually know much about this.

That being said, weirdly, the normalization by standard deviation happens outside the call to `cov` in the paper (page 181, column 1, equations (unnumbered) 1 and 2). And in equation 2 they've expanded `cov` to be the sum of pointwise multiplication of the (scores - average score) people have given to posts.

Again, not my area of expertise, just looking at the math here.

yobbo•9h ago
Yes, they are basically the same thing, but for correlation the values are first zero-centred.
LudwigNagasena•3h ago
Pearson correlation = cosine of the angle between centered random variables. Finite-variance centered random variables form a Hilbert space, so it's not a coincidence. Standard deviation is the length of the random variable as a vector in that space.
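A quick numeric check of this identity in plain Python, using made-up scores that two users gave to the same five posts. Pearson's r equals the cosine of the mean-centered vectors, while the cosine of the raw, uncentered vectors is a different number.

```python
import math
from statistics import mean, pstdev

def cosine(a, b):
    """Normalized dot product of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pearson(x, y):
    """Population covariance divided by the product of population standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def center(v):
    """Subtract the mean so the entries sum to zero."""
    m = mean(v)
    return [x - m for x in v]

# Hypothetical scores two users gave the same five posts.
alice = [5.0, 3.0, 4.0, 1.0, 2.0]
bob = [4.0, 2.0, 5.0, 2.0, 1.0]

r = pearson(alice, bob)
centered_cos = cosine(center(alice), center(bob))
raw_cos = cosine(alice, bob)
print(f"pearson={r:.4f} centered_cosine={centered_cos:.4f} raw_cosine={raw_cos:.4f}")
# pearson=0.7698 centered_cosine=0.7698 raw_cosine=0.9535
```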
Retr0id•9h ago
Very high quality "Recommended repos for you" results, the top one was in fact a repo I was looking for a couple of days ago but did not successfully find.

I just wish I could scroll further down the "Similar to you" list.

ramoz•9h ago
I would like the weighting to be stronger (e.g. toward newness; I'm still getting fairly stale recs), otherwise yes, very cool.
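One way recency could be folded in, purely as a hypothetical sketch (the app's actual weighting isn't described in the thread): give each star an exponentially decaying weight based on when it was made, then take the weighted mean of the starred repos' vectors. `half_life_days`, the repo names, and the dates are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List, Sequence, Tuple

def recency_weighted_embedding(
    stars: Sequence[Tuple[str, datetime]],
    repo_vecs: Dict[str, Sequence[float]],
    now: datetime,
    half_life_days: float = 180.0,
) -> List[float]:
    """Weighted mean of starred-repo vectors; a star's weight halves every
    `half_life_days` (a hypothetical knob, not something the app exposes)."""
    total: List[float] = []
    weight_sum = 0.0
    for repo, starred_at in stars:
        vec = repo_vecs.get(repo)
        if vec is None:
            continue  # skip repos without an embedding
        age_days = (now - starred_at).total_seconds() / 86400.0
        w = 0.5 ** (age_days / half_life_days)
        if not total:
            total = [0.0] * len(vec)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("no starred repo has an embedding")
    return [x / weight_sum for x in total]

utc = timezone.utc
now = datetime(2026, 1, 10, tzinfo=utc)
stars = [
    ("old/repo", datetime(2024, 1, 10, tzinfo=utc)),  # starred two years ago
    ("new/repo", datetime(2026, 1, 3, tzinfo=utc)),   # starred last week
]
vecs = {"old/repo": [1.0, 0.0], "new/repo": [0.0, 1.0]}
print(recency_weighted_embedding(stars, vecs, now))
```

With a 180-day half-life, the two-year-old star carries only about 6% of the weight of last week's star, so the result lands close to `new/repo`'s vector; a very large half-life recovers the plain average.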
keeganpoppen•9h ago
i second the quality. really uncanny.
jbl0ndie•9h ago
Excellent. Found me three other stars, and one, too, that I knew from before but hadn't starred. Nice!
embedding-shape•9h ago
It seems to generate pretty good "Recommended repos for you" suggestions, all of which I've heard of and seen before but, for one reason or another, never used or found a need for. It would be great if it could show more than just 10, because I'm sure further down the list it'd have interesting suggestions I hadn't seen before.
lostmsu•9h ago
Sounds like it actually generates poor suggestions for the reason you are describing. For me, it exclusively suggested repos I've already seen, but did not like.
travisjungroth•8h ago
This seems like an inherent challenge to recommending based on stars. Stars are very sparse, so there's little "didn't star this" signal, and there's no "thumbs down".

So you’re left with things you “should” star, but there very well could be a reason you didn’t.

armcat•9h ago
This is so nice, it's essentially a collaborative filter (like Spotify recommendations). It would be awesome to try and embed your repos directly, using some LLM embedding like `text-embedding-3-large` and use that either directly or as a re-ranking/scaling mechanism in the recommendation. You might unearth some other interesting repos or people that are doing similar projects but not necessarily starring similar repos.
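A hedged sketch of the re-ranking idea as a simple linear blend. It assumes text embeddings for each candidate repo and for the user's own repos already exist (from `text-embedding-3-large` or any other model). The vectors below are toy stand-ins, and `text_weight` is a hypothetical parameter.

```python
import math
from typing import Dict, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rerank(
    cf_scores: Dict[str, float],
    text_vecs: Dict[str, Sequence[float]],
    user_text_vec: Sequence[float],
    text_weight: float = 0.3,
) -> List[str]:
    """Blend the collaborative-filtering (stars) score with a content score from
    text embeddings; `text_weight` is a hypothetical tuning knob."""
    def blended(repo: str) -> float:
        content = cosine(text_vecs[repo], user_text_vec)
        return (1 - text_weight) * cf_scores[repo] + text_weight * content
    return sorted(cf_scores, key=blended, reverse=True)

# Toy stand-ins for README embeddings and an embedding of the user's own repos.
cf = {"org/a": 0.80, "org/b": 0.78}
text = {"org/a": [0.0, 1.0], "org/b": [1.0, 0.0]}
mine = [1.0, 0.0]
print(rerank(cf, text, mine))                   # ['org/b', 'org/a']
print(rerank(cf, text, mine, text_weight=0.0))  # ['org/a', 'org/b']
```

The stars-based score keeps the collaborative signal, while the content term can surface repos that are textually close to your own work even if few people with your stars have starred them.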
armcat•9h ago
It would be a good idea to filter out those repos I actually starred - because they are getting a 100% hit (of course they are!).
ComputerGuru•9h ago
99% match to Graydon Hoare and 97% to burntsushi. Could do worse!
lostmsu•9h ago
Yeah, but the matches are not symmetric. You are probably not in the matches for them.
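Why this can happen even though cosine similarity itself is symmetric: "your top match" is a nearest-neighbour relation, and that relation is not symmetric. In this toy example with hypothetical 2-d profiles, `dev_a` is the closest profile to `you`, but `dev_a`'s own closest profile is `dev_b`.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(name, profiles):
    """Most similar *other* profile by cosine similarity."""
    return max(
        (other for other in profiles if other != name),
        key=lambda other: cosine(profiles[name], profiles[other]),
    )

# Hypothetical 2-d profile embeddings.
profiles = {
    "you": [1.0, 0.10],
    "dev_a": [1.0, 0.30],
    "dev_b": [1.0, 0.35],
}

print(nearest("you", profiles))    # dev_a
print(nearest("dev_a", profiles))  # dev_b (not "you")
```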
ComputerGuru•8h ago
That explains it. I was curious because Rust is probably only about half my list.
aeonik•4h ago
I am also a 99% match to Graydon Hoare.

Makes me wonder if there is something in his stars that is skewing the results.

keeganpoppen•9h ago
this is amazing! i am a bit of a github star enjoyer, and have always wanted something like this. thank you! it looks like for now you take the most recent 500 stars? i have a bit over 1k (i think?), so i would love the 2.1x on that constraint, but completely understand any desire to not do that. fun project! :)
swyx•8h ago
the frontend is beautiful. i find it inspiring that you have 10 years of data science and are no longer limited by your lack of frontend or design knowledge. this is a better site than i could've done
dmezzetti•8h ago
Nice application, great work!
6r17•7h ago
Ok so i've not been using GitHub for the past 2 years; it matched me close to Salvatore Sanfilippo with the subtitle "creator of redis", and it just happens that I did write a key-value store, and more generally have been working on a database, in the meantime.

I don't know how to feel about this lmao

herdrick•7h ago
Good stuff. Are star count and forks etc. the criteria for inclusion of repos? Lots of repos result in "Repository not found".
andriamanitra•6h ago
That's really neat! I found a bunch of cool repositories I had never heard of by looking up my username and a few of my favorite projects.
Lerc•3h ago
I get repo not found for a lot of my things. Like https://github.com/Lerc/stackie just says

"Error: Repository not found: Lerc/stackie"

Imustaskforhelp•1h ago
https://puzer.github.io/github_recommender/#p=eyJ0IjoicHJvZm...

I have around 1,800 projects starred. Mostly it's because I once lost my bookmarks, and a lot of GitHub projects with them, so I decided to use stars as my bookmarks, or even for whatever I was feeling at the time. I have starred some 100 projects or so just because I thought they were interesting enough and nobody had starred them, so as to show my support.

Support is another aspect: I really like to share my support, and I feel like even these tiny actions, at scale, really help these projects, whether by gaining legitimacy or otherwise.

I have been such a star fanatic that I even opened a GitHub issue about who has starred the most projects, just to have a clear reference.

I have even downloaded the README.md files of all the GitHub projects I have starred and vibe-coded a simple HTML project so that I can browse them manually and search them, similar to Algolia, you could say.

Oh, btw, there are some gists that can help you list all the stars of a GitHub user; I used one to get the star list (the list of repos), then downloaded all their README.md files and converted them as such. It's on my other computer, but I should probably back it up as well.
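The commenter's gists aren't linked, so this is a separate minimal sketch of the same idea against GitHub's REST endpoint `GET /users/{username}/starred`, which is paginated with `per_page` (max 100) and `page`. The real network call is unauthenticated and therefore subject to a low rate limit. `fake_fetch` is a hypothetical offline stub so the pagination and the optional cap (similar in spirit to the 500-star limit a commenter guessed at) can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, List, Optional

STARRED_URL = "https://api.github.com/users/{user}/starred?per_page={per_page}&page={page}"

def http_get_json(url: str) -> list:
    """Unauthenticated GET against the GitHub REST API (low rate limit)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def list_starred(
    user: str,
    fetch: Callable[[str], list] = http_get_json,
    per_page: int = 100,
    limit: Optional[int] = None,
) -> List[str]:
    """Page through a user's starred repos and return their `full_name`s,
    stopping at a short page or once `limit` names have been collected."""
    names: List[str] = []
    page = 1
    while True:
        batch = fetch(STARRED_URL.format(user=user, per_page=per_page, page=page))
        names.extend(repo["full_name"] for repo in batch)
        if limit is not None and len(names) >= limit:
            return names[:limit]
        if len(batch) < per_page:
            return names
        page += 1

def fake_fetch(url: str) -> list:
    """Hypothetical offline stand-in that pretends the user has 250 stars."""
    page = int(url.rsplit("page=", 1)[1])
    start = (page - 1) * 100
    return [{"full_name": f"owner/repo{i}"} for i in range(start, min(start + 100, 250))]

everything = list_starred("someone", fetch=fake_fetch)
capped = list_starred("someone", fetch=fake_fetch, limit=120)
print(len(everything), len(capped))  # 250 120
```

To run it for real, call `list_starred("<username>")` with the default fetcher; each returned `full_name` can then be used to request that repo's README.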

I wish there was something like GitHub stars for the web as a whole. Yes, bookmarks exist, but I mean a more public form of bookmarks, similar to GitHub stars, without monetizing up front (yes, I know they are doing AI shenanigans in the background).

GitHub is still an okay platform, so much so that nowadays I am thinking of uploading media to GitHub wikis for projects instead of YouTube, especially for open source projects. Plus, even GitHub wikis can be downloaded via git, whereas YouTube does everything in its control to stop you from downloading, so much so that they recently made some changes and now even yt-dlp requires Deno or a Node/npm engine, and the solution is always hacky, a cat-and-mouse game of sorts.

I don't think there are any services that can provide the amount of free bandwidth GitHub provides in the way it does. Sure, one can get

To be honest, if someone wants, they can probably use OVH or UpCloud, which have unlimited egress with a fair-use policy.

That fair-use policy, though, basically means your server first gets high-bandwidth access (I think around 1 Gbps or 500 Mbps), but then they cap it to something like 100 Mbps, and OVH can throttle.

UpCloud has an extremely high fair-use threshold, around 24 TB I think, after which they throttle a 1 Gbps connection to 100 Mbps, which on many VPSs could be the top speed anyway, and 100 Mbps ain't bad.
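Back-of-the-envelope arithmetic for these figures, which are the commenter's recollection rather than verified provider terms: transferring 24 TB at a sustained 1 Gbps takes a little over two days, and even a throttled 100 Mbps, if sustained, moves roughly 32 TB in 30 days (decimal units throughout).

```python
def hours_to_transfer(terabytes: float, mbps: float) -> float:
    """Hours needed to move `terabytes` (decimal TB) at a sustained `mbps` (megabits/s)."""
    bits = terabytes * 1e12 * 8
    return bits / (mbps * 1e6) / 3600

def tb_per_month(mbps: float, days: float = 30) -> float:
    """Decimal TB moved in `days` at a sustained `mbps` (megabits/s)."""
    bytes_per_second = mbps * 1e6 / 8
    return bytes_per_second * days * 86400 / 1e12

print(f"{hours_to_transfer(24, 1000):.1f} h")  # 24 TB at 1 Gbps -> 53.3 h
print(f"{tb_per_month(100):.1f} TB")           # 100 Mbps for 30 days -> 32.4 TB
```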

Pardon me for this, but I asked ChatGPT, and it seems that Civo provides completely unrestricted bandwidth:

Extra Small 1 GB 1 core 30GB NVMe FREE $5.43

UpCloud's is around 3.50 euros for the same thing, but if your project is using even more than 24 TB and you want other options, there are always options.

So, in a sense, there just isn't a point in self-hosting either, and I feel like GitHub can be the free alternative that people transition to from YouTube.

Just me rambling, but I feel like in the early days YouTube used one of these deals to get its bandwidth as well. I feel as if there are companies that can do that too; YouTube is moving in a backwards direction, and things like PeerTube on the fediverse, with genuinely unlimited bandwidth, are very much possible for very cheap.

YouTube's monopoly exists only as much as we allow it; it's a monopoly of channels and viewers. Architecturally, it's not that big an issue, as I mentioned previously.

EDIT: Looks like I got sidetracked, but overall I am really impressed by your project; it's really good, kudos!

“Erdos problem #728 was solved more or less autonomously by AI”

https://mathstodon.xyz/@tao/115855840223258103
316•cod1r•6h ago•199 comments

OLED Not for Me

https://nuxx.net/blog/2026/01/09/oled-not-for-me/
10•c0nsumer•55m ago•11 comments

Maine's black market for baby eels

https://www.pressherald.com/2025/09/09/maines-black-market-for-baby-eels-is-spawning-a-crime-thri...
15•noleary•1h ago•2 comments

Flock Hardcoded the Password for America's Surveillance Infrastructure 53 Times

https://nexanet.ai/blog/53-times-flocksafety-hardcoded-the-password-for-americas-surveillance-inf...
347•fuck_flock•11h ago•115 comments

JavaScript Demos in 140 Characters

https://beta.dwitter.net
209•themanmaran•9h ago•48 comments

RTX 5090 and Raspberry Pi: Can it game?

https://scottjg.com/posts/2026-01-08-crappy-computer-showdown/
183•scottjg•9h ago•72 comments

Greenland sharks maintain vision for centuries through DNA repair mechanism

https://phys.org/news/2026-01-eye-greenland-sharks-vision-centuries.html
33•pseudolus•3d ago•5 comments

How Markdown took over the world

https://www.anildash.com/2026/01/09/how-markdown-took-over-the-world/
185•zdw•10h ago•151 comments

How will the miracle happen today?

https://kk.org/thetechnium/how-will-the-miracle-happen-today/
394•zdw•5d ago•214 comments

Show HN: Rocket Launch and Orbit Simulator

https://www.donutthejedi.com/
108•donutthejedi•9h ago•34 comments

Show HN: Scroll Wikipedia like TikTok

https://quack.sdan.io
192•sdan•10h ago•51 comments

Robotopia: A 3D, first-person, talking simulator

https://elbowgreasegames.substack.com/p/introducing-robotopia-a-3d-first
30•psawaya•1d ago•10 comments

Scientists discover oldest poison, on 60k-year-old arrows

https://www.nytimes.com/2026/01/07/science/poison-arrows-south-africa.html
104•noleary•1d ago•36 comments

Cloudflare CEO on the Italy fines

https://twitter.com/eastdakota/status/2009654937303896492
451•sidcool•12h ago•642 comments

Favorite Tech Museums

https://aresluna.org/fav-tech-museums/
22•justincormack•4d ago•12 comments

Show HN: Miditui – a terminal app/UI for MIDI composing, mixing, and playback

https://github.com/minimaxir/miditui
17•minimaxir•1d ago•2 comments

Start your meetings at 5 minutes past

https://philipotoole.com/start-your-meetings-at-5-minutes-past/
37•otoolep•6h ago•57 comments

My article on why AI is great (or terrible) or how to use it

https://matthewrocklin.com/ai-zealotry/
88•akshayka•10h ago•137 comments

The rise and fall of the company behind Reader Rabbit (2018)

https://theoutline.com/post/6293/reader-rabbit-history-the-learning-company-zoombinis-carmen-sand...
10•mmcclure•1d ago•2 comments

Changes to Android Open Source Project

https://source.android.com/
5•TechTechTech•2d ago•1 comments

Kagi releases alpha version of Orion for Linux

https://help.kagi.com/orion/misc/linux-status.html
362•HelloUsername•15h ago•253 comments

Show HN: I made a memory game to teach you to play piano by ear

https://lend-me-your-ears.specr.net
435•vunderba•11h ago•156 comments

Deno has made its PyPI distribution official

https://github.com/denoland/deno/issues/31254
33•zahlman•7h ago•24 comments

Replit (YC W18) Is Hiring

https://jobs.ashbyhq.com/replit
1•amasad•10h ago

Show HN: Yellopages – New tab Chrome extension

https://yellopages.kawaicheung.io/
13•kiwigod17•1d ago•3 comments

QtNat – Open you port with Qt UPnP

http://renaudguezennec.eu/index.php/2026/01/09/qtnat-open-you-port-with-qt/
40•jandeboevrie•8h ago•33 comments

How to store a chess position in 26 bytes (2022)

https://ezzeriesa.notion.site/How-to-store-a-chess-position-in-26-bytes-using-bit-level-magic-df1...
83•kurinikku•13h ago•71 comments

Show HN: Similarity = cosine(your_GitHub_stars, Karpathy) Client-side

https://puzer.github.io/github_recommender/
127•puzer•3d ago•35 comments

Show HN: A website that auctions itself daily

https://www.thedailyauction.com/
26•nsomani•1d ago•8 comments

How to code Claude Code in 200 lines of code

https://www.mihaileric.com/The-Emperor-Has-No-Clothes/
728•nutellalover•1d ago•226 comments