frontpage.

Code review can be better

https://tigerbeetle.com/blog/2025-08-04-code-review-can-be-better/
59•sealeck•2h ago•12 comments

SK hynix dethrones Samsung as world’s top DRAM maker

https://koreajoongangdaily.joins.com/news/2025-08-15/business/tech/Thanks-Nvidia-SK-hynix-dethrones-Samsung-as-worlds-top-DRAM-maker-for-first-time-in-over-30-years/2376834
37•ksec•3d ago•2 comments

Show HN: I was curious about spherical helix, ended up making this visualization

https://visualrambling.space/moving-objects-in-3d/
612•damarberlari•11h ago•111 comments

Gemma 3 270M re-implemented in pure PyTorch for local tinkering

https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/12_gemma3
297•ModelForge•11h ago•46 comments

A statistical analysis of Rotten Tomatoes

https://www.statsignificant.com/p/is-rotten-tomatoes-still-reliable
19•m463•1h ago•3 comments

How to stop feeling lost in tech: the wafflehouse method

https://www.yacinemahdid.com/p/how-to-stop-feeling-lost-in-tech
3•research_pie•22m ago•0 comments

Why are anime catgirls blocking my access to the Linux kernel?

https://lock.cmpxchg8b.com/anubis.html
262•taviso•10h ago•308 comments

Show HN: PlutoPrint – Generate PDFs and PNGs from HTML with Python

https://github.com/plutoprint/plutoprint
82•sammycage•5h ago•17 comments

Launch HN: Channel3 (YC S25) – A database of every product on the internet

85•glawrence13•10h ago•55 comments

Introduction to AT Protocol

https://mackuba.eu/2025/08/20/introduction-to-atproto/
130•psionides•6h ago•66 comments

Visualizing distributions with pepperoni pizza and JavaScript

https://ntietz.com/blog/visualizing-distributions-with-pepperoni-pizza/
5•cratermoon•2d ago•0 comments

Zedless: Zed fork focused on privacy and being local-first

https://github.com/zedless-editor/zed
371•homebrewer•7h ago•223 comments

An Update on Pytype

https://github.com/google/pytype
146•mxmlnkn•8h ago•48 comments

SimpleIDE

https://github.com/jamesplotts/simpleide
20•impendingchange•2h ago•18 comments

Show HN: Luminal – Open-source, search-based GPU compiler

https://github.com/luminal-ai/luminal
85•jafioti•9h ago•44 comments

Coris (YC S22) Is Hiring

https://www.ycombinator.com/companies/coris/jobs/rqO40yy-ai-engineer
1•smaddali•4h ago

Pixel 10 Phones

https://blog.google/products/pixel/google-pixel-10-pro-xl/
343•gotmedium•8h ago•652 comments

Sequoia backs Zed

https://zed.dev/blog/sequoia-backs-zed
288•vquemener•13h ago•188 comments

OPA maintainers and Styra employees hired by Apple

https://blog.openpolicyagent.org/note-from-teemu-tim-and-torin-to-the-open-policy-agent-community-2dbbfe494371
113•crcsmnky•10h ago•42 comments

Vibe coding creates a bus factor of zero

https://www.mindflash.org/coding/ai/ai-and-the-bus-factor-of-0-1608
139•AntwaneB•4h ago•75 comments

Tidewave Web: in-browser coding agent for Rails and Phoenix

https://tidewave.ai/blog/tidewave-web-phoenix-rails
261•kieloo•16h ago•47 comments

Visualizing GPT-OSS-20B embeddings

https://melonmars.github.io/LatentExplorer/embedding_viewer.html
68•melonmars•3d ago•20 comments

Closer to the Metal: Leaving Playwright for CDP

https://browser-use.com/posts/playwright-to-cdp
140•gregpr07•10h ago•97 comments

Lean proof of Fermat's Last Theorem [pdf]

https://imperialcollegelondon.github.io/FLT/blueprint.pdf
69•ljlolel•7h ago•45 comments

Learning about GPUs through measuring memory bandwidth

https://www.evolvebenchmark.com/blog-posts/learning-about-gpus-through-measuring-memory-bandwidth
42•JasperBekkers•11h ago•4 comments

AWS in 2025: Stuff you think you know that's now wrong

https://www.lastweekinaws.com/blog/aws-in-2025-the-stuff-you-think-you-know-thats-now-wrong/
272•keithly•10h ago•170 comments

Mirrorshades: The Cyberpunk Anthology (1986)

https://www.rudyrucker.com/mirrorshades/HTML/
142•keepamovin•17h ago•84 comments

Understanding Moravec's Paradox

https://hexhowells.com/posts/moravecs-paradox.html
16•hexhowells•3d ago•1 comment

The Rise and Fall of Music Ringtones: A Statistical Analysis

https://www.statsignificant.com/p/the-rise-and-fall-of-music-ringtones
49•gmays•3d ago•70 comments

Linear scan register allocation on SSA

https://bernsteinbear.com/blog/linear-scan/
32•surprisetalk•3d ago•3 comments

Visualizing GPT-OSS-20B embeddings

https://melonmars.github.io/LatentExplorer/embedding_viewer.html
68•melonmars•3d ago

Comments

kingstnap•6h ago
It's an interesting looking plot I suppose.

My guess is it's the two largest principal components of the embedding.

But none of the points are labelled? There isn't a writeup on the page or anything?

jablongo•6h ago
Usually PCA doesn't look quite like this, so this was likely done using t-SNE or UMAP, which are non-parametric embeddings (they optimize a loss by modifying the embedded points directly). I can see labels if I mouse over the dots.
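
For reference, the PCA half of that comparison is simple enough to sketch in pure Python via power iteration. This is a toy on made-up 2-D data, not the page's actual pipeline (which is undocumented); a real run would use a library PCA or t-SNE/UMAP on the model's embedding matrix:

```python
# Toy sketch (stdlib only): find the top principal component of a small
# point cloud via power iteration, then project onto it.
import random

def top_component(rows, iters=200):
    """Power iteration on the (unnormalized) covariance of mean-centered rows."""
    dim = len(rows[0])
    means = [sum(col) / len(rows) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    random.seed(0)
    v = [random.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # w = (X^T X) v, computed as X^T (X v)
        proj = [sum(x * vi for x, vi in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v, centered

points = [[1.0, 0.9], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]]  # roughly on y = x
v, centered = top_component(points)
# 1-D coordinates along the principal direction
coords = [sum(x * vi for x, vi in zip(row, v)) for row in centered]
```

For data lying near a line like this, the recovered direction is close to the diagonal and the 1-D coordinates preserve the ordering of the points, which is the property t-SNE/UMAP deliberately trade away in favor of local neighborhood structure.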
terhechte•6h ago
I can see the labels when I hover with the pointer
graphviz•6h ago
What do people learn from visualizations like this?

What is the most important problem anyone has solved this way?

Speaking as somewhat of a co-defendant.

jablongo•6h ago
It lets you inspect what actually constitutes a given cluster. For example, it seems like the outer clusters are variations of individual words and their direct translations rather than synonyms (the ones I saw, at least).
minimaxir•5h ago
Not everything has to be directly informative or solve a problem. Sometimes data visualization can look pretty for pretty's sake.

Dimensionality reduction/clustering like this may be less useful for identifying trends in token embeddings, but for other types of embeddings it's extremely useful.

TuringNYC•4h ago
> What do people learn from visualizations like this?

Applying the embedding model to some dataset of your own interest, and then building a similar visualization, is where it gets cool, because you can visually look at clusters and draw conclusions about the closeness of items in your own dataset.

_def•6h ago
I have the suspicion that this is how GPT-OSS-20B would generate a visualization of its embeddings. Happy to learn otherwise.
eddywebs•6h ago
Cool! Would it be possible to generate visualizations of any given open-weight model out there?
minimaxir•5h ago
Yes, it's just yoinking the weights out of the embeddings layer.
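
A minimal sketch of that, assuming the Hugging Face transformers API and the `openai/gpt-oss-20b` checkpoint id — the page itself doesn't document how it got the weights:

```python
# Hedged sketch: pull the token-embedding matrix out of an open-weight
# model via Hugging Face transformers. Swap in any causal-LM checkpoint id.
def embedding_matrix(model_id="openai/gpt-oss-20b"):
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    # nn.Embedding holding one row per vocabulary token: (vocab_size, hidden_dim)
    return model.get_input_embeddings().weight.detach()

if __name__ == "__main__":
    # Heavy: downloads the full checkpoint. Any 2-D projection (PCA/UMAP/...)
    # of these rows gives a plot like the one on the page.
    print(embedding_matrix().shape)
```
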
numpad0•6h ago
Is this handling Unicode correctly? Seems like even a lot of Latin-alphabet text is getting mangled.
int_19h•4h ago
It looks like it's not handling UTF-8 at all and is displaying it as if it were Latin-1.
mkl•3h ago
I don't think it's actually UTF-8. The data is at https://melonmars.github.io/LatentExplorer/embeddings_2d.jso... and contains things like

  "\u00e0\u00a7\u012d\u00e0\u00a6\u013e"
with some characters > 0xff (but none above 0x0143, weirdly).
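
One guess, not confirmed anywhere on the page: those look like GPT-2-style byte-level BPE token strings, where each raw byte is mapped to a printable code point and the 68 "unprintable" bytes are remapped to U+0100–U+0143, which would explain the 0x0143 ceiling exactly. Reversing that mapping on the sample token yields valid UTF-8:

```python
# Sketch: invert the GPT-2 byte-level BPE byte->unicode table, then
# re-decode the recovered bytes as UTF-8.
def bytes_to_unicode():
    # Printable bytes map to themselves; the remaining 68 map to 256, 257, ...
    bs = list(range(ord("!"), ord("~") + 1)) + \
         list(range(ord("\u00a1"), ord("\u00ac") + 1)) + \
         list(range(ord("\u00ae"), ord("\u00ff") + 1))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)  # last one lands at 256 + 67 = 0x143
            n += 1
    return {chr(c): b for b, c in zip(bs, cs)}  # unicode char -> raw byte

def detokenize(token):
    table = bytes_to_unicode()
    return bytes(table[ch] for ch in token).decode("utf-8")

sample = "\u00e0\u00a7\u012d\u00e0\u00a6\u013e"
decoded = detokenize(sample)  # two Bengali characters, U+09CB and U+099C
```

If that's right, the data isn't mojibake at all — it's the tokenizer's internal byte-to-printable representation, and the viewer just isn't undoing it before display.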
ashvardanian•6h ago
Any good comparisons of traditional embedding models against embeddings derived from autoregressive language models?
minimaxir•4h ago
They are incomparable. Token embeddings generated with something like word2vec worked well because the networks are shallow, and therefore the learned semantic data can be contained solely and independently within the embeddings themselves. Token embeddings as part of an LLM (e.g. gpt-oss-20b) are conditioned on said LLM and do not have fully independent learned data, although, as shown here, some relationships can still be preserved.

Embeddings derived from autoregressive language models apply full attention mechanisms to get something different entirely.

lawlessone•4h ago
what does it mean that some embeddings are close to others in this space?

That they're related or connected, or is it arbitrary?

Why does it look like a fried egg?

edit: must be related in some way as one of the "droplets" in the bottom left quadrant seems to consist of various versions of the word "parameter"

minimaxir•4h ago
Typically these algorithms cluster by similarity (either Euclidean or cosine).

The density of the clusters tends to follow trends. In this case, the "yolk" has a lot of bizarre Unicode tokens.
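
The two metrics aren't interchangeable, either. A tiny made-up illustration of how they can disagree about which neighbor is "closest" when vector norms differ:

```python
# Euclidean distance vs cosine similarity on toy 2-D vectors.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

q = [1.0, 0.0]
short_same_dir = [0.1, 0.0]  # same direction as q, much smaller norm
long_off_dir = [1.0, 0.4]    # slightly rotated, similar norm

# Cosine prefers the same-direction vector...
assert cosine(q, short_same_dir) > cosine(q, long_off_dir)
# ...while Euclidean prefers the one with the similar norm.
assert euclidean(q, long_off_dir) < euclidean(q, short_same_dir)
```

This is why normalizing embeddings to unit length (after which the two orderings agree) matters before reading anything into cluster shapes.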

esafak•4h ago
Without a way to tune it, this visualization is as much about the dimensionality-reduction algorithm used as about the embeddings themselves, because trade-offs are unavoidable when you go from a very high-dimensional space to a 2D one. I would not read too much into it.
promiseofbeans•1h ago
This demo is a lot more useful for comparing word embeddings: https://www.cs.cmu.edu/~dst/WordEmbeddingDemo/index.html

You can choose which dimensions to show, pick which embeddings to show, and play with vector maths between them in a visual way

It doesn't show the whole set of embeddings, though I am sure someone could fix that, as well as adapting it to use the gpt-oss model instead of the custom (?) mini set it uses.
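
The "vector maths" that demo exposes is the classic analogy-by-offset trick. A toy sketch with invented 3-D vectors (not real embeddings from any model):

```python
# Analogy by vector offset: king - man + woman ~ queen.
# The vectors below are hand-made toys chosen to make the analogy work.
vocab = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def nearest(vec, exclude):
    """Closest vocab word to vec (squared Euclidean), skipping the inputs."""
    def dist(w):
        return sum((x - y) ** 2 for x, y in zip(vocab[w], vec))
    return min((w for w in vocab if w not in exclude), key=dist)

target = add(sub(vocab["king"], vocab["man"]), vocab["woman"])
result = nearest(target, exclude={"king", "man", "woman"})
```

With real word embeddings the same arithmetic works only sometimes, which is exactly what a demo like the CMU one lets you explore interactively.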

voodooEntity•4h ago
@Author, I would recommend giving

https://github.com/vasturiano/3d-force-graph

a try. For the text labels, you can use

https://github.com/vasturiano/three-spritetext

It's based on Three.js and creates great GPU-rendered (WebGL) 3D graph visualisations. This could make it a lot more interesting to watch because it could display actual depth (your GPU is gonna run hot, but I guess it's worth it).

Just a suggestion.