frontpage.

Language Models Pack Billions of Concepts into 12,000 Dimensions

https://nickyoder.com/johnson-lindenstrauss/
77•lawrenceyan•3h ago•30 comments

Grapevine canes can be converted into plastic-like material that will decompose

https://www.sdstate.edu/news/2025/08/can-grapevines-help-slow-plastic-waste-problem
293•westurner•9h ago•182 comments

Betty Crocker broke recipes by shrinking boxes

https://www.cubbyathome.com/boxed-cake-mix-sizes-have-shrunk-80045058
334•Avshalom•9h ago•341 comments

Starlink is currently experiencing a service outage

https://www.starlink.com/
50•thallium205•2h ago•20 comments

Omarchy on CachyOS

https://github.com/mroboff/omarchy-on-cachyos
18•theYipster•2h ago•6 comments

A qualitative analysis of pig-butchering scams

https://arxiv.org/abs/2503.20821
58•stmw•3h ago•9 comments

Which NPM package has the largest version number?

https://adamhl.dev/blog/largest-number-in-npm-package/
67•genshii•4h ago•23 comments

Which colours dominate movie posters and why?

https://stephenfollows.com/p/which-colours-dominate-movie-posters-and-why
84•FromTheArchives•2d ago•12 comments

Celestia – real-time 3D visualization of space

https://celestiaproject.space/
24•LordNibbler•2h ago•3 comments

Show HN: Dagger.js – A buildless, runtime-only JavaScript micro-framework

https://daggerjs.org
49•TonyPeakman•6h ago•32 comments

PythonBPF – Writing eBPF Programs in Pure Python

https://xeon.me/gnome/pythonbpf/
13•JNRowe•2d ago•0 comments

Repetitive negative thinking associated with cognitive decline in older adults

https://bmcpsychiatry.biomedcentral.com/articles/10.1186/s12888-025-06815-2
428•redbell•19h ago•168 comments

Analyzing the memory ordering models of the Apple M1

https://www.sciencedirect.com/science/article/pii/S1383762124000390
71•charles_irl•3d ago•14 comments

OCSP Service Has Reached End of Life

https://letsencrypt.org/2025/08/06/ocsp-service-has-reached-end-of-life
170•pfexec•11h ago•48 comments

For Good First Issue – A repository of social impact and open source projects

https://forgoodfirstissue.github.com/
28•Brysonbw•5h ago•6 comments

Learning Lens Blur Fields

https://blur-fields.github.io/
24•bookofjoe•3d ago•1 comment

Page Object (2013)

https://martinfowler.com/bliki/PageObject.html
25•adityaathalye•4d ago•7 comments

Titania Programming Language

https://github.com/gingerBill/titania
74•MaximilianEmel•9h ago•17 comments

Death to Type Classes

https://jappie.me/death-to-type-classes.html
11•zeepthee•2d ago•1 comment

You’re a slow thinker. Now what?

https://chillphysicsenjoyer.substack.com/p/youre-a-slow-thinker-now-what
367•sebg•4d ago•150 comments

Americans Crushed by Auto Loans as Defaults and Repossessions Surge

https://www.carscoops.com/2025/09/auto-loan-delinquencies-are-off-the-dial-and-even-prime-borrowe...
33•toomuchtodo•2h ago•34 comments

Why We Spiral

https://behavioralscientist.org/why-we-spiral/
281•gmays•16h ago•75 comments

Writing an operating system kernel from scratch

https://popovicu.com/posts/writing-an-operating-system-kernel-from-scratch/
275•Bogdanp•15h ago•50 comments

Trigger Crossbar

https://serd.es/2025/09/14/Trigger-crossbar.html
60•zdw•9h ago•8 comments

AMD Turin PSP binaries analysis from open-source firmware perspective

https://blog.3mdeb.com/2025/2025-09-11-gigabyte-mz33-ar1-blob-analysis/
46•pietrushnic•9h ago•6 comments

Nicu's test website made with SVG (2007)

https://svg.nicubunu.ro/
150•caminanteblanco•16h ago•85 comments

Decentralized YouTube alternative adds livestream scheduling in new release

https://news.itsfoss.com/peertube-7-3/
63•MilnerRoute•4h ago•14 comments

Introduction to GrapheneOS

https://dataswamp.org/~solene/2025-01-12-intro-to-grapheneos.html
173•renehsz•4d ago•168 comments

Read to forget

https://mo42.bearblog.dev/read-to-forget/
186•diymaker•18h ago•47 comments

Gentoo AI Policy

https://wiki.gentoo.org/wiki/Project:Council/AI_policy
137•simonpure•8h ago•116 comments

Language Models Pack Billions of Concepts into 12,000 Dimensions

https://nickyoder.com/johnson-lindenstrauss/
72•lawrenceyan•3h ago

Comments

bigdict•2h ago
What's the point of the relu in the loss function? Its inputs are nonnegative anyway.
Nevermark•1h ago
Let's try to keep things positive.
GolDDranks•1h ago
I wondered the same. Seems like it would just make a V-shaped loss around the zero, but abs has that property already!
fancyfredbot•6m ago
RELU would have made it flat below zero ( _/ not \/). Adding the abs first just makes RELU do nothing.
fancyfredbot•9m ago
I thought the belt and braces approach was a valuable contribution to AI safety. Better safe than sorry with these troublesome negative numbers!
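A minimal NumPy sketch of the point above, with toy inputs rather than the article's actual loss: composing ReLU with abs changes nothing, since abs already maps everything onto the nonnegative half-line.

    import numpy as np

    x = np.linspace(-3.0, 3.0, 7)                 # sample inputs, including negatives
    abs_then_relu = np.maximum(np.abs(x), 0.0)    # relu(abs(x))
    just_abs = np.abs(x)

    # relu(abs(x)) == abs(x) for every input: the relu never clips anything
    print(np.allclose(abs_then_relu, just_abs))   # True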
js8•1h ago
You can also imagine a similar thing on binary vectors. There, two vectors are "orthogonal" if they share no bits that are set to one. So you can encode a huge number of concepts using only a small number of bits in modestly sized vectors, and most of them will be orthogonal.
phreeza•1h ago
If they are only orthogonal if they share no bits that are set to one, only one vector, the complement, will be orthogonal, no?
yznovyak•1h ago
I don't think so. For n=3 you can have 000, 001, 010, 100. All 4 (n+1) are pairwise orthogonal. However, I don't think js8 is correct: it looks like among the 2^n bit vectors you can't have more than n+1 that are mutually orthogonal, since if any vector has a 1 in some position, no other vector can have a 1 in the same position.
prerok•1h ago
Hmm, I think one correction: is (0,0,0) actually a vector? I think that, by definition, an n-dimensional space can have at most n vectors which are all orthogonal to one another.
js8•46m ago
It's not quite correct to call them orthogonal, because the definition I used isn't a dot product. But that aside, yes, an orthogonal basis can only have as many elements as dimensions. The article also mentions that, and then introduces "quasi-orthogonality", which means the dot product is not zero but very small. On bitstrings, that would correspond to an overlap on only a small number of bits. I should have been clearer in my offhand remark. :-)
asplake•1h ago
By the original definition, they can share bits that are set to zero and still be orthogonal. Think of the bits as basis vectors – if they have none in common, they are orthogonal.
js8•1h ago
For example, 1010 and 0101 are orthogonal, but 1010 and 0011 are not (share the 3rd bit). Though calling them orthogonal is not quite right.
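A quick sketch of this small-overlap behaviour with made-up sizes (sparse random bitstrings in 12,000 dimensions; nothing here comes from an actual model):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, k = 12_000, 1_000, 50            # dimension, number of "concepts", set bits per concept
    vecs = np.zeros((n, d))
    for i in range(n):
        vecs[i, rng.choice(d, size=k, replace=False)] = 1.0

    overlaps = vecs @ vecs.T               # pairwise counts of shared set bits
    np.fill_diagonal(overlaps, 0.0)
    # expected overlap is k*k/d ~ 0.2 bits; even the worst pair shares only a handful of bits
    print(overlaps.mean(), overlaps.max())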
henearkr•1h ago
Your definition of orthogonal is incorrect, in this case.

In the case of binary vectors, don't forget you are working with the finite field of two elements {0, 1}, and use XOR.

dwohnitmok•1h ago
This set of intuitions, and the Johnson-Lindenstrauss lemma in particular, is what powers a lot of the research effort behind SAEs (Sparse Autoencoders) in the field of mechanistic interpretability in AI safety.

A lot of the ideas are explored in more detail in Anthropic's 2022 paper that's one of the foundational papers in SAE research: https://transformer-circuits.pub/2022/toy_model/index.html
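For anyone who wants to see the lemma in action, here is a minimal random-projection sketch with arbitrary sizes and a plain Gaussian projection (not the SAE setup from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d_high, d_low = 300, 10_000, 1_000
    X = rng.normal(size=(n, d_high))                       # n arbitrary high-dimensional points

    P = rng.normal(size=(d_high, d_low)) / np.sqrt(d_low)  # JL-style Gaussian projection
    Y = X @ P

    def pairwise_dists(A):
        sq = (A ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T       # squared distances via the Gram matrix
        return np.sqrt(np.maximum(d2, 0))[np.triu_indices(len(A), 1)]

    ratio = pairwise_dists(Y) / pairwise_dists(X)
    print(ratio.min(), ratio.max())   # every pairwise distance preserved to within roughly 10% here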

emil-lp•1h ago
Where can I read the actual paper? Where is it published?
yorwba•1h ago
That is the actual paper, it's published on transformer-circuits.pub.
emil-lp•1h ago
It's not peer-reviewed?
yorwba•40m ago
Google Scholar claims 380 citations, which is, I think, a respectable number of peers to have reviewed it.
golem14•29m ago
Unless it’s part of a link review farm. I haven’t looked, and you are probably correct; but I would do a bit of research before making any assumptions
emil-lp•27m ago
That's not at all how peer review works.
aabhay•1h ago
My intuition of this problem is much simpler: assuming there’s some rough hierarchy of concepts, you can guesstimate how many concepts can exist in a 12,000-d space by taking the factorial of the number of dimensions. In that world, each concept is mutually orthogonal with every other concept in at least some dimension. While that doesn’t mean their cosine distance is large, it does mean you’re guaranteed a function that can linearly separate the two concepts.

It means you get 12,000! (Factorial) concepts in the limit case, more than enough room to fit a taxonomy

Morizero•55m ago
That number is far, far, far greater than the number of atoms in the universe (~10^43741 >>>>>>>> ~10^80).
am17an•17m ago
Somehow that's still an understatement
cleansy•6m ago
Not surprising since concepts are virtual. There is a person, a person with a partner is a couple. A couple with a kid is a family. That’s 5 concepts alone.
OgsyedIE•51m ago
You can only get 12,000! concepts if you pair each concept with an ordering of the dimensions, which models do not do. A vector in a model that has [weight_1, weight_2, ... weight_12000] is identical to the vector [weight_2, weight_1, ..., weight_12000] within the larger model.

Instead, a naive mental model of a language model is to have a positive, negative or zero trit in each axis: 3^12,000 concepts, which is a much lower number than 12000!. Then in practice, almost every vector in the model has all but a few dozen identified axes zeroed because of the limitations of training time.
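The two counts are easy to compare on a log scale; a quick back-of-the-envelope check, nothing model-specific:

    import math

    d = 12_000
    log10_factorial = math.lgamma(d + 1) / math.log(10)   # log10(12000!)
    log10_trits = d * math.log10(3)                       # log10(3^12000)

    print(round(log10_factorial))   # ~43741, the 10^43741 figure quoted above
    print(round(log10_trits))       # ~5725: vastly smaller, yet still astronomically large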

yorwba•1h ago
I think the author is too focused on the case where all vectors are orthogonal and as a consequence overestimates the amount of error that would be acceptable in practice. The challenge isn't keeping orthogonal vectors almost orthogonal, but keeping the distance ordering between vectors that are far from orthogonal. Even much smaller values of epsilon can give you trouble there.

So the claim that "This research suggests that current embedding dimensions (1,000-20,000) provide more than adequate capacity for representing human knowledge and reasoning." is way too optimistic in my opinion.
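A rough illustration of that failure mode, with made-up sizes: two neighbours whose distances to a reference point differ by only about 2% frequently swap order under a JL-style projection, even though each individual distance is preserved to within a few percent.

    import numpy as np

    rng = np.random.default_rng(0)
    d_high, d_low, trials = 5_000, 500, 500
    P = rng.normal(size=(d_high, d_low)) / np.sqrt(d_low)   # one JL-style random projection

    flips = 0
    for _ in range(trials):
        a = rng.normal(size=d_high)
        b = a + 0.100 * rng.normal(size=d_high)   # b and c are both close to a,
        c = a + 0.102 * rng.normal(size=d_high)   # with distances differing by only ~2%
        before = np.linalg.norm(a - b) < np.linalg.norm(a - c)
        after = np.linalg.norm((a - b) @ P) < np.linalg.norm((a - c) @ P)
        flips += before != after

    print(flips / trials)   # a sizeable fraction of trials (roughly a quarter to a third) swap order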

sigmoid10•12m ago
Since vectors are usually normalized to the surface of an n-sphere and the relevant distance for outputs (via loss functions) is cosine similarity, "near orthogonality" is what matters in practice. Especially since you are working with limited precision floating point numbers anyways on any realistic hardware.

Btw. this is not an original idea from the linked blog or the youtube video it references. The relevance of this lemma for AI (or at least neural machine learning) was brought up more than a decade ago by C. Eliasmith as far as I know. So it has been around long before architectures like GPT that could actually be realistically trained on such insanely high dimensional world knowledge.
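The near orthogonality is easy to see numerically: random unit vectors in 12,000 dimensions have cosine similarities concentrated around 1/sqrt(d) ≈ 0.009 (a toy check, not taken from any real embedding matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 12_000, 1_000
    V = rng.normal(size=(n, d))
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # n random points on the unit sphere

    cos = V @ V.T
    np.fill_diagonal(cos, 0.0)
    # typical |cos| is about 0.007; the worst pair out of ~500k is still only a few percent
    print(np.abs(cos).mean(), np.abs(cos).max())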

niemandhier•52m ago
Wow, I think I might just have grasped one of the sources of the problems we keep seeing with LLMs.

Johnson-Lindenstrauss guarantees a distance-preserving embedding of a finite set of points into a space with a dimension based on the number of points.

It does not say anything about preserving the underlying topology of the continuous high-dimensional manifold; that would be Takens/Whitney-style embedding results (and Sauer–Yorke for attractors).

The embedding dimensions needed to fulfil Takens are related to the original manifold's dimension and not to the number of points.

It’s quite probable that we observe violations of topological features of the original manifold when using our too-low-dimensional embedded version to interpolate.

I used AI to sort the hodgepodge of math in my head into something another human could understand; the edited result is below:

=== AI in use ===

If you want to resolve an attractor down to a spatial scale rho, you need about n ≈ C * rho^(-d_B) sample points (here d_B is the box-counting/fractal dimension).

The Johnson–Lindenstrauss (JL) lemma says that to preserve all pairwise distances among n points within a factor 1±ε, you need a target dimension

k ≳ (d_B / ε^2) * log(C / rho).

So as you ask for finer resolution (rho → 0), the required k must grow. If you keep k fixed (i.e., you embed into a dimension that’s too low), there is a smallest resolvable scale

rho* (roughly rho* ≳ C * exp(-(ε^2/d_B) * k), up to constants),

below which you can’t keep all distances separated: points that are far apart on the true attractor will show up close together after projection. That’s called “folding” and might be the source of some of the problems we observe.

=== AI end ===

Bottom line: JL protects distance geometry for a finite sample at a chosen resolution; if you push the resolution finer without increasing k, collisions are inevitable. This is perfectly consistent with the embedding theorems for dynamical systems, which require higher dimensions to get a globally one-to-one (no-folds) representation of the entire attractor.
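Plugging toy numbers into that bound (d_B, C and eps below are made up, and the constant 8 is the one from common JL proofs) shows how the required target dimension k climbs as the resolution rho is pushed finer:

    import math

    d_B, C, eps = 10, 1.0, 0.1
    for rho in (1e-1, 1e-2, 1e-3, 1e-4):
        n = C * rho ** (-d_B)              # sample points needed to resolve scale rho
        k = 8 * math.log(n) / eps ** 2     # JL target dimension for n points at distortion eps
        print(f"rho={rho:g}  n~1e{math.log10(n):.0f}  k~{k:,.0f}")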

If someone is bored and would like to discuss this, feel free to email me.

sdl•31m ago
So basically the map projection problem [1] in higher dimensions?

[1] https://en.m.wikipedia.org/wiki/Map_projection

rossant•41m ago
Tangential, but the ChatGPT vibe of most of the article is very distracting and annoying. And I say this as someone who consistently uses AI to refine my English. However, I try to avoid letting it reformulate too dramatically, asking it specifically to only fix grammar and non-idiomatic parts while keeping the tone and formulation as much as possible.

Beyond that, this mathematical observation is genuinely fascinating. It points to a crucial insight into how large language models and other AI systems function. By delving into the way high-dimensional data can be projected into lower-dimensional spaces while preserving its structure, we see a crucial mechanism that allows these models to operate efficiently and scale effectively.