Everything Is Correlated

https://gwern.net/everything

138•gmays•9h ago

Comments

eisvogel•6h ago

It's just as I suspected - there are NO coincidences.

andsoitis•5h ago

there is but a single unfolding, and everything is part of it

dang•5h ago

Related. Others?

Everything Is Correlated - https://news.ycombinator.com/item?id=19797844 - May 2019 (53 comments)

stouset•5h ago

Correlated, you mean?

pnt12•4h ago

Those would be all articles posted in HN :)

senko•5h ago

The article missed the chance to include the quote from that standard compendium of information and wisdom, The Hitchhiker's Guide to the Galaxy:

> Since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation — every sun, every planet, their orbits, their composition and their economic and social history from, say, one small piece of fairy cake.

sayamqazi•5h ago

Wouldnt you need the T_zero configuration of the universe for this to work?

Given different T_zero configs of matter and energies T_current would be different. and there are many pathways that could lead to same physical configuration (position + energies etc) with different (Universe minus cake) configurations.

Also we are assuming there is no non-deterministic processed happening at all.

senko•4h ago

I am assuming integrating over all possible configurations would be a component of The Total Perspective Vortex.

After all, Feynman showed this is in principle possible, even with local nondeterminism.

(this being a text medium with a high probability of another commenter misunderstanding my intent, I must end this with a note that I am, of course, BSing :)

eru•59m ago

> Wouldnt you need the T_zero configuration of the universe for this to work?

Why? We learn about the past by looking at the present all the time. We also learn about the future by looking at the present.

> Also we are assuming there is no non-deterministic processed happening at all.

Depends on the kind of non-determinism. If there's randomness, you 'just' deal with probability distributions instead. Since you have measurement error anyway, you need to do that anyway.

There are other forms of non-determinism, of course.

prox•2h ago

In Buddhism we have dependent origination : https://en.wikipedia.org/wiki/Prat%C4%ABtyasamutp%C4%81da

lioeters•2h ago

Also the concept of implicate order, proposed by the theoretical physicist David Bohm.

> Bohm employed the hologram as a means of characterising implicate order, noting that each region of a photographic plate in which a hologram is observable contains within it the whole three-dimensional image, which can be viewed from a range of perspectives.

> That is, each region contains a whole and undivided image.

> "There is the germ of a new notion of order here. This order is not to be understood solely in terms of a regular arrangement of objects (e.g., in rows) or as a regular arrangement of events (e.g., in a series). Rather, a total order is contained, in some implicit sense, in each region of space and time."

> "Now, the word 'implicit' is based on the verb 'to implicate'. This means 'to fold inward' ... so we may be led to explore the notion that in some sense each region contains a total structure 'enfolded' within it."

petters•5h ago

If two things e.g. both change over time, they will be correlated. I think it can be good to keep this article in mind

eru•55m ago

> If two things e.g. both change over time, they will be correlated.

No?

You can have two independent random walks. Eg flip a coin, gain a dollar or lose a dollar. Do that to times in parallel. Your two account balances will change over time, but they won't be correlated.

Evidlo•5h ago

This is such a massive article. I wish I had the ability to grind out treatises like that. Looking at other content on the guy's website, he must be like a machine.

tmulc18•5h ago

gwern is goated

kqr•4h ago

IIRC Gwern lives extremely frugally somewhere remote and is thus able to spend a lot of time on private research.

tux3•4h ago

IIRC people funded moving gwern to the bay not too long ago.

pas•3h ago

lots of time, many iterations, affinity for the hard questions, some expertise in research (and Haskell). oh, and also it helps if someone is funding your little endeavor :)

aswegs8•2h ago

I wish I would be even able to read things like that.

2rsf•4h ago

Did they quote https://www.tylervigen.com/spurious-correlations ?

ezomode•2h ago

Who should quote who? The article is from 2014.

apples_oranges•4h ago

People didn't always use statistics to discover truths about the world.

This, once developed, just happened to be a useful method. But given the abuse using those methods, and the proliferation of stupidity disguised as intelligence, it's always fitting to question it, and this time with this correlation noise observation.

Logic, fundamental knowledge about domains, you need that first. Just counting things without understanding them in at least one or two other ways, is a tempting invitation for misleading conclusions.

mnky9800n•42m ago

There is a quote from George Lucas where he talks about how when new things come into a society people have a tend to over do it.

https://www.youtube.com/watch?v=VEIrQUXm_hY

apples_oranges•31m ago

Nice, yeah. With many movies one has to ask: What's the point? Especially all Disney Star Wars..

st-keller•4h ago

„This renders the meaning of significance-testing unclear; it is calculating precisely the odds of the data under scenarios known a priori to be false.“

I cannot see the problem in that. To get to meaningful results we often calculate with simplyfied models - which are known to be false in a strict sense. We use Newtons laws - we analyze electric networks based on simplifications - a bank-year used to be 360 days! Works well.

What did i miss?

whyever•4h ago

It's a quantitative problem. How big is the error introduced by the simplification?

bjornsing•3h ago

The problem is basically that you can always buy a significant result with money (large enough N always leads to ”significant” result). That’s a serious issue if you see research as pursuit of truth.

syntacticsalt•2h ago

Reporting effect size mitigates this problem. If observed effect size is too small, its statistical significance isn't viewed as meaningful.

bjornsing•55m ago

Sure (and of course). But did you see the effect size histogram in the OP?

thyristan•3h ago

There is a known maximum error introduced by those simplifications. Put the other way around, Einstein is a refinement of Newton. Special relativity converges towards Newtonian motion for low speeds.

You didn't really miss anything. The article is incomplete, and wrongly suggests that something like "false" even exists in statistics. But really something is only false "with a x% probability of it actually being true nonetheless". Meaning that you have to "statistic harder" if you want to get x down. Usually the best way to do that is to increase the number of tries/samples N. What the article gets completely wrong is that for sufficiently large N, you don't have to care anymore, and might as well use false/true as absolutes, because you pass the threshold of "will happen once within the lifetime of a bazillion universes" or something.

Problem is, of course, that lots and lots of statistics are done with a low N. Social sciences, medicine, and economy are necessarily always in the very-low-N range, and therefore always have problematic statistics. And try to "statistic harder" without being able to increase N, thereby just massaging their numbers enough to get a desired conclusion proved. Or just increase N a little, claiming to have escaped the low-N-problem.

syntacticsalt•2h ago

A frequentist interpretation of inference assumes parameters have fixed, but unknown values. In this paradigm, it is sensible to speak of the statement "this parameter's value is zero" as either true or false.

I do not think it is accurate to portray the author as someone who does not understand asymptotic statistics.

PeterStuer•3h ago

Back when I wrote a loan repayment calculator, there were 47 common different ways to 'day count' (used in calculating payments for incomplete repayment periods, e.g in monthly payments, what is the 1st-13th of aug 2025 as a fraction of aug 2025?).

syntacticsalt•4h ago

I don't disagree with the title, but I'm left wondering what they want us to do about it beyond hinting at causal inference. I'd also be curious what the author thinks of minimum effect sizes (re: Implication 1) and noninferiority testing (re: Implication 2).

jongjong•3h ago

Also, I'm convinced that the reason humans intuitively struggle to figure out causality is because the vast majority of causes and effects are self-reinforcing cycles and go both ways. There was little evolutionary pressure for us to understand the concept of causality because it doesn't play a strong role in natural selection.

For example, eat a lot and you will gain weight, gain weight and you will feel more hungry and will likely eat more.

Or exercise more and it becomes easier to exercise.

Earning money becomes easier as you have more money.

Public speaking becomes easier as you do it more and the more you do it, the easier it becomes.

Etc...

ctenb•3h ago

> Public speaking becomes easier as you do it more and the more you do it, the easier it becomes.

That's saying the same thing twice :)

jongjong•36m ago

Haha yes. I meant to say the more public speaking you do, the easier it gets so the more often you want to do it.

renox•1h ago

> Or exercise more and it becomes easier to exercise.

Only if you don't injure yourself while exercising.

jongjong•26m ago

A lot of things can happen to break a self-perpetuating cycle. But it's usually some extreme event. The cycle keeps optimizing in a particular direction and this eventually leads to an extreme situation which becomes unsustainable. The natural equilibrium of a self-reinforcing cycle is not static but drifting towards some extreme unstable state. There is usually a point where it breaks down.

But I suspect that being able to figure out causation doesn't matter much from a survival or reproduction perspective because cause and effect are just labels.

Reality in a self-perpetuating cycle is probably like Condition A is 70% responsible and Condition B is 30% responsible for a problem but they feedback and exacerbate each other... You could argue that Condition A is the cause and Condition B is the effect because B < A but that's not quite right IMO. Also, it's not quite right to say that because A happened first, that A is the cause of a severe problem... The problem would never have gotten so bad to such extent without feedback from B.

01HNNWZ0MV43FF•3h ago

> For example, while looking at biometric samples with up to thousands of observations, Karl Pearson declared that a result departing by more than 3 standard deviations is “definitely significant.”

Wait. Sir Arthur Conan Doyle lived at basically the exact same time as this Karl Pearson.

Is that why the Sherlock Holmes stories had handwriting analysis so frequently? Was there just pop science going around at the time that like, let's find correlations between anything and anything, and we can see that a criminal mastermind like Moriarty would certainly cross their T's this way and not that way?

cluckindan•2h ago

I wonder if this tendency to correlate truly holds for everything? Intuitively it more or less demonstrates that nature tends to favor zero-sum games. Maybe analyzing correlations within the domain of theoretical physics would highlight true non-correlations in some particular approaches? (pun only slightly intended)

eru•57m ago

> Intuitively it more or less demonstrates that nature tends to favor zero-sum games.

Please explain.

simsla•2h ago

This relates to one of my biggest pet peeves.

People interpret "statistically significant" to mean "notable"/"meaningful". I detected a difference, and statistics say that it matters. That's the wrong way to think about things.

Significance testing only tells you the probability that the measured difference is a "good measurement". With a certain degree of confidence, you can say "the difference exists as measured".

Whether the measured difference is significant in the sense of "meaningful" is a value judgement that we / stakeholders should impose on top of that, usually based on the magnitude of the measured difference, not the statistical significance.

It sounds obvious, but this is one of the most common fallacies I observe in industry and a lot of science.

For example: "This intervention causes an uplift in [metric] with p<0.001. High statistical significance! The uplift: 0.000001%." Meaningful? Probably not.

V__•1h ago

I really like this video [1] from 3blue1brown, where he proposes to think about significance as a way to update the probability. One positive test (or in this analog a study) updates the probability by X % and thus you nearly always need more tests (or studies) for a 'meaningful' judgment.

[1] https://www.youtube.com/watch?v=lG4VkPoG3ko

jpcompartir•56m ago

And if we increase N enough we will be able to find these 'good measurements' and 'statistically significant differences' everywhere.

Worse still if we did not agree in advance what hypotheses we were testing, and go looking back through historical data to find 'statistically significant' correlations.

ants_everywhere•24m ago

Which means that statistical significance is really a measure of whether N is big enough

prasadjoglekar•43m ago

For all the shit that HN gives to MBAs, one thing they instill into you during the Managerial Stats class is Stag Sig not the same as Managerial Sig.

ants_everywhere•19m ago

> Significance testing only tells you the probability that the measured difference is a "good measurement". With a certain degree of confidence, you can say "the difference exists as measured".

Significance does not tell you this. Significance can be arbitrarily close to 0 while the probability of the null hypothesis being true is simultaneously arbitrarily close to one

hshshshshsh•2h ago

Doesn't everything means all things that exist in universe and since they exist in same universe they are correlated?

nathan_compton•1h ago

Really classic "rationalist" style writing: a soup of correct observations about statistical phenomena with chunks of weird political bullshit thrown in here and there. For example: "On a more contemporary note, these theoretical & empirical considerations also throw doubt on concerns about ‘algorithmic bias’ or inferences drawing on ‘protected classes’: not drawing on them may not be desirable, possible, or even meaningful."

This is such a bizarre sentence. The way its tossed in, not explained in any way, not supported by references, etc. Like I guess the implication being made is something like "because there is a hidden latent variable that determines criminality and we can never escape from correlations with it, its ok to use "is_black" in our black box model which decides if someone is going to get parole? Ridiculous. Does this really "throw doubt" on whether we should care about this?

The concerns about how models work are deeper than the statistical challenges of creating or interpreting them. For one thing, all the degrees of freedom we include in our model selection process allow us to construct models which do anything that we want. If we see a parole model which includes "likes_hiphop" as an explanatory variable we ought to ask ourselves who decided that should be there and whether there was an agenda at play beyond "producing the best model possible."

These concerns about everything being correlated actually warrant much more careful understanding about the political ramifications of how and what we choose to model and based on which variables, because they tell us that in almost any non-trivial case a model is at least partly necessarily a political object almost certainly consciously or subconsciously decorated with some conception of how the world is or ought to be explained.

zahlman•1h ago

> This is such a bizarre sentence. The way its tossed in, not explained in any way,

It reads naturally in context and is explained by the foregoing text. For example, the phrase "these theoretical & empirical considerations" refers to theoretical and empirical considerations described above. The basic idea is that, because everything correlates with everything else, you can't just look at correlations and infer that they're more than incidental. The political implications are not at all "weird", and follow naturally. The author observes that social scientists build complex models and observe huge amounts of variables, which allows them to find correlations that support their hypothesis; but these correlations, exactly because they can be found everywhere, are not anywhere near as solid evidence as they are presented as being.

> Like I guess the implication being made is something like "because there is a hidden latent variable that determines criminality and we can never escape from correlations with it, its ok to use "is_black" in our black box model which decides if someone is going to get parole?

No, not at all. The implication is that we cannot conclude that the black box model actually has an "is_black" variable, even if it is observed to have disparate impact on black people.

nathan_compton•46m ago

Sorry, but I don't think that is a reasonable read. The phrase "not drawing on them may not be desirable, possible, or even meaningful" is a political statement except perhaps for "possible," which is just a flat statement that its hard to separate causal variables from non-causal ones.

Nothing in the statistical observation that variables tend to be correlated suggests we should somehow reject the moral perspective that that its desirable for a model to be based on causal rather than merely correlated variables, even if finding such variables is difficult or even, impossible to do perfectly. And its certainly also _meaningful_ to do so, even if there are statistical challenges. A model based on "socioeconomic status" has a totally different social meaning than one based on race, even if we cannot fully disentangle the two statistically. He is mixing up statistical and social, moral and even philosophical questions in a way which is, in my opinion, misleading.

nnnnico•1h ago

There is no zero in the real world

Go is still not good

Io_uring, kTLS and Rust for zero syscall HTTPS server

DeepSeek-v3.1

Everything Is Correlated

Control shopping cart wheels with your phone (2021)

LabPlot: Free, open source and cross-platform Data Visualization and Analysis

Code formatting comes to uv experimentally

VHS-C: when a lazy idea stumbles towards perfection [video]

The Minecraft code no one has solved (2024) [video]

An interactive guide to SVG paths

From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf]

Crimes with Python's Pattern Matching (2022)

Weaponizing image scaling against production AI systems

1981 Sony Trinitron KV-3000R: The Most Luxurious Trinitron [video]

How does the US use water?

Building AI products in the probabilistic era

AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard'

Elegant mathematics bending the future of design

Show HN: OS X Mavericks Forever

My other email client is a daemon

Beyond sensor data: Foundation models of behavioral data from wearables

How well does the money laundering control system work?

AI tooling must be disclosed for contributions

Benchmarks for Golang SQLite Drivers

Using Podman, Compose and BuildKit

Scientists No Longer Find X Professionally Useful, and Have Switched to Bluesky

Miles from the ocean, there's diving beneath the streets of Budapest

Privately-Owned Rail Cars

4chan will refuse to pay daily online safety fines, lawyer tells BBC

Skill issues – Dialectical Behavior Therapy and its discontents (2024)

Go is still not good

Io_uring, kTLS and Rust for zero syscall HTTPS server

DeepSeek-v3.1

Everything Is Correlated

Control shopping cart wheels with your phone (2021)

LabPlot: Free, open source and cross-platform Data Visualization and Analysis

Code formatting comes to uv experimentally

VHS-C: when a lazy idea stumbles towards perfection [video]

The Minecraft code no one has solved (2024) [video]

An interactive guide to SVG paths

From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf]

Crimes with Python's Pattern Matching (2022)

Weaponizing image scaling against production AI systems

1981 Sony Trinitron KV-3000R: The Most Luxurious Trinitron [video]

How does the US use water?

Building AI products in the probabilistic era

AWS CEO says using AI to replace junior staff is 'Dumbest thing I've ever heard'

Elegant mathematics bending the future of design

Show HN: OS X Mavericks Forever

My other email client is a daemon

Beyond sensor data: Foundation models of behavioral data from wearables

How well does the money laundering control system work?

AI tooling must be disclosed for contributions

Benchmarks for Golang SQLite Drivers

Using Podman, Compose and BuildKit

Scientists No Longer Find X Professionally Useful, and Have Switched to Bluesky

Miles from the ocean, there's diving beneath the streets of Budapest

Privately-Owned Rail Cars

4chan will refuse to pay daily online safety fines, lawyer tells BBC

Skill issues – Dialectical Behavior Therapy and its discontents (2024)

Everything Is Correlated

Comments