That said, you can give a Bayesian argument for p-circling, provided you have a prior on the power of the test. The details are almost impossible to work out except by case-by-case calculation because, unless I'm mistaken, the shape of the p-value distribution when the null hypothesis does not hold is very ill-defined.
However, it's quite possible to give examples where, intuitively, a p-value of just below 0.05 would be highly suspicious: you just need to pair a test with very high power with a borderline result. Say, for example, you're testing the existence of gravity with various objects and you get a p-value of about 0.04 for the null hypothesis that objects just stay in the air indefinitely.
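A rough numerical version of that intuition (the numbers are made up: a one-sided z-test where the true effect sits 5 standard errors above the null, so power is above 99.9%):

```python
# Illustrative one-sided z-test: how likely is a p-value in [0.04, 0.05)
# under a very strong (assumed) effect versus under the null?
from scipy.stats import norm

delta = 5.0                               # assumed true effect, in standard-error units
lo, hi = norm.isf(0.05), norm.isf(0.04)   # z-values bracketing the p-value bin

# p is uniform under the null, so the bin has probability exactly 0.01 there.
p_bin_null = 0.01
# Under the alternative, z ~ N(delta, 1), so shift the bin by delta.
p_bin_alt = norm.cdf(hi - delta) - norm.cdf(lo - delta)

print(f"P(0.04 <= p < 0.05 | null)          = {p_bin_null:.4f}")
print(f"P(0.04 <= p < 0.05 | strong effect) = {p_bin_alt:.6f}")  # ~0.0002
```

Landing in the 0.04-0.05 bin is roughly 50 times less likely under the real effect than under the null, which is exactly why a just-under-threshold p-value from such a powerful test should raise eyebrows.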
This is an interesting post, but the author’s usage of Lindley’s paradox seems to be unrelated to the Lindley’s paradox I’m familiar with:
> If we raise the power even further, we get to “Lindley’s paradox”, the fact that p-values in this bin can be less likely than they are under the null.
Lindley’s paradox as I know it (and as described by Wikipedia [1]) is about the potential for arbitrarily large disagreements between frequentist and Bayesian analyses of the same data. In particular, you can have an arbitrarily small p-value (p < epsilon) from the frequentist analysis while at the same time having arbitrarily large posterior probabilities for the null hypothesis model (P(M_0|X) > 1-epsilon) from the Bayesian analysis of the same data, without any particularly funky priors or anything like that.
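To make that concrete, here is a minimal numerical sketch of the disagreement (a toy setup of my own choosing, not from the post: normal data with known unit variance, a point null theta = 0 against theta ~ N(0, 1) under the alternative, equal prior odds on the two models):

```python
# A minimal numerical sketch of Lindley's paradox (assumed setup: X_i ~ N(theta, 1),
# point null H0: theta = 0 vs. H1: theta ~ N(0, 1), equal prior odds).
import numpy as np
from scipy.stats import norm

n = 100_000                 # assumed (large) sample size
z = 2.5                     # observed test statistic, sqrt(n) * xbar
xbar = z / np.sqrt(n)       # observed sample mean

# Frequentist two-sided p-value: comfortably below 0.05.
p_value = 2 * norm.sf(z)

# Marginal likelihood of xbar under each model (xbar ~ N(0, 1/n) under H0,
# xbar ~ N(0, tau^2 + 1/n) under H1 with tau = 1).
tau = 1.0
m0 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(1 / n))
m1 = norm.pdf(xbar, loc=0.0, scale=np.sqrt(tau**2 + 1 / n))

post_h0 = m0 / (m0 + m1)    # posterior P(H0 | data) with equal prior odds

print(f"p-value      = {p_value:.4f}")   # ~0.012, 'significant'
print(f"P(H0 | data) = {post_h0:.3f}")   # ~0.93, null strongly favored
```

The frequentist analysis rejects at the 5% level while the Bayesian analysis puts over 90% of the posterior mass on the null, and by increasing n you can drive the two conclusions arbitrarily far apart.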
I don’t see any relationship to the phenomenon given the name of Lindley’s paradox in the blog post.
(1) You can get the same effect with a prior distribution concentrated around a point instead of a point prior. The null hypothesis prior being a point prior is not what causes Lindley’s paradox.
(2) Point priors aren’t intrinsically nonsensical. I suspect that you might accept a point prior for an ESP effect, for example (maybe not—I know one prominent statistician who believes ESP is real).
(3) The prior probability assigned to each of the two models also doesn’t really matter; Lindley’s paradox arises from the marginal likelihoods (which depend on the priors for the parameters within each model but not on the prior probability of each model).
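Sticking with the same kind of toy normal-mean setup (illustrative numbers only), point (3) can be seen directly: the posterior odds are the prior odds times the Bayes factor BF01 = m0/m1, and that Bayes factor is fixed by the marginal likelihoods alone.

```python
# Hypothetical normal-mean setup as above; m0 and m1 are the marginal
# likelihoods of the data under each model. The Bayes factor BF01 = m0 / m1
# never changes, and the prior model probabilities only shift the posterior
# through the prior odds.
import numpy as np
from scipy.stats import norm

n, z, tau = 100_000, 2.5, 1.0
xbar = z / np.sqrt(n)
m0 = norm.pdf(xbar, scale=np.sqrt(1 / n))            # marginal likelihood under H0
m1 = norm.pdf(xbar, scale=np.sqrt(tau**2 + 1 / n))   # marginal likelihood under H1
bf01 = m0 / m1                                       # ~13.9, favors the null

for prior_h0 in (0.5, 0.25, 0.1):
    posterior_odds = (prior_h0 / (1 - prior_h0)) * bf01
    post_h0 = posterior_odds / (1 + posterior_odds)
    print(f"P(H0) = {prior_h0:.2f} -> BF01 = {bf01:.1f}, P(H0|data) = {post_h0:.2f}")
```

Even with only a 10% prior on the null, the data still leave it the more probable model, because the Bayes factor itself never changes.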
This Veritasium video does a great job of explaining the paradox in general and how such skewed priors can easily arise in our current academic system: https://youtu.be/42QuXLucH3Q?si=c56F7Y3RB5SBeL4m
The fact remains that for findings which are claimed to be true but later turn out not to be, the p-values reported in the papers are very often near the significance threshold. Not so much for things which are obviously and strongly true. This is direct evidence of something we already know: nobody cares about p-values per se, they only use them to communicate information about something being true or false in the real world. The technical claim of "well, maybe x or y is true, but when I said p=0.049 I was only talking about a hypothetical world where x is true, and my statement about that world still holds" is no solace.
senkora•5h ago
> One could specify a smallest effect size of interest and compare the plausibility of seeing the reported p-value under that distribution compared to the null distribution. Maier and Lakens (2022) suggest you could do this exercise when planning a test in order to justify your choice of alpha-level.
Huh, I'd never thought to do that before. You pretty much have to choose a smallest effect size of interest in order to do a power analysis in the first place (to figure out how many samples to collect), so basing the significance level on it as well is a neat next step.
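A rough sketch of what that exercise might look like (a one-sample, one-sided z-test; the effect size, alpha, and power values are placeholders I picked, not numbers from the post): size the study for the smallest effect of interest, then ask how plausible a p-value right at the threshold is under that effect compared to under the null (where the p-value is uniform, so its density is 1).

```python
# Illustrative one-sample, one-sided z-test sized for an assumed smallest
# effect size of interest, then used to weigh a threshold p-value.
import numpy as np
from scipy.stats import norm

sesoi = 0.3          # assumed smallest effect size of interest (in SD units)
alpha = 0.05
power = 0.90

# Sample size from the usual z-test power formula.
n = int(np.ceil(((norm.isf(alpha) + norm.isf(1 - power)) / sesoi) ** 2))

# Density of the p-value at p = alpha under the SESOI alternative.
# (Under the null the p-value is uniform, so its density is 1.)
delta = sesoi * np.sqrt(n)           # noncentrality of the z statistic
z_at_alpha = norm.isf(alpha)
density_h1 = norm.pdf(z_at_alpha - delta) / norm.pdf(z_at_alpha)

print(f"n = {n}")
print(f"density of p at 0.05 under SESOI vs null: {density_h1:.2f} vs 1.00")
```

With these numbers the ratio is only about 1.7, i.e. a p-value right at 0.05 is barely more plausible under the smallest effect you care about than under no effect at all, which is the kind of comparison Maier and Lakens suggest using to justify the choice of alpha.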
CrazyStat•4h ago
Given rampant incentive misalignments (the goal in academic research is often to publish something as much as—or more than—to discover truth), having fixed significance levels as standards across whole fields may be superior in practice.
levocardia•1h ago
Usually you have to go collect data first, then analyze it, then (in an ideal world where science is well-incentivized) replicate your own analysis in a second wave of data collection doing everything exactly the same. Psychology has actually gotten to a point where this is mostly how it works; many other fields have not.