If we treat elections like a survey, then they have a massive inherent bias to the sampling method: the people who will get "surveyed" are the ones who are engaged enough to get registered, and then willing to go to a physical polling station and vote. This will naturally bias towards certain types of people.
In practice, we don't treat elections like a survey. If we did, we'd spend a lot of time afterwards weighting the results to figure out what the entire country really thought. But that has its own flaws, and ultimately voting is a civic exercise. You can do it, you can avoid it: that choice is yours, and ultimately part of your vote. In a way, you could argue that the sample size for an election is 100% of the population, where "for whatever reason, I didn't cast a vote" is a valid box to check on this survey.
That said, the whole "samples can be biased" thing is very much relevant for elections because many political groups have an incentive to add additional bias to the samples. That could be as simple as organising pick-ups to allow their voters to get to the polls, or teaching people how to register to vote if they're eligible, but it could also involve making it significantly harder or slower for certain groups (or certain regions) to register or vote.
I agree that random sampling is key to this, but I don't see how that would change with messaging that everyone must vote, versus saying to vote only if you're interested.
This is important, because normally, once you take a sample, you need to analyse that sample to ensure that it is representative, and potentially weight different responses if you want to make it more representative. For example, if you got a sample that was 75% women, you might weight the male responses more strongly to match the roughly 50/50 split between men and women in the general population. But in an election, we don't do this, because the assumption is that if you spoil your ballot or don't take part, that is part of your choice as a citizen.
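As a minimal sketch of that weighting step (the numbers below are purely illustrative, not from any real survey), each group's weight is just its population share divided by its sample share:

```python
# Illustrative post-stratification weighting: a sample that is 75% women,
# reweighted to match a roughly 50/50 population split.
sample_shares = {"women": 0.75, "men": 0.25}
population_shares = {"women": 0.50, "men": 0.50}

# weight = population share / sample share
weights = {g: population_shares[g] / sample_shares[g] for g in sample_shares}
print(weights)  # women ~ 0.667, men = 2.0 (male responses count double)

# A weighted estimate then combines hypothetical per-group means:
responses = {"women": 0.60, "men": 0.40}
weighted_mean = sum(weights[g] * sample_shares[g] * responses[g] for g in responses)
print(weighted_mean)  # ~ 0.5, i.e. 0.5*0.6 + 0.5*0.4
```

An election skips this step entirely: the non-response "cell" is treated as a legitimate choice rather than a bias to correct for.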
I think we're saying the same sort of thing in different ways: you can either see it as "the sample of an election is every citizen, regardless of whether they voted" or as "the population of an election is everyone who voted". In either case the sample is the same as the population, so we can assume it is representative of the population.
Of course, psychologically, everyone needs to vote to have a say. But beyond the psychology, everyone voting also acts as a security measure against tampering.
Forcing people to vote who aren't interested only makes this effect even worse.
The article used a Python library without really understanding the reason or the science behind the result. Knowing this helps when you read an article or watch a news report that quotes a study: "a study of 300 people…". Well, why 300 people? You can reasonably assume the researchers relied on the CLT.
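As a rough illustration of where numbers like 300 come from (my own sketch, not from the article): under the normal approximation, the 95% margin of error for an estimated proportion depends on the sample size n but not on the population size, and it shrinks like 1/sqrt(n):

```python
import math

# 95% margin of error (normal approximation) for a sample proportion p
# from a simple random sample of size n. Note that n appears in the
# formula; the population size does not.
def margin_of_error(p: float, n: int) -> float:
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 300, 1000):
    print(n, round(margin_of_error(0.5, n), 3))
# n = 300 already gives roughly a +/- 5-6 percentage point margin at p = 0.5,
# which is often considered "good enough" for a headline survey.
```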
Say you are an alien, and you want to know roughly the male-to-female ratio of people. Let's say the true ratio is 50%.
Wouldn't this be done by an unbiased sample that's quite small, regardless of whether there's 100M or 8B people on the planet?
In non-extreme cases, when your population is much bigger than the sample size, you're correct that it doesn't really make any difference.
When I was an undergrad first learning statistics I asked my stats instructor (a grad student) about this issue and they responded with something like "the population size doesn't matter because for the assumptions of the test to be met... such and such..." I kind of accepted that answer — we were talking about asymptotic inferences — but it never seemed quite right to me.
The example I gave was actually motivated in part by a real-world problem I was dealing with: let's say you only want to make inferences about a population of 20 individuals. Certainly, if you have a sample of 19 out of those 20, your confidence about the population will be much stronger than if you had the same size sample from a population of 100 million.
One thing he did say, which is probably right, is that the 1-in-20 individual you didn't sample might throw things off, so they're more influential, in a sense, than a single member of a population of 100 million.
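That intuition matches the standard finite population correction for sampling without replacement; a small sketch, with numbers mirroring the 19-of-20 example above:

```python
import math

# Finite population correction (FPC): when sampling n individuals without
# replacement from a population of size N, the standard error shrinks by
# sqrt((N - n) / (N - 1)) relative to the infinite-population formula.
def fpc(N: int, n: int) -> float:
    return math.sqrt((N - n) / (N - 1))

print(fpc(20, 19))           # ~ 0.23: sampling 19 of 20 slashes the uncertainty
print(fpc(100_000_000, 19))  # ~ 1.0: the same 19 from 100M barely helps at all
```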
At the time I hadn't learned about exact and permutation statistics, but that's probably the right way to think about finite populations. That is, something like: "what are all the outcomes you could observe, and what proportion of those does my observed result represent?"
It's just that usually our population is so large that the exact test approach becomes infeasible to deal with without approximations, and you end up with the typical classical asymptotic statistics.
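A toy version of that "enumerate every possible outcome" idea, feasible only because the population is tiny (the trait counts here are hypothetical): a population of 20 individuals, 12 with some trait, where we draw 5 and observe 4 with the trait.

```python
from itertools import combinations

# Enumerate every possible sample of 5 from a population of 20 (12 with the
# trait, 8 without), and ask what fraction would look at least as extreme
# as the observed count of 4.
population = [1] * 12 + [0] * 8  # 1 = has the trait
observed = 4

samples = list(combinations(population, 5))   # all C(20, 5) = 15504 possible draws
at_least = sum(1 for s in samples if sum(s) >= observed)
print(at_least / len(samples))                # ~ 0.307: the exact proportion
```

With a large population this enumeration blows up combinatorially, which is exactly where the asymptotic approximations take over.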
It's all maybe a moot point, but it's always a good idea to think about the population you're trying to make inferences about. That probably includes the population size, which sometimes matters more than you might initially expect.
As for your last question, obtaining an unbiased sample gets harder as the number of attributes you want to be unbiased with respect to increases. It's a permutation problem again, usually implicit in discussions of sampling representativeness.
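A quick sketch of why that gets hard: the attributes and category counts below are hypothetical, but the number of joint cells a representative sample would need to cover multiplies with every attribute you add.

```python
from math import prod

# Hypothetical attributes and their category counts; being representative
# on all of them jointly means covering the product of the levels.
levels = {"sex": 2, "age_band": 6, "region": 10, "education": 4}
cells = prod(levels.values())
print(cells)  # 2 * 6 * 10 * 4 = 480 joint cells to represent
```

Add one more four-level attribute and you're at nearly 2000 cells, many of which may contain almost nobody in a sample of a few hundred.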