Originally Posted by Majin SSJ Eric
I admit up front I know nothing about statistics, but I can't wrap my brain around how a 1% poll rate is in any way accurate at all. How do you know that 1% wasn't just a select outlier group in the population? What if all respondents were Nvidia employees, as a clearly silly extreme? How can demographics not come into play whatsoever when taking your samples into account? Oh, and I'd hazard a guess that polling for something like major political elections is just a bit more thoroughly conducted and accurate than the Steam survey, and yet they manage to get those polls wrong all the time. I've just never placed much faith in statistics in general personally, but as I said, I don't know anything about it.
Your concerns about the polling population ARE justified. While the following post is mathematically correct in general, that assumes a completely random sample population:
Originally Posted by Carniflex
About statistics and measurements. To make it as simple as possible.
As a very rough rule of thumb, your "error" is approximately 1/sqrt(n) where n is the sample size.
n=10 -> approx 30% "error"
n=100 -> approx 10% "error"
n=1000 -> approx 3% "error" (that is the sample size normally used in political polling)
n=10 000 -> approx 1% "error"
n=100k -> approx 0.3% "error"
n=1M -> approx 0.1% "error" and so on.
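For anyone who wants to check the list above, here is a minimal sketch of the 1/sqrt(n) rule of thumb. The function name rough_error is made up for illustration, and this is only the rough rule, not a proper confidence interval.

```python
import math


def rough_error(n):
    """Rule-of-thumb "error" for a sample of size n, as a fraction: 1/sqrt(n)."""
    return 1.0 / math.sqrt(n)


# The same sample sizes as in the list above.
sample_sizes = [10, 100, 1_000, 10_000, 100_000, 1_000_000]

for n in sample_sizes:
    print(f"n={n:>9,} -> approx {rough_error(n) * 100:.1f}% error")
```

Note that getting ten times more precision takes a hundred times more samples, which is why going far beyond a few thousand respondents buys very little.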
There are some assumptions in there: a normal (Gaussian) distribution, for a start, which is not always the case in nature, and that all the events/measurements are independent. I'm also not really talking about confidence intervals or the meaning of "sigma".
But as a rough rule of thumb it's a good estimate for the majority of things in life and nature. What it means, to give a very simple example: if you measure how often something happens across 1000 independent cases, the share you observe will usually land within about 3 percentage points of the true share.
The real stuff behind this very rough "rule of thumb" is ofc more complex, and it's not my main field.
In practice approx n=30 is "good enough", for example for the average to start settling down when throwing a six-sided die or flipping a coin for heads/tails.
This is only true if the group you're questioning is representative of the population you want to poll.
For example, in political polling (via telephone) it's long been known that the sample will lean more conservative/Republican if the poll is taken between 12pm and 3pm on a weekday, as that is when you're most likely to reach stay-at-home mothers or retired Americans, who are a more Republican and socially conservative population than the population as a whole. Meanwhile, polls taken from Friday night to Sunday morning will be way more liberal than the population, because conservatives usually spend time with family and don't answer the telephone during that period. (Don't ask me to explain how this is true; I just recall these facts from a political statistics and polling class I took in college, and who knows if it's valid anymore.) Furthermore, the VOTING public as a whole is more conservative than the non-voting public, meaning GENERAL and RANDOM polls of American citizens will lean further left than what you'd see at the ballot box (which is why political polls should always be of REGISTERED VOTERS).
Furthermore, how polls are worded greatly influences the answers. For example, if you ask a group of people whether someone should be allowed to smoke in church, you get a generally negative response; if you change the wording to whether churches should allow someone to smoke in church, you get a generally positive response from a similar sample of people. That's two vastly different results for what is essentially and logically the same question. Why? Because of the psychology of it. In the first version of the question people think about "themselves" and whether they want the person sitting next to them smoking, while in the second version people think of the "principle" of freedom and generally (at least in American polling samples) tend to favor more rights and fewer rules. It's an exercise in group psychology.
Which is why you need to be cautious about the sources, methods and even the wording of polls, regardless of the sample size. That's why internet polls are so untrustworthy. You have ZERO control over the people answering the questions. Worse, they're not a random group of people. They're a pre-selected group of people who found your poll, which means they're a unique population, not a random population. Would someone not interested in computers or games find this post on this web page? Nope. So even just taking the people who read this post and polling them will NOT resemble the population of the USA in any way whatsoever, even if we can find 10,000 people to read it and answer it.
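To make that last point concrete, here is a small simulation sketch. Everything in it is an assumption made up for illustration: the 30% "true" share, the response rates, and the simulate_poll function. It compares a random poll of 1,000 people with a self-selected poll of 10,000 where people holding one opinion are three times as likely to answer.

```python
import random


def simulate_poll(true_share, n, rate_yes=1.0, rate_no=1.0, seed=0):
    """Return the share of 'yes' answers among n respondents.

    People are drawn at random from the population, but each one only
    answers with a probability that depends on their opinion. Unequal
    rates model a self-selected (non-random) sample.
    """
    rng = random.Random(seed)
    yes_answers = 0
    answered = 0
    while answered < n:
        is_yes = rng.random() < true_share       # this person's real opinion
        rate = rate_yes if is_yes else rate_no   # chance they answer at all
        if rng.random() < rate:
            answered += 1
            yes_answers += is_yes
    return yes_answers / n


TRUE_SHARE = 0.30  # hypothetical true share of 'yes' in the whole population

# Small but random: everyone is equally likely to answer.
random_small = simulate_poll(TRUE_SHARE, 1_000)

# Ten times bigger but self-selected: 'yes' people answer 3x as often.
biased_large = simulate_poll(TRUE_SHARE, 10_000, rate_yes=0.9, rate_no=0.3)

print(f"true share:               {TRUE_SHARE:.1%}")
print(f"random poll, n=1,000:     {random_small:.1%}")
print(f"biased poll, n=10,000:    {biased_large:.1%}")
```

With these made-up rates the biased poll settles around 0.27 / 0.48, roughly 56%, no matter how many respondents you add. A bigger sample only shrinks the random noise, not the selection bias.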