Comments by "EebstertheGreat" (@EebstertheGreat) on "How We’re Fooled By Statistics" video.

  2. Brandon Scherrer You misunderstand regression to the mean. He is not suggesting that large sample sizes carry large statistical error (there can be, but usually there isn't). Rather, he is saying that if in one particular case a value is far from the mean, then the next value will tend to be closer to the mean, simply because (as you rightly pointed out) numbers tend to cluster around the mean in a normal distribution. A proper analogy would be pulling one person off the street who, by chance, happened to be pretty tall, say 190 cm. Wouldn't you agree that the next person you select at random will most probably be shorter? Or if you pick a short person by chance, say 150 cm, isn't it likely that the next person will be taller? That's all he is saying. In the IDF example, the instructors found that after criticizing a soldier for poor performance, the soldier's performance tended to improve on the next trial, but after complimenting a soldier for good performance, performance tended to get worse on the next trial. If that was the basis for their conclusions, the reasoning is flawed, because performance would tend to regress toward the mean regardless of punishment or reward. However, if they conducted a proper study in which their response to a soldier's test did not depend on the outcome (i.e., they always criticized one test group regardless of how well it performed, always rewarded another test group, and did neither to a control group), then their results would be statistically meaningful.
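The IDF argument above can be sketched as a simulation. This is my own illustration, not anything from the comment: each trial score is a fixed skill term plus freshly drawn luck, with no effect from praise or criticism at all, and the extreme groups still move toward the mean on the second trial.

```python
# Sketch: score = skill + luck, luck redrawn each trial. Groups selected for
# extreme first-trial scores regress toward the mean on the second trial,
# even though feedback plays no role in the model. All names/parameters here
# are my own assumptions for illustration.
import random

random.seed(0)
pilots = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
          for _ in range(100_000)]  # (skill, luck on trial 1, luck on trial 2)
trials = [(s + l1, s + l2) for s, l1, l2 in pilots]

worst = [(a, b) for a, b in trials if a < -1.5]  # would have been criticized
best = [(a, b) for a, b in trials if a > 1.5]    # would have been praised

mean = lambda xs: sum(xs) / len(xs)
print(f"criticized group: trial 1 = {mean([a for a, b in worst]):+.2f}, "
      f"trial 2 = {mean([b for a, b in worst]):+.2f}")
print(f"praised group:    trial 1 = {mean([a for a, b in best]):+.2f}, "
      f"trial 2 = {mean([b for a, b in best]):+.2f}")
```

The criticized group "improves" and the praised group "declines" on average, purely because the luck component is redrawn.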
  4. They are not. Regression to the mean only implies that average results are more likely than extreme results. It does not claim that you are extra likely to get an average result after an extreme one. For instance, suppose I have a program pick a random integer uniformly from 0 to 99. If on the first try it picks 95, then I can conclude that on the next try, it will probably pick a number less than 95. Why? Because it is always very likely to pick a number less than 95; the last draw didn't change those odds. In real cases where this is discussed, we usually refer to the observed "mean" of an entire sample. Any individual who scores very far from the mean on one test is likely to score closer to the mean on a retest. That individual might really be very skilled, with an individual expectation even higher than their last score; if so, they will probably score even higher on a retest. But more often, individuals who scored very high also got some benefit from good luck, and by definition, those people are expected to score lower on the retest. The end result is that, on average, people who scored far from the mean the first time will tend to score closer to the mean the next time, just because scoring very far from the mean is rare by definition. (Incidentally, this reasoning doesn't always apply. It applies well to events that are independent and have high variance, but if the events are not independent, it might not apply at all. For instance, say you have a group of people who have never tried throwing cards before, and you test how far they can throw them. In the first test, the cards won't go very far, because throwing cards is difficult; but in the retest, almost everyone will do better, because they will have more practice. And if some people are quicker learners than others, it may be that the people who had the best scores the first time actually improve more than the ones who started out with bad scores, in contradiction to the principle of regression to the mean. But in theory, if you could account for all of these effects and leave only some unaccountable "random" effects, you would indeed observe regression to the mean.)
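The uniform-draw point above is easy to check numerically. This is my own sketch: the chance that the next draw is below 95 is about 95% whether or not the previous draw happened to be a 95.

```python
# Check that conditioning on "the last draw was 95" does not change the
# probability that the next draw is below 95 (independent uniform draws).
import random

random.seed(1)
draws = [random.randrange(100) for _ in range(500_000)]

# Unconditional: fraction of all draws below 95.
overall = sum(d < 95 for d in draws) / len(draws)

# Conditional: fraction of draws below 95, given the previous draw was 95.
after_95 = [b for a, b in zip(draws, draws[1:]) if a == 95]
conditional = sum(d < 95 for d in after_95) / len(after_95)

print(f"P(next < 95)             = {overall:.3f}")
print(f"P(next < 95 | last = 95) = {conditional:.3f}")
```

Both fractions come out near 0.95, as the comment argues: the previous result changes nothing.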
  7. @Cliff WEBB Regression to the mean doesn't help gamblers, because games aren't played that way. If a gambler sees ten coin flips, nine of which are heads, regression to the mean tells him that in the next ten flips, he should expect fewer than nine heads. This has nothing to do with the previous result; it's just always the case that getting 9 or 10 heads out of 10 tosses is rare. The gambler gets no benefit from knowing this, because everybody knows this. Regression to the mean is usually discussed when an outcome is due to a combination of factors that do change and factors that don't change (or that change more slowly). In the fighter pilot example, we don't know the skill level of any given pilot in advance, so we don't know what to expect from them. Some pilots will perform well and some will perform badly, and for the most part, we think the ones who performed better are just better pilots, and the ones who performed worse are worse pilots. However, we also know there is an element of luck in the task. If we retry the same task, we still expect the ones who did well the first time to do well again, but on average, not by quite as much. That's because the ones who did well in the first place, on average, got lucky, and there's no reason to expect that to happen again. Of course, a few of the pilots who scored highly may have gotten somewhat unlucky and actually performed worse than they usually would, so next time they will likely perform even better. But these will usually be outnumbered by the lucky pilots who scored well.
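The coin-flip claim above can also be simulated. A small sketch of my own, assuming a fair coin: the expected number of heads in the next ten flips is five, whether or not the previous ten contained nine or more heads.

```python
# Pairs of independent 10-flip runs: the mean head count of the second run
# is ~5 both overall and conditional on the first run having 9+ heads.
import random

random.seed(2)

def ten_flips():
    # Number of heads in ten fair coin flips.
    return sum(random.random() < 0.5 for _ in range(10))

pairs = [(ten_flips(), ten_flips()) for _ in range(200_000)]

after_nine = [b for a, b in pairs if a >= 9]
print(f"mean heads in next 10, overall:        "
      f"{sum(b for _, b in pairs) / len(pairs):.2f}")
print(f"mean heads in next 10, after 9+ heads: "
      f"{sum(after_nine) / len(after_nine):.2f}")
```

Knowing the first run contained nine heads tells the gambler nothing useful about the second.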
  11. It has nothing to do with a normal distribution. A person who did badly on a given test might simply not be a good student, or might have had one bad test. In a single test given to lots of students, most of them will tend to score close to what you would expect given their overall performance in the class, but some will score better or worse than normal. If you retest them, these students will tend to score about their expected value the next time. The subset of students who scored badly the first time will contain a disproportionate number who underperformed, and the subset of students who scored well the first time will contain a disproportionate number who overperformed. So overall, these students will tend to regress to the mean. It's true that this doesn't apply to all distributions, but the normal distribution is not particularly relevant. What needs to happen is that the outcome depends on two simple probability distributions, which we can think of as measuring skill and luck (for instance). If both of these distributions have a finite mean, and the joint probability distribution has identical marginal distributions for the two random variables, then regression to the mean is always expected to occur. If the marginal distributions are not identical, there is no guarantee this will happen, since it's possible that having "good luck" could be positively correlated with having "good skill." This generally does not occur in real life, but it could in principle. And if the joint distribution has no mean, then there is simply no mean to regress toward, so of course it can't happen. (In special cases of symmetric distributions like the Cauchy distribution, whose expectation has a principal value, I think you might still see regression toward that value, but I'm not sure.) In the case where the mean is infinite, every result will "regress toward the tail": no matter what the last result was, the next result will be greater on average, because the expected value of a random variable with infinite mean is, by definition, larger than any real number.
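The point that normality is not required can be demonstrated directly. In this sketch of mine, skill and luck are both *uniform*, not normal, and test/retest scores still regress toward the mean.

```python
# Regression to the mean with uniform (non-normal) components:
# score = skill + luck, with skill fixed and luck redrawn on the retest.
# The distributions and cutoffs here are my own illustrative choices.
import random

random.seed(3)
students = [(random.uniform(0, 50), random.uniform(0, 50), random.uniform(0, 50))
            for _ in range(100_000)]  # (skill, luck on test 1, luck on retest)
tests = [(s + l1, s + l2) for s, l1, l2 in students]  # overall mean is 50

low = [(a, b) for a, b in tests if a < 25]    # scored badly the first time
high = [(a, b) for a, b in tests if a > 75]   # scored well the first time

mean = lambda xs: sum(xs) / len(xs)
print(f"low scorers:  test 1 = {mean([a for a, b in low]):.1f}, "
      f"retest = {mean([b for a, b in low]):.1f}")
print(f"high scorers: test 1 = {mean([a for a, b in high]):.1f}, "
      f"retest = {mean([b for a, b in high]):.1f}")
```

Both extreme groups land closer to the overall mean of 50 on the retest, with no normal distribution anywhere in the model.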