What is Statistical Modeling?
“… based on our statistical modeling …”
“… we run a series of statistical tests…”
“… the discrepancy between these two groups is significant…”
If you’ve spent any time on ReviewMeta, you’ve surely seen us mention a variation of the term “statistical significance” over and over, and you’re probably wondering what this means.
Essentially, here is the question we are trying to answer:
Based on the data, are we confident that this was NOT simply due to random chance?
By checking whether a discrepancy is “statistically significant,” we take into account more than just the size of the difference: we also look at the sample size and the distribution of ratings. Accounting for these factors keeps our tests accurate. Otherwise our analysis would be jumping to conclusions that aren’t strongly supported by the data.
Let’s use an example to illustrate our point.
Say, for example, a product has only 3 reviews: one 1-star, one 3-star and one 5-star. Now let’s also assume that the 5-star review is the only one from an unverified purchaser. The result would look like this:
33% of the reviewers are unverified. The unverified reviews rated this an average of 5.0 while the verified reviews rated this an average of 2.0.
Just from the statement above, one might quickly jump to the conclusion that the unverifieds are obviously biased. However, upon closer examination, we’ll see that the data doesn’t support this conclusion.
Only one of the three reviews is unverified, and it happens to be the 5-star review. If we selected 1 of the 3 reviews at random, we’d have a 33% chance that it would be the 5-star review. Since it is still very likely that this could happen by random chance, we would NOT consider this to be statistically significant.
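To make the 33% figure concrete, here is a minimal Python sketch that lists every way one of the three reviews could end up being the unverified one and counts how often the gap is at least as large as the one we observed. The names (ratings, gap, chance) are made up for this illustration and are not taken from ReviewMeta's actual code.

```python
from itertools import combinations

ratings = [1, 3, 5]   # the three reviews from the example
n_unverified = 1      # exactly one of them is unverified

def gap(unverified_idx):
    """Average unverified rating minus average verified rating."""
    unverified = [ratings[i] for i in unverified_idx]
    verified = [r for i, r in enumerate(ratings) if i not in unverified_idx]
    return sum(unverified) / len(unverified) - sum(verified) / len(verified)

# What we actually saw: the 5-star review (index 2) is the unverified one.
observed_gap = gap((2,))

# Every equally likely way of choosing which review is the unverified one.
splits = list(combinations(range(len(ratings)), n_unverified))
as_extreme = [s for s in splits if gap(s) >= observed_gap]
chance = len(as_extreme) / len(splits)

print(f"observed gap: {observed_gap:.1f}")
print(f"chance of a gap this large at random: {chance:.0%}")
```

Only one of the three possible splits produces a gap this large, which is where the 1-in-3 chance comes from.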
Now, on the other hand, let’s say that a product has 100 reviews. 50 of them are 5-star reviews, all from unverified purchasers. The other 50 are 4-star reviews, all from verified purchasers. The results would look like this:
50% of the reviewers are unverified. The unverified reviews rated this an average of 5.0 while the verified reviews rated this an average of 4.0.
This statement doesn’t look quite as strong as the last, but when we take a closer look, we’ll see that it’s actually much stronger.
So 50 reviews are 5-star and 50 reviews are 4-star. If we selected 50 of these reviews at random, what are the odds that we’d select 50 5-star reviews in a row? Well, let’s do the math.
For the first pick, you’ve got a 50/50 shot of getting a 5-star review. So the odds are 50%. Then the next pick, you’ve got a 49 out of 99 chance of picking a 5-star review. So your odds are about 49.5%. Multiply those together and you’ve got a 24.7% chance of getting a 5-star review for the first two picks. You’ve still got 48 more picks to go, and your pool of 5-star reviews is just going to get smaller.
After accounting for all 50 picks, your odds of randomly selecting all 50 5-star reviews come to about 1 in 10^29 (a 1 followed by 29 zeros).
As you can see in this example, the odds of this happening due to random chance are far less than one in a trillion. At this point, we have very convincing evidence that the reviews from unverified purchasers are biased.
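If you'd like to check the pick-by-pick multiplication yourself, the short Python sketch below reproduces it with exact fractions and then confirms the answer a second way, by counting how many different sets of 50 reviews could be drawn from 100. The variable names are ours, chosen just for this example.

```python
from fractions import Fraction
from math import comb

n_five_star = 50   # 5-star reviews in the example
n_total = 100      # all reviews in the example
n_picks = 50       # reviews drawn at random, without putting any back

# Multiply the chance of each pick: 50/100 * 49/99 * 48/98 * ... * 1/51
prob = Fraction(1)
for k in range(n_picks):
    prob *= Fraction(n_five_star - k, n_total - k)

first_two = Fraction(50, 100) * Fraction(49, 99)
print(f"first two picks: {float(first_two):.1%}")
print(f"all 50 picks:    {float(prob):.2e}")

# Same result by counting: one favorable set out of C(100, 50) possible sets.
assert prob == Fraction(1, comb(n_total, n_picks))
```

Both routes agree: the chance is 1 divided by the number of ways to choose 50 reviews out of 100, which is roughly 1 in 10^29.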
Where do we draw the line?
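Unless otherwise stated, we use the well-accepted and established 95% confidence level. In other words, we only call a discrepancy significant when there is less than a 5% chance that a discrepancy at least that large would show up by random chance alone.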
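Here is a rough Python sketch of how a 5% cutoff could be applied to the two examples above. It uses a simple shuffle test (a one-sided permutation test): mix all the ratings together, deal them back out at random, and see how often the gap is at least as big as the real one. This is an illustration under our own assumptions, not a description of the exact tests ReviewMeta runs, and the names chance_of_gap and ALPHA are hypothetical.

```python
import random

def chance_of_gap(unverified, verified, n_shuffles=10_000, seed=0):
    """Estimate how often a random split gives a gap at least as large as observed."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    pooled = list(unverified) + list(verified)
    k = len(unverified)

    def gap(ratings):
        # First k ratings play the "unverified" group, the rest the "verified" group.
        return sum(ratings[:k]) / k - sum(ratings[k:]) / (len(ratings) - k)

    observed = gap(pooled)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if gap(pooled) >= observed:
            hits += 1
    return hits / n_shuffles

ALPHA = 0.05  # assumed cutoff: 95% confidence level means a 5% threshold

small = chance_of_gap([5], [1, 3])                # the 3-review example
large = chance_of_gap([5] * 50, [4] * 50)         # the 100-review example

for name, p in [("3 reviews", small), ("100 reviews", large)]:
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name}: estimated chance {p:.4f} -> {verdict}")
```

The 3-review example lands near one in three, well above the 5% line, so it would not be flagged. The 100-review example falls far below the line, so it would be.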
Significance helps ensure accuracy.
When you just look at the difference between groups it can be deceiving, and that is why it’s important to also consider the sample size and distribution of ratings. This is why sometimes you’ll see a large discrepancy for one product pass while a smaller one for a different product still triggers a failure. Using statistical modeling helps us make sure we aren’t failing products due to random chance rather than truly unnatural patterns.