Research is prone to errors with significant consequences for how results are generalized and how statistical relationships are interpreted. The following are some of the most common:
- Assuming you have a representative sample: This error is especially common with large data sets (big data). The results may look convincing, but the sample may not give a true picture of the target population.
- Measurement errors: Assuming the data has been measured correctly. In practice, data sets often contain more bad data than good, and the analysis needs to adjust for this.
- Correlation versus causation: No statistical test can prove whether two variables are merely correlated or whether one is causing the other. Yet many researchers make the common error of assuming causality where only correlation has been shown.
- Assuming that the data represents what you think it does: For example, the fact that a large proportion of prisoners come from high school A does not make it correct to conclude that high school A is responsible for producing criminals.
- Reliance on features that are destined to change: Some numbers are bound to be far off when circumstances change. Daily and weekly fluctuations are easy to notice, but predicting what will happen one year from now is a different story. No matter how well a model explains current and past results, predicting the future from current information has many limitations.
- Not understanding the impact that algorithm choice (the choice of statistical test or model) has on final results: Using a different type of algorithm (say, a random forest instead of logistic regression) can have a significant impact on the final results.
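The first point above, sampling bias, can be sketched in a few lines of Python. The data here is entirely synthetic (a hypothetical income distribution), and the "biased sample" is just an illustration of a convenience sample that only reaches the more visible part of a population: it is twenty times larger than the random sample, yet far less accurate.

```python
import random
import statistics

random.seed(0)

# Synthetic population: incomes of 100,000 people (illustrative only).
population = [random.lognormvariate(10, 0.8) for _ in range(100_000)]
true_mean = statistics.mean(population)

# A small simple random sample tends to track the population mean.
random_sample = random.sample(population, 1_000)

# A much bigger but biased "convenience" sample: only above-average
# earners are reached (e.g. respondents to a particular survey).
biased_sample = [x for x in population if x > true_mean][:20_000]

print(f"true mean:          {true_mean:,.0f}")
print(f"random sample mean: {statistics.mean(random_sample):,.0f}")
print(f"biased sample mean: {statistics.mean(biased_sample):,.0f}")
```

The biased sample's estimate is far above the true mean despite its size: more data does not fix a non-representative sampling process.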
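The correlation-versus-causation point can also be demonstrated with a toy simulation. In this hypothetical setup, temperature drives both ice cream sales and drownings; the two series end up strongly correlated even though neither causes the other.

```python
import random
import statistics

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys) * len(xs))

# Hypothetical confounder: temperature drives both variables.
temperature = [random.gauss(20, 8) for _ in range(5_000)]
ice_cream_sales = [2.0 * t + random.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 3) for t in temperature]

r = pearson(ice_cream_sales, drownings)
print(f"correlation: {r:.2f}")  # strongly positive, yet neither causes the other
```

A correlation test alone cannot distinguish this situation from a genuine causal link; that distinction requires knowledge of the data-generating process or a designed experiment.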
Can you think of any common errors you might want to add? Please share in the comments below.