Correlation or Causation: The most common errors made in data analysis

Research is prone to errors that have significant consequences for generalizing results and interpreting statistical relationships. The following are some of the most common:

Statistical errors

  1. Assuming you have a representative sample: This error is especially common with large data sets (big data). The results may look convincing, but the sample does not give a true picture of the target population.
  2. Measurement errors: Assuming the data has been measured correctly. Often there is more bad data than good data, and you need to adjust for this.
  3. Correlation versus causation: No statistical test can, on its own, establish whether two correlated things are merely associated or whether one is causing the other to occur. Yet many researchers make the common error of assuming causation when they have only shown correlation.
  4. Assuming that the data represents what you think it does: For example, the fact that a large proportion of prisoners attended high school A does not make it correct to conclude that high school A is responsible for producing criminals.
  5. Reliance on features that are destined to change: Some numbers are bound to be way off when circumstances change. Daily and weekly fluctuations are easy to notice, but predicting what will happen one year from now is a different story. No matter how well a study explains current and past results, predicting the future from current information has many limitations.
  6. Not understanding the impact that the choice of algorithm (or statistical test) has on final results: Using a different type of algorithm, say random forest instead of logistic regression, can often have a significant impact on final results.
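
To make the correlation-versus-causation error (item 3) concrete, here is a minimal simulation sketch in Python. Everything in it is invented for illustration: the variables (temperature, ice cream sales, sunburn cases), the coefficients, and the sample size are assumptions, not real data. It shows two quantities that are strongly correlated even though neither causes the other, because a third factor drives both.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical confounder: daily temperature drives both outcomes.
temperature = [rng.gauss(25, 5) for _ in range(5000)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [1.5 * t + rng.gauss(0, 5) for t in temperature]

# The two outcomes never influence each other, yet they correlate.
raw_r = pearson(ice_cream_sales, sunburn_cases)

# Subtract the temperature effect (known here only because we built
# the data ourselves) and correlate what is left over.
ice_resid = [s - 2.0 * t for s, t in zip(ice_cream_sales, temperature)]
burn_resid = [c - 1.5 * t for c, t in zip(sunburn_cases, temperature)]
adjusted_r = pearson(ice_resid, burn_resid)

print(f"raw correlation: {raw_r:.2f}")
print(f"after removing the temperature effect: {adjusted_r:.2f}")
```

In this toy setup the raw correlation is strong, while the correlation left after removing the shared temperature effect is close to zero. With real data the confounder and its effect are usually unknown, which is why a correlation alone cannot settle the question.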

Can you think of any other common errors you would add? Please share in the comments below.

Conducting your own survey? Here is a quick guide

What is a survey?

It is the systematic, structured gathering of information from a sample (a portion) of individuals, with the main purpose of drawing conclusions or making predictions about a larger population or group.
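
The idea of using a sample to say something about a larger population can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100,000 people, the 30% "yes" share, and the sample size of 1,000 are all made up, and the margin of error uses the common normal approximation for a simple random sample.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = answers "yes", 0 = answers "no".
population = [1] * 30_000 + [0] * 70_000
true_share = sum(population) / len(population)

# Simple random sample of 1,000 people, drawn without replacement.
sample = rng.sample(population, 1000)
p_hat = sum(sample) / len(sample)

# Approximate 95% margin of error for a proportion
# (normal approximation, assumes a simple random sample).
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / len(sample))

print(f"true share of 'yes': {true_share:.3f}")
print(f"survey estimate:     {p_hat:.3f} +/- {margin:.3f}")
```

The estimate only behaves this well because every individual had the same chance of being selected. A sample drawn in a biased way can be far off no matter how large it is.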

Prior to starting a survey, one should…

  • Define a clear purpose for the survey
  • Create research objectives/research questions you intend to answer
  • Decide on a methodology to collect the data required

Popular ways of administering a survey:

  1. Online via a website
  2. Online via email campaign
  3. Face-to-face interviews
  4. Interviews via telephone

In general, the method you choose to administer your survey questionnaire will depend on the following:

Questions for the Questionnaire

The following are questions one should ask when creating a questionnaire.

What does the question contribute?

Each question included in the questionnaire should contribute significantly to acquiring relevant information for the researcher. The questionnaire should be assessed beforehand to ensure that all of its questions are relevant, and any irrelevant questions should be removed.