Correlation or Causation: The most common errors made in data analysis

Research is prone to errors that have significant consequences for generalizing results and interpreting statistical relationships. The following are some of the most common:

  1. Assuming you have a representative sample: This error is especially common with large data sets (big data). The results may look convincing, but the sample does not give a true picture of the target population.
  2. Measurement errors: Assuming the data has been measured correctly. Often we get more bad data than good data, and you need to adjust for this.
  3. Correlation versus causation: A statistical test can tell you whether two things are correlated, but no test on its own can prove or disprove that one is causing the other to occur. Yet many researchers make the common error of assuming causality when all they have shown is correlation.
  4. Assuming that the data represents what you think it does: For example, the fact that a large proportion of prisoners attended high school A does not make it correct to conclude that high school A is responsible for producing criminals.
  5. Reliance on features that are destined to change: Some numbers are bound to be way off when circumstances change. Daily and weekly fluctuations are easy to notice, but predicting what will happen one year from now is a different story. No matter how well a study explains current and past results, predicting the future based on current information has many limitations.
  6. Not understanding the impact that the choice of algorithm (or statistical test) has on final results: Using a different type of algorithm (say, random forest instead of logistic regression) can often have a significant impact on final results.
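
Error 3 in the list above can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names (temperature, ice cream sales, beach visits) and all the numbers are assumptions chosen for illustration. Two series are both driven by a third factor, so they correlate strongly even though neither causes the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical confounder: daily temperature drives both series.
temperature = [rng.gauss(30, 5) for _ in range(5000)]
ice_cream_sales = [t + rng.gauss(0, 2) for t in temperature]
beach_visits = [t + rng.gauss(0, 2) for t in temperature]

r_raw = pearson(ice_cream_sales, beach_visits)

# Remove the confounder's contribution, then correlate what is left.
# Subtracting temperature directly works only because this simulated data
# was built with a coefficient of 1; with real data the adjustment would
# have to be estimated (for example with a regression).
sales_resid = [s - t for s, t in zip(ice_cream_sales, temperature)]
visits_resid = [v - t for v, t in zip(beach_visits, temperature)]
r_adjusted = pearson(sales_resid, visits_resid)

print(f"correlation, ignoring temperature: {r_raw:.2f}")
print(f"correlation, temperature removed:  {r_adjusted:.2f}")
```

The first figure should come out strongly positive and the second close to zero: the apparent relationship between the two series disappears once the shared cause is accounted for.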

Can you think of any common errors you might want to add? Please share in the comments below.

Codes & Reproducible Analysis

Data analysis should be reproducible! That is, a colleague should be able to look at your codes (yes, codes) and clearly understand each step that was taken to produce your results. This is not as difficult as it sounds, since the popular statistical software packages (e.g. SPSS, RATS, Stata and R) all allow analysts to save codes and insert comments.

Data analysis using codes and comments holds the true secret behind “reproducibility”. So start coding, commenting and reproducing the best analysis possible.
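
As a rough illustration of what saved and commented code can look like, here is a minimal sketch in Python rather than any of the packages named above. The data is simulated and the function names, seed and score range are all hypothetical; the point is only that each step is written down, commented, and gives the same result every time it is run.

```python
import random
import statistics

SEED = 2012  # fixed seed: rerunning the script gives the same numbers

def load_scores(seed):
    """Step 1: get the data. Simulated here; a real script would read a file."""
    rng = random.Random(seed)
    return [rng.gauss(65, 10) for _ in range(200)]

def clean_scores(scores):
    """Step 2: drop impossible values (scores must lie between 0 and 100)."""
    return [s for s in scores if 0 <= s <= 100]

def summarize(scores):
    """Step 3: compute the statistics that go into the report."""
    return {
        "n": len(scores),
        "mean": round(statistics.mean(scores), 2),
        "stdev": round(statistics.stdev(scores), 2),
    }

summary = summarize(clean_scores(load_scores(SEED)))
print(summary)
```

A colleague who runs this file gets exactly the same summary, and can see from the comments why each step is there.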

Jamaica and ‘Big Data’: The Future of Research

The emergence of ‘big data’ (the wealth of information being collected daily on customer behaviour and attitudes via websites such as Facebook, Twitter and LinkedIn) will change research as we know it. Internationally, the ‘big data’ discourse has come a long way; however, no mention has been made of its impact locally.

Data collection change:

Facebook, Twitter, LinkedIn and Foursquare, to name a few, collect millions of data points on customers and website users every day. Some of the data collected are as follows:

  1. Location – Foursquare or Facebook check in
  2. Feelings/ opinions – Twitter posts about mood
  3. Sentiments – about a company or product
  4. Likes – what you are interested in
  5. Demographic information – age & gender

Other data is also freely shared by users of these free services. The days of physically walking ‘door to door’ asking questions in order to collect data are numbered. Data is constantly being produced and stored; the problem (or opportunity) is now the utilization of this available data. That is where the alignment of software developers and statisticians or data analysts is critical.

Jamaica has a growing tech community, with several graphic designers and application developers all developing solutions they hope consumers will like and purchase. Few have seen the value in strategically developing applications that are data driven, that is, applications that provide value to consumers while also collecting valuable data from them.

Jamaica, and by extension the wider Caribbean region, faces many development problems, chief among them high crime rates, the high cost of justice, poor customer service (in both the public and private sectors), the high cost of collecting data and low levels of efficiency in production. These are areas where additional data could help to ameliorate, if not solve, many of the issues. The nexus between developers and data analysts holds tremendous opportunities for improved efficiency and data gathering throughout the Caribbean: more data, better analysis, more insights and more solutions to current problems.

The change is underway:

IBM, for example, offers text analytics software that allows users to gather and analyze data from Twitter and Facebook postings about a company. That is, using this software, you can get an overall picture of customer or user sentiment about a company.
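
To give a feel for the general idea of turning posts into an overall sentiment picture, here is a toy sketch in Python. It is not IBM's method or any real product's API: the word lists, function names and sample posts are all made up for illustration, and a simple word count like this will miss sarcasm, negation and slang.

```python
import re
from collections import Counter

# Hypothetical word lists; a real tool would use a much richer model.
POSITIVE = {"great", "love", "good", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def score_post(text):
    """Return the positive-word count minus the negative-word count."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

def label(score):
    """Turn a numeric score into a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def summarize_sentiment(posts):
    """Count how many posts fall under each sentiment label."""
    return Counter(label(score_post(p)) for p in posts)

# Made-up posts about an unnamed company.
posts = [
    "Love the new service, great staff!",
    "Terrible coverage, I hate the ads",
    "Watching the race tonight",
]
overall = summarize_sentiment(posts)
print(overall)
```

Here each of the three sample posts lands under a different label, so the summary holds one positive, one negative and one neutral post.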

Just consider the implications this might have for the future of polls! Based on tweets and social media sentiment, one might one day predict election outcomes with a far smaller margin of error than traditional polls, provided the users sampled are representative of voters. Talk about big data!

Companies collecting ‘big data’ will hold tremendous insights relating to their consumers and a country’s population in general. Imagine being able to see attitudes change in real time, to see customer feedback about service and to respond quickly to negative sentiment.

CVM came under tremendous pressure from Jamaicans via social media throughout most of the London 2012 Olympics. This dissatisfaction with the company’s coverage was evident on Twitter days before it was published in the local newspapers. This is one example of the future potential of ‘big data analysis’ in Jamaica, and its implications for large establishments.

The new age of research will see the rise of the analyst and presenter: persons or companies skilled in gathering and interpreting insights from large volumes of data. This is especially true for big companies serving large customer bases.

Share your comments and views.

About the author: Luwayne Thomas is Co-founder & COO @balcostics

Follow me on Twitter: @LuwayneThomas

At Balcostics our mission is to empower leaders with the required data and information to make better decisions. Learn more about our full list of research outsourcing services for individuals and companies.