How to avoid bias in data analytics

Using deep analysis of data to support decision making is a good idea, but it can backfire if the data is biased. Here are the typical biases and how to avoid them.

Bias in data analytics can happen either because the humans collecting the data are biased or because the data collected is biased.

Bias is a natural tendency that we all share, but it must be reduced as much as possible if we are to make better decisions. Bias in data analytics can be reduced by framing questions so that respondents can answer without external influence, and by constantly improving algorithms. Below you will find three types of bias and tips for avoiding them.

1. Confirmation bias in data analytics

Confirmation bias occurs when researchers use respondents’ answers to confirm their original hypothesis, accepting only the evidence that supports it and rejecting the responses that go against it. The problem with confirmation bias is that it supports only one viewpoint and rejects others outright, which narrows the company’s field of view. To reduce confirmation bias, researchers must be ready to re-examine and reconsider respondents’ answers, and must avoid falling in love with preconceived notions and views.

2. Interpretation bias in data analytics

Interpretation bias is the human tendency to interpret ambiguous situations in a positive or negative fashion. One of the most common forms of this in organisations occurs when data is presented.

There’s a famous study about this. Elizabeth Loftus, now at the University of California, Irvine, carried out research in which she showed films of car crashes to volunteers. She then split them into groups and asked them several questions about the crashes, using a different verb for each group. For instance, one of the questions was “About how fast were the cars going when they contacted/hit/bumped/collided/smashed?” The choice of verb had a marked effect on the estimates: those asked about cars that “smashed” guessed higher speeds than those asked about cars that “contacted”.

So if you’re presenting data from an engagement survey to the leaders of your company, be aware of the words you’re using. For instance, compare these two statements:

Only 50 per cent of our remote workers reported finding their work very meaningful
A full 50 per cent of our remote workers reported finding their work very meaningful

The former presumes that 50 per cent is disappointing; the latter paints a rosier picture. To avoid this bias, it’s best to report the figure alongside a baseline data point, such as an industry average or your own organisation’s past performance.
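As a rough illustration of reporting against a baseline, here is a minimal Python sketch. The function name `describe_against_baseline` and the baseline of 42 per cent are hypothetical, chosen only for this example; the point is that the sentence states the gap to a reference figure and leaves out loaded words such as "only" or "a full".

```python
def describe_against_baseline(label, value_pct, baseline_pct, baseline_name):
    """Return a neutral sentence that reports a percentage next to a baseline.

    All arguments are supplied by the caller; nothing here judges whether
    the figure is good or bad.
    """
    diff = value_pct - baseline_pct
    if diff == 0:
        comparison = f"the same as {baseline_name} ({baseline_pct:.0f} per cent)"
    else:
        direction = "above" if diff > 0 else "below"
        comparison = (
            f"{abs(diff):.0f} percentage points {direction} "
            f"{baseline_name} ({baseline_pct:.0f} per cent)"
        )
    return f"{value_pct:.0f} per cent of {label}, {comparison}."


# Hypothetical figures: 50 per cent this year, 42 per cent last year.
sentence = describe_against_baseline(
    "remote workers reported finding their work very meaningful",
    value_pct=50,
    baseline_pct=42,
    baseline_name="last year's result",
)
print(sentence)
```

With a baseline in the sentence, readers can judge for themselves whether 50 per cent is an improvement or a decline.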

3. Information bias in data analytics

In 2008, Google came up with a novel approach to early warning of flu outbreaks, intended to help curb the spread of the disease. The initiative, called “Flu Trends”, collected the search keywords people were using in a region. If searches for flu symptoms, effects or cures increased, it would alert the local health authorities so that they could take action.

But the program failed to take into account changes Google made to its own search algorithm. In early 2012 Google made such an adjustment so that it suggested a diagnosis when people typed in symptoms (such as “cough”). Flu Trends didn’t know about this, so it alerted authorities to flu epidemics much larger than the reality turned out to be.
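One general safeguard against this kind of drift is to keep comparing a model's estimates with the figures that are later reported, and to flag periods where the gap grows. The sketch below is an illustration only, not a description of how Flu Trends worked; the function name `flag_overestimates`, the 25 per cent tolerance and the weekly numbers are all hypothetical.

```python
def flag_overestimates(estimates, actuals, tolerance=0.25):
    """Return the indices of periods where an estimate exceeds the later
    reported value by more than `tolerance`, as a fraction of that value."""
    flagged = []
    for i, (estimate, actual) in enumerate(zip(estimates, actuals)):
        # Skip periods with no reported cases to avoid dividing by zero.
        if actual > 0 and (estimate - actual) / actual > tolerance:
            flagged.append(i)
    return flagged


# Hypothetical weekly case counts, for illustration only.
estimates = [100, 110, 180, 200]
actuals = [95, 105, 120, 125]
print(flag_overestimates(estimates, actuals))  # prints [2, 3]
```

A run of flagged periods does not say what went wrong, but it is a prompt to check whether something upstream, such as the way the input data is generated, has changed.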

Bias in data analytics continues to be a major problem, and steps should be taken to reduce it, including developing more advanced algorithms. Data scientists should also guard against favoring responses that support their own research when collecting and analyzing data. They should be open to all kinds of viewpoints, which ultimately helps them make better decisions.

Naveen Joshi is Director at Allerin Tech, a Mumbai-based software solutions provider. This article is an edited adaptation of a post on his LinkedIn page.