We often think of “data” as an objective form of information, something we simply observe, record into computers, and write down in spreadsheets. In reality, data can be just as subjective as the humans who record it. In scientific fields especially, where empirical methods are used to observe and understand nature, data should always be collected and interpreted from an impartial point of view. Cognitive biases, however, are an ever-present obstacle when interpreting data and can easily skew results. They are innate tendencies to understand and process information based on our previous experiences and preferences, and many of them appear to be deeply wired into how we think! For data scientists (or anyone, really), biases influence how we think about and interpret data, and when left unchecked and misunderstood, they can lead us to errors in judgement and less objective decision-making.
As a company specializing in data integration, we want to make sure that you’re getting the most value out of your data, and that extends beyond just integrating it. In the end, what matters is how you collect, analyze, and interpret your data, and to be an effective data scientist, you need to be constantly aware of the biases that can cloud anyone’s judgement. Biases come in many different forms, but in this article we’ll touch on a few of the major ones that are known to have considerable effects on research and science. Here is a brief list of four cognitive biases that may affect you as a researcher or data scientist:
Confirmation bias is the tendency to process information in a way that confirms one’s preconceptions, beliefs, or hypotheses. We exhibit confirmation bias when we actively seek out, and assign more priority and value to, data that confirms our own hypotheses while ignoring or understating evidence that suggests otherwise. Your preconceptions may be “good” ones, grounded in educated intuition or previous experience, but people with a confirmation bias tend to ignore data that could disprove their preconceived ideas, which directly affects the results of a study or analysis.
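To make the effect concrete, here is a minimal Python sketch under an assumed, simulated setup (the function name `simulate_confirmation_bias` and its parameters are hypothetical, not part of any real study): the true effect is zero, but an analyst who believes it is positive quietly keeps only the observations that agree with that belief.

```python
import random
import statistics


def simulate_confirmation_bias(n=1000, seed=42):
    """Compare an honest estimate with one built only from 'confirming' data.

    Hypothetical setup: the true effect is zero, but the analyst believes it
    is positive and discards every observation at or below zero.
    Returns (honest_mean, biased_mean, number_of_observations_kept).
    """
    rng = random.Random(seed)
    # The true effect is 0: observations are pure noise centred on zero.
    observations = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Honest analysis: every observation counts.
    honest_mean = statistics.mean(observations)

    # Biased analysis: keep only the data that "confirms" the hypothesis.
    confirming = [x for x in observations if x > 0]
    biased_mean = statistics.mean(confirming)

    return honest_mean, biased_mean, len(confirming)


if __name__ == "__main__":
    honest, biased, kept = simulate_confirmation_bias()
    print(f"honest mean: {honest:.3f}")
    print(f"biased mean: {biased:.3f} (kept {kept} of 1000 observations)")
```

In a typical run, the honest mean lands near zero while the filtered mean lands well above it (roughly 0.8 for this setup), even though the underlying data never changed; only the choice of which data to keep did.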
Observation bias is the tendency to look for data in places where we expect to find good results, or where it is most convenient to observe.
However, just because a source of data is easily accessible, or is the first one that comes to mind, doesn’t mean it’s the most important. Relying on it is understandable, and the most available and well-known data source is often a good one, but no analysis is complete without building a full picture of your data. Data science is about producing actionable insights, but if the wrong things are being observed and measured, you start getting false insights. To be an effective researcher, check yourself regularly and ask, “Am I measuring the right things? Are there better sources of data?”
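One practical way to ask “Am I measuring the right things?” is to check how well a convenient data source covers the population you actually care about. The sketch below is hypothetical: `coverage_report`, the segment labels, the population shares, and the 0.5 tolerance are all illustrative assumptions, and in practice the reference shares would have to come from a source you trust to be complete.

```python
from collections import Counter


def coverage_report(source_records, population_shares, tolerance=0.5):
    """Flag population segments that a data source under-represents.

    source_records: one segment label per record in the convenient source.
    population_shares: dict mapping segment -> its share of the full
        population (assumed to be known from a more complete reference).
    tolerance: a segment is flagged when its share in the source falls
        below tolerance * its share in the population.
    """
    counts = Counter(source_records)
    total = len(source_records)
    report = {}
    for segment, expected in population_shares.items():
        observed = counts.get(segment, 0) / total if total else 0.0
        report[segment] = {
            "expected": expected,
            "observed": observed,
            "flagged": observed < tolerance * expected,
        }
    return report


if __name__ == "__main__":
    # Hypothetical convenient source: records that were easy to collect.
    web_signups = ["urban"] * 90 + ["suburban"] * 10
    shares = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}
    for segment, row in coverage_report(web_signups, shares).items():
        print(segment, row)
```

Here the easy-to-collect source looks plentiful, yet it says almost nothing about suburban records and nothing at all about rural ones, so both are flagged.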
This one is pretty simple. Funding bias, sometimes called sponsorship bias, is the unconscious tendency to skew models, data, or interpretations of data in a way that favors the objectives of a financial sponsor or employer.
Here’s a prime example of funding bias: it’s well known that in the 1990s, the tobacco industry funded a number of research studies on the health effects of tobacco and cigarette smoking. In these studies, the industry sponsors and research centers were found to have presented findings in a misleading way and to have withheld certain findings about the link between smoking and cancer. Every scientist and researcher should keep this bias in mind, because no matter how attractive the results look, unknowingly making a business decision based on flawed data will ultimately damage your company or sponsor, and might even damage your career (not to mention that it’s simply bad science)!
Because of the sheer amount of time and resources needed to experiment on an entire population, we instead take a sample, which should be representative of that population. This is usually achieved through a range of statistical techniques and well-designed randomization, but what happens if proper randomization isn’t achieved? It’s not uncommon for researchers to encounter sampling bias, in which the selected groups or data are unintentionally unrepresentative of the population being studied. In any kind of experimentation, no matter how big or diverse the sample, there is always a chance of inconsistencies in how data and samples are collected. This bias also ties into the three others mentioned above: if any of them unconsciously influences the way you collect data or samples for your research or experiment, then you’re also dealing with sampling bias!
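The difference between proper randomization and a convenience selection can be sketched in a few lines of Python. This is a simulated example under stated assumptions: the spend figures are made up, `compare_sampling` is a hypothetical helper, and the data is deliberately stored in sorted order so that simply taking the first rows produces a skewed sample. The standard library's `random.Random.sample` draws a simple random sample without replacement.

```python
import random
import statistics


def compare_sampling(population, k, seed=0):
    """Return (population mean, convenience-sample mean, random-sample mean)."""
    rng = random.Random(seed)
    # Convenience sample: just take the first k records, e.g. whatever
    # rows happened to load first from a file or database table.
    convenience = population[:k]
    # Simple random sample: every record has the same chance of selection.
    randomized = rng.sample(population, k)
    return (
        statistics.mean(population),
        statistics.mean(convenience),
        statistics.mean(randomized),
    )


if __name__ == "__main__":
    # Hypothetical spend figures for 10,000 customers, stored sorted from
    # lowest to highest: an ordering a careless sampler might not notice.
    spend = list(range(10_000))
    true_mean, conv_mean, rand_mean = compare_sampling(spend, k=500)
    print(f"population mean:        {true_mean:.1f}")
    print(f"first-500 sample mean:  {conv_mean:.1f}")
    print(f"random-500 sample mean: {rand_mean:.1f}")
```

With this made-up data, the first 500 records average 249.5 while the population averages 4999.5; the random sample lands close to the population mean because its selection does not depend on how the records happen to be ordered.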
Although we’ve only scratched the surface of cognitive biases, I hope this was a good introduction to how we rely on heuristics in our natural thought processes and, on the other hand, how our experiences and innate tendencies can negatively affect our judgement. The intersection between heuristics and biases is very complex, and much research remains to be done to fully understand it. Whether it’s confirmation bias influencing how we process information or funding bias compelling us to skew results in favor of an employer, understanding and staying constantly aware of our biases will help keep our thought processes in check as we make data-driven decisions.