The Essential Data Science Venn Diagram
Essential Meaning
The three spheres of expertise may also be put in simpler terms of what they provide (i.e., their essences): Intuition, Validity, and Automation. Automation could also be referred to as up-scaling. Their confluence provides us with improved Insight.
The danger of “bias” exists in the “traditional research” zone and is the main inspiration for this update of the Data Science Venn Diagram. Statistical “bias,” as the term is used here, is the exclusion or ignoring of significant variables (what statisticians call omitted-variable bias), not unlike the colloquial meaning. Since most people are not familiar with handling multivariate analyses, the danger of bias most readily creeps in when multivariate problems are treated as bivariate or univariate problems: the effect of an ignored variable gets wrongly attributed to the variables that were kept. As noted above, the presence of bias makes organizations susceptible to disruption by newcomers whose multivariate approaches make them more insightful competitors.
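To make the bivariate-versus-multivariate point concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the variable names (x1, x2, y), the coefficients, and the helper fit_slopes are hypothetical and are not taken from the diagram or from any real dataset. The outcome y depends on two correlated drivers. Fitting y against x1 alone (the bivariate shortcut) credits x1 with much of x2's effect, while fitting both recovers something close to the true coefficients.

```python
import numpy as np


def fit_slopes(X, y):
    """Ordinary least squares with an intercept; returns only the slope coefficients."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


# Synthetic data (illustrative assumption, not real measurements).
rng = np.random.default_rng(0)
n = 5000
x2 = rng.normal(size=n)                          # the variable that gets ignored
x1 = 0.8 * x2 + rng.normal(scale=0.6, size=n)    # correlated with x2, variance about 1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

# Bivariate treatment: x2 is left out, so its effect leaks into x1's slope.
bivariate_slope = fit_slopes(x1.reshape(-1, 1), y)[0]

# Multivariate treatment: both drivers are included.
multivariate_slopes = fit_slopes(np.column_stack([x1, x2]), y)

print("true slope of x1:          1.0")
print("bivariate estimate:       ", round(float(bivariate_slope), 2))
print("multivariate estimates:   ", np.round(multivariate_slopes, 2))
```

Under these assumptions the bivariate slope lands far from the true value of 1.0. Analytically it should be near 1.0 + 2.0 × 0.8 = 2.6, because x1 has roughly unit variance and a covariance of about 0.8 with x2. The multivariate fit should sit close to 1.0 and 2.0. The size of the distortion depends entirely on how strongly the omitted variable correlates with the ones kept, which is exactly the kind of thing a domain expert is well placed to flag.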
Also note that the overlapping area corresponding to “machine learning” lacks intuition (i.e., is “daft,” a pronounced form of bias). Some will no doubt protest this. But consider that, over decades of interaction with the physical world and society, people gain a level of intuition that no ML model is remotely close to obtaining (for example, do you know of any ML application that knows and appreciates what salt tastes like?). ML models only know what we tell them (the data we provide), and model outputs will ultimately reflect that. ML models can become very good at what we train them to do, but they still need to be trained by humans. (Note: even young humans need to be trained by older, wiser humans.) The context gained from broad experience builds intuition. Context matters.
The area labeled as “traditional software” is problematic to describe. This ties back to the limitations of trying to summarize a multivariate system with a two-dimensional graphic. This overlap could also have been used to characterize mechanical automation (which certainly constitutes a significant portion of the global economy). For the purposes of this discussion, and in the context of most white-collar intellectual endeavors, suffice it to say that this area can represent a lack of rigor when performing risk assessments.
Order of Operations
In most cases, there is an inherent order of operations, or “best practice,” for the sequence in which each sphere of expertise is brought in. The sequence is typically intuition first, then validation, then automation. We typically test whether an idea is valid after our intuition has led us to the idea in the first place. Likewise, we should not up-scale (automate) implementations until they have been shown to be valid.
Are there exceptions to the sequence proposed above? Certainly! It is not uncommon for machine learning outputs to bring an association to a domain expert’s attention that they had previously dismissed or were unaware of. There are also plenty of cases where domain experts’ heuristics were already implemented in automated systems and later validated once data collection became possible. Let us also not forget the role of automation in helping us collect data in the first place that can later be evaluated with domain expertise and statistical rigor.
Σ
To sum it up, this new Venn diagram tells us things we already knew, but perhaps had not formally verbalized. It is a simple schema, and as such can help people better prioritize their workflows. For example, it encourages data scientists to start their work by first talking to domain experts — something that is already touted as a best practice.
Nearly 70 years ago, Samuel Wilks wrote that “statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” In the near future, multivariate statistical thinking may become a prerequisite for far more occupations. If my rehashing of the Data Science Venn Diagram helps anyone with that, I will consider it time well spent.
The world is what we make of it, and it needs to be smarter.
(Note: These Venn diagram graphics may be used with attribution. The initial iterations of the diagrams above were first posted on www.adret-llc.com and on LinkedIn in early 2017.)