I am a huge Thomas Davenport fan. His book “Competing on Analytics: The New Science of Winning” was the first book to make organizations aware of the business potential of analytics, even prior to the craziness brought on by Big Data. I happened upon a recent article of his titled “Looking Outward with Big Data: A Q&A with Tom Davenport” and one item from that article really jumped out at me:
“Initially, I didn’t see much of a distinction [between business analytics and big data], and I thought that I could kind of rest on my laurels and not write a book about big data—because the fact is that the analytical tools and approaches used are not all that different for big data. But when I started talking to companies and data scientists, I realized that there really were some fairly substantial differences—some that have yet to be fully articulated and some that are already in evidence.”
Understanding the Differences
There are significant differences between a Business Intelligence (BI) analyst and a Data Scientist, but many folks are still confused. I recently received the following email from a follower (Felix) of my blog series that highlights some of the challenges that organizations are wrestling with on this difference in definitions.
Dear Mr. Schmarzo,
I recently came across your January 9th blog post entitled “Business Analytics: Moving From Descriptive To Predictive Analytics.”
Our IT department disagrees on the capabilities of OLAP cubes. To me, a cube does not appear useful for parameterized models or most types of scenario analysis. (I am trained in statistics and other forms of financial modeling.) I showed Figure 1 of your blog to my colleagues, but was told that I do not understand OLAP technology.
Felix’s dilemma is typical of what I see in organizations that have spent considerable time and money building out their Business Intelligence capabilities. To me, the situation is similar to the construction worker discovering the “saw.” That doesn’t mean the hammer is no longer important; the saw and the hammer perform entirely different but complementary tasks.
Here was my response to Felix:
Hey Felix, push back on your IT department. There is nothing predictive in OLAP cubes. Cubes are great for slicing and dicing historical data looking for areas of under- and over-performance in the past week/month/quarter, but they don’t answer any of the questions about the future such as: What will sales be for Product X next month? How many customers do we expect to respond to the XYZ promotion? What is the likelihood that wind turbine A101 will fail within the next 30 days?
Answering questions about the future requires developing predictive models and getting results that are qualified by probabilities and confidence levels. The key difference is that a BI Analyst uses OLAP cubes and other BI tools to report on what happened in the past, while a data scientist uses predictive and prescriptive tools to forecast what might happen in the future.
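To make the contrast concrete, here is a minimal sketch in Python. The sales figures, the function names, and the choice of a straight-line trend are all my own assumptions for illustration; a real forecast for Product X would need a properly validated model. The first function does what a cube does (rolls up what already happened), while the second extrapolates to a month that has not happened yet.

```python
from collections import defaultdict

# Hypothetical monthly unit sales for Product X, oldest first (made-up data).
monthly_sales = [
    ("2023-01", 100), ("2023-02", 104), ("2023-03", 108),
    ("2023-04", 112), ("2023-05", 116), ("2023-06", 120),
]


def quarterly_totals(rows):
    """BI-style roll-up: sum historical sales per calendar quarter."""
    totals = defaultdict(int)
    for month, units in rows:
        year, mm = month.split("-")
        quarter = (int(mm) - 1) // 3 + 1
        totals[f"{year}-Q{quarter}"] += units
    return dict(totals)


def linear_trend_forecast(values, steps_ahead=1):
    """Fit y = a + b*t by ordinary least squares and extrapolate.

    A deliberately simple stand-in for a predictive model.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


# Descriptive: what happened in past quarters?
history = quarterly_totals(monthly_sales)

# Predictive: what might sales be next month?
forecast = linear_trend_forecast([units for _, units in monthly_sales])
```

The roll-up can only ever return numbers that are already in the data; the forecast produces a number for a period with no data at all, which is why it should come with a statement of uncertainty attached.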
There is a significant difference between what a traditional BI Analyst does and what a Data Scientist does. And one does NOT replace the other; they are complementary. Figure 1 does a nice job of summarizing the differences and how these two critical roles play off of each other.
Data science differs from traditional business analytics in some key areas. For example, data science…
- uses predictive and prescriptive analytics to predict what might happen, expressed in probabilities and confidence levels, rather than just using reporting tools to describe what did happen.
- Note: when we’re dealing with historical data, there is a strong desire and need for the data to be 100% accurate. If you have your financial results wrong for the past quarter, folks are likely to go to jail. However, predictions of performance for the next quarter are usually expressed in probabilities and confidence levels (e.g., “There is a 95% confidence that our revenues next quarter will come in between $200M and $212M”).
- is used for dealing with and mitigating the uncertainty in the data. It uses several analytic and visualization techniques to understand where uncertainty may lie in the data, and then uses data transformation techniques to massage the data into a workable form. The result is not perfect, but perfection is not necessary when dealing with probabilities rather than absolutes.
- is able to create as-needed data transformations (versus the traditional ETL process) to put the data into a format so that it can be combined with other data sources in search of insights about customers, products and operations.
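The note above about stating a revenue range at a confidence level can be sketched roughly as follows. Everything here is an assumption for illustration: the quarterly revenue figures are made up, the model is a plain linear trend, and the range uses a normal approximation based only on the spread of past residuals. A rigorous prediction interval would use the t-distribution and account for uncertainty in the fitted line, so treat this as a back-of-the-envelope version.

```python
from statistics import NormalDist

# Hypothetical quarterly revenues in $M, oldest first (made-up data).
quarterly_revenue = [180, 186, 185, 192, 195, 199, 198, 204]


def forecast_with_interval(values, confidence=0.95):
    """Return (low, point, high) for the next period.

    Fits a straight-line trend by least squares, then widens the point
    forecast by z * residual standard error (normal approximation).
    Needs at least three observations.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean

    # Spread of past values around the trend line (n - 2 degrees of freedom).
    residuals = [y - (intercept + slope * t) for t, y in enumerate(values)]
    sigma = (sum(r * r for r in residuals) / (n - 2)) ** 0.5

    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    point = intercept + slope * n  # next period has index t = n
    return point - z * sigma, point, point + z * sigma


low, point, high = forecast_with_interval(quarterly_revenue)
print(f"95% range for next quarter: ${low:.1f}M to ${high:.1f}M")
```

Asking for more confidence (say 99%) widens the range, which is the trade-off a business stakeholder has to weigh: a tighter range is more actionable but more likely to be wrong.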
To quote Jeffrey Abbott of EMC Global Services Marketing:
“The disconnect is that with BI, people take the historical data and extend the trend lines and factor in cyclical factors. It’s slow, manual, and needs to be rebuilt each month/quarter/year. But with data science, we have the ability to automatically build the predictive apps that actively look for certain combinations of data and trigger a prediction of the future. It’s real time, re-usable, continuous, and automated.”
Summary
A recent blog, “Data Science: The More Data, the Better,” talks about how Federal Reserve Chair Janet Yellen uses a dashboard of job data that doesn’t rely on just a single measure (the unemployment rate) to make economic and labor policy decisions. Instead, she uses a dozen different measures to provide a more holistic, more accurate, and hopefully more actionable view of the United States economic situation. She’s a data scientist at heart who realizes that a single measure of anything complex (whether it’s the U.S. economy or things like customer satisfaction and predictive maintenance) oversimplifies it to the point of not being useful or actionable.
I have written several blogs trying to highlight the differences between a traditional business analyst and a data scientist, some of which I have listed below. Enjoy!