During a recent dinner with clients, the topic of data scientists came up. Specifically: is data scientist a real job, and how does it differ from business intelligence jobs? For the former, I spoke of Hal Varian's Googlenomics work and DJ Patil's data science organization at LinkedIn.
With respect to the latter, I summed up the difference between data science and business intelligence as the mathematical mining of data for discovery versus collecting and slicing data to review performance. In a way, you could say data science leans towards innovation, while business intelligence leans towards optimization. Each is critical for business, government, and societal progress. But that was me. At a dinner.
Since then, I have run across an EMC-branded data science community study: Data Science Revealed. The study methodology:
"The EMC Data Science Community Survey interviewed 497 data scientists and business intelligence professionals from around the world, including deliberate samples in the United States, India, China, the United Kingdom, Germany, and France."
The study opens with the standard fare on big data and data science -- too much data, too few data scientists. But then it jumps into some good detail on data science vs. business intelligence.
"It may be helpful to think of data science and business intelligence as being on two ends of the same spectrum, with business intelligence focused on managing and reporting existing business data in order to monitor or manage various concerns within the enterprise. In contrast, data science applies advanced analytical tools and algorithms to generate predictive insights and new product innovations that are a direct result of the data."
On professional backgrounds:
"The need for rigorous scientific training was born out in our research on data scientists, and paints a clear distinction between data scientists and BI professionals. The most popular undergraduate degree for BI professionals was in business at 37% - more than the next three categories combined. In contrast, the most popular degree for data science professionals was computer science (24%), followed closely by engineering (17%) and the hard sciences (11%). We also found that data science professionals were over 2.5 times more likely to have a master’s degree, and over 9 times more likely to have a doctoral degree as business intelligence professionals."
On tools:
"The data science toolkit is more varied and more technically sophisticated than the BI toolkit. While most BI professionals do their analysis and data processing in Excel, data science professionals are using SQL, advanced statistical packages, and NoSQL databases. Further, although big-data tools like Hadoop, and advanced visualization tools like Tableau are just starting to emerge in the data science world, they are almost unseen in the business intelligence world. Open Source tools, like the R statistics package, Python, and Perl, are each used by one in five data science professionals, but around one in twenty BI professionals."
The study also sheds some light on the relationship between big data and data science. As many longstanding data experts know, data science isn't limited to big data, nor is big data solely the purview of data scientists. But these points are often drowned out by the hype.
"While Data Science is most often associated with Big Data, it is important to consider the host of other professions and roles that deem their work to be data science. This includes people from fields as diverse as Market Research, Financial Analysis, Information Technology, Management Consulting, Marketing and Media, Academia, Social Research, Demographic and Census Research and the Intelligence Community..."
However, the study does point out that big data poses the greatest potential opportunity for data scientists:
"Perhaps the greatest emerging opportunity in data science is “big data” – the ability to analyze massive data sets generated by web logs, sensor systems, and transaction data in order to identify insights and derive new data products."
For the subset of the survey population who self-identified as data scientists (80%), the survey data showed differences in work habits and tools from those of business intelligence professionals:
"For instance, nearly half of big data scientists use R, despite the fact that it is only used by only 13% of other practitioners. They are also twice as likely to use a big data storage tool like Hadoop, Greenplum, or Netezza. Big data scientists are also remarkably educated – 40% have a master’s degree, and an additional 17% have a doctorate. Over 90% have at least a college education.
Big data is also even more of a team sport. Half of big data scientists partner very frequently with a Data Scientist, Statistician, or Programmer – nearly twice the rate of the normal data group. They are also more likely to partner with frequently with business management, but are interestingly no more likely to partner with IT administration.
Finally, big data scientists touch data in more ways. They are twice as likely as those working with normal data to work across the data life cycle, everywhere from acquiring new data to business decision making, and around half spend a lot of time on each of these activities."
Beyond feeling relief at the validation of my dinner response, I found the heart of the survey content interesting. Read the Data Science Revealed study (PDF).