Everyone agrees that there is a shortage of Data Scientists. If it is not addressed soon, Big Data breakthroughs in areas such as healthcare, renewable energy, and the public sector will decelerate. I am proud to say that EMC is doing its part to solve the problem by fostering Data Science development with training and certification, hands-on expertise, web events, internships, and more. For example, EMC Education Services offers a 5-day Data Science and Big Data Analytics training and certification, designed to enable immediate and effective participation in big data and other analytics projects.
As a Big Data citizen, I want to motivate those thinking about moving into the world of Data Science to take action and get trained. I met with Barry Heller, a developer for EMC’s Data Science curriculum, who draws on his extensive education and past experience as an EMC Data Scientist for curriculum development. If Barry’s story resonates with you, I hope it inspires you to start a career in Data Science.
1) How many people have completed the EMC Data Science and Big Data Analytics training since its creation early this year?
Approximately 1,000 people have completed the Associate level course through various channels. Depending on your preference, we offer a 5-day training in person at our education centers, online or on-demand training, and customized training delivered directly at a customer location. In addition to Associate level training, we are developing add-on courses for a more in-depth focus on specific Big Data techniques and technologies.
We also make the course material available to approximately 900 colleges and universities that are part of our Academic Alliance, and about 50 of them have integrated the course into their curriculum. For example, Carnegie Mellon was the first partner to adopt our course, and Dakota State recently taught it as a PhD course for information systems.
2) Who predominantly attends the course?
The majority of attendees are professionals whose jobs have an analytical aspect and who have some level of technical and quantitative aptitude. They want to go beyond basic reporting and trending to predictive analytics. Some have an extensive background in quantitative methods but want to learn the tools needed to solve problems more effectively. Others are more technical and want to learn quantitative methods to add more value in their jobs. We also see recent graduates in the math and engineering fields who have a deep understanding of quantitative methods, but no domain or work experience.
3) After attending the training, can one apply for a Data Scientist role?
After completing the training, most people have a renewed excitement for their profession. Because they look at data differently, they go back to their organizations and proactively seek to participate in or start Big Data analytics projects. Those who are fresh out of college have secured Data Science internships.
4) How is this Data Science training different from others in the market?
This is an open course, developed by Data Scientists and practitioners who have worked with a breadth of tools ranging from open source to vendor-specific tools. We don’t pitch EMC products, but rather train students on how to approach the analytics lifecycle using common tools in the marketplace such as R and MapReduce/Hadoop.
5) Prior to this role in EMC Education Services, you were a Data Scientist in the EMC Total Customer Experience organization. What were some of your Big Data projects?
One Big Data project focused on optimizing operations to improve customer satisfaction and lower costs. We developed an early warning application to detect potential issues at customer sites. Previously, Engineers would receive information and metrics on a monthly basis that only reported on what had already happened. With the early warning application, we were able not only to compress the analysis time by 75%, but also to provide foresight into what the future may bring. This helps to prioritize root cause investigations and ensure the Engineers are focused on the right problem at the right time.
Another Big Data project I was involved in focused on preventative maintenance strategies for EMC products. The overall approach was to determine the most effective strategy to meet customer expectations of high availability.
6) What Big Data technologies were used in these projects?
We used the Greenplum Data Computing Appliance to improve performance and SAS to build the predictive models, and we communicated results to the Engineers via a web portal.
7) How did you become a Data Scientist?
The role evolved over time. Before becoming a Data Scientist, I spent 14 years in the EMC Total Customer Experience group in roles that included working as a reliability engineer, managing a Statistical Engineering Group, and leading a Data Warehouse project as part of an ERP implementation. These roles required me to work with many areas of the business such as Customer Service, Engineering, Manufacturing, Marketing, Sales, Finance, and Legal. Over time, the types of problems that were presented to me went beyond my day-to-day job. Marketing, for example, would ask, ‘How long are customers keeping our products? When are they going to trade in their products?’
My interests in Economics and a more obscure area called Decision Science also led me to become a Data Scientist. Decision Science is about merging statistics, information, and even psychology to understand how the best decision can be made. This background and these interests have certainly been beneficial to me as a Data Scientist.
8) What is your education background?
I received a Bachelor’s degree in Computational Mathematics and a Master’s in Mathematics, and I am currently working on my PhD in Statistics.
9) What skills do you feel are necessary to become a successful Data Scientist?
You need a strong foundation in Mathematics, Statistics, Programming, and Databases. But beyond that, you need to be able to work in a team environment and have the communication skills to collaborate with IT, Lines of Business, and Executives, and to explain results back to them. They care about what the problem is and how to solve it. The other important traits are curiosity, a liking for solving problems, and constant skepticism about your own results.
10) What is your favorite aspect of being a Data Scientist?
Providing value to the business.
11) What are your favorite software tools?
Tools that I have written myself!
12) What is your idea of an ideal data set?
Data that you can trust and don’t have to question, though I haven’t met one yet.
So my question to those of you who are on the fence about becoming a Data Scientist…has Barry’s story inspired you to get trained? If you still need a nudge, attend a live web event on Tuesday, 8/28, at 11 a.m. PST with David Dietrich, architect of the EMC Data Science curriculum.