The Human Face of Big Data provides a real story or ‘face’ to the abstract nature of data. Big Data initially evolved as a technology buzzword focused on transforming IT for business advantage. But now, we see Big Data also providing unprecedented value in areas such as Health and Wellness, Society, The Environment, Crime and Safety, Government Affairs, and more. The Human Face of Big Data aims to propel ‘Big Data’ from a technology term, to a household term, where each individual relates, sees value, and is inspired to take part in transforming humanity.
For example, people obsessed with their health are inspiring data-driven health and wellness products such as wearable sensors that measure heart rate, brain activity, body temperature, hydration levels, and more. The data collected not only provides valuable insight about an individual, but also reveals health patterns across an entire population.
On October 2, 2012, The Human Face of Big Data will unveil the transformational power of Big Data through beautiful visualizations that tell a compelling story about different aspects of our lives. Data Scientists will gather in New York, London, and Singapore (Mission Control centers) to showcase the insight gained from massive amounts of data generated from human activity and behavior around the world. So what’s behind the beautiful visualizations or the face of Big Data? Technology. As the sponsor of this project, EMC has contributed its Big Data technology and partnerships to support it. Jacque Istok, Field CTO of EMC Greenplum, discusses how the Human Face of Big Data was brought to life through an architecture he helped design.
1. What were your goals for the Human Face of Big Data project?
The goal was to ensure that the right architecture could dynamically scale across Mission Control centers worldwide. This means we had to provide high-performance analytical processing across massive amounts of data coming in from the Twitter firehose and other sources.
2. What was your methodology in designing the architecture for The Human Face of Big Data?
I’ve been with EMC Greenplum for the past four years, so the methodology is based on numerous similar deployments I have worked on. We looked at the project goals and matched them against the technologies we have in house, across partners, and in the open source community.
One of the first things we looked for was a flexible hardware backend to house an indeterminate amount of data. The Greenplum Analytics Workbench, a 1,000-node cluster Greenplum Unified Analytics Platform (UAP) lab environment, provided us with this elastic backend. From there, we determined that the analysis required would be best accomplished by using the Greenplum Database to perform structured analytics on unstructured data stored in Greenplum HD. We then leveraged several analytics tools to facilitate the mining, including Alpine Miner, the MADlib open-source library for scalable in-database analytics, R, SAS, and Tableau.
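To make the idea of "structured analytics on unstructured data" concrete, here is a minimal Python sketch. It is not the project's actual pipeline, which the interview does not describe in detail. The sample records, the field names `user` and `text`, and the helper names `extract_rows` and `hashtag_counts` are all assumptions made for illustration. It only shows the general pattern: raw text records are parsed into structured rows, and an aggregate is then computed over those rows.

```python
import json
from collections import Counter

# Hypothetical raw feed: one JSON document per line, as it might sit in a
# file store before any schema is applied. The field names are assumptions.
RAW_LINES = [
    '{"user": "a", "text": "Morning run done #health #fitness"}',
    '{"user": "b", "text": "Tracking my sleep tonight #Health"}',
    'not valid json',
    '{"user": "c", "text": "Traffic is terrible #commute"}',
]


def extract_rows(lines):
    """Yield structured (user, hashtag) rows, skipping malformed records."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw feeds often contain broken lines; ignore them
        if not isinstance(record, dict):
            continue
        for word in record.get("text", "").split():
            if word.startswith("#") and len(word) > 1:
                yield record.get("user"), word.lower()


def hashtag_counts(rows):
    """Aggregate the structured rows, much like a GROUP BY with COUNT."""
    return Counter(tag for _, tag in rows)


counts = hashtag_counts(extract_rows(RAW_LINES))
for tag, n in counts.most_common():
    print(tag, n)
```

In a deployment like the one described, the aggregation step would more plausibly be expressed as SQL running inside the database so that it scales across the cluster. The in-memory `Counter` here simply stands in for that step.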
3. What do you think are the advantages of this architecture?
For the backend tier, Greenplum UAP can rapidly ingest data in real time, such as live Twitter feeds. Greenplum Database’s in-database analytics enables analysts to mine all the data, not just a subset of data on their desktop or data mart. For the web tier, we use a virtualized environment and SQL Fire to support rapid queries against low-latency data. For the end user tier, the EMC architecture supports a partner ecosystem of visualization, data mining, and analytic tools so that data scientists can use the technologies they are most familiar with, such as R, SAS, Alpine Miner, Tableau, and more.
For more information, please visit