Tackling the New Reality of Multiple Database Management Systems
In today’s global economy, companies must be multilingual to be successful. The same holds true in the world of big data, as analysts and developers increasingly need to speak or write multiple languages to effectively address complex data problems.
So, what’s the best way to navigate an ever-rising tide of big data sources? Most large enterprises already have multiple relational database management systems, data warehouses and data marts, as well as massive stores of structured and unstructured data. Customers continually ask for advice on the best way to manipulate, mine and leverage data models and sources, which vary depending on the business problem they’re trying to solve.
Adding to the complexity is the need for analysts to speak not only every dialect of SQL but also open source R and its derivatives. In the past, you would have been hard-pressed to find someone with multilingual database skills, as most people spoke only one language. To cope with today’s data-driven environments, however, companies are now creating an ecosystem of linguists who speak all languages so they can better manage multiple data sources.
Enter the polyglot—a person who can speak or write multiple languages. In the world of big data, polyglot data persistence is the ability to apply different data models to solve data problems. Sounds straightforward, right? In reality, this is quite complicated, especially when you consider how these multiple views affect the way data analysts interface with the data, and how that ultimately affects the analytics. To solve complex data problems, you need multiple views of the data collected from different sources and disparate structures, which then can be combined with the appropriate metadata, aggregated and analyzed.
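To make that idea concrete, here is a minimal, hypothetical sketch of polyglot data persistence in Python: the same patient is described by a relational record and by a semi-structured JSON document, and a combined view is built by joining the two on a shared key and tagging the result with source metadata. The table, field names and values are illustrative assumptions, not part of any real system.

```python
import json
import sqlite3

# Relational view: structured patient demographics in an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patients (patient_id TEXT, name TEXT, zip_code TEXT)")
db.execute("INSERT INTO patients VALUES ('p-001', 'Alex', '78701')")

# Document view: semi-structured calendar and device data stored as JSON.
documents = {
    "p-001": json.loads('{"calendar_events": 7, "rescue_inhaler_uses": 2}')
}

def combined_view(patient_id):
    """Join the relational and document models on patient_id into one record."""
    row = db.execute(
        "SELECT name, zip_code FROM patients WHERE patient_id = ?", (patient_id,)
    ).fetchone()
    doc = documents.get(patient_id, {})
    return {
        "patient_id": patient_id,
        "name": row[0],
        "zip_code": row[1],
        **doc,
        "_sources": ["sqlite:patients", "json:documents"],  # metadata about origin
    }

print(combined_view("p-001"))
```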
The key to taming this cumbersome process is to simplify the initial ask—don’t boil the ocean when understanding a simple data flow can provide valuable insight. Start by breaking down your questions or problems into bite-sized chunks. What are you trying to learn? What data do you need to collect to shed light on an issue, and where does it exist? Which two—not 200—pieces of data do you care about most, and why?
At Dell, we understand the importance of taking a pragmatic view and measured steps to build an infrastructure to manage this complex environment. A couple of weeks ago we showcased a great example of polyglot data persistence with a demo of how we can help predict an asthma event and alert patients and clinical staff. It’s a powerful showcase for using multiple data sources and modern predictive analytics software to greater effect and for the greater good, since asthma is the No. 1 chronic disease for children and more than 25 percent of all ER visits are asthma related.
Since asthma can be life threatening if not treated properly, we started by using Dell Boomi application integration to hook into traditional medical sources and gather relevant patient data using secure, encrypted technology. We then added air quality data by zip code from the National Oceanic and Atmospheric Administration, along with pollen counts, barometric pressure and weather forecasts. Other relevant data, such as a patient’s Google calendar, was brought in because there are correlations between stress and asthma attacks.
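The sketch below shows what that aggregation step might look like once the feeds are in hand: environmental readings keyed by zip code are merged into each patient record. In the actual demo, Dell Boomi handles the integration with the live sources; the feeds, field names and values here are assumptions used purely for illustration.

```python
# Hypothetical environmental readings per zip code (air quality index, pollen,
# barometric pressure) as they might arrive from external feeds.
environment_by_zip = {"78701": {"aqi": 152, "pollen": 9.7, "pressure_hpa": 1003}}

# Hypothetical patient records pulled from clinical sources.
patients = [
    {"patient_id": "p-001", "zip_code": "78701", "med_adherence": 0.85},
    {"patient_id": "p-002", "zip_code": "78702", "med_adherence": 0.40},
]

def enrich(patient):
    """Attach the latest environmental readings for the patient's zip code."""
    env = environment_by_zip.get(patient["zip_code"], {})
    return {**patient, **env}

enriched = [enrich(p) for p in patients]
for record in enriched:
    print(record)
```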
After the data was gathered from these sources, it was aggregated and made available to Dell Statistica for the creation of different predictive models. These models then can be deployed and acted upon by clinicians, patients and even emergency room doctors. For example, clinicians can easily view a list of all their patients by zip code, correlated with weather and air quality for specific areas along with each patient’s adherence to their prescribed asthma medication. Armed with this data, they can pre-emptively contact patients to check on their wellbeing before an attack occurs.
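The production models were built in Dell Statistica; the toy snippet below only illustrates the general idea of turning an enriched patient record into a risk estimate. The logistic form, feature weights and threshold are made-up assumptions, not the weights the real models learned.

```python
import math

# Illustrative, hand-picked weights: higher AQI and pollen raise risk,
# better medication adherence lowers it.
WEIGHTS = {"aqi": 0.02, "pollen": 0.15, "med_adherence": -2.5}
BIAS = -3.0

def risk_score(record):
    """Return a 0-1 asthma-event risk estimate from an enriched patient record."""
    z = BIAS + sum(WEIGHTS[k] * record[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

patient = {"patient_id": "p-001", "aqi": 152, "pollen": 9.7, "med_adherence": 0.85}
print(f"{patient['patient_id']}: risk = {risk_score(patient):.2f}")
```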
Meanwhile, patients can receive text messages via push notifications, alerting them if they should change a daily routine or stay inside because poor air quality or high pollen counts could impact their condition. Emergency rooms can receive a patient list and weighted weather forecasts to help anticipate spikes in patient volume driven by particular environmental conditions.
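A simple alerting rule sitting on top of the risk score might look like the sketch below. The thresholds, message text and the notify() stub are illustrative assumptions; the real demo delivers alerts through a push-notification service rather than a print statement.

```python
AQI_ALERT = 150      # assumed air-quality threshold
POLLEN_ALERT = 9.0   # assumed pollen-count threshold

def notify(patient_id, message):
    # Stand-in for the real push-notification service.
    print(f"[push -> {patient_id}] {message}")

def check_and_alert(record, risk):
    """Queue an alert when local conditions or the model's risk score cross a threshold."""
    if record["aqi"] >= AQI_ALERT or record["pollen"] >= POLLEN_ALERT or risk >= 0.5:
        notify(record["patient_id"],
               "Air quality and pollen are high in your area today; consider staying indoors.")

check_and_alert({"patient_id": "p-001", "aqi": 152, "pollen": 9.7}, risk=0.35)
```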
While many data points and sources are brought to bear in this polyglot data persistence scenario, weather and air quality are the two most salient pieces of information. With that understanding, we were able to identify and link to other data, manage the metadata, and build predictive models and actionable insights that produce the greatest value.
Dell’s broad big data experience, combined with application integration and predictive analytics expertise, is proving invaluable in helping customers become more fluent in polyglot data persistence. What are your plans to use data to greater effect? Connect with me on Twitter at @joschloss to continue the conversation.