by Amy Price
The predictions have been out for a while, and many are starting to feel the pressure: data is indeed growing at an accelerated rate, expanding in both volume and variety. Much of this growth comes from structured data sources, many driven by new operational technologies that produce data in a steady deluge. Examples include smart electric meters, stock trading or "tick" data, and mobile phone call detail records (CDRs). The increasing volume of this data is overrunning existing databases, to the point where much of it is being funneled off into unstructured file systems, making it harder to recover when it is needed for analysis.
But why not just delete this data? First, affordable analytics now make it possible to look at all of the data at once, no longer restricting insights to statistical sampling. Hadoop and commercial distributions such as Cloudera's give virtually anyone with access to big data the ability to run analytics across multiple types of data, revealing new information. Additionally, regulations and the need to defend against potential litigation are driving new retention policies that can extend retention periods substantially – 10X or more for certain data types.
These compelling forces – extreme data growth, mandates to retain data, and the ability to affordably analyze data and gain value from it – are converging to create a new era of data-driven decision making and insight. Making all of this work, however, requires a better way to keep this data available; it is not merely nice to have, it is absolutely required. Without it, the resources needed to manage the data will scale about as fast as the data itself, which simply is not sustainable in most organizations.
Dell has been working on a solution that couples Dell storage with RainStor's specialized database and compression technology to reduce structured and semi-structured data to as little as 3% of its original size. It also provides information management capabilities, including automated retention periods and deletions. Much smaller data volumes, combined with these data management capabilities, reduce the resources and infrastructure needed to manage the data. The solution enables massive scalability while keeping the data accessible, even in its 'dehydrated' compressed state, through standard BI and analytics tools.
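To put that 3% figure in perspective, here is a rough, purely illustrative calculation (not product code; the ingest rate, retention period, and function name are hypothetical) showing how a compressed footprint and a fixed retention window shape storage requirements for a steady stream of machine-generated records:

    # Back-of-the-envelope sketch: raw vs. compressed storage needed to hold
    # a retention window of steadily arriving data, assuming ~3% compression.
    def retained_footprint_tb(daily_ingest_tb: float,
                              retention_days: int,
                              compression_ratio: float = 0.03) -> dict:
        """Return raw vs. compressed storage (in TB) for the retention window."""
        raw_tb = daily_ingest_tb * retention_days
        compressed_tb = raw_tb * compression_ratio
        return {"raw_tb": raw_tb, "compressed_tb": compressed_tb}

    # Example: 2 TB/day of smart-meter or CDR data, retained for 7 years.
    print(retained_footprint_tb(daily_ingest_tb=2.0, retention_days=7 * 365))
    # -> {'raw_tb': 5110.0, 'compressed_tb': 153.3}

In this hypothetical case, roughly 5 petabytes of raw records shrink to about 150 terabytes under a 3% ratio, which is the kind of difference that determines whether long retention periods are practical at all.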
Why not simply increase the size of a data warehouse to accommodate these new data volumes? First, in most cases this is a costly proposition, as adding capacity to an existing data warehouse can run into many millions of dollars. Second, data warehouses are often optimized for specific data sources, while big data introduces new varieties of data at higher velocities that may be processed more efficiently with other technologies. Lastly, some of this data simply needs to be retained near-term for future use, and an efficient storage solution provides this capability at a lower cost than any data warehouse.
Thinking about creating your own plan to manage big data in your enterprise? Learn more about the solution by visiting www.dellstorage.com/big-data-retention. Still wondering how big data impacts real world problems? Read this blog post about the big data challenges facing the banking and securities industry.