How do we measure the mission criticality of storage systems? What comes to mind when you hear or read the words, “mission critical”? Certainly, you’d think of reliability, resiliency, data protection, etc. But I’m willing to bet that you also, almost reflexively, think of performance – measured in millions of IOPS, transactions per second, or sub-millisecond latencies. To many, mission critical means fast. Think all-flash arrays and high-end block storage. This is what the industry refers to as “Hot” storage.
“Cold” storage, on the other hand, gets no love. When you think cold storage, you think of old data you don’t want but can’t get rid of. You think of tapes in caves or a $0.01 per GB/month cloud storage service. Think low cost, commodity and object storage. Cold storage has an image problem, thanks in no small part to Amazon Web Services introducing Glacier in 2011 as a cold archiving service. You don’t often hear the terms “mission critical” and “cold storage” in the same sentence (see what I did there?). You think cold storage isn’t important. And you’d be wrong.
You’d be wrong because the world of storage doesn’t bifurcate so neatly into just two storage categories. Cold storage, which is frequently delivered by an object storage platform, can actually be different temperatures – cool, chilled, cold, colder than cold, deep freeze, etc. Confused? IDC explains:
[Figure: IDC cold storage temperature taxonomy. Source: IDC Worldwide Cold Storage Ecosystem Taxonomy, 2014, doc #246732]
It all depends on the use case and how active the data is. Extreme or deep freeze archive is for data that is seldom, if ever, accessed; Amazon Glacier is an example. Access times can range from hours to more than a week depending on the service – and you pay for the retrieval. Deep archive makes up the bulk of the cold storage market. The data is also infrequently accessed, but it remains online and accessible; IDC cites Facebook Open Vault as an example. Active archive is best for applications that may not modify data frequently, if at all, but read it more often – the classic Write Once, Read Many (WORM) pattern. An example use case is email or file archiving; IDC cites EMC Centera as an example, and EMC Atmos and EMC Isilon are also good fits.
Object storage, generally speaking, falls under the category of cold storage and can serve any of these temperatures. But it should not be pigeonholed as an inactive, unimportant storage tier. Object storage is a critical storage tier in its own right and directly influences the judicious use of more expensive hot storage. With the explosive growth of unstructured content driven by cloud, mobile and big data applications, cold secondary storage is the new primary storage. To the salesperson or insurance adjuster in a remote location on a mobile device, the object storage system that houses the data they need is certainly critical to their mission.
The importance of cold storage is best explained in the context of use cases. The EMC Elastic Cloud Storage (ECS) appliance is a scale-out object storage platform that integrates commodity off-the-shelf (COTS) components with a patent-pending unstructured storage engine. The ECS appliance is an enterprise-class alternative to open source object software and DIY COTS. ECS offers all the benefits of low-cost commodity hardware while sparing you the operational and support headache of racking and stacking gear and building a system that can scale to petabytes or exabytes and hundreds or thousands of apps. Organizations evaluating the ECS appliance are generally pursuing a scale-out cloud storage platform for one or more of the following three use cases:
Global Content Repository
This is often an organization’s first strategic bet on object and cloud storage. Object storage, thanks to its efficiency and linear scalability, makes an ideal low-cost utility storage tier when paired with COTS components. The ECS appliance delivers the cost profile of commodity storage and features an unstructured storage engine that maintains global access to content at lower storage overhead than open source or competing object platforms. This lowers cost and makes an organization’s hot storage more efficient and cost-effective, because colder data moves to the object archive – without diminishing data access. But it’s more than that. A crucial aspect of a global content repository is that it acts as an active archive; the content is stored efficiently but is also always accessible – often globally. And it’s accessible via standard object storage APIs. Consequently, the global content repository also supports additional uses such as next-generation file services like content publishing and sharing, and enterprise file sync and share. And there is an ecosystem of ISV partners that build cloud gateways/connectors for the ECS appliance to extend the use case further.
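Because the repository is reached through standard object APIs (ECS supports the S3 API, among others, as noted later), application access can be as simple as the minimal sketch below, written in Python with boto3. The endpoint URL, credentials, bucket and key names are placeholders for illustration, not ECS defaults.

```python
# Minimal sketch of content-repository access over an S3-compatible API.
# Endpoint, credentials, bucket and key are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object.example.com:9021",  # hypothetical ECS endpoint
    aws_access_key_id="OBJECT_USER",
    aws_secret_access_key="SECRET_KEY",
)

# Archive a piece of content into the global repository...
s3.upload_file("claim-photo-0042.jpg", "content-repo",
               "claims/2015/claim-photo-0042.jpg")

# ...and any authorized application, anywhere, can read it back on demand.
obj = s3.get_object(Bucket="content-repo", Key="claims/2015/claim-photo-0042.jpg")
photo_bytes = obj["Body"].read()
```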
Geo-scale Big Data Analytics
Geo-scale Big Data Analytics is how EMC refers to the additional use of a Global Content Repository for Big Data Analytics. The ECS Appliance features an HDFS data service that allows an organization to extend their existing analytics capabilities to their global content repository. As an example, one ECS customer uses their existing Hadoop implementation to perform metadata querying of a very large archive. ECS appliance treats HDFS as an API head on the object storage engine. A drop-in client in the compute nodes of an existing Hadoop implementation lets organizations point their MapReduce tasks to their global archive – without having to move or transform the data. The ECS appliance can also be the data lake storage foundation for EMC Federation Big Data solution. This can extend analytics scenarios to include Pig, Hive, etc. In addition, since ECS is a complete cloud storage platform with multi-tenancy, metering and self-service access, organization can deliver active archive analytics or their data lake foundation as a multi-tenant cloud service.
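As a rough illustration of what “pointing analytics at the archive” looks like, the sketch below uses PySpark (standing in here for a MapReduce job) to run a metadata query over JSON records reachable through an HDFS-compatible path. The URI, paths and field names are assumptions for the example, not documented ECS syntax.

```python
# Illustrative metadata query over an HDFS-compatible path.
# The URI, path and field names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("archive-metadata-query").getOrCreate()

# JSON metadata records stored alongside the archived content.
records = spark.read.json("hdfs://archive-namespace/content-repo/metadata/*.json")

# Example query: how many archived items each department holds, largest first.
(records.groupBy("department")
        .count()
        .orderBy("count", ascending=False)
        .show(20))
```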
The ECS appliance overcomes some of the limitations of traditional HDFS. ECS handles the ingestion and efficient storage of a high volume of small files, high availability/disaster recovery is built in, and distributed erasure coding provides lower storage overhead than the 3 copies of data required by traditional HDFS.
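The overhead gap is easy to quantify with back-of-the-envelope arithmetic. The snippet below compares triple replication against an illustrative 12 data + 4 coding fragment erasure-coding layout; the fragment counts are an assumption for the example, not a statement of ECS internals.

```python
# Raw-to-usable capacity overhead for two protection schemes.
def replication_overhead(copies):
    # Every byte is stored 'copies' times.
    return float(copies)

def erasure_coding_overhead(data_frags, coding_frags):
    # Usable data lives in data_frags; protection adds coding_frags more.
    return (data_frags + coding_frags) / data_frags

print(replication_overhead(3))         # 3.0   -> traditional HDFS, three full copies
print(erasure_coding_overhead(12, 4))  # ~1.33 -> erasure coding, ~33% extra capacity
```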
Modern Applications
Mainstream enterprises are discovering what Web-centric organizations have known for years. Object storage is the platform of choice to host modern, REST-based cloud, mobile and Big Data applications. In addition to being a very efficient platform, the semantics of object make it the best fit for Web, mobile and cloud applications.
I recommend viewing the webcast, “How REST & Object Storage Make Next Generation Application Development Simple,” for an in-depth look at object architecture and writing apps to REST-based APIs. Beyond that, two features unique to ECS facilitate the development and deployment of modern applications:
- Broad API support. ECS supports the Amazon S3, OpenStack Swift and EMC Atmos object storage APIs. If you’re developing apps for Hadoop, ECS provides HDFS access.
- Active-active, read/write architecture – ECS features a global index that enables applications to write to and read from any site in the infrastructure. ECS offers stronger consistency semantics than are typically found in eventually consistent object storage: it ensures that a read retrieves the most recent copy of an object. This helps developers who previously had to contend with the possibility of a stale read, or who had to write conflict-resolution code into their applications (a two-site sketch follows below).
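To make the read-after-write point concrete, here is a hypothetical two-site sketch over the S3 API using Python and boto3. The site endpoints, credentials, bucket and key are placeholders; the point is only that an application can write at one site and read the current version back from another, without stale-read handling of its own.

```python
# Hypothetical two-site read-after-write sketch; endpoints and names are placeholders.
import boto3

def site_client(endpoint):
    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id="OBJECT_USER",
        aws_secret_access_key="SECRET_KEY",
    )

site_a = site_client("https://ecs-site-a.example.com:9021")
site_b = site_client("https://ecs-site-b.example.com:9021")

# An application at site A updates a shared record.
site_a.put_object(Bucket="apps", Key="profiles/user-42.json",
                  Body=b'{"tier": "gold"}')

# An application at site B reads the same key and gets the most recent write,
# rather than a stale copy, because reads consult the global index.
latest = site_b.get_object(Bucket="apps", Key="profiles/user-42.json")["Body"].read()
print(latest)
```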
Noam Chomsky once said, “I like the cold weather. It means you get work done.” You can say the same for cold storage; it also means you get work done. It has become a workhorse storage platform. It doesn’t get the sexy headlines in the trade rags. But I hope that after reading this and understanding the actual use cases for the ECS appliance and object storage, you have a better appreciation and some love for cold storage. There are lots of solutions for storing old data that just can’t be thrown away, and most compete purely on price. But if your applications and data fall into one or more of these use cases, the ECS appliance should be at the top of your list.