A series of trends has determined the overall direction of the IT industry over the past few decades. By understanding these trends and projecting their continued effect on the data center, applications, software, and users, it is possible to capitalize on the overall direction of the industry and to make intelligent decisions about where IT dollars should be invested.
This blog looks at the trends of increasing CPU power, growing memory size, and rising demands on storage scale, resiliency, and efficiency, and examines how these trends lead logically to the hyperconverged architectures that are now emerging and will come to dominate the industry.
The storage industry is born
In the 90s, computer environments started to specialize with the emergence of storage arrays such as CLARiiON and Symmetrix. This was driven by the demand for storage resiliency, as applications needed data availability beyond what a single disk drive could offer. Because CPU power was still a constraining factor, moving storage off the main application computer freed up compute for more complex protection mechanisms such as RAID 5. It also allowed more specialized components to be integrated, enabling features such as hot-pull and replace of drives, along with dedicated hardware to accelerate compute-intensive RAID operations.
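To make the protection idea concrete, here is a minimal sketch of the XOR parity scheme that underlies RAID 5. It is an illustration only, not a real array implementation: each stripe stores several data blocks plus one parity block, and any single lost block can be rebuilt by XOR-ing the survivors, which is exactly the kind of work that dedicated hardware was added to accelerate.

```python
# Illustrative sketch of RAID 5-style parity (not a production implementation).
# Parity block = XOR of all data blocks in the stripe; one lost block is
# recoverable by XOR-ing the remaining data blocks with the parity block.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Example: a stripe spread over three data disks plus one parity disk
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding data[1] fails; rebuild its block from the survivors
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```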
Throughout the 90s and into the 2000s, as storage, networking, and computing capabilities continued to increase, there was a series of treadmill improvements in storage: richer replication capabilities, dual-disk failure tolerant storage schemes, faster recovery times after an outage, and the like. Increasingly these features were implemented purely in software, as there was now sufficient CPU capacity for the more advanced algorithms, and software features were typically quicker to market, easier to upgrade, and more easily fixed in the field.
A quantum leap forward
The next architectural advance in storage technologies came in the early 2000s with the rise of scale-out storage systems. In a scale-out system, rather than rely on a small number of high-performance, expensive components, the system is composed of many lower-end, cheaper components, all of which cooperate in a distributed fashion to provide storage services to applications. For the vast majority of applications, even these lower-end components are more than sufficient to satisfy the application's needs, and load from multiple applications can be distributed across the scaled-out elements, allowing a broader, more diverse application load than a traditional array can support. As there may be 100 or more such components clustered together, the overall system can be driven at 80-90% of maximum load and still deliver consistent application throughput despite the failure of multiple internal components, since the failure of any individual component has only a small effect on overall system capability. The benefits and validity of the scale-out approach were first demonstrated with object systems, with scale-out NAS and scale-out block offerings following shortly thereafter.
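The sketch below illustrates the distribution idea under simple assumptions (the node names and hashing scheme are hypothetical, not any vendor's placement algorithm): objects are hashed across a large pool of commodity nodes with a replica on a second node, so the loss of any single node touches only a small fraction of the data and load.

```python
# Minimal sketch of scale-out data placement across many commodity nodes.
# Hypothetical scheme for illustration; real systems use more sophisticated
# placement (consistent hashing, CRUSH-like maps, etc.).

import hashlib

NODES = [f"node-{i:03d}" for i in range(100)]  # e.g. 100 commodity nodes

def placement(object_id, replicas=2):
    """Pick `replicas` distinct nodes for an object via simple hashing."""
    digest = int(hashlib.md5(object_id.encode()).hexdigest(), 16)
    start = digest % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas)]

# Losing one node out of 100 affects only ~1-2% of placements, and every
# affected object still has a replica elsewhere, so the cluster as a whole
# keeps delivering consistent throughput.
print(placement("vm-disk-42"))
```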
The birth of a new application paradigm
While the above changes were taking place in the storage market, the rise in computing power was also enabling a change in how applications were run and managed. Virtualization technologies improved to the point where the performance penalty of running applications in a virtual machine became minimal to non-existent. Traditional storage arrays could already provide a large pool of storage to multiple applications, and virtualization meant that a pool of applications could be deployed across a number of systems accessing that storage, bringing application mobility, quick failover, and easy deployment of new or intermittent workloads. The data center split into a virtualized compute farm serving as a general application platform and a storage farm serving data to the applications.
The two trends meet
Hyperconvergence is the next logical step in this journey, made possible by the trends explained above. Scale-out storage architectures take advantage of a large farm of commodity systems which cooperate over a network to provide enterprise-grade storage features and performance. Virtualized environments rely on a large farm of commodity systems which cooperate over a network to provide enterprise-grade application deployment and failover features. As both the storage and the application environment need "a large farm of commodity systems which cooperate over a network", the question becomes whether two such farms are needed, or whether storage and compute can simply run on the same farm.
Hadoop provides one early answer to this question. Hadoop is a specialized framework which gives one or more applications access to data stored in HDFS, the Hadoop storage environment. As Hadoop is optimized for applications which do bulk data processing, it relocates applications dynamically so that computation runs as close to its data as possible. While Hadoop uses Java-based technologies to provide the run-anywhere capability for applications, the general point is the same: the compute farm and the storage farm are combined on the same infrastructure, enabling optimizations in data processing that simply were not possible before.
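A simplified illustration of this data-locality idea is sketched below. It is not Hadoop's actual scheduler code, and the data structures are assumptions for the example; the point is only that each task is sent, whenever possible, to a node that already holds a replica of its input block.

```python
# Simplified illustration of data-locality scheduling, Hadoop-style.
# Hypothetical structures; real schedulers also consider rack locality,
# fairness, and speculative execution.

def schedule(tasks, block_locations, free_slots):
    """
    tasks: dict of task_id -> input block id
    block_locations: dict of block id -> list of nodes holding a replica
    free_slots: dict of node -> number of free task slots
    """
    assignments = {}
    for task_id, block in tasks.items():
        local = [n for n in block_locations.get(block, []) if free_slots.get(n, 0) > 0]
        if local:
            node = local[0]                              # data-local: compute moves to the data
        else:
            node = max(free_slots, key=free_slots.get)   # fall back to any node with capacity
        assignments[task_id] = node
        free_slots[node] -= 1
    return assignments

tasks = {"map-1": "blk-A", "map-2": "blk-B"}
block_locations = {"blk-A": ["node-3", "node-7"], "blk-B": ["node-7"]}
free_slots = {"node-3": 1, "node-7": 1, "node-9": 2}
print(schedule(tasks, block_locations, free_slots))
# {'map-1': 'node-3', 'map-2': 'node-7'}
```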
Hyperconverged systems provide storage services via scale-out implementations of block, file, and object storage, and combine them with portable applications managed through a virtualization or container framework. Orchestration services provide the necessary coordination between the applications and the storage, with implementations such as ViPR or OpenStack becoming increasingly popular. The industry today is just at the point where hyperconverged systems have sufficient CPU and networking resources to run both application and storage workloads; as CPU and networking capabilities continue to improve, the overall efficiency and manageability benefits of hyperconverged systems will only become more compelling.
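As a final sketch, here is what placement can look like when every node contributes both storage and compute. The node descriptors and function names are illustrative assumptions, not any product's API; the orchestration layer's job is essentially this kind of decision, made continuously and at scale.

```python
# Hedged sketch of hyperconverged placement: each node offers both vCPUs and
# local volume replicas, and the scheduler prefers a node that already holds
# the workload's data and still has CPU headroom. Names are illustrative only.

nodes = {
    "hci-1": {"free_vcpus": 4, "local_volumes": {"vol-a", "vol-b"}},
    "hci-2": {"free_vcpus": 0, "local_volumes": {"vol-c"}},
    "hci-3": {"free_vcpus": 8, "local_volumes": {"vol-c", "vol-d"}},
}

def place_vm(volume, required_vcpus):
    """Prefer a node with a local copy of the VM's volume; otherwise any node with capacity."""
    candidates = [n for n, info in nodes.items() if info["free_vcpus"] >= required_vcpus]
    local = [n for n in candidates if volume in nodes[n]["local_volumes"]]
    pool = local or candidates
    return pool[0] if pool else None

print(place_vm("vol-c", 2))  # 'hci-3': holds vol-c and has spare vCPUs
print(place_vm("vol-b", 2))  # 'hci-1': data-local placement
```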