What do Greek sailors, knitting supplies, a Spanish painter, thermal energy, and an obscure Greek word have in common, and why are they in modern data centers?
Kubernetes, YARN, Diego, Heat, Mesos – does it ever seem like the IT industry today is simply a confusing alphabet soup of names being bandied about? Ever wonder what these things are and why they are important?
Before answering the question of what these are, it is important to first understand the transitions taking place in the data center and what new challenges are arising with new technologies.
A modern data center
Advances in commodity (generally x86) processing power, the prevalence of cheaper SATA disks, and the emergence of faster networks have been the major forces transforming the modern data center. Standard hardware platforms are now capable enough to run enterprise-grade workloads without specialized hardware or purpose-built resiliency and serviceability features. As a result, farms of these commodity hardware platforms have replaced the specialized systems that previously characterized the data center.
Applications designed for a modern data center are composed of a set of cooperating processes, where each process is typically encased within a schedulable virtualized environment (e.g. a VM, a container, a JVM, etc.), as described in methodologies such as the 12-factor application. The reasons behind this architectural paradigm are myriad, including improved fault isolation, improved fault recovery, the ability to scale the app quickly, and the ability to move the app seamlessly to newer or faster hardware. Whichever form of virtualization technology is chosen as the basis for the application environment, it creates a layer of abstraction between the application environment and the hardware, opening up multiple options and strategies for scheduling application environments onto the farm of commodity hardware platforms.
With a farm of commodity hardware and a suite of disparate applications to run, the question becomes how best to map the applications onto the available resources in the processing farm. Having a system administrator manually monitor and rebalance applications across the processing farm would be inefficient, hence the need for an automated scheduler to perform these actions – and this is where Kubernetes, YARN, Diego, Heat, and Mesos enter the picture.
There is a wide variance in the capabilities and scope across these different schedulers, but at a basic level, each provides:
- The ability to select the most appropriate, currently available hardware resources for new application instances
- The ability to run multiple diverse applications on a single HW platform
- Automated application failure detection and restart capabilities
- The ability to scale the number of application instances up or down in response to bursts or lulls in app activity
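To make the first three capabilities concrete, here is a minimal sketch of a placement routine with failure recovery. It is not how any of the schedulers named here actually decides placement; the `Node` class, the `place` and `reschedule_failed` functions, and the "most free memory wins" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical commodity host with a fixed CPU and memory budget."""
    name: str
    cpus: float
    mem_mb: int
    apps: list = field(default_factory=list)

    def free(self):
        """Return (free CPUs, free memory in MB) after subtracting placed apps."""
        used_cpu = sum(a["cpus"] for a in self.apps)
        used_mem = sum(a["mem_mb"] for a in self.apps)
        return self.cpus - used_cpu, self.mem_mb - used_mem


def place(nodes, app):
    """Place app on the fitting node with the most free memory.

    Ties are broken by free CPUs, then by node name (an arbitrary,
    illustrative policy). Returns the chosen node, or None if nothing fits.
    """
    candidates = []
    for node in nodes:
        free_cpu, free_mem = node.free()
        if free_cpu >= app["cpus"] and free_mem >= app["mem_mb"]:
            candidates.append((free_mem, free_cpu, node.name, node))
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c[:3])[3]
    best.apps.append(app)
    return best


def reschedule_failed(nodes, failed_node):
    """Restart every app from a failed node on the surviving nodes."""
    survivors = [n for n in nodes if n is not failed_node]
    orphans, failed_node.apps = failed_node.apps, []
    return {app["name"]: place(survivors, app) for app in orphans}
```

Because several apps can land on the same `Node`, the sketch also shows multiple diverse applications sharing one hardware platform. A real scheduler would weigh far more than memory and CPU, such as data locality, affinity rules, and priorities.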
However, because applications are no longer tied to individual machines, new issues arise that the scheduling framework must address to ensure that applications can execute properly:
- Application state may no longer be stored local to a particular piece of HW, as applications are mobile within the environment
  - The framework must provide a dynamic mechanism for an application to locate and access its shared state.
- Application instances that need to communicate with each other cannot rely on a pre-configured set of IP addresses to identify their peers
  - The framework must provide dynamic network capabilities to enable app-to-app connectivity, especially if the connection must be secure.
- Applications may need local storage for scratch files or temporary storage of intermediate results
  - The framework must provision this in a way that is appropriate for the physical HW where the application happens to be running.
  - The framework must provide the application a generic (i.e. not specific to the particular HW platform) channel to access the temporary local storage.
  - The framework must de-provision the temporary storage whenever the application instance stops running, whether the shutdown is graceful or unexpected.
- Applications that are clients of other applications (e.g. one app is a client of a database) cannot rely on fixed addresses of where those services should be running
  - The framework must provide a dynamic service to allow the app to discover what application services are available and to subscribe to those that are appropriate.
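The service discovery requirement can be illustrated with a small sketch. The `ServiceRegistry` class below is hypothetical and in-memory only. It does not reproduce the API of any of the frameworks discussed here, and the service name and addresses in the usage example are made up.

```python
class ServiceRegistry:
    """Hypothetical in-memory registry mapping service names to live endpoints."""

    def __init__(self):
        # service name -> {instance id -> "host:port"}
        self._endpoints = {}

    def register(self, service, instance_id, address):
        """Record (or update) where an instance of a service is running."""
        self._endpoints.setdefault(service, {})[instance_id] = address

    def deregister(self, service, instance_id):
        """Forget an instance, for example after it stops or fails."""
        self._endpoints.get(service, {}).pop(instance_id, None)

    def lookup(self, service):
        """Return the current addresses for a service, sorted for stable output."""
        return sorted(self._endpoints.get(service, {}).values())


registry = ServiceRegistry()
registry.register("orders-db", "db-1", "10.0.0.5:5432")
# The scheduler moves db-1 to another host; clients see only the new address.
registry.register("orders-db", "db-1", "10.0.1.9:5432")
print(registry.lookup("orders-db"))
```

A client asks the registry for `orders-db` by name each time it connects, so it never depends on a fixed address. Production systems typically add health checks and expiring leases so that entries for crashed instances disappear on their own.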
While there are many similarities between Kubernetes, YARN, Diego, Heat, and Mesos at a basic level, each of these frameworks also has differentiating features in how it works and how much of the overall application development and deployment lifecycle it captures beyond what is captured above.
- Kubernetes has been designed and optimized for the coordination and scheduling of containerized Linux applications.
- YARN is the next-generation Hadoop task scheduler, and has been generalized as an enterprise scheduling framework, especially for sets of applications accessing a common data set.
- Diego is part of the larger Cloud Foundry project, which aims to provide a full cloud scale development, deployment, and operations environment.
- Heat is part of the OpenStack project, which aims to provide a full cloud stack for private, hybrid, or public cloud environments.
- Mesos aims to provide a basic level of resource allocation and scheduling, which can then be customized by various plugins for particular application workloads.
The modern data center has been transformed at all levels by the rise of commodity components. At the lowest level, storage products like ScaleIO and Elastic Cloud Storage (ECS) provide software-defined and managed storage pools that isolate the applications from the details of the hardware, while at a higher level virtualization and containerization technologies isolate the application runtime environments from the details of the hardware. Frameworks such as Kubernetes, YARN, Diego, Heat, and Mesos fill the gap between the storage and the applications and complete the picture of an application environment that can adapt and change as the hardware environment is expanded or upgraded.
And now you know why Greek sailors, knitting supplies, a Spanish painter, thermal energy, and an obscure Greek word are finding a home in modern data centers.