Ed. Note: This blog post was authored by Joey Jablonski, Enterprise Technologist, Office of the CTO
There is a great deal of talk in the Hadoop community today about the impending end of MapReduce. The question is not whether to use MapReduce, but how to change the way we analyze data so that we use methods better suited to streaming data and rapid decision making. MapReduce will continue to be used in many ways, while the Hadoop ecosystem will grow with new capabilities and interfaces, making its adoption possible in even more organizations and for more use cases. Hadoop will remain a primary platform for data analysis through strong interfaces, APIs and flexible methods for data ingestion, analysis and processing.
The biggest change we see in the Hadoop community is the need to drive faster decisions based on inbound data. Organizations as diverse as marketing, financial services, sales, logistics, and security all want to ensure they execute faster than the competition. More and more data analysis teams are working to minimize the time between receipt of data and decision making.
There are many ways to accelerate performance. One method is to minimize IO operations through storage of key data in-memory within servers, instead of on traditional spinning media. In-memory technologies like Apache Spark give us the ability to stage data in memory as part of a Hadoop cluster for much faster and more iterative access to data.
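To make the IO argument concrete, here is a toy sketch in plain Python. It is not Spark and not part of any Dell or Cloudera product; the class and function names (`DiskDataset`, `CachedDataset`, `iterative_stats`, `run_demo`) are hypothetical and exist only for illustration. It counts how often a small file is read when an analysis makes several passes over the same data, with and without keeping the data in memory.

```python
import os
import tempfile


class DiskDataset:
    """Re-reads its backing file on every pass (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0

    def values(self):
        self.disk_reads += 1
        with open(self.path) as f:
            return [float(line) for line in f]


class CachedDataset(DiskDataset):
    """Reads the file once, then serves later passes from memory."""

    def __init__(self, path):
        super().__init__(path)
        self._cache = None

    def values(self):
        if self._cache is None:
            self._cache = super().values()
        return self._cache


def iterative_stats(dataset):
    """Make three passes over the same data: mean, variance, max."""
    xs = dataset.values()  # pass 1
    mean = sum(xs) / len(xs)
    xs = dataset.values()  # pass 2
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    xs = dataset.values()  # pass 3
    return mean, variance, max(xs)


def run_demo():
    """Run the same analysis against both datasets and report disk reads."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(str(i) for i in range(1, 6)))
        path = f.name
    try:
        disk, cached = DiskDataset(path), CachedDataset(path)
        return (iterative_stats(disk), disk.disk_reads,
                iterative_stats(cached), cached.disk_reads)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(run_demo())
```

Both variants produce the same statistics, but the cached one touches the file once instead of once per pass. In Spark the roughly analogous step is caching or persisting a dataset so that repeated passes are served from cluster memory; the sketch above only models that idea on a single machine.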
While Spark provides a great deal of advanced functionality, it is still a very new technology that requires significant experience to deploy and manage. Dell recently announced and is now launching the Dell Cloudera In-Memory Appliance to enable organizations to gain the advantages of in-memory computing, while ensuring the platform is quick to deploy, efficient to operate and stable for processing data. This appliance combines the industry experience of Dell, Intel and Cloudera so that organizations can adopt this new technology and manage it as they would other traditional enterprise data platforms.
In addition to rapid deployment through an appliance, Dell can assist with the ongoing operations of Hadoop environments through managed services. These services ensure a stable production environment, with flexible options for upgrades, monitoring, and systems operations, and with minimal operational burden on in-house IT staff.
Spark can increase performance for a variety of use cases including the ingestion of streaming data, fraud detection, and risk modeling. The Dell Cloudera In-Memory Appliance is built to accelerate workloads that demand rapid decision making on streaming data.
Just as Hadoop has fundamentally changed how we manage and analyze data, Spark is driving another revolution by enabling large-scale analysis of data at high speed through in-memory storage and processing. Dell will continue to focus on innovating how these technologies are deployed, ensuring that Spark can be deployed rapidly, with low risk, and integrated into existing infrastructure to drive rapid, data-driven decision making.