We’re almost a month into the Major League Baseball season, and every year there’s at least one fan base that gets caught in the following trap.
Despite modest expectations, their team gets off to a fast start. Career .250 hitters look like Ted Williams. Historically light-hitting outfielders are suddenly on pace to hit 70 home runs. Pitchers who once appeared to serve up mere batting practice are suddenly unhittable.
And then reality and regression to the mean set in. Despite hopes for a miraculous off-season transformation, the team turns out to be what it is. It sinks in the standings, and the corresponding disappointment follows.
Many Big Data initiatives we have seen with our global enterprise customers have fallen into similar doldrums. They gained quick traction and visibility with a high-impact business use case, but now that the focus has shifted to scaling, operationalizing, and demonstrating ongoing success, progress has stalled. ‘Regression to the mean’ is occurring in the enterprise sense – the common barriers we’ve all seen, driven by organizational complexity, cultural resistance to change, and so on, have brought initiatives back to reality.
Based on the initial excitement and promise of Big Data, many of our customers have developed an overall framework for how technology, data, people, and process need to come together to support the global enterprise. While versions certainly differ, these visions typically look something like the following:
These frameworks typically have four major components and associated capabilities required for success:
- Deliver Business Value – first, an understanding of specifically where and how advanced analytics can drive business value and competitive advantage across a variety of use cases and users, including IoT and intelligent applications, data scientists and analysts, and regular business users.
- Enable Big Data as a Service – to drive the use case ‘pipeline’, data consumers must have self-serve access to the right data and tools. This means rapid service provisioning using a self-serve operating model with appropriate user access controls, similar to what enterprises have enabled with IaaS cloud.
- Provide Effective Workspaces – in addition to access to tools and data through self-serve catalogs, organizations must also provide shared workspaces where users can work with target data sets. These environments need to optimize compute resources whether they are deployed internally, in the public cloud, or in a hybrid model.
- Optimize Data Capture and Storage – finally, organizations need an environment for data ingestion and storage that is optimized for both performance and cost.
So given this relatively clear vision for analytics success, why are so many Big Data initiatives stalling, and what can be done to get them back on the path to success and recapture the early excitement? Here’s our quick take, based on clients we’ve worked with:
- No perceived ROI / Business Value – while the first low-hanging-fruit use cases were easy, most organizations lack the data science talent to know which problems can be solved with advanced analytics, and how. Without this understanding, developing a value-based pipeline is hard, which in turn makes it difficult to link Big Data to business value and ROI, let alone assess the overall value of data to the enterprise.
Recommendation: Develop an explicit plan and roadmap for building data science skills and capability, as well as a framework and approach for building a use case pipeline.
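To make that pipeline concrete, here is a minimal sketch of one way to rank candidate use cases by business value and feasibility. The use cases, scores, and weights are purely hypothetical – the point is simply to force explicit, comparable conversations about value.

```python
# Illustrative sketch: scoring candidate use cases to seed a value-based
# pipeline. The use cases, scores, and weights are hypothetical -- each
# organization would calibrate these with its own business owners.

candidate_use_cases = [
    # (name, estimated business value 1-5, feasibility with current data/skills 1-5)
    ("Predictive maintenance for plant sensors", 5, 3),
    ("Customer churn propensity scoring", 4, 4),
    ("Real-time fraud detection", 5, 2),
]

VALUE_WEIGHT = 0.6        # favor business impact...
FEASIBILITY_WEIGHT = 0.4  # ...but keep early wins achievable

def score(value: int, feasibility: int) -> float:
    """Weighted score used to rank the use case pipeline."""
    return VALUE_WEIGHT * value + FEASIBILITY_WEIGHT * feasibility

ranked = sorted(candidate_use_cases,
                key=lambda uc: score(uc[1], uc[2]),
                reverse=True)

for name, value, feasibility in ranked:
    print(f"{score(value, feasibility):.1f}  {name}")
```

Weighting value slightly above feasibility keeps the pipeline anchored to business outcomes while still surfacing achievable early wins.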
- Provisioning time for new services and environments – given the demand for speed and agility, users – whether Data Scientists or Business Analysts – cannot wait months for tools or environments to be made available. This is about more than just providing self-serve catalogs and provisioning; it’s about making sure an overall operating model, including processes and roles, is in place to support it. Far too often this component is overlooked.
Recommendation: Explicitly design a new operating model, including governance, for enabling and delivering Big Data-as-a-Service (BDaaS).
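As a rough illustration of what such an operating model encodes, the sketch below routes service requests either to automated provisioning or to governance review based on the requester’s role. The roles, service names, and approval rules are hypothetical placeholders, not a prescribed design.

```python
# Illustrative sketch of a self-serve BDaaS request flow. The roles,
# service names, and approval rules are hypothetical stand-ins for
# whatever an organization's operating model actually defines.

from dataclasses import dataclass

# Which roles may self-provision which services without manual review.
AUTO_APPROVE = {
    "data_scientist": {"notebook_workspace", "sandbox_cluster"},
    "business_analyst": {"notebook_workspace"},
}

@dataclass
class ServiceRequest:
    requester: str
    role: str
    service: str   # e.g. "sandbox_cluster"
    dataset: str   # catalog entry the requester wants access to

def route(request: ServiceRequest) -> str:
    """Decide whether a request is auto-provisioned or needs governance review."""
    if request.service in AUTO_APPROVE.get(request.role, set()):
        return f"auto-provision {request.service} for {request.requester}"
    return f"queue {request.service} request for data-governance review"

print(route(ServiceRequest("alice", "data_scientist", "sandbox_cluster", "sales_2016")))
print(route(ServiceRequest("bob", "business_analyst", "sandbox_cluster", "sales_2016")))
```

The value is not in the code itself but in making the roles, entitlements, and escalation paths explicit enough to automate.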
- Data lineage and metadata not fully documented – as the saying goes, ‘garbage in, garbage out.’ Business users need to have confidence in the integrity of the data supporting analytics for their target use cases. Where did the data come from? Who has had access to it? Can it be trusted? These are all questions organizations need to answer to give users trust and confidence in the insights being generated.
Recommendation: Design and embed a formal data governance framework in the overall operating model for BDaaS.
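As a rough sketch of the minimum metadata needed to answer those questions, the example below records a dataset’s source, the transformations applied to it, and an access log. The field names are illustrative; in practice this would live in a metadata catalog, not in application memory.

```python
# Illustrative sketch: the minimal lineage metadata needed to answer
# "Where did this data come from?" and "Who has touched it?". The field
# names are hypothetical; a real deployment would persist this in a
# metadata store or data catalog.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    dataset: str
    source_system: str                                    # where the data originated
    transformations: list = field(default_factory=list)   # ordered steps applied
    access_log: list = field(default_factory=list)        # (user, timestamp) pairs

    def record_transformation(self, step: str) -> None:
        self.transformations.append(step)

    def record_access(self, user: str) -> None:
        self.access_log.append((user, datetime.now(timezone.utc)))

record = LineageRecord("customer_churn_features", source_system="CRM extract")
record.record_transformation("deduplicate on customer_id")
record.record_transformation("join with billing history")
record.record_access("alice")
print(record.source_system, record.transformations, record.access_log)
```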
- EDW performance – just as many vendors and enterprises ‘cloudwashed’[1] traditional virtualized environments to give the impression of progress, legacy technologies like enterprise data warehouses (EDWs) still sit underneath many Big Data initiatives today. Legacy EDW and ETL models were not designed to keep pace with massive data volume growth, the diversity of data sources (including unstructured, sensor/device, video, and soon blockchain data), and the hyper-accelerated “fail fast / learn faster” data science process. The result is significant operational challenges that are proving quite expensive.
Recommendation: Aggressively identify opportunities to replace or augment costly EDW and ETL capabilities with Hadoop-based alternatives.
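One common offload pattern is to land warehouse extracts in the data lake as partitioned, columnar files that Hadoop-based engines can query directly. The PySpark sketch below illustrates the idea; the JDBC URL, credentials, table, and paths are placeholders, not a recommended configuration.

```python
# Minimal sketch of one common EDW-offload pattern: extract a warehouse
# table over JDBC and land it in the data lake as partitioned Parquet,
# where Hadoop-based engines can query it without touching the EDW.
# The connection details, table, and paths are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edw_offload").getOrCreate()

# Pull the source table from the legacy warehouse.
orders = spark.read.jdbc(
    url="jdbc:oracle:thin:@edw-host:1521/EDW",   # placeholder connection
    table="SALES.ORDERS",                        # placeholder table
    properties={"user": "etl_user", "password": "***"},
)

# Land it in the lake as columnar, partitioned storage so downstream
# analytics jobs avoid hitting the EDW at all.
(orders.write
    .mode("overwrite")
    .partitionBy("ORDER_DATE")
    .parquet("hdfs:///lake/raw/sales/orders"))
```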
- Performance and Operational Issues – after standing up a ‘quick and dirty’ environment with Hadoop or Greenplum to support an initial use case or proof-of-concept (POC), many organizations have continued to extend that environment rather than going back to re-architect and design their long-term platform. Not surprisingly, this reliance on 1.0 deployments is creating performance and operational challenges for many.
Recommendation: Conduct a health check on your current Big Data compute and storage platform, and ensure the architecture and implementation will support anticipated use case volumes.
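A health check can start with something as simple as measuring current consumption against projected use case demand. The sketch below does this via the WebHDFS REST API; the NameNode host, path, capacity, and growth figures are assumptions to be replaced with your own.

```python
# Illustrative health-check sketch: compare current data-lake consumption
# against projected use case growth using the WebHDFS REST API. The
# NameNode host, lake path, and capacity/growth numbers are hypothetical.

import requests

NAMENODE = "http://namenode.example.com:50070"   # placeholder host
LAKE_PATH = "/lake"
CLUSTER_CAPACITY_TB = 500                        # assumed raw capacity
PROJECTED_GROWTH_TB_PER_MONTH = 20               # assumed pipeline demand

summary = requests.get(
    f"{NAMENODE}/webhdfs/v1{LAKE_PATH}?op=GETCONTENTSUMMARY"
).json()["ContentSummary"]

used_tb = summary["spaceConsumed"] / 1e12   # bytes -> TB, includes replication
headroom_tb = CLUSTER_CAPACITY_TB - used_tb
months_left = headroom_tb / PROJECTED_GROWTH_TB_PER_MONTH

print(f"Used: {used_tb:.1f} TB, headroom: {headroom_tb:.1f} TB")
print(f"At projected growth, capacity lasts ~{months_left:.0f} months")
```

A real health check would of course go further – replication factors, small-file counts, job queue contention – but even this simple arithmetic often reveals how little runway a 1.0 deployment has left.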
This brief overview isn’t meant to minimize the magnitude of some of these challenges – in many cases a lot of un-learning, re-thinking, and re-designing will be required to rebuild the ‘early season’ excitement and momentum enterprises initially saw with Big Data. But the first step is recognizing the problems.
Dell Services is uniquely positioned to help our clients address the key challenges they face as they drive their Big Data and IoT transformations. We provide end-to-end capabilities globally, from use case identification to Data Lake architecture and design to Big Data platform implementation. In addition to Solution Engineering, our Big Data transformation consulting services provide deep capabilities in areas such as Data Science and advanced analytics, use case identification, operating model design, Big Data strategy, and governance. Please contact us for more information.
[1] Cloudwash/cloud wash: the purposeful and sometimes deceptive attempt by a vendor to rebrand an old product or service by associating the buzzword “cloud” with it. http://searchcloudstorage.techtarget.com/definition/cloud-washing