In today’s enterprises, business and IT leaders recognize that systems for artificial intelligence (AI) and machine learning (ML) are now essential to business success. From building closer customer relationships to optimizing operational processes, AI and ML are new keys to gaining a competitive advantage.
And that’s the easy part — recognizing the opportunity. The harder part is getting there. Without a strategic approach to deploying and managing infrastructure for AI and ML initiatives, enterprises can find themselves caught in the complexity that comes with a disaggregated infrastructure environment, with many different systems devoted to many different applications. The key is to understand the fundamental requirements for a data environment that supports AI and ML initiatives across the enterprise, and then put the right infrastructure solutions in place.
This is one of the takeaway points from a new report by Enterprise Strategy Group (ESG). In this report, ESG outlines infrastructure essentials for data pipeline and data lake environments that support AI and ML applications. The report draws on ESG’s survey of 325 IT professionals at organizations in North America who are involved with the infrastructure associated with AI initiatives. This research sheds light on the breadth of the IT infrastructure that makes up modern data pipeline environments and provides insights into the priorities and business objectives organizations have for their AI and ML initiatives.
The report also strikes a cautionary tone. “When organizations achieve success with AI/ML initiatives, they often quickly ramp investment, expanding projects to target multiple objectives,” ESG warns. “The resulting data pipeline infrastructure often scales quickly, resulting in a massive and often disaggregated infrastructure environment.”
A key message here is that the right data pipeline infrastructure is an essential building block for success with AI and ML initiatives.
Four infrastructure essentials for data pipeline and data lake environments
The ESG survey found that designing the right infrastructure environment helps fuel continued success with AI and ML initiatives. For these efforts, the firm identified four top considerations:
- Maximizing infrastructure performance and utilization — AI/ML environments can quickly scale in both performance and capacity. Maximizing utilization is key to keeping costs under control. This requirement extends to server accelerators such as GPUs.
- Hybrid/multi-cloud capability — Data pipeline and data lake environments typically require massive scale and often span multiple locations. Infrastructure that can simplify the management of multiple sites and locations creates value.
- Data management and security/governance — 100% of the participants in the ESG study identified that at least some of the data in their data pipeline was sensitive data. This means that security and governance must be top priorities for any AI/ML environment.
- Data durability/high availability — The data in this environment delivers business value and, for organizations with AI/ML in production, these workloads are often viewed as business-crucial. Therefore, data must be resilient and always available.
Ultimately, every AI initiative begins with a business challenge or opportunity, ESG notes. Data scientists convert a use case into a data science problem and then develop a solution, at which point IT takes over to make the solution production ready.
“These initiatives typically require very large data sets needed for training the AI model,” ESG says. “These large volumes of data need to be moved in and out of various systems while at the same time complying with the organization’s data governance and ensuring data availability and durability. Without the right infrastructure foundation, organizations lose precious time and money.”
For the full story, see the ESG ebook, “The Four Infrastructure Essentials for AI/ML Data Pipeline and Data Lake Environments.”