As the leader of Dell’s Server & Infrastructure Systems CTO team, I’m constantly drawn to the future. While many of our 2018 Server Trends and Observations came to fruition, and some are still ongoing, our technical leadership team has collaborated to bring you the top 10 trends and observations that will most greatly affect server technologies and adoption for 2019.
As the global leader in server technology, Dell has attracted some of the brightest minds in the industry. Sharing a small glimpse into our brain trust, with deep roots in listening to our customers and leaders around the industry, each of these ten trends is authored by one of our Senior Fellows, Fellows, or Senior Distinguished Engineers.
#1 – IT must be the enabler of the Transformational Journey
Robert W Hormuth – CTO & VP, Server Infrastructure Solutions
From a broader technology point of view, we are clearly in a data-driven digital ecosystem era, and you can read more about a wider set of 2019 technology industry predictions from Jeff Clarke, vice chairman of Products and Operations here at Dell Technologies. Businesses must embark on a challenging journey to enable multiple transformations: Digital, IT, Workforce and Security.
When it comes to servers, we see them as the bedrock of the modern datacenter. Transformations are bringing an incredible value to businesses and organizations of all types, making them more nimble, intelligent, competitive and adaptive. We are in the midst of a 50-year perfect storm on both technology and business fronts. Businesses must transform and embrace the digital world, or get run over by a new, more agile competitor with a new business model benefiting from advanced technologies like data analytics, AI, ML and DL. No business is safe from the wave of digital disruption.
Options for mining data are opening new opportunities that are making businesses smarter by bringing customers and businesses closer together. Companies must move fast, pick the right tool for the job, and focus on being the disruptor to avoid becoming the disrupted. Leading is easier from the front.
#2 – The Edge is Real
Ty Schmitt – Fellow & VP, Extreme Scale Infrastructure
Mark Bailey – Senior Distinguished Engineer, Extreme Scale Infrastructure
Alan Brumley – Senior Distinguished Engineer, Server Infrastructure Solutions, OEM Engineering
The expectations of IT hardware, software and datacenter infrastructure will continue to evolve in 2019. Large volumes of ingested data will require near-real-time or real-time processing, proliferating the concept and use cases of edge computing.
The definition of edge computing remains fluid. One perspective defines “the edge” as a vehicle to enable data aggregation and migration to the cloud. In this case, the data streams ebb and flow upwards from the location of creation/collection and finally reside in the cloud. IT hardware use cases are emerging to support this vantage point, requiring smaller form factors and more ruggedized solutions. Non-traditional deployment and maintenance environments will foster a new balance among the critical hardware design considerations of power, thermals and connectivity.
An alternative perspective defines “the edge” as a means by which traditional cloud architectures of compute and storage are deployed closer to the consumers and creators of data, alleviating the burden within the cloud and the associated mechanisms of transport. The resulting geo-dispersion of compute and storage allows for new usage models that previously were not possible. Data can be analyzed locally in real time, with only the resultant data being sent to the cloud.
Each perspective on “the edge” reflects a particular usage model, and customers will ultimately define what challenge or new capability the edge represents to them. 2019 will usher in edge proofs of concept (POCs) as customers, edge hosting companies, real-estate owners, equipment manufacturers and IT innovators test business models and actively refine the new capabilities the edge affords them. Among them will be traditional colocation providers, new startups and large global infrastructure companies, all seeking insight into which edge solutions the industry will ultimately converge upon. New IT hardware, software and datacenter infrastructure form factors will be designed and trialed, allowing customers to test their solutions with small upfront capital expense. Small, self-contained micro datacenters will be deployed to enable traditional IT to be easily placed and operated closer to the data ingest or supply points.
Edge deployments will ultimately result in multi-tenant environments as initial private edge installations shift to allow for public workloads to cohabitate within the same environment. This will have a positive impact as multiple companies will require edge presence across a given geographic region, but their business models will not support the cost and complexity of a private installation. These hybrid edge deployments will allow heterogeneous solutions to work together to deliver better performance and satisfaction, while minimizing the burden on the upstream infrastructure.
The edge evolution will provide vast potential for how customers and providers use, analyze and distribute data. The creation of POCs in 2019 will allow all parties to vet and test new technologies and associated cost models. These findings will set the foundation for edge infrastructure and solutions going forward.
#3 – The Journey to Kinetic Infrastructure continues
Bill Dawkins – Fellow & VP, Server Infrastructure Solutions office of CTO
The terms “composable infrastructure” and “server disaggregation” entered the mindsets of many enterprise IT departments in 2018, as the industry made initial strides in developing the technologies that will make a fully composable infrastructure possible. Dell took a major step in our kinetic infrastructure journey with the availability of our new modular platform—the PowerEdge MX. The MX platform allows for the composition of compute and storage resources through its built-in SAS infrastructure. It is designed to incorporate new interconnect technologies as they become available. These are technologies that will enable full composability of all resources including resources with sub-microsecond latency requirements like Storage Class Memory (SCM). The MX’s unique “no mid-plane” design allows us to contemplate the incorporation of high-speed interconnect technologies like Gen-Z (check out our August 2018 blog for more). It is the natural platform for exploring the expansion of composability beyond storage in 2019.
Another key element in the kinetic infrastructure journey is the continued development of Gen-Z. To realize the vision of a fully composable infrastructure, a rack-scale, high-speed, true fabric is required to allow the composition of all components including SCM, NVMe flash devices, GPUs and FPGAs. Gen-Z is the industry’s choice for this fabric. With over sixty Gen-Z Consortium members, 2019 will see more technology development and demonstrations.
While Gen-Z is critical for realizing a system where all components are composable, 2019 will also see the rise of technologies that allow the composition of certain classes of components. For example, NVMe over Fabrics will enable pools of NVMe SSDs to be dynamically assigned to servers within a data center while still maintaining low enough latencies to retain the performance benefit of these devices. 2019 will be a year of acceleration on the Kinetic Infrastructure journey.
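To make the NVMe over Fabrics example concrete, here is a minimal sketch of composing a remote NVMe namespace onto a host by driving the standard nvme-cli utility from Python. The target address, port and subsystem NQN are hypothetical placeholders, and the transport (RDMA in this sketch) depends on the fabric actually deployed.

```python
# Minimal sketch: attach a pooled NVMe-oF namespace to this host via nvme-cli.
# The target address and NQN below are illustrative placeholders.
import subprocess

TARGET_ADDR = "10.0.0.5"                             # hypothetical fabric target
TARGET_NQN = "nqn.2019-01.com.example:nvme-pool-1"   # hypothetical subsystem NQN

# Discover the subsystems the target exports, then connect one to this host.
subprocess.run(["nvme", "discover", "-t", "rdma", "-a", TARGET_ADDR, "-s", "4420"], check=True)
subprocess.run(["nvme", "connect", "-t", "rdma", "-a", TARGET_ADDR, "-s", "4420",
                "-n", TARGET_NQN], check=True)

# The composed namespace now appears as a local block device (e.g. /dev/nvme1n1)
# and can be released later with "nvme disconnect -n <NQN>" when the resource
# is reassigned to another server.
```

Because attach and detach are just software operations against the fabric, the same pool of drives can be recomposed across servers without physically moving hardware, which is the flexibility benefit described above.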
#4 – Data Science business disruption drives the need for AI/ML/DL
Jimmy Pike – Senior Fellow & SVP, Server Infrastructure Solutions office of CTO
Onur Celebioglu – Senior Distinguished Engineer, Server Infrastructure Solutions, HPC
Bhyrav Mutnury – Senior Distinguished Engineer, Server Infrastructure Solutions, Advanced Engineering
In 2019, the boundaries of information technology will be stretched to their limit as the “data creation” IT economy transitions to one of “data consumption.” Thus, as the volume and variety of data an organization needs to analyze grows, there will be an increasing need to utilize data-enriched techniques like artificial intelligence/machine learning/deep learning (AI/ML/DL) to help transform this data into information.
The industry is in the midst of an undeniable deluge of data. Data that traditionally originated within IT (i.e., traditional enterprise applications and data centers) will increasingly come from an ever-growing number of extraordinarily diverse sources. Thus, in 2019, the growth in demand for the following will be nothing short of amazing:
- People with expertise in applying these techniques to solve business problems;
- Advances and standardization in AI/ML/DL tools, methodologies and algorithms; and
- Compute/storage/network infrastructure to run these workloads.
We already have seen the adoption of accelerators such as GPUs and FPGAs to handle this increasing demand for compute, and, this year, we will see more specialized software solutions as well as “purpose-built” ASICs that accelerate AI workloads. While providing more choice, this will make it more difficult for companies to pick which technology they need to invest in for sustained success.
In general, the undeniable effect of HPC (High-Performance Computing) will continue to impact the mainstream and stretch the performance limits seen in traditional batch-oriented scientific computing as well as enterprise solutions. The transition to a data consumption IT economy will create a greater focus on HTPC (High-Throughput Computing). As noted, the limits of traditional deterministic computing will mandate a blend of deterministic and probabilistic computing techniques (such as machine and deep learning). This addition to the IT tool chest will help identify circumstances where “close” is good enough, so that probabilistic techniques (i.e. ML) can be applied there, while traditional deterministic computing techniques are focused on areas where the return on their use is maximized.
In 2019, the continued growth of data (especially at the edge) will see the rise of ML-I (machine learning inferencing) as the first layer of data ‘pre-screening’ at its source. While the press associated with terms like hybrid cloud, AI, ML and edge computing will continue, the concepts by themselves will become increasingly less important as real solution providers seek to do the right thing at the right place, regardless of what it is called.
We believe 2019 will be the year of the ASIC for both training and inferencing. There will be a host of new solutions that burst onto the scene and, as quickly as they come, many will disappear. Many have realized the vastly larger market opportunity for inferencing as compared to the equally important model training activities enjoyed by GPGPU providers. Several intend to take market share, including companies like Graphcore with their accelerator for both training and inferencing, AMD with both their CPUs and ATI GPUs, and Intel with their CPUs and Nervana ML coprocessor. Fortunately, virtually all of the data science work takes place on top of a few popular frameworks like TensorFlow or PyTorch. This allows providers to focus on supplying the best underlying resources to these frameworks. Perhaps most importantly, we are already starting to see the beginnings of model transport and standardization, where fully trained models can be created in one environment and executed in a completely different one. In 2019, more advances are expected in model standardization.
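As an illustration of model transport, the minimal sketch below shows the mechanics: a model built in PyTorch is exported to the ONNX interchange format and then executed by a separate runtime (ONNX Runtime here). The tiny two-layer network is purely illustrative, and these frameworks are only one example of the pattern, not the only toolchain.

```python
# Minimal sketch of model transport: build a model in one framework, export it
# to ONNX, and execute it in a different runtime. The model itself is a toy.
import torch
import onnxruntime as ort

model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.Softmax(dim=1)).eval()
example = torch.randn(1, 4)

# Export the graph to a portable ONNX file.
torch.onnx.export(model, example, "model.onnx",
                  input_names=["input"], output_names=["scores"])

# Execute the same model in a completely different environment via ONNX Runtime.
session = ort.InferenceSession("model.onnx")
print(session.run(None, {"input": example.numpy()}))
```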
The next big challenge will be model validation and the removal of hidden bias in training sets, and ultimately trust: how trust is described, measured and verified and, finally, how the industry will deal with indemnification. We already have seen the huge impact that ML has had on voice and image recognition and in a variety of “recommender” applications. For most of 2019, we can expect these applications to continue in a “human-assisted decision support” role where there are limited consequences of incorrect conclusions.
#5 – The move from Data Creation Era to Data Consumption Era is leading to a silicon renaissance
Stuart Berke – Fellow & VP, Server Infrastructure Solutions, Advanced Engineering
Joe Vivio – Fellow & VP, Server Infrastructure Solutions, Extreme Scale Infrastructure
Gary Kotzur – Senior Distinguished Engineer, Server Infrastructure Solutions, Advanced Engineering
While general-purpose processor (CPU) advances continue to provide steady annual performance gains, the Data Consumption Era imposes unprecedented computational and real-time demands that can only be met with innovative processing and system architectures. Currently, application- and domain-specific accelerators and offload engines are being deployed within CPUs, on Add-In Cards and IO & Storage Controllers, and in dedicated hardware nodes to deliver the necessary performance while optimizing overall cost and power.
Within traditional CPUs and System-on-Chips (SOCs), Instruction Set Architectures (ISA) are being extended to include optimized vector and matrix integer and floating-point processing, pattern searching, and other functions. Latest 10nm and below chip processes provide ample transistors to allow inclusion of numerous dedicated silicon offload engines that provide orders of magnitude performance improvement for functions such as encryption, compression, security, pattern matching and many others. And advances in multi-chip packaging and die stacking allow integration of multiple processors, memories such as High Bandwidth Memory (HBM) and other functions to efficiently process many operations entirely without going “off-chip.”
IO and storage controllers similarly are incorporating a broad set of dedicated silicon engines and embedded or local memories to dramatically reduce the load on the CPU. Smart NICs are evolving to include multiple microcontrollers, integrated FPGAs and deep packet inspection processors. And general purpose GPUs are scaling up to tightly interconnect eight or more modules each with terabytes per second of memory bandwidth and teraflops worth of processing power to address emerging edge, AI, machine learning and other workloads that cannot be met with traditional CPUs.
These innovative architectures are incorporating emerging Storage Class Memory (SCM) within the memory and storage hierarchies to handle orders of magnitude greater data capacities at significantly lower and more deterministic latencies. Examples of SCM expected in the next few years include 3D XPoint, Phase Change Memory (PCM), Magnetic RAM (MRAM), and Carbon Nanotube RAM (NRAM). Processor-local SCM will support terabytes of operational data with persistence across power loss, and storage systems will capitalize on SCM as primary storage or as optimized caching or tiering layers.
Finally, as traditional captive fabrication advantages fade and open manufacturing suppliers such as TSMC provide leading silicon process technology to all, innovation is accelerating across a wide variety of established and startup companies. A true Silicon Renaissance is underway to ensure that the computing demands of today and tomorrow continue to be met at suitable cost, power and physical packaging.
#6 – Data: It’s mine and I want it back. On-prem repatriation is happening
Stephen Rousset – Senior Distinguished Engineer, Server Infrastructure Solutions office of CTO
As the cloud model continues to mature, companies are recognizing the challenges of relying on a single public cloud and are starting to repatriate data and workloads back on-premises. While the rise of the public cloud highlighted some benefits, it also brought challenges around loss of operational control, performance issues, security compliance and cloud/cost sprawl. With the growth of enterprise and mobile edge, a hybrid cloud model has quickly emerged as a much more appropriate solution for the majority of businesses. This data/workload placement transition, known as cloud repatriation, is seen in studies such as one from IDC (Businesses Moving From Public Cloud Due To Security, Says IDC Survey; Michael Dell: It’s Prime Time For Public Cloud Repatriation) that finds 80% of companies are expected to move 50% of their workloads from the public cloud to private or on-prem locations over the next two years.
One key driver of cloud repatriation is the velocity and volume of data generation, and with it the cost, control and containment of that data. Data generation has grown astronomically over the last two years, and even though public cloud providers have lowered their storage pricing over that period, the real cost of data retrieval and access continues to increase because the data-generation CAGR outpaces the price reductions. This leads to what may be considered a philosophical discussion: with data all but locked in the public cloud due to the cost of export, there is an overriding question of who actually “owns” the data, with some companies feeling they are renting their own data rather than having clear ownership of it. This data gravity in the public cloud is saddling companies with a tremendous amount of unexpected cost and accelerating the decision to take back control of their data and the workloads that use it. Dell works with these customers to provide a breadth of infrastructure solutions, giving them an optimized offering of data placement, protection and analytics to maintain true data ownership.
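The arithmetic behind that squeeze is simple to sketch. The figures below are purely illustrative assumptions (a 40% annual data CAGR against a 10% annual egress price reduction), not measured market data, but they show how the cost of pulling a full data set back grows even as per-terabyte pricing falls.

```python
# Illustrative only: hypothetical growth and pricing assumptions, not market data.
data_tb = 500              # data under management today, in TB (assumed)
data_cagr = 0.40           # assumed 40% annual data growth
egress_per_tb = 90.0       # assumed egress price today, in $/TB
price_drop = 0.10          # assumed 10% annual egress price reduction

for year in range(1, 4):
    data_tb *= 1 + data_cagr
    egress_per_tb *= 1 - price_drop
    # Cost to repatriate the full data set in that year
    print(f"Year {year}: {data_tb:,.0f} TB x ${egress_per_tb:.2f}/TB = ${data_tb * egress_per_tb:,.0f}")
```

Under those assumptions the repatriation bill still grows roughly 26% per year, which is the data-gravity effect described above.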
#7 – Blockchain can benefit Enterprise
Gaurav Chawla – Fellow & VP, Server Infrastructure Solutions office of CTO
Enterprises are always seeking ways to make their systems more secure and transparent, and blockchain could provide the underlying technology to build such solutions. The origin of blockchain dates back to October 2008, when the first white paper was published for a “peer-to-peer electronic cash system,” giving birth to the Bitcoin digital currency.
The decade leading up to 2018 saw a lot of hype and activity around crypto-currencies and ICOs (Initial Coin Offerings). As with other early-stage, high-impact technologies (e.g. AI/ML/DL, Edge Computing/IoT, 5G), we have seen both perspectives: some technology enthusiasts see blockchain as the holy grail of decentralized identities, decentralized trusted compute and a next-generation Internet 2.0, while others are skeptical, viewing blockchain as just a distributed database.
In 2019, we will see the focus pivot to increased activity around permissioned blockchains and their ability to address enterprise workflows and use cases. In essence, this is about applying distributed ledger technology (DLT), the underlying technology of blockchain, to enterprise workflows. We will see it move into real PoCs that deliver on the promise of DLT.
Some of the initial use cases may focus on improved implementations for audit and compliance, or on enabling secure sharing of information across multiple parties (competitors, vendors/suppliers and customers). These implementations will drive increased industry collaboration on blockchain-based architectures and give rise to consortiums focused on specific industry verticals: finance, logistics/supply-chain, healthcare and retail, just to name a few. These projects will drive DLT integration in brownfield deployments and will use a combination of off-chain and on-chain processing.
Most of the data will be stored off-chain on existing compute/storage, and information will be brought on-chain where the blockchain properties of immutability, security/encryption and a distributed database provide benefits. Smart contracts will play a key role, and multi-blockchain architectures will start to evolve. We also will see increased momentum for DLT integrations with emerging use cases in IoT and AI/ML/DL. To be successful, implementations will need to pay close attention to the real benefits of blockchain and to integration aspects.
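The immutability property that makes on-chain records attractive for audit trails comes from hash chaining: each block commits to the hash of the block before it, so altering history is detectable. The sketch below is a generic illustration of that idea, not a model of any particular DLT product, and the ledger entries are hypothetical.

```python
# Minimal sketch of a hash-chained ledger: tampering with any earlier record
# breaks verification of every later block. Illustrative only.
import hashlib, json, time

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, payload: dict) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "time": time.time(),
                  "payload": payload, "prev_hash": prev})

def verify(chain: list) -> bool:
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

ledger = []
append_block(ledger, {"event": "shipment received", "po": "PO-1234"})  # hypothetical record
append_block(ledger, {"event": "audit check passed"})
print(verify(ledger))                      # True
ledger[0]["payload"]["po"] = "PO-9999"     # tamper with history
print(verify(ledger))                      # False: the chain no longer verifies
```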
At Dell Technologies, we support both VMware Blockchain, based on open source Project Concord, and other open source blockchain implementations. We look forward to taking these blockchain projects to the next level of implementations and consortium engagements.
#8 – Security threats are the new exponential
Mukund Khatri – Fellow & VP, Server Infrastructure Solutions, Advanced Engineering
It can be hard to fathom what “worse” could mean after the barrage of high-impact vulnerabilities and breaches we experienced last year. 2019 will be yet another year of exponential growth in security threats and events, driven by a combination of broadened bug bounty programs, increasing design complexity and well-funded, sophisticated attackers.
Staying current with timely patch management will be more critical than ever for enterprises. There will be broader recognition of the critical need for cyber resiliency in server architectures, as currently available in Dell PowerEdge, that provide system-wide protection, integrity verification and automated remediation. While an impregnable design is a myth, effective roots of trust and trustworthy boot flows will be needed across the compute, management and fabric domains of modern infrastructure. Monitoring and remediation technologies will also evolve, using AI and ML to further strengthen system security.
In 2019, supply chain concerns will be top of mind for all IT purchases. As seen recently, a breach in the supply chain, whether in hardware or software, can be extremely difficult to detect, and the implications can be catastrophic for all involved. One of the key objectives this year will be rendering a successful intrusion harmless; in other words, if someone does get into the platform, making sure they cannot obtain meaningful information or do damage. This will drive innovations that deliver a more rigorous trust strategy based on enhanced identity management.
Identity at all levels (user, device and platform) will be a major focus, requiring a complete end-to-end trust chain for any agent that is able to install executables on the platform, along with policy tools for ensuring trust. This will likely include options based on blockchain.
Greater focus on encryption will emerge, requiring any data at rest to be encrypted, whether at the edge or in the datacenter, along with robust encryption key management. Secure enclaves for better protection of secrets are another emerging solution space that will see more focus. Regulations to protect customer data, similar to the EU’s General Data Protection Regulation (GDPR), California’s Consumer Privacy Act (CCPA) and Australia’s encryption law, can also be expected to increase, thereby driving compliance costs and forcing tradeoffs.
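One common pattern for pairing data-at-rest encryption with key management is envelope encryption: each object is encrypted with its own data-encryption key (DEK), and the DEK is stored only in wrapped form under a key-encryption key (KEK) held by a key-management service or HSM. The sketch below is a minimal illustration of that pattern, assuming the third-party Python cryptography package; the key handling shown is deliberately simplified.

```python
# Minimal sketch of envelope encryption for data at rest. Illustrative only;
# in practice the KEK lives in an HSM or key-management service, not in memory.
from cryptography.fernet import Fernet

kek = Fernet(Fernet.generate_key())        # key-encryption key (normally in a KMS/HSM)

dek_bytes = Fernet.generate_key()          # per-object data-encryption key
dek = Fernet(dek_bytes)

ciphertext = dek.encrypt(b"telemetry archive, Q1")   # the data at rest
wrapped_dek = kek.encrypt(dek_bytes)                 # stored alongside the ciphertext

# To read the data back: unwrap the DEK with the KEK, then decrypt the object.
plaintext = Fernet(kek.decrypt(wrapped_dek)).decrypt(ciphertext)
assert plaintext == b"telemetry archive, Q1"
```

Rotating or revoking the KEK then only requires re-wrapping the small DEKs, not re-encrypting the underlying data, which is what keeps key management tractable at datacenter scale.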
And, finally, newer technologies like Storage Class Memory (SCM), Field Programmable Gate Arrays (FPGAs) and Smart NICs, while all critical for digital transformation, will bring their own set of unique security challenges. For 2019 and the foreseeable future, the exponential trajectory of security threats is here to stay.
#9 – Is open source “the gift that keeps on giving?”
Stephen Rousset – Senior Distinguished Engineer, Server Infrastructure Solutions office of CTO
Shawn Dube – Senior Distinguished Engineer, Server Infrastructure Solutions, Advanced Engineering
The adoption and proliferation of open source software (OSS) has created communities of creativity and provided knowledge leverage across many disparate fields, yielding a vast selection of offerings in the IT ecosystem. This continued broadening of open source choices, combined with companies’ unyielding desire to reduce expenses, has accentuated the appeal of “free” open source CapEx to the C-suite.
But companies are realizing that the “free” of open source is not free as in beer, but free as in a free puppy. A free beer is quite enjoyable on a hot Texas summer day, and although a free puppy can also bring a different kind of enjoyment, it requires significantly more attention, care and ongoing expense to keep it healthy and out of trouble. Not a lot of planning needs to go into consuming a free beer, but taking on a free puppy requires real planning around time and money.
Dell has always supported OSS and remains very bullish on the open source community, but Ready Solutions that are built, delivered and working are resonating more than a DIY model. While open source can initially look very appealing, an open source DIY model requires retaining the right (often hard-to-find) skillsets in your company, diligence in selecting the right parts and pieces to integrate, and, of course, continued maintenance of all those integrated pieces. We have seen numerous customers have to reset their strict DIY model and look to alternative ways to achieve the high-level business objective. Dell recognizes the desire for customer choice and has put together a portfolio of options, ranging from fully supported Ready Solutions of open source packages that address customer workloads to highly optimized engineered solutions leveraging open source or partner packages.
#10 – Telemetry will bring new levels of intelligence to IT
Elie Jreij – Fellow & VP, Server Infrastructure Solutions, Advanced Engineering
Jon Hass – Senior Distinguished Engineer, Server Infrastructure Solutions office of CTO
Optimizing IT operations and making them more efficient is a goal every enterprise shares. One means of accomplishing this goal is to gather more telemetry data on hardware and software infrastructures for use by management applications and analytics. As the need for acquiring telemetry data increases, collection methods need to be improved and standardized. This has been recognized by Dell and the DMTF standards organization, which recently released the Redfish Telemetry schema and Streaming & Eventing specifications. These specifications simplify the data collection task and make it more consistent across infrastructure components, enabling data analytics applications to focus on the data content without having to deal with multiple collection methods and formats.
IT infrastructure components have a variety of interface mechanisms and protocols that vary widely between devices (e.g. Modbus, I2C, PWM, PECI, APML…). A local management controller or instrumentation can collect telemetry data using device- and component-specific protocols and then stream the data in standardized formats to remote telemetry clients and analytics applications. Examples of local manageability controllers include IoT gateways, service processors or baseboard management controllers in IT equipment, and other controllers inside or outside the data center. Factors to consider when planning telemetry utilization include bandwidth, security, consistency, latency and accuracy.
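As an example of what the standardized side of that flow can look like, the sketch below pulls metric reports from a Redfish telemetry service over HTTPS, assuming the third-party requests package; the controller address and credentials are hypothetical placeholders, and certificate verification is disabled only for brevity.

```python
# Minimal sketch: read standardized telemetry from a Redfish TelemetryService.
# The endpoint and credentials are illustrative placeholders.
import requests

BASE = "https://bmc.example.com"        # hypothetical management controller address
AUTH = ("metrics-user", "password")     # hypothetical read-only account

reports = requests.get(f"{BASE}/redfish/v1/TelemetryService/MetricReports",
                       auth=AUTH, verify=False).json()

for member in reports.get("Members", []):
    report = requests.get(BASE + member["@odata.id"], auth=AUTH, verify=False).json()
    for value in report.get("MetricValues", []):
        print(report.get("Id"), value.get("MetricId"), value.get("MetricValue"))
```

Because the report format is consistent regardless of which vendor's controller produced it, the analytics applications described below can consume the stream without device-specific parsing.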
While there has been a lot of focus on application specific telemetry, such as face recognition or customer shopping patterns, expect a new focus on IT infrastructure telemetry. This will allow smarter management of the compute, storage, networking and related software infrastructures. Streaming consistent and standardized telemetry information about the infrastructures will enable analytics applications to optimize manageability and deliver automation such as predictive failure and network intrusion detection, and run the infrastructure more effectively. These features become more important as IT infrastructure characteristics evolve, and aspects like energy efficiency, edge deployment and reduced total cost of ownership continue to be prioritized.