Symptoms
When an ESXi Host enters Maintenance Mode, DRS can automatically migrate its Virtual Machines to other compatible hosts in the Cluster.
However, when an ESXi Host running vGPU Virtual Machines is put into Maintenance Mode, the "Enter maintenance mode" task does not complete, and the following failure event is reported:
DRS failed to generate a vMotion recommendation for a virtual machine on a host entering Maintenance Mode.
Cause
DRS does not automatically migrate vGPU Virtual Machines when an ESXi Host enters Maintenance Mode, because migrating them can cause long Virtual Machine stun times that disrupt workloads.
Resolution
Remediate this issue by manually migrating the ESXi Host’s vGPU Virtual Machines to other hosts in the Cluster.
Workaround:
vCenter Server 7.0 Update 3f and vSphere 7.0.3 or later add a DRS Cluster advanced option that lets Virtual Infrastructure administrators opt in to automated evacuation of vGPU Virtual Machines:
- Option: VgpuMMAutomationTimeoutSecs
- Value: -1
Setting this option introduces the following behavior changes:
- Evacuation of vGPU Virtual Machines is automated, subject to the 100-second vMotion timeout.
- During switchover, a vGPU Virtual Machine's stun time may exceed 10 seconds, depending on both network bandwidth and the size of the vGPU profile.
- Evacuation of Virtual Machines is serialized to avoid network contention.
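As a rough illustration of why stun time grows with vGPU profile size and shrinks with network bandwidth, the sketch below computes a naive lower bound on the time needed to copy a vGPU frame buffer across the vMotion network, and sums those times because evacuation is serialized. It is a back-of-envelope model only, not VMware's or NVIDIA's method: it assumes the whole frame buffer crosses the link at full line rate with no compression or protocol overhead, uses frame buffer size in GiB as a stand-in for "profile size", and all function names are hypothetical. The 100-second constant is the vMotion timeout stated in this article.

```python
# Hypothetical back-of-envelope helpers; not a VMware or NVIDIA API.
# Assumption: the vGPU frame buffer crosses the vMotion network during
# switchover at full line rate, with no compression or protocol overhead,
# so every result here is a lower bound, not a prediction.

VMOTION_TIMEOUT_SECS = 100  # per-migration vMotion timeout cited in this article


def naive_transfer_secs(framebuffer_gib: float, link_gbps: float) -> float:
    """Lower-bound seconds to copy one vGPU frame buffer over the link."""
    bits = framebuffer_gib * 8 * 2**30  # GiB -> bits
    return bits / (link_gbps * 1e9)     # Gbps -> bits per second


def serialized_transfer_secs(framebuffers_gib: list[float], link_gbps: float) -> float:
    """Evacuation is serialized, so per-VM transfer times add up."""
    return sum(naive_transfer_secs(fb, link_gbps) for fb in framebuffers_gib)


def may_fit_timeout(framebuffer_gib: float, link_gbps: float) -> bool:
    """Necessary (not sufficient) condition for finishing within the timeout."""
    return naive_transfer_secs(framebuffer_gib, link_gbps) < VMOTION_TIMEOUT_SECS


if __name__ == "__main__":
    for fb in (8, 16):
        for gbps in (10, 25):
            print(f"{fb} GiB over {gbps} Gbps: >= {naive_transfer_secs(fb, gbps):.1f} s")
    total = serialized_transfer_secs([16, 16], 10)
    print(f"Two 16 GiB VMs over 10 Gbps, serialized: >= {total:.1f} s")
```

Under these assumptions, a 16 GiB frame buffer over a 10 Gbps link already needs at least about 13.7 seconds, which is consistent with the note that stun time may exceed 10 seconds; real migrations will take longer than this lower bound.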
Requirements:
- Extra vGPU host capacity in the DRS Cluster (for example, a host with the same configuration as the host entering Maintenance Mode).
- No compatibility issues reported for the VMs on the host going into Maintenance Mode.
Affected Products
VxRail, VMware Cloud on Dell EMC VxRail E560F, VMware Cloud on Dell EMC VxRail E560N, VxRail Appliance Family, VxRail Appliance Series, VxRail G Series Nodes, VxRail D Series Nodes, VxRail E Series Nodes, VxRail P Series Nodes, VxRail S Series Nodes, XC Core Systems, XC Series Appliances, VxRail Software, VxRail V Series Nodes, VxRail VD Series Nodes