VxRail: Node Health-check Fails for Test 'maint_mode'
Summary: The 'maint_mode' check parses the host summary data to determine whether Maintenance mode is enabled on each node.
Symptoms
VxVerify runs several tests on each node by uploading a 'minion' health-check program, which detects issues that may cause upgrades to fail.
The 'maint_mode' check parses the host summary data to determine whether Maintenance mode is enabled on each node. A node that is in Maintenance mode without having fully evacuated its data puts some vSAN objects at risk during the reboots that occur in an ESXi upgrade.
This check also runs the vSAN evacuation precheck to see whether a node could be successfully evacuated (that is, when it is put into Maintenance mode).
This health-check reports one of the following results:
| Test Result | Result code | Result Interpretation |
| Pass | 0 | Node not in Maintenance mode |
| Warning | 1 | The predicted outcome of an evacuation is not 'success'. |
| Failure | 2 | Node already in Maintenance mode |
| Critical | 3 | This test has no critical result. |
For ease of reading, tests that pass are not listed in the summary report.
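The result codes in the table can be read as a small decision rule. The following is a minimal Python sketch of that rule, not VxVerify's actual implementation. The names `Result`, `classify_maint_mode`, and `reported_rows` are hypothetical, and the choice to let Failure take precedence over Warning when both conditions hold is an assumption.

```python
from enum import IntEnum


class Result(IntEnum):
    """Result codes from the table (Critical, 3, is never produced by this test)."""
    PASS = 0
    WARNING = 1
    FAILURE = 2


def classify_maint_mode(in_maintenance_mode: bool, prediction: str) -> Result:
    """Map the two inputs of the check to a result code.

    Assumption: a node already in Maintenance mode is reported as a Failure
    even if the evacuation prediction is also unsuccessful.
    """
    if in_maintenance_mode:
        return Result.FAILURE
    if prediction.strip().lower() != "success":
        return Result.WARNING
    return Result.PASS


def reported_rows(results: dict) -> dict:
    """Drop passing nodes, mirroring the summary report that omits passes."""
    return {host: res for host, res in results.items() if res is not Result.PASS}
```

For example, `reported_rows({"node01": Result.PASS, "node03": Result.FAILURE})` keeps only `node03`.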
An example of the health-check output is shown below:
#========================#======#=========#====================================================================#==============#
| Hostname / Category |Status Dell_KB | Warnings or Failures, unless tests Passed ; Product S.N. |
#========================#======#=========#====================================================================#==============#
| node03 | Failure 43141 | maint_mode: Maintenance mode enabled .|
Cause
The 'maint_mode' test checks whether the node is in Maintenance mode using the command:
/bin/vim-cmd /hostsvc/hostsummary
If this shows that the node has the 'inMaintenanceMode' status as True, then this health-check will report a Failure result.
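To make the test above concrete, here is a minimal Python sketch that extracts the 'inMaintenanceMode' value from captured host summary text. It assumes the text contains a line of the form `inMaintenanceMode = true,` or `inMaintenanceMode = false,`. The function name `in_maintenance_mode` and the sample text are illustrative only, and this is not how VxVerify itself is known to parse the output.

```python
import re

# Assumption: the host summary prints the flag as "inMaintenanceMode = <true|false>".
_MAINT_RE = re.compile(r"\binMaintenanceMode\s*=\s*(true|false)\b", re.IGNORECASE)


def in_maintenance_mode(host_summary: str) -> bool:
    """Return True if the captured host summary reports Maintenance mode."""
    match = _MAINT_RE.search(host_summary)
    if match is None:
        # Refuse to guess when the field is missing from the captured text.
        raise ValueError("inMaintenanceMode not found in host summary")
    return match.group(1).lower() == "true"


# Illustrative fragment, not real output from a node.
SAMPLE_SUMMARY = """
      inMaintenanceMode = true,
"""

if __name__ == "__main__":
    print(in_maintenance_mode(SAMPLE_SUMMARY))
```

Raising an error when the field is absent, rather than returning False, avoids reporting a Pass on truncated or unexpected output.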
This check also runs:
esxcli vsan debug evacuation precheck -a ensureAccess -e localhost
The command above examines what evacuating the 'localhost' would involve when the 'ensureAccess' option is used. The result is accurate when all hosts in the vSAN cluster are of the same version and have the same disk format.
The precheck confirms whether the node can be successfully evacuated, and lists how much data there is to evacuate and the impact this would have on vSAN objects. These results can be seen in the VxTii report (vxtii.txt), which is produced when VxVerify runs:
========================================================================================================================
Maint-mode   Evacuation_Data   Prediction   DU_objects   At_risk_obj   KB 43141
------------------------------------------------------------------------------------------------------------------------
No           83.26 GB          Success      0            74
========================================================================================================================
Where:
Maint-mode = ESXi Maintenance mode status for this node.
Evacuation_Data = The amount of data that would have to be moved in order to keep all vSAN objects accessible.
Prediction = Whether the node can enter Maintenance mode using the 'ensure accessibility' option.
DU_objects = The number of objects that would become inaccessible if the node enters Maintenance mode.
At_risk_obj = The number of objects that would have reduced redundancy if the node enters Maintenance mode.
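The field definitions above can be illustrated by splitting a data row of the report into named values. This is a minimal Python sketch under stated assumptions: the row is whitespace-separated, the Evacuation_Data value consists of a number followed by a unit, and the Maint-mode column holds 'Yes' or 'No' (only 'No' appears in the example above, so 'Yes' is assumed). The names `EvacuationRow` and `parse_evacuation_row` are hypothetical.

```python
from typing import NamedTuple


class EvacuationRow(NamedTuple):
    maint_mode: bool       # Maint-mode column
    evacuation_data: str   # Evacuation_Data column, for example "83.26 GB"
    prediction: str        # Prediction column
    du_objects: int        # DU_objects column
    at_risk_obj: int       # At_risk_obj column


def parse_evacuation_row(line: str) -> EvacuationRow:
    """Parse one data row such as 'No 83.26 GB Success 0 74'."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 whitespace-separated fields, got {len(parts)}")
    maint, size, unit, prediction, du, at_risk = parts
    return EvacuationRow(
        maint_mode=maint.lower() == "yes",  # assumption: values are Yes/No
        evacuation_data=f"{size} {unit}",
        prediction=prediction,
        du_objects=int(du),
        at_risk_obj=int(at_risk),
    )


if __name__ == "__main__":
    print(parse_evacuation_row("No 83.26 GB Success 0 74"))
```

In the example row, no objects would become inaccessible (DU_objects is 0), but 74 objects would have reduced redundancy while the node is in Maintenance mode.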
For more information about the VxRail Triage report (VxTii), see:
Dell VxRail: Hardware Troubleshooting using VxRail Triage Reports (VxTii)
Resolution
A node is already in Maintenance mode.
- The reason for Maintenance mode being enabled must be investigated and fixed.
- If possible, disable Maintenance mode on the node and check that it comes back online.
- If the node cannot be taken out of Maintenance mode, inform the customer that the VxRail upgrade cannot proceed while a host is in Maintenance mode.
- Retry the health-check to make sure that there are no further issues.
- Proceed with, or retry, the upgrade.
If there are extenuating circumstances for keeping the node in Maintenance mode, and the upgrade must proceed, escalate to Dell Support for review.
Predicted unsuccessful evacuation outcome.
If a node reports a Warning for a predicted unsuccessful evacuation outcome (that is, the node cannot go into Maintenance mode), run the following command on that node:
esxcli vsan debug evacuation precheck -a ensureAccess -e localhost --verbose
This command gives details about any objects that would become inaccessible were the node to go offline or be forced into Maintenance mode. Possible corrective actions include increasing the 'Failures To Tolerate' (FTT) setting on the Virtual Machine (VM) that owns each listed object.
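If the precheck needs to be captured from a script on the node, the command above can be wrapped as follows. This is a minimal Python sketch, assuming a Python 3.7 or later interpreter is available where `esxcli` runs. The function names `precheck_command` and `run_precheck` are hypothetical, and the format of the returned text is not assumed or parsed here.

```python
import subprocess


def precheck_command(verbose: bool = False) -> list:
    """Build the argument list for the evacuation precheck shown in this article."""
    cmd = [
        "esxcli", "vsan", "debug", "evacuation", "precheck",
        "-a", "ensureAccess",
        "-e", "localhost",
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_precheck(verbose: bool = False) -> str:
    """Run the precheck on the local node and return its raw text output.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    completed = subprocess.run(
        precheck_command(verbose),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Calling `run_precheck(verbose=True)` on the node that raised the Warning returns the same text the command prints at the shell, which can then be saved alongside the minion logs.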
Additional Information
See the minion logs for more results on this failure, as detailed in this article:
VxRail: Troubleshooting when VxVerify reports an error