Symptoms
Some ESXi hosts may show as unresponsive in vCenter. Rebooting the host may resolve the issue temporarily, but it recurs after several days. This issue occurs only on Dell PowerEdge 14G servers with iDRAC9.
The TSR log contains messages such as:
2019-06-04 15:26:05 ISM0049 The iDRAC Service Module (iSM) is unable to communicate to the iDRAC because the client certificate is either unavailable or invalid.
In vmkernel.log:
2019-06-04T02:05:56.920Z cpu61:2105520)WARNING: VisorFSObj: 1576: Cannot create file /etc/cim/dell/srvadmin/iSM/ini/tttttttttttttyZxIL9 for process sfcb-dcism because the inode table of its ramdisk (etc) is full.
In hostd.log:
2019-06-02T13:39:59.688Z error hostd[2105490] [Originator@6876 sub=Libs opID=e4a0107a-853b-11e9-f2a3 user=dcui:vsanmgmtd] VsanUtil: Failed to lock esx.conf /etc/vmware/esx.conf.LOCK.2104629: symlink failed: No space left on device
In the iDRAC GUI: [screenshot]
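To check a host for the log signatures quoted above, a small helper along these lines may be useful. It is a minimal sketch, not a Dell-provided tool: the function name `check_ism_signatures` is hypothetical, and the search strings are copied from the sample log lines in this article.

```shell
#!/bin/sh
# Hypothetical helper: scan the given log files for the three
# signatures quoted in this article (ISM0049, full inode table of
# the etc ramdisk, esx.conf lock failure with "No space left").
# Returns 0 if any file contains a signature, 1 otherwise.
check_ism_signatures() {
    found=1
    for f in "$@"; do
        # Skip files that do not exist or cannot be read
        [ -r "$f" ] || continue
        if grep -q -e 'ISM0049' \
                   -e 'inode table of its ramdisk (etc) is full' \
                   -e 'esx.conf.LOCK.*No space left on device' "$f"; then
            echo "signature found in $f"
            found=0
        fi
    done
    return $found
}
```

For example, `check_ism_signatures /var/log/vmkernel.log /var/log/hostd.log` checks the two ESXi logs mentioned above. The log locations are an assumption about a default ESXi layout; adjust them to where the logs or the extracted TSR files actually reside.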
Cause
iDRAC9 v3.30.30 introduced a mandatory requirement to create a secure TLS channel with iSM v3.4.0-1471 or newer.
Dell Engineering has identified a scenario in which a memory leak occurs when iSM v3.4.0-1471 was installed or upgraded before the iDRAC firmware was upgraded, so that the iDRAC9 has not yet negotiated this secure TLS connection. The leak eventually also exhausts the inodes of the etc ramdisk, because a flood of temporary INI files is created under /etc/cim/dell/srvadmin/iSM/ini/ (the path shown in the vmkernel.log entry).
VxRail software releases 4.5.400, 4.7.200, and later integrate iSM v3.4.0-1471. A workaround that prevents this issue was added in 4.5.400 and 4.7.212. Release 4.7.210 is not impacted because it is a manufacturing-only release, so no systems are upgraded to it. Therefore, VxRail releases 4.7.200 and 4.7.211 are the most likely to encounter this issue.
Note that this issue can also occur after the system board has been replaced because of a hardware fault. This applies to nodes running 4.7.2xx, including 4.7.212 and later releases.
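To gauge how far the leak has progressed, the temporary INI files can be counted. The following is a minimal sketch under stated assumptions: the function name `count_ism_temp_files` is hypothetical, and the default directory and the `tttttt` file-name prefix are taken from the vmkernel.log entry and the cleanup command in this article.

```shell
#!/bin/sh
# Hypothetical helper: count leaked temporary INI files.
# Optional argument: directory to inspect (defaults to the path
# seen in the vmkernel.log sample).
count_ism_temp_files() {
    dir="${1:-/etc/cim/dell/srvadmin/iSM/ini}"
    # A missing directory simply means there is nothing to count
    [ -d "$dir" ] || { echo 0; return 0; }
    # find avoids "argument list too long" when there are many files;
    # the arithmetic expansion strips any padding from the wc output
    echo $(( $(find "$dir" -maxdepth 1 -type f -name 'tttttt*' | wc -l) ))
}
```

A count that keeps growing between runs suggests the leak is still active. On ESXi, `esxcli system visorfs ramdisk list` should additionally report the maximum and used inode counts for the etc ramdisk; verify the exact output on your ESXi version.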
Resolution
If the ESXi host already shows as unresponsive in vCenter, reboot it.
Reinstalling iSM can trigger renegotiation of the secure TLS channel with iDRAC9 and prevent the issue from recurring.
On the affected ESXi hosts, run the following commands to reinstall iSM:
esxcli software vib remove -n dcism
esxcli software vib install -d <path to iSM VIB>
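Because the first command removes the existing iSM VIB, it can help to confirm that the replacement bundle is readable before removing anything. The wrapper below is a sketch only: the function name `reinstall_ism` is hypothetical, and it runs exactly the two esxcli commands shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above.
# Argument: path to the iSM bundle passed to "vib install -d".
reinstall_ism() {
    bundle="$1"
    # Refuse to remove the current VIB if the new bundle is missing
    if [ -z "$bundle" ] || [ ! -r "$bundle" ]; then
        echo "iSM bundle not readable: ${bundle:-<none given>}" >&2
        return 1
    fi
    # Stop if the removal fails, so the host state is left unchanged
    esxcli software vib remove -n dcism || return 1
    esxcli software vib install -d "$bundle"
}
```

The wrapper returns the exit status of the install command, so a calling script can detect a failed reinstall.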
If no inodes are available on the ESXi host, remove the unnecessary temporary files first, because this issue can also exhaust the inodes:
ls -l /etc/cim/dell/srvadmin/iSM/ini/
rm -f /etc/cim/dell/srvadmin/iSM/ini/tttttt*
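When the directory holds a very large number of files, the `rm` wildcard may fail with an "argument list too long" error. A `find`-based variant avoids this and allows the files to be listed before they are deleted. This is a sketch under assumptions: the function name `clean_ism_temp_files` and its `list`/`delete` modes are hypothetical, while the path and the `tttttt` prefix come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: list or delete the leaked temporary INI files.
# Usage: clean_ism_temp_files list|delete [DIR]
clean_ism_temp_files() {
    mode="$1"
    dir="${2:-/etc/cim/dell/srvadmin/iSM/ini}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    case "$mode" in
        # Show what would be removed, without changing anything
        list)   find "$dir" -maxdepth 1 -type f -name 'tttttt*' ;;
        # Remove the files one at a time to stay within argument limits
        delete) find "$dir" -maxdepth 1 -type f -name 'tttttt*' -exec rm -f {} \; ;;
        *)      echo "usage: clean_ism_temp_files list|delete [DIR]" >&2
                return 2 ;;
    esac
}
```

Running `clean_ism_temp_files list` first and `clean_ism_temp_files delete` afterwards mirrors the `ls` and `rm` steps above. Only files matching the prefix are touched, so the regular iSM INI files in the same directory are left in place.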
If the system board has been replaced because of a hardware failure, the resolution steps above also apply.
Affected Products
VxRail Appliance Series
Products
VxRail Appliance Series