Unsolved

September 1st, 2020 03:00

VxRail manager reports lost nodes and missing disks, also vspexblue folder missing on lost nodes

Hello,

I inherited the administration of a VxRail appliance after switching jobs.

It's based on a QuantaPlex T41S-2U.

I'm new to VxRail and vSAN, and the first thing I'm trying to do is clear all the alarms in VxRail Manager.

Of the 4 nodes, 3 are reported missing (Node 4 is still OK), and some disks are reported missing as well.

[Screenshot: vxrail_errors.png]

I investigated and managed to understand how VxRail Manager monitors the nodes (correct me if I'm wrong):

There's normally a vspexblue folder in /tmp on each ESXi node.

[root@vxrail-esxi-04:/tmp/vspexblue] ls -al
total 64
drwxr-xr-x 1 201 201 512 Sep 1 10:33 .
drwxrwxrwt 1 root root 512 Sep 1 10:34 ..
lrwxrwxrwx 1 root root 8 Jun 26 2019 3.sh -> schedule
drwxr-xr-x 1 201 201 512 Jun 26 2019 bin
drwxr-xr-x 1 201 201 512 Jul 8 2019 logs
drwxr-xr-x 1 201 201 512 Jul 8 2019 outputs
-rwxr-xr-x 1 201 201 2750 Jul 13 2017 schedule
drwxr-xr-x 1 201 201 512 Jul 8 2019 scripts
-rwxr-xr-x 1 201 201 21440 Jul 13 2017 shutdown_ESX.py
-rw-r--r-- 1 root root 14 Jun 26 2019 vbm_hosts
-rw-r--r-- 1 root root 16 Jun 26 2019 version

I see this folder and its contents on Node 4, but the folder is missing on Nodes 1, 2 and 3.

Inside this vspexblue folder are scripts that gather various ESXi metrics via IPMI.

A general script called '3.sh', which is actually a symbolic link to another file called 'schedule', is scheduled to run every 3 minutes:

*/3 * * * * /tmp/vspexblue/3.sh >> /tmp/vspexblue/logs/3.log 2>&1
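
A quick way to compare a healthy node with a broken one is to check both halves of this setup: that the crontab contains an active entry for the script, and that the symlink resolves to an executable file. The following is a minimal sketch, not a VxRail tool. The function names `has_cron_entry` and `script_is_runnable` are made up for illustration, and the paths are the ones from this post.

```shell
#!/bin/sh
# Sketch: check that a crontab schedules a script and that the script
# path (following symlinks) points to an executable file.
# Function names are hypothetical; paths are taken from the post.

# has_cron_entry CRONTAB SCRIPT
# Succeeds if an uncommented line in CRONTAB mentions SCRIPT.
has_cron_entry() {
    grep -v '^[[:space:]]*#' "$1" 2>/dev/null | grep -qF -- "$2"
}

# script_is_runnable SCRIPT
# Succeeds if SCRIPT exists as a regular file (symlinks are followed)
# and is executable.
script_is_runnable() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Defaults from the post; override through the environment if needed.
CRONTAB="${CRONTAB:-/var/spool/cron/crontabs/root}"
SCRIPT="${SCRIPT:-/tmp/vspexblue/3.sh}"

if has_cron_entry "$CRONTAB" "$SCRIPT"; then
    echo "cron entry present for $SCRIPT"
else
    echo "cron entry MISSING for $SCRIPT"
fi

if script_is_runnable "$SCRIPT"; then
    echo "$SCRIPT resolves to an executable file"
else
    echo "$SCRIPT is missing or not executable"
fi
```

Running it on Node 4 and on one of the lost nodes should show which of the two pieces is absent.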

The collected metrics are stored in the outputs folder, and I guess VxRail Manager somehow collects these metrics, processes them and displays the info on the Health page.

Back to my problem: I restored the vspexblue folder on Node 1 and modified /var/spool/cron/crontabs/root, as it was missing the cron job for 3.sh. I then killed the current crond process and started it again.
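
The crontab step above can be sketched as a small idempotent helper, so that repeating it does not add duplicate entries. This is an illustration only: `ensure_cron_line` and `restart_crond` are hypothetical names, and the crond restart commands are an assumption based on the usual ESXi procedure (PID file at /var/run/crond.pid, crond provided by busybox); the exact paths may differ between ESXi releases.

```shell
#!/bin/sh
# Sketch: add a cron line only if it is not already present, then
# restart crond. Helper names are hypothetical.

# ensure_cron_line CRONTAB LINE
# Appends LINE to CRONTAB unless an identical line already exists.
ensure_cron_line() {
    crontab_file=$1
    line=$2
    if grep -qxF -- "$line" "$crontab_file" 2>/dev/null; then
        echo "already present"
    else
        printf '%s\n' "$line" >> "$crontab_file" && echo "added"
    fi
}

# restart_crond
# ASSUMPTION: PID file location and busybox path as commonly found on
# ESXi hosts; verify both on the host before using this.
restart_crond() {
    kill "$(cat /var/run/crond.pid)"
    /usr/lib/vmware/busybox/bin/busybox crond
}

# Example usage on the host (not executed here):
#   ensure_cron_line /var/spool/cron/crontabs/root \
#     '*/3 * * * * /tmp/vspexblue/3.sh >> /tmp/vspexblue/logs/3.log 2>&1'
#   restart_crond
```

This only changes the running system; it says nothing about whether the edit survives a reboot.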

The cron job is running, I see it with:

grep cron /var/log/syslog.log

However, no metrics are collected and all the metric files in /tmp/vspexblue/outputs/3 do not get updated.

It's the same if I call the script manually with /tmp/vspexblue/3.sh
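
To make the "files do not get updated" observation concrete, the output directory can be scanned for files older than the cron interval. The sketch below assumes the `find` on the host supports `-mmin` (GNU and most busybox builds do, but this has not been confirmed for ESXi); `stale_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list metric files that have not been modified recently.
# ASSUMPTION: find supports -mmin on this host.

# stale_files DIR MINUTES
# Prints every regular file under DIR last modified more than MINUTES
# minutes ago.
stale_files() {
    find "$1" -type f -mmin +"$2"
}

# Directory from the post; the job runs every 3 minutes, so anything
# older than about 6 minutes has missed at least one run.
OUT_DIR="${OUT_DIR:-/tmp/vspexblue/outputs/3}"
MAX_AGE_MIN="${MAX_AGE_MIN:-6}"

if [ -d "$OUT_DIR" ]; then
    stale_files "$OUT_DIR" "$MAX_AGE_MIN"
else
    echo "$OUT_DIR not found"
fi
```

When the script is called by hand, running it as `sh -x /tmp/vspexblue/3.sh` traces each command and may show where it stops before writing any output.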

Additionally, after rebooting Node 1, the vspexblue folder goes missing again, as does the cron job for 3.sh.

Do you have any idea how I can permanently restore the IPMI monitoring on my VxRail hosts?

Thx!
