January 23rd, 2018 19:00

How often does VxRail Manager poll hardware information, and can it be changed?

I noticed that VxRail Manager does not display hardware errors immediately.

I unplugged one power cable from node1. Nothing showed up in VxRail Manager. iDRAC immediately logged a power event and sent an email notification. vSphere Web Client showed a hardware alert and warning after 2 minutes.

I then waited 5 minutes and unplugged a power cable from node2, then waited another 5 minutes and unplugged a power cable from node3.

Finally, 2 minutes after I unplugged the cable from node3, VxRail Manager logged critical events for the power supplies of all 3 nodes. The event times are all the same, rather than the actual times at which I unplugged each cable.

8 Posts

January 25th, 2018 22:00

I think that VxRail Manager acquires hardware information from each node every few minutes using IPMI.

So VxRail Manager does not immediately reflect a failure that occurs on a node.

7 Posts

February 1st, 2018 18:00

The delay seems to be around 10 minutes, and I have seen delays of 20 minutes.

Any idea whether it is possible to change the settings so that it updates the information more frequently?

1 Message

February 7th, 2018 02:00

Hi,

From what I see, VxRail Manager scans at random times (I think there is a threshold for when VxRail asks the hosts about their health status). I also ran several tests with power cables, and I could not work out how often the report goes from the host to VxRail. The vSphere console reacts faster. For my hardware I decided to use SNMP and ask iDRAC every minute what the system health is.

I have another problem (maybe it is not connected with the refresh time, but it may shed some light on the communication between VxRail and the host): many times I get a VxRail critical error (no power supplies or HDDs) while the device is in good condition. There are no warnings in vSphere. When VxRail informs me about such a small disaster, the SNMP sensor also goes offline. It looks like the source of the problem is iDRAC/IPMI. Perhaps your problem has the same source: something wrong not in VxRail but in the host management software. After my tests and problems, I think the most reliable information about host health is currently in the VMware console.

Regards,

7 Posts

February 7th, 2018 07:00

You described exactly what I am seeing:

  • VxRail Manager collects health status at random times
  • VxRail Manager reports critical errors randomly (mostly power supplies, sometimes HDDs)

Nagios has no problem monitoring those hosts. No errors, warnings, etc.

February 7th, 2018 19:00

  • For Quanta models, local scripts are executed every 3 minutes, and the results are sent to VxRail Manager via scp. VxRail Manager also tries to parse the output every 3 minutes (not in sync with the scripts) and updates the DB. So the worst-case scenario is 6 minutes plus execution time.
  • For Dell models, the cycle is almost the same. PTAgent queries iDRAC for H/W info every 3 minutes, and VxRail Manager parses the result every 3 minutes. However, the query to iDRAC takes over 1 minute to finish. So the worst-case scenario is 7+ minutes plus execution time.
  • For Dell models losing power, this is more complicated because we try to contact PTAgent but get no response. There is a complicated retry mechanism that runs until we finally conclude that PTAgent is gone and cannot be brought back by restarting the service. Only then do we regard the host as gone, and this may take up to 10 minutes.

Technically, you can change the frequency of the script (on Quanta) or the PTAgent service (on Dell) in crontab, but I would not recommend it in a customer environment. You cannot change the frequency of VxRail Manager parsing.
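
To make the arithmetic above concrete, here is a minimal Python sketch of the worst-case delay for two unsynchronized polling stages. The function and parameter names are made up for illustration and are not part of VxRail Manager or PTAgent; the 3-minute intervals and the roughly 1-minute iDRAC query time are the figures quoted above, and script execution time is left out.

```python
def worst_case_delay_minutes(collect_interval, parse_interval, query_duration=0.0):
    """Upper bound on fault-to-display delay for two unsynchronized stages.

    Worst case: the fault happens just after a collection run started, so it
    waits a full collect_interval for the next run, the run itself takes
    query_duration, and the result arrives just after a parse pass, so it
    waits a full parse_interval as well.
    """
    return collect_interval + query_duration + parse_interval


# Quanta: scripts every 3 min, parsing every 3 min (execution time ignored).
quanta = worst_case_delay_minutes(collect_interval=3, parse_interval=3)

# Dell: same cycle, plus an iDRAC query that takes over 1 minute.
# Using 1 here gives a lower bound on the worst case, hence "7+".
dell = worst_case_delay_minutes(collect_interval=3, parse_interval=3, query_duration=1)

print(f"Quanta worst case: {quanta} min, Dell worst case: {dell}+ min")
```

This only covers the normal reporting path. It does not model the PTAgent retry mechanism for a host that has lost power, which is where the delay of up to 10 minutes comes from.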
