March 4th, 2013 03:00

Guest level monitoring in vSphere environment

Hi Folks,

I have a vSphere 5.1 environment and monitor all ESXi hosts, vCenter, and the running VMs with EMC Smarts AM-PM and ESM, with alerts directed to the Global Console.

Recently I noticed that alerts were not being reported correctly for some of the changes that happened in the infrastructure, so I would like to check a few things. Could you please help me find answers to these? Where can I find the licensing information for each Manager (ESM, AM-PM, etc.), especially expiry dates and licensed features? And finally, which attributes should I enable to monitor these events?

  1. Changes in VM network
  2. Guest monitoring -- if guest operating system crashed
  3. Network isolation - network is not reachable
  4. Guest Filesystem - no free space left (any thresholds??)
  5. Resource consumption, CPU, Memory etc - high cpu and memory usage (any thresholds??)

Thanks,

/T

March 4th, 2013 06:00

Hello Vtrack,

To see whether any features have expired, check the FLEXlm log, which should be written to the /smarts/local/logs directory of the Smarts install that runs your license server.  It should tell you if any features have expired.  Alternatively, open the license file and look at the expiration date on each feature.  If you purchased a permanent license, you will not see an expiration date on those features; instead you should see a keyword along the lines of PERMANENT.

We use internal codes for the features.  If you find an expired feature and want to know what it does, you will have to ask us so that we can try to get you that information.

To find out exactly which domains are using which licenses, you can run the following command against each domain:
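
If you would rather not read the license file by eye, the check can be sketched in a few lines of Python. This is only a rough sketch, and it assumes the Smarts license file follows the usual FLEXlm layout, where a FEATURE or INCREMENT line carries the feature name in the second field and the expiry (a date such as 1-jan-2014, or the word permanent) in the fifth. The feature names in the usage note are made up for illustration.

```python
from datetime import date, datetime


def parse_feature_expiry(line):
    """Return (feature_name, expiry) for a FEATURE/INCREMENT line, else None.

    expiry is a datetime.date, or None when the license does not expire.
    Assumes the common FLEXlm field order: keyword, name, vendor, version, expiry.
    """
    parts = line.split()
    if len(parts) < 5 or parts[0].upper() not in ("FEATURE", "INCREMENT"):
        return None
    name, exp = parts[1], parts[4]
    year = exp.rsplit("-", 1)[-1]
    # "permanent" (any case) or a year of 0 is treated as non-expiring (assumption).
    if exp.lower() == "permanent" or (year.isdigit() and int(year) == 0):
        return name, None
    return name, datetime.strptime(exp, "%d-%b-%Y").date()


def expired_features(lines, today):
    """List the names of features whose expiry date is before `today`."""
    expired = []
    for line in lines:
        parsed = parse_feature_expiry(line)
        if parsed is None:
            continue
        name, expiry = parsed
        if expiry is not None and expiry < today:
            expired.append(name)
    return expired
```

For example, calling expired_features(open("smarts.lic"), date.today()) would list the internal feature codes that have lapsed; smarts.lic is a placeholder for wherever your license file actually lives.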

./dmctl -s getInstances InCharge_Feature


This will list all the license features that have been requested and granted to the domain.  If you want further details on these license features you can use the following command:


./dmctl -s get InCharge_Feature::Feature-
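
If you have several domains, the getInstances check can be looped over them. The following is a minimal sketch rather than a supported tool: it assumes dmctl is run from the Smarts bin directory, that it can connect without prompting for credentials, and that the domain names you pass in (the ones in the usage note are hypothetical) match what your broker knows.

```python
import subprocess


def dmctl_feature_command(domain, dmctl="./dmctl"):
    """Build the argv for listing InCharge_Feature instances in one domain."""
    return [dmctl, "-s", domain, "getInstances", "InCharge_Feature"]


def list_features(domains, run=subprocess.run):
    """Run the command against each domain and collect its output lines.

    `run` is injectable so the loop can be exercised without a Smarts install.
    """
    results = {}
    for domain in domains:
        proc = run(dmctl_feature_command(domain), capture_output=True, text=True)
        results[domain] = proc.stdout.splitlines()
    return results
```

A call such as list_features(["INCHARGE-SA", "INCHARGE-ESM"]) would return a dictionary keyed by domain name, which makes it easy to compare which license features each domain has been granted.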


To answer your questions:

  1. Changes in VM network

    ESM should be able to monitor the virtual switch and VMkernel ports, if this is what you are asking.  If that doesn't answer your question, you would need to give me some more specifics.

    In terms of virtual machines, the ESM software monitors many aspects of the VMware deployment and alerts on changes such as VMs being added, deleted, or migrated.  You can view a list of these events in the documentation "ESM User and Configuration Guide.pdf" on pages 41 and 42.  I will attach this documentation to the post.

  2. Guest monitoring -- if guest operating system crashed

    The ability to tell if a guest OS has crashed is limited.  If the crash causes the guest OS to reboot and load again, it may trigger a GuestOSNotRunning alert.  If the guest OS is stuck in a zombie state and is completely unresponsive, there really is no way for the system to know that.  If there were an efficient way to detect that the OS was in a zombie state, it would likely have been built into the OS itself so that a crash could be remedied by some sort of automatic tool.  The major exception I can see would be the Blue Screen of Death on Microsoft Windows operating systems, but I still do not believe that either the ESM software or the IP software can detect these conditions.  That would be a bit advanced for the software and would require the ESM software to run some sort of process on the VMs to test for these issues.

    The VMware software may actually be able to test the guest OS for a crash or zombie state, but that is beyond my field of knowledge.  If it can, it may have some sort of auto-recovery tool that would restore or reboot the OS.  If it does have these capabilities, my suspicion is that restarting or restoring the guest OS would cause the ESM software to raise a GuestOSNotRunning alert.

  3. Network isolation - network is not reachable:

    If the network is being monitored by the Smarts IP Availability Manager, you should be able to see when a section of the network is down, provided it is a complete partition.

    If it is not a complete partition but a subset of a partition, one of your options is to create a Service Offering in the SAM console.  Name the Service Offering appropriately for what it is monitoring, then add subgroups to it to define the specifics so that SAM can monitor and alert on them.  The specifics could be a section of the IPNetwork, or they could be specific routers/switches that are the entry points into that section of the network.  To find the necessary interface, log into the SAM console and go to Configure -> Groups in the menu bar.

  4. Guest Filesystem - no free space left (any thresholds??)

    If both Smarts ESM and the Smarts IP Performance Manager software are monitoring the VMs, you will be able to get alerts from both domains on the status of the file system and the space available.  This does depend on the correct SNMP agent being installed.  If you are on ESX 3.5 you need to run two agents, the VMware agent and the NetSNMP agent.  If you are on ESX 4.0 or above you should run only the VMware agent; with both running on ESX 4.0 you may have issues properly monitoring your VMs.  This note about using only one agent for ESX 4.0 can be seen on page 63 of "ESM User and Configuration Guide.pdf", which is part of the documentation portfolio that I will attach to this post.

  5. Resource consumption, CPU, Memory etc - high cpu and memory usage (any thresholds??)

    The answer to this question is exactly the same as the answer to question number four above.  If you are monitoring the VM hosts using the ESM or Smarts IP Performance Manager software, you should get alerts on resource consumption.


Does this answer your questions, Vtrack?  If so, please mark this post as answered.  If you have any follow-up questions, please feel free to ask and we will answer them to the best of our abilities.

Cheers,

Sean

Sean Mackinnon | EMC ASD - Smarts |  Monday - Friday 8:30 - 16:30 EST/EDT | Hopkinton, MA, USA

1 Attachment

March 6th, 2013 04:00

Hi Sean,

This really helps; thank you very much for your answers. A few updates:

For 1: Changes in VM network

What I meant here was issues with the VM/guest network. A network failure occurred on a virtual machine, and I did not see any alerts in the Global Console. I tested this manually by stopping the network from within the guest and found that no alerts were reported. Is that expected? Any network configuration change made from vCenter does trigger an alert, though; maybe that was translated from a vCenter event. By the way, is it possible to change the severity level of an event? For example, ESXpoweroff (Host shutdown) was reported as a minor event, but I expected it to be critical.

For 3: I will test creating a service group; this is definitely useful in the environment.

Thanks again for the user guide; I will refer to it for the requirements.

/T

March 6th, 2013 08:00

Hello Vtrack,

The ESM software should monitor the virtual network running in your VMware deployment.  As for why it did not alert on a network failure for a virtual machine, that depends on what the network failure was.  I am not 100% sure how a network failure can occur on a VM unless there is an issue on the guest OS that causes the network to be unresponsive.

For example, if the VM has a Windows guest OS and the network fails because the Windows services that manage network connections have crashed, then I believe there is a possibility of not alerting on the failure.  If the host was discovered through monitoring the VMware environment and was not discovered in the Smarts IP software, then all the information for monitoring the host would come from VMware and be processed by the ESM software.  If the virtual host's guest OS had an SNMP agent and that host was discovered into the Smarts IP domain, you would have a means of monitoring that host that is practically independent of the VMware monitoring.

If networking then fails at the guest OS level while the host is being monitored by the Smarts IP software, it should raise an alert as soon as the SNMP agent stops responding to the Smarts IP queries or as soon as the host stops responding to the ping requests made by the Smarts IP software.  If the failure is on the VMware networking side, you would get an alert from ESM about a failure or change in the network, and you would get an alert from Smarts IP about the host(s) that are negatively affected, in terms of the IP address being used to monitor those hosts.

Does this answer your question regarding the network failure you saw?  If you can give me more details about the network failure on your VM that was not alerted on, I can see if I can find out why you may not have received an alert.  If it was at the guest OS level and not the actual VM level, then I am pretty sure it would fall under the example I gave above.

To change the severity level of an alert, you will have to log into your SAM domain host, make some configuration changes, and then restart the SAM domain.  The first step is to determine which configuration file you need to modify.  This depends on where the alert is coming from.  In the example you gave, ESXpoweroff (Host shutdown) would be an alert that comes from the ESM domain.  Go through the following steps to determine which configuration file you need to modify.

  1. Log into the Smarts SAM console software.
  2. Click on Configuration -> Global Manager Administration Console
  3. Expand ICS Configuration.
  4. Expand IC Domain Configuration.
  5. Expand Domain types and select the domain type that the alert is coming from (in this example INCHARGE-ESM-SUITE).
  6. In the right pane, look for the value in the drop-down box DXA Configuration file (in this example the value is dxa-esm.conf).

Now that you know which file you need to modify, navigate to the /smarts/bin directory of your SAM install and execute the command ./sm_edit conf/ics/ (in this example the command would be ./sm_edit conf/ics/dxa-esm.conf).  Once you have opened this file for editing (you will need administrative privileges), you will have to navigate through the file and find the severity setting for the alert you want to modify.  Near the bottom of the file you will find the severity levels for the alerts that the Smarts SAM software generates for events reported by the Smarts ESM software.

In the newer 9.1 software and the 9.2 software there is a line that reads:

sev VMwareESX  ESXShutDown Availability 1

This gives a Critical alert when the ESX server is shut down.

In the older software that particular setting doesn't exist.  There, the only values that I can see that could possibly be used are:

sev  VirtualMachine PoweredOff Operational  3

or

sev VMwareESX  ESXNotHosted   Discovery   3

Both of these alerts are Minor because of the value 3 assigned to them.  The event name in the alert should match the third entry in the severity definition line (ESXShutDown, PoweredOff, and ESXNotHosted in the examples above).  Please review your configuration files and compare the name of the alert to these entries to find what you need to adjust.  The file contains a translation table that tells you which number corresponds to which severity.

If you have any other questions, Vtrack, please feel free to ask and we will try to answer them to the best of our abilities.  Thank you very much for your time.
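
To make the comparison easier, the sev lines can be pulled out of the file with a short script.  This is only a sketch based on the three example lines above: it assumes every severity definition has five whitespace-separated entries (the sev keyword, a class, the event name, what looks like a category, and the numeric severity), and the field names it uses are my own labels rather than anything taken from the file.  Only the two severity numbers mentioned in this thread are mapped to names; check the translation table in your own file for the rest.

```python
# Only the values confirmed in this thread; see the file's translation table for others.
SEVERITY_NAMES = {1: "Critical", 3: "Minor"}


def parse_sev_line(line):
    """Parse one 'sev <class> <event> <category> <level>' line, else return None."""
    parts = line.split()
    if len(parts) != 5 or parts[0] != "sev" or not parts[4].isdigit():
        return None
    _, cls, event, category, level = parts
    return {"class": cls, "event": event, "category": category, "severity": int(level)}


def find_event(lines, event_name):
    """Return every parsed sev entry whose event name matches exactly."""
    entries = (parse_sev_line(line) for line in lines)
    return [e for e in entries if e is not None and e["event"] == event_name]


def describe(entry):
    """Render an entry with a severity name where one is known."""
    name = SEVERITY_NAMES.get(entry["severity"], "see translation table")
    return f'{entry["class"]} {entry["event"]}: {entry["severity"]} ({name})'
```

Running find_event over the lines of dxa-esm.conf with the event name shown in your alert tells you whether a matching sev line exists and which number it currently carries, before you open the file with sm_edit to change it.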

Cheers,

Sean

Sean Mackinnon | EMC ASD - Smarts |  Monday - Friday 8:30 - 16:30 EST/EDT | Hopkinton, MA, USA
