Metro Node: How to collect logs from the Metro Node

Summary: This article describes how to collect logs from a Metro Node and which additional logs and data may be needed for a performance issue.

Instructions

This article answers the following questions:

What logs are required to debug Metro Node problems?
How do I capture collect-diagnostics on a Metro Node cluster?
How to validate the existing collect-diagnostics packages on the director/node.
How to abort and clean up an ongoing collect-diagnostics in Metro Node?

Note: If SupportAssist logs are needed, refer the customer to KBA 000135669, "How to export a SupportAssist Log Collection from SupportAssist Enterprise? Connected or Disconnected."

A. What logs are required to debug Metro Node problems?

The command used to collect logs from the Metro Node is "collect-diagnostics". It can be run from any[1] node in the Metro Node setup. Running it on one director of a Metro Node cluster should collect the data from all directors on all nodes of that cluster. DO NOT run this command on more than one node at a time.

[1] NOTE: Run the 'collect-diagnostics' command from only one director, and in a Metro configuration on only one cluster at a time. Wait for it to complete fully before gathering collect-diagnostics from another director or from the peer cluster, if needed.

The 'collect-diagnostics' command produces a compressed tar.gz file containing configuration and log files. The file is placed in the /diag/collect-diagnostics-out/ directory on the node the command was run from. Once the command finishes, use WinSCP, or an equivalent SCP utility, to copy the file off the node so it can be provided to support for analysis. Section B below describes the use of this command in more detail.

Notes:

If the 'collect-diagnostics' command is run with no options, two files are generated: a base file and an extended file. This can take a long time on scaled systems.
Metro Node support generally requires only the base file. In some circumstances, such as performance issues, they may ask for the extended file as well.
Support may ask you to use the following standard options when running collect-diagnostics:

"--noextended": omits collection of extended diagnostics.

"--last-logs": captures only the logs from the last x hours or days.

For more details on the command, type "collect-diagnostics -h".

These are samples of the two filenames. YYYY-MM-DD-HH.MM.SS is the date and time at which the files were collected:

Base file - <Serial number>-c1-diag-YYYY-MM-DD-HH.MM.SS.tar.gz
Extended file - <Serial number>-c1-diag-ext-YYYY-MM-DD-HH.MM.SS.tar.gz
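When several packages are present, the naming pattern above can be used to tell the base and extended files apart. The following is a minimal sketch, assuming only the two filename shapes shown above; the function name classify_cd_package is hypothetical and is not part of the Metro Node tooling.

```shell
# Hypothetical helper: classify a collect-diagnostics package by its filename.
# Assumes only the two naming patterns shown in this article.
classify_cd_package() {
    case "$(basename "$1")" in
        # Test the more specific "-diag-ext-" pattern first,
        # because the base pattern would also match it.
        *-diag-ext-*.tar.gz) echo "extended" ;;
        *-diag-*.tar.gz)     echo "base" ;;
        *)                   echo "unknown" ;;
    esac
}
```

For example, calling the function with a path ending in -c1-diag-ext-YYYY-MM-DD-HH.MM.SS.tar.gz prints "extended".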

Performance issues are complex and require a lot of specific information to be gathered. As a result, we have a performance questionnaire which customers are requested to fill out to expedite this process. The questionnaire can be found attached to this knowledge base article in the attachment section at the end.

For some types of performance issues, it is helpful to capture an additional set of logs called "fe_perf_stats". These logs are generated continuously but are not captured by collect-diagnostics. To capture them, cd (change directory) to /var/log/VPlex/cli on a node from each cluster and run the command "tar cvzf fe-perf-stats.tar.gz fe_perf_stats*" to compress the files into a tar file. Connect to the node with WinSCP, or an equivalent SCP utility, navigate to /var/log/VPlex/cli, and copy the "fe-perf-stats.tar.gz" file to your system. Upload the tar file, along with the collect-diagnostics file(s) if requested by support, to the SR or to the FTP link that support provides in the SR and by e-mail.
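The fe_perf_stats capture described above could be wrapped as follows. This is a sketch, not a supported script: the function name pack_fe_perf_stats is hypothetical, and the listing step at the end is an added sanity check that the article does not require. The tar invocation matches the one given above, minus the verbose flag.

```shell
# Hypothetical wrapper around the tar command given in this article.
# $1: log directory; defaults to the path named in this article.
# The body runs in a subshell, so the caller's working directory is unchanged.
pack_fe_perf_stats() (
    log_dir="${1:-/var/log/VPlex/cli}"
    cd "$log_dir" || exit 1
    # Stop early if there is nothing to archive.
    ls fe_perf_stats* >/dev/null 2>&1 || {
        echo "no fe_perf_stats files in $log_dir" >&2
        exit 1
    }
    # Same archive name and file pattern as in the article.
    tar czf fe-perf-stats.tar.gz fe_perf_stats* || exit 1
    # List the archive contents to confirm it is readable before copying it off the node.
    tar tzf fe-perf-stats.tar.gz
)
```

Because the archive name uses hyphens and the log files use underscores, re-running the function does not pull an earlier archive into the new one.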

In addition to collect-diagnostics, it may be helpful to capture the following information:

Open logging for a PuTTY session.
Run the commands below.
Save the PuTTY log and download it to your system.
Attach the PuTTY log, the collect-diagnostics file(s), and any other requested data to the SR.

The following commands are to be run from the VPlexcli prompt.

cluster status
ll clusters/**/storage-views/* --full
ll ~ports
show-use-hierarchy /clusters/**/virtual-volumes/*
ll ~system-volumes
ls -t /clusters/*/directors/*::serial-number [this command will list out all the DSTs for each node]
ls -t /clusters/**/director-*/::hostname [the hostnames displayed will be the IP Addresses, this is expected]

B. How do I capture collect-diagnostics on a Metro Node cluster?

Note: The base file, covering the last 30 days, is sufficient to investigate and resolve most issues. To capture this data, run the collect-diagnostics command with the flags "--noextended" and "--last-logs 30d". Use these options unless instructed otherwise by support.

Establish an SSH session to a director node and log in at the Linux prompt (for example, service@director-1-1-a), then log in to the vplexcli.

Sample output:

login as: service 
Keyboard-interactive authentication prompts from server: 
| Password: 
End of keyboard-interactive prompts from server 
Last login: <date and timestamp data> from x.x.x.x
service@director-1-1-a:~> 
service@director-1-1-a:~> vplexcli 
Trying ::1... 
Connected to localhost. 
Escape character is '^]'. 
 
VPlexcli:/>

To start the collect-diagnostics, run the "collect-diagnostics" command from the vplexcli prompt with the options directed by support, as shown in the example below.

Example Output:

VPlexcli:/> collect-diagnostics --noextended --last-logs 30d 

('WARNING:The collect-diagnostics command was issued with option --noextended.\n',) 

The following file(s) will NOT be collected: 

        core files 
        fast trace dump files 
        slow trace dump files 
        udcom trace dump files 
        udcom legacy trace files 
        user-defined performance sink files 
        the management console's heap 

('WARNING:Only the logs that are generated in the last 30 days are collected.') 

2024-02-09 19:55:12 UTC: ****Initializing collect-diagnostics... 
2024-02-09 19:55:13 UTC: No cluster-witness server found. 
2024-02-09 19:55:13 UTC: Free space = 88G 
2024-02-09 19:55:13 UTC: Total space needed = 1907M 

================================================================================ 

Starting collect-diagnostics, this operation might take a while... 

================================================================================ 

Executing cluster collection ..
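The "Free space" and "Total space needed" lines in the output above are worth checking before leaving a long collection unattended. The following sketch compares the two values from a saved copy of that output. It assumes sizes are whole numbers with a single M or G suffix, as in the sample, and that 1G equals 1024M; both function names are hypothetical.

```shell
# Convert a size such as "88G" or "1907M" to whole megabytes.
# Assumption: only the M and G suffixes seen in the sample output occur.
to_megabytes() {
    num="${1%[MG]}"
    case "$1" in
        *G) echo $((num * 1024)) ;;
        *M) echo "$num" ;;
        *)  return 1 ;;
    esac
}

# Read collect-diagnostics output on stdin and report whether
# the free space covers the space the collection says it needs.
check_cd_space() {
    log=$(cat)
    free=$(printf '%s\n' "$log" | sed -n 's/.*Free space = \([0-9]*[MG]\).*/\1/p' | head -n 1)
    need=$(printf '%s\n' "$log" | sed -n 's/.*Total space needed = \([0-9]*[MG]\).*/\1/p' | head -n 1)
    [ -n "$free" ] && [ -n "$need" ] || { echo "space lines not found" >&2; return 2; }
    if [ "$(to_megabytes "$free")" -ge "$(to_megabytes "$need")" ]; then
        echo "ok: $free free, $need needed"
    else
        echo "insufficient: $free free, $need needed"
        return 1
    fi
}
```

Fed the sample output above, check_cd_space prints "ok: 88G free, 1907M needed".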

C. How to validate the existing collect-diagnostics packages on the director/node.

When the collect-diagnostics command finishes and returns to the vplexcli prompt, connect to the director you ran the command from using WinSCP (or an equivalent SCP utility) and navigate to the folder /diag/collect-diagnostics-out/.

Identify the log file(s) with the correct timestamp and download them to your local workstation.
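Before downloading, it can be useful to confirm that each package is an intact gzip archive. The sketch below assumes the director's Linux shell provides the standard gzip utility; the function name validate_cd_packages is hypothetical, and the article itself only asks you to identify the files by timestamp.

```shell
# Hypothetical helper: test every package in the output directory.
# $1: output directory; defaults to the path named in this article.
validate_cd_packages() {
    out_dir="${1:-/diag/collect-diagnostics-out}"
    status=0
    for f in "$out_dir"/*.tar.gz; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -e "$f" ] || { echo "no packages found in $out_dir" >&2; return 1; }
        # gzip -t checks the compressed stream without extracting anything.
        if gzip -t "$f" 2>/dev/null; then
            echo "OK: $(basename "$f")"
        else
            echo "CORRUPT: $(basename "$f")"
            status=1
        fi
    done
    return "$status"
}
```

A package reported as CORRUPT may have been truncated, for example by an aborted collection, and is probably better re-collected than uploaded.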

D. How to abort and clean up an ongoing collect-diagnostics in Metro Node?

Note: This is a non-disruptive activity. There is no direct command to abort the collection process, so you have to restart the management console. Before aborting a running collect-diagnostics, contact support and explain why you want to abort it, to confirm that it is OK to do so, as data could be lost. Lost data will not be available for collection when collect-diagnostics is re-run after the abort.

If you are still in the PuTTY session where you started the collect-diagnostics, you should see its output streaming, showing that it is still running.

Sample Output:

VPlexcli:/> collect-diagnostics --noextended --last-logs 30d 

('WARNING:The collect-diagnostics command was issued with option --noextended.\n',) 

The following file(s) will NOT be collected: 

        core files 
        fast trace dump files 
        slow trace dump files 
        udcom trace dump files 
        udcom legacy trace files 
        user-defined performance sink files 
        the management console's heap 

('WARNING:Only the logs that are generated in the last 30 days are collected.') 

2022-02-09 19:55:12 UTC: ****Initializing collect-diagnostics... 
2022-02-09 19:55:13 UTC: No cluster-witness server found. 
2022-02-09 19:55:13 UTC: Free space = 88G 
2022-02-09 19:55:13 UTC: Total space needed = 1907M 

================================================================================ 

Starting collect-diagnostics, this operation might take a while... 

================================================================================ 

Executing cluster collection ..

Open a duplicate PuTTY session and log in to the director where you started the collect-diagnostics, using the service account.

Sample Output:

login as: service 
Using keyboard-interactive authentication. 
Password: 
Last login: <date and time stamp data> from x.x.x.x 
service@director-1-1-b:~>

Once on the director, restart the management console using the following command to abort the running collect-diagnostics.

Sample Output:

service@director-1-1-b:~> sudo systemctl restart VPlexManagementConsole.service

Back in the first PuTTY session, where the collect-diagnostics was running when you restarted the management console, the last line of output should be:

"Connection closed by foreign host."

Sample output (check the last line of the output):

VPlexcli:/> collect-diagnostics --noextended --last-logs 30d 

('WARNING:The collect-diagnostics command was issued with option --noextended.\n',) 

The following file(s) will NOT be collected: 

        core files 
        fast trace dump files 
        slow trace dump files 
        udcom trace dump files 
        udcom legacy trace files 
        user-defined performance sink files 
        the management console's heap 

('WARNING:Only the logs that are generated in the last 30 days are collected.') 

2022-02-09 20:02:03 UTC: ****Initializing collect-diagnostics... 
2022-02-09 20:02:04 UTC: No cluster-witness server found. 
2022-02-09 20:02:04 UTC: Free space = 88G 
2022-02-09 20:02:04 UTC: Total space needed = 1907M 

================================================================================ 

Starting collect-diagnostics, this operation might take a while... 

================================================================================ 

Executing cluster collection ..                               ERROR 
Executing SMS log collection ..                               Connection closed by foreign host. <<<
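If the first session was logged to a file, the abort can also be confirmed from that log. This is a small sketch: the function name cd_was_aborted is hypothetical, and it relies only on the closing message shown in the sample output above.

```shell
# Hypothetical check: is the abort message on the last non-blank line of a saved session log?
# $1: path to the saved PuTTY log of the first session.
cd_was_aborted() {
    # Drop blank lines (including lines holding only a carriage return)
    # so that tail sees the last line with real content.
    grep -v '^[[:space:]]*$' "$1" | tail -n 1 |
        grep -q 'Connection closed by foreign host\.'
}
```

The function prints nothing and returns success only when the abort message is on the last non-blank line of the log.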

Once the collect-diagnostics has stopped, as shown in the previous step, go back to the second PuTTY session, 'cd' to the /diag directory, and run 'll'. You should see some extra directories:

collect-diagnostics-tmp
collect-diagnostics-jobs
collect-diagnostics-tmp-ext*

* Present only if extended files were not omitted.

Sample output:

service@director-1-1-b:/diag> ll 
total 32 
drwxr-xr-x 2 service groupSvc  4096 Feb  9 20:03 collect-diagnostics-tmp-ext
drwxr-xr-x 2 service groupSvc  4096 Feb  9 20:03 collect-diagnostics-jobs 
drwxr-xr-x 2 service groupSvc  4096 Feb  9 20:04 collect-diagnostics-out 
drwxr-xr-x 3 service groupSvc  4096 Feb  9 20:02 collect-diagnostics-tmp 
drwx------ 2 root    root     16384 Jan 27 16:54 lost+found 
drwx--x--x 3 service groupSvc  4096 Dec 17 03:08 share 
service@director-1-1-b:/diag>

If you look inside each of these directories, you will see files with the date and time at which you started the now-cancelled collect-diagnostics. These files take up space in the /diag partition and should be removed.

To remove the directories from /diag, type "rm -r collect-diagnostics-jobs" and "rm -r collect-diagnostics-tmp", then enter 'll' again to confirm that they have been removed.

Sample output:

service@director-1-1-b:/diag> rm -r collect-diagnostics-jobs 
service@director-1-1-b:/diag> rm -r collect-diagnostics-tmp 

service@director-1-1-b:/diag> ll 
total 24 
drwxr-xr-x 2 service groupSvc  4096 Feb  9 20:04 collect-diagnostics-out 
drwx------ 2 root    root     16384 Jan 27 16:54 lost+found 
drwx--x--x 3 service groupSvc  4096 Dec 17 03:08 share 
service@director-1-1-b:/diag>

If a 'collect-diagnostics-tmp-ext' directory exists, remove it by running "rm -r collect-diagnostics-tmp-ext".

Note: The extended file is typically used to investigate node crashes. If there is an ongoing investigation into a node crash and support has not captured all necessary logs, check with support before cleaning up the collect-diagnostics-tmp-ext directory as doing so may delete necessary core files.
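The cleanup steps above can be sketched as one function. This is an illustration under stated assumptions, not a supported tool: the function name cleanup_aborted_cd and its "keep-ext" argument are hypothetical. The directory names are the ones listed in this article, and collect-diagnostics-out is deliberately left alone because it holds completed packages.

```shell
# Hypothetical cleanup helper for the leftovers of an aborted collect-diagnostics.
# $1: diag root (defaults to /diag)
# $2: pass "keep-ext" to leave collect-diagnostics-tmp-ext in place,
#     for example while support is still investigating a node crash.
cleanup_aborted_cd() {
    diag_root="${1:-/diag}"
    for name in collect-diagnostics-jobs collect-diagnostics-tmp collect-diagnostics-tmp-ext; do
        # Skip directories that do not exist.
        [ -d "$diag_root/$name" ] || continue
        if [ "$name" = "collect-diagnostics-tmp-ext" ] && [ "${2:-}" = "keep-ext" ]; then
            echo "kept $name"
            continue
        fi
        rm -r "$diag_root/$name" && echo "removed $name"
    done
}
```

Running it with "keep-ext" first and again without it once support confirms follows the order of the note above.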

Affected Products

metro node mn-114, metro node mn-215

Article Number: 000197436

Article Type: How To

Last Modified: 03 Apr 2024

Version: 6
