Steps on how to accomplish the following tasks:
Note: If SupportAssist logs are needed, refer the customer to KBA 000135669, "How to export a SupportAssist Log Collection from SupportAssist Enterprise? Connected or Disconnected"
A. What logs are required to debug Metro Node problems?
Notes:
These are samples of what the two filenames look like. The date and time, shown as YYYY-MM-DD-HH.MM.SS, reflect when the files were collected:
B. How do I capture collect-diagnostics on a Metro Node cluster?
Note: The base file, covering the last 30 days, is sufficient to investigate and resolve most issues. These options should be used unless instructed otherwise by support.
To capture this data, run the collect-diagnostics command with the flags "--noextended" and "--last-logs 30d".
Establish an SSH session to a director node to reach the Linux prompt (for example, service@director-1-1-a), then log in to the vplexcli.
Sample output:
login as: service
Keyboard-interactive authentication prompts from server:
| Password:
End of keyboard-interactive prompts from server
Last login: <date and timestamp data> from x.x.x.x
service@director-1-1-a:~>
service@director-1-1-a:~> vplexcli
Trying ::1...
Connected to localhost.
Escape character is '^]'.
VPlexcli:/>
Example Output:
VPlexcli:/> collect-diagnostics --noextended --last-logs 30d
('WARNING:The collect-diagnostics command was issued with option --noextended.\n',)
The following file(s) will NOT be collected:
core files
fast trace dump files
slow trace dump files
udcom trace dump files
udcom legacy trace files
user-defined performance sink files
the management console's heap
('WARNING:Only the logs that are generated in the last 30 days are collected.')
2024-02-09 19:55:12 UTC: ****Initializing collect-diagnostics...
2024-02-09 19:55:13 UTC: No cluster-witness server found.
2024-02-09 19:55:13 UTC: Free space = 88G
2024-02-09 19:55:13 UTC: Total space needed = 1907M
================================================================================
Starting collect-diagnostics, this operation might take a while...
================================================================================
Executing cluster collection ..
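The "Free space" and "Total space needed" lines in the output above can be compared before letting a long collection run. The following is a minimal sketch only, not part of any Metro Node tooling: the function names are hypothetical, and it assumes the G and M suffixes are binary multiples, which the output itself does not confirm.

```python
import re

# Assumption: suffixes are binary multiples (1G = 1024**3 bytes).
UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(text):
    """Convert a size such as '88G' or '1907M' to a byte count."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * UNIT_BYTES[match.group(2)])


def has_enough_space(output):
    """Return True/False for the space check, or None if the lines are absent."""
    free = re.search(r"Free space = (\S+)", output)
    needed = re.search(r"Total space needed = (\S+)", output)
    if free is None or needed is None:
        return None
    return size_to_bytes(free.group(1)) >= size_to_bytes(needed.group(1))


# Lines copied from the sample output in this article.
sample = (
    "2024-02-09 19:55:13 UTC: Free space = 88G\n"
    "2024-02-09 19:55:13 UTC: Total space needed = 1907M\n"
)
print(has_enough_space(sample))
```

A result of None means the two lines were not found in the text supplied, so no conclusion should be drawn from it.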
C. How to validate the existing collect-diagnostics packages on the director/node.
When the collect-diagnostics command finishes and returns to the vplexcli prompt, connect to the director where you ran the command using WinSCP (or an equivalent SCP utility) and navigate to the folder /diag/collect-diagnostics-out/
Identify the log file(s) with the correct timestamp and download them to your local workstation.
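Picking out the files with the correct timestamp can be sketched as follows. This is an illustration under stated assumptions: it assumes each filename contains a YYYY-MM-DD-HH.MM.SS stamp as described in section A, the filenames in the example are hypothetical placeholders rather than real package names, and the helper names are invented for this sketch.

```python
import re
from datetime import datetime

# Matches a YYYY-MM-DD-HH.MM.SS stamp anywhere in a filename.
STAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}")


def collection_time(filename):
    """Return the date and time embedded in a filename, or None if absent."""
    match = STAMP_RE.search(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d-%H.%M.%S")


def files_collected_since(filenames, started):
    """Keep files stamped at or after the time the command was started."""
    selected = []
    for name in filenames:
        stamp = collection_time(name)
        if stamp is not None and stamp >= started:
            selected.append(name)
    return sorted(selected, key=collection_time)


# Hypothetical listing of /diag/collect-diagnostics-out/, for illustration only.
listing = [
    "example-diag-2024-01-03-08.15.00.tar.gz",
    "example-diag-2024-02-09-19.55.12.tar.gz",
    "notes.txt",
]
started = datetime(2024, 2, 9, 19, 55, 0)
print(files_collected_since(listing, started))
```

Comparing against the time the command was started, rather than looking for an exact match, avoids depending on whether the stamp records the start or the end of the collection.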
D. How to abort an ongoing collect-diagnostics
Note: This is a non-disruptive activity. There are no direct commands to abort the collection process, so the management console must be restarted. Before aborting a running collect-diagnostics, contact support to explain why you want to abort it and to confirm that it is OK to do so, because some data could be lost. Lost data will not be available for collection when collect-diagnostics is re-run after the abort.
Sample Output:
VPlexcli:/> collect-diagnostics --noextended --last-logs 30d
('WARNING:The collect-diagnostics command was issued with option --noextended.\n',)
The following file(s) will NOT be collected:
core files
fast trace dump files
slow trace dump files
udcom trace dump files
udcom legacy trace files
user-defined performance sink files
the management console's heap
('WARNING:Only the logs that are generated in the last 30 days are collected.')
2022-02-09 19:55:12 UTC: ****Initializing collect-diagnostics...
2022-02-09 19:55:13 UTC: No cluster-witness server found.
2022-02-09 19:55:13 UTC: Free space = 88G
2022-02-09 19:55:13 UTC: Total space needed = 1907M
================================================================================
Starting collect-diagnostics, this operation might take a while...
================================================================================
Executing cluster collection ..
Sample Output:
login as: service
Using keyboard-interactive authentication.
Password:
Last login: <date and time stamp data> from x.x.x.x
service@director-1-1-b:~>
Sample Output:
service@director-1-1-b:~> sudo systemctl restart VPlexManagementConsole.service
"Connection closed by foreign host."
Sample output (check the last line of the output):
VPlexcli:/> collect-diagnostics --noextended --last-logs 30d
('WARNING:The collect-diagnostics command was issued with option --noextended.\n',)
The following file(s) will NOT be collected:
core files
fast trace dump files
slow trace dump files
udcom trace dump files
udcom legacy trace files
user-defined performance sink files
the management console's heap
('WARNING:Only the logs that are generated in the last 30 days are collected.')
2022-02-09 20:02:03 UTC: ****Initializing collect-diagnostics...
2022-02-09 20:02:04 UTC: No cluster-witness server found.
2022-02-09 20:02:04 UTC: Free space = 88G
2022-02-09 20:02:04 UTC: Total space needed = 1907M
================================================================================
Starting collect-diagnostics, this operation might take a while...
================================================================================
Executing cluster collection .. ERROR Executing SMS log collection ..
Connection closed by foreign host. <<<
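When the session output has been saved to a text file, the check on the last line can be sketched as below. This is a hedged illustration, not a supported tool: the function names are hypothetical, and the only string it relies on is the "Connection closed by foreign host." message shown in the sample output.

```python
# Message shown on the last line of the sample output after the restart.
ABORT_MARKER = "Connection closed by foreign host."


def last_nonblank_line(output):
    """Return the final non-empty line of captured terminal output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def collection_was_aborted(output):
    """True if the captured output ends with the connection-closed message."""
    return ABORT_MARKER in last_nonblank_line(output)


# Tail of the sample output from this article, including the "<<<" annotation.
aborted_tail = (
    "Executing cluster collection .. ERROR Executing SMS log collection ..\n"
    "Connection closed by foreign host. <<<\n"
)
# Tail of a collection that is still running.
running_tail = "Executing cluster collection ..\n"

print(collection_was_aborted(aborted_tail))
print(collection_was_aborted(running_tail))
```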
*if extended files were not omitted
Sample output:
service@director-1-1-b:/diag> ll
total 32
drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:03 collect-diagnostics-tmp-ext
drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:03 collect-diagnostics-jobs
drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:04 collect-diagnostics-out
drwxr-xr-x 3 service groupSvc 4096 Feb 9 20:02 collect-diagnostics-tmp
drwx------ 2 root root 16384 Jan 27 16:54 lost+found
drwx--x--x 3 service groupSvc 4096 Dec 17 03:08 share
service@director-1-1-b:/diag>
Sample output:
service@director-1-1-b:/diag> rm -r collect-diagnostics-jobs
service@director-1-1-b:/diag> rm -r collect-diagnostics-tmp
service@director-1-1-b:/diag> ll
total 24
drwxr-xr-x 2 service groupSvc 4096 Feb 9 20:04 collect-diagnostics-out
drwx------ 2 root root 16384 Jan 27 16:54 lost+found
drwx--x--x 3 service groupSvc 4096 Dec 17 03:08 share
service@director-1-1-b:/diag>
Note: The extended file is typically used to investigate node crashes. If there is an ongoing investigation into a node crash and support has not captured all necessary logs, check with support before cleaning up the collect-diagnostics-tmp-ext directory as doing so may delete necessary core files.
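The cleanup decision described in this section can be sketched as a planning step that deletes nothing. This is an illustration only: the directory names come from the sample output in this article, while the function and its support_approved_ext_cleanup parameter are hypothetical and stand in for the check with support described in the note above.

```python
# Directories removed with "rm -r" in the sample output of this article.
LEFTOVER_DIRS = ("collect-diagnostics-jobs", "collect-diagnostics-tmp")
# May hold core files; only plan its removal once support has agreed.
EXTENDED_DIR = "collect-diagnostics-tmp-ext"


def cleanup_plan(entries, support_approved_ext_cleanup=False):
    """Return the leftover directories to remove; this function deletes nothing."""
    plan = [name for name in LEFTOVER_DIRS if name in entries]
    if support_approved_ext_cleanup and EXTENDED_DIR in entries:
        plan.append(EXTENDED_DIR)
    return plan


# Directory names as listed under /diag in the sample output.
diag_entries = [
    "collect-diagnostics-tmp-ext",
    "collect-diagnostics-jobs",
    "collect-diagnostics-out",
    "collect-diagnostics-tmp",
    "lost+found",
    "share",
]
print(cleanup_plan(diag_entries))
print(cleanup_plan(diag_entries, support_approved_ext_cleanup=True))
```

Keeping the extended directory out of the plan by default mirrors the note above: a directory that may contain core files is only listed for removal after an explicit confirmation.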