
NetWorker Troubleshooting Guide: Process Crashes and Core Dumps

Summary: Dell NetWorker Comprehensive Guide to Troubleshooting Process Crashes and Core Dumps


Symptoms

NetWorker Troubleshooting Guide: Process Crashes and Core Dumps

Video: Dell NetWorker Comprehensive Guide to Troubleshooting Process Crashes and Core Dumps


Cause

There are many different reasons why a NetWorker process may crash. This article sets out the recommended method to isolate and resolve a NetWorker process crash.

Resolution

Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document in order to eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Do not skip a step.

Step 1: Gathering Information - Problem Description
In order to generate a complete problem description, address the following questions:
    - Under what circumstances does the process crash? Is this behavior consistent?
    - Did this work better before?
    - At what times does the issue occur, and is there a trend in the observed behavior?
    - Does the issue happen only at times of heavy load on the backup environment, or only during particular backups or a particular type of backup group?
    - When did the issue first occur? What changed then?
    - What is the scope of the issue (all clients or some clients, all backup targets or some)?
    - What has been tried so far to fix it, and what conclusions have been drawn from this?

Step 2: Gathering Information - Environment
     - Which NetWorker process is crashing, and on which machine (Server, Storage Node or Client)?
     - NetWorker server version and platform
     - Overview of the size and nature of the backup datazone
     - Target media for these backups

Step 3: Supportability
      - Using the online NetWorker Compatibility Guide, check that all components (NetWorker server, file system version, proxy, storage nodes, clients, target) are supported.
      - Check that there is no underlying Operating System or hardware deficiency that would account for the process crashes (disk failures, disk full, network errors and so forth).

Step 4: Best Practices
      The NetWorker Performance Optimization Planning Guide contains software and hardware requirements and recommendations that should be implemented in order to have an optimally tuned NetWorker environment. Review it to confirm that the best practices are being followed for this datazone. This is particularly relevant if the process crashes happen at times of heaviest load.

Step 5: Component Isolation
       How to find the root cause of a process crash depends on the behavior as defined in Step 1. If the trigger is unknown, tests can be carried out to try to establish what is triggering the crash:

    - Monitor system performance under heavy load
    - Examine the Operating System log files around the time of the crashes for commonality in behavior
    - Read the NetWorker schedule to determine if there is a correlation between the times of occurrence and a particular NetWorker scheduled activity.
    - Find out what non-NetWorker operations run on this machine that could affect its behavior and whether their schedule correlates with the times of crashes.
    - If the crash occurs consistently, change some parameters to try to narrow down the cause. For example, back up to a different target media or back up different types of data from the same NetWorker client.
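
The schedule correlation check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 30-minute window and the sample timestamps are hypothetical, and the crash times would in practice be taken from the Operating System logs and the NetWorker daemon log.

```python
from datetime import datetime, timedelta


def crashes_near_schedule(crash_times, schedule_starts, window_minutes=30):
    """Map each scheduled start time to the crashes seen within the window after it."""
    window = timedelta(minutes=window_minutes)
    hits = {}
    for start in schedule_starts:
        hits[start] = [c for c in crash_times if start <= c <= start + window]
    return hits


# Hypothetical sample data: three crashes and a nightly 22:00 scheduled activity.
crashes = [
    datetime(2024, 9, 1, 22, 12),
    datetime(2024, 9, 2, 22, 7),
    datetime(2024, 9, 3, 3, 40),
]
schedule = [
    datetime(2024, 9, 1, 22, 0),
    datetime(2024, 9, 2, 22, 0),
    datetime(2024, 9, 3, 22, 0),
]

for start, found in crashes_near_schedule(crashes, schedule).items():
    print(start.isoformat(), "crashes in window:", len(found))
```

If most crashes fall inside the window of one scheduled activity, that activity is a reasonable first candidate for the parameter changes described above. The same function can be reused with the schedule of non-NetWorker operations on the machine.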

Step 6: Resolution
A core dump is a special file which represents a dump of the working memory of a process at a specific time, usually when the program has terminated abnormally. Core dump files can be used to diagnose the reason for a crashing process, by analyzing what functions of the process were running at the time of the crash and what data was being accessed.

Most Operating Systems do not generate core dump files automatically.  The Operating System parameters must be modified so that a core dump file will be generated at the time of a process crash.  This modification must be done before the crash.

1) On UNIX or Linux, check the /nsr/cores directory for recent core dumps of NetWorker processes. On Windows, check the crash dump directory as defined in the Windows registry (see item 2 below).
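
As a rough sketch of the UNIX and Linux check, the following Python function lists files under a directory that were modified recently. The function name and the seven-day cutoff are hypothetical choices, and the recursive walk assumes that core files may sit in subdirectories of /nsr/cores.

```python
import os
import time


def recent_files(directory, max_age_days=7, now=None):
    """Return (path, mtime) pairs for files modified within max_age_days, newest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    # os.walk yields nothing if the directory does not exist.
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                found.append((path, mtime))
    return sorted(found, key=lambda item: item[1], reverse=True)


for path, mtime in recent_files("/nsr/cores"):
    print(time.ctime(mtime), path)
```

An empty result means either that no process has crashed recently or that the Operating System is not configured to write core files, which is what the next item checks.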

2) If there are none, check that the Operating System is set up to generate core dump files when a process crashes. See the Operating System documentation for full details, but in brief, this will likely involve changing the ulimit -c and -f values on UNIX or Linux and making a registry change on Windows.
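
On UNIX or Linux, the limits that ulimit -c and ulimit -f control can also be read from Python through the standard resource module. The sketch below is only illustrative: the helper names are hypothetical, and it reports the limits of the Python process itself, which are inherited from the shell that started it. The NetWorker daemons may run with different limits if they are started from a different environment.

```python
import resource


def describe_limit(value):
    """Render a limit value from getrlimit as text."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def core_dump_limits():
    """Return the (soft, hard) core size and file size limits of this process."""
    limits = {}
    for name, res in (("core", resource.RLIMIT_CORE), ("fsize", resource.RLIMIT_FSIZE)):
        soft, hard = resource.getrlimit(res)
        limits[name] = (describe_limit(soft), describe_limit(hard))
    return limits


def core_dumps_disabled(soft_core_limit):
    """A soft core limit of 0 prevents core files from being written."""
    return soft_core_limit == 0


for name, (soft, hard) in core_dump_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

A soft core limit of 0 is the common reason why no core file appears after a crash. A small file size limit can also truncate a core file, which is why both values are worth checking.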

For Windows Server 2008 R2:
- Update the registry with the new key described at http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx.
- With the recommended values, the dump file is created in C:\Users\Administrator\AppData\Local\CrashDumps.
- Enable full crash dumps.
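
As an illustration of the registry change, the sketch below builds reg.exe command lines rather than editing the registry directly. It assumes that the key described on the linked Microsoft page is the Windows Error Reporting LocalDumps key, with a DumpType value of 2 for full dumps and a DumpCount value limiting the number of retained dumps. Verify the key and value names against that page before running anything. The function name is hypothetical.

```python
# Assumption: key and value names are the Windows Error Reporting LocalDumps
# settings; confirm them against the Microsoft documentation before use.
LOCAL_DUMPS_KEY = r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def local_dumps_commands(dump_type=2, dump_count=10):
    """Build reg.exe command lines that set DumpType and DumpCount."""
    settings = (("DumpType", dump_type), ("DumpCount", dump_count))
    return [
        f'reg add "{LOCAL_DUMPS_KEY}" /v {name} /t REG_DWORD /d {value} /f'
        for name, value in settings
    ]


for command in local_dumps_commands():
    print(command)
```

Printing the commands instead of executing them allows them to be reviewed first and then run from an elevated command prompt on the affected Windows host.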

3) The core file can be examined on the host machine itself or can be packaged for analysis on another machine.  Details of how to package these core files are available here:

UNIX and Linux core file packaging:
489272: How to collect core/crash dump information and related logs

Windows dump file collection:
198564: How to collect the kernel and user dump for hung process(s) on Windows             

4) Analyze the available data:

- Operating System log files
- NetWorker daemon log file from the NetWorker server and relevant Storage Node.  
- Core file or Crash file

Detailed analysis of a core file requires an advanced knowledge of NetWorker internal operations and should be done by Dell Technologies NetWorker Support. However, an initial read of the core file can be done to compare the contents of the core file with known issues.

Linux and HP-UX
gdb [full path to process] [core file]
(gdb) where

AIX
dbx [full path to process] [core file]
(dbx) where

Solaris
pstack [ core file ]
dbx [full path to process] [core file]
(dbx) where

Windows
- Start the WinDbg Windows debugger program.
- Click File and then Open Dump File in WinDbg.
- Type !analyze -v in the bottom command window to retrieve full information.
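
When several core files need a first read, the interactive sessions above can be replaced by a non-interactive command. The sketch below only builds the command line. It assumes a gdb build that supports the -batch and -ex options, it covers only the gdb and pstack cases, and the binary and core file paths are hypothetical.

```python
def backtrace_command(platform, binary, core):
    """Return an argument list that prints a backtrace without an interactive prompt."""
    platform = platform.lower()
    if platform == "linux":
        # gdb runs the "where" command against the core file and then exits.
        return ["gdb", "-batch", "-ex", "where", binary, core]
    if platform == "solaris":
        # pstack needs only the core file.
        return ["pstack", core]
    raise ValueError(f"no non-interactive command sketched for {platform!r}")


# Hypothetical paths, for illustration only.
print(" ".join(backtrace_command("linux", "/usr/sbin/nsrd", "/nsr/cores/nsrd/core.1234")))
```

The returned list can be passed to subprocess.run with capture_output=True to save the backtrace text next to the core file.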
    
5) Based on the above analysis and knowledge about the system behavior, you can compare the incident to the list of known issues detailed in the NetWorker Release Notes for the latest version.
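
To make the comparison with known issues easier, the function names can be pulled out of a gdb backtrace. The sketch below assumes the usual gdb frame layout of "#N  address in function (...)". The sample text is hypothetical and uses generic C library frames, not real NetWorker output.

```python
import re

# Matches lines such as "#0  0x00007f... in raise () from /lib64/libc.so.6"
# and "#3  main (argc=1) at prog.c:10" (the address part is optional).
FRAME_PATTERN = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(", re.MULTILINE)


def frame_functions(backtrace_text):
    """Return the function names of a gdb backtrace, innermost frame first."""
    return [match.group(2) for match in FRAME_PATTERN.finditer(backtrace_text)]


# Hypothetical sample backtrace.
sample = """\
#0  0x00007f3a1c2d4387 in raise () from /lib64/libc.so.6
#1  0x00007f3a1c2d5a78 in abort () from /lib64/libc.so.6
#2  0x0000000000401136 in main ()
"""

print(frame_functions(sample))
```

The resulting list of function names is a convenient search string when reading the Release Notes or when describing the crash to support.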

Step 7: Advanced Debugging (if required)
If you suspect that a fault in the NetWorker software is responsible for the crashing process, package the crash file (see item 3 of Step 6) and provide it, with a full description of the observed behavior, to Dell Technologies NetWorker Support for a detailed analysis of the issue.

Affected Products

NetWorker

Article Properties
Article Number: 000034716
Article Type: Solution
Last Modified: 23 Sept 2024
Version:  5