The principal logs for debugging backup failures are the policy log files, which are at the following locations.
Linux: /nsr/logs/policy/policy_name/workflow_name/action_name
Windows: ..\Program Files\EMC NetWorker\nsr\logs\policy\policy_name\workflow_name\action_name
There are workflow log files in the raw format under /nsr/logs/policy/policy_name/workflow_name/jobid.raw and a subdirectory for each action. Each child action of an action has its own log file with the jobid of that child job. When the parent action starts a child action, NetWorker creates a directory for these child action logs.
Example:
Here we can see the location of the policy logs and that the logs are of different sizes depending on the debug level that is used during the backup. The raw files are the workflow logs, while the backup_[jobid]_logs directories contain the action logs and child action logs.
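As a rough sketch of navigating this layout on a Linux NetWorker server, the function below lists the workflow raw logs and the action log directories for one workflow. The function name list_workflow_logs and the NSR_POLICY_LOG_ROOT variable are hypothetical names introduced for illustration; the default root is the /nsr/logs/policy path quoted above.

```shell
# Hypothetical helper: list workflow raw logs and action log directories.
# NSR_POLICY_LOG_ROOT is an assumed override; the default follows the
# /nsr/logs/policy location described in this article.
list_workflow_logs() {
    policy=$1
    workflow=$2
    dir="${NSR_POLICY_LOG_ROOT:-/nsr/logs/policy}/$policy/$workflow"

    # Stop early if the policy/workflow pair has no log directory.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    echo "Workflow raw logs:"
    find "$dir" -maxdepth 1 -type f -name '*.raw' | sort

    echo "Action log directories:"
    find "$dir" -mindepth 1 -maxdepth 1 -type d | sort
}
```

For example, list_workflow_logs Mona Bokonon_wf would print the jobid.raw files followed by the backup_[jobid]_logs directories for that workflow.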
Example:
NetWorker client-based backups use the save process. The save process communicates with the NetWorker server, storage node (where applicable), or target backup device media. Debug can be enabled on the save process by passing the -D debug flag to it, using either the NetWorker Management Console (NMC) or the nsradmin command.
In the NMC, you change the 'Backup command' field in the relevant client properties to 'save -D9':
Example:
You can do the same operation using the nsradmin command:
Example:
Alternatively, on a Linux system, you can use the printf command to make this nsradmin change in one line:
Example:
printf "show \n . type : NSR Client; name : vm-lego-231; save set : /alice\n update backup command : save -D9\n" | nsradmin -i -
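If the same change has to be made for several clients, the one-liner can be wrapped in a small function. This is only a sketch: the function name nsradmin_debug_input is hypothetical, the whitespace of the generated input is slightly normalized compared with the one-liner above, and the function only prints the nsradmin input so that it can be reviewed before it is applied.

```shell
# Hypothetical helper: print the nsradmin input that sets 'save -D<level>'
# as the backup command for one client instance. It only prints text;
# pipe the output to 'nsradmin -i -' (as in the one-liner above) to apply it.
nsradmin_debug_input() {
    client=$1
    saveset=$2
    level=${3:-9}   # debug level, 9 by default as in this article

    printf 'show\n'
    printf '. type : NSR Client; name : %s; save set : %s\n' "$client" "$saveset"
    printf 'update backup command : save -D%s\n' "$level"
}

# Review the generated input first, then apply it:
#   nsradmin_debug_input vm-lego-231 /alice
#   nsradmin_debug_input vm-lego-231 /alice | nsradmin -i -
```

Printing the input before piping it to nsradmin makes it easier to confirm that the query selects the intended client instance.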
NetWorker Command Reference Guide
How to Use NetWorker nsradmin validation checking
Special Uses for the NetWorker nsradmin program Technical Note
nsrworkflow -D9 -p [policy] -w [workflow]
This logs the workflow job debug output to the raw file in:
/nsr/logs/policy/policy_name/workflow_name/
Example:
Running the nsrworkflow command starts the job manually but uses the same schedule and level configuration as a scheduled, automated backup. Alternatively, use the -a flag to run nsrworkflow as an ad hoc backup, which lets you override the backup schedule or level. To specify the backup level that you want (rather than the level set for today's run of the workflow), use the -l option (or -L for virtual machine backups).
Example:
nsrworkflow -p [policy] -w [workflow] -A "'[action]' -l [level]" -a
nsrworkflow -p Mona -w Bokonon_wf -A "'backup' -l full" -a
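The nested quoting of the -A argument is easy to get wrong, so a small wrapper can help. The sketch below assumes only the options shown in the example above; the function name run_adhoc_workflow and the DRY_RUN variable are hypothetical. With DRY_RUN set, it prints the command instead of running it.

```shell
# Hypothetical wrapper around the ad hoc nsrworkflow call shown above.
# Set DRY_RUN=1 to print the command line instead of executing it.
run_adhoc_workflow() {
    policy=$1
    workflow=$2
    action=$3
    level=$4

    # Build the -A argument exactly as in the example: '<action>' -l <level>
    action_arg="'$action' -l $level"

    if [ -n "${DRY_RUN:-}" ]; then
        printf 'nsrworkflow -p %s -w %s -A "%s" -a\n' \
            "$policy" "$workflow" "$action_arg"
    else
        nsrworkflow -p "$policy" -w "$workflow" -A "$action_arg" -a
    fi
}

# Example: DRY_RUN=1 run_adhoc_workflow Mona Bokonon_wf backup full
```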
516616 : How to use the NetWorker nsrworkflow command
513030 : How to use the NetWorker nsrpolicy command
NetWorker 9.1.x Release Notes:
NetWorker Command Reference Guide
The savefs command is used during client-based backups. It is sent to the NetWorker client after the backup is initiated on the NetWorker server. savefs is the process responsible for determining which files and directories to back up for this specific backup run on this client.
You can obtain the exact savefs command which is being run on the client side from the raw file in the policy logs (/nsr/logs/policy/[policy name]/[workflow name]). Then run this on the client side, adding the -D9 option:
Example:
On the NetWorker server:
And then on the client side:
The assignment of the correct target volume for a backup is managed by the nsrd process on the NetWorker server. To debug this, you must temporarily increase the debug level of the nsrd process on the NetWorker server using the dbgcommand.
Example:
After debugging is completed, you must turn off the debugging like so:
If the NetWorker server cannot find a suitable NetWorker volume to write to, it will stop responding and generate an alert. In this case, the job will be in the 'active' state. You can check the state of the job using the nsrpolicy monitor command.
Example:
The alert in the NetWorker Management Console gives more details on what type of volume is being sought and on which Storage Node.
Example:
If the NetWorker server determines that it cannot continue with the backup because there is no free parallelism slot, the job is in the 'queued' state.
To debug the parallelism, you must increase the debug level of the nsrjobd process on the NetWorker server as shown below. The daemon log file then outputs a lot of debugging data related to parallelism.
NetWorker Performance Optimization Planning Guide
Parallelism and Target Sessions
A "Client direct" backup sends data directly from the NetWorker client to the target media without first writing to the NetWorker Storage Node.
You can define in the client properties whether client direct backup should be used or not for this client instance.
To check whether client direct is working, inspect the logs as in the example below:
Example:
Log output: Client direct in operation.
Daemon log file on the NetWorker server:
91787 08/01/2014 01:37:35 PM nsrmmd NSR notice Save-set ID '4091251191' (vm-lego-231:/NetWorker) is using direct file save with Data Domain device 'dd4500-dd.local_onetwoone'.
lsof on the NetWorker client:
[root@vm-lego-231 ~]# lsof -i TCP | grep save
save 9831 root 3u IPv4 111668 0t0 TCP vm-lego-231:23178->vm-lego-121:8985 (ESTABLISHED)
save 9831 root 5u IPv4 111695 0t0 TCP vm-lego-231:19752->vm-lego-121:9417 (ESTABLISHED)
save 9831 root 7u IPv4 111720 0t0 TCP vm-lego-231:31095->vm-lego-121:9035 (ESTABLISHED)
save 9831 root 8u IPv4 111728 0t0 TCP vm-lego-231:12421->vm-lego-121:9653 (ESTABLISHED)
save 9831 root 9u IPv4 111731 0t0 TCP vm-lego-231:33739->dd4500-dd.local:nfs (ESTABLISHED)
save 9831 root 10u IPv4 111736 0t0 TCP vm-lego-231:60278->dd4500-dd.local:midnight-tech (ESTABLISHED)
Note: There are open TCP connections from the client both to the NetWorker server and to the DD. If you need to know exactly which processes on the NetWorker server the client is connected to, you can cross-check with lsof on the server. The fourth column is the file descriptor being used.
On a Windows system, you can see similar output by using resmon: Start - Run - resmon - Network tab - TCP Connections
Log output: Client direct not in operation.
Daemon log file on the NetWorker server:
91797 08/01/2014 01:57:51 PM nsrmmd NSR severe Unable to perform direct file save with Data Domain device 'ONETWOONE'; setting up traditional save for save-set ID '4024143566' (vm-lego-231:/NetWorker)
Note: Looking for the word traditional in the log gives you this output quickly. If you need to find out why it is not using client direct, start with the NetWorker Administration Guide's list of conditions that need to be met for client direct to work. The most common reasons would be that the client has no direct network access to the DD from the NIC it is using or that the name resolution is not working correctly from the client.
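The two daemon log messages quoted in this section can be told apart mechanically. The sketch below tags each matching line as DIRECT or TRADITIONAL; it assumes that the daemon log has already been rendered to plain text and that the message wording matches the samples shown here. The function name classify_client_direct is hypothetical.

```shell
# Hypothetical helper: read rendered daemon log text on stdin and tag the
# lines that show whether a save set used client direct or fell back to a
# traditional save. The patterns are taken from the sample messages in
# this article; other NetWorker versions may word them differently.
classify_client_direct() {
    awk '
        /is using direct file save with/  { print "DIRECT: " $0; next }
        /setting up traditional save for/ { print "TRADITIONAL: " $0 }
    '
}

# Example (assumes the rendered log was saved to a text file first):
#   classify_client_direct < daemon_rendered.log
```

Lines that match neither message are dropped, so the output is a short list with one entry per save set.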
lsof on the NetWorker client:
[root@vm-lego-231 ~]# lsof -i TCP | grep save
save 10114 root 3u IPv4 123335 0t0 TCP vm-lego-231:46461->vm-lego-121:8985 (ESTABLISHED)
save 10114 root 5u IPv4 123369 0t0 TCP vm-lego-231:12593->vm-lego-121:9417 (ESTABLISHED)
save 10114 root 7u IPv4 123392 0t0 TCP vm-lego-231:63952->vm-lego-121:9035 (ESTABLISHED)
save 10114 root 8u IPv4 123400 0t0 TCP vm-lego-231:29597->vm-lego-121:9653 (ESTABLISHED)
Note: Only TCP connections to the NetWorker Server (which is also the Storage Node in this example) are open here. There is no TCP connection open to the DD. All the data is going to the Storage Node.
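The comparison of the two lsof outputs can be scripted. As a sketch, the function below reads lsof -i TCP output and prints the remote endpoints of save connections that do not go to the given server host. If it prints nothing, save is talking only to the server or storage node, which suggests a traditional save. The function name save_peers_not_to is hypothetical, and the host name must be given exactly as lsof displays it; IPv6 addresses are not handled.

```shell
# Hypothetical helper: read 'lsof -i TCP' output on stdin and print the
# remote endpoints of established 'save' connections that do NOT go to
# the host given as the first argument.
save_peers_not_to() {
    server=$1
    awk -v server="$server" '
        $1 == "save" && /ESTABLISHED/ {
            # Locate the field of the form local:port->remote:port
            for (i = 1; i <= NF; i++) {
                if (index($i, "->") > 0) {
                    split($i, ends, "->")
                    remote = ends[2]
                    host = remote
                    sub(/:[^:]*$/, "", host)   # strip port or service name
                    if (host != server) print remote
                }
            }
        }'
}

# Example on the client, with the server name from this article:
#   lsof -i TCP | save_peers_not_to vm-lego-121
```

With the first lsof sample in this section, this would list the two dd4500-dd.local endpoints; with the second sample it would print nothing.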
NetWorker Performance Optimization Planning Guide
To debug PSS backups, ensure that the 'parallel save stream' property is ticked in the client resource in the NetWorker Management Console. Modify the save command to put it in debug as per number 1 above. Also, create an empty file called 'mbsdopen' in ../nsr/debug. This provides extra debug logging both on the client in /nsr/tmp and in the policy logs on the NetWorker server (see number 1 above).
Example:
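A minimal sketch of creating the empty flag file on a Linux client follows. The function names and the NSR_DEBUG_DIR variable are hypothetical; set NSR_DEBUG_DIR to the nsr/debug directory of your installation, since this article gives it only as a relative path. Removing the file once debugging is finished is an assumption here, not something this article states.

```shell
# Hypothetical helpers for the 'mbsdopen' flag file described above.
# NSR_DEBUG_DIR is an assumed variable that must point at the nsr/debug
# directory of the NetWorker installation.
enable_pss_debug() {
    debug_dir=${NSR_DEBUG_DIR:?set NSR_DEBUG_DIR to the nsr/debug directory}
    # Create the directory if needed, then create (or empty) the flag file.
    mkdir -p "$debug_dir" && : > "$debug_dir/mbsdopen"
}

# Assumption: the flag file is removed again when debugging is finished.
disable_pss_debug() {
    debug_dir=${NSR_DEBUG_DIR:?set NSR_DEBUG_DIR to the nsr/debug directory}
    rm -f "$debug_dir/mbsdopen"
}
```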
How to Troubleshoot NetWorker Parallel Save Stream backups
NetWorker Performance Optimization Planning Guide
You can increase the debug level of the nsrmmd processes using the dbgcommand (described in number 7 above). You can either increase the debug level of all the nsrmmd processes or else use operating system tools to identify which nsrmmd process is active:
479665 : Triage Article: Troubleshooting Tape Library Problems in NetWorker
NetWorker Data Domain Boost Integration Guide