
Dell EMC Data Mover Admin Guide


Configure error handling

Manage settings related to how Data Mover handles error conditions such as connectivity or service availability.

From the web UI, select Settings > Data management configuration. In the Plugins section, select the vertical ellipsis icon next to Data Mover, and select Edit configuration. The Data Mover configuration file opens.

Modify the following values:

2nd_volume_check_timeout

When a worker process cannot connect to a volume, the process waits for 3 seconds and then tries again. If the process does not receive a response, it attempts to connect for another period of time (for example, 8.5 seconds), as configured by this setting. If the process still cannot connect, it marks the operation as failed.

Examples of connection problems include failures while transferring bytes or performing other operations on NFS volumes or S3 object stores.

The default value is 8.5 seconds, which is also the minimum recommended value. Do not enter a value greater than 56.
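The two-stage check described above can be sketched as follows. This is an illustration, not Data Mover's actual code; the function name and the `connect` callback are hypothetical:

```python
# Hedged sketch of the documented connection-check behavior:
# a fixed 3-second first attempt, then a second attempt whose
# duration comes from 2nd_volume_check_timeout.
SECOND_VOLUME_CHECK_TIMEOUT = 8.5  # seconds; the setting's default

def check_volume(connect, second_timeout=SECOND_VOLUME_CHECK_TIMEOUT):
    """Return True if the volume responds; False means the operation is marked failed."""
    if connect(timeout=3.0):                 # first attempt: fixed 3-second wait
        return True
    return connect(timeout=second_timeout)   # second attempt: configured timeout
```

For example, `check_volume(lambda timeout: False)` returns False, which corresponds to the worker marking the operation as failed.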

audit_file_size

The maximum size of each file in the audit log. The service calls logrotate when the current audit log file exceeds audit_file_size.

The default value is 100MiB.

audit_rotations

The number of gzip files (rotations) that logrotate will create.

The default value is 10.

backoff_secs_per_attempt

When a task fails, this value sets the number of seconds per attempt to wait before trying the task again.

The default value is 5 (seconds). The maximum amount of time a job will wait is 15 minutes.

The delay increases by this same value with each new attempt. For example, using the default values, the first delay is 5 seconds, the second is 10 seconds, the third is 15 seconds, and so on, up to a maximum of 15 minutes per attempt.

This setting works with the setting heavy_task_attempts.

heavy_task_attempts

This option sets the number of times that DataIQ attempts to complete a task before considering the task a failure.

By default, in Data Mover for DataIQ 2.2 and later, this value is 1000. (In previous versions, the default value was 3.) With the new default values, jobs continue their attempts to complete each task for roughly 10 days before giving up.

This setting works with the setting backoff_secs_per_attempt.
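The interaction of these two settings can be reproduced with a few lines of arithmetic. This sketch uses the documented defaults (backoff of 5 seconds, a 15-minute per-attempt cap, and 1000 attempts); the function name is illustrative:

```python
# Hedged sketch: the retry schedule implied by backoff_secs_per_attempt
# and heavy_task_attempts, using the documented default values.
BACKOFF = 5          # backoff_secs_per_attempt default
CAP = 15 * 60        # 15-minute maximum delay per attempt
ATTEMPTS = 1000      # heavy_task_attempts default (DataIQ 2.2 and later)

def delay(attempt):
    """Seconds to wait before retry number `attempt` (1-based), capped at 15 minutes."""
    return min(BACKOFF * attempt, CAP)

# First delays: 5 s, 10 s, 15 s, ... as described above.
total_seconds = sum(delay(n) for n in range(1, ATTEMPTS + 1))
print(total_seconds / 86400)  # cumulative waiting in days
```

With these defaults the cumulative waiting works out to about 9.5 days, consistent with the "roughly 10 days" figure above.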

error_handling

This option determines how Data Mover responds to errors.

  • When set to Continue, Data Mover continues to transfer files after an error, even when some files could not be transferred correctly.

    The View Transfers dialog shows the files or folders that failed to be transferred in the Failed Paths/Bytes column.

    From that dialog, select the job to see a list of all failed paths.

    When a folder name is shown, the entire folder and its contents failed to be transferred. The total path and byte counts of failed paths include the files and bytes contained within the failed folder.

  • When set to Pause, Data Mover stops issuing new tasks to workers for that job when the first error is encountered.

    The worker that reported the error stops. Any other workers transferring files for the same job continue to the end of their assigned tasks, unless they also encounter an error.

    From DataIQ, users can choose to unpause or cancel the job.

    If a job is unpaused, Data Mover continues the job and the workers restart. If additional errors are reported, Data Mover continues, even if some files cannot be transferred correctly. (See the description for Continue above.)

  • When set to Error, after any error, Data Mover stops the entire job and reports it as Failed.

The default value is Pause.
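The three modes can be summarized as a simple dispatch. This is an illustrative sketch of the behavior described above, not Data Mover's implementation; the function and the `job` dictionary are hypothetical:

```python
# Hedged sketch: how the three error_handling modes differ in what
# happens after a worker reports an error for a job.
def on_worker_error(mode, job, failed_path):
    if mode == "Continue":
        job["failed_paths"].append(failed_path)   # surfaced in View Transfers
        return "keep transferring remaining files"
    if mode == "Pause":
        job["state"] = "Paused"                   # user may unpause or cancel
        return "stop issuing new tasks for this job"
    if mode == "Error":
        job["state"] = "Failed"
        return "stop the entire job"
    raise ValueError(f"unknown error_handling mode: {mode}")
```

For example, with `mode="Pause"` the job's state becomes Paused and no new tasks are issued, while other workers finish their current tasks.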

heavy_req_timeout

Seconds after which Data Mover closes heavy worker request connections.

NOTE Only increase this value if you suspect communication issues.

The default value is 60.

light_req_timeout

Seconds after which Data Mover closes light worker request connections.

NOTE Only increase this value if you suspect communication issues.

The default value is 60.

light_task_timeout

Seconds after which Data Mover cancels a Light Worker task that has not responded back to the Data Mover Service.

NOTE Only increase this value if you suspect communication issues between the Data Mover Workers and Data Mover Service, due to latency or networking issues.

The default value is 60.

max_connection_fail_checks

Total number of times to check a volume or S3 connection for connectivity and responsiveness, before transferring files to or from it, or when the volume or S3 source/destination seems to no longer be responsive. Once this number of connection checks has been reached without a response, the task fails and is reissued up to heavy_task_attempts times.

The default value is 2.

max_fs_transfer_tries

Maximum number of times a heavy worker process internally tries to transfer a file from file system location A to file system location B after encountering an error on the first attempt. Used when neither the source nor the destination is an object store (such as S3). After the configured number of attempts has failed, the worker reports the task attempt as failed.

The default value is 10.

max_s3_transfer_tries

Maximum number of times a heavy worker process internally tries to transfer a file from location A to location B, where one of them is an S3 source or destination, after encountering an error on the first attempt. After the configured number of attempts has failed, the worker reports the task attempt as failed.

The default value is 10.

max_transfer_response_retries

Maximum number of times a heavy worker tries to report task completion to the Data Mover Service before giving up.

NOTE Only change this value if workers frequently fail to report to the Data Mover Service and their tasks are being timed out.

The default value is 5.

task_status_timeout

Seconds after which the Data Mover Service will cancel a Heavy Worker task that has not responded back to the Service.

NOTE Only increase this value if you suspect communication issues between the Data Mover Workers and Data Mover Service, due to latency or networking issues.

The default value is 120.

timeout_extension_per_gb

This is the number of seconds, per gibibyte already transferred, to extend a task's timeout. The timeout extension grows as the worker completes more work.

This configuration value is not present in the sample file, but may be entered manually.

This setting makes the Data Mover Service more tolerant of workers that have already completed much of their task but have not been heard from in a while. It reduces the chance of worker tasks being timed out and reissued; reissuing a task because of a timeout is especially costly when the task is very large and nearly complete.

The default value is 6.0.
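The effective timeout implied by this setting can be illustrated with a small calculation. The formula below is an assumption based on the description above (base task_status_timeout plus the per-GiB extension), not Data Mover's actual code:

```python
# Hedged sketch: effective heavy-worker timeout as the task progresses,
# using the documented defaults for both settings.
TASK_STATUS_TIMEOUT = 120.0       # task_status_timeout default (seconds)
TIMEOUT_EXTENSION_PER_GB = 6.0    # timeout_extension_per_gb default

def effective_timeout(gib_transferred):
    """Seconds before the Service would time out a silent worker."""
    return TASK_STATUS_TIMEOUT + TIMEOUT_EXTENSION_PER_GB * gib_transferred
```

For example, a worker that has already transferred 100 GiB would be allowed 720 seconds (12 minutes) of silence instead of the base 120.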

Example:

#Error handling
"Global Configurations":
  "Label Data Mover Service":
    error_handling:
      value: "Pause"

    light_req_timeout:
      value: 60
      default: 60

    light_task_timeout:
      value: 60
      default: 60

    heavy_req_timeout:
      value: 60
      default: 60

    task_status_timeout:
      value: 180
      default: 120

    audit_file_size:
      value: "100MiB"
      default: "100MiB"

    audit_rotations:
      value: 10
      default: 10

    2nd_volume_check_timeout:
      value: 8.5

    backoff_secs_per_attempt:
      value: 5

    heavy_task_attempts:
      value: 1000

    max_connection_fail_checks:
      value: 2

    max_fs_transfer_tries:
      value: 10

    max_s3_transfer_tries:
      value: 10

    max_transfer_response_retries:
      value: 5

    timeout_extension_per_gb:
      value: "6.0"

When you are finished editing settings, select Save.

