Symptoms
When a snapshot is deleted, the system scans it to free the blocks it exclusively owned. This process, known as the reclaimer, runs in the background.
The larger the snapshot, the longer this operation takes. After the reclaimer has finished, all of the freed space becomes available to the volume.
The reclaimer must run before blocks can be unmapped on the backend SAN volumes using SCSI UNMAP (if enabled).
The reclaimer is queued to run whenever data is deleted from the NAS pool, including data deleted from shares or NAS volumes, and snapshot deletions.
Cause
Known limitations and Issues
- The reclaimer service cannot be run manually or stopped for an extended period of time. Once it begins, it must finish its queue before space is released to the NAS pool.
- Reclaiming snapshots is resource-intensive. If a lot of reclaiming activity occurs concurrently, it could cause performance problems across the cluster.
- Resource-intensive reclaim operations can degrade performance enough to affect client access to the cluster.
- There is a snapshot creation/expiration limit that varies by appliance based on overall system load. This could directly impact the reclaimer and system functionality.
- Although the reclaimer has been improved in FluidFS firmware v6 for snapshot deletions, an overloaded reclaimer service can still affect client access. These events are reported as "Clients may encounter a long period of partial data access"
Check whether the performance problems occur around the time that some snapshots expire.
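One way to make this check concrete is to compare the timestamps of the performance events with the snapshot expiry times. The following is a minimal Python sketch, not a FluidFS tool: the function name, the 15-minute window, and the sample timestamps are all assumptions for illustration, and the timestamps would have to be collected from the event log and snapshot list by hand.

```python
from datetime import datetime, timedelta

def events_near_expiry(event_times, expiry_times, window=timedelta(minutes=15)):
    """Return (event, expiry) pairs where the event is within `window` of an expiry."""
    matches = []
    for event in event_times:
        for expiry in expiry_times:
            # abs() treats events shortly before and shortly after expiry alike.
            if abs(event - expiry) <= window:
                matches.append((event, expiry))
    return matches

# Hypothetical timestamps, for illustration only.
events = [datetime(2024, 1, 10, 2, 5), datetime(2024, 1, 10, 14, 30)]
expiries = [datetime(2024, 1, 10, 2, 0), datetime(2024, 1, 10, 8, 0)]

near = events_near_expiry(events, expiries)
for event, expiry in near:
    print(f"event at {event} is close to snapshot expiry at {expiry}")
```

If most performance events pair up with an expiry time, snapshot reclaiming is a likely contributor. The window size is a judgment call, because large snapshots take longer to reclaim.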
There are multiple types of snapshots:
- Ad-hoc snapshots - Snapshots that expire when the administrator deletes them, or according to the expiry time set by the administrator.
- Scheduled snapshots - Snapshots that expire according to the schedule details. The names are based on the schedule name.
- NDMP snapshots - Snapshots that expire when the NDMP backup completes. The names start with ndmp.
- Replication snapshots - Snapshots that expire after the next replication completes successfully. (During a replication there are two snapshots, the previous snapshot and the current snapshot.) Replication snapshot names begin with rep.
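The naming rules above can be used to sort a list of snapshot names by type. The sketch below is illustrative Python under stated assumptions: the function name and sample names are hypothetical, prefix matching is case-insensitive by assumption, and a scheduled snapshot is assumed to contain its schedule name somewhere in the snapshot name (the article only says the name is "based on" it).

```python
def classify_snapshot(name, schedule_names=()):
    """Guess the snapshot type from its name, using the prefixes listed above."""
    lowered = name.lower()
    if lowered.startswith("ndmp"):
        return "ndmp"
    if lowered.startswith("rep"):
        return "replication"
    # Assumption: a scheduled snapshot's name contains the schedule name.
    if any(s.lower() in lowered for s in schedule_names):
        return "scheduled"
    # Anything else is treated as an ad-hoc snapshot.
    return "ad-hoc"

# Hypothetical snapshot names, for illustration only.
names = ["ndmp_backup_01", "rep_vol1_0042", "hourly_2024_01_10", "before_upgrade"]
types = {n: classify_snapshot(n, schedule_names=["hourly"]) for n in names}
print(types)
```

Because the prefix checks run first, a schedule or ad-hoc snapshot whose name happens to start with rep or ndmp would be misclassified, so treat the result as a first pass.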
Resolution
Staggering snapshot tasks (Standard Snapshots, Replication, NDMP)
If many snapshots expire simultaneously, it might cause performance issues.
Fewer, but larger, snapshots that expire simultaneously can also cause performance problems.
It is recommended to stagger hourly snapshots in 10-minute steps and to stagger daily snapshots across the day (preferably expiring at night). Weekly snapshots should preferably expire on weekends.
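As an illustration of the 10-minute staggering, the Python sketch below assigns each NAS volume a minute-of-hour offset for its hourly snapshot schedule. The function name and volume names are hypothetical, and wrapping back to minute 0 after the last slot in the hour is an assumption. The resulting offsets would still need to be entered into the snapshot schedules manually.

```python
def stagger_hourly(volumes, step_minutes=10):
    """Map each volume to a minute-of-hour offset, spaced `step_minutes` apart."""
    if not 1 <= step_minutes <= 60:
        raise ValueError("step_minutes must be between 1 and 60")
    slots = 60 // step_minutes  # number of distinct start minutes in one hour
    # Volumes beyond the available slots wrap around and share an offset.
    return {vol: (i % slots) * step_minutes for i, vol in enumerate(volumes)}

# Hypothetical NAS volume names, for illustration only.
schedule = stagger_hourly(["vol1", "vol2", "vol3", "vol4", "vol5", "vol6", "vol7"])
for vol, minute in schedule.items():
    print(f"{vol}: hourly snapshot at minute {minute:02d}")
```

With a 10-minute step there are only six slots per hour, so a seventh volume shares minute 0 with the first. When there are more volumes than slots, pairing the smallest volumes in a shared slot keeps the concurrent reclaim load lower.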
Affected Products
Dell Compellent FS8600