
PowerScale OneFS 9.3.0.0 CLI Administration Guide

Inline Data Deduplication overview

Inline data deduplication for Isilon F810, H5600, and PowerScale F200 and F600 nodes deduplicates data before it is committed to disk, avoiding redundant writes.

Inline data deduplication (inline deduplication) includes inline zero block elimination, asynchronous data deduplication, and an in-memory, nonpersistent index table. Inline deduplication is supported as follows:

  • Isilon F810 disk pools with OneFS 8.2.1 or later on all nodes
  • Isilon H5600 disk pools with OneFS 8.2.2 or later on all nodes
  • PowerScale F200 and F600 disk pools with OneFS 9.0.0.0 or later on all nodes

Depending on workload, the data reduction rate with the inline compression and inline data deduplication features enabled is typically around 3:1.

No license is required for inline data deduplication.

Inline deduplication is a cluster-wide setting and is disabled by default. When enabled, the feature is always active, applies globally, and applies to all files on disk pools that support data reduction. Exceptions include:

  • Packed files
  • Writes to snapshots (though deduplicated data can be copied on write to snapshots)
  • Shadow stores
  • Stubbed files, such as CloudPools files
  • Files with the no_dedupe attribute set

You cannot selectively enable inline deduplication on individual files.

To be deduplicated, two files with identical data blocks or a file and a shadow store with identical data blocks must have the same disk pool policy ID. OneFS deduplicates data to a shadow store. OneFS uses a protection policy that is at least as high as the protection policy of the files being deduplicated.

NOTE: The "always on" aspect of inline deduplication can affect performance. Inline deduplication may not be right for performance-sensitive workloads. More guidance is available in Considerations for using inline deduplication.

You must have the ISI_PRIV_CLUSTER privilege to enable or disable inline deduplication.

You enable inline deduplication from the command line:

isi dedupe inline settings modify --mode enabled

Comparing inline deduplication with SmartDedupe

The following table compares inline deduplication with the SmartDedupe service.

Inline deduplication                                              | SmartDedupe
------------------------------------------------------------------|------------------------------------------
Globally enabled                                                  | Directory tree based
Processes all regular files                                       | Skips files smaller than 32 KB by default
Deduplicates sequential runs of matching blocks to single blocks  | Deduplicates only between files
Per-node, nonpersistent in-memory index                           | Large, persistent on-disk index
Can convert copy operations to clones                             | Post-process only
Opportunistic                                                     | Exhaustive
No license required                                               | License required

Inline deduplication workflow

Inline deduplication begins when data is flushed from the SmartCache (also known as the coalescer). The stages are:

  1. SmartCache (coalescer) flush.
  2. Determine the data to copy on write to snapshots.
  3. Remove zero blocks.
  4. Replace duplicate data with shadow store references.
  5. Compress the remaining data.
  6. Write to storage.

Zero block elimination is performed before inline deduplication. Files that are not eligible for deduplication may still have zero blocks that are removed. Data blocks that contain only zeros are detected and prevented from being written to disk. Skipping zero blocks can reduce the work that inline deduplication and data compression require.
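As an illustrative sketch only (this is not OneFS source, and all names here are hypothetical), zero block elimination amounts to checking each fixed-size block before it is written and skipping the blocks that contain only zeros:

```python
BLOCK_SIZE = 8192  # illustrative fixed block size; OneFS uses 8 KiB blocks
ZERO_BLOCK = bytes(BLOCK_SIZE)

def strip_zero_blocks(data: bytes):
    """Split data into fixed-size blocks and separate out the all-zero ones.

    Returns (to_write, holes): blocks that still need a disk write, and the
    offsets of all-zero blocks that never reach disk.
    """
    to_write, holes = [], []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        if block == ZERO_BLOCK[:len(block)]:
            holes.append(offset)            # all zeros: no disk write needed
        else:
            to_write.append((offset, block))  # passed on to dedupe/compression
    return to_write, holes
```

Blocks filtered out here never reach the deduplication or compression stages, which is why skipping zero blocks reduces the work those stages must do.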

The in-memory index table

Inline deduplication uses an in-memory index table to track dedupable data blocks. The index table is allocated on each node that supports the feature. Allocating the index table depends on available resources.

Inline deduplication is an opportunistic best effort service and is not a substitute for the SmartDedupe service. However, inline deduplication can reduce the amount of work that SmartDedupe has to do.

The default size of the index table is 10% of RAM up to a maximum of 16 GB. Each node has its own index: there is no sharing between nodes. Because the index is in-memory only, its contents are lost on reboot.
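The sizing rule above (10% of RAM, capped at 16 GB) can be expressed as a one-line formula; the function name is hypothetical:

```python
GIB = 1 << 30  # one gibibyte in bytes

def default_index_size(ram_bytes: int) -> int:
    """Default per-node inline dedupe index size: 10% of RAM, capped at 16 GiB."""
    return min(ram_bytes // 10, 16 * GIB)
```

For example, a node with 64 GiB of RAM gets roughly a 6.4 GiB index, while a node with 256 GiB of RAM is capped at 16 GiB.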

If you enable inline deduplication on a system that has just booted, index allocation should happen quickly. On a system that has been running for some time, the memory required for the index table may be harder to locate, so allocation can take longer and, if memory is insufficient, can fail. See Troubleshoot index allocation issues for guidance.

The newly allocated index table is empty. Inline deduplication hashes data blocks as they are read and written and records the results in the index table. If inline deduplication encounters matching data blocks, data is deduplicated immediately. Over time, finding matching data becomes more effective as the index accumulates file system data.
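Conceptually, the index is a hash table mapping a block's fingerprint to where that data was last seen. The following is a minimal model of that idea, not the OneFS implementation: the class name, method, and choice of SHA-256 are all illustrative assumptions.

```python
import hashlib

class InlineDedupeIndex:
    """Toy model of the per-node, nonpersistent in-memory index.

    Maps a block hash to the location of a previously seen identical block.
    Because the table lives only in process memory, its contents are lost
    on restart, mirroring the real index being lost on node reboot.
    """

    def __init__(self):
        self._table = {}  # block hash -> (file_id, offset)

    def observe(self, file_id, offset, block: bytes):
        """Record a block as it is read or written.

        Returns the location of previously seen identical data, or None
        if this block's contents have not been seen before.
        """
        key = hashlib.sha256(block).digest()
        match = self._table.get(key)
        if match is None:
            self._table[key] = (file_id, offset)
        return match
```

An empty index finds no matches at first; as more blocks are observed, lookups start returning prior locations, which is why match rates improve as the index accumulates file system data.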

The following describes the deduplication process when an initial data match is found between two files.

  1. The data being written is redirected to a shadow store.
  2. Shadow references are inserted into the current file.
  3. Inline deduplication queues an asynchronous worker process to deduplicate the matching file with the shadow store.

After the initial match, inline deduplication compares data being written with the data in the shadow store. If it finds a match, it updates the current file with shadow references and does not write data to storage. Subsequent data matches are typically faster than the initial match since they involve less work.
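The shadow store flow above can be modeled in a few lines. This is a hedged sketch, not the OneFS implementation: the class, method, and use of a hash keyed dictionary are illustrative assumptions.

```python
import hashlib

class ShadowStore:
    """Toy model of redirecting matched writes into a shadow store."""

    def __init__(self):
        self.blocks = {}  # block hash -> data, stored at most once
        self.refs = {}    # (file_id, offset) -> block hash (shadow reference)

    def redirect(self, file_id, offset, block: bytes) -> bool:
        """Land data in the shadow store and leave a reference in the file.

        Returns True if data was actually written to the store (initial
        match), False if the write was satisfied by a reference alone
        (subsequent match, no data written to storage).
        """
        key = hashlib.sha256(block).hexdigest()
        written = key not in self.blocks
        if written:
            self.blocks[key] = block        # first copy lands in the store
        self.refs[(file_id, offset)] = key  # the file now holds a reference
        return written
```

The first matching write pays the cost of populating the shadow store; later writes of the same data only insert references, which is why subsequent matches involve less work than the initial one.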

Inline deduplication upgrade considerations

The following are upgrade considerations for using inline deduplication.

  • Specific versions of OneFS must run on all nodes in the cluster, as follows:
    • F810 requires 8.2.1 or later.
    • H5600 requires 8.2.2 or later.
    • F200 and F600 require 9.0.0.0 or later.
  • Disk pools that can support inline deduplication must have the data_reduce flag set.
  • The data_reduce flag is set automatically on upgrade commit on all disk pools that support compression and inline deduplication.
