dynamox (9 Legend, 20.4K Posts), April 29th, 2010 05:00
We get around 10-30% dedupe rates on typical "departmental" file systems. The issue I have with enabling dedupe is that if you have existing checkpoints, you will see your SavVol grow dramatically when you enable it. Here is an example: I enabled dedupe on one file system the other day and it deduped 400 GB. I thought, great, but then I looked at SavVol utilization and it had actually grown by 400 GB! So who benefits here? The customer did, because they got 400 GB of space back, but I ended up suffering because it took 400 GB out of my storage pool. As we all know, a SavVol cannot be shrunk, so here I am, stuck with a ballooned SavVol. I know there is a dedupe paper that explains all this, but in the end I don't feel like I (the Celerra admin) gained anything; my customer did, at my expense.
White Paper: Achieving Storage Efficiency through EMC Celerra Data Deduplication - Applied Technology
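A minimal sketch of the copy-on-write accounting behind this effect (Python; the sizes are illustrative, not taken from any real system): with an active checkpoint, every PFS block that dedupe rewrites or frees must first be preserved in the SavVol so the checkpoint still sees the old contents, so the space "saved" on the PFS can reappear roughly one-for-one in the SavVol.

```python
# Copy-on-write accounting sketch: with an active checkpoint, blocks that
# dedupe frees on the PFS are first copied into the SavVol so the checkpoint
# can still read their old contents. All numbers are illustrative.

def dedupe_with_checkpoint(pfs_used_gb, dedupe_savings_gb, checkpoint_active):
    """Return (new PFS usage, SavVol growth) after one dedupe pass."""
    new_pfs_used = pfs_used_gb - dedupe_savings_gb
    # Every freed block is preserved for the checkpoint before being reused.
    savvol_growth = dedupe_savings_gb if checkpoint_active else 0
    return new_pfs_used, savvol_growth

# A 900 GB-used PFS that dedupes down by 400 GB while a checkpoint exists:
print(dedupe_with_checkpoint(900, 400, checkpoint_active=True))   # (500, 400)
print(dedupe_with_checkpoint(900, 400, checkpoint_active=False))  # (500, 0)
```

This is why the same dedupe run with no checkpoints present costs nothing in the SavVol.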
umichklewis_ac7b91 (300 Posts), April 29th, 2010 10:00
That's a bit disappointing to hear, and it's the main reason I haven't turned on dedupe for most filesystems (we've had threads running all over on this topic). As long as we're unable to reclaim unused space in the SavVol, I think this is the inevitable result. My smaller, non-deduped filesystems stay very small relative to their allocated capacity, but their SavVols are all 25% to 45% of the filesystem size. As you said, I have less space in my pool and my customers get to use more of the space they've purchased.
In my environment, keeping six weeks of snapshots on 90%-full filesystems produces SavVols at 15% to 25% of the PFS size (e.g. an 800 GB PFS with a 190 GB SavVol). The same snapshot schedule on deduped filesystems at 50% full produces SavVols of 40% to 60% (e.g. a 500 GB PFS with a 220 GB SavVol). At these ratios, enabling dedupe would probably reduce the space our PFSs use by a third or a little more.
My results are probably atypical, but they really put me off enabling dedupe on most filesystems.
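A quick check of the arithmetic behind those ratios, using the example figures above (Python; the only point is that the deduped PFS is smaller but its SavVol overhead is proportionally much larger):

```python
# SavVol-to-PFS ratios and total pool footprint for the two example
# filesystems quoted above (all sizes in GB).

def savvol_ratio(pfs_gb, savvol_gb):
    return savvol_gb / pfs_gb

plain = savvol_ratio(800, 190)   # non-deduped: ~0.24 (within the 15-25% band)
dedup = savvol_ratio(500, 220)   # deduped:      0.44 (within the 40-60% band)

# Total pool footprint (PFS + SavVol) per filesystem:
plain_total = 800 + 190          # 990 GB
dedup_total = 500 + 220          # 720 GB

print(round(plain, 2), round(dedup, 2), plain_total, dedup_total)
```

So even with the heavier SavVol overhead, the deduped pair still occupies less total pool space in this example, which is consistent with the "a third or a little more" estimate applying to the PFS itself rather than the whole footprint.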
Karl
Rainer_EMC (4 Operator, 8.6K Posts), April 30th, 2010 05:00
Don't worry about performance: deduplication scanning and processing is specifically built to back off and do less when other production work is competing for the CPU.
Rainer_EMC (4 Operator, 8.6K Posts), April 30th, 2010 06:00
Deduplication itself will not grow the SavVol; it is built to abort once a SavVol is 90% full (a configurable parameter).
However, since it needs to allocate new blocks when it compresses a file, it can cause the previous contents of those blocks to be copied to the SavVol, temporarily needing space there.
If dedupe has "filled" the SavVol to 90%, then regular write activity can cause the SavVol to auto-extend, so you might want to lower that 90% value.
My advice for file systems already full of existing data is not to try to dedupe everything at once. First configure it, for example, to only take in files not accessed for 180 days on the first run.
Once the SavVol blocks from that run have aged out of the SavVol, lower the threshold and repeat until you reach the configuration you want.
Alternatively, you could delete all snapshots before turning on dedupe for the first time and recreate the schedules once dedupe has finished.
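The phased approach above can be sketched as a loop that lowers the access-age threshold between runs (Python; the thresholds, file ages, and eligibility rule are purely illustrative, not actual Celerra parameters or commands):

```python
# Sketch of the phased rollout: dedupe only old, cold files first, then
# lower the access-age threshold on later runs once earlier SavVol blocks
# have aged out. All thresholds and file ages are made-up examples.

def eligible(file_age_days, threshold_days):
    """A file qualifies for this run if it hasn't been accessed recently."""
    return file_age_days >= threshold_days

def phased_runs(file_ages, thresholds):
    """Yield, per run, how many *new* files become eligible for dedupe."""
    done = set()
    for t in thresholds:
        batch = {i for i, age in enumerate(file_ages) if eligible(age, t)} - done
        done |= batch
        yield t, len(batch)

ages = [400, 250, 200, 90, 30, 10]   # days since last access, per file
for threshold, count in phased_runs(ages, [180, 90, 15]):
    print(f"threshold {threshold}d -> {count} new files deduped this run")
```

Each run only touches the newly eligible files, so the copy-on-write load on the SavVol arrives in small batches instead of all at once.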
Rainer
Rainer_EMC (4 Operator, 8.6K Posts), April 30th, 2010 07:00
I understand.
By "temporarily" I meant that the blocks used by dedupe can later be reused by other checkpoints, not that the SavVol can shrink.
If you are concerned about space usage by checkpoints, I suggest setting a maximum size per SavVol, as well as tuning the dedupe parameters and monitoring usage.
Rainer
dynamox (9 Legend, 20.4K Posts), April 30th, 2010 07:00
Temporary? It extended my SavVol by 400 GB and I can't get that space back. How is that temporary? Sorry for being cynical, but I have to laugh at the proposal of deleting snapshots before enabling dedupe.
pmenon5 (1 Rookie, 96 Posts), March 14th, 2011 13:00
Hi Rainer,
Quick question. I do not have snapshots or checkpoints, but I do have Celerra replication in place for DR purposes. I am running 5.6.49.3 code and using Rep.v2.
My question is: if I enable dedupe on a 500 GB file system (92% full), is it going to affect the SavVol, since the file system is part of a replication? Is a SavVol used in Rep.v2?
Thanks in advance.
Rainer_EMC (4 Operator, 8.6K Posts), March 14th, 2011 15:00
Technically yes, since Replicator uses two internal checkpoints that also use a SavVol.
However, for a normal ongoing (not stopped) replication I don't think you will see much of a difference, since these checkpoints get refreshed every couple of minutes (basically the interval you set as your max time out of sync).
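A rough model of why those internal checkpoints stay small (Python; the write rate and refresh interval are invented workload figures): because each checkpoint is refreshed every sync interval, the SavVol only ever has to hold the block changes made since the last refresh, not a growing history.

```python
# Rough model: an internal replication checkpoint retains only the block
# copies made since its last refresh, so SavVol usage for it oscillates
# around a bound instead of growing over time. All figures are illustrative.

def peak_savvol_usage(write_rate_gb_per_min, refresh_interval_min, n_checkpoints=2):
    """Worst-case SavVol space held by the internal checkpoints at once."""
    return write_rate_gb_per_min * refresh_interval_min * n_checkpoints

# 0.5 GB/min of changed blocks, a 10-minute refresh, two internal checkpoints:
print(peak_savvol_usage(0.5, 10))  # bounded at 10.0 GB however long it runs
```

Contrast this with a long-lived user checkpoint, which keeps accumulating old blocks for as long as it exists.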
Rainer
pmenon5 (1 Rookie, 96 Posts), March 15th, 2011 09:00
Thanks Rainer!
(Referring to my environment.) In that case I do not need to worry about SavVol space utilization during dedupe, as discussed in the threads above, and I can go ahead and enable dedupe on the 500 GB filesystem (92% full).
Am I right? Just to reconfirm...
gbarretoxx1 (2 Intern, 366 Posts), March 15th, 2011 10:00
Hi,
You will probably need to extend the file system or lower its utilization before enabling dedupe:
"The file system must have at least 1 MB of free space before deduplication can be enabled. If there is not enough free space, an error message is generated and the server log is updated."
"During the deduplication process, the file system must have enough free space available that is equivalent to the size of the original file to be deduplicated plus the size of the compressed version of the file to be stored. An additional 1 percent of the file system must be free, or if auto-extension is enabled, an additional 1 percent below the auto-extension threshold must be free."
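The quoted free-space requirement works out to roughly the following (Python sketch for the simple non-auto-extension case; the file sizes are hypothetical):

```python
# Free space needed to dedupe one file, per the quoted requirement:
# original size + compressed size + 1% of the file system capacity.

def free_space_needed_gb(original_gb, compressed_gb, fs_capacity_gb):
    return original_gb + compressed_gb + 0.01 * fs_capacity_gb

# A 500 GB file system at 92% full has 40 GB free -- not enough to process,
# say, a hypothetical 30 GB file that compresses to 12 GB:
need = free_space_needed_gb(30, 12, 500)   # 30 + 12 + 5 = 47.0 GB
free = 500 - 460                           # 460 GB used at 92% full
print(need, free, need <= free)            # 47.0 40 False
```

So on a 92%-full 500 GB file system, larger files could fail the per-file space check even though dedupe can technically be enabled with only 1 MB free.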
Gustavo Barreto.