We have just started Archiving our Celerra onto the Centera
It looks like we have archived about 10 TB of data, and the Centera shows 10 TB of used space. My understanding was that the Centera does single-instance storage by default. Is there something we need to turn on for it? I can't believe that out of 10 TB of data nothing is the same.
EMCDennis
124 Posts
January 5th, 2010 11:00
Besides SIS, your 10 TB of used capacity also accounts for protection. The overhead consumed by protecting the data depends on the protection scheme (mirroring = 100% overhead, parity = 1/6th overhead).
Regarding SIS - below is from some old notes I have
There are 2 settings that determine if/when SIS is done.
On the Centera – Storage Strategy:
– Capacity – Allow SIS to occur (not entirely true)
– Performance Full - Enables fast addressing for blobs (larger than 256KB by default) and all C-Clips. Clips will be 53 characters
– Performance partial - Enables fast addressing for blobs only. Clips will be 27 characters
In the SDK – collision avoidance.
– Clip collision avoidance – method to ensure clipIDs are unique (GM naming)
– Blob collision avoidance – method to ensure blobIDs are unique (GM naming)
There are 3 naming schemes currently used by Centera
M
– MD5 hash
– 27 characters long
– Allows for SIS
– No timestamp in name
M++
– MD5 hash plus a 120 bit truncated SHA-256 hash of the data
– 53 characters long
– Allows for SIS
– No timestamp in name
– Differs from SDK format. Platform name starts with a G5
GM (w/ discriminator)
– MD5 hash plus a time stamp plus a header plus Unique Identifier (TS+Hdr+GUID)
– 53 characters long
– Does NOT allow for SIS
– Differs from SDK format. Platform name starts with a G4
There are two others but they are no longer used by current CentraStar code versions
– MG (53 characters)
– GM w/o discriminator (53 characters, Differs from SDK format. Platform name starts with a G6)
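The length and prefix rules in these notes can be turned into a rough classifier. This is only a sketch based on the notes above: the function name is hypothetical, it looks at nothing but the length and the first two characters of the platform name, and the sample names in the usage lines are made-up placeholders rather than real content addresses.

```python
def guess_naming_scheme(platform_name: str) -> str:
    """Guess the Centera naming scheme from a platform-side name.

    Uses only the rules quoted in the notes: M names are 27 characters,
    the other schemes are 53 characters, and the platform prefixes are
    G5 (M++), G4 (GM with discriminator) and G6 (GM without discriminator).
    """
    length = len(platform_name)
    if length == 27:
        return "M"
    if length == 53:
        prefixes = {
            "G5": "M++",
            "G4": "GM (with discriminator)",
            "G6": "GM (without discriminator, legacy)",
        }
        # MG and SDK-format names have no prefix listed in the notes.
        return prefixes.get(platform_name[:2], "unknown 53-character scheme")
    return "unknown"


# Placeholder names, built only to have the right length and prefix.
print(guess_naming_scheme("A" * 27))
print(guess_naming_scheme("G5" + "A" * 51))
```

A name classified as M or M++ is one that can be single instanced; anything reported as GM cannot.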
CLIPS
Storage Strategy    | SDK Collision Avoidance ENABLED | SDK Collision Avoidance DISABLED
Capacity            | GM (53 char)                    | M (27 char)
Performance Full    | GM (53 char)                    | GM (53 char)
Performance Partial | GM (53 char)                    | M (27 char)
BLOBS (small – less than 256KB in size for each fragment. That means the object must be smaller than 256KB for CPM or 1.5MB for CPP)
Storage Strategy    | SDK Collision Avoidance ENABLED | SDK Collision Avoidance DISABLED
Capacity            | GM (53 char)                    | M++ (53 char)
Performance Full    | GM (53 char)                    | GM (53 char)
Performance Partial | GM (53 char)                    | GM (53 char)
BLOBS (large – more than 256KB in size for each fragment. That means the object must be larger than 256KB for CPM or 1.5MB for CPP)
Storage Strategy    | SDK Collision Avoidance ENABLED | SDK Collision Avoidance DISABLED
Capacity            | GM (53 char)                    | M++ (53 char)
Performance Full    | GM (53 char)                    | M++ (53 char)
Performance Partial | GM (53 char)                    | M++ (53 char)
Remember:
M and M++ allow for SIS - Shown in Green
GM does NOT allow for SIS
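For anyone who wants to check a given combination quickly, the three tables can be written down as a small lookup. This is just a transcription of the tables in this post into Python, not anything taken from the SDK; the dictionary keys and function names are made up for illustration.

```python
# Naming scheme per object kind and storage strategy, transcribed from the
# tables in this post. Each tuple is
# (collision avoidance ENABLED, collision avoidance DISABLED).
NAMING = {
    "clip": {
        "capacity": ("GM", "M"),
        "performance_full": ("GM", "GM"),
        "performance_partial": ("GM", "M"),
    },
    "small_blob": {
        "capacity": ("GM", "M++"),
        "performance_full": ("GM", "GM"),
        "performance_partial": ("GM", "GM"),
    },
    "large_blob": {
        "capacity": ("GM", "M++"),
        "performance_full": ("GM", "M++"),
        "performance_partial": ("GM", "M++"),
    },
}

# Per the notes, only M and M++ names allow single instancing.
SIS_CAPABLE = {"M", "M++"}


def naming_scheme(kind: str, strategy: str, collision_avoidance: bool) -> str:
    """Return the naming scheme the tables list for this combination."""
    enabled, disabled = NAMING[kind][strategy]
    return enabled if collision_avoidance else disabled


def allows_sis(kind: str, strategy: str, collision_avoidance: bool) -> bool:
    """True if the resulting naming scheme can be single instanced."""
    return naming_scheme(kind, strategy, collision_avoidance) in SIS_CAPABLE


print(naming_scheme("large_blob", "performance_full", False))
print(allows_sis("clip", "capacity", True))
```

Per the tables, every combination with SDK collision avoidance enabled comes out as GM, so with that setting on nothing is single instanced regardless of the storage strategy.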
holgerjakob_c0722c
337 Posts
January 5th, 2010 23:00
Hi Martin
Dennis, thanks for the detailed explanation. It's great to see such a summary.
Martin, we manage 35 Centera systems and I hear your question quite often. From an admin point of view, there are two views of capacity on a system: the application point of view and the system point of view.
If you have archived 10TB of data, you would want your application point of view to show 10TB as you have archived that amount.
"show pool capacity" is the CenteraViewer CLI command that shows the application point of view of the archived data.
How much capacity that data really uses on the Centera is shown by "show capacity total", listed as protected user data.
If your Centera is in mirrored mode ("show protection" will tell you), you would expect protected user data to list 20 TB, as everything is stored twice. If you see less than 20 TB, the difference is what SIS saved.
If your Centera is in parity mode, you would expect about 11.7 TB (10*(7/6) for the 6+1 parity scheme) to be shown under protected user data. This is only true if 100% of your data is larger than the configured threshold for parity.
Assuming that 20% of the data is small files below the default 256KB threshold, 2 TB would be mirrored and 8 TB would be in parity: 2*2 + 8*(7/6) = 13.3 TB of protected user data. If you see less than that, the difference is what SIS saved you.
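If you want to repeat this estimate for your own numbers, the arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, and the sketch assumes only what is described above: small files are always mirrored (factor 2), and larger files use the 6+1 parity factor of 7/6 in parity mode or factor 2 in mirrored mode.

```python
def expected_protected_tb(archived_tb: float,
                          small_fraction: float = 0.0,
                          parity: bool = True) -> float:
    """Estimate 'protected user data' in TB if SIS saved nothing.

    archived_tb    -- application view of the archived data, in TB
    small_fraction -- share of the data below the parity threshold (0..1)
    parity         -- True for 6+1 parity mode, False for mirrored mode
    """
    small_tb = archived_tb * small_fraction
    large_tb = archived_tb - small_tb
    large_factor = 7 / 6 if parity else 2.0
    # Small objects are mirrored in both modes.
    return small_tb * 2.0 + large_tb * large_factor


def sis_savings_tb(expected_tb: float, observed_tb: float) -> float:
    """Difference between the estimate and what the Centera reports."""
    return expected_tb - observed_tb


# 10 TB archived, 20% small files, parity mode: roughly 13.3 TB expected.
estimate = expected_protected_tb(10, small_fraction=0.2)
print(round(estimate, 1))
```

Feed the protected user data value reported by the Centera into sis_savings_tb together with the estimate to see how much single instancing saved at that point in time.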
I add these explanations as I guess you want to verify your single instancing over time to see if it changes and how much Centera saved you.
CU, Holger
ICI
9 Posts
April 12th, 2010 09:00
Hi, just a follow-up question on this topic. According to the chart, even a Storage Strategy of Performance Full will result in M++ naming for large blobs. I would have thought that GM naming would be used for this due to its significant performance increase. I see no difference between Full and Partial in this chart. I would have expected Performance Partial to use M++ on large blobs and Performance Full to use GM.
Thanks
EMCDennis
124 Posts
April 12th, 2010 09:00
Performance Partial and Performance Full are equal when writing small objects (by default, under 256 KB).
The difference comes into play when dealing with large objects over the performance threshold (256 KB by default).