16 Posts

August 20th, 2009 23:00

XAM - containerization and embedded blobs?

Is it possible to use the concepts of containerization and embedded blobs with XAM?

I understand that embedded data can be configured globally by setting the FP_OPTION_EMBEDDED_DATA_THRESHOLD value, and I assume that this will also be applied when using XAM to ensure that the XStream is stored in the CDF?

To implement containerization, is it simply a matter of managing the metadata (i.e. indexing into a byte array of all files to be "containered") as fields on the XSet?

Regarding FP_OPTION_EMBEDDED_DATA_THRESHOLD, I have read that the normally recommended maximum is 100K, but that this can be increased to 1MB if necessary. Is the appropriate choice here based on the time taken to base64 encode/decode the blob, compared to the time taken to perform two writes/reads (i.e. CDF and blob)?

I will more often than not be dealing with blobs of an average size of ~300-400k - why wouldn't I store these as embedded data?

409 Posts

August 21st, 2009 04:00

Containerisation is completely independent of the API being used, so yes, you can use it with XAM.
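As a rough illustration of the indexing idea from the question, here is a minimal Python sketch, not tied to XAM or the Centera SDK. The function names (`pack_container`, `read_member`, `index_to_field`) are hypothetical, and storing the index as a single JSON string field is just one assumed way of keeping the metadata alongside the container.

```python
import json


def pack_container(files):
    """Concatenate payloads into one blob and record where each one lives.

    files: dict mapping a member name to its bytes.
    Returns (blob, index), where index maps name -> (offset, length).
    """
    blob = bytearray()
    index = {}
    for name, payload in files.items():
        # The member starts at the current end of the container.
        index[name] = (len(blob), len(payload))
        blob.extend(payload)
    return bytes(blob), index


def read_member(blob, index, name):
    """Slice a single member back out of the container using the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


def index_to_field(index):
    """Serialise the index to a string that could be stored as one metadata field."""
    return json.dumps(index, sort_keys=True)
```

The container bytes would then be written as one stream and the serialised index stored as metadata next to it, so a single member can be located without scanning the whole blob.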

You can also use embedded blobs by setting the .vnd.com.emc.centera.embedded.data.threshold field on the XAMLibrary. You can set all Centera options this way; look at the Centera VIM reference guide and search for "embedded" until you get to the correct part of the doc (page 36, I think).

You cannot increase the embedded blob size beyond a maximum of 100KB. You can, however, have multiple XStreams in the XSet, each up to 100KB and embedded, up to an overall limit of 100MB.
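To make those two limits concrete, here is a small Python sketch that sorts a list of stream sizes into "would be embedded" and "stored separately". It does not call the SDK. The function name `plan_embedding` is hypothetical, 1KB is assumed to be 1024 bytes, and whether the real threshold comparison is inclusive is also an assumption.

```python
# Limits quoted above, assuming 1KB = 1024 bytes.
EMBED_MAX_BYTES = 100 * 1024                 # per-stream ceiling
EMBED_TOTAL_MAX_BYTES = 100 * 1024 * 1024    # overall embedded limit per XSet


def plan_embedding(stream_sizes, threshold=EMBED_MAX_BYTES):
    """Return (embedded, separate) lists of stream indexes.

    A stream counts as embedded when it is no larger than the threshold
    (inclusive comparison is an assumption) and the running embedded total
    stays within the overall limit.
    """
    # A threshold above the per-stream ceiling is clamped to the ceiling.
    threshold = min(threshold, EMBED_MAX_BYTES)
    embedded, separate = [], []
    total = 0
    for i, size in enumerate(stream_sizes):
        if size <= threshold and total + size <= EMBED_TOTAL_MAX_BYTES:
            embedded.append(i)
            total += size
        else:
            separate.append(i)
    return embedded, separate
```

With blobs averaging 300-400KB, as in the original question, every stream lands in the `separate` list unless it is first split into pieces of 100KB or less.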

You are right about the trade-off in performance. The other thing to take into account is that when an XSet is opened, the SDK will read the whole XSet (except non-embedded XStreams) into memory and parse it (it's actually an XML file) before you do anything else. So in the case of a multi-XStream XSet with all XStreams embedded, you would be reading all XStreams into memory whether you were interested in reading them or not.

Also bear in mind that your content will not be single-instanced by the Centera if you use embedded blobs.

One other thing you should do if you use embedded blobs is set .vnd.com.emc.centera.buffersize to the total size of your XSet (not including non-embedded XStreams). When you read in an XSet, it is read into a memory buffer which is 16KB by default. So if you have embedded blobs, it is likely that the XSet is much larger than that, and the SDK will be forced to page parts of the XSet to disk, which will slow down performance.
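A back-of-the-envelope way to pick that buffer size is sketched below in Python. It assumes embedded streams are held base64-encoded inside the XSet (as the original question suggests), so each one occupies roughly 4/3 of its raw size. The function names and the `metadata_bytes` parameter are hypothetical, and the result is only an estimate, not a value reported by the SDK.

```python
import math

DEFAULT_BUFFER_BYTES = 16 * 1024  # default buffer size mentioned above


def base64_len(n):
    """Length in bytes of the padded base64 encoding of n raw bytes."""
    return 4 * math.ceil(n / 3)


def estimate_buffer_size(metadata_bytes, embedded_stream_sizes):
    """Estimate a buffer large enough to hold the XSet without paging.

    metadata_bytes: rough size of the XSet excluding stream payloads.
    embedded_stream_sizes: raw sizes of the streams that will be embedded.
    """
    estimate = metadata_bytes + sum(base64_len(n) for n in embedded_stream_sizes)
    # Never go below the default buffer size.
    return max(DEFAULT_BUFFER_BYTES, estimate)
```

For example, two embedded 100KB streams plus 2KB of metadata come out at about 269KB, well above the 16KB default.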

16 Posts

August 24th, 2009 22:00

Thanks for the info Paul.

I was unable to set .vnd.com.emc.centera.embedded.data.threshold on the XAMLibrary (xam/field not found), but was able to set com.emc.centera.embedded.data.threshold on the XSystem.

I'm doing some rudimentary performance testing on adjusting these values (buffer as well). Regarding performance, I am wondering if there will be a noticeable impact on performance as the number of items stored within the Centera increases?

409 Posts

August 25th, 2009 03:00

Sorry, I think I got suckered by a documentation error I had forgotten about when I cut and pasted part of the post. You have correctly found that it should be com.emc.centera.embedded.data.threshold.

You will possibly see a difference in performance between a completely empty cluster and one with a few million objects stored per node. After that there should be no difference up to the high object count limit for the node (currently 25 million objects per disk by default).
