1 Rookie • 9 Posts

September 26th, 2013 01:00

smartpools file policy with 'directory' type criteria

Hello,

I would like to hear if anyone has tried creating SmartPools file policy using the 'directory' file type filter criteria.

I would like to run a SmartPools job so that it will set all directories under /ifs/data, and only the directories, to store NEW files on a specific node pool. I don't want SmartPools to move the existing files, because there are over 1 billion of them and that would take forever (in fact the job has been running for weeks now without making progress). I only want NEW files to be created in the specified new node pool.

So will using the 'directory' file type help set the node pool target of all directories under /ifs/data, so that new files are created in that pool?

Shai

6 Operator • 1.2K Posts

September 26th, 2013 02:00

The 'directory' file type rules are very efficient because they don't rely on filename/attribute matching; instead, each affected directory stores the rule for new files within its own metadata.

Therefore, for this to become effective, each directory in question must be UPDATED once AFTER the SmartPools rule has been set. It took me a round trip to support to get this clear the other day; I think it's now better documented.

Normally the full SmartPools job does it... but you are seeking to avoid exactly that.

You can use

isi smartpools apply /PATH/TO/DIRECTORY

instead -- very fast -- but it needs to be applied to every single directory in turn...

You can also try out

isi smartpools apply --recurse --dont-restripe /ifs/data

and see whether it is more efficient than a crafted find-apply script.

Either approach can be split across sub-sub-dirs and run in parallel on multiple nodes.
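A minimal sketch of such a find-apply loop (hypothetical; `isi smartpools apply` is the command discussed in this thread, everything else -- the function name, the parallelism level, the APPLY override -- is an assumption for illustration):

```shell
# Hypothetical find-apply loop: visit every directory under a root and run
# the apply command on each one. APPLY can be overridden (e.g. with 'echo')
# for a dry run; -P 4 runs a few applies in parallel on one node.
apply_dirs() {
    root="$1"
    apply_cmd="${APPLY:-isi smartpools apply}"   # real command per this thread
    # -print0 / -0 keeps directory names containing spaces intact
    find "$root" -type d -print0 | xargs -0 -n1 -P 4 $apply_cmd
}
```

Splitting the work across nodes, as suggested above, would then just mean calling `apply_dirs` with a different subtree on each node.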

(How many disks are there in the cluster, and with or w/o metadata on SSD?)

-- Peter

1 Rookie • 9 Posts

September 29th, 2013 06:00

Hi Peter,

I am actually not looking to avoid running a full SmartPools job; I just want to confirm what you have said: 'directory' type based rules are effective in marking the directory metadata so that new files are created in the set target pool.

Thanks,

Shai

6 Operator • 1.2K Posts

September 29th, 2013 08:00

Hello Shai,

In fact, 'directory' based rules are different from other rules (file name, size, attributes, etc.) insofar as they can be applied to newly created files at the very moment of file creation rather than later, 'offline'. However, this does NOT happen by matching each new file's full pathname against the rules; instead, the file's immediate parent directory is consulted.

This difference is very important.

It means that this directory must have been actively 'prepared' to 'carry' the rule with it before files are created. The preparation is done either by the SmartPools job or by the 'isi smartpools apply' command.

The latter is also a fantastic tool for trying out new SmartPools rules on a smaller set of dirs/files, and it's also valuable for 'fire fighting' in case the regular SmartPools jobs cannot cope with the amount of incoming data.

(E.g. you can check with 'isi statistics heat -nall --classes=create' where new files are being generated, and run the 'apply' command just there.)

Can you do me a favor and send the number of disks in your pools (or the number of nodes plus node types), and whether metadata is on SSD? I'm testing a simple hypothesis on OneFS behavior...

Cheers

-- Peter

1 Rookie • 9 Posts

September 29th, 2013 08:00

Hi Peter,

I fully agree with your feedback. A key point I want to reiterate: I am considering not a straightforward path-based rule, but a rule which includes BOTH a directory path AND the 'directory' file type in the filter criteria.

My goal is to take an EXISTING directory and mark only the directory objects, because I want to avoid moving the hundreds of millions of files themselves. I want to tell the cluster to START putting NEW files (only) in the new disk pool target, and to ignore, and therefore bypass moving, the EXISTING files.

I am doing this because I have a cluster with 8 NL400 nodes and 4 X200 nodes and NO SSD. This is an ARCHIVE-ONLY environment with 1 billion files. I want to run a single SmartPools job so that new data will be created on the NL400s only.

After this job finishes (faster, because it avoids moving the actual files), I will have all NEW data created on NL400. That is my goal.

Hope this makes sense.

Shai

6 Operator • 1.2K Posts

September 29th, 2013 10:00

Ah, 'directory type' versus 'path', that had slipped my attention...

I don't think it is going to work: you want to 'fix the directories now' so that 'files get created on NL afterwards'. That's kind of having two different levels of operation in one rule; it seems the grammar of the SmartPools mechanism is not that powerful. If you restrict the rule to directories by type, it can hardly affect regular files later, only new (sub)directories.

Some quick tests with the 'apply' command should make this more familiar.

That said, the '--dont-restripe' option might come close to what you want; it's not really part of a rule, though.

Alternatively, if the larger part of the billion files is located in directories with no further subdirectories ('leaves' in the dir tree), then you can use 'find' with -prune on the leaves to relatively quickly identify all directories without getting stuck scanning the leaves! It depends a bit on the structure of the actual directory tree and how exactly it is populated with files.
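A hedged sketch of the -prune idea (hypothetical: it assumes the leaf directories follow a recognizable naming pattern, passed in as the second argument; adapt the pattern to your actual tree):

```shell
# Hypothetical sketch: print every directory under $1, but do not descend
# into directories matching the leaf pattern $2. Pruned leaves are still
# printed themselves (they need the rule applied too), but their file
# contents are never scanned.
list_dirs_pruning_leaves() {
    root="$1"
    leafpat="$2"
    find "$root" -type d -name "$leafpat" -prune -print -o -type d -print
}
```

The resulting directory list could then be fed to 'isi smartpools apply', one directory at a time.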

As for the file count: if a billion files sit on the 4 X200 nodes, i.e. 48 disks, that would be about 20 million (pretty small) files per disk. On the 8 NL400 nodes with 288 disks, it would still make about 3.5 million files per disk. Across the whole cluster, the ratio is about 3 million files per disk. (Important: 'files per disk' is a purely computational ratio here; as we know, each file is spread across multiple disks for protection.)
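The back-of-the-envelope numbers above can be reproduced directly (the disk counts are from this thread: 4 X200 nodes = 48 disks, 8 NL400 nodes = 288 disks):

```shell
# Files-per-disk ratios from the thread, as plain shell arithmetic.
files=1000000000      # ~1 billion files (from the thread)
x200=48               # disks in the 4 X200 nodes
nl400=288             # disks in the 8 NL400 nodes
echo "X200 only:     $(( files / x200 )) files/disk"            # ~20.8 million
echo "NL400 only:    $(( files / nl400 )) files/disk"           # ~3.5 million
echo "whole cluster: $(( files / (x200 + nl400) )) files/disk"  # ~3.0 million
```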

My hypothesis is that while a low files/disk ratio is not a guarantee for a well-working cluster (of course not; bandwidth or IOPS bottlenecks can be imposed by actual workloads), a high ratio is likely asking for trouble. With SATA and no SSD, things become 'interesting' probably in the area of 1 million files/disk: jobs need to be carefully scheduled, balanced and monitored. (Assuming a foreground workload that is 'adequate' for the node type in use.)

For 3 million files/disk there have been reported cases in the community where the foreground workload is doing fine -- until the background jobs kick in. Maybe there are pure SATA clusters working well at 3 million files/disk; I'd love to hear from those. With 20 million files/disk, things must be hopeless when it comes to any restriping.

Maybe it is a bold claim to say that the simple files/disk ratio is an indicator for any particular kind of cluster behavior. I repeat: it can hardly be a positive sign of good conditions, but it can help draw attention to potentially -- if not most likely -- problematic situations in advance.

You say your cluster serves as an archive. Are you sure that a disk or node failure can be handled properly, within a reasonable amount of time? The underlying restripe mechanism is mainly identical for the various jobs...

Looking forward to any feedback (also from others), and to learning how you finally deal with your system.

Cheers

-- Peter
