I can’t help but notice a continued war of words between the caching and tiering camps.
Come on folks, let’s not insult our readers and our customers. Neither option is a panacea – and even together, there are clearly edge cases in which both are ineffective.
There is a simple computer science principle at work here – locality of reference. Caches exploit temporal locality; pre-fetch algorithms exploit spatial locality. The principle of locality (and the larger one of predictability) is key to building any automation.
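To make the two kinds of locality concrete, here is a minimal sketch of a block cache, assuming LRU eviction and a naive sequential read-ahead. The names `LRUCache`, `prefetch` and `replay` are mine, purely for illustration, and do not describe any vendor's implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU block cache with optional sequential read-ahead (illustrative only)."""

    def __init__(self, capacity, prefetch=0):
        self.capacity = capacity      # number of blocks the cache can hold
        self.prefetch = prefetch      # blocks to read ahead on a miss (hypothetical knob)
        self.blocks = OrderedDict()   # block number -> None, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def _insert(self, block):
        self.blocks[block] = None
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block

    def read(self, block):
        if block in self.blocks:
            # Temporal locality: a recently used block is used again.
            self.hits += 1
            self.blocks.move_to_end(block)
            return True
        self.misses += 1
        # Spatial locality: also fetch the following blocks, farthest first,
        # so the requested block ends up as the most recently used entry.
        for nxt in range(block + self.prefetch, block, -1):
            self._insert(nxt)
        self._insert(block)
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(cache, trace):
    """Feed a sequence of block numbers to the cache and return its hit rate."""
    for block in trace:
        cache.read(block)
    return cache.hit_rate()
```

On a purely sequential trace such as `range(8)`, a four-block cache with no read-ahead never hits, while the same cache with `prefetch=3` hits on six of the eight reads. On a trace that keeps returning to the same two blocks, the plain LRU cache does fine on its own.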
Automated tiering also relies on temporal locality, although over a much larger time window. Rather than a block being hot over a period of seconds, it may be hot over a period of hours – or there may be a repeated pattern of that particular block being hot relative to its peers.
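A rough sketch of that idea, assuming the system simply counts accesses per block over an epoch (hours, say) and promotes the busiest blocks for the next epoch. The function names and the `fast_slots` parameter are hypothetical; real products use far more elaborate heuristics.

```python
from collections import Counter


def retier(access_log, fast_slots):
    """Pick the blocks to keep on the fast tier for the next epoch.

    access_log: block numbers touched during the last epoch.
    fast_slots: how many blocks the fast tier can hold (hypothetical unit).
    """
    counts = Counter(access_log)
    # Hottest first; ties are broken by block number to keep the result deterministic.
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    return set(ranked[:fast_slots])


def tier_hit_ratio(fast_tier, next_epoch_log):
    """Fraction of next-epoch accesses that land on the fast tier."""
    log = list(next_epoch_log)
    if not log:
        return 0.0
    return sum(1 for b in log if b in fast_tier) / len(log)
```

For example, `retier([7, 7, 7, 3, 3, 9], fast_slots=2)` promotes blocks 7 and 3. If the next epoch then reads `[7, 3, 7, 5]`, three of the four accesses are served from the fast tier, and block 5 is the surprise the previous epoch could not predict.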
Can your cache always be as large as your storage system? No, that would be absurd. So clearly the cache hit rate cannot be 100%. Hence, automated tiering. Can automation ensure that a hot block (or cold block) is always on the correct tier when it is accessed (or not)? No, access patterns are not 100% reproducible. So clearly the automated tier hit ratio cannot be 100% either.
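To put rough numbers on what a hit ratio below 100% costs, here is a back-of-the-envelope sketch of average access time as a weighted mix of the fast and slow media. The function name and the latencies in the note that follows are illustrative assumptions, not measurements.

```python
def effective_latency_ms(hit_ratio, fast_ms, slow_ms):
    """Average access time when hit_ratio of requests are served from the fast medium."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits pay the fast latency, misses pay the slow one.
    return hit_ratio * fast_ms + (1.0 - hit_ratio) * slow_ms
```

With a hypothetical 1 ms fast medium and 11 ms slow medium, a 90% hit ratio averages about 2 ms per access, twice what the fast medium alone would deliver, and a 50% hit ratio averages 6 ms. The misses dominate quickly, whichever mechanism produced them.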
If either of these methods fails you, the end result is less performance and more cost. In both cases, it is prudent to consider what happens when the storage system doesn’t do everything automatically … because ultimately it is your workflow, your data and your organization that need to function optimally.
I loved Jon Toigo’s recent post – an informed organization cannot avoid data classification, because it is part of understanding its workflow. Purely automated methods without any user input (both caching and fully automated tiering) are not sufficient to maximize performance and minimize cost – it is simple mathematics. A storage system has to give the user simple controls over how the system interacts with their data based on straightforward business rules, and then use those hints to automate to the hilt.
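As a sketch of what "hints first, automation for the rest" might look like, assume the user labels blocks with a data class and attaches a simple rule to some classes, while unlabeled classes are placed by observed heat. Everything here (`place`, the class labels, the `FAST` and `SLOW` constants) is made up for illustration and is not any product's interface.

```python
from collections import Counter

FAST, SLOW = "fast", "slow"


def place(block_class, hints, access_log, fast_slots):
    """Assign each block to a tier: user hints first, observed heat for the rest.

    block_class: dict of block number -> data class label chosen by the user.
    hints:       dict of class label -> FAST or SLOW (a business rule);
                 classes without a hint are left to automation.
    access_log:  block numbers touched during the last epoch.
    fast_slots:  number of blocks the fast tier can hold.
    """
    counts = Counter(access_log)
    placement = {}
    undecided = []
    for block, cls in block_class.items():
        hint = hints.get(cls)
        if hint in (FAST, SLOW):
            placement[block] = hint        # the user's rule wins outright
        else:
            undecided.append(block)
    # Fast-tier space left over once pinned blocks are accounted for.
    # (This sketch does not guard against hints oversubscribing the fast tier.)
    pinned_fast = sum(1 for tier in placement.values() if tier == FAST)
    free = max(fast_slots - pinned_fast, 0)
    # Automation fills the remaining space with the hottest unhinted blocks.
    undecided.sort(key=lambda b: (-counts[b], b))
    for i, block in enumerate(undecided):
        placement[block] = FAST if i < free and counts[block] > 0 else SLOW
    return placement
```

Suppose block 1 belongs to a "month-end" class pinned to the fast tier and block 2 to an "archive" class pinned to the slow tier. Block 1 stays fast even when it was idle all epoch, block 2 stays slow even when a one-off scan made it look hot, and the remaining fast-tier space goes to whichever unhinted blocks were busiest.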
The computer science principle that most people tend to remember and forget at the same time is the best one of all: KISS.
I want maximum performance and minimal cost without sacrificing an ounce of simplicity. Don’t you?