
Unsolved


January 22nd, 2014 17:00

What is Isilon's minimum RTO, and are snapshots supported?

By minimum RTO I mean the minimum amount of data that could be lost in async mode.

Thank you!

2 Intern

 • 

20.4K Posts

January 22nd, 2014 17:00

2 Intern

 • 

211 Posts

January 22nd, 2014 18:00

I found SnapshotIQ for snapshots.

But I could not find the minimum RTO.

2 Intern

 • 

20.4K Posts

January 22nd, 2014 18:00

RTO - recovery time objective - what is your question?

2 Intern

 • 

211 Posts

January 23rd, 2014 02:00

I am looking for the parameter that defines the minimum amount of data that can be lost during replication in async mode.

For example, I can schedule data transfer every hour, or every 10 minutes, etc. If I schedule it every 10 minutes, then I could lose 10 minutes' worth of data. So, what is the minimum interval I can define for Isilon replication?

I am not sure whether this parameter should be called RTO or not, but you know what I mean...

1.2K Posts

January 23rd, 2014 04:00

That's RPO btw...

And ironically while the name is "SyncIQ", it always operates in "async" mode...

As I see it,  two questions can be asked:

What's the minimum interval ("RPO") in scheduled mode?

And what can one expect from the new "continuous" ("when-source-modified") mode?

Technically, the minimum schedule interval seems to be 1 minute, but few if any 6.5 clusters would be able to keep up with that (on large directories).

With 7.0 and 7.1, diffs are more efficient, so one might try out somewhat ambitious RPOs to see what's achievable in a given  environment.

What I find really interesting is the new continuous mode, which I understand to be an efficient "best efforts" method with the chance of an RPO equal to the sum of the change-detection, network, and target-write latencies (a rough sketch of that decomposition is below). If changes are detected in real time, and traffic doesn't get stuck due to load or network throttling, that can give you a sub-second RPO in the ideal case...
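Just to make that decomposition concrete, here is a minimal back-of-the-envelope sketch; all of the latency figures are made-up placeholders, not measurements from any cluster:

```python
# Rough model of the best-case RPO in "continuous" (when-source-modified) mode.
# All numbers below are hypothetical placeholders for illustration only.

detect_latency_s = 0.2    # time for OneFS to notice the change (assumed)
network_latency_s = 0.05  # time to ship the delta to the target (assumed)
target_write_s = 0.1      # time for the target cluster to commit it (assumed)

best_case_rpo_s = detect_latency_s + network_latency_s + target_write_s
print(f"Ideal-case RPO: ~{best_case_rpo_s:.2f} s")

# Under load or network throttling the delta queues up instead, so the
# achieved RPO grows by however long the change sits waiting:
queueing_delay_s = 30.0   # e.g. throttled for half a minute (assumed)
print(f"Throttled RPO: ~{best_case_rpo_s + queueing_delay_s:.2f} s")
```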

But how are changes actually detected, and what latencies should be expected for this?

And when are snapshots taken here, if at all?

Any insights are welcome!

-- Peter

93 Posts

January 23rd, 2014 08:00

Unfortunately, the answer is, "it depends."

It completely depends on how much data is changing (and, to a somewhat lesser extent, the type of data) over time.

Assume we have completed an initial SyncIQ replication (the initial replication often takes longer than the subsequent incrementals).

Let us suppose that 1M files change every 5 minutes.  It takes a finite amount of time to process these changes in the SyncIQ snapshot from the last time a complete SyncIQ job finished.

Now consider the amount of data that changed; 100MB?  500GB?  This will influence the amount of time that the SyncIQ job will take to transfer the data and complete the job; undoubtedly longer than 5 minutes.

So we cannot set our minimum SyncIQ "trigger" to 5 minutes if lots of data is rapidly changing; the job will not complete in that time and we will not make our RPO.  You have to choose a reasonable amount of time between replication events that allows the cluster to successfully complete a replication.

There is overhead involved with each SyncIQ job; if they are triggered too often they won't complete before the next one is scheduled.
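As a rough illustration of that trade-off, here is a sketch of the arithmetic; all of the input numbers are hypothetical, and real SyncIQ jobs have snapshot and tree-walk costs this simple model only lumps into a fixed overhead:

```python
# Back-of-the-envelope check: can a SyncIQ schedule interval keep up with
# the change rate? All inputs are hypothetical examples, not measurements.

def min_viable_interval_s(changed_bytes_per_s: float,
                          replicate_bytes_per_s: float,
                          per_job_overhead_s: float) -> float:
    """Smallest schedule interval for which each job finishes before the
    next one is due, assuming steady change and transfer rates."""
    if replicate_bytes_per_s <= changed_bytes_per_s:
        raise ValueError("Replication cannot keep up with the change rate at all.")
    # During an interval T we accumulate changed_bytes_per_s * T bytes;
    # the job then needs overhead + (accumulated bytes / throughput) <= T.
    # Solving for T gives:
    return per_job_overhead_s / (1 - changed_bytes_per_s / replicate_bytes_per_s)

# Example: ~100 MB of change per minute, ~50 MB/s replication throughput,
# and 60 s of fixed per-job overhead (all assumed values).
interval = min_viable_interval_s(changed_bytes_per_s=100e6 / 60,
                                 replicate_bytes_per_s=50e6,
                                 per_job_overhead_s=60)
print(f"Schedule no more often than every ~{interval:.0f} s")
```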

Does that make sense?  Or have I misunderstood your question?

Cheers,

Matt

1.2K Posts

January 23rd, 2014 22:00

Thanks Matt, that's absolutely clear.

But how does the new "continuous" mode work in practice, where no triggers are scheduled and no RPO is explicitly stated?

Of course it depends, but assuming that the target cluster can write as fast as the source cluster, and network bandwidth to the target is as good as to the source (from clients), what latencies (achieved RPOs) can be expected?

Cheers

-- Peter

99 Posts

January 24th, 2014 06:00

Yes, it is named SyncIQ because the goal is to synchronize the metadata between the two clusters.  The data may take some time to replicate - it depends on the size of the file along with many other network factors - but identical metadata is job 1.

As for the 7.1 continuous mode, the premise is simple.  Shortly after a change is recognized by OneFS, a sync job will begin to move that change (delta) to the target.  'Shortly' means a few seconds - you may see the job begin anywhere from 1 second to a few seconds, depending on how busy the cluster is at the moment.

The goal is to establish a sub-minute RPO.  Try it out with the virtual machine; set up two virtual clusters, and establish a 'continuous' policy.  Ingest a new file into the source cluster and watch what happens. Then open that file, modify it, and watch.  Rinse, lather, repeat :-)
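If you want to put a number on what you see while experimenting, here is a minimal timing sketch. The two mount points are hypothetical NFS/SMB mounts of the policy's source directory and its replicated copy on the target cluster, and polling is just one simple way to observe the lag:

```python
# Minimal sketch for measuring the achieved RPO of a 'continuous' policy.
# /mnt/isilon-source and /mnt/isilon-target are hypothetical mounts of the
# policy's source directory and its replicated copy on the target cluster.
import os
import time

SOURCE = "/mnt/isilon-source/rpo-probe.txt"   # assumed mount of the source path
TARGET = "/mnt/isilon-target/rpo-probe.txt"   # assumed mount of the target path

payload = f"probe {time.time()}\n"

start = time.monotonic()
with open(SOURCE, "w") as f:
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())          # make sure the change has landed on the source

# Poll the target until the replicated file carries the same payload.
while True:
    try:
        with open(TARGET) as f:
            if f.read() == payload:
                break
    except FileNotFoundError:
        pass                      # not replicated yet
    time.sleep(0.5)

print(f"Change visible on target after ~{time.monotonic() - start:.1f} s")
```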

1.2K Posts

January 24th, 2014 07:00

Rob, thanks a lot!

57 Posts

July 14th, 2015 08:00

Thought I would post a link to Eyeglass (full disclosure: my company develops it). It has an RPO reporting module for SyncIQ to trend and report on RPO (the age of data for failover) based on the cluster change rate (also per policy).

The release below is aimed at RTO, i.e. reducing the time needed for the failover steps through automation. We have reduced the manual steps to under 5 minutes.

Superna - Eyeglass Isilon Release 1.4 Overview
