October 31st, 2007 02:00

SRDF/STAR: 1 R1 and 2 R2 --> what happens when the R1 goes down ?

I was wondering... SRDF/STAR has 1 R1 and 2 R2's, which are kept in sync using SRDF/S and SRDF/A. What happens when the R1 site goes down? I can imagine that the /S R2 somehow becomes the new R1 and that the /A R2 will be incrementally updated?
Can either of the 2 R2's become the new R1?
Will the R2 that becomes the new source be a true R1, or is it a virtual R1, so that you end up replicating between 2 R2's?

2.8K Posts

October 31st, 2007 04:00

As you noted, when the "old workload" site comes up again you'll have a "real" workload site with the R1 devices that are working and "fake" R1 devices that are simply a leftover.

1) SRDF will NEVER EVER overwrite data without prompting you :-)
2) querying the RDFG between the "real" workload and the "old" workload will report the pairs as Invalid (you'll have two R1 devices) ;-)

If you want to repair your STAR you need to

1) disable and destroy STAR config (more on this later)
2) break the links between the two R1 devices
3) use symrdf half_deletepair to carefully clean up the configuration
4) bring the links online again
5) create the pairs and issue a full establish leaving the pairs in ACP_DISK
6) recreate STAR and connect/protect the sites, then enable STAR again :-)

Please note that while STAR is disabled, you lose the incremental establish in case of disaster .. But your data stays protected, since disabling STAR leaves the groups in ACP_DISK (and you can turn them to async or sync by hand)

As I said before .. it's a LOT of HARD work :-) .. But can be done ... At least we did :D

2.8K Posts

October 31st, 2007 03:00

RRR some quick answers to your complex and interesting questions ...

Let's think in terms of sites (instead of R1 or R2 devices) .. With STAR you have THREE sites: the Workload site, the Sync target and the Async target. Your Workload site is the site where you have the R1 devices, while the Sync and Async targets are (obviously) where you have the sync and async legs of your concurrent RDF.

When you define a STAR, you need a third RDFG between the Sync and the Async target. In fact you have a triangle (and not a real star ;-) .. But it was easier to find a meaning for 4 letters (STAR) instead of 8 ;-)

The workload site is so called since it is the place where you'll have your working hosts, while in the other sites you'll have "silent" hosts waiting for a disaster :-)

When the Workload site fails, you can move the workload (R1 personality) from the current site to another site. You can choose whichever site you want. symstar will execute the needed half_deletepair and half_swap commands and will give you real R1 devices on the NEW workload site, turning the OLD workload devices into R2 devices. While switching the workload, all the establishes will be incremental (that's the main point of STAR against the competition).

Obviously you'll usually switch between the workload and the sync target, since switching the workload to the async target means you'll lose data (up to 60 seconds) due to SRDF/A. And you usually DON'T WANT TO LOSE DATA ;-) .. If you already had a disaster and switched from the old workload to the old sync target, and a new disaster brings down your old sync target, you still have a last chance: move the workload to the third (async) site.

If you do a planned switch between sites you'll end up with a new workload site that pushes data to the sync and async targets, just as before. You simply moved the workload from one site to another.
In case of a real disaster, the old workload site will be unavailable, so symstar won't be able to touch it and turn it into a sync target.
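
To make the difference between the planned switch and the disaster case a bit more concrete, here is a toy model in Python. It is only a sketch of the role bookkeeping described above, not anything symstar actually runs: the `Site` class, the site names and the `switch_workload` helper are all made up for illustration, and the rule that the old workload site takes over the role the new workload site just vacated is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    role: str               # "workload" (R1 side), "sync" or "async" (R2 targets)
    reachable: bool = True  # False models a site lost in a disaster


def make_sites():
    """Hypothetical three-site STAR layout: A = workload, B = sync, C = async."""
    return {
        "A": Site("A", "workload"),
        "B": Site("B", "sync"),
        "C": Site("C", "async"),
    }


def switch_workload(sites, new_name):
    """Move the R1 personality to sites[new_name] (toy model only)."""
    old = next(s for s in sites.values() if s.role == "workload")
    new = sites[new_name]
    if not new.reachable:
        raise ValueError(f"site {new_name} is not reachable")
    if new is old:
        return
    vacated_role = new.role
    new.role = "workload"
    if old.reachable:
        # Planned switch: the old workload site can be turned into a target.
        old.role = vacated_role
    # Disaster: the old site cannot be modified, so it keeps its stale
    # "workload" role and will still claim R1 when it comes back.


def workload_sites(sites):
    """Names of all sites that currently claim the workload (R1) role."""
    return sorted(name for name, s in sites.items() if s.role == "workload")
```

With all sites reachable, `switch_workload(sites, "B")` leaves exactly one workload site (B) and A becomes the sync target. If A is marked unreachable first, the same call leaves both A and B claiming the workload role, which is the "two R1" situation that has to be cleaned up by hand once the old site returns.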

Remember that switching sites while all the boxes are available and connectivity between boxes is OK is almost painless.. Switching sites while either boxes or connectivity are NOT good means you will "failover" painlessly .. but the "failback" will require some hard work :-)

Just in case you ask .. switching sites with symstar is obviously TOTALLY MANUAL .. when a whole site fails, you already have BIG PROBLEMS .. if the storage decides to "switch" the workload back and forth, you'll have BIGGER problems ;-)

-s-

PS and fortunately it was a quick answer ;-)

5.7K Posts

October 31st, 2007 04:00

Hypothetically speaking: when the workload site goes down we will do a failover to, for example, the sync site, so the sync R2 will become the new R1 and the async R2 will get incremental updates from the new R1.
But what happens when the original workload site comes up again? Since that site was really, really down, the R1 didn't get a chance to become an R2 and we end up having 2 R1's! HELP... What is going to happen at that time? Will the old workload R1 resume syncing to the async R2? What happens with the sync R2 that was promoted to R1?

5.7K Posts

October 31st, 2007 06:00

I guess you need a test setup to do some serious testing with this. I'll ask my manager for 3 small DMX's right away!
I'm sure you don't want to find out the hard way if anything goes wrong ;-)

2.8K Posts

October 31st, 2007 06:00

You don't need a whole DMX to test STAR ;-)

In fact if you want to test STAR you need

- three boxes with RF ready for switched-dynamic-concurrent-RDF
- a SAN where your SRDF traffic will flow
- some volumes on the three boxes
- a lot of help from EMC :-)

Setting up STAR requires you to create RDFGs between the boxes, create pairs as needed (you can have STAR if you already have a concurrent RDF) and -obviously- the right license on the hosts (that's why you need help from EMC :-) ). That's it :D

We have our "teststar" (that's the name of the consistency group) that is made of its own three RDFGs and a bunch of devices.. The RDFGs are defined on a single RF processor only. We can put that single RF processor offline on one of the boxes and simulate a real disaster. Obviously without affecting other hosts :-)

5.7K Posts

October 31st, 2007 07:00

I still want 3 DMX's :p

But you are right of course... Our 3rd DMX arrived yesterday (a DMX4) and I think I'm going to ask my EMC account manager about a STAR license. This could be a great disaster tolerant solution ;)

2.8K Posts

October 31st, 2007 08:00

whooops .. If you want to test with STAR, think about setting up at least 4 hosts that will coordinate STAR. They must be dedicated to STAR. No applications, no database, no anything :-)

Set up 2 hosts on the workload site, 1 host at the sync target and another host at the async target. They will need only an O.S. and Solutions Enabler. No need for tons of disk or RAM. Simply a small CPU and an HBA, since you need to see gatekeepers ;-)

5.7K Posts

November 1st, 2007 00:00

4 hosts just for some very light Solutions Enabler work? Will VM's do the trick? Can't we do the STAR stuff on our ECC hosts?
We have 6 hosts set up for ECC at the moment and I can imagine they can manage to handle STAR as well...

2.8K Posts

November 1st, 2007 06:00

The manuals talk about dedicated hosts .. Will VMs do the work ?? Don't really know .. our customer had 7 spare little hosts and 7 HBAs so we made it "a little" redundant :-) .. We have 3 hosts on the usual workload site, 2 at the sync target and another 2 hosts at the async target .. just in case .. ;-)

If you want to TEST star I think you can use ECC hosts. But think about real hosts if you want to go "live" with a real configuration.

5.7K Posts

November 1st, 2007 07:00

What is the actual function of those dedicated star hosts ? What is it that they do all day ? Besides wasting power and producing noise and heat of course...

5.7K Posts

November 2nd, 2007 00:00

I just read a piece on EMC GDDR and when I look at the pics I see something that looks a lot like SRDF/STAR. Can you compare the two, or are these two the same thing with a different name?

2.8K Posts

November 2nd, 2007 03:00

When you install Solutions Enabler on the control hosts you need to enable "STAR support" .. This will install the "symstar" command.

The SCSI commands will be issued by "storrdfd", which is part of the usual Solutions Enabler.. So strictly speaking the "hard work" will be handled by storrdfd. But you also need to install symstar to configure and manage the environment.

5.7K Posts

November 2nd, 2007 03:00

These control hosts: do they need an agent of some kind? An "SRDF" agent for example?

Waaw... I can imagine /STAR being critical.... I sure would like to play around with that !

5.7K Posts

November 2nd, 2007 03:00

Thanks for explaining the difference between GDDR and STAR to me. That one's handy for my exam in 3 hours...

2.8K Posts

November 2nd, 2007 03:00

I'll try to answer both questions in a single post :-)

1) GDDR is the same as STAR .. GDDR means Mainframe .. STAR means Unix/Windows

2) Control hosts have a CRITICAL role in STAR. While usually with SRDF/A the STORAGE switches cycles every 30 seconds, with STAR (or GDDR) it's up to the host to switch the cycles. So the host is required to send given SCSI commands to the storage with strict and accurate timing. Losing the control host will bring the whole STAR down. That's why it's suggested to have 2 control hosts at the primary site .. and that's why we have 3 control hosts at the primary site ;-)
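
A tiny Python sketch of why the control host matters so much. This is a toy timeline, not how storrdfd works internally: the function names are invented, and the 30 second interval is just borrowed from the plain SRDF/A figure above as an assumed value for the host-driven case.

```python
def cycle_switch_times(duration_s, interval_s=30, host_fails_at_s=None):
    """Times (in seconds) at which the control host triggers a cycle switch.

    Toy model: the host switches cycles every interval_s seconds and
    stops for good once it has failed.
    """
    times = []
    t = interval_s
    while t <= duration_s:
        if host_fails_at_s is not None and t >= host_fails_at_s:
            break  # no control host left to send the switch
        times.append(t)
        t += interval_s
    return times


def seconds_since_last_switch(duration_s, times):
    """How long the async leg has gone without a new cycle at duration_s."""
    last = times[-1] if times else 0
    return duration_s - last
```

With a healthy host, `cycle_switch_times(120)` gives a switch every 30 seconds. With `host_fails_at_s=70` the switches stop after the one at 60 seconds and the gap reported by `seconds_since_last_switch` just keeps growing, which is the reason for having a second (or third) control host at the workload site.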