This post is more than 5 years old
207 Posts
0
3516
SRDF for SQL Server Database
I'm new to SRDF - I took a class several months ago but have not used it outside of the class. We will be buying 2 new DMX4s later this year. We will do SRDF/A from California to Texas. We have mostly AIX and Windows/SQL data. The AIX file systems are pretty straight forward to me.
It seems like we would setup device groups that contain the sym devices that make up our AIX file systems. By default the file system data would move to Texas every 30 seconds. I realize we could change the SRDF cycle time to a different value if it makes sense. The changed tracks would go from the source R1 devices to the target R2 devices each cycle. If we have a disaster at in Ca (the source R1 site) we would run the symrdf failover command. On the target side in TX we would then run importvg on the AIX volume groups and mount the file systems. We would then be up and running in TX.
The SQL Server environment seems more challenging. Each database is made up of a data file and a transaction log file that are on different sym devices (actually a 24-way meta device for the data and a smaller 12-way meta for the log). Obviously the data and log info must be consistent. Do I need to use some special Time Finder product (someone referenced TSIM)? How does this process work?
It seems like we would setup device groups that contain the sym devices that make up our AIX file systems. By default the file system data would move to Texas every 30 seconds. I realize we could change the SRDF cycle time to a different value if it makes sense. The changed tracks would go from the source R1 devices to the target R2 devices each cycle. If we have a disaster at in Ca (the source R1 site) we would run the symrdf failover command. On the target side in TX we would then run importvg on the AIX volume groups and mount the file systems. We would then be up and running in TX.
The SQL Server environment seems more challenging. Each database is made up of a data file and a transaction log file that are on different sym devices (actually a 24-way meta device for the data and a smaller 12-way meta for the log). Obviously the data and log info must be consistent. Do I need to use some special Time Finder product (someone referenced TSIM)? How does this process work?
bodnarg
385 Posts
0
March 31st, 2008 06:00
Since it is "crash consistent" as has been pointed out the biggest risk revolves around two main points for recovery timing:
1) How busy your application is at the "crash" point. The more pending updates the more work to rollback the changes and start the database.
2) How your application is written. If the application is poorly written it may have a lot of data "pending" and not "committed" which could result in long recovery times. Have seen this on numerous database types (Oracle, SQL, Sybase, etc.) so this is a common issue.
SRDF/AR has its place, but the biggest issues I have with seeing that recommended over SRDF/A are:
1) You have more to manage (scripts, software tools, etc.) which are more likely to break down in a real disaster solution. I have seen numerous multi-step DR processes fail due to lack of maintenance or lack of understanding on how to implement them. Also recovery tends to require more steps than crash recovery.
2) You will increase your RPO (i.e. data-loss) because you will automatically be talking about minutes if not hours depending on your process.
The only reasons I could find to argue for SRDF/AR over SRDF/A are:
1) Your database truly may not recover from a crash consistent copy (if so why in the world are you using that application?)
2) You have a batch intensive application which only does updates at a certain period. It may be more efficient to take a consistent image after the update process and ship via SRDF/AR.
3) You want to use the image for something other than DR recovery. In that case having a "database consistent" copy may be necessary. Some examples for this may be making a copy for your development environment which may require database consistency to do things such as rename your database, apply additional transaction logs, etc.
Big point (in my mind at least) to remember is that a DR copy is NOT the same as a backup copy. You are generally taking these to recover from a major disaster which means the simpler the process the more likely you are to succeed in recovering your applications.
xe2sdc
2.8K Posts
0
March 27th, 2008 17:00
Having that said, you don't need any TF trick since SRDF/A will guarantee that all the volumes in the same RDFG working in ASYNC mode will be "consistent", meaning that every 30 seconds R1 devices will "capture" writes and transmit them to the R2 side. On the other side of the RDF link R2 devices will apply changes received from R1 side as soon as they receive all the changes. The host that see R2 devices can see an image somewhere from 30 to 60 seconds in the past of R1 devices.
AranH1
2.2K Posts
1
March 28th, 2008 10:00
AranH1
2.2K Posts
0
March 28th, 2008 11:00
And yes it is the same type of consistency as a host crash will give you. But will the database recover from it? It depends on the database. Sometimes it requires a dba to roll back transactions until they find a point in the database that they can recover from. I have seen with really large SQL databases (greater than 1TB in size, which is large for SQL) that crash recovery does not always work and requires rolling back to get the database online.
And in Microsoft white papers on Disaster Recovery they do not guarantee crash consistent databases for DR and recommend a method for creating SQL consistent snapshots for guaranteed recovery.
AranH1
2.2K Posts
0
March 28th, 2008 11:00
And the source production cluster has always been able to recover from failovers and crashes, but the replicated data in a crash consistent format has not seemed as resilient at crash recoveries. Not sure why, that is why we have the SQL snapshots every four hours: for a guaranteed RPO.
xe2sdc
2.8K Posts
0
March 28th, 2008 11:00
But what happens if you can't use VDI or VSS ?? What can you do ?? You can use ECA (-consistent option) and trust the storage.
Think to SRDF/A like a consistent snap every 30 seconds (capture) and later you move the snap elsewhere (on the R2 devices).
SRDF/A will guarantees that the image of your data on R2 devices is a "consistent" image, a snapshot taken 30 to 60 seconds ago.
xe2sdc
2.8K Posts
0
March 28th, 2008 11:00
crash will give you. But will the database recover
from it?
Will your database restart if your main host crashes and your cluster switches the application on the other host ?? I hope it will recover. If it won't recover, ask M$ to polish their filesystem and their database
xe2sdc
2.8K Posts
0
March 28th, 2008 11:00
SRDF/A will give you the same image of your data that a crash would give you .. Do you think that your database will recover from an host crash ?? I hope it will
xe2sdc
2.8K Posts
0
March 28th, 2008 11:00
If you trust EMC (and SAP) you can use SRDF/A and have recent data at your R2 site.. If you want "consistent snapshots" you have SRDF/AR that will combine TimeFinder and SRDF to move your data at disaster recovery site. But now your data will be less recent since SRDF/AR will work in cycles and every cycle takes some time. You have to establish BCV volumes at primary site, split them in a consistent way and then you can move data from R1-BCV to R2 devices. SRDF/AR will take care of all the steps .. But they will take time and R2 image will be older.
You have to choose .. If you can't loose any data, use SRDF/S. If you can tolerate to loose data, use SRDF/A if you want to loose only a few seconds of data. If you can tolerate to loose minutes or hours of data, go SRDF/AR
Now you can choose.. But don't ask for discounts !!
xe2sdc
2.8K Posts
0
March 28th, 2008 13:00
AranH1
2.2K Posts
0
March 28th, 2008 13:00
I would think that whether we are talking about SRDF/A, MirrorView/A, or RecoverPoint CRR, each transport protocol needs to ensure that there is write order consistency at the target site. And write order consistency is still only crash consistent whether it is SRDF/A or RecoverPoint CRR. Each of the technologies mentioned would still need an additional mechanism to provide SQL consitency.
AranH1
2.2K Posts
0
March 28th, 2008 13:00
As always I appreciate all the information you share so frequently on this forum
brad12341
207 Posts
0
March 28th, 2008 14:00
Excellent info! As an ex-DBA I can relate to some of the issues Aran is talking about.
We use Microsoft clustering with our SQL databases. If the active host fails the passive node will come alive and take ownership of the database on the symm (basically the .mdf and .ldf files). As the passive node starts SQL Server it will do some kind of transaction backout/cleanup that usually takes 30 seconds to three minutes depending how many transactions were in flight. I don't fully understand this process but it seems to work ok because the DB comes up on the second node. It's very likely that some of the last transactions may be lost.
It sounds like regular old srdf/a of the .ldf and .mdf files would give us the same kind of recovery that occurs on a local cluster node failure. Is this correct? That is SQL would come up in Texas pointing to the meta devices in Texas that make up the .ldf and .mdf files - some transaction recovery/backout would occur and hopefully we would have a good database in Texas.
If this is the case than it seems that SRDF could protect me as well as the current cluster environment protects me locally. It sounds like we are calling this "crash" consistent.
If I wanted to do better - it sounds like I need to use SRDF/AR to combine Time finder clones with SRDF - get a consistent mirror and then replicate it to Texas.
It seems like I can take it one step further by using TimeFinder/SQL Server Integration Module (TF/SIM). Are there additional costs for SRDF/AR and TF/SIM?
Thanks for the great info.
brad12341
207 Posts
0
March 31st, 2008 09:00
What these techniques don't have access to is the SQL buffer/memory on the active node. Upon failure of the active node everything in memory/buffers is lost.
This can cause some problems but it sounds like most of the time when SQL comes up on the failover node attached to the .mdf and .ldf it can usually do some tran log recovery that will allow the database to come up ok. Obviously it won't be exactly the same as before the failure because in flight transactions that a user thought were committed may end up getting rolled back.
That's ok as long as we have a working database at the DR site and we can resume business. Hopefully we won't see corruption when the DB comes up on the failover box. If corruption does occur you may have to go back to a snapshot or other backup.
xe2sdc
2.8K Posts
0
April 1st, 2008 06:00
I expect that any DB will log each transaction (update) even before servicing it, keeping tracks on the disk of what's going on ..
I think that the only transactions that will be "only in cache" are selects and similar "read only" operations.