
June 26th, 2012 05:00

What's the Oracle Redo Log Layout on FAST VP?

Hello, buddies.

When I looked through many white papers regarding Oracle and FAST VP on VMAX, in some cases the redo logs were placed in an FC pool consisting of isolated, dedicated physical FC disks.

As the attached example shows:

FAST_VP_Example.png

Option 1 is to allocate dedicated physical disks (20 ea) to the Redo pool.

Option 2 is to allocate shared physical disks (40 ea) to the Redo pool.

What is the best practice?

Should we also consider physical disk contention between the redo logs and the user data files on FAST VP?

FAST VP is not widely implemented in Korea.

So I need real reference configurations or Oracle best practices from other regions, such as the US or EMEA.

Best Regards

YongDae.

15 Posts

June 26th, 2012 08:00

Hi YongDae,

The Oracle log writer process I/O profile is sequential writes, and the Oracle archiver process issues sequential reads on the logs. Since ALL writes in VMAX go to Symmetrix cache (which is considered persistent), the disk technology behind them matters less, as long as the cache can destage them in the background fast enough not to create a bottleneck.

As such, in general, EFDs in VMAX are best used to reduce read response time or improve read throughput. Therefore there are typically much better candidates for EFDs than the logs, such as data files in OLTP workloads (random reads are the primary workload, with some percentage of random writes and occasional batch), large indexes in DSS workloads, and so on.

So, based on the guidelines above, a sequential write workload such as the logs is generally best served by an FC tier with enough drives to sustain it. Because logs are relatively small (compared to other database objects), isolating drives for them is wasteful, and more often than not someone will try to save cost and not allocate enough spindles to sustain the write workload (note: FC drives have predictable write IOPS capabilities, regardless of storage vendor).
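
As a rough back-of-envelope illustration (the figures here are assumptions, not measurements): if the redo stream peaks at around 2,000 write IOPS and the log pool is RAID-1 (two back-end writes per host write), the drives have to absorb roughly 4,000 back-end IOPS; at something on the order of 180 IOPS per 15K FC drive, that works out to about 22 drives for the log workload alone, which is easy to underestimate when isolating "a few disks for redo".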

Therefore, it is best to use the same FC drives as the database and simply take a small portion of each and provide it to the logs. There is no wastage, and plenty of drives are available with this method. Striping small on the host with an LVM is ideal, for example with Oracle ASM. I'll only mention that in the ASM case it is recommended to use fine-grained striping for the logs (which was the default prior to 11gR2, but for some unknown reason Oracle changed it).
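
Just to illustrate the ASM side of that recommendation, here is a minimal sketch (the disk group name LOG is made up, and a template change only affects redo logs created afterwards):

-- run against the ASM instance: check the current striping of the online log template
SELECT name, stripe FROM v$asm_template WHERE name = 'ONLINELOG';

-- switch the template of the (hypothetical) LOG disk group back to fine-grained striping
ALTER DISKGROUP log ALTER TEMPLATE onlinelog ATTRIBUTES (FINE);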

So to answer your questions:

  1. I'm not sure which VMAX white paper you are referring to, but at least with this one, both logs and data share the same FC drives. Just as a reminder, in VMAX 20K and 40K we can mix RAID protections on the same drives, so for example we can create a RAID-1 thin pool for logs and a RAID-5 thin pool for data, and they will both utilize the same spindles. No need for isolation. In VMAX 10K, for simplicity, we don't go to the same extent of configuration flexibility, and in that case, assuming RAID-5 on the FC tier, you'll typically have both data and logs protected with RAID-5 and benefit from Symmetrix RAID-5 optimized writes.

  2. FAST VP does a great job of placing the right data in the right storage tier. It looks at multiple criteria, and random reads have a higher priority than writes, so if both data and logs are included in a FAST VP policy, the data files will have higher priority for promotion to tiers like EFD; still, if the logs are active, chances are they may join to a degree and take some capacity on the EFD tier. As such, our recommendation is that if you know your log devices and they are not mixed with the data (such as a separate ASM disk group for logs vs. data, as we recommend), exclude them from the FAST policy; see the sketch below. If you don't know the device allocation specifically, or logs and data are mixed, that's OK; just have FAST optimize both.
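
For what it's worth, a minimal sketch of that separation on the ASM/database side (disk group names and device paths are placeholders; associating the log devices with, or excluding them from, a FAST VP policy is done on the array and is not shown here):

-- ASM instance: separate disk groups so log devices stay on their own LUNs
CREATE DISKGROUP data EXTERNAL REDUNDANCY DISK '/dev/mapper/ora_data_*';
CREATE DISKGROUP log  EXTERNAL REDUNDANCY DISK '/dev/mapper/ora_log_*';

-- database instance: direct new online redo logs to the dedicated disk group
ALTER SYSTEM SET db_create_online_log_dest_1 = '+LOG';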

As always, there are exceptions to the rules, but I hope this helps from a best practice perspective.

Thanks

Yaron

46 Posts

June 27th, 2012 07:00

A few questions on that one.

Even though REDO writes are 100% cached (if all goes well), the writes eventually have to be destaged to disk. Assume for a moment a spinning-disk environment (no EFD), with everything shared (i.e., a spindle holding REDO log data also contains DATA, INDEX, TEMP, etc.).

Let's say the redo writer flushes 1 MB of redo buffer data and does this in 128 KB I/Os, so 8 sequential I/Os are submitted and handled in cache. Now, in the background, the data starts to be destaged from cache to disk. By the time the 2nd 128 KB piece is done, a random read request comes in on the same spindle. The disk heads move and the read request is serviced in 7-10 ms or so. The rest of the REDO data still needs to be flushed, so the heads move back to the track where the REDO log sits and write another few REDO I/Os. Then another DATA read comes in and the heads move again. By the time the 8 REDO writes have been flushed, the heads might have moved several times in between.

If you put REDO logs on dedicated spindles (dedicated for REDO logs; sharing between redo from multiple databases should not matter), this will not happen, and all data reads can be serviced with prefetch, or at least without large disk head movements. In short: sharing REDO and DATA increases utilization and response time for random reads. This is why I advise my customers, *if* they have write-intensive workloads *or* are seeking better random response times, to separate REDO logs physically. But as always I am open to different perspectives (i.e., I'd like to hear reasons why NOT to separate, even in high-write workloads)...?

Re FAST VP, I don't see why the algorithms are not smart enough to exclude data with a 100% sequential profile from moving to EFD. If REDO is on separate LUNs then, IMHO, FAST VP should be smart enough to detect and handle that; there is no reason for customers to make exclusions in the policy. FAST VP was designed to remove the burden of manual tiering from customers, but now we suddenly recommend otherwise for Oracle, even within one dataset (i.e., different parts of a database). Or am I missing something?

Regards,

Bart

63 Posts

June 28th, 2012 06:00

Yaron

Oracle's reasoning for using coarse striping on redo logs from 11.2.0.2 is:

“The original reason for fine stripping the online log was to reduce latency. That made sense in year 2000 when a disk track was about 128K. It no longer makes sense with track sizes in 2010 of about a megabyte.” - Redo Log Striping In 11.2 ASM, is Coarse Or Fine? [ID 1269158.1]

However, for both DMX-4 and VMAX the track size remains 64 KB, so the continued use of fine-grained striping for redo logs sounds reasonable for lower latency in OLTP environments.

Also

As mentioned, in 11gR2 the default stripe size for all of the templates is 1 MB (coarse), except for control files, which use fine striping:

http://docs.oracle.com/cd/E11882_01/server.112/e16102/asmfiles.htm#g2223792

Oracle's view is:

“Coarse-grained striping provides load balancing for disk groups while fine-grained striping reduces latency for certain file types by spreading the load more widely.

To stripe data, Oracle ASM separates files into stripes and spreads data evenly across all of the disks in a disk group. The fine-grained stripe size always equals 128 KB in any configuration; this provides lower I/O latency for small I/O operations. The coarse-grained stripe size is always equal to the AU size (not the data extent size).”
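
For reference, the per-template defaults can be checked with a simple query (illustrative only, run against the ASM instance):

SELECT group_number, name, stripe
FROM   v$asm_template
WHERE  name IN ('ONLINELOG', 'CONTROLFILE', 'DATAFILE');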

15 Posts

June 29th, 2012 09:00

Hi Bart,

You're describing the I/O handling of a JBOD, but Symmetrix isn't one, and as I mentioned, since the cache is persistent we have the liberty of optimizing the backend read and write I/O servicing (to a reasonable degree). Therefore the description of reads interfering with writes and the extra disk head movement isn't correct. We can change read/write priority, reorder, and do a lot of cool stuff to optimize backend resources.

Remember that the question was about best practices for logs: share or isolate. There is more than one right answer, but from a best practice perspective there are two main considerations:

  1. Have enough resources to sustain the write workload (usually enough spindles, as I mentioned earlier, regardless of storage vendor).
  2. Stripe small from the host to spread the workload across these resources (see my earlier comments about host striping and ASM).

If you want to isolate spindles for the logs, that's fine, and as we all know it is called 'short-stroking' (because the disks are relatively large and the logs are relatively small). I wouldn't call it best practice though, and it usually leads to either wastage (capacity) or bottlenecks (if not enough spindles were allocated). Even in this scenario, with some concurrency (say, multiple RAC nodes writing to the same ASM disk group, redo mirroring, etc.), there will still be disk head movement between the redo log files of each thread.

So, again, I'm trying to point to a reasonable best practice, and sharing disks between data and logs is not a bad idea. If the logs are protected with RAID-1 and the data with RAID-5 they will end up in different thin pools, but if they share a protection type they can use the same pool.

I know that some will choose to separate logs from data for failure isolation reasons (which is yet another answer), but just remember that both RAID-1 and RAID-5 protect from a single disk failure inside a RAID group (and RAID-6 protects from a double disk failure in a RAID group). Therefore, even if a thin pool protected with either RAID-1 or RAID-5 had multiple drive failures, it would continue to be fully functional (with hot spares automatically copying the data to good drives in the background), as long as no RAID group has more than one drive failure before the hot spare finishes replacing it.

So we can look at it as a triangle between cost, performance and protection. I like to recommend the middle way: share the physical resources (disks) but separate the logical ones (LUNs). Others may choose differently based on their specific circumstances.

Btw, the question about FAST VP is probably a matter for another thread, but just one comment for thought: 100% sequential is what a single host process issues. In a normal system, where multiple RAC nodes write to, say, a +LOG ASM disk group with 2 log mirrors, ASM striping and an archive process, by the time the I/Os get to the storage it is no longer a sequential stream. In fact, ASM natively kills any notion of sequentiality, even if the database issued a single-threaded FTS; that's what ASM was always meant to do: stripe everywhere. Given that, as I mentioned earlier, FAST VP does give different priorities to different I/O types, and besides, who said we shouldn't optimize writes to the best tier that can support them instead of excluding them?

Thanks,

Yaron

46 Posts

August 14th, 2012 07:00

Yaron,

Thanks, that makes sense (btw, late reply due to vacation and catching up).

With enough write cache you could indeed delay sequential writes and then write out a lot of them simultaneously, even in a mixed data/log pool. But I guess sometimes that comes at the cost of higher read response times; i.e., if you're in the middle of flushing a few MB of redo data and the host comes in with a random read request on the same drive, you have 2 options: a) abort the write operation in favor of host response time, or b) continue the write operation in favor of lower disk utilization (but the host has to wait a few ms more)... correct?

The fact that ASM breaks up sequential I/O is another one. ASM uses (by default) 1 MB chunks, so if a sequential I/O is less than 1 MB and lies within one ASM chunk, it's still kind of a short sequence. But this is the reason I have recommended that my customers increase the ASM AU size to 8 or 16 MB (my uneducated guess), so a few MB of sequential I/Os have a good chance of triggering prefetch (or at least sequential physical disk I/O in the case of writes)... right? For DWH I'd say the bigger the better, and I guess for OLTP it does not hurt either to have large AU sizes.

Especially for the REDO ASM disk group you might use a 32 MB AU size and create near-fully sequential I/O profiles (with the notable exception of customers who use SRDF/S and go with ASM fine striping, but that is again another discussion).
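
A minimal sketch of what that could look like (disk group name and device path are made up; the AU size can only be set when the disk group is created):

-- ASM instance: dedicated redo disk group with a large allocation unit
CREATE DISKGROUP redo EXTERNAL REDUNDANCY
  DISK '/dev/mapper/ora_redo_*'
  ATTRIBUTE 'au_size' = '32M',
            'compatible.asm' = '11.2';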

Regards

Bart
