How to Tune the 'Log File Sync' Wait Event
I am looking at an AWR report with 'log file sync' as one of the top wait events. Could anyone help explain the root causes of such 'log file sync' waits? Any guidelines to resolve this problem?
Thanks in advance!
zhaos2
March 18th, 2012 19:00
Below are my findings from the internet: the root causes of 'log file sync' essentially boil down to a few scenarios. The following is not an exhaustive list, by any means.
1. LGWR cannot complete its writes fast enough. Redo log files are opened with the DSYNC flag, and buffers must be flushed to the disk (or at least, written to the disk array cache in the case of a SAN) before LGWR can mark a commit as complete.
2. The application commits too frequently. Each commit posts LGWR via semctl calls, and this can increase 'log file sync' waits. This can also result in a sharp increase in redo wastage statistics.
BartS
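The commit-frequency cause mentioned above (each commit posts LGWR) can be illustrated with a toy model: batching commits cuts the number of synchronous LGWR posts roughly in proportion to the batch size. A sketch in Python (not Oracle code; the row counts are illustrative only):

```python
def sync_waits(rows: int, commit_every: int) -> int:
    """Toy model: number of synchronous LGWR posts (and hence potential
    'log file sync' waits) for a load of `rows` rows, committing once
    per `commit_every` rows."""
    return -(-rows // commit_every)  # ceiling division

rows = 100_000
print(sync_waits(rows, 1))      # commit per row  -> 100000 waits
print(sync_waits(rows, 1000))   # batched commits -> 100 waits
```

The model ignores piggy-backed commits (Oracle can satisfy several pending commits with one LGWR write), so the real reduction is less extreme, but the direction holds.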
March 19th, 2012 04:00
Something that also might happen is not even contention but simple I/O queueing.
For example, you have two HBAs and 20 LUNs, and the redo logs are on the same filesystems or ASM disk groups as the rest of the data. Assume we have PowerPath or another load balancer, so each of the 20 LUNs is visible on both paths.
The DBWR wants to flush a bunch of dirty SGA buffers and issues a series of writes. So there might be, say, 10 I/Os on each LUN queue, meaning 200 total I/Os distributed over 2 HBA queues, 100 each. It might be that the storage backend is servicing the I/O requests pretty quickly, say 1 ms per write, and that the utilization as measured on the storage side is low (say less than 10% at all levels). No bottlenecks.
Now the redo writer needs to handle a commit and wants to flush 512K worth of data in 64K writes (so 8 writes of 64K each). The redo writes will be appended to those queues and have to wait for the DB writes to be finished first (unless some optimization algorithm picks them up earlier, but I know of no such thing).
So now the I/O wait time for the redo writes is at least 10 x 1 ms (the time it takes to handle the other 10 requests per LUN), plus the time spent in the HBA queue, plus the service time of the redo write itself. I bet you end up with 12 ms or more for the redo write (where you would expect 1-2 ms because of the storage write cache).
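That arithmetic can be sanity-checked with a minimal back-of-envelope model. A Python sketch (the 1 ms service time, queue depth of 10, and 1 ms HBA-queue delay are the assumed figures from the example above, not measurements):

```python
# Back-of-envelope model of a redo write queued behind DBWR writes.
# All figures are the assumptions from the example, not measured values.
service_time_ms = 1.0      # storage service time per 64K write
dbwr_queue_depth = 10      # DBWR I/Os already queued ahead on the LUN
hba_queue_ms = 1.0         # assumed extra wait in the HBA queue

# The redo write must wait for the queued DBWR I/Os to complete,
# plus its own service time, plus the HBA queueing delay.
redo_latency_ms = (dbwr_queue_depth * service_time_ms
                   + service_time_ms
                   + hba_queue_ms)

print(f"estimated redo write latency: {redo_latency_ms:.0f} ms")
```

This lands on roughly 12 ms, an order of magnitude above the 1-2 ms a cached write should take, even though the array itself reports low utilization.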
If you isolated the redo logs on completely separate I/O channels (at least separate LUNs, but maybe also separate HBAs) you would not have this effect. As the DB writes are background activity, I would always give priority to redo writes. A few ms more for a DB buffer flush is not a problem, but an increased redo write delay can be.
Not sure if my interpretation is completely correct but this is what I have noticed several times at customers. Unfortunately I haven't been able to verify by changing the configuration in those cases.
So as soon as the redo write has to wait in line anywhere in the process (Host LUN queue, HBA, storage frontend, you name it) you will see log file sync time going up (and this is completely unnecessary IMO).
Thoughts?
reseach
March 19th, 2012 19:00
Log file I/O tends to be sequential I/O; in most cases, the storage write cache is able to handle it well if the storage is running under reasonable pressure and without issues.
But you also need to speak with the application developers to see how they handle commits in the system; committing too often can cause a lot of problems.
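One quick way to gauge whether an excessive commit rate is involved is to compare the 'log file sync' totals with the 'user commits' statistic from the same AWR interval. A rough helper in Python (the figures below are hypothetical illustrations; in practice you would read these numbers straight from the AWR report):

```python
def avg_log_file_sync_ms(total_wait_time_s: float, total_waits: int) -> float:
    """Average 'log file sync' wait per occurrence, in milliseconds."""
    return (total_wait_time_s * 1000.0) / total_waits

# Hypothetical AWR-interval figures, for illustration only:
waits = 120_000           # 'log file sync' total waits
wait_time_s = 720.0       # 'log file sync' total wait time (seconds)
user_commits = 118_000    # 'user commits' statistic, same interval

avg_ms = avg_log_file_sync_ms(wait_time_s, waits)
print(f"avg log file sync: {avg_ms:.1f} ms")          # 6.0 ms
print(f"waits per commit:  {waits / user_commits:.2f}")
```

Roughly one wait per commit with a high commit count suggests the commit rate itself, and not only storage latency, is pushing the event into the top waits.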
mkaberle
March 22nd, 2012 08:00
Here is another situation that caused the log file sync wait event to pop up in the Top 5 Events, AND the Avg Wait Time was above 6 ms, AND the % Total Call Time was 10% or more... (kind of a tipping point for me).
For a large and active SAP DB running on a DMX-4, they had a convergence of a few activities dramatically impacting the Symm cache (we found this out when we (EMC and the customer) sat down and discussed the issue and the nature of redo log I/O to the Symm).
For this SAP DB, they were doing pretty large bulk loads while also backing up the DB and, at the same time, refreshing 5-6 TF Clones for other DBs, all while still servicing normal business transactions for this DB. So the Symm cache was being saturated and working overtime to flush dirty blocks (i.e., write pendings were being hit). This caused elongated log buffer flushes to occur and therefore log file sync events.
And to top it off, they could not put any more cache in the Symm, so their workaround was to offload some of the workload to another array.
reseach
March 22nd, 2012 20:00
Matt, yours is a very good example of a backend storage bottleneck causing high waits.
Have you checked the CPU utilization of your DMX? If it is not too high, say <=50-60%, you might consider placing EFDs on it to absorb the dirty pages quickly.
mkaberle
March 23rd, 2012 06:00
Eddie, this is an old issue, and I just posted my reply in response to the community thread on log file sync.