Reducing contention for the LRU chain latch

This topic is about how to reduce contention on the cache buffer chains latch. If you are reading this topic, you may be seeing high wait times on this particular latch. In this topic we will briefly discuss what latches are. We will then talk about what the cache buffer chains latch is, and how to determine if it’s a problem. Finally we will discuss some possible resolutions to the problem.

About Latching

Internally, Oracle uses various types of structures to control access to elements of the databases including latches and various types of enqueueing (locking) mechanisms. Latches are an elemental locking mechanism that are the primary means used to control access to Oracle internal structures, such as internal memory structures in the SGA. The database buffer cache is a memory area in the Shared Global Area (SGA) where Oracle keeps copies of database blocks. Latching issues can be one major source of performance problems within an Oracle database.
When a process tries to use the latch, it is said to acquire it. Once a process acquires the latch, no other processes can acquire that latch. Latches are like a race in a way. They do not have a queuing mechanism and thus, getting a latch is a first serve first come proposition. As a result all process have to contend with each other as they attempt to acquire the same latch. Since there is generally only one latch per structure (with some exceptions) this competition for latches can lead to significant contention issues. The good news is that once the latch is acquired, it is generally released quickly. Exceptions to this rule can be the cause of additional performance problems.
Latch requests come in two flavors, willing to wait and no-wait modes. In willing to wait mode, when a latch cannot be acquired, the acquiring session will go into a spin mode, attempting to acquire the latch over and over a specified number of times. After the number of spins has reached a specific threshold, the session will sleep for a specified period of time, wake up and try the latch again. Spin mode is bad, sleep mode is worse!
A latch request may also be in no wait mode. This is a latching mode that if the request for the latch is not successful then the latch request will simply fail. The process is then free to simply try to reacquire the latch again, or perhaps try to acquire another latch.

Buffer Cache and Latching

The database buffer cache is a memory area in the Shared Global Area (SGA) where Oracle keeps copies of database blocks. Two latches are used to principally control access to structures within the SGA, the cache buffer LRU chains latch and the cache buffer chains latch.
This topic addresses the LRU chain latch. The purpose of the LRU chain latch is to protect the LRU chain, which is a list of buffers maintained to support the relative 'hotness' or 'coldness' of buffers and the associated aging mechanism that will age cold buffers out of the cache. Problems with this latch can cause unfortunate response problems with your database.
The LRU latch controls access two LRU lists. The first list, the LRUW, tracks "dirty" (modified blocks not yet written to disk) block buffers. The second list, the LRU, tracks "clean" (unmodified) block buffers. Both lists track the least recently used blocks. When a block is used or freed, it goes to the top of the list, and the blocks below are moved down the list. After the dirty blocks on the LRUW reach a certain level, they are written to disk and placed on the LRU list containing clean blocks. When a clean block is needed, the LRU list is searched. When a clean block is modified, it is placed on the LRUW list for writing by the DB writer.

Cache Buffers LRU Chain Latch Causes

The cache buffers LRU chain latch is held when updates are made to the LRU (Least Recently Used) lists. Cache buffers LRU chain latch contention is caused when the LRU and LRUW lists are updated and the chains of buffers in each hash bucket are too numerous to be scanned quickly. Contention is also caused when the volume of transactions overloads the block tracking on the LRU lists.
Waits on latch free wait event ‘Cache Buffers LRU Chain’ may be caused by one of the following:
  • SQL statements with very high logical or physical I/O, due to unselective indexes (large index range scans) or many full table scans.
     
  • DBWR not keeping up with the dirty workload, which forces the foreground process to hold the latch longer looking for a free buffer.
     
  • Undersized buffer cache.

Solutions

  1. High logical IO rates Because latches are used to protect the buffer cache, very high rates of logical IO - possibly due to untuned SQL - will increase the rate and duration of latch holds and increase the probability that latch contention will occur. The solution to this problem is tuning your SQL to reduce overall LIO’s.
    To solve this problem, first identify SQL statements that are performing a high amount of I/O and tune them to reduce the number of buffer gets and disk reads.
    To determine SQL statements with the most disk reads per execution:
    column per_exec format 999,999,999
    column disk_reads format 999,999,999  
    
    select disk_reads, executions, disk_reads/executions per_exec,
           substr(sql_text,1,4000) SQL
      from v$sqlarea
     where executions > 1
       and disk_reads/executions > 10000
    order by disk_reads/executions desc
    /
    To determine SQL statements with the most buffer gets per execution:
    column per_exec format 999,999,999
    column buffer_gets format 999,999,999  
    
    select buffer_gets, executions, buffer_gets/executions per_exec,
           substr(sql_text,1,4000) SQL
      from v$sqlarea
     where executions > 1
       and buffer_gets/executions > 10000
    order by buffer_gets/executions desc
    /
    Another potential solution is to increase the parameter db_block_lru_latches setting in Oracle versions earlier than 9i (this parameter is a hidden parameter in Oracle9i, _db_block_lru_latches). We suggest setting this parameter to a value of 2 times the number of CPU’s in your system.
  2. Determine if the Database Writer (DBWR) background process is not keeping up with writes of dirty buffers back to disk. Check and see if there are sessions waiting on a ‘free buffer’ wait event.
    column event format a30       
    column username format a20       
    column state format a10 trunc          
    column p1 format 999999999999 heading "P1"      
    column p2 format 999999999999 heading "P2"   
    column p3 format 99999 heading "P3"     
    set lin 132
    
    SELECT username, V$SESSION_WAIT.sid, event, seq#,
           seconds_in_wait, wait_time, p1 , p2, p3,
           state, p1raw, p2raw, p3raw     
      FROM V$SESSION_WAIT, V$SESSION
    WHERE V$SESSION_WAIT.SID = V$SESSION.SID
    AND event = ‘free buffer’
    /
    These waits could indicate that there is unselective SQL flooding the buffer cache or that the DBWR is not writing buffers to disk fast enough. A good way to tell if the DBWR process is not keeping up is to compare checkpoints started and completed.
    select *  
    from v$sysstat  
    where name like 'background checkpoint%' 
    /
    When a checkpoint occurs, the Log Writer (LGWR) process will instruct the DBWR to write buffers to disk. If checkpoints started and completed are not the same, then a checkpoint has occurred before the last one has completed. The way to remedy this is to increase the size of the redo logs and make sure that checkpoints only occur at a log switch.
    To find the size of the redo log files and how frequently the log switch is occurring:
    col bytes format 999,999,999  
    
    select a.group#, b.member, a.bytes, a.first_time 
      from v$log a, v$logfile b 
     where a.group# = b.group# 
    /
    The solution to this problem is to either reduce the load on the DBWR process, or increase the number of DBWR processes (typically if you are not running Asynchronous IO). You can configure multiple DBWR processes via the db_writer_processes parameter. This parameter allows you to configure up to 10 database writer processes. Generally when using AIO, you will not see any benefits when using multiple DBWR processes.
    Another solution is to look at the overall placement of files on the disks and review the IO distribution, to ensure that the datafiles being written to are receiving an optimal distribution of all disk IO’s. Also, you should ensure your disks are running optimally, as is the rest of the system. We have also seen asynchronous IO (AIO) configurations cause significant problems with DBWR performance.
  3. The buffer cache is undersized. A good buffer cache hit ratio does not necessarily mean that the buffer cache is sized correctly. As a matter of fact, poorly tuned SQL can inflate this as a large number of the same buffers are accessed more then once. However, a poor hit ratio with tuned SQL in the application can be an indicator that the buffer cache is too small.
    To check the buffer cache hit ratio:
    col "Buffer Cache Hit Ratio" format a23   
    
    select to_char((sum(decode(name, 'consistent gets',value, 0)) + 
           sum(decode(name, 'db block gets',value, 0)) – 
           sum(decode(name, 'physical reads',value, 0)) – 
           sum(decode(name, 'physical reads direct',value, 0))) / 
           (sum(decode(name, 'consistent gets',value, 0)) + 
           sum(decode(name, 'db block gets',value, 0))) * 
           100,'999.99') || ' %' "Buffer Cache Hit Ratio"  
    from v$sysstat st 
    /
    If it appears that the buffer cache will benefit from being made larger, then certainly consider increasing the size of the database buffer cache. Indications that the buffer cache is stressed, and perhaps needing resized, are high wait values for these events:
    • Free buffer waits
    • Buffer deadlock
    • Buffer busy waits
    These statistics, if high, might also indicate a stressed database buffer cache:
    • Summed dirty queue length
    • DBWR make free requests
    • DBWR summed scan depth
    • DBWR buffers scanned
    • Free buffer requested
    • Dirty buffers inspected
    • Pinned buffers inspected