Sunday, October 27, 2019

latch: cache buffers chains


latch: cache buffers chains
table of contents
·         
A session that seeks or modifies a hash chain to use the buffer cache must acquire a cache buffers chains latch (hereafter referred to as a CBC latch) that manages the chain. When acquiring this latch, if a conflict occurs, it waits in a latch: cache buffers chains wait event.
Starting with Oracle9i, to protect read-only, CBC latches can be acquired in shared mode when searching for chains, reducing contention. There is one thing to note about this. Theoretically, if a SELECT statement is issued at the same time, the CBC latch can be shared, so CBC latch contention should not occur. . The reason is related to buffer locks. If you acquire a latch in shared mode to work read-only, when you scan the buffer, you acquire a buffer lock in shared mode, but because you change some information in the buffer header, the buffer lock The latch must be changed to exclusive mode when acquiring and releasing. A conflict occurs during this process, which causes it to wait on a latch: cache buffers chains wait event.
There are two typical cases where CBC latch contention occurs.
·        1) Inefficient SQL
·        2) Hot block

1.1 Inefficient SQL
Inefficient SQL is the most important cause of CBC latch contention. For example, if multiple sessions simultaneously scan a wide range of indexes or a wide range of tables, there will be a lot of CBC latch contention.
When CBC latch contention occurs, it is important to investigate the exact cause of the contention. The most reliable basis for determining whether latch contention is occurring in a hot block or inefficient SQL is in the SQL itself. Therefore, if there is a reason that it can be determined that an inefficient SQL has been created, the problem can be solved by tuning the SQL.
If you don't have information about SQL, there is another way to determine if it's a hot block problem or an inefficient SQL problem. First, in the V $ LATCH_CHILDREN view, the CHILD # corresponding to the child CBC latch is compared with the GETS and SLEEPS values ​​to check whether the usage count and contention are concentrated in a specific child latch. Use the following command to check the child latch that has a large number of SLEEPS.
SQL>
select * from
(select child #, gets, sleeps from v $ latch_children
          where name = 'cache buffers chains'
          order by sleeps desc
) where rownum <= 20
If the GETS and SLEEPS values ​​of a child latch are very high compared to other child latches, it can be inferred that there is a hot block in the chain managed by that latch. As a result of the above test, it is determined that there is no problem with hot block because there is no phenomenon overlapping with a specific latch.
    CHILD # GETS SLEEPS
---------- ---------- ----------
       329 81160 78
       940 79773 74
       232 62792 69
       791 99123 68
       905 99185 68
       408 80687 65
       259 101793 62
       611 82187 62
       466 99870 60
       839 79744 60
       ...
Another way to determine when a hot block has occurred is to check and compare the latch address from the V $ SESSION_WAIT view. For CBC latches, V $ SESSION_WAIT.P1RAW corresponds to the child latch address. If the V $ SESSION_WAIT view appears to have too many addresses for the same latch, it can be interpreted as a hot block contention.

1.2 Hot block
Despite SQL being tuned properly, CBC latch contention may not be resolved. In applications where the SQL is written to scan a few specific blocks many times, running this SQL simultaneously in different sessions will cause CBC latch contention due to hot blocks.
In the V $ LATCH_CHILDREN view, you can determine that there was a hot block latch contention by checking that only certain child latches are being used. Alternatively, the value of the P1RAW column in the V $ SESSION_WAIT view may be used as the latch address when searching.
In the command execution results below, the three child latches CHILD # 569, 827, and 178 are used intensively, confirming that latch contention has occurred.
SQL>
select * from
(select addr, child #, gets, sleeps from v $ latch_children
        where name = 'cache buffers chains'
order by sleeps desc
) where rownum <= 20
;

ADDR CHILD # GETS SLEEPS
---------------- ---------- ---------- ----------
C0000000CDFF24F0 569 10500275 11298 <-Concentrated use of latches
C0000000CE3ADDF0 827 5250508 8085
C0000000CDF18A98 178 5250192 4781
C0000000CDEDB6E8 68 3786 17
C0000000CE3CBEE0 881 2121 8
C0000000CE359430 675 1768 1
C0000000CDEB6230 1 235 0
C0000000CDEB6B18 2 171 0
C0000000CDEB7400 3 390 0
C0000000CDEB7CE8 4 192 0
C0000000CDEB85D0 5 151 0
...
You can use the X $ BH view to see which blocks are hot blocks. Extract blocks that are user objects (tables, indexes) but have a high number of touches (TCH), and check for hot blocks. From the results below, we can deduce that most of the contention is occurring in the 45918, 45919, and 100 blocks of the CBC_TEST_IDX index.
SQL> select hladdr, obj,
(select object_name from dba_objects where
     (data_object_id is null and object_id = x.obj) or
      data_object_id = x.obj and rownum = 1) as object_name,
       dbarfil, dbablk, tch from x $ bh x
where hladdr in
    ('C0000000CDFF24F0', 'C0000000CE3ADDF0', 'C0000000CDF18A98')
order by hladdr, obj;

HLADDR OBJ OBJECT_NAME DBARFIL DBABLK TCH
------------------- -------- ---------------- ------- --- ---------- ------
C0000000CDF18A98 57 OBJAUTH $ 1 43911 1
C0000000CDF18A98 73 IDL_UB1 $ 1 27025 1
C0000000CDF18A98 181 C_TOID_VERSION # 1 26792 1
C0000000CDF18A98 181 C_TOID_VERSION # 1 14244 1
...
C0000000CDF18A98 55236 CBC_TEST_IDX 4 45919 130
...
C0000000CDFF24F0 55236 CBC_TEST_IDX 4 45918 130
C0000000CE3ADDF0 2 IND $ 1 7933 1
C0000000CE3ADDF0 2 IND $ 1 60455 9
C0000000CE3ADDF0 18 OBJ $ 1 29623 1
...
C0000000CE3ADDF0 55236 CBC_TEST_IDX 4 100 130


2. Waiting parameters and waiting time

2.1 Standby parameters
·        P1: Latch address
·        P2: Latch number (same as V $ LATCHNAME.LATCH #). To check the latch name that corresponds to the latch number, execute the following SQL.
SELECT * FROM v $ latchname WHERE latch # = & p2_value;
·        P3: Number of trials. The number of times the process has tried to acquire a latch.

2.2 Standby time
The waiting time increases exponentially.

3. Checkpoints and solutions

3.1 Inefficient SQL tuning
Tuning inefficient SQL to reduce logical reads reduces CBC latch contention by reducing access to the buffer cache.

3.2 Distributing hot blocks
If CBC latch contention occurs because of hot blocks, you can reduce contention by distributing the hot blocks. The method for distributing hot blocks is as follows.
·        Reduce block contention by increasing PCTFREE or using smaller blocks. 
Increasing PCTFREE and using smaller blocks are basically the same way to reduce block contention by reducing the number of rows in a block. While this method is certainly effective in reducing block contention, it can increase the number of blocks that must be managed and can affect the system. For example, the number of blocks increases and performance is degraded because a very large number of blocks must be scanned for the same SQL. Although latch contention due to hot blocks has decreased, the number of scans increases and latch contention increases. Therefore, full verification by testing is recommended.

·        Use the partitioning function to physically put the row in another block. 
This feature avoids latch contention by physically distributing problematic rows across other blocks. However, applying this feature to a table can cause the index clustering factor to be inefficient and slow down the data scan rate of the index range scan. Therefore, when partitioning is applied, the effect must be fully considered.

·        Delete only the rows of blocks that could cause problems, and then reinsert them. 
This method is only possible for tables. If you know exactly the ROWID of the block in question and the rows contained in the block, you can delete the row and reinsert it to distribute each row into a separate block. Use the block dump and the DBMS_ROWID package to find out which ROWID has become a hot block. This method is the most ideal because it does not change the attributes of the table. However, it is not applicable if the hot block is not fixed and changes every time depending on the SQL WHERE condition clause. Also, this method cannot be used for indexes.


If CBC latch contention occurs on a table, it can be resolved relatively easily. This is because there are various ways to distribute the rows. On the other hand, if there are conflicts in the index, it is very difficult to solve the problem. This is because the inherent characteristics of indexes stored in sorted form may make it impossible to distribute them in arbitrary blocks. In this case, there is no solution other than increasing PCTFREE or reducing the block. In this case, the number of blocks increases, and on the contrary, latch contention may increase.

4. Tips

4.1 Buffer cache structure
Oracle stores information about recently used blocks in a memory area to minimize physical I / O. This memory area is called the buffer cache. The buffer cache is one of the most important memory areas that make up the SGA along with the shared pool and redo log buffer. You can see the size of the instance buffer cache with the following command:
SQL> show sga
Total System Global Area 314572800 bytes
Fixed Size 788692 bytes
Variable Size 144963372 bytes
Database Buffers 163577856 bytes
Redo Buffers 5242880 bytes
The value of Database Buffers is the size of the current buffer cache for the instance.
Oracle uses a hash chain structure to manage the buffer cache efficiently. The hash chain exists in the shared pool and consists of the bucket-> chain-> header mechanism, which is Oracle's typical memory management technique.
Latch_Cache_Buffers_Chains Max Gauge
The starting point of the hash chain structure is a hash table. A hash table consists of several hash buckets. One hash bucket is the result of the hash function. Oracle searches for a hash bucket using the block address (DBA: Data Block Address, which consists of File # and Block #) and the result of applying a simple hash function to the block class. A hash bucket is a chain of buffer headers with the same hash value. The buffer header contains meta information for the buffer and a pointer value for the actual buffer in the memory area. The hash chain structure exists in the shared pool, but the actual information for the buffer exists in the buffer cache area.
The hash chain structure is protected with a CBC latch. A session that scans a particular block must acquire a CBC latch that manages the hash chain for that block. Basically, only one session can get one CBC latch. One CBC latch also manages several hash chains. Therefore, if a large number of sessions are searching the buffer cache at the same time, a contention will occur to acquire the CBC latch and wait in the latch: cache buffers chains wait event. From Oracle9i, CBC latches are acquired in shared mode only for read-only work. Thus, CBC latches can be shared between sessions that are simultaneously reading. However, as described above, when acquiring or releasing a buffer lock for a buffer, it is necessary to change the CBC latch to exclusive mode. .
The number of CBC latches can be obtained with the following command.
SQL> select count (*) from v $ latch_children where name =
        'cache buffers chains';
  COUNT (*)
  ----------
       1024
Alternatively, you can get the same result by checking the _DB_BLOCK_HASH_LATCHES hidden parameter. You can check the number of hash buckets with the _DB_BLOCK_HASH_BUCKETS hidden parameter.

5. Analysis examples

5.1 Waiting with an incorrect index column
The verification scenario of latch: cache buffers chains waiting event occurrence is as follows.
·        The t_cache_buffers_chains_latch (type, name, id) table has 160,000 data entered, and the number of data satisfying type = 4 and name = 'KIM' is 150,000.
·        The idx_cache_buffers_chains_latch index consists of two columns, type and name.
·        Execute SQL using idx_cache_buffers_chains_latch index to find the number of data that 10 sessions satisfy (type = 4, name = 'KIM', id = 4) at the same time.
·        The session is running and waits on a latch: cache buffers chains wait event.
Execute the following SQL from multiple sessions at the same time.
Latch_Cache_Buffers_Chains Max Gauge
The results of monitoring in this situation are as follows. You can see that the wait event for latch: cache buffers chains is occurring most.
Execution result
Type = EVENT, Name = jobq slave wait, Value = 50404 (cs) 
Type = EVENT, Name = latch: cache buffers chains, Value = 3061 (cs)
Type = EVENT, Name = read by other session, Value = 677 (cs ) 
Type = EVENT, Name = db file sequential read, Value = 537 (cs) 
Type = EVENT, Name = events in waitclass Other, Value = 534 (cs) 
Type = EVENT, Name = library cache pin, Value = 73 (cs ) 
Type = EVENT, Name = enq: TX – row lock contention, Value = 63 (cs) 
Type = EVENT, Name = buffer busy waits, Value = 15 (cs) 
Type = EVENT, Name = latch: library cache, Value = 0 (cs) 
Type = EVENT, Name = latch: shared pool, Value = 0 (cs) 
Type = EVENT, Name = cursor: mutex X, Value = 0 (cs) 
Type = STATS, Name = session pga memory max, Value = 12955464
Type = STATS, Name = session logical reads, Value = 7205239 
Type = STATS, Name = redo size, Value = 25088 
Type = STATS, Name = execute count, Value = 513 
Type = STATS, Name = physical reads, Value = 356 
Type = STATS, Name = parse count (total), Value = 229 
Type = STATS, Name = sorts (memory), Value = 110 
Type = STATS, Name = parse time elapsed, Value = 92 
Type = STATS, Name = redo entries, Value = 87 
Type = STATS, Name = parse count (hard), Value = 62 
Type = STATS, Name = session cursor cache hits, Value = 53 
Type = STATS, Name = user commits, Value = 20 
Type = STATS, Name = sorts (disk), Value = 0 
Type = TIME, Name = DB time, Value = 29893 (cs) 
Type = TIME, Name = sql execute elapsed time, Value = 29808 (cs)
Type = TIME, Name = parse time elapsed, Value = 103 (cs) 
Type = TIME, Name = hard parse elapsed time, Value = 23 (cs)
The reason why the latch: cache buffers chains wait event occurs in this situation is as follows.
·        When data is searched using the idx_cache_buffers_chains index, 150,000 out of 160,000 data will satisfy the condition. Access to the table occurs again for these 150,000 cases, resulting in repeated random access to a wide range of blocks.
·        The buffer cache is managed by a hash chain to improve search performance, and a latch is always acquired to search this chain. However, contention occurs because 10 sessions acquire the latch at the same time, and waiting in the latch: cache buffers chains wait event is unavoidable.
In the SQL search condition, the id column has good selectivity, so this column should be used when using the index.
·        Add id column to existing idx_cache_buffers_chains.
·        Searching the table as many as 150,000 times will result in a single search, so the contention will be significantly reduced.
Latch_Cache_Buffers_Chains Max Gauge
-Remove existing idx_cache_buffers_chains index
DROP INDEX idx_cache_buffers_chains;

-Add id column to index and recreate
CREATE INDEX idx_cache_buffers_chains
ON t_cache_buffers_chains_latch (type, name, id);
The system monitoring results after performance improvement are as follows. The waiting time of latch: cache buffers chains waiting event was 1133 (cs), which was about 3 times better than the waiting time before performance improvement was 3061 (cs).
Execution result
Type = EVENT, Name = jobq slave wait, Value = 4994 (cs) 
Type = EVENT, Name = latch: cache buffers chains, Value = 1133 (cs)
Type = EVENT, Name = latch: library cache, Value = 362 (cs ) 
Type = EVENT, Name = enq: TX – row lock contention, Value = 284 (cs) 
Type = EVENT, Name = latch: library cache pin, Value = 135 (cs) 
Type = EVENT, Name = library cache load lock, Value = 75 (cs) 
Type = EVENT, Name = library cache pin, Value = 32 (cs) 
Type = EVENT, Name = events in waitclass Other, Value = 18 (cs) 
Type = EVENT, Name = db file sequential read, Value = 2 (cs) 
Type = EVENT, Name = cursor: mutex S, Value = 0 (cs) 
Type = EVENT, Name = latch: In memory undo latch, Value = 0 (cs) 
Type = STATS, Name = session pga memory max, Value = 10530632
Type = STATS, Name = session logical reads, Value = 3745047 
Type = STATS, Name = execute count, Value = 748866 
Type = STATS, Name = redo size, Value = 22120 
Type = STATS, Name = parse count (total), Value = 296 
Type = STATS, Name = sorts (memory), Value = 159 
Type = STATS, Name = session cursor cache hits, Value = 131 
Type = STATS, Name = parse count (hard), Value = 67 
Type = STATS, Name = parse time elapsed, Value = 57 
Type = STATS, Name = redo entries, Value = 53 
Type = STATS, Name = user commits, Value = 20 
Type = STATS, Name = physical reads, Value = 5 
Type = STATS, Name = sorts (disk), Value = 0 
Type = TIME, Name = DB time, Value = 26548 (cs) 
Type = TIME, Name = sql execute elapsed time, Value = 26534 (cs)
Type = TIME, Name = parse time elapsed, Value = 85 (cs) 
Type = TIME, Name = hard parse elapsed time, Value = 36 (cs)

5.2 Waits due to excessive logical reads


5.2.1 Checking the performance degradation section
When checking the graph from the operation log collected on the instance where the performance problem occurred, the number of “Active session” and “Wait” events changed in a similar manner around 21:35.
■ Transition graph of “Active Session”
Max gauge screen
■ Wait event transition graph (waiting time or number of waiting times)
Max gauge screen
5.2.2 Detection and analysis of wait events
In order to investigate the cause of performance degradation due to the rapid increase in active sessions, check the occurrence of the standby event at the time of the problem (21:35).
As a result of checking the top wait event at this point in the `` Value '' tab, it is confirmed that the top wait event except for Idle Event (= SQL * Net message from client) is latch free
Max gauge screen
In order to determine the relevance of the latch free wait event to the sudden increase in active sessions, we compared the occurrence pattern with the wait event. This accounts for about 53.4% ​​(58.69 seconds out of a total of 109.89 seconds) of wait time for all wait events, so it can be inferred that the surge in active sessions is related to the rapid occurrence of latch free wait events.
Max gauge screen
Actually, even in the session list screen displaying the detailed contents at the same time, the latch free wait event is the top wait event, and among them, it is confirmed that there are many latch: cache buffers chain wait events Will be.
Max gauge screen
5.2.3 Cause of wait event occurrence
There are various causes for the occurrence of a latch free wait event. Generally, when a latch: cache buffers chain wait event occurs, a hot block is often the cause. Hot blocks occur more frequently with index range scans than with full table scans. The solution is to perform SQL tuning and reduce the scope of searching the index. If SQL tuning is not possible, you can reduce the number of rows per block by reducing the block size or increasing the PCT FREE value.

5.2.4 session and SQL analysis
latch: cache buffers chain Wait events occurred frequently from 21:30 to 21:40. Checking the list of SQL that caused latch free revealed that the SQL had the same pattern.
Max gauge screen
5.2.5 Conclusion
latch: cache buffers chain waiting, rapid increase of active sessions due to frequent events 
                ↓ 
performance degradation due to inefficient index scan 
                ↓ 
re-create index to execute only index scan on SQL search condition column, Resolution

5.3 Standby by selecting an inefficient index I
Check the active session transition in the instance where the performance problem occurred. Verify that the active session in the problem interval is waiting on a latch: cache buffers chain wait event and that the address of the active session list latch is 16321267696.
Max gauge screen
Combine the address of the latch where the latch contention occurred with the X $ bh view to see which block of the object is and which block has the highest TCH.
-SQL to check block with X $ bh view-
select a.hladdr, a.file #, a.dbablk, a.tch, a.obj,
       b.object_name, b.object_type
from sys.xm $ bh a, dba_objects b
where (a.obj = b.object_id or a.obj = b.data_object_id)
and a.hladdr = '00000003CCD2C7F0'
order by 4 desc
The result of searching with the same latch address results in several object result values ​​because one latch manages several chains. The instance in question has 1024 CBC latches and 254083 hash buckets. In other words, in the instance in question, one CBC latch manages about 249 chains.
Since the ABC $ 123 index has the highest TCH value, it is inferred that it is the block that caused the CBC latch contention.
Max gauge screen
Check the table for this index through the DBA_INDEXES view and check the index structure of the table.
select index_name, table_name
from dba_indexes
where index_name = 'ABC $ 123';
Max gauge screen
The ABC table holds a unique index ABC $ PK index composed of ABC_NUM + ABC_TIM and an ABC $ 123 index composed of ABC_TYPE. The index that caused the latch contention is the ABC $ 123 index.
Max gauge screen
Check the following SQL trace results for the session where the conflict occurred.
-SQL-
UPDATE ABC
SET version $$ = version $$ + 1,
          type $$ = 'I'
WHERE ABC_NUM =: b2
AND ABC_TIM =: b1
AND ABC_TYPE = 'D'
Latch_Cache_Buffers_Chains Max Gauge
You can see that a wide range index was selected even though the SQL WHERE condition included the ABC $ PK column. (It is confirmed from the part displayed in red that 174429 blocks were accessed unnecessarily.)
Now, you can fix the latch: cache buffers chain conflict by changing the hint to run through the ABC $ PK index.

5.4 Standby by selecting an inefficient index II
The system CPU utilization graph is similar to logical and physical reads. The Execute Count, which is the number of SQL executions, maintains an average of about 500 times, and it is unlikely that the CPU usage has increased due to many SQL executions.
If you check the transition of the “Active Session” graph in this section, you can see that it is very similar to the transition graph of the latch free wait event graph. It also checks for sessions waiting for a latch: cache buffers chain wait event.
Max gauge screen
latch: cache buffers chain wait event occurs when processing a wide range during SQL execution. In addition, since the occurrence of db file sequencial reads wait event is similar, it is considered that the index scan was performed with an index where SQL was inefficient.
Max gauge screen
This is the case when the statistics are changed in the SQL that the session is executing and an inefficient index is selected to access the data instead of the original index.
For example, in the following SQL, since the function called TO_NUMBER was used for the emp_id column, the corresponding index could not be used and it was considered that the execution was performed using the index of the emp_name column.
SELECT emp_id, emp_name, salary, emp_date, dept_id
FROM emp
WHERE emp_date> =: 1
AND emp_date <=: 2
AND    TO_NUMBER (emp_id) =: 3
AND emp_name like: 4
AND job_id =: 5
In this case, it is recommended to change the SQL as follows.
SELECT emp_id, emp_name, salary, emp_date, dept_id
FROM emp
WHERE emp_date> =: 1
AND emp_date <=: 2
AND     emp_id = to_char (: 3)
AND emp_name like: 4
AND job_id =: 5
In this way, when a latch: cache buffers chain wait event occurs, it is necessary to extract SQL with a large amount of processing and tune it to reduce the amount of processing.