Database Specialist at NTT DATA Corporation • Design, migration, and troubleshooting of databases in various large systems • Oracle, Exadata, OCI • Community activities (JPOUG, MOSC, blog, Oracle ACE) https://community.oracle.com/mosc/profile/discussions/Kazuhiro
• Migration of a large mission-critical system (2017 – 2021) • Consolidation of 11g/9i databases onto 19c (Exadata X7-2) • Database upgrade from 12c to 19c in 2019
Approach to the new optimizer
• Determine the new optimizer's behavior in the design phase
• Performance tuning for each issue in the test phase

(1) Determine a basic migration policy: determine a basic policy on the new optimizer's behavior
(2) Design of initialization parameters: top down (examine new features and obsolete parameters) and bottom up (examine parameters of the old system)
(3) Design of statistics gathering operation: determine how to operate statistics gathering (system, dictionary, object) on the new system
(4) Performance test (tuning), during implementation and test: performance tuning for each issue revealed by the performance test
• Requirements on the database migration
• No application change, with the same execution plans as the old system
• Minimal dynamic change of execution plans during service time
• Consideration points that affect the optimizer's behavior
• Inherit: hints, statistics locks, no SPM
• Change: manual to automatic statistics gathering
• New features: pending stats, no adaptive features

(Figure: how statistics and SQL-side factors feed hard parses in the library cache and produce execution plans)
• SQL side: bind variables, size of the shared pool, pin/warm-up of SQL cursors, HINT, SPM, adaptive query optimization
• Statistics gathering side: method (auto/manual), target/timing, adaptive stats, lock stats, pending stats
Policy of adaptive features
• Determined whether to use each adaptive feature in view of the system characteristics (optimizer's feature, description → system characteristic driving the migration policy)

optimizer_adaptive_plans (Adaptive Plan)
• Join method: change a plan from Nested Loop to Hash Join → most SQLs were already tuned with HINT
• Parallel distribution: change PX SEND RANDOM to PX SEND HASH → very limited use of parallel query
• Bitmap index pruning: prune inefficient bitmap indexes → bitmap indexes are unused

optimizer_adaptive_statistics (Adaptive Stats)
• Dynamic stats (dynamic sampling): gather stats dynamically for objects with insufficient stats → stats are gathered as planned (no need to gather dynamically)
• Automatic re-optimization, stats feedback (cardinality feedback): monitor the number of rows during query execution and hard parse when it differs from the optimizer's expectation → most SQLs were already tuned with HINT
• Automatic re-optimization, performance feedback: change the parallel distribution method or stop the parallel operation → do not change the degree of parallelism (specify the parallel hint explicitly)
• SQL plan directive / automatic extended stats: gather additional stats → most SQLs were already tuned with HINT
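A minimal sketch of pinning the adaptive parameters once the policy is decided (the FALSE values are illustrative, not necessarily the values chosen for this system):

alter system set optimizer_adaptive_plans = false scope=both;
alter system set optimizer_adaptive_statistics = false scope=both;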
• Top down (examine new features and obsolete parameters)
• Parameters related to adaptive query optimization should be carefully configured when migrating from 11g
• Be careful about online stats gathering on direct path load
• Bottom up (examine parameters of the old system)
• Check initialization parameters that have different values using the 10053 trace
• Take the trace on the old and new systems as follows, and check the "PARAMETERS USED BY THE OPTIMIZER" section
• Confirm the reason for each difference, and determine whether to inherit it or not
• Specifically, be careful about hidden parameters explicitly modified from their default values in the old system
• In principle, avoid changing hidden parameters (they should be modified only on instruction from Oracle Support)

-- to enable the 10053 trace
alter session set events '10053 trace name context forever';
<execute any SQL>
-- stop tracing
alter session set events '10053 trace name context off';
Statistics gathering operation
• Automatic statistics gathering at CDB/PDB level
• Controlled by the maintenance task and AUTOSTATS_TARGET
• For performance tests, switch AUTOSTATS_TARGET from "auto" to "oracle" to disable auto stats for objects and fix the statistics
• Decide how to gather (or not gather) object, dictionary/fixed object, and system stats
- Set the table preference for pending stats (set PUBLISH to FALSE)
- Set the same system stats for all environments used in performance tests

• Object stats: statistics of tables/indexes; gathered by auto stats; pending stats, locking stats
• Dictionary/fixed object stats: statistics of dictionaries (dba_*) and fixed objects (x$...); gathered by auto stats
• System stats: statistics of I/O and CPU performance, as well as the Exadata-specific mode; no workload, Exadata mode
• Auto stats gathering: a feature that gathers stats in a maintenance window; the target can be configured as dictionary only ("oracle") or dictionary + object stats ("auto"); enabling/disabling auto stats and the maintenance window are configured at CDB and PDB level
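A minimal sketch of the preferences discussed above (SCOTT.EMP is a placeholder):

-- switch auto stats to dictionary-only during performance tests
exec dbms_stats.set_global_prefs('AUTOSTATS_TARGET', 'ORACLE');
-- enable pending statistics for a table: gathered stats stay unpublished
exec dbms_stats.set_table_prefs('SCOTT', 'EMP', 'PUBLISH', 'FALSE');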
For each performance problem, analyze it and come up with a realistic solution
• The 10053 trace is very useful for analyzing optimizer behavior
• Overall Oracle/Exadata knowledge is necessary for resolving performance problems

#1 Issue: poor batch performance
• Analysis: execution plan (the optimizer chose an index full scan instead of an in-memory sort)
• Cause: the minimum sort size per session became small (1MB) by changing from SORT_AREA_SIZE to PGA_AGGREGATE_TARGET, which raised the estimated cost of a sort using TEMP
• Resolution: change _smm_min_size to the size of the old SORT_AREA_SIZE

#2 Issue: batch abend with ORA-1652 (TEMP shortage)
• Analysis: execution plan (an inefficient join order was chosen)
• Cause: an optimizer parameter was configured differently (_optimizer_cost_based_transformation had been set to off to avoid a bug in 11g)
• Resolution: change _optimizer_cost_based_transformation from linear (default) to off

#3 Issue: poor batch performance
• Analysis: excessive single block reads by full scan (cannot be reproduced by a re-run)
• Cause: delayed block cleanout (due to better Exadata performance, the number of jobs running concurrently increased, which caused a shortage of buffer cache and led to delayed block cleanout)
• Resolution: increase the size of the buffer cache
Delayed block cleanout
• DBWR writes uncommitted blocks from the buffer cache to storage when the buffer cache is insufficient
• Oracle doesn't change the state of such blocks at commit time; the next select that reads a block changes its state to committed by reading UNDO. This is called delayed block cleanout
• That select causes excessive single block reads to UNDO, which leads to poor performance. Subsequent selects run without cleanouts and show no performance problem

Flow (a conventional INSERT writes to the buffer cache, and DBWR writes to the storage layer asynchronously; uncommitted blocks cannot be read by smart scan):
1. TRUNCATE
2. INSERT/COMMIT
3. DBWR flushes uncommitted blocks due to a shortage of buffer cache
4. SELECT (smart scan)
5. Storage servers return single blocks due to the uncommitted state
6. Single block reads of UNDO to check the current commit state (excessive access to UNDO blocks leads to slowness)
7. Change the block state to "committed" and generate REDO
8. DBWR writes the committed blocks to storage
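A hedged sketch for spotting delayed block cleanout from system statistics (the statistic names exist in v$sysstat):

select name, value
from v$sysstat
where name in ('cleanouts only - consistent read gets',
               'immediate (CR) block cleanout applications',
               'commit txn count during cleanout');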
2. Utilizing pending statistics
• Pending statistics allow saving gathered statistics without publishing them immediately after collection
• It is possible to switch which statistics the optimizer uses at the session level, so execution plans can easily be confirmed before actually publishing
• Very useful for avoiding performance issues caused by execution plan changes after statistics gathering

Without pending statistics: stats gathering (maintenance window) immediately affects the AP's execution plans
With pending statistics enabled: stats gathering (maintenance window) doesn't affect the AP's execution plans until publish; confirm execution plans with pending stats, then publish
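A minimal sketch of the session-level check before publishing (SCOTT.EMP is a placeholder):

-- make the optimizer use pending stats in this session only
alter session set optimizer_use_pending_statistics = true;
-- run the SQL of interest and confirm its execution plan here
alter session set optimizer_use_pending_statistics = false;
-- publish once the plans are confirmed
exec dbms_stats.publish_pending_stats('SCOTT', 'EMP');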
• The old system implemented its own execution plan monitoring
• Daily statistics gathering required an immediate response to any execution plan change
• After the migration, with pending stats, any execution plan change can be examined before publishing, which reduced the risk of performance problems

BEFORE (old system)
• Register important SQLs and their expected execution plans (history)
• Stats gathering (daily) → exec plan check (daily): compare the current execution plans with the registered ones
• Mail notification on NG; an immediate response is required for any difference in execution plans

AFTER (with pending stats)
• Gathering pending stats (maintenance window) doesn't affect the current statistics
• Exec plan check (every other week): compare the execution plans on pending stats with the registered ones
• Publish only if no difference is found in execution plans; otherwise, mail notification and examination on the next business day (no immediate response required; publish after proper examination)
Pending statistics – issue #1 in 19c: gathering pending statistics gets slow when pending stats are not published
• Symptom: in the process of gathering pending stats, it takes a long time to delete past pending statistics of histograms
• Condition: gathering pending statistics of a lot of tables/partitions with histograms (no publish)
• Cause: expected behavior, as written in the Doc ID below; it is recommended not to leave the statistics pending, otherwise the following delete statement takes a long time. Actually, the index (I_WRI$_OPTSTAT_H_ST) currently used by the DELETE statement is inefficient at narrowing down the rows to be deleted. Another index (I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST) is more efficient, and its performance doesn't depend on the number of rows (histograms)
• Resolution: the following SQL patch can improve the performance (confirmed on 19.3). * The SQL_ID and execution plan may differ by version and environment; test in your own environment
• Reference: When Number of Pending Statistics Increases, It Takes Time to Collect Statistical Information (Doc ID 2642768.1)

SQL_ID d8yp8608d866z, child number 0
-------------------------------------
delete /* QOSH:PURGE_OLD_STS *//*+ dynamic_sampling(4) */ from
sys.wri$_optstat_histgrm_history where :1 = savtime and obj# = :2 and
intcol# = nvl(:3, intcol#)

declare
  patch_name varchar2(20);
begin
  patch_name := dbms_sqldiag.create_sql_patch(
    sql_id    => 'd8yp8608d866z',
    hint_text => 'index(@DEL$1 WRI$_OPTSTAT_HISTGRM_HISTORY@DEL$1 I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST)',
    name      => 'pend_del_patch');
end;
/
Pending statistics – issue #2 in 19c: gather_table_stats doesn't gather related index stats properly when pending stats are enabled
• Symptom: after gathering pending stats on a table, the num_rows of an index becomes 0, even if the table contains rows
• Condition: the (published) num_rows of the table is 0; pending statistics are enabled on the table; gather_table_stats is executed with the cascade option TRUE
• Cause: bug (a new bug, not fixed at least until 19.11)
• Resolution: although it is not usual to gather and publish statistics of empty tables, careful consideration may be required when num_rows of a table becomes 0, such as when creating new tables. As a workaround, check indexes with num_rows=0 by the following SQL and confirm the statistics are correct BEFORE publishing. For indexes with wrong statistics, gather the index statistics manually by dbms_stats.gather_index_stats (see the sketch after the query)
• Reference: community discussion with a reproducible test case: gather_table_stats doesn't gather related index stats properly when pending stats are enabled https://community.oracle.com/mosc/discussion/4497632

-- check SQL for index stats
select t.owner, t.table_name, dt.num_rows nrows,
       t.num_rows tab_p_nrows, i.index_name, i.num_rows ind_p_nrows
from dba_tab_pending_stats t, dba_ind_pending_stats i, dba_tab_statistics dt
where t.owner = i.owner and t.table_name = i.table_name
  and dt.owner = t.owner and dt.table_name = t.table_name
  and t.num_rows > 0 and i.num_rows = 0 and dt.num_rows = 0
order by t.owner, t.table_name, i.index_name;
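A minimal sketch of the manual workaround (owner and index name are placeholders):

exec dbms_stats.gather_index_stats(ownname => 'SCOTT', indname => 'EMP_IX');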
3. Utilizing smart scan
• Exadata's functionality to speed up a query by offloading part of the query operations to the storage layer; transparent to the AP (no query modification necessary)
• Smart scan was actively used for a data migration AP which loads and transforms a large amount of data
• In order to control smart scan, it is important to understand both its prerequisites and its blocking factors

(Figure: a conventional Oracle full scan returns a large amount of blocks to the database layer, where the filter is applied; with smart scan, the storage layer applies storage indexes, predicate filtering, and column projection, and returns only a limited amount of blocks as the result set)
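A hedged sketch for checking offload per statement from v$sql (&sql_id is a placeholder):

select sql_id,
       io_cell_offload_eligible_bytes,
       io_interconnect_bytes
from v$sql
where sql_id = '&sql_id';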
Smart scan ~ Prerequisites
• Smart scan prerequisites
• Exadata (the object must be stored on Exadata storage)
• FULL scan (full scan of a segment such as a table/index/partition)
• Direct path read (DPR)
• The decision logic for DPR depends on the version and is not fully published
• Some known points about the DPR decision
• The percentage of the segment size (the BLOCKS statistic) against the size of the buffer cache, with three thresholds (STT, MTT, VLOT)
• The logic is different for serial and parallel queries
• Practically, a parallel query (with a parallel hint) always makes DPR occur
• It is possible to confirm some of the conditions with the NSMTIO trace (a sketch follows)
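A hedged sketch of enabling the NSMTIO trace (the UTS event syntax below is what we are aware of; verify it on your version):

alter session set events 'trace[NSMTIO] disk medium';
<execute the query in question>
alter session set events 'trace[NSMTIO] off';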
Direct path read decision and NSMTIO trace (19c)

Thresholds (relative to the buffer cache size):
• STT (Small Table Threshold) = _small_table_threshold ≈ 2% of the buffer cache size
• MTT (Medium Table Threshold) = _small_table_threshold x 5 ≈ 10% of the buffer cache size
• VLOT (Very Large Object Threshold) = _small_table_threshold x 250 ≈ buffer cache size x _very_large_object_threshold (500%)

Decision by object size:
• OBJECT_SIZE < STT: serial (d) multiblock read (*2); parallel (c) direct path read, buffer cache considered (*1)(*2)
• STT < OBJECT_SIZE < MTT: serial (c) direct path read, buffer cache considered (*1)(*2); parallel (b) direct path read (*1)
• MTT < OBJECT_SIZE < VLOT: serial and parallel (b) direct path read (*1)
• OBJECT_SIZE > VLOT: serial and parallel (a) direct path read (*1)

Corresponding NSMTIO trace messages:
(a) NSMTIO: qertbFetch:DirectRead:[OBJECT_SIZE>VLOT]
(b) NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp), then NSMTIO: kcbdpc:DirectRead: …
(c) the same "Checking cost" message, then either NSMTIO: kcbdpc:DirectRead: … or NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: …
(d) NSMTIO: qertbFetch:NoDirectRead:[- STT < OBJECT_SIZE < MTT]: Object's size: XXX (blocks), Threshold: MTT(XXX blocks)

(*1) can be changed to multiblock read by setting "_serial_direct_read"=never
(*2) can be changed to direct path read by setting "_serial_direct_read"=always

• Serial DPR starts to happen from 2% (STT) of the buffer cache size, although blocks on the buffer cache are considered and used under certain conditions. Above 10% (MTT), it is always DPR regardless of the state of the buffer cache.
• Parallel DPR happens even under 2% (STT) of the buffer cache size, although blocks on the buffer cache are considered and used under certain conditions. Above 2%, it is always DPR regardless of the state of the buffer cache.
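To see the actual threshold inputs, a hedged sketch (requires SYS; x$ views are undocumented and their layout may vary by version):

select a.ksppinm name, b.ksppstvl value
from x$ksppi a, x$ksppsv b
where a.indx = b.indx
  and a.ksppinm in ('_small_table_threshold', '_very_large_object_threshold');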
Smart scan ~ Blocking factors
• Smart scan blocking factors include:
• Delayed block cleanouts
• Fragmentation (chained rows, migrated rows)
• The typical symptom is excessive "single block reads" instead of "smart scan"
• A lot of "cell single block physical read" in AWR
• Single block reads to UNDO blocks (read consistency / delayed block cleanouts)
• Single block reads to the segment despite a full table scan (fragmentation)
• This happens because smart scan cannot complete the offload operations in each storage server independently
• Blocks need to be changed from the uncommitted state to committed on the buffer cache
• Chained/migrated rows distributed across more than one storage server have to be loaded into the buffer cache
Smart scan ~ Blocking factors
• For delayed block cleanouts and fragmentation, it is effective to make the blocks suitable for smart scan
• Extend the buffer cache to avoid delayed block cleanouts
• For tables having over 255 columns, direct path INSERT is effective to avoid fragmentation

#1 Blocking factor: delayed block cleanout
• Symptom: excessive single block reads to UNDO blocks; performance can improve on the next run
• Analysis: confirm single block read wait events and cleanout-related statistics in AWR; confirm single block reads to UNDO segments in ASH
• Resolution: extend the buffer cache; decrease the multiplicity of APs while updating the table (*); commit more frequently (*); direct path INSERT, which bypasses the buffer cache (*)
• Note: (*) the resolution should be applied to the APs that modify the segment, not to the select statement with poor performance

#2 Blocking factor: fragmentation (chained/migrated rows)
• Symptom: excessive single block reads to the segment
• Analysis: confirm the state of chained rows by ANALYZE with utlchain.sql (see the sketch below)
• Resolution: reorganization such as table move and index rebuild; direct path INSERT (*)
• Note: (*) chained rows are avoidable by direct path insert for tables with over 255 columns
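A minimal sketch of the chained-row check (scott.big_table is a placeholder):

@?/rdbms/admin/utlchain.sql   -- creates the CHAINED_ROWS table
analyze table scott.big_table list chained rows into chained_rows;
select count(*) from chained_rows;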
Smart scan related statistics (the figure is based on descriptions in the book Expert Oracle Exadata by Martin Bach et al.)

Database server:
• Buffer cache: consistent gets; cache fusion: gc cr blocks received
• Single/multi block reads: cell single block physical read, cell multi block physical read
• Totals: physical read total IO requests, physical read total bytes, cell physical IO interconnect bytes
• Single blocks are returned because of read consistency, fragmentation, or delayed block cleanouts

Smart scan (offload):
• cell physical IO bytes eligible for predicate offload
• cell physical IO interconnect bytes returned by smart scan
• Storage index: cell physical IO bytes saved by storage index

Storage server:
• Flash cache: cell flash cache read hits, physical read total bytes optimized, physical read requests optimized
• Layers: cell blocks processed by cache layer (KCB: kernel cache buffer mgmt), cell blocks processed by txn layer (KTR: kernel transaction layer), cell blocks processed by data layer (KDS: kernel data scan), cell blocks processed by index layer (KIS: kernel index scan)
• Commit cache: cell blocks helped by commit cache
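A hedged sketch for checking smart scan effectiveness from system-level statistics:

select name, round(value/1024/1024/1024, 1) gb
from v$sysstat
where name in ('cell physical IO bytes eligible for predicate offload',
               'cell physical IO interconnect bytes returned by smart scan',
               'cell physical IO bytes saved by storage index');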
4. Efficient test with PDB flashback
• For test efficiency, it's important to come up with a strategy to set up/back up/recover the DB in terms of DB objects and data lifecycle
• Combine Oracle's backup/recovery and multi-tenant functionality to realize an agile test environment

(Table: the method used for each class of DB object at each lifecycle event — initial creation, DB addition, data setup, data restore, DDL release, parameter change/restore — for example:)
• Data: imp for initial creation, PDB clone for DB addition, exp/imp (data only) for data setup, RMAN/flashback DB for data restore, data patch for DDL release, RMAN for parameter restore
• Statistics: imp
• Table/index/view/partition, PKG/function, synonym, sequence, trigger: imp for initial creation, DDL for release
• MVIEW/MLOG: DDL/refresh
• DBLINK, directory: manual setup
• Role/schema, tablespace, parameter, service, PDB: PDB clone
• CDB: file restore; tnsnames: manual setup

• Effective mass production of environments by PDB cloning
• Shortened recovery time and automated AP testing by flashback DB
Efficient test with PDB flashback
• DB restore is critical for test efficiency and AP quality
• Choose an optimal DB backup/recovery method for each environment
• Backup/recovery shell scripts will be required for test automation

• Development: ODA, no archive log mode, small DB size, no multi-tenant (many DB instances); backup/recovery: DB full restore from cold backup (CDB); restore time: RMAN ~5 min
• Validation (small): Exadata, archive log mode, small DB size, multi-tenant (many PDBs); backup/recovery: PDB full restore from online backup (PDB); restore time: RMAN ~10 min
• Validation (large) / Production: Exadata, archive log mode, large DB size (same as production), no multi-tenant (separate DB instances); backup/recovery: PDB full restore from online backup (PDB) and PDB recovery by flashback DB; restore time: RMAN 1-3 hours, FBDB ~30 min (depends on the amount of DML since the restore point)
Efficient test with PDB flashback ~ Comparison of FBDB and RMAN
• Fast and easy to create a restore point; suitable for restoring the entire DB to the same restore point over and over
• Management of the FRA is necessary; the overhead of FB log I/O is not noticeable
• FBDB doesn't substitute for RMAN backups

RMAN: back up the DB and logs; restore = DB full restore (takes time), then recovery by logs
Flashback DB: create a restore point (completes immediately); restore = roll the database back close to the restore point by FB logs, then roll forward to the restore point; additional I/O and storage space are required for FB logs
Efficient test with PDB flashback ~ Utilizing FBDB
• Create an RMAN full backup for the initial state of the data (as insurance against FBDB failure)
• Repeat a cycle of (create a restore point, execute a test, flashback DB) for each test (see the sketch below)
• Advance (raise) the data between cycles (if necessary)
• Set up the data again upon switching test phases (e.g. performance test to operational test), with another RMAN backup
• It is important to plan tests without a recovery that crosses a DB incarnation
• Difficult to recover
• Not possible at PDB level (19c)

(Figure: RMAN backup BK1 → restore point → Test#1 → flashback DB (PDB) → restore point → Test#2 → …; data setup and backup BK2 upon switching from performance test to operational test; incarnations 1-3 illustrate a recovery crossing incarnations)
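A minimal sketch of one test cycle with PDB flashback (PDB and restore point names are placeholders; run from the CDB root with local undo enabled):

create restore point before_test1 for pluggable database testpdb
  guarantee flashback database;
-- ... execute the test ...
alter pluggable database testpdb close immediate;
flashback pluggable database testpdb to restore point before_test1;
alter pluggable database testpdb open resetlogs;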
Efficient test with PDB flashback ~ PDB recovery
• In 21c, it is possible to reset the incarnation at PDB level, allowing flashback DB / PITR to a point in the recent past using an isolated (orphan) PDB incarnation branch
https://docs.oracle.com/cd/F39414_01/nfcon/details-pdb-point-in-time-recovery-or-flashback-to-any-time-in-the-recent-past-282450811.html
We talked about optimizer migration, utilizing pending statistics and smart scan, and efficient testing with PDB flashback
• Proper design helps reduce performance issues in a 19c migration
• Performance issues require experts who know the Oracle and Exadata architecture
• A backup/restore strategy for the test environment is important to improve the efficiency of performance tests and AP quality
• Oracle has been continuously evolving, and there will be useful features available for your migration. Strong passion will help