Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Tracing Parallel Execution

Avatar for Ivica Arsov Ivica Arsov
November 17, 2018

Tracing Parallel Execution

Avatar for Ivica Arsov

Ivica Arsov

November 17, 2018
Tweet

More Decks by Ivica Arsov

Other Decks in Technology

Transcript

  1. Ivica Arsov Database Engineer • Based in Skopje, Macedonia •

    Oracle Certified Master 12c & 11g • Oracle ACE • Blogger • Founder of MKOUG @IvicaArsov blog: https://blog.iarsov.com The Pythian Group Inc. 2
  2. AGENDA Overview of Parallel Execution Parallel Execution Tracing • What

    options are available? PX Control Flow Parallel DML • Two Phase Commit (2PC) • Recovery Example scenarios The Pythian Group Inc. 4
  3. Talk expectations • Technical content • Learn how to enable

    and disable PX tracing • Understand different PX trace components • Learn at which key parts of the trace file to look at • Learn how to identify downgrade reasons The Pythian Group Inc. 5
  4. Query coordinator with a different SQL (or its NULL) from

    its slaves USERNAME SID SERIAL# STATUS SERV SQL_ID --------------- ---------- ---------- --------- ---- ------------- SH 1919 25163 QC 534p3qxg51cw4 SH 25 19975 IN USE P000 39mjdqdary80p SH 1923 42908 IN USE P001 39mjdqdary80p The Pythian Group Inc. 6
  5. Parallel execution terminology • DoP – number of server processes

    used for single operation • Default DoP: PARALLEL_THREADS_PER_CPU * CPU_COUNT * INSTANCE_COUNT • Producer and consumer model = 2xDoP Acronym Description QC Query coordinator DoP Degree of parallelism TQ Table queue DFO Data flow operation DFO Tree Data flow operation tree The Pythian Group Inc. 8
  6. What can be parallelized ? • Table scans, index fast

    full scans, partitioned index range scans and tables • Nested loops, hash joins, sort merge, start transformation • Queries and subqueries • DDL and DML statements ▪ DDL: CTAS, CREATE/REBUILD INDEX, MOVE/SPLIT/COALESCE PARTITION The Pythian Group Inc. 9
  7. Understanding the role of query coordinator (QC) • The server

    (foreground) process becomes the query coordinator • QC determines the DoP before actual query execution • QC assigns each of the PX slaves with work • Starting 12c, one of the available PX servers will execute serial steps from the execution plan • Responsible to deliver the result to the client process The Pythian Group Inc. 10
  8. Understanding the granules • The basic unit of work ▪

    Block range granules ▪ Partitions granules • No option for enforcing specific granule • With partition granules the DoP correlate with the number of granules (partitions) The Pythian Group Inc. 11
  9. Trace PX execution with ”_px_trace” • Keyword - based tracing

    with “_px_trace” • Event - based tracing is still possible, but not for all PX components • Getting event - based tracing with PX can be difficult. Multiple events needs to be set, plus multiple levels The Pythian Group Inc. 13
  10. The ”_px_trace” components • Three (3) components can be set

    ▪ Verbosity ▪ Area ▪ Timing alter session set "_px_trace” = [[Verbosity,]area],...,[time] The Pythian Group Inc. 14
  11. The ”_px_trace” components • Verbosity ▪ high ▪ medium ▪

    low • Timing ▪ time The Pythian Group Inc. 15
  12. The ”_px_trace” components • Area ▪ scheduling - (equivalent to

    some of event 10384 and some of 10390) ▪ execution - (equivalent to some of event 10390) ▪ granule - (equivalent to some of event 10390 and some of 10391) ▪ messaging - (equivalent to event 10392 and event 10393) ▪ buffer - (equivalent to event 10399) ▪ compilation - (no equivalent event) ▪ tq – table queue information ▪ group ▪ affinity ▪ all - all of the above ▪ none - none of the above The Pythian Group Inc. 16
  13. Example trace with ”_px_trace” • Example trace alter session set

    "_px_trace"="high","execution","medium","buffer","time"; • To get full trace alter session set "_px_trace"="high","all"; • Each slave including QC will generate trace file. For easier management use alter session set tracefile_identifier = ‘some_string‘ The Pythian Group Inc. 17
  14. Some (key) functions to look for • kxfpiinfo - instance

    number / CPUs / parallel_max_servers • kxfrAllocSlaves(begin)- how many slaves are requested? • kxfrAllocSlaves(end)- will also show how many PX slaves were allocated (kxfpgsg) • kxfpAdaptDOP - adaptive DoP • DoP reason downgrade available under qerpxSetupAndAllocSlaves() The Pythian Group Inc. 18
  15. PX slaves downgrade reasons • 350 - downgrade due to

    adaptive DoP • 351 - downgrade due to resource manager max DoP • 352 - downgrade due to insufficient number of processes • 353 - downgrade because slaves failed to join The Pythian Group Inc. 19
  16. PX control flow general information --- cut some lines ---

    select /*+ parallel(2) */ * from sales kkscscid_pdm_eval pdml_allowed=0, cursorPdmlMode=0, sessPdmlMode=0 select /*+ parallel(2) */ * from sales select /*+ parallel(2) */ * from sales kkfdmpdml oct=3, pdml_allowed=0 kkfdmpdml cursor pdml_mode=0 kkfdmpdml session txn mode 0 kkfdmpdml cursor txn mode 0 kkfdmpdml cursor and session envs match ... user_id=110 user_name=SH module=SQL*Plus action= sql_id=88g4cq0jh8uan plan_hash_value=-1702924589 problem_type=3 --- cut some lines --- The Pythian Group Inc. 21
  17. PX control flow general information --- cut some lines ---

    2018-11-16 18:48:38.746*:PX_Granule:kxfr.c@4127:kxfrAllocSlaves(begin): DOP trace -- call kxfpgsg to get 2 slaves imc-0 2018-11-16 18:48:38.746*:PX_Granule:kxfr.c@23886:kxfrSetLoadBalOptions(): allocFlag=30 isGV?=No mode=MANUAL 2018-11-16 18:48:38.746*:PX_Messaging:kxfp.c@10680:kxfpgsg(begin): reqthreads=2 height=0 lsize=0 alloc_flg=0x230 2018-11-16 18:48:38.746*:PX_Messaging:kxfp.c@10757:kxfpgsg(): reqthreads=2 KXFPLDBL/KXFPADPT/ load balancing=on adaptive=off 2018-11-16 18:48:38.746*:PX_Messaging:kxfp.c@20637:kxfpiinfo(): inst [cpus:mxslv:bc(MB) :pga_target(MB):type] 1 [2 :40 :1248 :409 :HUB] --- cut some lines --- The Pythian Group Inc. 22
  18. PX control flow slave allocation --- cut some lines ---

    2018-09-17 02:19:26.739*:PX_Messaging:kxfp.c@4975:kxfprienq(): Sending to server=1.0 qref=0xedfd4980 msg=0xcbfa9fd0 len=2136 st=2 mty=2 (JOI) enq#=7(KXFP JOIN GROUP MESSAGE) flg=0xA state=10011 qref=0xedfd4980 qrser=1108993 qrseq=0 mty=7 fmh=0 --- cut some lines --- 2018-09-17 02:19:26.744*:PX_Messaging:kxfp.c@4975:kxfprienq(): Sending to server=1.1 qref=0xe021f7d8 msg=0xcbf8bfb8 len=2136 st=2 mty=2 (JOI) enq#=7(KXFP JOIN GROUP MESSAGE) flg=0xA state=10011 qref=0xe021f7d8 qrser=1108993 qrseq=0 mty=7 fmh=0 --- cut some lines --- The Pythian Group Inc. 23 Slave replies with: KXFP JOIN GROUP REPLY
  19. PX control flow slave allocation --- cut some lines ---

    Acquired 4 slaves on 1 instances avg height=2 #set=2 qser=1108993 P000 set 1 inst 1 spid 1382 P001 set 1 inst 1 spid 1383 P002 set 2 inst 1 spid 1394 P003 set 2 inst 1 spid 1396 --- cut some lines --- 2018-09-17 02:19:26.749*:PX_Control:kxfx.c@13870:kxfxcPutSession(): Send session information 2018-09-17 02:19:26.749*:PX_Control:kxfx.c@4114:kxfxSesCtxPut(): Send session parameters --- cut some lines --- The Pythian Group Inc. 24
  20. PX control flow object extent map dumping object objn:75760 IARSOV.T1

    so:0xe8647860 mo:0xffff80ffbea6f680 flg:4201 (KXFRO_PSC/KXFRO_RID/KXFRO_HINT/) nbparts:1 nfiles:2 nnodes:0 ninst:0 --- cut some lines --- partitions of object 1: part:0 (abs:-1 pnum:-1 spnum:-1 objn:75760 objd:75760 tsn:5) ecnt:26 size:1343 nfiles:2 naff:-1 flags: BIT Files in partition 0: kfil:12 size:659 kfil:13 size:684 Extent map of partition 0: afn:12 blk:3202 size:126 coalescable:1 pruneable:0 ... instid:65535 numaid:65535 CU#:4294967295 afn:12 blk:3458 size:126 coalescable:1 pruneable:0 ... instid:65535 numaid:65535 CU#:4294967295 --- cut some lines --- The Pythian Group Inc. 25
  21. PX control flow statement parse --- cut some lines ---

    2018-09-17 02:19:26.749*:PX_Messaging:kxfp.c@4975:kxfprienq(): Sending to server=1.0 qref=0xedfd4980 msg=0xcbfa9fd0 len=3928 st=2 mty=6 (DTA) enq#=18(KXFX PARSE) flg=0x1 state=10011 qref=0xedfd4980 qrser=1108993 qrseq=5 mty=18 fmh=-127 ----- PX Control Message ----- MSG( -->, KXFXOparse, DIALOG_HINT, slv=1 ) kxfxmh at addr: 0xcbf8c0d8 kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSX kxfxmh->kxfxmhmsz: 3640 kxfxmh->kxfxmhflg: 0x6 [-LF] kxfxmh->kxfxmhotyp: , KXFXOparse --- cut some lines --- The Pythian Group Inc. 26 Parse request along with whole SQL statement is sent to all slaves Wait for all parse replies ----- PX Control Message ----- MSG( <--, KXFXORokcurs, ss#=1 slv=0 ok=yes ) kxfxmh at addr: 0xcbfaa0f0 kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSX kxfxmh->kxfxmhmsz: 152 kxfxmh->kxfxmhflg: 0x6 [-LF] kxfxmh->kxfxmhotyp: , KXFXORokcurs kxfxmo->kxfxmossno: 1 kxfxmo->kxfxmosno: 0 kxfxmo->kxfxmosxid: xid: 0x0000.000.02010001 kxfxmo->kxfxmoflag: 0x1 [--OK] kxfxmo->kxfxmoerrcode: 0
  22. PX control flow general information --- cut some lines ---

    2018-11-16 18:48:38.731*:PX_Granule:kxfr.c@8752:kxfrDefaultDOP(): deg:2 tpc:1 #cpus:2 cap:-1 service:pdb1 pig: ser:0 locl:0 2018-11-16 18:48:38.731*:PX_Granule:kxfr.c@7915:kxfrSysInfo(): DOP trace -- compute default DOP from system info # instance alive = 1 (kxfrsnins) arch=-1 system default DOP:2 hard affinity:0 testing:0 --- cut some lines --- 2018-09-17 02:19:26.726*:PX_Blackbox:kxf.c@619:kxfCreateBlackBox(): Created QC PGA blackbox (creating from PGA) req size: 16384 granted: 16384, (sqlid: 2515965918123664279, bucket: 0xffff80ffbea4d048) 2018-09-17 02:19:26.726*:PX_Queuing:kxfxq.c@414:kxfxqOnOrderQueue(): KXFXQSTMTCHK() is TRUE, KXFXQQUEUABLE() is TRUE, pgadep:0, pgatopsql:1, pgapls 0. SlaveSQL?:NO, Parallized?:YES, DOP:2, userSQL:NO, dbmsSQL:NO, read-mostly?:NO. --- cut some lines --- The Pythian Group Inc. 27 a complex system or device whose internal workings are hidden or not readily understood.
  23. SELECT dbms_utility.data_block_address_file(50336247) "FILE” dbms_utility.data_block_address_block(50336247) "BLOCK", FROM dual; FILE BLOCK -------

    --------- 12 4599 PX control flow 2018-07-18 14:22:51.694*:PX_Granule:kxfr.c@2914:kxfrialo(): Start: generating granules (1) imc_px-0 --- cut some lines --- dumping granule generator gg:0xffff80ffbea6faf8 gdef:0xe8647810 id:0 fobj:0 nobj:1 nfiles:2 ngra:25 ndist:1 nslv:2 sset:0 isnuma:0 flg:0001(KXFRGG_RIDRANGE) Distributors associated to that generator: dist:0 nbgra:25 nid:-1 nbslv:2 size:1343 npart:0 npg:1 filedist:0 filno:0 nbgra:0 filedist:1 filno:12 nbgra:12 gnum:0 sz:55 pnum:inv rid:(file:12 blk:4599..4911) bc psize:1343 sdba: 50336247 edba: 50336559 gnum:1 sz:55 pnum:inv rid:(file:12 blk:3312..3496) bc psize:1343 sdba: 50334960 edba: 50335144 gnum:2 sz:55 pnum:inv rid:(file:12 blk:3257..3311) bc psize:1343 sdba: 50334905 edba: 50334959 gnum:3 sz:55 pnum:inv rid:(file:12 blk:3552..3736) bc psize:1343 sdba: 50335200 edba: 50335384 --- cut some lines --- The Pythian Group Inc. 28 granule generator
  24. PX control flow PXn PXn PXn PXn The Pythian Group

    Inc. 29 DATAFILE EXTENT BLOCK GRANULE First granule is sent to each slave in round-robin Granules are sent to each slave as per slave request
  25. PX control flow • Defaults ▪ _px_min_granules_per_slave = 13 ▪

    _px_granule_size = 1000000 (KB) ▪ _px_max_granules_per_slave = 100 --- cut some lines --- 2018-09-17 15:21:40.673*:PX_Granule:kxfr.c@13080:kxfrGraRidSize(): minGraSize:125000 nbSlaves:4 minPerSlave:18 maxPerSlave:100 -> ideal size:150 2018-09-17 15:21:40.673*:PX_Granule:kxfr.c@12153:kxfrGraRidIterate(): pnum:0 fileIdx:1 fileNum:12 fileSize:5773 adjustGraSize:152 mbrc:150 --- cut some lines --- The Pythian Group Inc. 30 granule generator
  26. PX control flow example sql=select /*+ parallel(2) */ owner from

    table1 order by 1 -------------------------------------------+---------------------------+-------------------------+ | Id | Operation | Name | Rows | Bytes | Time | TQ |IN-OUT|PQ Distrib | -------------------------------------------+---------------------------+-------------------------+ | 0 | SELECT STATEMENT | | | | | | | | | 1 | PX COORDINATOR | | | | | | | | | 2 | PX SEND QC (ORDER) | :TQ10001| 67K | 334K | 00:00:02 |:Q1001| P->S |QC (ORDER) | | 3 | SORT ORDER BY | | 67K | 334K | 00:00:02 |:Q1001| PCWP | | | 4 | PX RECEIVE | | 67K | 334K | 00:00:02 |:Q1001| PCWP | | | 5 | PX SEND RANGE | :TQ10000| 67K | 334K | 00:00:02 |:Q1000| P->P |RANGE | | 6 | PX BLOCK ITERATOR | | 67K | 334K | 00:00:02 |:Q1000| PCWC | | | 7 | TABLE ACCESS FULL | TABLE1 | 67K | 334K | 00:00:02 |:Q1000| PCWP | | -------------------------------------------+---------------------------+-------------------------+ The Pythian Group Inc. 31
  27. PX control flow The Pythian Group Inc. 32 KXFXOparse KXFXORokcurs

    KXFXOexecute sdba, edba PXn - parallel execution slave QC - query coordinator KXFXORreply KXFXRready Wait for all parse replies parse finished ... #failures:0 Send parse request to all slaves Producers ready to start scan
  28. PX control flow The Pythian Group Inc. 33 KXFXORreply KXFXRstarted

    KXFXOexecute PXn - parallel execution slave QC - query coordinator KXFXORreply KXFXRinput Consumers reply with ”started” Consumers reply “ready” and wait for data Instructs consumers to start execute
  29. PX control flow The Pythian Group Inc. 34 KXFXOresume PXn

    - parallel execution slave QC - query coordinator KXFXOgranule KXFXORreply KXFXRgranule KXFXORreply KXFXRdone QC indicates resume for producer set Producers reply with “done” Producers ask for next granule QC sends new granule
  30. PX control flow The Pythian Group Inc. 35 PXn -

    parallel execution slave QC - query coordinator KXFXORreply KXFXRready KXFQ ORDERED TQ New producers reply “ready” Producers send DATA BLOCK to QC QC sends “resume” message KXFXOresume
  31. PX control flow --- cut some lines --- 2018-07-23 13:48:55.144*:PX_Control:kxfx.c@8903:kxfxce(begin):

    Sending execute to tqid:0 slvno:0 1 granules defined in DFO 0 granules for definition 0 nbgra:1 size:671 batchsize:1 size: 4 aligned: 8 total: 48 rem:16032 to: 1 size: 2 aligned: 8 total: 56 rem:16024 to: 1 size: 8 aligned: 8 total: 64 rem:16016 to: 1 size: 56 aligned: 56 total: 72 rem:16008 to: 1 size: 56 aligned: 56 total: 128 rem:15952 to: 1 sz:55 pnum:inv rid:(file:12 blk:4599..4911) bc psize:1343 sdba: 50336247 edba: 50336559 Sending queue information nss:1 ssno:1 sno:0 nbslv:2 --- cut some lines --- The Pythian Group Inc. 36
  32. PX control flow --- cut some lines --- 2018-07-23 13:48:55.147*:PX_Control:kxfx.c@8903:kxfxce(begin):

    Sending execute to tqid:0 slvno:1 1 granules defined in DFO 0 granules for definition 0 nbgra:1 size:671 batchsize:1 size: 4 aligned: 8 total: 48 rem:16032 to: 1 size: 2 aligned: 8 total: 56 rem:16024 to: 1 size: 8 aligned: 8 total: 64 rem:16016 to: 1 size: 56 aligned: 56 total: 72 rem:16008 to: 1 size: 56 aligned: 56 total: 128 rem:15952 to: 1 sz:53 pnum:inv rid:(file:13 blk:4217..4527) bc psize:1343 sdba: 54530169 edba: 54530479 Sending queue information nss:1 ssno:1 sno:1 nbslv:2 --- cut some lines --- The Pythian Group Inc. 37
  33. PX control flow --- cut some lines --- ----- PX

    Control Message ----- MSG( <--, KXFXORreply(granule), dfo=0 slv=0 ) kxfxmh at addr: 0xcbfbe0f0 kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSX kxfxmh->kxfxmhmsz: 112 kxfxmh->kxfxmhflg: 0x6 [-LF] kxfxmh->kxfxmhotyp: , KXFXORreply kxfxr->kxfxrtyp: KXFXRgranule kxfxr->kxfxr1tag: 0 kxfxr->kxfxr2tag: 0 kxfxr->kxfxrqrows: TRUE kxfxr->kxfxrGraNum: 0 --- cut some lines --- The Pythian Group Inc. 38
  34. PX control flow --- cut some lines --- 2018-07-23 13:48:55.149*:PX_Control:kxfx.c@9694:kxfxcw():

    returning reply: kxfxr->kxfxrtyp: KXFXRgranule kxfxr->kxfxr1tag: 0 kxfxr->kxfxr2tag: 0 kxfxr->kxfxrqrows: TRUE kxfxr->kxfxrGraNum: 0 2018-07-23 13:48:55.149*:PX_Control:kxfx.c@9361:kxfxcSendNextGranule(begin): gnum:0 slvno:0 #gi:1 size: 8 aligned: 8 total: 40 rem:16040 to: 1 (kxfxcSendNextGranule) sz:55 pnum:inv rid:(file:12 blk:3312..3496) bc psize:1343 --- cut some lines --- The Pythian Group Inc. 39
  35. PX control flow 2018-07-23 13:48:47.906*:PX_Messaging:kxfp.c@7571:kxfpProcessError(): 10387 error message received from

    server=1.0(P000) qref:0xdd502d08 qrser:151553 qrseq:35 mh:0xcbfbdfd0 clear buffers: q=0xe00800e0 qref=0xdd502d08 qrser=151553 qrseq=35 nbufs=1 qrflg=0x1 --- cut some lines --- 2018-07-23 13:48:47.906*:PX_Messaging:kxfp.c@7571:kxfpProcessError(): 10387 error message received from server=1.1(P001) qref:0xe0081468 qrser:151553 qrseq:33 mh:0xcbfdbfe8 clear buffers: q=0xe00800e0 qref=0xe0081468 qrser=151553 qrseq=33 nbufs=1 qrflg=0x1 commands the slave to leave the server group normally The Pythian Group Inc. 40
  36. Parallel DML: two phase commit (2PC) • Once a transaction

    is completed COMMIT or ROLLBACK is mandatory • Two phase commit is used to guarantee all or non PXn slaves commit • KXFXO messages ▪ KXFXOprepare ▪ KXFXORtrans ▪ KXFXOforget The Pythian Group Inc. 42
  37. Parallel DML: two phase commit (2PC) The Pythian Group Inc.

    43 KXFXOprepare kxfxmpt->pxid_kxfxmpt KXFXORtrans KXFXOforget kxfxmft->pxid_kxfxmft kxfxmft->cscn_kxfxmft KXFXORtrans PXn - parallel execution slave QC - query coordinator
  38. Parallel DML: two phase commit (2PC) ----- PX Control Message

    ----- MSG( -->, KXFXOprepare, DIALOG_HINT ) kxfxmh at addr: 0xcbfa60f0 kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSX kxfxmh->kxfxmhmsz: 48 kxfxmh->kxfxmhflg: 0x6 [-LF] kxfxmh->kxfxmhotyp: , KXFXOprepare kxfxmpt->pxid_kxfxmpt: 0x000a.005.00001193 The Pythian Group Inc. 44
  39. Parallel DML: two phase commit (2PC) ----- PX Control Message

    ----- MSG( <--, KXFXORtrans ) kxfxmh at addr: 0xcbfa60f0 kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSX kxfxmh->kxfxmhmsz: 48 kxfxmh->kxfxmhflg: 0x6 [-LF] kxfxmh->kxfxmhotyp: , KXFXORtrans The Pythian Group Inc. 45
  40. Parallel DML: two phase commit (2PC) ----- PX Control Message

    ----- MSG( -->, KXFXOforget, DIALOG_HINT ) kxfxmh at addr: 0xcbfa60f0 kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSX kxfxmh->kxfxmhmsz: 56 kxfxmh->kxfxmhflg: 0x6 [-LF] kxfxmh->kxfxmhotyp: , KXFXOforget kxfxmft->pxid_kxfxmft: 0x000a.005.00001193 kxfxmft->cscn_kxfxmft: 0x0000000000f03159 The Pythian Group Inc. 46
  41. Parallel DML: two phase commit (2PC) ----- PX Control Message

    ----- MSG( <--, KXFXORtrans ) kxfxmh at addr: 0xcbfa60f0 kxfxmh->kxfxmhh.kxfmhtyp: KXFMHFSX kxfxmh->kxfxmhmsz: 48 kxfxmh->kxfxmhflg: 0x6 [-LF] kxfxmh->kxfxmhotyp: , KXFXORtrans The Pythian Group Inc. 47
  42. Parallel DML recovery statement failure •When statement is running the

    QC holds normal PS lock in mode 4 (S) for each slave •Once statement fails, QC posts 10388 message to the rest of slaves •QC knows slaves rolled back after it can get exclusive mode 6 (X)
  43. Parallel DML recovery statement failure SQL> update /*+ parallel(4) */

    stmt_fail set id = to_number(id); update /*+ parallel(4) */ stmt_fail set id = to_number(id) * ERROR at line 1: ORA-12801: error signaled in parallel query server P003 ORA-01722: invalid number During SQL execution (V$LOCK) it holds PS lock in mode 4 (S) for all slaves SID TY LMODE ID1 ID2 ADDR ---------- -- ---------- ---------- ---------- ---------------- 29 PS 4 1 3 00000000882282E0 29 PS 4 1 1 00000000882283F8 29 PS 4 1 2 0000000088228628 29 PS 4 1 0 0000000088229130
  44. Parallel DML recovery statement failure --- cut some lines ---

    Signaling OER(10388) in q=0xe0362708 qser=86529 qinc=1 nqr=4 2018-11-21 23:33:53.856*:PX_Messaging:kxfp.c@15503:kxfplsig(): signaling err=10388 in q=0x0 qser=86529 2018-11-21 23:33:53.856*:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915dc10 dpflg=8 q=0xdd515f00 qser=86529 pr=0x8c414db8 cqr=0xe03675c8 err=0 2018-11-21 23:33:53.856*:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915dcc8 dpflg=8 q=0xdd511bf0 qser=86529 pr=0x8c415e48 cqr=0xe035cf48 err=0 2018-11-21 23:33:53.857*:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915dd80 dpflg=8 q=0xdd50cd40 qser=86529 pr=0x8c421478 cqr=0xe035e408 err=0 --- cut some lines --- 10388, 00000, "parallel query server interrupt (failure)" // *Cause: internal use only // *Action: this event should never be set externally // *Comment: commands a slave to leave the server group (query // failed)
  45. Parallel DML recovery statement failure --- cut some lines ---

    Release Slave q=0xe0362708 qr=0xdd502668 action=2 server=1.3 2018-11-21 23:33:53.864*:PX_Messaging:kxfp.c@4440:kxfpqsrls(): got enq=0x88228740 in exclusive mode 2018-11-21 23:33:53.864*:PX_Messaging:kxfp.c@4515:kxfpqsrls(): cleaning up enqueue lock value --- cut some lines --- Release Slave q=0xe0362708 qr=0xe035c978 action=2 server=1.2 2018-11-21 23:33:54.018*:PX_Messaging:kxfp.c@4440:kxfpqsrls(): got enq=0x882282e0 in exclusive mode 2018-11-21 23:33:54.018*:PX_Messaging:kxfp.c@4515:kxfpqsrls(): cleaning up enqueue lock value --- cut some lines --- 2018-11-21 23:33:54.057*:PX_Messaging:kxfp.c@3578:kxfpqsod_qc_sod(): all slaves released qser=86529
  46. Parallel DML recovery rollback •QC sends 10382 message (sending out

    a reset server group) to all slaves •Slaves perform rollback and send KXFP RESET ACK •QC sends 10388 to all salves (commands a slave to leave the server group) The Pythian Group Inc. 53 10382, 00000, "parallel query server interrupt (reset)" // *Cause: internal use only // *Action: this event should never be set externally // *Comment: commands a slave to clear its buffers and reset itself
  47. Parallel DML recovery rollback --- cut some lines --- 2018-11-22

    11:48:22.736*:PX_Messaging:kxfp.c@3197:kxfpqsod_qc_rsg(): q=0xe0580460 qser=68609 qinc=0 action=0x1 flg=0xc24 sending out a reset server group 2018-11-22 11:48:22.736*:PX_Messaging:kxfp.c@13457:kxfpssg(): Signaling OER(10382) in q=0xe0580460 qser=68609 qinc=1 nqr=4 *:PX_Messaging:kxfp.c@15503:kxfplsig(): signaling err=10382 in q=0x0 qser=68609 *:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915dc10 dpflg=264 q=0xdd515f00 qser=68609 pr=0x8c414db8 cqr=0xe05678e0 err=0 *:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915dcc8 dpflg=264 q=0xdd511bf0 qser=68609 pr=0x8c415e48 cqr=0xe04df248 err=0 *:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915dd80 dpflg=264 q=0xdd50cd40 qser=68609 pr=0x8c4203e8 cqr=0xdd503b90 err=0 *:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915de38 dpflg=264 q=0xdd516220 qser=68609 pr=0x8c422508 cqr=0xdd5016f8 err=0 --- cut some lines ---
  48. Parallel DML recovery rollback 2018-11-22 11:48:22.740*:PX_Control:kxfx.c@10266:kxfxmai(): In signal-handler after catching

    10382 ----- PX Server UGA context ----- kxfxmai SLAVE RESET [2, 0] --- cut some lines --- 2018-11-22 11:48:22.748*:PX_Messaging:kxfp.c@4975:kxfprienq(): Sending to server=1.-1 qref=0xe05678e0 msg=0xcbfc3fe8 len=288 st=2 mty=9 (RSG) enq#=15(KXFP RESET ACK) flg=0x808 state=10011 qref=0xe05678e0 qrser=68609 qrseq=0 mty=15 fmh=0 sending message to server 1.-1 ser:68609 seq#:1 (remote queue on local instance, it is a priority msg) --- cut some lines --- 2018-11-22 11:48:22.753*:PX_Messaging:kxfp.c@3352:kxfpqsod_slv_rsg(): sent ack to QC ser=68609 inc=1 qrser=68609
  49. Parallel DML recovery process failure •QS process failure • QC

    receives 12805 message from PMON • QC sends 10388 message to other slaves • Each slave rolls back its transaction • PMON rolls back died slave transaction •QC process failure • Each slave receives 10389 message
  50. Parallel DML recovery process failure (slave) --- cut some lines

    --- 2018-11-22 12:46:14.124*:PX_Messaging:kxfp.c@3806:kxfpqsod(begin): q=0xdd501cc8 qser=176129 qinc=0 action=0x2 flg=0x4424 Process received 12805 signal due to P001 --- cut some lines --- 2018-11-22 12:46:14.124*:PX_Messaging:kxfp.c@13457:kxfpssg(): Signaling OER(10388) in q=0xdd501cc8 qser=176129 qinc=1 nqr=2 2018-11-22 12:46:14.124*:PX_Messaging:kxfp.c@15503:kxfplsig(): signaling err=10388 in q=0x0 qser=176129 2018-11-22 12:46:14.124*:PX_Messaging:kxfp.c@24500:kxfpSignalSlave(): signal: dp=0xd915dc10 dpflg=8 q=0xdd515f00 qser=176129 pr=0x8c414db8 cqr=0xe0580a40 err=0 2018-11-22 12:46:14.126*:PX_Messaging:kxfp.c@4174:kxfpqrsod(): --- cut some lines ---
  51. Parallel DML recovery process failure (QC) --- cut some lines

    --- *** 2018-11-22T12:58:38.518495+01:00 (PDB1(4)) 2018-11-22 12:58:38.624*:PX_Scheduler:qerpx.c@9054:qerpxSlaveFetch(): exiting pxid:1 rwsrid:1 error:10389 *** 2018-11-22T12:58:52.390967+01:00 (PDB1(4)) 2018-11-22 12:58:52.347*:PX_Control:kxfx.c@10266:kxfxmai(): In signal-handler after catching 10389 --- cut some lines --- 10389, 00000, "parallel query server interrupt (cleanup)" // *Cause: internal use only // *Action: this event should never be set externally // *Comment: commands a slave to leave the server group (signaled // by PMON, coordinator died)
  52. Examples 1. DoP limited by resource manager 2. DoP derived

    from hint 3. Join between tables with different DoP 4. PDML not enabled
  53. DoP limited by resource manager example 1 SQL> select plan,

    group_or_subplan, parallel_degree_limit_p1 2 from DBA_RSRC_PLAN_DIRECTIVES where plan = 'UKOUG_DEMO_PLAN'; PLAN GROUP_OR_SUBPLAN PARALLEL_DEGREE_LIMIT_P1 -------------------- -------------------- ------------------------ UKOUG_DEMO_PLAN OTHER_GROUPS 2 SQL> alter session force parallel query parallel 4; SQL> alter session set "_px_trace" = "high","all"; The Pythian Group Inc. 60
  54. DoP limited by resource manager example 1 --- cut some

    lines --- 2018-09-16 16:14:46.167*:PX_Granule:kxfr.c@6352:kxfrFlushObjs(end): flushed 2 objects and 0 temp objects best object:0xee303e20 hgt:0 blks:1736897 acp:0 nds:1 thr:4 threads requested = 4 (from kxfrComputeThread()) adjusted no. threads = 4 (from kxfrAdjustDOP()) 2018-09-16 16:14:46.167*:PX_Granule:kxfr.c@4127:kxfrAllocSlaves(begin): DOP trace -- call kxfpgsg to get 4 slaves imc-0 2018-09-16 16:14:46.167*:PX_Granule:kxfr.c@23886:kxfrSetLoadBalOptions(): allocFlag=30 isGV?=No mode=MANUAL 2018-09-16 16:14:46.167*:PX_Messaging:kxfp.c@10680:kxfpgsg(begin): reqthreads=4 height=0 lsize=0 alloc_flg=0x230 2018-09-16 16:14:46.167*:PX_Messaging:kxfp.c@10757:kxfpgsg(): The Pythian Group Inc. 61
  55. DoP limited by resource manager example 1 --- cut some

    lines --- session sst=0xffff80ffbec887d8 threads=0 dfo=1 height=0 width=0 query=1 dml=0 ddl=0 dop=0 #slave set=0 2018-09-16 16:14:46.244*:PX_Messaging:kxfp.c@11873:kxfpgsg(end): slave group q=0xe0c96e80 2018-09-16 16:14:46.244*:PX_Granule:kxfr.c@4204:kxfrAllocSlaves(end): actual num slaves alloc'd = 2 (kxfpqcthr) Finish: allocated actual 2 slaves for non-GV query total number of instances (union of 2 slvset): slvset # 1 total number of instances (union of 2 slvset): liid: 0 riid: 1 total number of instances (union of 2 slvset): slvset # 2 2018-09-16 16:14:46.244*:PX_Granule:kxfr.c@8268:kxfrSlvInstInfo(): total The Pythian Group Inc. 62
  56. DoP limited by resource manager example 1 --- cut some

    lines --- Slaves associated to that generator: slave 0 : defdist=0 curdist=0 szsofar:0 dgpos:0 defpg:0 dgphase:pgsteal start:0 phase:noaff slave 1 : defdist=0 curdist=0 szsofar:0 dgpos:0 defpg:0 dgphase:pgsteal start:0 phase:noaff 2018-09-16 16:14:46.244*:PX_Granule:kxfr.c@3016:kxfrialo(end): Finished granules allocation and slave acquisition (qcq:0xe0c96e80) 2018-09-16 16:14:46.244*:PX_Scheduler:qerpx.c@12093:qerpxSetupAndAllocSlaves(): DOP downgraded to 2(reason_code = 352) qerpxSgAssign [ 111/ 111] ngdef:3 ntq:7 qerpxStart [ 111/ 0] rwsrid:1 pxid:1 err:0 352 - downgrade due to insufficient number of processes The Pythian Group Inc. 63
  57. DoP derived from hint example 2 --- cut some lines

    --- Content of other_xml column =========================== derived_cpu_dop: 0 derived_io_dop : 0 dop_reason : hint dop : 6 px_in_memory_imc: no px_in_memory : no db_version : 12.2.0.1 parse_schema : SH --- cut some lines --- The Pythian Group Inc. 64 • hint • session (alter session force ...) • table property It does not account possible downgrades.
  58. DoP derived from hint example 2 --- cut some lines

    --- session sst=0xffff80ffbec787d8 threads=0 dfo=1 height=0 width=0 query=1 dml=0 ddl=0 dop=0 #slave set=0 2018-09-16 17:55:48.765*:PX_Messaging:kxfp.c@11873:kxfpgsg(end): slave group q=0xdd501f80 2018-09-16 17:55:48.765*:PX_Granule:kxfr.c@4204:kxfrAllocSlaves(end): actual num slaves alloc'd = 6 (kxfpqcthr) Finish: allocated actual 6 slaves for non-GV query total number of instances (union of 2 slvset): slvset # 1 total number of instances (union of 2 slvset): liid: 0 riid: 1 --- cut some lines --- The Pythian Group Inc. 65
  59. Join between tables with different DoP example 3 SQL> alter

    table table1 parallel 2; Table altered. SQL> alter table table2 parallel 4; Table altered. The Pythian Group Inc. 66
  60. Join between tables with different DoP example 3 --- cut

    some lines --- Content of other_xml column =========================== derived_cpu_dop: 0 derived_io_dop : 0 dop_reason : table property dop : 2 px_in_memory_imc: no px_in_memory : no db_version : 12.2.0.1 parse_schema : SH --- cut some lines --- The Pythian Group Inc. 67
  61. Join between tables with different DoP example 3 --- cut

    some lines --- DOP trace -- requested thread from best ref obj = 4 (from kxfrIsBestRef()) --- cut some lines --- adjusted no. threads = 4 (from kxfrAdjustDOP()) 2018-09-16 20:07:41.812*:PX_Granule:kxfr.c@4127:kxfrAllocSlaves(begin): DOP trace -- call kxfpgsg to get 4 slaves imc-0 2018-09-16 20:07:41.812*:PX_Granule:kxfr.c@23886:kxfrSetLoadBalOptions(): allocFlag=30 isGV?=No mode=MANUAL 2018-09-16 20:07:41.812*:PX_Messaging:kxfp.c@10680:kxfpgsg(begin): reqthreads=4 height=0 lsize=0 alloc_flg=0x230 --- cut some lines --- The Pythian Group Inc. 68
  62. PDML not enabled example 4 --------------------------------------------------------------------------------------------------------------- | Id | Operation

    | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | --------------------------------------------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | | | 133 (100)| | | | | | 1 | UPDATE | T22 | | | | | | | | | 2 | PX COORDINATOR | | | | | | | | | | 3 | PX SEND QC (RANDOM)| :TQ10000 | 68353 | 333K| 133 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) | | 4 | PX BLOCK ITERATOR | | 68353 | 333K| 133 (0)| 00:00:01 | Q1,00 | PCWC | | |* 5 | TABLE ACCESS FULL| T22 | 68353 | 333K| 133 (0)| 00:00:01 | Q1,00 | PCWP | | --------------------------------------------------------------------------------------------------------------- The Pythian Group Inc. 69
  63. PDML not enabled example 4 --- cut some lines ---

    kkscscid_pdm_eval pdml_allowed=0, cursorPdmlMode=0, sessPdmlMode=0 update /*+ parallel(2) */ t22 set owner = '1' update /*+ parallel(2) */ t22 set owner = '1' kkfdmpdml oct=6, pdml_allowed=0 kkfdmpdml cursor pdml_mode=0 kkfdmpdml session txn mode 0 kkfdmpdml cursor txn mode 0 kkfdmpdml cursor and session envs match 2018-09-17 16:42:54.128*:PX_Scheduler:qerpx.c@7406:qerpxStart(): start parallelizer pxid:1 rwsrid:2 --- cut some lines --- The Pythian Group Inc. 70
  64. PDML enabled example 4 • alter session enable parallel dml;

    -----------------------------------------+-----------------------------------+-------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib | -----------------------------------------+-----------------------------------+-------------------------+ | 0 | UPDATE STATEMENT | | | | 133 | | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000| 67K | 334K | 133 | 00:00:02 |:Q1000| P->S |QC (RANDOM)| | 3 | UPDATE | T22 | | | | |:Q1000| PCWP | | | 4 | PX BLOCK ITERATOR | | 67K | 334K | 133 | 00:00:02 |:Q1000| PCWC | | | 5 | TABLE ACCESS FULL | T22 | 67K | 334K | 133 | 00:00:02 |:Q1000| PCWP | | -----------------------------------------+-----------------------------------+-------------------------+ The Pythian Group Inc. 71
  65. Query coordinator with a different SQL (or its NULL) from

    its slaves USERNAME SID SERIAL# STATUS SERV SQL_ID --------------- ---------- ---------- --------- ---- ------------- SH 1919 25163 QC 534p3qxg51cw4 SH 25 19975 IN USE P000 39mjdqdary80p SH 1923 42908 IN USE P001 39mjdqdary80p The Pythian Group Inc. 72
  66. Query coordinator with a different SQL (or its NULL) from

    its slaves This is not a bug per Bug 1837760: PARALLEL QUERY SLAVE DOES NOT GET RELEASED AFTER CTRL-C TO CANCEL THE QUERY, which was closed as an enhancement request. Slaves will be considered "busy" until a SQL statement completes and releases the cursor. If a session is CTRL-C'd, the cursor will remain open until either (1) another SQL statement is issued in the session, or (2) the session is exited. This behavior continues through 11g and higher. The Pythian Group Inc. 73 But, this still does not answer how SQL ID can differ?
  67. • PX tracing enables you to look under the hood

    • You can see actual granules allocations • You can see execution flow / messaging between processes • Trace files are generated for each process including coordinator Summary The Pythian Group Inc. 74