
The story about the migration: Oracle to Postgres | Postgres Build 2020 | Alicja Kucharczyk & Sushant Pandey

The story about a challenging PoC that proved that Postgres can achieve the same performance as Oracle Exadata. The schema that was migrated wasn’t the simplest one you might see.

Azure Database for PostgreSQL

December 09, 2020

Transcript

  1. The story about the migration
    Sushant Pandey Engineering Architect
    Postgres Build 2020, Dec 9.
    Alicja Kucharczyk EMEA Global Black Belt
    OSS Data Tech Specialist


  2. Overview


  3. Overview
    Oracle Exadata
    to
    Azure Database for PostgreSQL – Single Server, v11
    4 vCores, 20 GB (Postgres) <–> 4 vCores, 32 GB (Oracle)


  4. PoC Scope
    2 packages from 1 schema
    + single objects and schemas from dependent packages, e.g. for
    logging


  5. Success Criteria
    • Comparable performance to the existing Oracle Exadata instance
    • Based on a test script provided by the customer that tests the performance of the 2 packages


  6. Migration Tooling
    Schema and Data migration
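    The deck does not show the schema and data tooling commands here. One common option is ora2pg (used later in this deck for code conversion); a minimal sketch of driving the export with it, assuming the config/ora2pg.conf referenced on later slides:

    ora2pg -c config/ora2pg.conf -t TABLE -o tables.sql   # table DDL
    ora2pg -c config/ora2pg.conf -t COPY  -o data.sql     # data as COPY statements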


  7. Results
     Schema and Data migration


  8. Test Set
    For each test, a duration was recorded per node (Node 1 duration, Node 2 duration, ...).


  9. Test 1 – Tree Traversal
    Node     Oracle (ms)   Postgres (ms)   Postgres vs. Oracle %
    Node 1   1             16              6.25
    Node 2   1             16              6.25
    Node 3   0             0               100
    Node 4   0             0               100
    Node 5   0             0               100
    Node 6   0             0               100
    Node 7   2             0               200
    Total    4             32              12.5


  10. Test 2 – Cursor Free Execution
    Node      Oracle (ms)   Postgres (ms)   Postgres vs. Oracle %
    Node 1    738           62              1190
    Node 2    738           47              1570
    Node 3    1427          47              3036
    Node 4    1424          63              2260
    Node 5    1132          31              3651
    Node 6    1154          31              3722
    Node 7    2340          344             680
    Node 8    2256          375             601
    Node 9    2240          422             530
    Node 10   2242          375             597
    Total     15691         1797            873


  11. Test 3 – Cursor Free Execution – datatype mapping scenario
    Node     Oracle (ms)   Postgres (ms)   Postgres vs. Oracle %
    Node 1   28            31              90
    Node 2   27            16              168.75
    Total    55            47              117


  12. Code Migration


  13. SYS_CONNECT_BY_PATH


  14. SYS_CONNECT_BY_PATH
    Valid only in hierarchical queries. It returns the path of a column value from root to node,
    with column values separated by the given character, for each row returned by the CONNECT BY
    condition.

    SELECT LPAD(' ', 2*level-1) || SYS_CONNECT_BY_PATH(last_name, '/') "Path"
    FROM employees
    START WITH last_name = 'Kochhar'
    CONNECT BY PRIOR employee_id = manager_id;
    Path
    ---------------------------------------------------------------
    /Kochhar
    /Kochhar/Greenberg
    /Kochhar/Greenberg/Faviet
    /Kochhar/Greenberg/Chen
    /Kochhar/Greenberg/Sciarra
    /Kochhar/Greenberg/Urman
    /Kochhar/Greenberg/Popp
    /Kochhar/Whalen
    /Kochhar/Mavris
    /Kochhar/Baer
    /Kochhar/Higgins
    /Kochhar/Higgins/Gietz
    https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions164.htm
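    Postgres has no CONNECT BY; the usual replacement for this kind of hierarchical query is WITH RECURSIVE. A minimal sketch (not part of the deck) of the same employees example, assuming an employees(employee_id, manager_id, last_name) table:

    WITH RECURSIVE emp_path AS (
      -- anchor member: the root row, like START WITH last_name = 'Kochhar'
      SELECT employee_id, last_name,
             '/' || last_name AS path,
             1 AS level
        FROM employees
       WHERE last_name = 'Kochhar'
      UNION ALL
      -- recursive member: append each direct report to its manager's path,
      -- like CONNECT BY PRIOR employee_id = manager_id
      SELECT e.employee_id, e.last_name,
             p.path || '/' || e.last_name,
             p.level + 1
        FROM employees e
        JOIN emp_path p ON e.manager_id = p.employee_id
    )
    SELECT LPAD(' ', 2*level - 1) || path AS "Path"
      FROM emp_path;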


  15. CREATE OR REPLACE TYPE FOO_TYPE IS TABLE OF INTEGER;

    CREATE OR REPLACE PROCEDURE SysConnectByPath
    IS
      bar_ids_tab FOO_TYPE := foo_type(111, 20, 3, 4, 5);
      bar_ids_string VARCHAR(1000);
    BEGIN
      SELECT SUBSTR(SYS_CONNECT_BY_PATH(column_value, ','), 2) csv
        INTO bar_ids_string
        FROM (SELECT column_value,
                     ROW_NUMBER() OVER (ORDER BY column_value) rn,
                     COUNT(*) OVER () cnt
                FROM TABLE(bar_ids_tab))
       WHERE rn = cnt
       START WITH rn = 1
       CONNECT BY rn = PRIOR rn + 1;
      DBMS_OUTPUT.PUT_LINE('out ' || bar_ids_string);
    END;

    completed in 45 ms
    out 3,4,5,20,111


  16. ora2pg -i SysConnectByPath.sql -t PROCEDURE -c config/ora2pg.conf
    CREATE OR REPLACE FUNCTION sysconnectbypath() RETURNS VOID AS
    $body$
    DECLARE
    bar_ids_tab FOO_TYPE := foo_type(111, 20, 3, 4, 5);
    bar_ids_string VARCHAR(1000);WITH RECURSIVE cte AS (
    BEGIN
    SELECT SUBSTR(column_value, 2) CSV
    INTO STRICT bar_ids_string
    FROM (SELECT column_value,
    ROW_NUMBER() OVER (ORDER BY column_value ) rn,
    COUNT(*) OVER () cnt
    FROM TABLE(bar_ids_tab) alias6) alias7
    WHERE rn = 1
    UNION ALL
    BEGIN
    SELECT C.bar_ids_string || ',' || SUBSTR(column_value, 2) CSV
    INTO STRICT bar_ids_string
    FROM (SELECT column_value,
    ROW_NUMBER() OVER (ORDER BY column_value ) rn,
    COUNT(*) OVER () cnt
    FROM TABLE(bar_ids_tab) alias6) JOIN cte C ON (C.rn + 1 = alias7.rn)
    ) SELECT * FROM cte WHERE rn = cnt;
    ;
    RAISE NOTICE 'out %', bar_ids_string;
    END;
    $body$
    LANGUAGE PLPGSQL;


  17. Let’s get rid of WITH RECURSIVE
    DO
    $$
    DECLARE
      bar_ids_tab NUMERIC[];
      bar_ids_string TEXT;
    BEGIN
      bar_ids_tab := '{111, 20, 3, 4, 5}';
      SELECT string_agg(x::TEXT, ',')
        INTO bar_ids_string
        FROM (SELECT unnest(bar_ids_tab) AS x
              ORDER BY x) a;
      RAISE NOTICE '%', bar_ids_string;
    END;
    $$

    [00000] 3,4,5,20,111
    completed in 3 ms


  18. Postgres WINS!
    Same output (3,4,5,20,111) on both sides:
    PostgreSQL DO block (previous slide): completed in 3 ms
    Oracle SysConnectByPath procedure (slide 15): completed in 45 ms


  19. BULK COLLECTS + CURSORS + FOR LOOPS


  20. BULK COLLECT
    With the BULK COLLECT clause, each of the preceding statements retrieves an
    entire result set and stores it in one or more collection variables in a single
    operation (which is more efficient than using a loop statement to retrieve one
    result row at a time).
    https://docs.oracle.com/database/121/LNPLS/tuning.htm#LNPLS891
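    PL/pgSQL has no BULK COLLECT. A common replacement (an assumption, not shown in the deck) is to load the whole result set into an array in a single statement, or simply to iterate with FOR ... IN query, which PL/pgSQL already fetches in batches internally. A minimal sketch reusing the secret_data columns from the next slide:

    DO
    $$
    DECLARE
      flags TEXT[];
    BEGIN
      -- one statement loads the whole result set into an array,
      -- roughly what BULK COLLECT INTO does for a PL/SQL collection
      SELECT array_agg(my_flag::text ORDER BY bar_id)
        INTO flags
        FROM secret_data
       WHERE foo_id = 1;
      RAISE NOTICE 'fetched % rows', coalesce(array_length(flags, 1), 0);
    END;
    $$;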


  21. Oracle snippet
    CREATE OR REPLACE PROCEDURE BulkCollect(p_version IN NUMBER)
    IS
      CURSOR get_foo (p_version number) IS
        SELECT * FROM secret_data
         WHERE foo_id = p_version
         ORDER BY bar_id;
      TYPE my_type IS TABLE of get_foo%ROWTYPE INDEX BY pls_integer;
      my_tab my_type;
      some_string VARCHAR2(5000);
    BEGIN
      OPEN get_foo(1);
      -- Process one level at a time.
      LOOP
        FETCH get_foo BULK COLLECT INTO my_tab LIMIT 1000;
        EXIT WHEN my_tab.COUNT = 0;
        FOR indx IN 1 .. my_tab.COUNT
        LOOP
          some_string := '';
          FOR prnt_indx IN REVERSE 2..indx - 1
          LOOP
            some_string := some_string || ', tab.' || TO_CHAR(my_tab(prnt_indx).my_flag);
          END LOOP;
        END LOOP;
      END LOOP;
      CLOSE get_foo;
    END;

    CALL BulkCollect(1)
    completed in 41 ms


  22. ora2pg -i BulkCollect.sql -t PROCEDURE -c config/ora2pg.conf
    CREATE OR REPLACE FUNCTION bulkcollect(p_version BIGINT) RETURNS VOID AS
    $body$
    DECLARE
    get_foo CURSOR (p_version BIGINT) FOR
    SELECT *
    FROM secret_data
    WHERE foo_id = p_version
    ORDER BY bar_id;
    TYPE MY_TYPE IS TABLE OF RECORD INDEX BY INTEGER;
    my_tab MY_TYPE;
    some_string VARCHAR(5000);
    BEGIN
    OPEN get_foo(1);
    -- Process one level at a time.
    LOOP
    FETCH get_foo BULK COLLECT INTO my_tab LIMIT 1000;
    EXIT WHEN my_tab.COUNT = 0;
    FOR indx IN 1 .. my_tab.COUNT
    LOOP
    some_string := '';
    FOR prnt_indx IN REVERSE indx..2 - 1
    LOOP
    some_string := some_string || ', tab.' ||
    my_tab[prnt_indx].my_flag::VARCHAR;
    END LOOP;
    END LOOP;
    END LOOP;
    CLOSE get_foo;
    END;
    $body$
    LANGUAGE PLPGSQL
    ;


  23. We don’t like cursors, right?
    CREATE OR REPLACE FUNCTION get_foo(p_version INTEGER)
    RETURNS SETOF SECRET_DATA AS
    $body$
      SELECT * FROM secret_data
       WHERE foo_id = p_version
       ORDER BY bar_id;
    $body$
    LANGUAGE SQL STABLE;

    CURSOR get_foo (p_version number) IS
      SELECT * FROM secret_data
       WHERE foo_id = p_version
       ORDER BY bar_id;


  24. PostgreSQL snippet
    CREATE PROCEDURE BulkCollect(p_version INT)
    LANGUAGE plpgsql AS
    $body$
    DECLARE
      some_string TEXT;
      get_foo_row RECORD;
      i INT;
    BEGIN
      DROP TABLE IF EXISTS temp_get_foo;
      CREATE TEMPORARY TABLE temp_get_foo ON COMMIT DROP AS
        SELECT row_number() OVER () AS rnum, * FROM get_foo(p_version);
      DELETE FROM temp_get_foo WHERE rnum = 1;
      ANALYZE temp_get_foo;
      i := 1;
      -- Process one level at a time.
      FOR get_foo_row IN SELECT * FROM get_foo(p_version)
      LOOP
        SELECT 'tab.' || string_agg(my_flag, ', tab.' ORDER BY rnum DESC)
          INTO some_string
          FROM temp_get_foo
         WHERE rnum < (SELECT rnum
                         FROM temp_get_foo
                        WHERE bar_id = get_foo_row.bar_id)
           AND rnum <> (SELECT max(rnum) FROM temp_get_foo);
        i := i + 1;
      END LOOP;
    END;
    $body$;

    CALL BulkCollect(1)
    completed in 14 ms


  25. Postgres WINS!
    PostgreSQL BulkCollect procedure (previous slide): completed in 14 ms
    Oracle BulkCollect procedure (slide 21): completed in 41 ms


  26. PostgreSQL snippet (another variant that replaced the cursor and used dynamic queries)
    completed in 31 ms
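    The dynamic-query variant itself is not shown in the deck; a minimal sketch of the usual PL/pgSQL pattern it presumably relies on, EXECUTE with format() and USING (table and column names reused from the earlier snippets; the runtime-chosen ORDER BY column is hypothetical):

    DO
    $$
    DECLARE
      some_string TEXT;
      order_col   TEXT := 'bar_id';   -- hypothetical: identifier chosen at runtime
    BEGIN
      -- format() %I safely quotes identifiers; USING passes values as parameters
      EXECUTE format(
        'SELECT string_agg(my_flag::text, '', tab.'' ORDER BY %I DESC)
           FROM secret_data
          WHERE foo_id = $1', order_col)
        INTO some_string
        USING 1;
      RAISE NOTICE '%', some_string;
    END;
    $$;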


  27. Cursor free execution – datatype mapping
    SELECT c.m_h_n_d_id
    FROM m_h e
    JOIN m_h_l d ON d.m_h_v_id = e.current_m_h_v_id
    JOIN m_h_n c ON c.M_H_V_ID = d.M_H_V_ID AND c.m_h_l_id = d.m_h_l_id
    JOIN m_h_l_t g ON d.m_h_l_t_id = g.m_h_l_t_id
    WHERE UPPER(g.m_h_l_t_c) = ('L')
    AND e.M_H_ID = 71
    I/O Timings: read=9706.450
    -> Hash Join (cost=1.10..138954.78 rows=7446 width=25) (actual time=1447.797..4278.543 rows=13005 loops=3)
    Hash Cond: ((c_1.m_h_v_id)::numeric = e.current_m_h_v_id)
    Buffers: shared hit=6929 read=102289
    I/O Timings: read=9706.450
    -> Parallel Append (cost=0.00..131433.18 rows=1489208 width=20) (actual time=0.279..3902.550 rows=1191353 loops=3)
    Buffers: shared hit=6796 read=102289
    I/O Timings: read=9706.450
    -> Parallel Seq Scan on m_h_n_sys_p2816 c_1 (cost=0.00..123976.91 rows=1489191 width=20) (actual time=0.277..3808.935 rows=1191353 loops=3)
    Buffers: shared hit=6796 read=102289
    I/O Timings: read=9706.450
    -> Parallel Seq Scan on m_h_n_p0 c (cost=0.00..10.24 rows=24 width=20) (actual time=0.000..0.001 rows=0 loops=1)
    -> Hash (cost=1.09..1.09 rows=1 width=5) (actual time=0.138..0.138 rows=1 loops=3)
    Buckets: 1024 Batches: 1 Memory Usage: 9kB
    Buffers: shared hit=3
    -> Seq Scan on m_h e (cost=0.00..1.09 rows=1 width=5) (actual time=0.117..0.119 rows=1 loops=3)
    Filter: (m_h_id = '71'::numeric)
    Rows Removed by Filter: 6
    Buffers: shared hit=3
    -> Append (cost=0.14..7247.57 rows=1272 width=20) (actual time=0.021..16.859 rows=38383 loops=1)
    Buffers: shared hit=2477
    -> Index Scan using m_h_n_p0_m_h_v_id_m_h_l_id_idx on m_h_n_p0 c (cost=0.14..0.30 rows=1 width=20) (never executed)
    Index Cond: (m_h_v_id = d_1.m_h_v_id)
    Filter: (d_1.m_h_l_id = (m_h_l_id)::numeric)
    -> Index Scan using m_h_n_sys_p2816_m_h_v_id_idx on m_h_n_sys_p2816 c_1 (cost=0.43..7240.91 rows=1271 width=20) (actual time=0.019..15.490 rows=38383 loops=1)
    Index Cond: (m_h_v_id = d_1.m_h_v_id)
    Filter: (d_1.m_h_l_id = (m_h_l_id)::numeric)
    Rows Removed by Filter: 631
    Buffers: shared hit=2477
    Incorrect datatype mapping captured in EXECUTION PLAN – 9329 ms overall
    Correct datatype mapping captured in EXECUTION PLAN – 31 ms overall
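    The plan above shows the symptom: the join and filter conditions carry implicit ::numeric casts (e.g. (c_1.m_h_v_id)::numeric = e.current_m_h_v_id), which forces sequential scans instead of index scans on the large partitioned table. A minimal sketch of the kind of fix (an assumption; the actual target types come from the source schema):

    -- align the column types on both sides of the join so the comparison
    -- needs no cast and the index on m_h_n(m_h_v_id) can be used directly
    ALTER TABLE m_h   ALTER COLUMN current_m_h_v_id TYPE bigint;
    ALTER TABLE m_h_l ALTER COLUMN m_h_l_id         TYPE bigint;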


  28. Migrate for Performance
    General Rules


  29. General Rules
    Existing code -> Replacement
    Nested Loops -> Reduced loop nesting with improvements
    Dynamic Queries -> Improved logic for dynamic queries to make them performant
    Refcursor as output along with other output params -> SETOF RECORD as output (see the sketch below)
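    A minimal sketch of the refcursor replacement mentioned above (function name and column types are hypothetical, not from the deck): the OUT refcursor plus extra OUT parameters become a plain set-returning function.

    -- hypothetical sketch: refcursor + OUT params replaced by RETURNS TABLE,
    -- so callers get an ordinary result set instead of fetching from a cursor
    CREATE OR REPLACE FUNCTION get_secret_data(p_version INT)
    RETURNS TABLE (out_bar_id BIGINT, out_my_flag TEXT)
    LANGUAGE sql STABLE AS
    $body$
      SELECT bar_id::bigint, my_flag::text
        FROM secret_data
       WHERE foo_id = p_version
       ORDER BY bar_id;
    $body$;

    -- usage: SELECT * FROM get_secret_data(1);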


  30. Our Azure Postgres service page—and our blog!
    Azure Database for PostgreSQL
    https://aka.ms/azure-postgres
    Azure Postgres Blog
    https://aka.ms/azure-postgres-blog


  31. Migrations
    https://aka.ms/postgres-migration-tutorial
    Wealth of documentation resources, too
    Azure Postgres Quickstart Docs
    https://aka.ms/azure-postgres-quickstart
    Azure Database for PostgreSQL
    https://aka.ms/azure-postgres
    Azure Postgres Blog
    https://aka.ms/azure-postgres-blog


  32. Migrations
    https://aka.ms/postgres-migration-tutorial
    [email protected]
    Citus open source packages on GitHub—also, Email
    https://aka.ms/citus
    Azure Postgres Quickstart Docs
    https://aka.ms/azure-postgres-quickstart
    Azure Database for PostgreSQL
    https://aka.ms/azure-postgres
    Azure Postgres Blog
    https://aka.ms/azure-postgres-blog


  33. © Copyright Microsoft Corporation. All rights reserved.
    danke schön
    dank u
    merci
    dziękuję
    धन्यवाद
    teşekkürler
    thank you
    grazie
    gracias
    tack
    @StiepanTrofimo
    @AzureDBPostgres
    Alicja Kucharczyk
    Sushant Pandey
