
The story about the migration: Oracle to Postgres | Postgres Build 2020 | Alicja Kucharczyk & Sushant Pandey

The story about a challenging PoC that proved that Postgres can achieve the same performance as Oracle Exadata. The schema that was migrated wasn’t the simplest one you might see.

Azure Database for PostgreSQL

December 09, 2020

Transcript

  1. The story about the migration
    Sushant Pandey Engineering Architect
    Postgres Build 2020, Dec 9.
    Alicja Kucharczyk EMEA Global Black Belt
    OSS Data Tech Specialist


  2. Overview


  3. Overview
    Oracle Exadata
    to
    Azure Database for PostgreSQL – Single Server, v11
    4 vCores, 20 GB (Postgres) <–> 4 vCores, 32 GB (Oracle)


  4. PoC Scope
    2 packages from 1 schema
    + single objects and schemas from dependent packages, e.g. for
    logging


  5. Success Criteria
    • Comparable performance to the existing Oracle Exadata instance
    • Based on a test script provided by the customer that tests the performance of the 2 packages


  6. Migration Tooling
    Schema and Data migration
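    The deck does not show the schema and data tooling commands here. One common option is ora2pg (used later in this deck for code conversion); a minimal sketch of driving the export with it, assuming the config/ora2pg.conf referenced on later slides:

    ora2pg -c config/ora2pg.conf -t TABLE -o tables.sql   # table DDL
    ora2pg -c config/ora2pg.conf -t COPY  -o data.sql     # data as COPY statements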


  7. Results
     Schema and Data migration


  8. Test Set
    For each test, a duration was recorded per node (Node 1 duration, Node 2 duration, ...).


  9. Test 1 – Tree Traversal
    Node     Oracle (ms)   Postgres (ms)   Postgres vs. Oracle %
    Node 1   1             16              6.25
    Node 2   1             16              6.25
    Node 3   0             0               100
    Node 4   0             0               100
    Node 5   0             0               100
    Node 6   0             0               100
    Node 7   2             0               200
    Total    4             32              12.5


  10. Test 2 – Cursor Free Execution
    Node      Oracle (ms)   Postgres (ms)   Postgres vs. Oracle %
    Node 1    738           62              1190
    Node 2    738           47              1570
    Node 3    1427          47              3036
    Node 4    1424          63              2260
    Node 5    1132          31              3651
    Node 6    1154          31              3722
    Node 7    2340          344             680
    Node 8    2256          375             601
    Node 9    2240          422             530
    Node 10   2242          375             597
    Total     15691         1797            873


  11. Test 3 – Cursor Free Execution – datatype mapping scenario
    Node     Oracle (ms)   Postgres (ms)   Postgres vs. Oracle %
    Node 1   28            31              90
    Node 2   27            16              168.75
    Total    55            47              117


  12. Code Migration


  13. SYS_CONNECT_BY_PATH


  14. SYS_CONNECT_BY_PATH
    Valid only in hierarchical queries. It returns the path of a column value from root to node,
    with column values separated by the given character, for each row returned by the CONNECT BY
    condition.

    SELECT LPAD(' ', 2*level-1) || SYS_CONNECT_BY_PATH(last_name, '/') "Path"
    FROM employees
    START WITH last_name = 'Kochhar'
    CONNECT BY PRIOR employee_id = manager_id;
    Path
    ---------------------------------------------------------------
    /Kochhar
    /Kochhar/Greenberg
    /Kochhar/Greenberg/Faviet
    /Kochhar/Greenberg/Chen
    /Kochhar/Greenberg/Sciarra
    /Kochhar/Greenberg/Urman
    /Kochhar/Greenberg/Popp
    /Kochhar/Whalen
    /Kochhar/Mavris
    /Kochhar/Baer
    /Kochhar/Higgins
    /Kochhar/Higgins/Gietz
    https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions164.htm
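    Postgres has no CONNECT BY; the usual replacement for this kind of hierarchical query is WITH RECURSIVE. A minimal sketch (not part of the deck) of the same employees example, assuming an employees(employee_id, manager_id, last_name) table:

    WITH RECURSIVE emp_path AS (
      -- anchor member: the root row, like START WITH last_name = 'Kochhar'
      SELECT employee_id, last_name,
             '/' || last_name AS path,
             1 AS level
        FROM employees
       WHERE last_name = 'Kochhar'
      UNION ALL
      -- recursive member: append each direct report to its manager's path,
      -- like CONNECT BY PRIOR employee_id = manager_id
      SELECT e.employee_id, e.last_name,
             p.path || '/' || e.last_name,
             p.level + 1
        FROM employees e
        JOIN emp_path p ON e.manager_id = p.employee_id
    )
    SELECT LPAD(' ', 2*level - 1) || path AS "Path"
      FROM emp_path;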


  15. CREATE OR REPLACE TYPE FOO_TYPE IS TABLE OF INTEGER;

    CREATE OR REPLACE PROCEDURE SysConnectByPath
    IS
      bar_ids_tab FOO_TYPE := foo_type(111, 20, 3, 4, 5);
      bar_ids_string VARCHAR(1000);
    BEGIN
      SELECT SUBSTR(SYS_CONNECT_BY_PATH(column_value, ','), 2) csv
        INTO bar_ids_string
        FROM (SELECT column_value,
                     ROW_NUMBER() OVER (ORDER BY column_value) rn,
                     COUNT(*) OVER () cnt
                FROM TABLE(bar_ids_tab))
       WHERE rn = cnt
       START WITH rn = 1
       CONNECT BY rn = PRIOR rn + 1;
      DBMS_OUTPUT.PUT_LINE('out ' || bar_ids_string);
    END;

    completed in 45 ms
    out 3,4,5,20,111


  16. ora2pg -i SysConnectByPath.sql -t PROCEDURE -c config/ora2pg.conf
    CREATE OR REPLACE FUNCTION sysconnectbypath() RETURNS VOID AS
    $body$
    DECLARE
    bar_ids_tab FOO_TYPE := foo_type(111, 20, 3, 4, 5);
    bar_ids_string VARCHAR(1000);WITH RECURSIVE cte AS (
    BEGIN
    SELECT SUBSTR(column_value, 2) CSV
    INTO STRICT bar_ids_string
    FROM (SELECT column_value,
    ROW_NUMBER() OVER (ORDER BY column_value ) rn,
    COUNT(*) OVER () cnt
    FROM TABLE(bar_ids_tab) alias6) alias7
    WHERE rn = 1
    UNION ALL
    BEGIN
    SELECT C.bar_ids_string || ',' || SUBSTR(column_value, 2) CSV
    INTO STRICT bar_ids_string
    FROM (SELECT column_value,
    ROW_NUMBER() OVER (ORDER BY column_value ) rn,
    COUNT(*) OVER () cnt
    FROM TABLE(bar_ids_tab) alias6) JOIN cte C ON (C.rn + 1 = alias7.rn)
    ) SELECT * FROM cte WHERE rn = cnt;
    ;
    RAISE NOTICE 'out %', bar_ids_string;
    END;
    $body$
    LANGUAGE PLPGSQL;


  17. Let’s get rid of WITH RECURSIVE
    DO
    $$
    DECLARE
      bar_ids_tab NUMERIC[];
      bar_ids_string TEXT;
    BEGIN
      bar_ids_tab := '{111, 20, 3, 4, 5}';
      SELECT string_agg(x::TEXT, ',')
        INTO bar_ids_string
        FROM (SELECT unnest(bar_ids_tab) AS x
              ORDER BY x) a;
      RAISE NOTICE '%', bar_ids_string;
    END;
    $$

    [00000] 3,4,5,20,111
    completed in 3 ms


  18. Postgres WINS!
    Same output (3,4,5,20,111) on both sides:
    PostgreSQL DO block (previous slide): completed in 3 ms
    Oracle SysConnectByPath procedure (slide 15): completed in 45 ms


  19. BULK COLLECTS + CURSORS + FOR LOOPS


  20. BULK COLLECT
    With the BULK COLLECT clause, each of the preceding statements retrieves an
    entire result set and stores it in one or more collection variables in a single
    operation (which is more efficient than using a loop statement to retrieve one
    result row at a time).
    https://docs.oracle.com/database/121/LNPLS/tuning.htm#LNPLS891
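    PL/pgSQL has no BULK COLLECT. A common replacement (an assumption, not shown in the deck) is to load the whole result set into an array in a single statement, or simply to iterate with FOR ... IN query, which PL/pgSQL already fetches in batches internally. A minimal sketch reusing the secret_data columns from the next slide:

    DO
    $$
    DECLARE
      flags TEXT[];
    BEGIN
      -- one statement loads the whole result set into an array,
      -- roughly what BULK COLLECT INTO does for a PL/SQL collection
      SELECT array_agg(my_flag::text ORDER BY bar_id)
        INTO flags
        FROM secret_data
       WHERE foo_id = 1;
      RAISE NOTICE 'fetched % rows', coalesce(array_length(flags, 1), 0);
    END;
    $$;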


  21. Oracle snippet
    CREATE OR REPLACE PROCEDURE BulkCollect(p_version IN NUMBER)
    IS
      CURSOR get_foo (p_version number) IS
        SELECT * FROM secret_data
         WHERE foo_id = p_version
         ORDER BY bar_id;
      TYPE my_type IS TABLE of get_foo%ROWTYPE INDEX BY pls_integer;
      my_tab my_type;
      some_string VARCHAR2(5000);
    BEGIN
      OPEN get_foo(1);
      -- Process one level at a time.
      LOOP
        FETCH get_foo BULK COLLECT INTO my_tab LIMIT 1000;
        EXIT WHEN my_tab.COUNT = 0;
        FOR indx IN 1 .. my_tab.COUNT
        LOOP
          some_string := '';
          FOR prnt_indx IN REVERSE 2..indx - 1
          LOOP
            some_string := some_string || ', tab.' || TO_CHAR(my_tab(prnt_indx).my_flag);
          END LOOP;
        END LOOP;
      END LOOP;
      CLOSE get_foo;
    END;

    CALL BulkCollect(1)
    completed in 41 ms


  22. ora2pg -i BulkCollect.sql -t PROCEDURE -c config/ora2pg.conf
    CREATE OR REPLACE FUNCTION bulkcollect(p_version BIGINT) RETURNS VOID AS
    $body$
    DECLARE
    get_foo CURSOR (p_version BIGINT) FOR
    SELECT *
    FROM secret_data
    WHERE foo_id = p_version
    ORDER BY bar_id;
    TYPE MY_TYPE IS TABLE OF RECORD INDEX BY INTEGER;
    my_tab MY_TYPE;
    some_string VARCHAR(5000);
    BEGIN
    OPEN get_foo(1);
    -- Process one level at a time.
    LOOP
    FETCH get_foo BULK COLLECT INTO my_tab LIMIT 1000;
    EXIT WHEN my_tab.COUNT = 0;
    FOR indx IN 1 .. my_tab.COUNT
    LOOP
    some_string := '';
    FOR prnt_indx IN REVERSE indx..2 - 1
    LOOP
    some_string := some_string || ', tab.' ||
    my_tab[prnt_indx].my_flag::VARCHAR;
    END LOOP;
    END LOOP;
    END LOOP;
    CLOSE get_foo;
    END;
    $body$
    LANGUAGE PLPGSQL
    ;


  23. We don’t like cursors, right?
    CREATE OR REPLACE FUNCTION get_foo(p_version INTEGER)
    RETURNS SETOF SECRET_DATA AS
    $body$
      SELECT * FROM secret_data
       WHERE foo_id = p_version
       ORDER BY bar_id;
    $body$
    LANGUAGE SQL STABLE;

    CURSOR get_foo (p_version number) IS
      SELECT * FROM secret_data
       WHERE foo_id = p_version
       ORDER BY bar_id;


  24. PostgreSQL snippet
    CREATE PROCEDURE BulkCollect(p_version INT)
    LANGUAGE plpgsql AS
    $body$
    DECLARE
      some_string TEXT;
      get_foo_row RECORD;
      i INT;
    BEGIN
      DROP TABLE IF EXISTS temp_get_foo;
      CREATE TEMPORARY TABLE temp_get_foo ON COMMIT DROP AS
        SELECT row_number() OVER () AS rnum, * FROM get_foo(p_version);
      DELETE FROM temp_get_foo WHERE rnum = 1;
      ANALYZE temp_get_foo;
      i := 1;
      -- Process one level at a time.
      FOR get_foo_row IN SELECT * FROM get_foo(p_version)
      LOOP
        SELECT 'tab.' || string_agg(my_flag, ', tab.' ORDER BY rnum DESC)
          INTO some_string
          FROM temp_get_foo
         WHERE rnum < (SELECT rnum
                         FROM temp_get_foo
                        WHERE bar_id = get_foo_row.bar_id)
           AND rnum <> (SELECT max(rnum) FROM temp_get_foo);
        i := i + 1;
      END LOOP;
    END;
    $body$;

    CALL BulkCollect(1)
    completed in 14 ms


  25. Postgres WINS!
    PostgreSQL BulkCollect procedure (previous slide): completed in 14 ms
    Oracle BulkCollect procedure (slide 21): completed in 41 ms


  26. PostgreSQL snippet (another variant that replaced the cursor and used dynamic queries)
    completed in 31 ms
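    The dynamic-query variant itself is not shown in the deck; a minimal sketch of the usual PL/pgSQL pattern it presumably relies on, EXECUTE with format() and USING (table and column names reused from the earlier snippets; the runtime-chosen ORDER BY column is hypothetical):

    DO
    $$
    DECLARE
      some_string TEXT;
      order_col   TEXT := 'bar_id';   -- hypothetical: identifier chosen at runtime
    BEGIN
      -- format() %I safely quotes identifiers; USING passes values as parameters
      EXECUTE format(
        'SELECT string_agg(my_flag::text, '', tab.'' ORDER BY %I DESC)
           FROM secret_data
          WHERE foo_id = $1', order_col)
        INTO some_string
        USING 1;
      RAISE NOTICE '%', some_string;
    END;
    $$;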


  27. Cursor free execution – datatype mapping
    SELECT c.m_h_n_d_id
    FROM m_h e
    JOIN m_h_l d ON d.m_h_v_id = e.current_m_h_v_id
    JOIN m_h_n c ON c.M_H_V_ID = d.M_H_V_ID AND c.m_h_l_id = d.m_h_l_id
    JOIN m_h_l_t g ON d.m_h_l_t_id = g.m_h_l_t_id
    WHERE UPPER(g.m_h_l_t_c) = ('L')
    AND e.M_H_ID = 71
    I/O Timings: read=9706.450
    -> Hash Join (cost=1.10..138954.78 rows=7446 width=25) (actual time=1447.797..4278.543 rows=13005 loops=3)
    Hash Cond: ((c_1.m_h_v_id)::numeric = e.current_m_h_v_id)
    Buffers: shared hit=6929 read=102289
    I/O Timings: read=9706.450
    -> Parallel Append (cost=0.00..131433.18 rows=1489208 width=20) (actual time=0.279..3902.550 rows=1191353 loops=3)
    Buffers: shared hit=6796 read=102289
    I/O Timings: read=9706.450
    -> Parallel Seq Scan on m_h_n_sys_p2816 c_1 (cost=0.00..123976.91 rows=1489191 width=20) (actual time=0.277..3808.935 rows=1191353 loops=3)
    Buffers: shared hit=6796 read=102289
    I/O Timings: read=9706.450
    -> Parallel Seq Scan on m_h_n_p0 c (cost=0.00..10.24 rows=24 width=20) (actual time=0.000..0.001 rows=0 loops=1)
    -> Hash (cost=1.09..1.09 rows=1 width=5) (actual time=0.138..0.138 rows=1 loops=3)
    Buckets: 1024 Batches: 1 Memory Usage: 9kB
    Buffers: shared hit=3
    -> Seq Scan on m_h e (cost=0.00..1.09 rows=1 width=5) (actual time=0.117..0.119 rows=1 loops=3)
    Filter: (m_h_id = '71'::numeric)
    Rows Removed by Filter: 6
    Buffers: shared hit=3
    -> Append (cost=0.14..7247.57 rows=1272 width=20) (actual time=0.021..16.859 rows=38383 loops=1)
    Buffers: shared hit=2477
    -> Index Scan using m_h_n_p0_m_h_v_id_m_h_l_id_idx on m_h_n_p0 c (cost=0.14..0.30 rows=1 width=20) (never executed)
    Index Cond: (m_h_v_id = d_1.m_h_v_id)
    Filter: (d_1.m_h_l_id = (m_h_l_id)::numeric)
    -> Index Scan using m_h_n_sys_p2816_m_h_v_id_idx on m_h_n_sys_p2816 c_1 (cost=0.43..7240.91 rows=1271 width=20) (actual time=0.019..15.490 rows=38383 loops=1)
    Index Cond: (m_h_v_id = d_1.m_h_v_id)
    Filter: (d_1.m_h_l_id = (m_h_l_id)::numeric)
    Rows Removed by Filter: 631
    Buffers: shared hit=2477
    Incorrect datatype mapping captured in EXECUTION PLAN – 9329 ms overall
    Correct datatype mapping captured in EXECUTION PLAN – 31 ms overall
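    The plan above shows the symptom: the join and filter conditions carry implicit ::numeric casts (e.g. (c_1.m_h_v_id)::numeric = e.current_m_h_v_id), which forces sequential scans instead of index scans on the large partitioned table. A minimal sketch of the kind of fix (an assumption; the actual target types come from the source schema):

    -- align the column types on both sides of the join so the comparison
    -- needs no cast and the index on m_h_n(m_h_v_id) can be used directly
    ALTER TABLE m_h   ALTER COLUMN current_m_h_v_id TYPE bigint;
    ALTER TABLE m_h_l ALTER COLUMN m_h_l_id         TYPE bigint;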


  28. Migrate for Performance
    General Rules


  29. General Rules
    Existing code -> Replacement
    Nested Loops -> Reduced loop nesting with improvements
    Dynamic Queries -> Improved logic for dynamic queries to make them performant
    Refcursor as output along with other output params -> SETOF RECORD as output (see the sketch below)
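    A minimal sketch of the refcursor replacement mentioned above (function name and column types are hypothetical, not from the deck): the OUT refcursor plus extra OUT parameters become a plain set-returning function.

    -- hypothetical sketch: refcursor + OUT params replaced by RETURNS TABLE,
    -- so callers get an ordinary result set instead of fetching from a cursor
    CREATE OR REPLACE FUNCTION get_secret_data(p_version INT)
    RETURNS TABLE (out_bar_id BIGINT, out_my_flag TEXT)
    LANGUAGE sql STABLE AS
    $body$
      SELECT bar_id::bigint, my_flag::text
        FROM secret_data
       WHERE foo_id = p_version
       ORDER BY bar_id;
    $body$;

    -- usage: SELECT * FROM get_secret_data(1);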


  30. Our Azure Postgres service page—and our blog!
    Azure Database for PostgreSQL
    https://aka.ms/azure-postgres
    Azure Postgres Blog
    https://aka.ms/azure-postgres-blog


  31. Migrations
    https://aka.ms/postgres-migration-tutorial
    Wealth of documentation resources, too
    Azure Postgres Quickstart Docs
    https://aka.ms/azure-postgres-quickstart
    Azure Database for PostgreSQL
    https://aka.ms/azure-postgres
    Azure Postgres Blog
    https://aka.ms/azure-postgres-blog


  32. Migrations
    https://aka.ms/postgres-migration-tutorial
    [email protected]
    Citus open source packages on GitHub—also, Email
    https://aka.ms/citus
    Azure Postgres Quickstart Docs
    https://aka.ms/azure-postgres-quickstart
    Azure Database for PostgreSQL
    https://aka.ms/azure-postgres
    Azure Postgres Blog
    https://aka.ms/azure-postgres-blog


  33. © Copyright Microsoft Corporation. All rights reserved.
    danke schön
    dank u
    merci
    dziękuję
    धन्यवाद
    teşekkürler
    thank you
    grazie
    gracias
    tack
    @StiepanTrofimo
    @AzureDBPostgres
    Alicja Kucharczyk
    Sushant Pandey
