Slide 1

Slide 1 text

©EDB 2024 — ALL RIGHTS RESERVED. Playing Nice with Your Friends Database Diversity with Postgres FDWs Richard Yen COSCUP 2024 2024-08-03

Slide 2

Slide 2 text

© EDB 2024 - All Rights Reserved 2 ● Software Developer/Support Engineer @ EDB since 2015 ● Previously a DBA and Web Developer ● Been using PostgreSQL since v. 7.4 About Me

Slide 3

Slide 3 text

© EDB 2024 - All Rights Reserved 3 ● Not ready to go 100% PostgreSQL ● 100% PostgreSQL is not working out ● Want to leverage features in other engines ● Organizational structure does not make 100% PostgreSQL feasible ● Don't want to put all your eggs into one basket ● Want to leverage more hardware Is this you?

Slide 4

Slide 4 text

© EDB 2024 - All Rights Reserved 4 ● What are Foreign Data Wrappers? ● Varieties and use cases for FDWs ● Setting up FDWs ● Demo ● Things to consider when implementing FDWs into your infrastructure ● How FDWs help achieve diversity ● Creating your own FDW ● Demo Our Roadmap

Slide 5

Slide 5 text

© EDB 2024 - All Rights Reserved 5 ● Extensions that allow access to data outside of a PostgreSQL database cluster ● Could be another PostgreSQL database ● Another RDBMS/NoSQL database ● Flat files, REST APIs, etc. ● Data from the external data source is represented as a table (called a foreign table) ● Foreign tables can be used in SQL queries just like ordinary tables ● In many cases, foreign tables can receive DML What are Foreign Data Wrappers?

Slide 6

Slide 6 text

© EDB 2024 - All Rights Reserved 6 ● Oracle DBLink ● PG 7.2 implemented its own version of DBLink ● Access to data was not very intuitive (used dblink-specific functions) A bit of history postgres=# create view foreign_view as select * from dblink('host=pg1 user=postgres dbname=postgres', 'select aid,bid,abalance, fi ller from pgbench_accounts') as t1 (aid int,bid int,abalance int, fi ller text); CREATE VIEW postgres=# select * from foreign_view where aid = 1; aid | bid | abalance | fi ller -----+-----+----------+-------------------------------------------------------------------------------------- 1 | 1 | 3 | (1 row) postgres=# select * from dblink('host=pg1 user=postgres dbname=postgres', 'select aid,bid,abalance, fi ller from pgbench_accounts') as t1 (aid int,bid int,abalance int, fi ller text) where aid = 1; aid | bid | abalance | fi ller -----+-----+----------+-------------------------------------------------------------------------------------- 1 | 1 | 3 | (1 row)

Slide 7

Slide 7 text

© EDB 2024 - All Rights Reserved 7 ● Oracle DBLink ● PG 7.2 implemented its own version of DBLink ● Access to data was not very intuitive (used dblink-specific functions) ● SQL/MED (SQL Management of External Data) standard introduced in 2003 ● PG 9.1 introduced file_fdw ● PG 9.3 introduced postgres_fdw ● Very few database engines implement SQL/MED (and Oracle is not one of them!) ● Most other database engines don't support as many external sources as PG A bit of history

Slide 8

Slide 8 text

© EDB 2024 - All Rights Reserved 8 FDW is Smarter and Faster postgres=# explain (analyze, verbose) select * from dblink('host=pg1 user=postgres dbname=postgres', 'select aid,bid,abalance, fi ller from pgbench_accounts') as t1 (aid int,bid int,abalance int, fi ller text) where aid = 1; QUERY PLAN ---------------------------------------------------------------------------------------------- Function Scan on public.dblink t1 (cost=0.00..12.50 rows=5 width=44) (actual time=276.996..315.074 rows=1 loops=1) Output: aid, bid, abalance, fi ller Function Call: dblink('host=pg1 user=postgres dbname=postgres'::text, 'select aid,bid,abalance, fi ller from pgbench_accounts'::text) Filter: (t1.aid = 1) Rows Removed by Filter: 99999 (...) Time: 319.799 ms postgres=# explain (analyze, verbose) select * from foreign_table where aid = 1; QUERY PLAN ----------------------------------------------------------------------------------------------------- Foreign Scan on public.foreign_table (cost=100.00..4629.02 rows=1 width=97) (actual time=1.220..1.222 rows=1 loops=1) Output: aid, bid, abalance, fi ller Remote SQL: SELECT aid, bid, abalance, fi ller FROM public.pgbench_accounts WHERE ((aid = 1)) (...) Time: 4.259 ms

Slide 9

Slide 9 text

© EDB 2024 - All Rights Reserved 9 FDW Today

Slide 10

Slide 10 text

© EDB 2024 - All Rights Reserved 10 ● Install the extension ● Create a foreign server ● Create a user mapping (if necessary) ● Create foreign tables (or import foreign schema) Getting Started with FDW postgres=# CREATE EXTENSION postgres_fdw; CREATE EXTENSION postgres=# CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'pg1', port '5432', dbname 'postgres'); CREATE SERVER postgres=# CREATE USER MAPPING FOR postgres SERVER foreign_server OPTIONS (user 'postgres', password 'password'); CREATE USER MAPPING postgres=# CREATE FOREIGN TABLE foreign_table (aid integer NOT NULL, bid integer, abalance integer, fi ller text) SERVER foreign_server OPTIONS (schema_name 'public', table_name 'pgbench_accounts'); CREATE FOREIGN TABLE -- Another option for creating foreign tables postgres=# IMPORT FOREIGN SCHEMA public FROM SERVER foreign_server INTO public; IMPORT FOREIGN SCHEMA

Slide 11

Slide 11 text

© EDB 2024 - All Rights Reserved 11 ● Useful commands: ● \des -- describe external (foreign) servers ● \deu -- describe eternal user mappings ● \det -- describe external tables Getting Started with FDW

Slide 12

Slide 12 text

© EDB 2024 - All Rights Reserved 12 Demo

Slide 13

Slide 13 text

© EDB 2024 - All Rights Reserved 13 Tips ● Be mindful of what gets pushed down to the remote server postgres=# explain (verbose, analyze) select * from foreign_table ft join pgbench_accounts a on a.aid=ft.aid where a.abalance = 10 or ft.abalance = 9; QUERY PLAN ------------------------------------------------------------------------------------------------------------ -------------- Hash Join (cost=7192.00..18422.50 rows=14243 width=194) (actual time=109.947..963.111 rows=14502 loops=1) (...) -> Foreign Scan on public.foreign_table ft (cost=100.00..6379.00 rows=100000 width=97) (actual time=3.742..612.295 rows=100000 loops=1) Output: ft.aid, ft.bid, ft.abalance, ft. fi ller Remote SQL: SELECT aid, bid, abalance, fi ller FROM public.pgbench_accounts -> Hash (cost=4279.00..4279.00 rows=100000 width=97) (actual time=105.742..105.744 rows=100000 loops=1) (...) Planning Time: 0.402 ms Execution Time: 966.560 ms (16 rows)

Slide 14

Slide 14 text

© EDB 2024 - All Rights Reserved 14 Tips ● Be mindful of what gets pushed down to the remote server postgres=# explain (verbose, analyze) select * from foreign_table ft join pgbench_accounts a on a.aid=ft.aid where ft.abalance = 10 or ft.abalance = 9; QUERY PLAN ------------------------------------------------------------------------------------------------------ ------------- Hash Join (cost=5352.20..10151.80 rows=14560 width=194) (actual time=130.925..193.293 rows=15156 loops=1) (...) -> Hash (cost=5170.20..5170.20 rows=14560 width=97) (actual time=128.999..129.003 rows=15156 loops=1) Output: ft.aid, ft.bid, ft.abalance, ft. fi ller Buckets: 16384 Batches: 1 Memory Usage: 2038kB -> Foreign Scan on public.foreign_table ft (cost=100.00..5170.20 rows=14560 width=97) (actual time=3.912..123.474 rows=15156 loops=1) Output: ft.aid, ft.bid, ft.abalance, ft. fi ller Remote SQL: SELECT aid, bid, abalance, fi ller FROM public.pgbench_accounts WHERE (((abalance = 10) OR (abalance = 9))) Planning Time: 0.437 ms Execution Time: 196.240 ms (13 rows)

Slide 15

Slide 15 text

© EDB 2024 - All Rights Reserved 15 Tips ● Network will always be your biggest bottleneck, so minimize the amount of data sent from the remote server ● Performance can depend on foreign server's hardware ● DDL changes are not automatically propagated ● Upgrades need more care and attention, in case of any incompatibilities ● User mappings require thoughtfulness regarding security ● Backups of database do not include data in foreign tables TL;DR - remember that you're no longer dealing with just one machine, but two or more

Slide 16

Slide 16 text

© EDB 2024 - All Rights Reserved 16 Data Diversity ● Because FDWs allow you to access external data for 50+ sources, you can leverage PostgreSQL to connect two other sources ● Good for ETL Pipelines ● Good for varied infrastructure ● Prevents having to migrate data from one place to another ● Demo: MySQL + SQLite

Slide 17

Slide 17 text

© EDB 2024 - All Rights Reserved 17 Building your own FDW ● Native ● Requires C programming knowledge ● See the PostgreSQL docs ● Multicorn (https://multicorn.org/) ● Wrapper interface for Python apps ● Not actively developed, but still works ● FDW APIs exist for other languages (i.e., Holycorn for Ruby")

Slide 18

Slide 18 text

© EDB 2024 - All Rights Reserved 18 Building your own FDW

Slide 19

Slide 19 text

© EDB 2024 - All Rights Reserved 19 Demo

Slide 20

Slide 20 text

©EDB 2024 — ALL RIGHTS RESERVED. Thank you for attending COSCUP 2024!