Slide 1

Slide 1 text

An introduction to ScalarDB & ScalarDB Analytics 2023-12-26 @PostgreSQL Unconference Akihiro Okuno 1

Slide 2

Slide 2 text

About me ● Akihiro Okuno ● @choplin ● Software Engineer @Scalar Inc. ● Recent Interests: Zig, Wasm 2

Slide 3

Slide 3 text

Contents ● Introduction to ScalarDB ● Introduction to ScalarDB Analytics ○ ScalarDB FDW ○ Schema Importer 3

Slide 4

Slide 4 text

ScalarDB: A Polystore Transaction Manager ● Achieves global transactions across multiple disparate databases. ● Supports various kinds of databases, such as RDBs and NoSQLs. ● ScalarDB has been used by some of Fortune Global 500 companies. Relational databases NoSQLs 4 Ref: https://speakerdeck.com/scalar/scalardb-universal-transaction-manager-for-polystores-vldb23

Slide 5

Slide 5 text

What is ScalarDB Analytics? ScalarDB, as a universal transaction manager, targets mainly transactional workloads and therefore supports limited subsets of relational queries. ScalarDB Analytics with PostgreSQL extends the functionality of ScalarDB to process analytical queries on ScalarDB-managed data by using PostgreSQL and its foreign data wrapper (FDW) extension. https://github.com/scalar-labs/scalardb-analytics-postgresql/blob/main/docs/getting-started.md Online Transaction Processing (OLTP) Online Analytical Processing (OLAP) 5

Slide 6

Slide 6 text

Requirements for ScalarDB Analytics User ScalarDB ScalarDB Analytics Backend Storage 6 ● Users can queries on all ScalarDB-managed Data ○ Shared data source ○ ScalarDB Analytics must support all storage supported by ScalarDB ● Users can read consistent data ○ Read-Committed isolation ○ ScalarDB Analytics must read only committed data Transactional Workload Analytical Workload

Slide 7

Slide 7 text

ScalarDB Analytics with PostgreSQL ● The first implementation of ScalarDB Analytics using PostgreSQL and its Foreign Data Wrapper feature. ○ FDW is a type of PostgreSQL extension to enable users to read external data 7 User ScalarDB PostgreSQL Backend Storage Transactional Workload Analytical Workload FDW

Slide 8

Slide 8 text

ScalarDB Analytics with PostgreSQL : Architecture 8 User Analytica Query with PostgreSQL Client PostgreSQL ScalarDB Backend Storages Oracle Cassandra DynamoDB PostgreSQL postgres_fdw jdbc_fdw cassandra_fdw scalardb_fdw ScalarDB Analytics with PostgreSQL Foreign Tables WAL-Interpreted Views ScalarDB Config Schema Importer Setup Analytical Environment

Slide 9

Slide 9 text

ScalarDB Analytics with PostgreSQL : Architecture 9 User Analytica Query with PostgreSQL Client PostgreSQL ScalarDB Backend Storages Oracle Cassandra DynamoDB PostgreSQL postgres_fdw jdbc_fdw cassandra_fdw scalardb_fdw ScalarDB Analytics with PostgreSQL Foreign Tables WAL-Interpreted Views ScalarDB Config Schema Importer Setup Analytical Environment

Slide 10

Slide 10 text

PostgreSQL Extension ● PostgreSQL is designed to be extensible ○ Thanks to the catalog-driven design ■ Built-ins also work via the catalog ○ Developers can implement extensions and register them in the system catalog ● Extensible Points ○ SQL Function (Scalar, Aggregate, Window, Table, etc.) ○ Type ○ Operator ○ Index Access Method ○ Foreign Data Wrapper ○ Table Access Method (Since PG12) ○ Planner, Executor Hooks ○ etc. ● See https://www.postgresql.org/docs/15/contrib.html 10

Slide 11

Slide 11 text

Catalog Example: bool type 11 =# select * from pg_type where typname = 'bool'; -[ RECORD 1 ]--+--------- oid | 16 typname | bool typnamespace | 11 typowner | 10 typlen | 1 typbyval | t typtype | b typcategory | B typispreferred | t typisdefined | t typdelim | , typrelid | 0 typsubscript | - typelem | 0 typarray | 1000 typinput | boolin typoutput | boolout typreceive | boolrecv typsend | boolsend typmodin | - typmodout | - typanalyze | - typalign | c typstorage | p typnotnull | f typbasetype | 0 typtypmod | -1 typndims | 0 typcollation | 0 typdefaultbin | =# select * from pg_operator where oprcode = 'booleq'::regproc; -[ RECORD 1 ]+---------- oid | 91 oprname | = oprnamespace | 11 oprowner | 10 oprkind | b oprcanmerge | t oprcanhash | t oprleft | 16 oprright | 16 oprresult | 16 oprcom | 91 oprnegate | 85 oprcode | booleq oprrest | eqsel oprjoin | eqjoinsel Datum booleq(PG_FUNCTION_ARGS) { bool arg1 = PG_GETARG_BOOL(0); bool arg2 = PG_GETARG_BOOL(1); PG_RETURN_BOOL(arg1 == arg2); OID (Object IDentifier) Text I/O functions Binary I/O functions src/backend/utils/adt/bool.c pg_operator pg_type

Slide 12

Slide 12 text

Foreign Data Wrapper ● Implementation of SQL/MED defined in the SQL standard ○ Since PostgreSQL 9 ● Register a set of callback functions for Planner and Executor 12 https://www.interdb.jp/pg/pgsql03.html ● GetForeignRelSize ● GetForeignPaths ● GetForeignPlan ● BeginForeignScan ● IterateForeignScan ● EndForeignScan

Slide 13

Slide 13 text

scalardb_fdw ● FDW implementation that read data using ScalarDB library ● Intended to be used as a fall-back option ○ We can inherently cover all storage supported by ScalarDB 13 CREATE SERVER scalardb FOREIGN DATA WRAPPER scalardb_fdw OPTIONS ( config_file_path '/path/to/scalardb.properties' ); CREATE FOREIGN TABLE sample_table ( pk int, ck1 int, ck2 int, boolean_col boolean, bigint_col bigint, float_col double precision, double_col double precision, text_col text, blob_col bytea ) SERVER scalardb OPTIONS ( namespace 'ns', table_name 'sample_table' ); Server Definition Table Definition

Slide 14

Slide 14 text

Example callback: IterateForeignScan 14 Common representation of row value Slot of HeapTuple Retrieve one row by calling Scanner.one() Construct HeapTuple from the retrieved row Put HeapTuple into TupleTableSlot

Slide 15

Slide 15 text

Example callback: make_tuple_from_result 15 List of columns to retrieve (= target list in SELECT) Get table schema metadata in PostgreSQL Initialize with NULLs For each retrieved columns Convert the result value to Datum Generic value type

Slide 16

Slide 16 text

scalardb_fdw ● FDW implementation that read data using ScalarDB library ● Intended to be used as a fall-back option ○ We can inherently cover all storage supported by ScalarDB 16 CREATE SERVER scalardb FOREIGN DATA WRAPPER scalardb_fdw OPTIONS ( config_file_path '/path/to/scalardb.properties' ); CREATE FOREIGN TABLE sample_table ( pk int, ck1 int, ck2 int, boolean_col boolean, bigint_col bigint, float_col double precision, double_col double precision, text_col text, blob_col bytea ) SERVER scalardb OPTIONS ( namespace 'ns', table_name 'sample_table' ); Question: How can we call ScalarDB library, implemented in Java, from C? Java Native Interface (JNI) Server Definition Table Definition

Slide 17

Slide 17 text

Java Native Interface (JNI) 17 ● FFI framework to call native code from Java and vice versa ○ The most conventional FFI framework of Java ○ Other FFI frameworks do not support calling Java from C… ● Cumbersome… ○ Context (i.e. JavaVM) must be passed everywhere ○ Complex memory scope (when object is garbage collected?) ○ Exception handling ○ Weak typing (everything is Object other than primitive types) ○ etc. Wrappers of JNI calls FDW callback implementation

Slide 18

Slide 18 text

ScalarDB Analytics with PostgreSQL : Architecture 18 User Analytica Query with PostgreSQL Client PostgreSQL ScalarDB Backend Storages Oracle Cassandra DynamoDB PostgreSQL postgres_fdw jdbc_fdw cassandra_fdw scalardb_fdw ScalarDB Analytics with PostgreSQL Foreign Tables WAL-Interpreted Views ScalarDB Config Schema Importer Setup Analytical Environment

Slide 19

Slide 19 text

Schema Importer ● Import table schema in ScalarDB into PostgreSQL by running a series of SQL commands. a. Foreign Data Wrapper b. Foreign Server c. User Mapping d. Schema (Namespace) e. Foreign Table f. WAL-Interpreted view 19

Slide 20

Slide 20 text

WAL-Interpreted View ● Responsible for guaranteeing Read-Committed isolation level. ● Interpreting the WAL metadata ○ ScalarDB writes the WAL metadata in each record 20 It is COMMITTED now It has been COMMITTED in the past Use the current value if COMMITTED Use the value in the before image otherwise

Slide 21

Slide 21 text

Summary ● Introduction to ScalarDB ○ Universal transaction manager ○ Enable serializable transaction among various databases ● Introduction to ScalarDB Analytics ○ Enable analytical queries on ScalarDB-managed data using FDW 21