Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Introduction to ScalarDB and ScalarDB Analytics

Introduction to ScalarDB and ScalarDB Analytics

A technical talk introducing ScalarDB and ScalarDB Analytics presented at the PostgreSQL Unconference Japan #44.

Akihiro Okuno

December 27, 2023
Tweet

More Decks by Akihiro Okuno

Other Decks in Programming

Transcript

  1. About me • Akihiro Okuno • @choplin • Software Engineer

    @Scalar Inc. • Recent Interests: Zig, Wasm 2
  2. ScalarDB: A Polystore Transaction Manager • Achieves global transactions across

    multiple disparate databases. • Supports various kinds of databases, such as RDBs and NoSQLs. • ScalarDB has been used by some of Fortune Global 500 companies. Relational databases NoSQLs 4 Ref: https://speakerdeck.com/scalar/scalardb-universal-transaction-manager-for-polystores-vldb23
  3. What is ScalarDB Analytics? ScalarDB, as a universal transaction manager,

    targets mainly transactional workloads and therefore supports limited subsets of relational queries. ScalarDB Analytics with PostgreSQL extends the functionality of ScalarDB to process analytical queries on ScalarDB-managed data by using PostgreSQL and its foreign data wrapper (FDW) extension. https://github.com/scalar-labs/scalardb-analytics-postgresql/blob/main/docs/getting-started.md Online Transaction Processing (OLTP) Online Analytical Processing (OLAP) 5
  4. Requirements for ScalarDB Analytics User ScalarDB ScalarDB Analytics Backend Storage

    6 • Users can queries on all ScalarDB-managed Data ◦ Shared data source ◦ ScalarDB Analytics must support all storage supported by ScalarDB • Users can read consistent data ◦ Read-Committed isolation ◦ ScalarDB Analytics must read only committed data Transactional Workload Analytical Workload
  5. ScalarDB Analytics with PostgreSQL • The first implementation of ScalarDB

    Analytics using PostgreSQL and its Foreign Data Wrapper feature. ◦ FDW is a type of PostgreSQL extension to enable users to read external data 7 User ScalarDB PostgreSQL Backend Storage Transactional Workload Analytical Workload FDW
  6. ScalarDB Analytics with PostgreSQL : Architecture 8 User Analytica Query

    with PostgreSQL Client PostgreSQL ScalarDB Backend Storages Oracle Cassandra DynamoDB PostgreSQL postgres_fdw jdbc_fdw cassandra_fdw scalardb_fdw ScalarDB Analytics with PostgreSQL Foreign Tables WAL-Interpreted Views ScalarDB Config Schema Importer Setup Analytical Environment
  7. ScalarDB Analytics with PostgreSQL : Architecture 9 User Analytica Query

    with PostgreSQL Client PostgreSQL ScalarDB Backend Storages Oracle Cassandra DynamoDB PostgreSQL postgres_fdw jdbc_fdw cassandra_fdw scalardb_fdw ScalarDB Analytics with PostgreSQL Foreign Tables WAL-Interpreted Views ScalarDB Config Schema Importer Setup Analytical Environment
  8. PostgreSQL Extension • PostgreSQL is designed to be extensible ◦

    Thanks to the catalog-driven design ▪ Built-ins also work via the catalog ◦ Developers can implement extensions and register them in the system catalog • Extensible Points ◦ SQL Function (Scalar, Aggregate, Window, Table, etc.) ◦ Type ◦ Operator ◦ Index Access Method ◦ Foreign Data Wrapper ◦ Table Access Method (Since PG12) ◦ Planner, Executor Hooks ◦ etc. • See https://www.postgresql.org/docs/15/contrib.html 10
  9. Catalog Example: bool type 11 =# select * from pg_type

    where typname = 'bool'; -[ RECORD 1 ]--+--------- oid | 16 typname | bool typnamespace | 11 typowner | 10 typlen | 1 typbyval | t typtype | b typcategory | B typispreferred | t typisdefined | t typdelim | , typrelid | 0 typsubscript | - typelem | 0 typarray | 1000 typinput | boolin typoutput | boolout typreceive | boolrecv typsend | boolsend typmodin | - typmodout | - typanalyze | - typalign | c typstorage | p typnotnull | f typbasetype | 0 typtypmod | -1 typndims | 0 typcollation | 0 typdefaultbin | =# select * from pg_operator where oprcode = 'booleq'::regproc; -[ RECORD 1 ]+---------- oid | 91 oprname | = oprnamespace | 11 oprowner | 10 oprkind | b oprcanmerge | t oprcanhash | t oprleft | 16 oprright | 16 oprresult | 16 oprcom | 91 oprnegate | 85 oprcode | booleq oprrest | eqsel oprjoin | eqjoinsel Datum booleq(PG_FUNCTION_ARGS) { bool arg1 = PG_GETARG_BOOL(0); bool arg2 = PG_GETARG_BOOL(1); PG_RETURN_BOOL(arg1 == arg2); OID (Object IDentifier) Text I/O functions Binary I/O functions src/backend/utils/adt/bool.c pg_operator pg_type
  10. Foreign Data Wrapper • Implementation of SQL/MED defined in the

    SQL standard ◦ Since PostgreSQL 9 • Register a set of callback functions for Planner and Executor 12 https://www.interdb.jp/pg/pgsql03.html • GetForeignRelSize • GetForeignPaths • GetForeignPlan • BeginForeignScan • IterateForeignScan • EndForeignScan
  11. scalardb_fdw • FDW implementation that read data using ScalarDB library

    • Intended to be used as a fall-back option ◦ We can inherently cover all storage supported by ScalarDB 13 CREATE SERVER scalardb FOREIGN DATA WRAPPER scalardb_fdw OPTIONS ( config_file_path '/path/to/scalardb.properties' ); CREATE FOREIGN TABLE sample_table ( pk int, ck1 int, ck2 int, boolean_col boolean, bigint_col bigint, float_col double precision, double_col double precision, text_col text, blob_col bytea ) SERVER scalardb OPTIONS ( namespace 'ns', table_name 'sample_table' ); Server Definition Table Definition
  12. Example callback: IterateForeignScan 14 Common representation of row value Slot

    of HeapTuple Retrieve one row by calling Scanner.one() Construct HeapTuple from the retrieved row Put HeapTuple into TupleTableSlot
  13. Example callback: make_tuple_from_result 15 List of columns to retrieve (=

    target list in SELECT) Get table schema metadata in PostgreSQL Initialize with NULLs For each retrieved columns Convert the result value to Datum Generic value type
  14. scalardb_fdw • FDW implementation that read data using ScalarDB library

    • Intended to be used as a fall-back option ◦ We can inherently cover all storage supported by ScalarDB 16 CREATE SERVER scalardb FOREIGN DATA WRAPPER scalardb_fdw OPTIONS ( config_file_path '/path/to/scalardb.properties' ); CREATE FOREIGN TABLE sample_table ( pk int, ck1 int, ck2 int, boolean_col boolean, bigint_col bigint, float_col double precision, double_col double precision, text_col text, blob_col bytea ) SERVER scalardb OPTIONS ( namespace 'ns', table_name 'sample_table' ); Question: How can we call ScalarDB library, implemented in Java, from C? Java Native Interface (JNI) Server Definition Table Definition
  15. Java Native Interface (JNI) 17 • FFI framework to call

    native code from Java and vice versa ◦ The most conventional FFI framework of Java ◦ Other FFI frameworks do not support calling Java from C… • Cumbersome… ◦ Context (i.e. JavaVM) must be passed everywhere ◦ Complex memory scope (when object is garbage collected?) ◦ Exception handling ◦ Weak typing (everything is Object other than primitive types) ◦ etc. Wrappers of JNI calls FDW callback implementation
  16. ScalarDB Analytics with PostgreSQL : Architecture 18 User Analytica Query

    with PostgreSQL Client PostgreSQL ScalarDB Backend Storages Oracle Cassandra DynamoDB PostgreSQL postgres_fdw jdbc_fdw cassandra_fdw scalardb_fdw ScalarDB Analytics with PostgreSQL Foreign Tables WAL-Interpreted Views ScalarDB Config Schema Importer Setup Analytical Environment
  17. Schema Importer • Import table schema in ScalarDB into PostgreSQL

    by running a series of SQL commands. a. Foreign Data Wrapper b. Foreign Server c. User Mapping d. Schema (Namespace) e. Foreign Table f. WAL-Interpreted view 19
  18. WAL-Interpreted View • Responsible for guaranteeing Read-Committed isolation level. •

    Interpreting the WAL metadata ◦ ScalarDB writes the WAL metadata in each record 20 It is COMMITTED now It has been COMMITTED in the past Use the current value if COMMITTED Use the value in the before image otherwise
  19. Summary • Introduction to ScalarDB ◦ Universal transaction manager ◦

    Enable serializable transaction among various databases • Introduction to ScalarDB Analytics ◦ Enable analytical queries on ScalarDB-managed data using FDW 21