Slide 1

Slide 1 text

ScalarDB: Universal Transaction Manager for Polystores
Hiroyuki Yamada, Toshihiro Suzuki, Yuji Ito, Jun Nemoto
Scalar, Inc.

Slide 2

Slide 2 text

Motivation: Era of Managing Multiple Disparate Databases
● “One size does not fit all” era
  ○ E.g., 10+ purpose-built database products in AWS
● Separated databases in microservices
● Siloed databases in large enterprises
Our Goal: Simplify the complexity of managing multiple disparate databases.
[Figure labels: Polystore Transaction Manager; Complex and inconsistent; Simplified and consistent]

Slide 3

Slide 3 text

ScalarDB: A Polystore Transaction Manager
● Achieves global transactions across multiple disparate databases.
● Supports various kinds of databases, such as relational and NoSQL databases.
● ScalarDB has been used by some Fortune Global 500 companies.
[Figure labels: Relational databases; NoSQLs]

Slide 4

Slide 4 text

Design Goals of ScalarDB
● Database Agnosticism
  ○ Global transactions should span various kinds of databases.
  ○ The top-priority goal based on our customers’ demands.
● Strong Correctness
  ○ Global transactions should guarantee ACID with strict serializability.
● Reasonable Performance
  ○ Managing global transactions should not be a limiting factor of overall transaction performance.
● High Scalability
  ○ Transaction performance should scale as the performance of the underlying databases scales.
● High Availability
  ○ Managing global transactions should be achievable without a single point of failure (SPOF).

Slide 5

Slide 5 text

Design Choices
● Multi-level Transaction Management (e.g., Oracle Tuxedo, Atomikos, Seata XA)
  ○ Pros: Several real-world products
  ○ Pros: Easy to improve performance
  ○ Cons: Too dependent on specific DB capabilities
  ○ Cons: Invasive (DBs require enhancements)
● Single-level Transaction Management (e.g., Deuteronomy [CIDR’11], Cherry Garcia [ICDE’15])
  ○ Pros: Not very dependent on specific DB capabilities
  ○ Pros: Non-invasive (DBs don’t require enhancements)
  ○ Cons: Weak isolation guarantee
  ○ Cons: Hard to improve performance
[Figure: architecture diagrams of the two approaches. Labels: TM (Coordinator), Abstraction, TM, DB1, DB2, Global Transactions, Local Transactions, No CC is required.]
TM: Transaction Manager
CC: Concurrency Control

Slide 6

Slide 6 text

Design Choices: Our Approach
● Multi-level Transaction Management (e.g., Oracle Tuxedo, Atomikos, Seata XA)
  ○ Pros: Several real-world products
  ○ Pros: Easy to improve performance
  ○ Cons: Too dependent on specific DB capabilities
  ○ Cons: Invasive (DBs require enhancements)
● Single-level Transaction Management (e.g., Deuteronomy [CIDR’11], Cherry Garcia [ICDE’15]): ScalarDB approach
  ○ Pros: Not very dependent on specific DB capabilities
  ○ Pros: Non-invasive (DBs don’t require enhancements)
  ○ Cons: Weak isolation guarantee (ScalarDB addresses them)
  ○ Cons: Hard to improve performance (ScalarDB addresses them)
[Figure: architecture diagrams of the two approaches. Labels: TM (Coordinator), Abstraction, TM, DB1, DB2, Global Transactions, Local Transactions, No CC is required.]
TM: Transaction Manager
CC: Concurrency Control

Slide 7

Slide 7 text

Challenges with ScalarDB
● Achieve our design goals by using the single-level TM approach (we chose the Cherry Garcia protocol [ICDE’15] and extended it):
  ○ Achieve database agnosticism.
    ■ This is achieved by the single-level TM approach.
  ○ Guarantee strict serializability while achieving high scalability and high availability.
  ○ Enhance transaction performance without sacrificing correctness.
● Clarify and fill the critical missing pieces of the single-level TM approach for productization.
  ○ Take correct backups from multiple disparate databases.
  ○ Route transactions spanning multiple microservices.
  ○ Run analytical queries over ScalarDB-managed databases.

Slide 9

Slide 9 text

ScalarDB Architecture
● ScalarDB abstracts underlying databases with its own abstraction.
● Abstraction requires each underlying database to provide minimal capabilities.
  ○ E.g., linearizable read/write on a single record, durability of written database records.
[Figure: ScalarDB architecture. Labels: App, TM, TxID, gRPC (HTTP/2), CRUD Interface, SQL, GraphQL, Transaction Manager, Database Abstraction, DB1 Shim, DB2 Shim, DB3 Shim, DB-specific Protocols, DB1, DB2, DB3, ScalarDB, ScalarDB core component (Apache 2).]
Note: You can directly use ScalarDB core through its library.

Slide 10

Slide 10 text

Transaction Protocol: Overview
● Two-phase commit (2PC) over records (similar to Cherry Garcia [ICDE’15])
  ○ Treats a single record as a small database and performs 2PC over multiple records.
  ○ Manages WAL information in each record (disaggregated WAL).
● Single-version optimistic concurrency control (OCC)
  ○ Conflicts are detected by using linearizable conditional writes.
Protocol steps (from the figure):
  1. Prepare Records w/ W-set
  2. Commit Status
  3. Commit Records
[Figure: a TM holding R/W sets writes to user-defined tables and to the Coordinator table. Coordinator columns: TxID, TxStatus, Metadata. User-defined table columns: PK, Col (application data managed by users); TxID, Version, TxStatus (after image) and Before Col, Before TxID, Before Version, Before TxStatus (before image), which are transaction metadata managed by ScalarDB.]
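The three steps on this slide (prepare records, commit status, commit records) can be illustrated with a minimal in-memory sketch. Everything here is hypothetical and is not ScalarDB's API: `Store`, `Record`, `put_if`, and `commit` are invented names, the coordinator table is modeled as a plain dict, and the transaction is assumed to have read every key it writes (a key missing from the snapshot is expected to be absent). The conditional write stands in for the linearizable conditional write that the protocol uses for conflict detection.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional

PREPARED, COMMITTED, ABORTED = "PREPARED", "COMMITTED", "ABORTED"


@dataclass(frozen=True)
class Record:
    value: Any
    tx_id: str
    version: int
    tx_status: str
    before: Optional["Record"] = None  # before image kept inside the record


class Store:
    """Hypothetical in-memory database with a per-record conditional write."""

    def __init__(self) -> None:
        self.rows: Dict[str, Record] = {}

    def get(self, key: str) -> Optional[Record]:
        return self.rows.get(key)

    def put_if(self, key: str, expected: Optional[Record],
               new: Optional[Record]) -> bool:
        # Write only if the stored record still equals `expected`.
        # Passing new=None deletes the record.
        if self.rows.get(key) != expected:
            return False
        if new is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = new
        return True


def _rollback(db: Store, prepared: list) -> None:
    # Restore what each prepared record replaced.
    for key, old, rec in prepared:
        db.put_if(key, rec, old)


def commit(tx_id: str, snapshot: Dict[str, Record], writes: Dict[str, Any],
           db: Store, coordinator: Dict[str, str]) -> bool:
    prepared = []
    # 1. Prepare records with the write set (conflicts surface here).
    for key, value in writes.items():
        seen = snapshot.get(key)
        before = replace(seen, before=None) if seen is not None else None
        version = seen.version + 1 if seen is not None else 1
        new = Record(value, tx_id, version, PREPARED, before)
        if not db.put_if(key, seen, new):
            coordinator.setdefault(tx_id, ABORTED)
            _rollback(db, prepared)
            return False
        prepared.append((key, seen, new))
    # 2. Commit status: one write to the coordinator decides the outcome.
    if coordinator.setdefault(tx_id, COMMITTED) != COMMITTED:
        _rollback(db, prepared)
        return False
    # 3. Commit records.
    for key, _, rec in prepared:
        db.put_if(key, rec, replace(rec, tx_status=COMMITTED))
    return True
```

In this sketch a transaction that prepared against a stale snapshot fails its conditional write in step 1 and rolls back the records it had already prepared. Recovery of transactions that crash between steps is out of scope here.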

Slide 11

Slide 11 text

Transaction Protocol: Removing Reliable Clock Dependency
● Removes the dependency on a reliable clock (e.g., TrueTime) that the Cherry Garcia protocol has, for better applicability and scalability. This leads ScalarDB to:
  ○ Provide single-version OCC instead of the original two-version OCC/MVCC.
  ○ Provide weaker isolation than snapshot isolation (read-committed SI / RCSI).
    ■ Read skew would happen in addition to SI anomalies (e.g., write skew).
    ■ This does not meet our design goals.
● Does not employ a hybrid logical clock (HLC).
  ○ Due to the architecture, ScalarDB with HLC would introduce an additional database write for each record read to keep track of the happened-before relation.

Slide 12

Slide 12 text

Transaction Protocol: Making Transactions Strict Serializable
● Basic strategy: keep track of anti-dependencies implicitly.
  ○ Explicit anti-dependency tracking (e.g., SSI [SIGMOD’08], SSN [VLDBJ’15]) cannot be efficiently done in the ScalarDB architecture.
  ○ More conservative than SSI/SSN, but works well with the ScalarDB architecture.
[Figure: three alternatives, with their captions. (a) TMs and their R/W sets, R-set: would not fit well with the client-coordinated protocol. (b) TMs and their R/W sets, write R-set: would issue too many database reads/writes. (c) Read, R/W sets, re-read R-set: works efficiently without having too many additional reads/writes (ScalarDB approach).]
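The "re-read R-set" idea can be sketched as a validation step: before committing, the transaction reads every record in its read set again and aborts if any of them changed. The sketch below is an illustration only. The names `Stamp`, `read_set_unchanged`, and `current_stamp` are hypothetical, each record is reduced to a `(tx_id, version)` pair, and running the check after the transaction's own writes are prepared is an assumption, since the slide does not state the ordering.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# (tx_id, version) of a record as observed by a transaction.
Stamp = Tuple[str, int]


def read_set_unchanged(read_set: Dict[str, Optional[Stamp]],
                       written_keys: Iterable[str],
                       current_stamp: Callable[[str], Optional[Stamp]]) -> bool:
    """Re-read every record that was read but not written by this transaction.

    Returns False if any record differs from what the transaction saw,
    which signals a possible anti-dependency and forces an abort.
    """
    skip = set(written_keys)  # own writes are already guarded by conditional writes
    return all(current_stamp(key) == seen
               for key, seen in read_set.items() if key not in skip)


# Write-skew pattern: T1 and T2 both read x and y; T1 writes x, T2 writes y.
db: Dict[str, Stamp] = {"x": ("tx0", 1), "y": ("tx0", 1)}
t1_reads = dict(db)
t2_reads = dict(db)

db["x"] = ("t1", 2)                                   # T1 prepares x
t1_ok = read_set_unchanged(t1_reads, ["x"], db.get)   # y is unchanged

db["y"] = ("t2", 2)                                   # T2 prepares y
t2_ok = read_set_unchanged(t2_reads, ["y"], db.get)   # x was changed by T1
```

Here T1 passes validation and T2 does not, so the write skew that snapshot isolation would admit is rejected. The check is conservative: it aborts on any change to a read record, whether or not the change would actually produce a non-serializable history.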

Slide 13

Slide 13 text

Evaluation: Experimental Setup
● Each DB instance: AWS c5d.9xlarge (18 cores, 72 GB DRAM, NVMe SSD)
● Client: c5.4xlarge
● Workloads: YCSB (100M records), TPC-C (200-1,000 warehouses)
● Compared systems: Atomikos (XA), Seata (XA), ScalarDB
[Figure: clients and database configurations under test. Labels: Two MariaDB / two PostgreSQL / PostgreSQL & MariaDB; PostgreSQL & Cassandra; Cassandra (for scalability).]

Slide 14

Slide 14 text

Evaluation: Performance of Global Transactions
Achieved database-agnostic global transactions with reasonable performance.
[Figure: YCSB Workload F results. Panels: MariaDB x 2; MariaDB & PostgreSQL; PostgreSQL & Cassandra.]

Slide 15

Slide 15 text

Evaluation: Overhead for Strict Serializability
Achieved strict serializability without much overhead.
[Figure: TPC-C results. Panels: MariaDB; PostgreSQL. Annotations: "15% slowdown at most", "11% slowdown at most".]

Slide 16

Slide 16 text

Evaluation: Scalability
Throughput scaled near-linearly as the number of nodes increased.
[Figure: TPC-C results. Annotation: "92% scalability".]

Slide 17

Slide 17 text

Summary
● ScalarDB is a universal transaction manager for polystores.
● ScalarDB provides database-agnostic global transactions while achieving strong correctness and reasonable performance.
● ScalarDB has been used by some Fortune Global 500 companies.
● Please read the paper for more details we couldn’t cover in this presentation.
  ○ Performance optimization techniques.
  ○ Critical mechanisms for productization; e.g., mechanisms for taking backups and handling analytical queries.
  ○ We talk about how it handles analytical queries in the POLY workshop.
https://github.com/scalar-labs/scalardb

Slide 18

Slide 18 text

Productization: Taking Transactionally Consistent Backups
● Pauses ScalarDB servers to create a state where no active transactions exist.
● Creates transactionally consistent backups by using database-specific backup mechanisms (e.g., point-in-time snapshots and restore).
● Employs an OCC technique to ensure a paused state is not broken (in a managed Kubernetes environment).
Procedure:
  1. Identifies ScalarDB servers.
  2. Pauses the servers (for a short while) after draining active transactions.
  3. Takes backups.
  4. Identifies servers again and checks that the servers’ states have not changed.
[Figure: a Kubernetes cluster running ScalarDB servers as pods (containers). New pods might be created by auto-healing and scaling out.]
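The four-step backup procedure follows an optimistic pattern: do the work, then verify that the assumption it relied on (the set of servers) still holds. A minimal sketch of that loop is shown below. All names are hypothetical: `take_consistent_backup` and the four callables passed to it are placeholders for whatever the environment provides, a server's state is reduced to membership in a `frozenset` of server names, and the retry limit is an assumption that the slide does not mention.

```python
from typing import Callable, FrozenSet, TypeVar

B = TypeVar("B")


def take_consistent_backup(list_servers: Callable[[], FrozenSet[str]],
                           pause: Callable[[str], None],
                           unpause: Callable[[str], None],
                           take_backup: Callable[[], B],
                           max_attempts: int = 3) -> B:
    for _ in range(max_attempts):
        servers = list_servers()                    # 1. identify servers
        for server in servers:
            pause(server)                           # 2. drain and pause
        try:
            backup = take_backup()                  # 3. take backups
            unchanged = list_servers() == servers   # 4. identify again and compare
        finally:
            for server in servers:
                unpause(server)
        if unchanged:
            return backup
        # A server appeared or disappeared while paused (e.g., auto-healing),
        # so the backup may not be consistent: discard it and retry.
    raise RuntimeError("server set kept changing; no consistent backup taken")
```

The comparison in step 4 plays the role of OCC validation: a pod created during the pause was never paused and might have run transactions, so the backup taken in that window is thrown away.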

Slide 19

Slide 19 text

Towards an HTAP Engine
● Extend ScalarDB to run read-only analytical queries over multiple disparate databases.
[Figure: ScalarDB architecture with an added Analytics SQL interface. Labels: CRUD Interface, SQL, GraphQL, Analytics SQL, Transaction Manager, Database Abstraction, DB1 Shim, DB2 Shim, DB3 Shim.]

Slide 20

Slide 20 text

ScalarDB Analytics with PostgreSQL
● ScalarDB Analytics utilizes PostgreSQL FDW*.
  ○ Creates a ScalarDB FDW to support various databases.
*FDW: Foreign Data Wrapper
WAL interpretation (pseudo code):
  record = …;
  if (record.txStatus == COMMITTED) {
    // use after image
  } else {
    // use before image
  }
[Figure: ScalarDB architecture with Analytics SQL served by PostgreSQL. Labels: CRUD Interface, SQL, GraphQL, Analytics SQL, PostgreSQL, WAL-interpreted Views, Foreign Tables, Community FDWs, ScalarDB FDW, Transaction Manager, Database Abstraction, DB1 Shim, DB2 Shim, DB3 Shim. Record layout: PK, Col, TxID, Version, TxStatus (after image); Before Col, Before TxID, Before Version, Before TxStatus (before image).]
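The WAL-interpretation rule on this slide can be written as a small runnable function. This is a sketch and not the ScalarDB FDW or view definition: the function name `interpret` and the snake_case column names are hypothetical renderings of the column labels on the slide, and returning `None` for a record with no before image is an added assumption (such a record has no committed version to show).

```python
from typing import Any, Dict, Optional

COMMITTED = "COMMITTED"


def interpret(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the application-visible row for one stored record."""
    if record["tx_status"] == COMMITTED:
        # Use the after image.
        return {"pk": record["pk"], "col": record["col"]}
    if record.get("before_tx_id") is None:
        # Assumption: no before image means the row was never committed.
        return None
    # Use the before image.
    return {"pk": record["pk"], "col": record["before_col"]}


committed_row = {"pk": 1, "col": "new", "tx_status": COMMITTED,
                 "before_col": "old", "before_tx_id": "tx1"}
prepared_row = {"pk": 2, "col": "new", "tx_status": "PREPARED",
                "before_col": "old", "before_tx_id": "tx1"}
visible = [interpret(committed_row), interpret(prepared_row)]
```

For the two sample rows, the committed record exposes its current column value and the prepared record exposes the value from its before image.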