ScalarDB: Universal Transaction Manager for Polystores (VLDB'23)

ScalarDB is a universal transaction manager that achieves distributed transactions across disparate databases. This is a slide deck presented at VLDB'23.

For more details about ScalarDB, please check out the paper and our GitHub site.

- https://dl.acm.org/doi/10.14778/3611540.3611563
- https://github.com/scalar-labs/scalardb

Scalar, Inc.

October 10, 2023

Transcript

  1. Motivation: Era of Managing Multiple Disparate Databases

    • “One size does not fit all” era
      ◦ E.g., 10+ purpose-built database products in AWS
    • Separated databases in microservices
    • Siloed databases in large enterprises
    Our Goal: Simplify the complexity of managing multiple disparate databases.
    [Figure labels: Polystore Transaction Manager; Complex and inconsistent; Simplified and consistent]
  2. ScalarDB: A Polystore Transaction Manager

    • Achieves global transactions across multiple disparate databases.
    • Supports various kinds of databases, such as relational databases (RDBs) and NoSQL databases.
    • ScalarDB has been used by some Fortune Global 500 companies.
    [Figure labels: Relational databases; NoSQLs]
  3. Design Goals of ScalarDB

    • Database Agnosticism
      ◦ Global transactions should span various kinds of databases.
      ◦ The top-priority goal based on our customers’ demands.
    • Strong Correctness
      ◦ Global transactions should guarantee ACID with strict serializability.
    • Reasonable Performance
      ◦ Managing global transactions should not be a limiting factor of overall transaction performance.
    • High Scalability
      ◦ Transaction performance should scale as the performance of underlying databases scales.
    • High Availability
      ◦ Managing global transactions should be achievable without a single point of failure (SPOF).
  4. Design Choices

    • Multi-level Transaction Management (e.g., Oracle Tuxedo, Atomikos, Seata XA)
      ◦ Pros: Several real-world products
      ◦ Pros: Easy to improve performance
      ◦ Cons: Too dependent on specific DB capabilities
      ◦ Cons: Invasive (DBs require enhancements)
    • Single-level Transaction Management (e.g., Deuteronomy [CIDR’11], Cherry Garcia [ICDE’15])
      ◦ Pros: Not very dependent on specific DB capabilities
      ◦ Pros: Non-invasive (DBs don’t require enhancements)
      ◦ Cons: Weak isolation guarantee
      ◦ Cons: Hard to improve performance
    [Figure: two architecture diagrams. Labels: TM (Coordinator); Abstraction; TM; DB1; DB2; Global Transactions; Local Transactions; No CC is required. TM: Transaction Manager. CC: Concurrency Control.]
  5. Design Choices: Our Approach

    • Multi-level Transaction Management (e.g., Oracle Tuxedo, Atomikos, Seata XA)
      ◦ Pros: Several real-world products
      ◦ Pros: Easy to improve performance
      ◦ Cons: Too dependent on specific DB capabilities
      ◦ Cons: Invasive (DBs require enhancements)
    • Single-level Transaction Management (e.g., Deuteronomy [CIDR’11], Cherry Garcia [ICDE’15]): ScalarDB approach
      ◦ Pros: Not very dependent on specific DB capabilities
      ◦ Pros: Non-invasive (DBs don’t require enhancements)
      ◦ Cons: Weak isolation guarantee (ScalarDB addresses them)
      ◦ Cons: Hard to improve performance (ScalarDB addresses them)
    [Figure: two architecture diagrams. Labels: TM (Coordinator); Abstraction; TM; DB1; DB2; Global Transactions; Local Transactions; No CC is required. TM: Transaction Manager. CC: Concurrency Control.]
  6. Challenges with ScalarDB

    • Achieve our design goals by using the single-level TM approach (we chose the Cherry Garcia protocol [ICDE’15] and extended it):
      ◦ Achieve database agnosticism.
        ▪ This is achieved by the single-level TM approach.
      ◦ Guarantee strict serializability while achieving high scalability and high availability.
      ◦ Enhance transaction performance without sacrificing correctness.
    • Clarify and fill the critical missing pieces of the single-level TM approach for productization.
      ◦ Take correct backups from multiple disparate databases.
      ◦ Route transactions spanning multiple microservices.
      ◦ Run analytical queries over ScalarDB-managed databases.
  8. ScalarDB Architecture

    • ScalarDB abstracts underlying databases with its own abstraction.
    • The abstraction requires each underlying database to provide minimal capabilities.
      ◦ E.g., linearizable read/write on a single record, durability of written database records.
    Note: You can directly use ScalarDB core through its library.
    [Figure: ScalarDB architecture. Labels: App; TM; CRUD Interface; SQL; GraphQL; Transaction Manager; Database Abstraction; DB1 Shim, DB2 Shim, DB3 Shim; DB1, DB2, DB3; gRPC (HTTP/2); DB-specific Protocols; TxID; ScalarDB; ScalarDB core component (Apache 2)]
  9. Transaction Protocol: Overview

    • Two-phase commit (2PC) over records (similar to Cherry Garcia [ICDE’15])
      ◦ Treats a single record as a small database and does 2PC over multiple records.
      ◦ Manages WAL information in each record. (Disaggregated WAL)
    • Single-version optimistic concurrency control (OCC)
      ◦ Conflicts are detected by using linearizable conditional writes.
    [Figure: commit flow and record layout. Steps: 1. Prepare Records w/ W-set; 2. Commit Status; 3. Commit Records. User-defined table columns: PK, Col, Version, TxID, TxStatus (After Image); Before Col, Before Version, Before TxID, Before TxStatus (Before Image). Coordinator labels: TxID, TxStatus, Metadata. Legend: application data managed by users; transaction metadata managed by ScalarDB. Other labels: R/W sets.]
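The three steps on this slide (prepare records, commit status, commit records) can be sketched in a few lines. This is a toy, single-process illustration under stated assumptions, not ScalarDB's implementation: `Store`, `put_if_tx_id`, `Transaction`, and the row fields are hypothetical names, the coordinator is a plain dictionary, and recovery of records left in the prepared state is omitted.

```python
import uuid

COMMITTED, PREPARED, ABORTED = "COMMITTED", "PREPARED", "ABORTED"


class Conflict(Exception):
    """Raised when a conditional write fails and the transaction aborts."""


class Store:
    """Toy in-memory table standing in for one underlying database."""

    def __init__(self):
        self.rows = {}

    def get(self, key):
        row = self.rows.get(key)
        return dict(row) if row else None

    def put_if_tx_id(self, key, new_row, expected_tx_id):
        """Conditional write: apply only if the stored TxID is the expected one."""
        current = self.rows.get(key)
        if (current["tx_id"] if current else None) != expected_tx_id:
            return False
        if new_row is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = dict(new_row)
        return True


class Transaction:
    def __init__(self, stores, coordinator):
        self.stores = stores            # name -> Store (could be different databases)
        self.coordinator = coordinator  # hypothetical status table: tx_id -> status
        self.tx_id = str(uuid.uuid4())
        self.read_set = {}              # (store, key) -> row seen at read time
        self.write_set = {}             # (store, key) -> new value

    def read(self, store, key):
        row = self.stores[store].get(key)
        if row and row["tx_status"] != COMMITTED:
            raise Conflict("record is being written by another transaction")
        self.read_set[(store, key)] = row
        return row["value"] if row else None

    def write(self, store, key, value):
        if (store, key) not in self.read_set:
            self.read(store, key)  # implicit read to learn the current TxID
        self.write_set[(store, key)] = value

    def commit(self):
        prepared = []
        # 1. Prepare records: store the after image next to the before image.
        for (store, key), value in self.write_set.items():
            before = self.read_set[(store, key)]
            row = {
                "value": value,
                "version": (before["version"] if before else 0) + 1,
                "tx_id": self.tx_id,
                "tx_status": PREPARED,
                "before": before,
            }
            expected = before["tx_id"] if before else None
            if not self.stores[store].put_if_tx_id(key, row, expected):
                self._rollback(prepared)
                raise Conflict(f"{store}/{key} changed since it was read")
            prepared.append((store, key))
        # 2. Commit status: a single write decides the outcome.
        self.coordinator[self.tx_id] = COMMITTED
        # 3. Commit records: flip the status and drop the before image.
        for store, key in prepared:
            row = self.stores[store].get(key)
            row.update(tx_status=COMMITTED, before=None)
            self.stores[store].put_if_tx_id(key, row, self.tx_id)

    def _rollback(self, prepared):
        self.coordinator[self.tx_id] = ABORTED
        for store, key in prepared:
            before = self.read_set[(store, key)]
            self.stores[store].put_if_tx_id(key, before, self.tx_id)
```

In this sketch, two transactions that read the same record and both try to write it cannot both pass the prepare step: the second conditional write sees a different TxID and the transaction rolls back the records it already prepared.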
  10. Transaction Protocol: Removing Reliable Clock Dependency

    • Removes the dependency on a reliable clock (e.g., TrueTime) that the Cherry Garcia protocol relies on, for better applicability and scalability. This leads ScalarDB to:
      ◦ Provide single-version OCC instead of the original two-version OCC/MVCC.
      ◦ Provide weaker isolation than snapshot isolation (read-committed SI / RCSI).
        ▪ Read skew can occur in addition to SI anomalies (e.g., write skew).
        ▪ This does not meet our design goals.
    • Does not employ a hybrid logical clock (HLC).
      ◦ Because of its architecture, ScalarDB with HLC would need an additional database write for each record read to keep track of the happened-before relation.
  11. Transaction Protocol: Making Transactions Strict Serializable

    • Basic strategy: keep track of anti-dependencies implicitly.
      ◦ Explicit anti-dependency tracking (e.g., SSI [SIGMOD’08], SSN [VLDBJ’15]) cannot be efficiently done in the ScalarDB architecture.
      ◦ More conservative than SSI/SSN, but works well with the ScalarDB architecture.
    [Figure: three alternatives for handling the read set (R-set).
      ◦ TMs hold R/W sets and exchange R-sets: would not fit well with the client-coordinated protocol.
      ◦ Write R-set: would issue too many database reads/writes.
      ◦ Re-read R-set (ScalarDB approach): works efficiently without having too many additional reads/writes.]
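The "re-read R-set" idea can be illustrated with a small check. The slide only names the technique, so the details here are assumptions: the check is taken to run after the prepare step, and a record counts as changed when the TxID of its latest writer differs from the one observed at read time. `validate_read_set` and `fetch_tx_id` are hypothetical names.

```python
def validate_read_set(read_set, fetch_tx_id):
    """Return True if no record in the read set changed since it was read.

    read_set:    dict mapping record key -> TxID observed when the record was read
    fetch_tx_id: callable returning the TxID currently stored for a key
    """
    return all(fetch_tx_id(key) == seen for key, seen in read_set.items())


# Write-skew scenario: T1 and T2 both read "alice" and "bob";
# T1 then writes only "alice", T2 writes only "bob".
db = {"alice": "t0", "bob": "t0"}  # key -> TxID of the latest writer

t1_read_only = {"bob": db["bob"]}      # read by T1 but not written by it
t2_read_only = {"alice": db["alice"]}  # read by T2 but not written by it

db["alice"] = "t1"  # T1 prepares its write
db["bob"] = "t2"    # T2 prepares its write

# Each transaction re-reads what it read but did not write.
t1_ok = validate_read_set(t1_read_only, db.get)  # False: "bob" now carries t2
t2_ok = validate_read_set(t2_read_only, db.get)  # False: "alice" now carries t1
```

Both transactions fail validation in this interleaving, which matches the slide's remark that the approach is more conservative than SSI/SSN: it may abort more transactions than strictly necessary, but it needs no shared dependency graph.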
  12. Evaluation: Experimental Setup

    • Each DB instance: AWS c5d.9xlarge (18 cores, 72 GB DRAM, NVMe SSD)
    • Client: c5.4xlarge
    • Workloads: YCSB (100M records), TPC-C (200-1,000 warehouses)
    • Compared systems: Atomikos (XA), Seata (XA), ScalarDB
    [Figure: database configurations. Labels: Two MariaDB / two PostgreSQL / PostgreSQL & MariaDB; Cassandra (for scalability); PostgreSQL; PostgreSQL & Cassandra; Clients; Scalar DL; C*; DL]
  13. Evaluation: Performance of Global Transactions

    Achieved database-agnostic global transactions with reasonable performance.
    [Figure: YCSB Workload F results. Panels: MariaDB x 2; MariaDB & PostgreSQL; PostgreSQL & Cassandra]
  14. Evaluation: Overhead for Strict Serializability

    Achieved strict serializability without much overhead.
    [Figure: TPC-C results. Panels: MariaDB; PostgreSQL. Annotations: 15% slowdown at most; 11% slowdown at most]
  15. Summary

    • ScalarDB is a universal transaction manager for polystores.
    • ScalarDB provides database-agnostic global transactions while achieving strong correctness and reasonable performance.
    • ScalarDB has been used by some Fortune Global 500 companies.
    • Please read the paper for more details we couldn’t cover in this presentation.
      ◦ Performance optimization techniques.
      ◦ Critical mechanisms for productization; e.g., mechanisms for taking backups and handling analytical queries.
      ◦ We talk about how it handles analytical queries in the POLY workshop.
    https://github.com/scalar-labs/scalardb
  16. Productization: Taking Transactionally Consistent Backups

    • Pauses ScalarDB servers to create a state where no active transactions exist.
    • Creates transactionally consistent backups by using database-specific backup mechanisms (e.g., point-in-time snapshots and restore).
    • Employs an OCC technique to ensure a paused state is not broken (in a managed Kubernetes environment).
      ◦ New pods might be created by auto-healing and scaling out.
    Steps:
      1. Identifies ScalarDB servers.
      2. Pauses the servers (for a short while) after draining active transactions.
      3. Takes backups.
      4. Identifies servers again and checks if servers’ states have not been changed.
    [Figure labels: Kubernetes cluster; Pod (container)]
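The four steps above follow an optimistic pattern: act first, then verify that the assumption (the set of servers) still held. A minimal sketch, assuming the caller supplies the cluster operations as plain callables; `backup_with_pause`, `list_servers`, `pause`, `unpause`, and `take_backup` are hypothetical names and do not refer to any real ScalarDB or Kubernetes API. Comparing only the set of server identities is also a simplification of "servers' states".

```python
def backup_with_pause(list_servers, pause, unpause, take_backup, max_attempts=3):
    """Take a backup while servers are paused, retrying if the server set changed."""
    for _ in range(max_attempts):
        servers = frozenset(list_servers())      # 1. identify servers
        for server in servers:
            pause(server)                        # 2. pause after draining
        try:
            backup = take_backup()               # 3. take backups
            # 4. identify servers again; a new pod would break the paused state
            unchanged = frozenset(list_servers()) == servers
        finally:
            for server in servers:
                unpause(server)
        if unchanged:
            return backup
    raise RuntimeError("server set kept changing; no consistent backup taken")
```

If a pod appears between steps 1 and 4 (for example through auto-healing), the backup from that attempt is discarded and the whole procedure is retried, which is the optimistic part of the design.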
  17. Towards an HTAP Engine

    • Extend ScalarDB to run read-only analytical queries over multiple disparate databases.
    [Figure: ScalarDB architecture with an added analytics path. Labels: CRUD Interface; SQL; GraphQL; Analytics SQL; Transaction Manager; Database Abstraction; DB1 Shim, DB2 Shim, DB3 Shim]
  18. ScalarDB Analytics with PostgreSQL

    • ScalarDB Analytics utilizes PostgreSQL FDW*.
      ◦ Creates a ScalarDB FDW to support various databases.
    *FDW: Foreign Data Wrapper

    WAL interpretation (pseudo code):
        record = …;
        if (record.txStatus == COMMITTED) {
          // use after image
        } else {
          // use before image
        }

    [Figure: analytics architecture. Labels: CRUD Interface; SQL; GraphQL; Analytics SQL; Transaction Manager; Database Abstraction; DB1 Shim, DB2 Shim, DB3 Shim; PostgreSQL; WAL-interpreted Views; Foreign Tables; ScalarDB FDW; Community FDWs. Record columns: PK, Col, Version, TxID, TxStatus (After Image); Before Col, Before Version, Before TxID, Before TxStatus (Before Image).]
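The WAL interpretation rule on this slide can be written as a small function over one record. This is only a sketch of the slide's pseudo code: the dictionary keys are snake-cased versions of the column labels in the figure and are assumptions, as is the choice to hide a record that has no before image yet.

```python
COMMITTED = "COMMITTED"


def interpret(record):
    """Return the image of a record that an analytical query should see."""
    if record["tx_status"] == COMMITTED:
        # Use the after image.
        return {"pk": record["pk"], "col": record["col"],
                "version": record["version"]}
    if record["before_version"] is None:
        # Assumption: a record inserted by an uncommitted transaction
        # has no before image, so it is not visible at all.
        return None
    # Use the before image.
    return {"pk": record["pk"], "col": record["before_col"],
            "version": record["before_version"]}


committed = {"pk": 1, "col": "new", "version": 2, "tx_status": COMMITTED,
             "before_col": "old", "before_version": 1}
in_flight = dict(committed, tx_status="PREPARED")

visible_committed = interpret(committed)  # after image: col == "new"
visible_in_flight = interpret(in_flight)  # before image: col == "old"
```

A view defined with this rule over a foreign table would expose only committed data to analytical SQL, which appears to be the role of the "WAL-interpreted Views" label in the figure.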