Scalar, Inc.
September 26, 2022

Scalar DL: Scalable and Practical Byzantine Fault Detection for Transactional Database Systems (VLDB'22)

Scalar DL is a scalable and practical Byzantine fault detection middleware for transactional database systems that achieves correctness, scalability, and database agnosticism. This is a slide deck presented at VLDB'22.

For more details about Scalar DL, please check out the paper and our GitHub site.

- https://dl.acm.org/doi/abs/10.14778/3523210.3523212
- https://github.com/scalar-labs/scalardl

Transcript

  1. Scalar DL: Scalable and Practical Byzantine Fault Detection for Transactional Database Systems

    Hiroyuki Yamada, Jun Nemoto
    Scalar, Inc.
  2. Towards a reliable database system

    • We live in a data-driven / data-centric world.
      ◦ Data needs to be reliable and trustworthy.
      ◦ Database systems need to be reliable and trustworthy.
    • Dealing with Byzantine faults in a database system is one of the key factors.
      ◦ Byzantine faults: software errors, data tampering, (internal) malicious attacks.

    Our goal: a database system that deals with Byzantine faults in a practical and scalable way.
  3. Dealing with Byzantine faults

    • Basic principle: find discrepancies between replicas.
    • Byzantine fault tolerance (BFT).
      ◦ N > 3f, N: # of replicas, f: # of faulty replicas.
      ◦ SMR: PBFT [OSDI’99], BFT-SMaRt [DSN’14], HotStuff [PODC’19] …
      ◦ Database: HRDB [SOSP’07], Byzantium [EuroSys’11], Hyperledger Fabric [EuroSys’18], Basil [SOSP’21]
    • Byzantine fault detection (BFD).
      ◦ N > f, N: # of replicas, f: # of faulty replicas.
      ◦ SMR: PeerReview [SOSP’07]

    Are existing solutions practical and scalable enough for a database system?
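
The two replica bounds on this slide (N > 3f for BFT, N > f for BFD) can be checked with a little arithmetic. The following is a minimal sketch; the function names are hypothetical and exist only for illustration.

```python
def min_replicas_bft(f: int) -> int:
    """Smallest N that satisfies the BFT bound N > 3f."""
    return 3 * f + 1


def min_replicas_bfd(f: int) -> int:
    """Smallest N that satisfies the BFD bound N > f."""
    return f + 1


def satisfies_bft(n: int, f: int) -> bool:
    """True if n replicas can mask f Byzantine faults (N > 3f)."""
    return n > 3 * f


def satisfies_bfd(n: int, f: int) -> bool:
    """True if n replicas can detect f Byzantine faults (N > f)."""
    return n > f
```

For f = 1, masking the fault needs at least 4 replicas, while detecting it needs only 2.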
  7. BFT is ideal, but may not be practical for database systems

    • At least 4 administrative domains (ADs) are required for correctness.
      ◦ Malicious attacks are likely to be dependent within an AD.
    • BFT might not fit well with enterprise database systems.
      ◦ Many enterprise database systems are managed by a single AD or a few ADs.

    [Figure: AD-1, AD-2, AD-3, AD-4. At least 4 ADs are required to mask 1 fault.]

    An AD is a collection of nodes and networks operated by a single organization or administrative authority.
  8. BFD is a promising approach for database systems

    • Requires only 2 ADs for correctness.
      ◦ 2 is the lower bound for the number of replicas in dealing with Byzantine faults.
    • Many use cases require only BFD or tamper evidence.
      ◦ Regulations on data protection and privacy (e.g., GDPR and CCPA), prior user rights for IP, and vehicle regulations around software updates with OTA in WP.29.
    • Existing solutions are not designed for transactional database systems.
      ◦ Cannot run transactions in parallel (i.e., not scalable).

    [Figure: AD-1, AD-2. 1 faulty AD can be detected as long as there are 2 ADs.]
  11. Challenge: Scalable BFD for a database system deployed to a 2-AD environment

                                          BFT                                          BFD
    SMR (run transactions sequentially)   PBFT, BFT-SMaRt, HotStuff, Tendermint        PeerReview
    DB (run transactions concurrently)    HRDB, Byzantium, Basil, Hyperledger Fabric   No existing work

    Annotations: Not practical from an administrative perspective. Not designed for database transactions.
  19. BFT DB => BFD DB

    • Can we realize BFD by splitting up replicas into 2 ADs?
      ◦ No.
    • 1 Byzantine-faulty replica will exceed the predefined threshold for correctness because Byzantine faults are dependent within an AD.
      ◦ Need to accept the fault, i.e., data will be tampered with.

    [Figure: replicas split across AD-1 and AD-2. With N=4 and f=2, N > 3f does not hold.]

    BFT DB cannot trivially be extended to realize BFD DB.
  20. BFD SMR => BFD DB

    • Can we make BFD SMR (PeerReview) run transactions concurrently?
      ◦ Yes, but only partially.
      ◦ We could apply concurrency control to primary-side processing.
    • The witness side must execute the hash-chained log sequentially for correctness (i.e., strict serializability), which limits the overall scalability.
      ◦ Running transactions in parallel could cause time-travel anomalies.

    [Figure: the primary in AD-1 and the witness (auditor) in AD-2 process transactions T1 and T2 recorded in a hash-chained log. Witness-side execution has to be sequential for correctness.]
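
To illustrate why a hash-chained log forces sequential replay, here is a minimal sketch of such a log. It is not PeerReview's actual format: the genesis value, the use of SHA-256, and the string payloads are all assumptions made for illustration.

```python
import hashlib

GENESIS = "0" * 64  # assumed initial hash; not taken from PeerReview


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash of an entry, chained to the hash of the previous entry."""
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def build_log(payloads):
    """Build a list of (payload, hash) entries, each chained to its predecessor."""
    log, prev = [], GENESIS
    for payload in payloads:
        current = entry_hash(prev, payload)
        log.append((payload, current))
        prev = current
    return log


def verify_log(log) -> bool:
    """Replay the log in order; any reordering or tampering breaks the chain."""
    prev = GENESIS
    for payload, recorded in log:
        if entry_hash(prev, payload) != recorded:
            return False
        prev = recorded
    return True
```

Because each hash depends on its predecessor, a verifier has to process entries one at a time in log order, which is the sequential bottleneck the slide refers to.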
  21. Challenge: Scalable BFD for a database system deployed to a 2-AD environment

                                          BFT                                          BFD
    SMR (run transactions sequentially)   PBFT, BFT-SMaRt, HotStuff, Tendermint        PeerReview
    DB (run transactions concurrently)    HRDB, Byzantium, Basil, Hyperledger Fabric   NONE

    Annotations: BFT DB => BFD DB: Not possible (as it is). BFD SMR => BFD DB: Possible but not scalable.
  22. Scalar DL: A scalable and practical BFD approach

    • Scalable and practical BFD middleware for transactional database systems.
      ◦ Manages two types of servers and databases in separate ADs internally.
      ◦ Database-agnostic by depending only on common database operations.
    • Executes non-conflicting transactions in parallel while guaranteeing correctness.

    Correctness:
    • Provides safety (strict serializability) and liveness if there is no fault.
    • Provides safety (correct clients can detect a Byzantine fault) if one AD is faulty.

    [Figure: applications use Scalar DL clients to reach the database system, which consists of Scalar DL primary servers with a primary database in AD1 and Scalar DL secondary servers with a secondary database in AD2.]
  23. The BFD protocol - Overview

    • Key idea: Make an agreement on the partial ordering of transactions in a decentralized and concurrent way.
      ◦ Neither the primary nor the secondary can selfishly order/commit transactions.
    • 3-phase protocol: Ordering -> Commit -> Validation.
      ◦ The protocol assumes a one-shot request model.

    [Figure: message flow among the client, secondary, and primary across the ordering, commit, and validation phases.]
  24. The BFD protocol - Ordering phase

    • Order transactions in a strictly serializable manner with a variant of 2PL.
      ◦ Simulate a transaction and identify the read/write sets of the transaction.
      ◦ Acquire R/W locks using the underlying database’s linearizable operations.
      ◦ Go to the commit phase once all the required locks are acquired.
    • Why not use multi-version concurrency control (MVCC)?
      ◦ A primary and a secondary could derive different serialization orders without sharing explicit order dependencies (e.g., conflict graph).

    Lock entry: Primary key | Version | Lock count | Lock mode | Lock holders (TxIDs) | Input dependencies
    Input dependencies: a set of <primary-key, version> that indicates the partial order of transactions.
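
The lock acquisition step above can be sketched as follows. This is an illustration under stated assumptions, not Scalar DL's implementation: an in-memory dict stands in for the underlying database's linearizable operations, the `LockEntry` layout only loosely mirrors the slide's lock entry (input dependencies are omitted), and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LockEntry:
    # Field names follow the slide's lock entry; the layout is an assumption.
    version: int = 0
    lock_count: int = 0
    lock_mode: Optional[str] = None  # "R", "W", or None when unlocked
    holders: Set[str] = field(default_factory=set)


def try_acquire(table: Dict[str, LockEntry], tx_id: str, key: str, mode: str) -> bool:
    """Grant the lock if the record is free, or if both locks are reads."""
    entry = table.setdefault(key, LockEntry())
    if entry.lock_count == 0:
        entry.lock_mode, entry.holders, entry.lock_count = mode, {tx_id}, 1
        return True
    if mode == "R" and entry.lock_mode == "R":
        entry.holders.add(tx_id)
        entry.lock_count = len(entry.holders)
        return True
    return False  # read/write or write/write conflict


def release(table: Dict[str, LockEntry], tx_id: str, key: str) -> None:
    """Drop one holder and reset the mode once nobody holds the lock."""
    entry = table[key]
    entry.holders.discard(tx_id)
    entry.lock_count = len(entry.holders)
    if entry.lock_count == 0:
        entry.lock_mode = None


def acquire_all(table, tx_id, read_set, write_set) -> bool:
    """Acquire every lock for the read/write sets, or roll back and report failure."""
    requests = [(k, "W") for k in sorted(write_set)]
    requests += [(k, "R") for k in sorted(set(read_set) - set(write_set))]
    acquired = []
    for key, mode in requests:
        if not try_acquire(table, tx_id, key, mode):
            for held in acquired:
                release(table, tx_id, held)
            return False
        acquired.append(key)
    return True
```

In this sketch, two transactions that only share read locks both succeed, so non-conflicting transactions can proceed to the commit phase in parallel, while a conflicting one has to wait or retry.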
  25. The BFD protocol - Commit phase

    • Execute transactions in an ACID way in an arbitrary order.
      ◦ Also write a transaction status with a transaction ID as a key for recovery.
      ◦ This is where a transaction is regarded as committed or aborted.
    • Create proofs that indicate what records are read and written.
    • The input dependencies indicate the partial order of transactions.

    Proof entry: Primary key | Version | TxID | Input dependencies | MAC
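
A proof entry with the fields listed above could be built and checked roughly as follows. This is a sketch only: the JSON serialization, the HMAC-SHA256 construction, the shared secret, and the function names are assumptions for illustration and are not taken from Scalar DL.

```python
import hashlib
import hmac
import json


def proof_mac(secret: bytes, primary_key, version, tx_id, input_deps) -> str:
    """MAC over the proof fields; the canonical JSON encoding is an assumption."""
    message = json.dumps(
        {
            "primary_key": primary_key,
            "version": version,
            "tx_id": tx_id,
            "input_dependencies": sorted(input_deps),
        },
        sort_keys=True,
    ).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_proof(secret: bytes, primary_key, version, tx_id, input_deps) -> dict:
    """Build a proof entry: primary key, version, TxID, input dependencies, MAC."""
    return {
        "primary_key": primary_key,
        "version": version,
        "tx_id": tx_id,
        "input_dependencies": sorted(input_deps),
        "mac": proof_mac(secret, primary_key, version, tx_id, input_deps),
    }


def verify_proof(secret: bytes, proof: dict) -> bool:
    """Recompute the MAC and compare it in constant time."""
    expected = proof_mac(
        secret,
        proof["primary_key"],
        proof["version"],
        proof["tx_id"],
        proof["input_dependencies"],
    )
    return hmac.compare_digest(expected, proof["mac"])
```

Here the input dependencies are passed as a set of (primary-key, version) pairs, matching the description on the ordering-phase slide.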
  26. The BFD protocol - Validation phase

    • Validate if the commit order is the same as the one the secondary expects.
      ◦ Compare the lock entries and proofs.
    • Execute transactions in the secondary once validated and create proofs.
    • A client compares the results and proofs from the primary and the secondary to find discrepancies (i.e., Byzantine faults).

    [Figure: the client compares the results and proofs returned by the primary (2. Commit phase) and the secondary (3. Validation phase); pre-validation compares the lock table.]
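
The client-side comparison in the last bullet can be sketched as below. The response shape (a dict with `result` and `proofs`, where each proof is a (primary key, version, TxID) tuple) is hypothetical and chosen only to keep the example small.

```python
def find_discrepancies(primary: dict, secondary: dict) -> list:
    """Compare results and proofs from both sides; a non-empty list signals a fault."""
    issues = []
    if primary["result"] != secondary["result"]:
        issues.append("result mismatch")
    # Proofs are compared as sorted lists so that ordering does not matter.
    if sorted(primary["proofs"]) != sorted(secondary["proofs"]):
        issues.append("proof mismatch")
    return issues
```

An empty list means the two ADs agree; any entry means the client has detected a discrepancy and should treat one side as potentially Byzantine-faulty.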
  27. Evaluation - Benchmarked systems and workloads

    • Benchmarked systems:
      ◦ PeerReviewTx: an extended version of PeerReview, which runs TXs in parallel on the primary side.
      ◦ Scalar DL: uses Scalar DB to execute transactions on non-transactional databases.
      ◦ Both PeerReviewTx and Scalar DL servers are placed in database instances.
      ◦ PostgreSQL and Cassandra as backend database systems.
    • Workloads:
      ◦ YCSB: F and C. 100M records with 100-byte payloads and uniform distribution.
      ◦ TPC-C: 50/50 ratio of NewOrder and Payment. 100 - 1000 warehouses.
  28. Evaluation - Experimental setup

    • Environment
      ◦ AWS. c5d.4xlarge for each database instance (8 cores, 32GB DRAM, NVMe SSD). c5.9xlarge for a client.
      ◦ 2 ADs in different VPCs.

    [Figure: two deployments, each with clients and two ADs. In one, each AD runs PostgreSQL with Scalar DL; in the other, each AD runs multiple Cassandra (C*) nodes, each with a DL server.]
  29. Throughput on PostgreSQL

    [Figure: throughput plots for YCSB-F and TPC-C (NP).]

    Scalar DL scaled as the number of client threads increased, whereas PeerReviewTx didn’t scale as much. The benefit of Scalar DL comes from its concurrency control.
  30. Throughput on Cassandra (3 nodes per AD, RF=3)

    [Figure: throughput plots for YCSB-F and TPC-C (NP).]

    The results were similar to those on PostgreSQL. The database-agnostic property was also verified.
  31. Summary

    • Scalar DL is a scalable and practical BFD middleware for transactional database systems.
    • Key contribution: a Byzantine fault detection protocol that executes non-conflicting transactions in parallel while guaranteeing correctness.
    • Achieves up to a 10x speedup compared to the state-of-the-art BFD approach and near-linear (91%) node scalability.
    • Scalar DL is a real product, not a research prototype.
      ◦ See https://github.com/scalar-labs/scalardl