How to build a serverless database cloud service

Anyscale
November 02, 2023

Relational databases have long been a core component of application systems, and their reliability and performance are critical to application stability and availability. Distributed SQL, the next step in the evolution of databases, offers built-in horizontal scaling and high availability.

In this talk, Li Shen will introduce the architecture and key technologies of TiDB, an open-source distributed SQL database, and show how we use the capabilities of the public cloud to build a cloud-native serverless database service.

Agenda:
* Introduction to TiDB
* The design goals of TiDB Serverless
* Key considerations and architecture
* Demo of TiDB Serverless

Transcript

  1. TiDB: Inside the Journey to
    Build a Cloud-Native,
    Distributed SQL Database
    Li Shen@PingCAP

  2. “Future users of large data
    banks must be protected from
    having to know how the data is
    organized in the machine.”
    Edgar Codd, 1970
    ACM Turing Award (1981)
    Relational databases are still
    critical to most applications
    after 50 years

  3. Challenges from Real World
    Character.ai web app was topping
    200 million visits per month,
    Character.AI claims, and users were
    spending on average 29 minutes
    per visit
    From TechCrunch.com
    “They also use Cloud Spanner to
    ingest terabytes of data every day
    with reliability and global scale.”

  4. Efforts to Address These Challenges
    NoSQL
    Primary-Secondary Replication
    Read-Write Separation
    Sharding

  5. ACID Transaction
    Query/Index
    Data Integrity
    Scalability
    High Availability
    Flexibility
    RDBMS NoSQL
    Distributed SQL Database
    Distributed SQL melds the scalability and reliability of NoSQL with the
    familiar experience of traditional SQL databases

  6. Building a Distributed SQL
    Database

  7. SQL AST
    Execution
    Plan
    SQL Layer
    TiDB v0.1
    BoltDB LevelDB RocksDB Memory
    KV API
    Local Storage Engines
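
The layering on this slide, a SQL layer that talks only to a KV API with interchangeable local engines underneath, can be sketched roughly as follows. This is an illustration rather than TiDB's actual interface: `KVStore`, `MemoryStore`, `insert_row`, `table_scan`, and the `table/row_id` key format are all hypothetical names chosen for the example.

```python
import bisect
from typing import Iterator, List, Optional, Protocol, Tuple


class KVStore(Protocol):
    """Minimal ordered key-value API that the SQL layer depends on (hypothetical)."""

    def get(self, key: bytes) -> Optional[bytes]: ...
    def put(self, key: bytes, value: bytes) -> None: ...
    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]: ...


class MemoryStore:
    """In-memory engine; a disk-backed engine could expose the same three methods."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []  # kept sorted so range scans are cheap
        self._data = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Yield pairs with start <= key < end in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._data[key]


def insert_row(store: KVStore, table: str, row_id: int, payload: bytes) -> None:
    """SQL-layer write: one row becomes one KV pair (illustrative key format)."""
    store.put(f"{table}/{row_id:08d}".encode(), payload)


def table_scan(store: KVStore, table: str) -> List[bytes]:
    """SQL-layer table scan: a range scan over the table's key prefix."""
    prefix = f"{table}/".encode()
    return [value for _, value in store.scan(prefix, prefix + b"\xff")]
```

Because the SQL-layer helpers only use `get`, `put`, and `scan`, swapping the engine does not change them, which is the point of putting a KV API between the two layers.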

  8. SQL AST Logical Plan
    Physical Plan
    TiDB v0.5
    KV API + Coprocessor API
    Storage Engines
    HBase
    SQL Layer

  9. ● ACID Transaction
    ● High Performance
    ● Scalable
    ● Reliable
    TiKV: a Distributed KV Storage

  10. Data is “Auto-Sharded”
    ● SQL data to KV data
    ● Split data into small chunks/regions
    ○ 96MB by default
    ● Each region has multiple replicas as
    a consensus group
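
As a rough sketch of the two steps above (mapping rows to ordered KV pairs, then cutting the key space into size-bounded regions), consider the following. The key layout in `encode_row_key` is a simplified assumption and not TiDB's exact encoding; `Region` and `split_into_regions` are hypothetical helpers. Only the 96 MB default comes from the slide.

```python
from dataclasses import dataclass, field

REGION_MAX_BYTES = 96 * 1024 * 1024  # default region size quoted on the slide


def encode_row_key(table_id: int, row_id: int) -> bytes:
    """Build an order-preserving key for a row (assumed layout, for illustration)."""
    # Big-endian fixed-width integers sort the same way as the numbers themselves.
    return b"t" + table_id.to_bytes(8, "big") + b"_r" + row_id.to_bytes(8, "big")


@dataclass
class Region:
    start_key: bytes
    keys: list = field(default_factory=list)
    size: int = 0


def split_into_regions(kv_pairs, max_bytes=REGION_MAX_BYTES):
    """Pack key-ordered KV pairs into contiguous regions.

    A new region starts whenever adding the next pair would push the
    current region past max_bytes.
    """
    regions = []
    current = None
    for key, value in sorted(kv_pairs):
        entry_size = len(key) + len(value)
        if current is None or current.size + entry_size > max_bytes:
            current = Region(start_key=key)
            regions.append(current)
        current.keys.append(key)
        current.size += entry_size
    return regions
```

Each region covers a contiguous key range, so a range scan over row IDs touches only the regions whose ranges overlap the scan.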

  11. Data is Replicated
    ● Raft
    ○ Strong consistency
    ○ Leader election
    ○ Efficient and Understandable
    ● Replica placement
    ○ Spread the replicas among TiKV nodes
    ○ Consider data safety and load balancing
    ● Auto healing
    ○ Short-term failures: Raft leader re-election
    ○ Longer-term failures: Rebuild the replicas on the missing nodes
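
The placement and healing rules above can be illustrated with a small sketch. It is not the real scheduler: `place_replicas`, `has_quorum`, and `heal` are hypothetical helpers, "load" is an arbitrary number per node, and the only Raft fact relied on is that a group needs a strict majority of its replicas to make progress.

```python
def place_replicas(node_loads: dict, replica_count: int = 3) -> list:
    """Pick replica_count distinct nodes, preferring the least-loaded ones."""
    if replica_count > len(node_loads):
        raise ValueError("not enough nodes to keep replicas on distinct nodes")
    # Sort by load, then by name so the choice is deterministic.
    ranked = sorted(node_loads, key=lambda node: (node_loads[node], node))
    return ranked[:replica_count]


def has_quorum(replicas: list, alive: set) -> bool:
    """A Raft group can elect a leader only if a strict majority is reachable."""
    reachable = sum(1 for replica in replicas if replica in alive)
    return reachable > len(replicas) // 2


def heal(replicas: list, alive: set, node_loads: dict) -> list:
    """Rebuild replicas that sat on failed nodes onto healthy, least-loaded nodes."""
    survivors = [replica for replica in replicas if replica in alive]
    candidates = {
        node: load
        for node, load in node_loads.items()
        if node in alive and node not in survivors
    }
    missing = len(replicas) - len(survivors)
    return survivors + place_replicas(candidates, missing)
```

With three replicas, losing one node keeps the quorum (a short-term failure handled by leader re-election), while `heal` models the longer-term path of restoring the replica count on other nodes.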

  12. Distributed SQL - Distributed Computing
    SELECT COUNT(*) FROM t WHERE id > 10 AND c2 = 'foo';
    Physical Plan on TiKV Coprocessor
    ● TableScan, Range: id(10, +∞)
    ● Filter: c2 = 'foo'
    ● Partial Aggregate: COUNT(*)
    Physical Plan on TiDB
    ● DistSQL Scan
    ● Final Aggregate: SUM(COUNT(*))
    [Diagram: TiKV nodes return partial COUNT(*) results to TiDB]
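
The split between partial and final aggregation on this slide can be mimicked in a few lines. This is a toy model under stated assumptions: each "TiKV node" is just a Python list of row dicts, and the function names are hypothetical, not TiDB APIs.

```python
def coprocessor_partial_count(rows, min_id, c2_value):
    """Runs next to the data: range scan, filter, then a partial COUNT(*)."""
    return sum(1 for row in rows if row["id"] > min_id and row["c2"] == c2_value)


def final_aggregate(partial_counts):
    """Runs on the SQL node: SUM over the partial COUNT(*) results."""
    return sum(partial_counts)


def distributed_count(shards, min_id=10, c2_value="foo"):
    """Model of SELECT COUNT(*) FROM t WHERE id > 10 AND c2 = 'foo'."""
    # Only one integer per shard crosses the network, not the matching rows.
    partials = [coprocessor_partial_count(rows, min_id, c2_value) for rows in shards]
    return final_aggregate(partials)
```

Pushing the filter and the partial count down to storage means the SQL node receives one number per storage node instead of every matching row.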

  13. Scale-out easily
    [Diagram: as traffic from the apps grows, more TiDB and TiKV nodes are added (scale-out)]

  14. Rich Features
    ● Online Schema Change (DDL)
    ● Columnar Storage Engine
    ● Massively Parallel Processing (MPP)
    ● Vectorized Computing Engine
    ● Data Placement Control
    ● CDC (Change Data Capture)
    ● Flashback / PITR (Point-in-Time Recovery)
    ● Automatic Hotspot Detection and Elimination
    ● SQL Index Advisor
    ● Zero-Downtime Rolling Upgrades
    ● … …

  15. Distributed SQL Database in
    the Real World
    In the past 8 years, the most diverse workloads from tens of
    thousands of clusters

  16. Dashboard screenshot of a TiDB cluster
    ● 200+ nodes
    ● > 500TB data
    ● 1.8 trillion records

  17. ● 1M reads/sec
    ● 12K writes/sec

  18. Challenges Yet To Be Fully Addressed
    ● How to improve stability at scale?
    ○ Copying data and compaction may slow down your workloads
    ○ Indexing a large table (10 billion rows): trade-off between speed and stability
    ● How to take scalability to the next level?
    ○ Exploding data volume
    ○ Multiple applications share a single cluster for greater efficiency and lower costs
    ● How to make it easier and more cost-effective for users?
    ○ Maintenance burden of a distributed system; trade-offs become “knobs”
    ○ Scale down when the workload reduces

  19. These are perfect problems for Cloud
    ● Better (and more varied) capabilities
    ● Elasticity and resilience
    ● Hiding the complexity of a distributed system
    Cloud Provides

  20. A cloud-native database is not just the
    capability of running on cloud infra
    ● Truly integrate the database with the capabilities of cloud
    ● Truly leverage cloud’s promise of elasticity and resilience
    ● Cloud Augmented Database -> Cloud-Native Database

  21. TiDB Serverless Architecture
    Multi-tenant architecture
    [Diagram labels: Virtual Cluster - Tenant 1 … Virtual Cluster - Tenant n, each with SQL nodes and Storage; SQL Layer; Storage Cache Layer; Resource Pool (DDL worker, MPP Compute service, Data Ingestion service); Shared Storage Pool; Cloud Storage (S3)]
    ● Shared resource pool for background or heavy compute
    ● New storage engine built on top of cloud storage

  22. Cloud-Based Storage Engine
    Remote Storage and
    Services
    Before
    After

  23. Separated Frontend/Background Compute
    [Diagram labels: Virtual Cluster - Tenant 1 … Virtual Cluster - Tenant n, each with SQL nodes and Storage; Isolated SQL Layer (For OLTP); Storage Cache Layer (For OLTP); Resource Pool (DDL service, MPP Compute service, Data Ingestion service); Shared Storage Pool; Cloud Storage (S3)]
    Elasticity of Cloud
    ● Shared resource pool
    ● Elastic resource allocation
    ● Spot instance
    Benefit
    ● More stable
    ● Faster
    ● Lower cost

  24. Indexing a Large Table
    ● Distributed indexing with resource pool
    ○ Linear performance scaling
    ○ Lower impact on production workloads
    [Diagram labels: Resource Pool with three DDL workers; Shared Storage Pool; Cloud Storage (S3); TiDB / TiKV /……; Records; Index; Index Data]
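
One common way to spread an index build across a pool of workers is to have each worker sort its own slice of the records and then merge the sorted runs. The sketch below shows that general pattern only. It is an assumption about the shape of the technique, not a description of how TiDB's DDL workers are implemented, and `build_sorted_run` and `build_index` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def build_sorted_run(rows, column):
    """One worker's job: produce sorted (indexed value, row id) entries for its slice."""
    return sorted((row[column], row["id"]) for row in rows)


def build_index(rows, column, workers=3):
    """Split the records among workers, then merge their sorted runs into one index."""
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(lambda chunk: build_sorted_run(chunk, column), chunks))
    # heapq.merge streams the already-sorted runs into a single sorted sequence.
    return list(heapq.merge(*runs))
```

The sorting work is divided across workers and only the final merge is sequential, which is what makes near-linear scaling plausible when the workers run in a separate pool from production traffic.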

  25. From 0 to 1,000,000 QPS

  26. AI Makes Things Even Easier
    ● Monitoring & Diagnosis
    ● Parameter Tuning
    ○ Not One-size-fits-all
    ○ Example: 20% improvement compared with the best TiDB expert
    ● Write SQL Queries
    ○ Talk to the database in natural language

  27. Wire Protocol
    SQL
    Hotspot
    Network Partition
    Schema
    Transaction
    Data Corruption
    VM Failure
    Data Rebalancing
    AI Augmented
    Latency/Performance Trade-off
    Data Replication
    Runaway Query
    Scale-out/in
    Index
    Happy app developer

  28. Distributed SQL Database for
    AI Applications

  29. Distributed SQL Database for AI
    Operational Apps
    User Management
    Order Management
    Messaging
    Marketing
    Real-time Data
    Serving Layer
    End Users
    Data Lake
    Machine Learning
    (Training)
    Consuming
    Operational Apps
    Data Warehouse
    Online Systems Offline Systems
    Write Back
    ETL/CDC
    Machine Learning
    (Serving)
    AI-enriched
    Apps
    Inference Store
    Customer 360
    Real-time Data
    3rd-party Data
    3rd-Party
    API
    Refresh,
    Right Data
    Storage
    Inference
