How to build a serverless database cloud service

November 02, 2023

Relational databases have long been the core component of application systems, and their reliability and performance are critical to the stability and availability of applications. Distributed SQL, the direction in which next-generation databases are evolving, offers built-in features such as horizontal scaling and high availability.

In this talk, Li Shen introduces the architecture and key technologies of TiDB, the open-source distributed SQL database, and explains how we use the capabilities of the public cloud to build a cloud-native serverless database service.

* Introduction to TiDB
* The design goals of TiDB Serverless
* Key considerations and architecture
* Demo of TiDB Serverless



  1. “Future users of large data banks must be protected from having to know how the data is organized in the machine.” Edgar Codd, 1970; ACM Turing Award (1981)

    Relational databases are still critical to most applications after 50 years
  2. Challenges from Real World

    Character.ai web app was topping 200 million visits per month, Character.AI claims, and users were spending on average 29 minutes per visit. From TechCrunch.com

    “They also use Cloud Spanner to ingest terabytes of data every day with reliability and global scale.”
  3. [Chart comparing RDBMS, NoSQL, and Distributed SQL Database on: ACID Transaction, Query/Index, Data Integrity, Scalability, High Availability, Flexibility]

    Distributed SQL melds the scalability and reliability of NoSQL with the familiar experience of traditional SQL databases
  4. TiDB v0.1

    [Architecture diagram] SQL Layer: SQL → AST → Execution Plan; KV API; Local Storage Engines: BoltDB, LevelDB, RocksDB, Memory
  5. TiDB v0.5

    [Architecture diagram] SQL Layer: SQL → AST → Logical Plan → Physical Plan; KV API + Coprocessor API; Storage Engines: HBase
  6. Data is “Auto-Sharded”

    • SQL data to KV data
    • Split data into small chunks/regions
      ◦ 96MB by default
    • Each region has multiple replicas as a consensus group
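The auto-sharding idea on slide 6 can be sketched as follows. This is a minimal illustration, not TiDB's actual encoding or split logic: the key layout in `encode_row_key`, the `Region` type, and `split_into_regions` are all hypothetical names introduced here, and only the 96MB default comes from the slide.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

REGION_SIZE_LIMIT = 96 * 1024 * 1024  # 96MB default, as stated on the slide


def encode_row_key(table_id: int, row_id: int) -> bytes:
    """Hypothetical key layout: fixed-width big-endian integers, so that
    byte-wise key order matches numeric (table_id, row_id) order."""
    return b"t" + table_id.to_bytes(8, "big") + b"_r" + row_id.to_bytes(8, "big")


@dataclass
class Region:
    start_key: bytes                 # inclusive lower bound
    end_key: Optional[bytes] = None  # exclusive upper bound; None means +infinity
    size: int = 0                    # bytes of key/value data in the region


def split_into_regions(
    pairs: Iterable[Tuple[bytes, bytes]], limit: int = REGION_SIZE_LIMIT
) -> List[Region]:
    """Walk key-sorted (key, value) pairs and start a new region whenever
    adding the next pair would push the current region past `limit`."""
    regions = [Region(start_key=b"")]
    for key, value in pairs:
        current = regions[-1]
        entry_size = len(key) + len(value)
        if current.size > 0 and current.size + entry_size > limit:
            current.end_key = key  # close the current region at this key
            current = Region(start_key=key)
            regions.append(current)
        current.size += entry_size
    return regions
```

Because the regions are contiguous key ranges, each one can then be replicated and moved independently of the others, which is what the last bullet of the slide relies on.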
  7. Data is Replicated

    • Raft
      ◦ Strong consistency
      ◦ Leader election
      ◦ Efficient and Understandable
    • Replica placement
      ◦ Spread the replicas among TiKV nodes
      ◦ Consider data safety and load balancing
    • Auto healing
      ◦ Short-term failures: Raft leader re-election
      ◦ Longer-term failures: Rebuild the replicas on the missing nodes
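The replica placement and auto-healing bullets on slide 7 can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not TiDB's real placement logic: `Node`, `place_replicas`, and `heal` are hypothetical names, "data safety" is modelled as "at most one replica per zone", and "load balancing" as "prefer the node holding the fewest regions".

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Node:
    name: str
    zone: str
    region_count: int = 0  # how many region replicas the node already holds
    alive: bool = True


def _pick(nodes: List[Node], used_zones: Set[str], skip: Set[str], want: int) -> List[Node]:
    """Pick up to `want` live nodes, least loaded first, one per unused zone."""
    picked = []
    candidates = sorted(
        (n for n in nodes if n.alive and n.name not in skip),
        key=lambda n: (n.region_count, n.name),
    )
    for node in candidates:
        if len(picked) == want:
            break
        if node.zone in used_zones:
            continue  # assumption: never co-locate two replicas in one zone
        picked.append(node)
        used_zones.add(node.zone)
    if len(picked) < want:
        raise ValueError("not enough live nodes in distinct zones")
    for node in picked:
        node.region_count += 1
    return picked


def place_replicas(nodes: List[Node], replicas: int = 3) -> List[str]:
    """Initial placement for a new region."""
    return [n.name for n in _pick(nodes, set(), set(), replicas)]


def heal(replica_names: List[str], nodes: List[Node], replicas: int = 3) -> List[str]:
    """Longer-term failure: keep surviving replicas and rebuild the missing ones."""
    by_name = {n.name: n for n in nodes}
    survivors = [by_name[name] for name in replica_names if by_name[name].alive]
    used_zones = {n.zone for n in survivors}
    skip = {n.name for n in survivors}
    rebuilt = _pick(nodes, used_zones, skip, replicas - len(survivors))
    return [n.name for n in survivors + rebuilt]
```

Short-term failures are not modelled here; per the slide, those are handled by Raft leader re-election within the surviving replicas rather than by moving data.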
  8. Distributed SQL - Distributed Computing

    SELECT COUNT(*) FROM t WHERE id > 10 AND c2 = 'foo';

    • Physical Plan on TiDB: DistSQL Scan → Final Aggregate SUM(COUNT(*))
    • Physical Plan on TiKV Coprocessor: TableScan Range: id(10, +∞) → Filter c2 = 'foo' → Partial Aggregate COUNT(*)
    • Each TiKV node returns its partial COUNT(*) to TiDB
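The two-stage plan on slide 8 can be sketched in a few lines. This is only an illustration of the partial/final aggregation pattern, not the TiKV Coprocessor API: the function names and the in-memory "shards" of row dictionaries are assumptions made for the example.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def coprocessor_partial_count(rows: Iterable[Row], min_id_exclusive: int, c2_value: str) -> int:
    """Runs next to the data on one storage node:
    TableScan over id in (min_id_exclusive, +inf) -> Filter c2 = c2_value -> partial COUNT(*)."""
    return sum(
        1
        for row in rows
        if row["id"] > min_id_exclusive and row["c2"] == c2_value
    )


def final_aggregate(partial_counts: Iterable[int]) -> int:
    """Runs on the SQL node: SUM over the partial COUNT(*) values."""
    return sum(partial_counts)


def distributed_count(shards: List[List[Row]], min_id_exclusive: int, c2_value: str) -> int:
    """SELECT COUNT(*) FROM t WHERE id > min_id_exclusive AND c2 = c2_value,
    evaluated as one partial count per shard plus a final sum."""
    partials = [
        coprocessor_partial_count(shard, min_id_exclusive, c2_value)
        for shard in shards
    ]
    return final_aggregate(partials)
```

The point of the design is that only one integer per storage node crosses the network, instead of every matching row.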
  9. Scale-out easily

    [Diagram: Apps send Traffic to TiDB nodes backed by TiKV nodes; with More Traffic, Scale-out adds more TiDB and TiKV nodes]
  10. Rich Features

    • Online Schema Change (DDL)
    • Columnar Storage Engine
    • Massively Parallel Processing (MPP)
    • Vectorized Computing Engine
    • Data Placement Control
    • CDC (Change Data Capture)
    • Flashback / PITR
    • Automatic Hotspot Detection and Elimination
    • SQL Index Advisor
    • Zero-Downtime Rolling Upgrades
    • …
  11. Distributed SQL Database in the Real World

    In the past 8 years, the most diverse workloads from tens of thousands of clusters
  12. Dashboard screenshot of a TiDB cluster

    • 200+ nodes
    • > 500TB data
    • 1.8 trillion records
  13. Challenges Yet To Be Fully Addressed

    • How to improve stability at scale?
      ◦ Copying data, compaction may slow down your workloads
      ◦ Indexing a large table (10 billion rows): trade-off between speed and stability
    • How to take scalability to the next level?
      ◦ Exploding data volume
      ◦ Multiple applications share a single cluster for greater efficiency and lower costs
    • How to make it easier and more cost effective for users?
      ◦ Maintenance burden of a distributed system, trade-offs become “Knobs”
      ◦ Scale down when the workload reduces
  14. These are perfect problems for Cloud

    Cloud Provides
    • Better (and more varied) capabilities
    • Elasticity and resilience
    • Hiding the complexity of a distributed system
  15. A cloud-native database is not just the capability of running on cloud infra

    • Truly integrate the database with the capabilities of cloud
    • Truly leverage cloud’s promise of elasticity and resilience
    • Cloud Augmented Database -> Cloud-Native Database
  16. TiDB Serverless Architecture

    Multi-tenant architecture
    • Virtual Cluster per tenant (Tenant 1 … Tenant n): SQL Layer, Storage Cache Layer
    • Resource Pool: DDL worker, MPP Compute service, Data Ingestion service
      ◦ Shared resource pool for background or heavy compute
    • Shared Storage Pool: Cloud Storage (S3)
      ◦ New storage engine built on top of cloud storage
  17. Separated Frontend/Background Compute

    [Diagram: Virtual Cluster per tenant (Tenant 1 … Tenant n) with Isolated SQL Layer (For OLTP) and Storage Cache Layer (For OLTP); Resource Pool with DDL service, MPP Compute service, Data Ingestion service; Shared Storage Pool on Cloud Storage (S3)]

    Elasticity of Cloud
    • Shared resource pool
    • Elastic resource allocation
    • Spot instance

    Benefit
    • More stable
    • Faster
    • Lower cost
  18. Indexing a Large Table

    [Diagram: DDL workers in the Resource Pool read Records from TiDB / TiKV /…… and write Index Data to the Shared Storage Pool on Cloud Storage (S3)]

    • Distributed indexing with resource pool
      ◦ Linear performance scaling
      ◦ Lower impact on production workloads
  19. AI Makes Things Even Easier

    • Monitoring & Diagnosis
    • Parameters Tuning
      ◦ Not One-size-fits-all
      ◦ Example: 20% improvement compared with the best TiDB expert
    • Write SQL Queries
      ◦ Talk to database with natural language
  20. AI Augmented

    [Diagram labels: Wire Protocol, SQL, Hotspot, Network Partition, Schema, Transaction, Data Corruption, VM Failure, Data Rebalancing, Latency/Performance Trade-off, Data Replication, Runaway Query, Scale-out/in, Index; Happy app developer]
  21. Distributed SQL Database for AI

    [Architecture diagram labels: Operational Apps (User Management, Order Management, Messaging, Marketing), Real-time Data Serving Layer, End Users, Data Lake, Machine Learning (Training), Consuming Operational Apps, Data Warehouse, Online Systems, Offline Systems, Write Back, ETL/CDC, Machine Learning (Serving), AI-enriched Apps, Inference Store, Customer 360, Real-time Data, 3rd-party Data, 3rd-Party API, Refresh, Right Data, Storage, Inference]