
Introducing TiDB

Kevin Xu
September 27, 2018


This talk was delivered with PingCAP co-founder and CTO, Ed Huang, at the SQL NYC, NoSQL & NewSQL Data Group to introduce TiDB, an open source NewSQL database, and its use case with Mobike, one of the largest dockless bike sharing platforms in the world.

A video of the talk can be viewed here: https://www.youtube.com/watch?v=fIVUWPtWQ4o


Transcript

  1. Agenda
     • Company History and Community
     • Architecture Overview
     • Deep dive into MySQL compatibility
     • Use Case with Mobike
     • Demo:
       ◦ TiDB in the cloud
       ◦ TiDB Syncer
     • Q&A + Raffle
  2. A Little About Ed...
     • Ed Huang, CTO & Co-founder
     • Microsoft Research Asia / Netease / Wandou Labs / PingCAP
     • Codis / TiDB / TiKV
     • 10+ years as an infrastructure engineer, open source fanatic
     • [email protected]
  3. A Little About Kevin...
     • General Manager of U.S. at PingCAP
     • Studied CS and Law at Stanford
     • Programs in JavaScript, Python, and (more recently) in Rust
     • [email protected]
  4. A Little About PingCAP...
     • Founded in April 2015 by 3 infrastructure engineers
     • TiDB platform (Ti = Titanium):
       ◦ TiDB (stateless SQL layer compatible with MySQL)
       ◦ TiKV (distributed transactional key-value store)
       ◦ TiSpark (Apache Spark plug-in on top of TiKV)
       ◦ Placement Driver (metadata storage; replica scheduling; timestamp allocator)
     • Open source from Day 1
       ◦ Inspired by Google Spanner / F1
       ◦ 2.0 GA in April 2018
  5. TiDB Core Features
     • Hybrid OLTP & OLAP (minimize ETL)
     • Horizontal Scalability
     • MySQL Compatible (wire protocol + tools)
     • Distributed Transactions (ACID compliant)
     • High Availability
     • Cloud-Native
  6. Community Stats
     • Stars: TiDB 15,000+ / TiKV 3,700+
     • Contributors: TiDB 200+ / TiKV 100+
  7. Three Major Use Cases
     1. MySQL Scalability
     2. Hybrid OLTP/OLAP Architecture
     3. Unifying Storage Layer
  8. Architecture
     [Diagram: a TiDB cluster (stateless SQL layer, speaking MySQL) and a Spark cluster (Spark driver and workers running TiSpark, speaking SparkSQL) both sit on the TiKV cluster (storage); TiDB reaches TiKV through the KV and DistSQL APIs, TiSpark submits jobs through the DistSQL API, and the PD cluster serves metadata, data locations, and TSO.]
  9. TiDB: OLTP + Ad Hoc OLAP
     [Diagram: clients (ODBC/JDBC, the MySQL client, or any ORM which supports MySQL) connect over the MySQL network protocol to TiDB nodes (Node1..Node4), each running the SQL parser and cost-based optimizer, and reaching TiKV through the Coprocessor API.]
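Because TiDB speaks the MySQL wire protocol, an unmodified MySQL driver is all a client needs. A minimal sketch in Go using the go-sql-driver/mysql package (the DSN is a placeholder; TiDB's default port is 4000, not MySQL's 3306):

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql" // a stock MySQL driver works unchanged
)

func main() {
	// Placeholder DSN: point it at any tidb-server instance.
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:4000)/test")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT tidb_version()").Scan(&version); err != nil {
		panic(err)
	}
	fmt.Println(version)
}
```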
  10. Static sharding is :(
      • Dealing with hotspots
        ◦ e.g., when the wrong sharding key is chosen
      • Inefficient usage
        ◦ some shards are busy
        ◦ some are idle
      • Caused by the nature of the RDBMS
  11. TiKV: the Foundation
      [Diagram: three TiKV instances, each layered as RocksDB → Raft → Transaction, exposing a transactional KV API and a Coprocessor API; replicas on different instances form Raft groups, and clients and the PD cluster talk to TiKV over gRPC.]
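TiKV can also be used directly as a key-value store, below the SQL layer. A minimal sketch, assuming the v1 github.com/tikv/client-go raw client and a local PD endpoint (signatures differ between client-go versions):

```go
package main

import (
	"context"
	"fmt"

	"github.com/tikv/client-go/config"
	"github.com/tikv/client-go/rawkv"
)

func main() {
	ctx := context.Background()
	// The client asks PD which TiKV instance and Region hold each key.
	cli, err := rawkv.NewClient(ctx, []string{"127.0.0.1:2379"}, config.Default())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Put/Get hit the KV API over gRPC, bypassing the SQL layer entirely.
	if err := cli.Put(ctx, []byte("hello"), []byte("tikv")); err != nil {
		panic(err)
	}
	val, err := cli.Get(ctx, []byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(val))
}
```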
  12. TiSpark: Complex OLAP
      [Diagram: the Spark driver and Spark executors run TiSpark, which (1) retrieves data locations from the Placement Driver (PD) over gRPC, then (2) retrieves the real data directly from the TiKV instances in the distributed storage layer.]
  13. TiDB Operator
      • Operator pattern inspired by CoreOS
      • Leverages Kubernetes to simplify and automate:
        ◦ Deployment
        ◦ Management
        ◦ Upgrade
        ◦ Maintenance
      • Open Source: https://github.com/pingcap/tidb-operator
  14. Why?
      • MySQL has a big community; everyone knows it
        ◦ Besides sharding there is no good scale-out solution, and sharding introduces extra limits on the application layer
      • We want to reduce users' migration cost
        ◦ Target: if you're already using MySQL, you don't have to refactor your application code
      • TiDB complements MySQL; it is not a replacement
      • Re-use the tests and tool chain of MySQL itself and of other software that relies on MySQL
  15. At the beginning...
      [Diagram: TiDB v0.001 was MySQL with a new storage engine plugged in, backed by a distributed KV layer.]
  16. At the beginning...
      Pros:
      • 100% MySQL compatibility (syntax / wire protocol)
      • Quick and dirty
      Cons:
      • The SQL layer (optimizer and executor) isn't aware of data locality
      • Very hard to push down computing logic (predicate pushdown, aggregation pushdown)
      • Handling locks across different nodes will be tricky
      • I don't want to maintain such a huge C project...
  17. Lexer / Parser
      • Yacc
        ◦ cznic/goyacc
        ◦ cznic/golex
      • 100% homemade
        ◦ Why not use MySQL's yacc file?
        ◦ Pros and cons?
      • Nothing fancy...
      • Yacc file: https://github.com/pingcap/tidb/blob/master/parser/parser.y
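The parser is importable as an ordinary Go library. A minimal sketch, assuming the standalone github.com/pingcap/parser package that was later split out of the TiDB repo:

```go
package main

import (
	"fmt"

	"github.com/pingcap/parser"
	_ "github.com/pingcap/parser/test_driver" // value types the AST needs outside TiDB
)

func main() {
	p := parser.New()
	// Parse returns one AST node per SQL statement; charset and collation
	// fall back to defaults when passed as "".
	stmts, _, err := p.Parse("SELECT name FROM user WHERE id = 1", "", "")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", stmts[0]) // *ast.SelectStmt
}
```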
  18. TiDB SQL Layer + Coprocessor
      • Stateless
        ◦ A client can connect to any existing tidb-server instance
      • Full-featured SQL layer
        ◦ RBO & CBO
        ◦ Secondary index support
        ◦ DML & DDL
      • Pure Go implementation
      [Diagram: SQL → AST → Logical Plan → Optimized Logical Plan → (cost model + statistics) → Selected Physical Plan, executed by tidb-server against the TiKV cluster.]
  19. Table mapping
      • What happens behind:
        CREATE TABLE user (id INT PRIMARY KEY, name TEXT, email TEXT);
      [Diagram: Applications → MySQL drivers (e.g. JDBC) → MySQL protocol → tidb-server(s) → RPC → TiKV cluster.]
  20. Table -> KV
      Every row:
      • Key: tablePrefix_rowPrefix_tableID_rowID (IDs are assigned by TiDB, all int64)
      • Value: [col1, col2, col3, col4]
      Index:
      • Key: tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID
      • Value: [null]
      Keys are ordered as byte arrays in TiKV, so SCAN is supported. Every key has a timestamp, issued by the Placement Driver (Timestamp Oracle).
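A sketch of this encoding in Go. The prefixes follow the slide; TiDB's real byte layout has additional flag bytes, so treat the helpers as illustrative:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeInt64 appends v big-endian with the sign bit flipped, so byte-wise
// comparison in TiKV's sorted keyspace matches numeric order.
func encodeInt64(buf []byte, v int64) []byte {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], uint64(v)^(1<<63))
	return append(buf, b[:]...)
}

// rowKey sketches tablePrefix_rowPrefix_tableID_rowID, i.e. t{tableID}_r{rowID}.
func rowKey(tableID, rowID int64) []byte {
	key := encodeInt64([]byte("t"), tableID)
	key = append(key, "_r"...)
	return encodeInt64(key, rowID)
}

// indexKey sketches tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID.
// The rowID suffix keeps non-unique index entries distinct.
func indexKey(tableID, indexID int64, colValue []byte, rowID int64) []byte {
	key := encodeInt64([]byte("t"), tableID)
	key = append(key, "_i"...)
	key = encodeInt64(key, indexID)
	key = append(key, colValue...)
	return encodeInt64(key, rowID)
}

func main() {
	fmt.Printf("row:   %x\n", rowKey(42, 1))
	fmt.Printf("index: %x\n", indexKey(42, 1, []byte("bob"), 1))
}
```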
  21. Table mapping
      INSERT INTO user VALUES (1, "bob", "[email protected]");
      INSERT INTO user VALUES (2, "tom", "[email protected]");

      Key      Value
      user/1   bob | [email protected]
      user/2   tom | [email protected]
      ...      ...
  22. Table mapping - Secondary Index
      • Global index
        ◦ All indexes in TiDB are transactional and fully consistent
        ◦ Stored as separate key-value pairs in TiKV
      • Keyed by a concatenation of the index prefix and the primary key in TiKV
        ◦ For example: table := {id, name, email}, with id as the primary key. To build an index on the name column, rows such as (2, 'tom', '[email protected]') store extra KV pairs:
          ▪ idx:name/tom_2 => nil
          ▪ idx:name/tom_3 => nil
        ◦ For a unique index:
          ▪ idx:name/tom => 2
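A toy sketch of how a point query can be answered through such an index. The in-memory map stands in for TiKV; the key layouts are the slide's, and all helper names are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// memKV is a toy in-memory stand-in for TiKV.
type memKV map[string]string

// Scan returns every pair whose key starts with prefix (ordering is ignored
// here; real TiKV scans its sorted keyspace in key order).
func (m memKV) Scan(prefix string) map[string]string {
	out := map[string]string{}
	for k, v := range m {
		if strings.HasPrefix(k, prefix) {
			out[k] = v
		}
	}
	return out
}

// lookupByName answers SELECT * FROM user WHERE name = ? via the non-unique
// secondary index: scan idx:name/{name}_ for row IDs, then point-get each row.
func lookupByName(kv memKV, name string) []string {
	prefix := "idx:name/" + name + "_"
	var rows []string
	for key := range kv.Scan(prefix) {
		rowID := strings.TrimPrefix(key, prefix)
		if row, ok := kv["user/"+rowID]; ok {
			rows = append(rows, row)
		}
	}
	return rows
}

func main() {
	kv := memKV{
		"user/2": "tom | [email protected]", "idx:name/tom_2": "",
		"user/3": "tom | [email protected]", "idx:name/tom_3": "",
	}
	fmt.Println(lookupByName(kv, "tom"))
}
```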
  23. CBO 101
      • Cost components: network cost, memory cost, CPU cost
      • In TiDB, the default memory factor is 5 and the CPU factor is 0.8
      • Example: the cost of the operator Sort(r) combines these factors
      • TiDB maintains histograms of the data
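A sketch of what such a cost-model entry could look like. Only the factor values come from the slide; the exact shape of TiDB's Sort formula is an assumption here:

```go
package main

import (
	"fmt"
	"math"
)

const (
	memoryFactor = 5.0 // TiDB's default memory factor, per the slide
	cpuFactor    = 0.8 // TiDB's default CPU factor, per the slide
)

// sortCost sketches a cost entry for Sort(r): the child's cost, plus CPU for
// an O(N log N) comparison sort, plus memory for buffering the N rows.
func sortCost(childCost, rowCount float64) float64 {
	return childCost + cpuFactor*rowCount*math.Log2(rowCount) + memoryFactor*rowCount
}

func main() {
	// Row-count estimates come from the histograms TiDB maintains.
	fmt.Printf("cost(Sort over 1e6 rows) = %.0f\n", sortCost(0, 1e6))
}
```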
  24. Join Support
      • Hash Join (fastest; if the table has <= 50 million rows)
      • Sort Merge Join (join on an indexed column or an ordered data source)
      • Index Lookup Join (join on an indexed column; ideally after a filter, with a result < 10,000 rows)
      Chosen by the cost-based optimizer. Note: TiDB will not re-shuffle the data across different tidb-servers.
  25. Hash Join
      [Diagram: (1) join workers build a hash table from the small table; (2) the big table is streamed from TiKV; (3) join workers probe the hash table and produce output, all inside one tidb-server.]
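A minimal in-memory sketch of the build/probe idea (TiDB additionally streams the big side from TiKV across several concurrent join workers):

```go
package main

import "fmt"

type Row struct {
	Key int
	Val string
}

// hashJoin builds a hash table over the small side, then probes it while
// streaming the big side, emitting matched pairs.
func hashJoin(small, big []Row) [][2]Row {
	// Build phase: index the small table by join key.
	ht := make(map[int][]Row, len(small))
	for _, r := range small {
		ht[r.Key] = append(ht[r.Key], r)
	}
	// Probe phase: stream the big table and look up matches.
	var out [][2]Row
	for _, b := range big {
		for _, s := range ht[b.Key] {
			out = append(out, [2]Row{s, b})
		}
	}
	return out
}

func main() {
	small := []Row{{1, "bob"}, {2, "tom"}}
	big := []Row{{2, "ride-42"}, {2, "ride-43"}, {3, "ride-44"}}
	fmt.Println(hashJoin(small, big)) // matches on Key == 2
}
```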
  26. Index Lookup Join
      [Diagram: (1) fetch a batch of rows from the small table; (2) a join worker takes the batch; (3) matching rows are fetched from the large table by index from TiKV; (4) output is produced, all inside one tidb-server.]
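The same join expressed as batched index lookups. The Index function is a hypothetical stand-in for probing a secondary index in TiKV:

```go
package main

import "fmt"

type Row struct {
	Key int
	Val string
}

// Index stands in for an indexed inner table in TiKV: it returns the rows
// whose join key equals k.
type Index func(k int) []Row

// indexLookupJoin reads the (small, ideally pre-filtered) outer side in
// batches and fetches matching inner rows through the index per batch.
func indexLookupJoin(outer []Row, idx Index, batch int) [][2]Row {
	var out [][2]Row
	for i := 0; i < len(outer); i += batch {
		end := i + batch
		if end > len(outer) {
			end = len(outer)
		}
		for _, o := range outer[i:end] {
			for _, inner := range idx(o.Key) {
				out = append(out, [2]Row{o, inner})
			}
		}
	}
	return out
}

func main() {
	inner := map[int][]Row{2: {{2, "ride-42"}}, 3: {{3, "ride-44"}}}
	idx := Index(func(k int) []Row { return inner[k] })
	fmt.Println(indexLookupJoin([]Row{{2, "tom"}, {9, "amy"}}, idx, 100))
}
```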
  27. Transaction Model
      • Timestamp Oracle service (from Google's Percolator paper)
      • Two-phase commit protocol (2PC)
      • Problem: is it a single point of failure?
      • Solution: Placement Driver HA cluster
        ◦ Replicated using Raft
        ◦ Embedded etcd
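A toy, single-process sketch of the Percolator-style flow. All names are illustrative; real TiKV keeps separate lock/write/data columns, resolves locks left by crashed clients, and more:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var tso atomic.Int64 // stand-in for PD's timestamp oracle (TSO)

func getTS() int64 { return tso.Add(1) }

// store is a toy single-node stand-in for TiKV's transactional layer.
type store struct {
	mu      sync.Mutex
	locks   map[string]string // key -> primary key holding the lock
	pending map[string]string // prewritten but uncommitted values
	data    map[string]string // committed values
}

func (s *store) prewrite(key, primary, val string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, locked := s.locks[key]; locked {
		return errors.New("write conflict on " + key)
	}
	s.locks[key] = primary
	s.pending[key] = val
	return nil
}

func (s *store) commit(key string, commitTS int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = s.pending[key] // value becomes visible as of commitTS
	delete(s.locks, key)
}

// commitTxn runs the two phases: prewrite (lock) every key, then commit,
// primary key first. Once the primary commits, the transaction is decided.
func commitTxn(s *store, writes map[string]string) error {
	_ = getTS() // start timestamp; reads would see the snapshot at this version
	keys := make([]string, 0, len(writes))
	for k := range writes {
		keys = append(keys, k)
	}
	primary := keys[0]
	for _, k := range keys {
		if err := s.prewrite(k, primary, writes[k]); err != nil {
			return err // a real client rolls back its locks here
		}
	}
	commitTS := getTS()
	s.commit(primary, commitTS) // the point of no return
	for _, k := range keys[1:] {
		s.commit(k, commitTS) // secondaries may even be committed lazily
	}
	return nil
}

func main() {
	s := &store{locks: map[string]string{}, pending: map[string]string{}, data: map[string]string{}}
	err := commitTxn(s, map[string]string{"user/1": "bob", "user/2": "tom"})
	fmt.Println(err, s.data)
}
```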
  28. Online DDL
      • A must-have feature, especially for large tables (billions of rows)!
      • But you don't want to lock the whole table while changing the schema
        ◦ A distributed database usually stores tons of data spanning multiple machines
      • We need a non-blocking schema change algorithm
      • Inspired by F1
        ◦ "Online, Asynchronous Schema Change in F1" - VLDB 2013, Google
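The F1 algorithm walks every server through intermediate schema states so that any two servers are at most one state apart. A sketch of the states for adding an index (state names follow the paper):

```go
package main

import "fmt"

// SchemaState follows the F1 paper: an added index passes through these
// states cluster-wide, one transition at a time, so servers that are at
// most one state apart always agree on which writes must be maintained.
type SchemaState int

const (
	StateAbsent     SchemaState = iota // index does not exist yet
	StateDeleteOnly                    // deletes maintain the index; reads/writes don't see it
	StateWriteOnly                     // writes maintain the index; reads still don't use it
	StatePublic                        // backfill done; index visible to reads
)

func main() {
	for s := StateAbsent; s <= StatePublic; s++ {
		fmt.Println([]string{"absent", "delete-only", "write-only", "public"}[s])
	}
	// Between write-only and public, existing rows are backfilled into the
	// index in the background, without blocking normal traffic.
}
```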
  29. TiDB Syncer
      [Diagram: Syncer acts as a fake MySQL slave, pulling the binlog from a MySQL master, applying rule filters, persisting a save point to disk, and replicating into one or more TiDB clusters or another MySQL.]
  30. TiDB Lightning
      How did we import data into TiDB before?
      • Mydumper => INSERT INTO table VALUES (...) => Myloader / Loader => TiDB
      Each INSERT INTO table VALUES (...) means:
      1. AST / Logical Plan / Physical Plan / Executor
      2. Start a transaction (with a start timestamp)
      3. Check constraints (PRIMARY KEY, UNIQUE INDEX, ...)
      4. Encode record/index KVs
      5. Prewrite
      6. Commit (with a commit timestamp)
  31. TiDB Lightning
      • Skip transactions (convert SQL to KV directly)
      • Pre-split / scatter Regions
      • Generate and dispatch SST files
      • Ingest the SST files into the RocksDB instances
      • Source format: CSV or Mydumper
  32. Mobike + TiDB
      • 200 million users
      • 200 cities
      • 9 million smart bikes
      • ~30 TB / day
  33. Scenario #1: Locking/Unlocking
      • Locking and unlocking smart bikes generates massive data
      • A smooth experience is key to user retention
      • TiDB supports this system by alerting administrators within minutes when the locking/unlocking success rate drops
      • Quickly find malfunctioning bikes
  34. Scenario #2: Real-Time Analysis
      • Synchronize TiDB with MySQL instances using Syncer (proprietary tool)
      • TiDB + TiSpark empower real-time analysis with horizontal scalability
      • No need for Hadoop + Hive
  35. Scenario #3: Mobike Store
      • An innovative loyalty program that must be on 24 x 7 x 365
      • TiDB handles:
        ◦ High concurrency during peak or promotional seasons
        ◦ Permanent storage
        ◦ Horizontal scalability
      • No interruption as the business evolves
  36. TiSpark: Features
      • Complex calculation pushdown
      • Key-range pruning
      • Index support:
        ◦ Clustered index / non-clustered index
        ◦ Index-only query optimization
      • Cost-based optimization:
        ◦ Stats gathered from TiDB in histograms
  37. TiKV + PD: Dynamic Split & Merge
      [Diagram: a Region that grows too large is split (Region A → Region A + Region B), and small adjacent Regions are merged back (Region A + Region B → Region A), with replicas placed across TiKV_1 and TiKV_2.]
  38. TiKV + PD: Hotspot Removal
      [Diagram: when the workload concentrates on Regions whose Raft leaders sit on one TiKV instance, PD schedules the hotspot away by transferring Raft leaders, spreading the workload across TiKV_1 and TiKV_2.]
  39. TiDB: Relational -> KV
      The "User" table:
      ID   Name     Email
      1    Edward   [email protected]
      2    Tom      [email protected]
      ...  ...      ...
      In TiKV, a sorted map over (-∞, +∞):
      user/1 => Edward,[email protected]
      user/2 => Tom,[email protected]
      ...
      Each contiguous key range lives in some Region.
  40. Guaranteeing Correctness
      • Formal proof using TLA+
        ◦ A formal specification and verification language to reason about and prove aspects of complex systems
      • Multi-Raft (Raft group split, merge)
      • TSO / Percolator
      • 2PC
      • See details: https://github.com/pingcap/tla-plus