Slide 1

Slide 1 text

Scaling Database by Sharding Using Maghead Database Framework Recca Tsai @recca0120 Yo-An Lin @c9s

Slide 2

Slide 2 text

Outline • Maghead Overview • Why? And the history • The look of Maghead • Performance • Sharding • When do you Shard? • What's Sharding? • How to Shard? • Consistent Hashing • Chunks and Chunk Migration • Shard Balancing • Commands and the APIs

Slide 3

Slide 3 text

Why ORM?

Slide 4

Slide 4 text

Actually when we mention ORM

Slide 5

Slide 5 text

Usually it refers to tools including ORM and many other things

Slide 6

Slide 6 text

DBAL | ORM | SQL Builder | Migration | Database Management

Slide 7

Slide 7 text

Some people say ...

Slide 8

Slide 8 text

DON'T USE ORM

Slide 9

Slide 9 text

Because of Performance

Slide 10

Slide 10 text

Use Hand-Written SQL Queries...

Slide 11

Slide 11 text

Actually if you ....

Slide 12

Slide 12 text

Actually if you • need to finish 2+ site projects per month • have different modules shared between different projects with some customization • need to maintain legacy projects without db query tests • have frequently schema changes from different team members • need to remember more than the changes from more than 10, 20, 30 tables

Slide 13

Slide 13 text

Trust me, You Won't like it

Slide 14

Slide 14 text

Writing basic SQL in every projects is really painful

Slide 15

Slide 15 text

ORM helps you ... • Have the domain logics well-organized. • Perform lightweight automatic migration when you have schema changes. • Generate the common, basic SQL queries for the most use cases without the need of manually updating the SQL queries. • Manage the database connections with a centralized simple config file. • !! Actually you can still do hand-written query if you want

Slide 16

Slide 16 text

And it saves your Life

Slide 17

Slide 17 text

What's Maghead

Slide 18

Slide 18 text

Maghead Database Framework

Slide 19

Slide 19 text

Object Relation Mapper

Slide 20

Slide 20 text

DataBase Abstraction Layer

Slide 21

Slide 21 text

SQL Builder

Slide 22

Slide 22 text

Connection Manager

Slide 23

Slide 23 text

Database Manager

Slide 24

Slide 24 text

Table Definition Parser

Slide 25

Slide 25 text

Automatic Migration

Slide 26

Slide 26 text

Migration Manager

Slide 27

Slide 27 text

Repository Pattern

Slide 28

Slide 28 text

ActiveRecord Pattern

Slide 29

Slide 29 text

And many components, utilities

Slide 30

Slide 30 text

The project was started since 2010

Slide 31

Slide 31 text

7 years

Slide 32

Slide 32 text

The project was created to have the dynamic schema in different projects

Slide 33

Slide 33 text

How does it looks like?

Slide 34

Slide 34 text

YAML configuration

Slide 35

Slide 35 text

--- cli: bootstrap: vendor/autoload.php schema: auto_id: true finders: - { name: ComposerSchemaFinder, args: ["composer.json"] }

Slide 36

Slide 36 text

--- ... instance: local: dsn: 'mysql:host=localhost' user: root driver: mysql host: localhost password: null query_options: { } connection_options: 1002: 'SET NAMES utf8' ...

Slide 37

Slide 37 text

--- ... databases: master: dsn: 'mysql:host=localhost;dbname=testing' host: localhost user: root driver: mysql database: testing password: null query_options: { } connection_options: 1002: 'SET NAMES utf8'

Slide 38

Slide 38 text

Simple bootstrapping

Slide 39

Slide 39 text

Bootstrap Code require 'vendor/autoload.php'; use Maghead\Runtime\Config\FileConfigLoader; use Maghead\Runtime\Bootstrap; $config = FileConfigLoader::load( __DIR__ . '/db/config/ database.yml'); Bootstrap::setup($config);

Slide 40

Slide 40 text

Bootstrap Code require 'vendor/autoload.php'; use Maghead\Runtime\Config\FileConfigLoader; use Maghead\Runtime\Bootstrap; $config = FileConfigLoader::load('db/config/database.yml'); Bootstrap::setup($config); 2 lines only!

Slide 41

Slide 41 text

Database Management Commands

Slide 42

Slide 42 text

maghead db create master

Slide 43

Slide 43 text

maghead db recreate master

Slide 44

Slide 44 text

maghead db create node1

Slide 45

Slide 45 text

maghead db drop node1

Slide 46

Slide 46 text

Dynamic Schema In PHP

Slide 47

Slide 47 text

namespace Todos\Model; use Maghead\Schema\DeclareSchema; class TodoSchema extends DeclareSchema { public function schema() { $this->column('title') ->varchar(128) ->required() ; $this->column('done') ->boolean() ->default(false); $this->column('description') ->text(); $this->column('created_at') ->timestamp() ->default(function() { return date('c'); }); } }

Slide 48

Slide 48 text

And can be compiled into the static schema class

Slide 49

Slide 49 text

• Maghead\Schema\DeclareSchema • Maghead\Schema\DeclareColumn • Maghead\Schema\RuntimeSchema • Maghead\Schema\RuntimeColumn Compile lightweight implementation declarative implementation

Slide 50

Slide 50 text

maghead schema build

Slide 51

Slide 51 text

namespace Todos\Model; class TodoSchemaProxy extends RuntimeSchema { const SCHEMA_CLASS = 'Todos\\Model\\TodoSchema'; const LABEL = 'Todo'; const MODEL_NAME = 'Todo'; const MODEL_NAMESPACE = 'Todos\\Model'; const MODEL_CLASS = 'Todos\\Model\\Todo'; const REPO_CLASS = 'Todos\\Model\\TodoRepoBase'; const COLLECTION_CLASS = 'Todos\\Model\\TodoCollection'; const TABLE = 'todos'; const PRIMARY_KEY = 'id'; const GLOBAL_PRIMARY_KEY = NULL; const LOCAL_PRIMARY_KEY = 'id'; public static $column_hash = array ( 'id' => 1, 'title' => 1, 'done' => 1, 'description' => 1, 'created_at' => 1, ); public $columnNames = array ( 0 => 'id', 1 => 'title', 2 => 'done', 3 => 'description', 4 => 'created_at', ); ............ cut

Slide 52

Slide 52 text

And This Reduces The Runtime Overhead

Slide 53

Slide 53 text

Pre-Generated SQL Queries

Slide 54

Slide 54 text

class ProductRepoBase extends Repo { const FIND_BY_PRIMARY_KEY_SQL = 'SELECT * FROM products WHERE id = ? LIMIT 1'; const DELETE_BY_PRIMARY_KEY_SQL = 'DELETE FROM products WHERE id = ?'; const FETCH_CREATED_BY_SQL = 'SELECT * FROM users WHERE id = ? LIMIT 1'; const FETCH_UPDATED_BY_SQL = 'SELECT * FROM users WHERE id = ? LIMIT 1'; const FETCH_PRODUCT_FEATURES_SQL = 'SELECT * FROM product_feature_junction WHERE product_id = ?'; const FETCH_PRODUCT_PRODUCTS_SQL = 'SELECT * FROM product_products WHERE product_id = ?'; const FETCH_IMAGES_SQL = 'SELECT * FROM product_images WHERE product_id = ?'; const FETCH_PROPERTIES_SQL = 'SELECT * FROM product_properties WHERE product_id = ?'; const FETCH_TYPES_SQL = 'SELECT * FROM product_types WHERE product_id = ?'; const FETCH_RESOURCES_SQL = 'SELECT * FROM product_resources WHERE product_id = ?'; const FETCH_FILES_SQL = 'SELECT * FROM product_files WHERE product_id = ?'; const FETCH_PRODUCT_RECIPES_SQL = 'SELECT * FROM recipe_to_products WHERE product_id = ?'; const FETCH_CATEGORY_SQL = 'SELECT * FROM product_categories WHERE id = ? LIMIT 1'; const FETCH_PRODUCT_TAGS_SQL = 'SELECT * FROM product_tag_junction WHERE product_id = ?';

Slide 55

Slide 55 text

Pre-Generated PDO Queries

Slide 56

Slide 56 text

Statements are prepared and cached in the Repository of each Model

Slide 57

Slide 57 text

Same as your hand-written queries

Slide 58

Slide 58 text

class ProductRepoBase extends Repo { public function deleteByPrimaryKey($pkId) { if (!$this->deleteStm) { $this->deleteStm = $this->write->prepare(self::DELETE_BY_PRIMARY_KEY_SQL); } return $this->deleteStm->execute([$pkId]); } public function fetchCreatedByOf(Model $record) { if (!$this->fetchCreatedByStm) { $this->fetchCreatedByStm = $this->read->prepare(self::FETCH_CREATED_BY_SQL); $this->fetchCreatedByStm->setFetchMode(PDO::FETCH_CLASS, \UserBundle\Model\User::class, [$this]); } $this->fetchCreatedByStm->execute([$record->created_by]); $obj = $this->fetchCreatedByStm->fetch(); $this->fetchCreatedByStm->closeCursor(); return $obj; }

Slide 59

Slide 59 text

Building Table Schema

Slide 60

Slide 60 text

maghead sql --rebuild node1

Slide 61

Slide 61 text

One Schema to Rule Them All

Slide 62

Slide 62 text

$create = $product->asCreateAction();

Slide 63

Slide 63 text

$create = $product->asCreateAction(); {{ RecordAction.renderSignatureWidget|raw }} {{ RecordAction.renderCSRFTokenWidget|raw }} {% if Record.hasKey %} {{RecordAction.renderKeyWidget|raw}} {% endif %} {{RecordAction.renderField('account') |raw }} {{RecordAction.renderField('password') |raw }}

Slide 64

Slide 64 text

Relationships

Slide 65

Slide 65 text

$this->belongsTo('book', BookSchema::class, 'id', 'book_id') ->onDelete('CASCADE') ->onUpdate('CASCADE') ;

Slide 66

Slide 66 text

$this->many('author_books', AuthorBookSchema::class, 'author_id', 'id'); $this->manyToMany('books', 'author_books', 'book');

Slide 67

Slide 67 text

$this->many('author_books', AuthorBookSchema::class, 'author_id', 'id'); $this->manyToMany('books', 'author_books', 'book'); foreach ($author->books as $book) { ... }

Slide 68

Slide 68 text

ActiveRecord pattern

Slide 69

Slide 69 text

Most of ActiveRecord patterns are implemented in this way

Slide 70

Slide 70 text

$record = new User; $record->account = "c9s"; $record->password = sha1("mypassword"); $record->save(); Different States:
 1. existing record 2. non-existing record (before inserting the row)

Slide 71

Slide 71 text

function updatePassword($user) { // check the existence because the object // might be not able to update the // password if (!$user->hasKey()) { throw new LogicException; } // update ... }

Slide 72

Slide 72 text

function updatePassword($user) { // check the existence because the object // might be not able to update the // password if (!$user->hasKey()) { throw new LogicException; } // update ... } The user record might be not created yet.

Slide 73

Slide 73 text

function updatePassword($user) { // check the existence because the object // might be not able to update the // password if (!$user->hasKey()) { throw new LogicException; } // update ... } Check the key to verify the existence

Slide 74

Slide 74 text

The Maghead Way

Slide 75

Slide 75 text

$record = User::load([ "account" => "timcook" ]); If the returned $record is not false, then it must be an existing row.

Slide 76

Slide 76 text

$user = User::createAndLoad([ "account" => "timcook" ]); If the returned $record is not false, then it must be an existing row.

Slide 77

Slide 77 text

$user->update([ "password" => sha1("newpw") ]); Safely Update the row without the concern

Slide 78

Slide 78 text

Repository pattern for Multiple Database Connections

Slide 79

Slide 79 text

The Doctrine Way

Slide 80

Slide 80 text

use Doctrine\DBAL\DriverManager; $conn = DriverManager::getConnection(array( 'wrapperClass' => 'Doctrine\DBAL\Connections\MasterSlaveConnection', 'driver' => 'pdo_mysql', 'keepSlave' => true, 'master' => array( 'user' => 'ideato', 'password' => 'ideato', 'dbname' => 'db_ideato' ), 'slaves' => array( array( 'user' => 'ideato', 'password' => 'ideato', 'dbname' => 'db_ideato_slave' ) ) ) ); $entityManager = EntityManager::create($conn, $config);

Slide 81

Slide 81 text

// https://gist.github.com/ricfrank/d6f6317a1a1434cdc364 $entityManager = EntityManager::create($conn, $config); $productRepository = $entityManager->getRepository('Product'); $masterSlaveConn->connect('slave'); $productRepository = $entityManager->getRepository('Product'); $masterSlaveConn->connect('master'); $productRepository = $entityManager->getRepository('Product');

Slide 82

Slide 82 text

$entityManager = EntityManager::create($conn, $config); $productRepository = $entityManager->getRepository('Product'); $masterSlaveConn->connect('slave'); $product = new Product(); $product->setName("nuovo_prod1"); $product->setCategory("oeeooeoe"); $entityManager->persist($product); $entityManager->flush(); $productRepository = $entityManager->getRepository('Product'); $masterSlaveConn->connect('master'); $productRepository = $entityManager->getRepository('Product');

Slide 83

Slide 83 text

The Maghead Way

Slide 84

Slide 84 text

$ret = Product::repo("master") ->create([ "name" => "Samsung S3" ]); $ret = Product::masterRepo() ->create([ "name" => "Samsung S3" ]);

Slide 85

Slide 85 text

$ret = Product::repo("node2") ->create([ "name" => "iPhone 7" ]);

Slide 86

Slide 86 text

$ret = Product::repo(new Connection("mysql:host=db1")) ->create([ "name" => "Samsung S3" ]);

Slide 87

Slide 87 text

Automatic Migration Known as lightweight migration in iOS app development

Slide 88

Slide 88 text

maghead diff

Slide 89

Slide 89 text

No content

Slide 90

Slide 90 text

D: 欄欄位刪除

Slide 91

Slide 91 text

A: 欄欄位新增

Slide 92

Slide 92 text

M: 欄欄位修改

Slide 93

Slide 93 text

maghead m auto

Slide 94

Slide 94 text

全⾃自動產⽣生 Alter Table Query

Slide 95

Slide 95 text

Database Support

Slide 96

Slide 96 text

SQLite MySQL PostgreSQL

Slide 97

Slide 97 text

How Fast?

Slide 98

Slide 98 text

0 125 250 375 500 PDO Maghead LessQL YiiM Yii2M Propel20 DoctrineM Eloquent Benchmark: score -> lower is better

Slide 99

Slide 99 text

0 125 250 375 500 PDO Maghead LessQL YiiM Yii2M Propel20 DoctrineM Eloquent Insertion

Slide 100

Slide 100 text

0 125 250 375 500 PDO Maghead LessQL YiiM Yii2M Propel20 DoctrineM Eloquent findPK

Slide 101

Slide 101 text

0 125 250 375 500 PDO Maghead LessQL YiiM Yii2M Propel20 DoctrineM Eloquent Complex Query

Slide 102

Slide 102 text

0 100 200 300 400 PDO Maghead LessQL YiiM Yii2M Propel20 DoctrineM Eloquent Benchmark: Insert lower is better

Slide 103

Slide 103 text

0 15 30 45 60 PDO Maghead LessQL YiiM Yii2M Propel20 DoctrineM Eloquent Benchmark: Find PK lower is better

Slide 104

Slide 104 text

0 125 250 375 500 PDO Maghead LessQL YiiM Yii2M Propel20 DoctrineM Eloquent Benchmark: Complex Query lower is better

Slide 105

Slide 105 text

0 100 200 300 400 Insert findPk Complex 373 54 367 56 1 91 Maghead Eloquent

Slide 106

Slide 106 text

The fastest pure PHP ORM implementation

Slide 107

Slide 107 text

Sharding

Slide 108

Slide 108 text

When do you Shard?

Slide 109

Slide 109 text

You started from one single database server

Slide 110

Slide 110 text

When rows grows up to million...

Slide 111

Slide 111 text

queries become slow ...

Slide 112

Slide 112 text

more read queries start hitting on the database

Slide 113

Slide 113 text

then you split read/write to master/replicas

Slide 114

Slide 114 text

finally the write load increases, master is crying

Slide 115

Slide 115 text

And rows grows up to 10M+...

Slide 116

Slide 116 text

Covering index doesn't work well anymore

Slide 117

Slide 117 text

Any query without covering index won't come back in 2-5 minutes

Slide 118

Slide 118 text

What NOW?

Slide 119

Slide 119 text

Upgrade the hardware? $$$$

Slide 120

Slide 120 text

What's Sharding?

Slide 121

Slide 121 text

Vertical Partitioning

Slide 122

Slide 122 text

Horizontal Partitioning

Slide 123

Slide 123 text

10,000,000 records

Slide 124

Slide 124 text

100,000 records 100,000 records 100,000 records 100,000 records 100,000 records 100,000 records 100,000 records 100,000 records 100,000 records 100,000 records 10,000,000 records

Slide 125

Slide 125 text

No content

Slide 126

Slide 126 text

Sounds Easy?

Slide 127

Slide 127 text

Then, How to Shard ?

Slide 128

Slide 128 text

Shards

Slide 129

Slide 129 text

Server #1 app_db

Slide 130

Slide 130 text

app_db_3 Server #1 app_db_2 app_db_1 app_db_0

Slide 131

Slide 131 text

app_db_3 Server #1 app_db_2 app_db_1 app_db_0 Shard #1 Shard #2 Shard #3 Shard #4

Slide 132

Slide 132 text

app_db_3 Server #1 app_db_2 app_db_1 app_db_0 Shard #1 (user id 0~100) Shard #2 (user id 101~200) Shard #3 (user id 201~300) Shard #4 (user id 301~400)

Slide 133

Slide 133 text

app_db_3 Server #1 app_db_2 app_db_1 app_db_0 Shard #1 Shard #2 Shard #3 Shard #4 app_db_7 Server #2 app_db_6 app_db_5 app_db_4 Shard #5 Shard #6 Shard #7 Shard #8 Multiple Shards per Server

Slide 134

Slide 134 text

Server #1 app_db_0 Shard #1 Server #2 app_db_1 Shard #2 Server #3 app_db_2 Shard #3 Server #4 app_db_3 Shard #4 One Shard per Server

Slide 135

Slide 135 text

Shard Key

Slide 136

Slide 136 text

Shard Key is the key you use to split the data

Slide 137

Slide 137 text

Key

Slide 138

Slide 138 text

Key -> Compute Shard

Slide 139

Slide 139 text

-> Shard #

Slide 140

Slide 140 text

user_id -> hash -> shard #5

Slide 141

Slide 141 text

store_id -> hash -> shard #2

Slide 142

Slide 142 text

Types of Shard Keys • Hash • Range • User-defined (Tags in MongoDB)

Slide 143

Slide 143 text

How to choose the Shard Key ?

Slide 144

Slide 144 text

It depends on how you find / query the data

Slide 145

Slide 145 text

Example #1 • Company (id) • Store (id, company_id) • Order (id, store_id, amount ...) • OrderItem (order_id, quantity, ...) • OrderPayment (order_id...)

Slide 146

Slide 146 text

Then you need to analysis across different stores

Slide 147

Slide 147 text

SELECT SUM(amount) FROM orders WHERE store_id IN (2,3,4,5) GROUP BY store_id

Slide 148

Slide 148 text

Your key is the ID of the stores

Slide 149

Slide 149 text

Example #1 • Company (id) • Store (id, company_id) • Order (id, company_id, store_id, amount ...) • OrderItem (company_id, store_id, order_id, quantity, ...) • OrderPayment (company_id, store_id, order_id ...)

Slide 150

Slide 150 text

And then you can split the tables by store_id

Slide 151

Slide 151 text

The store_id could also be a part of the UUID of the order

Slide 152

Slide 152 text

And so you can extract the store_id from the UUID to lookup the shard

Slide 153

Slide 153 text

Different Sharding Approaches • Database level sharding • SQL level sharding • Middleware-based sharding • Application level sharding

Slide 154

Slide 154 text

RDBMS Solutions • Vitas (from YouTube, Written in Go) • MySQL Cluster (NDB) • MySQL Fabric (Middleware-based sharding in Python) • MariaDB Spider Engine • GPDB (PostgreSQL)

Slide 155

Slide 155 text

Vitas • Written in Go • Use sql parser to distribute the queries • Only for MySQL • protocol gRPC • Used by YouTube

Slide 156

Slide 156 text

MySQL Cluster • Limited to 48 hosts • Shard Key MUST BE in primary key and can't be changed. Similar to MySQL partitions. • Primary key could be compound primary key • NDB Engine • When query doesn't include shard key, query will be scattered query.

Slide 157

Slide 157 text

MySQL Fabric • Middleware-based sharding • Written in Python • Has it's own connector API binding for different languages (Java, Python) • Manually set the host to operate from the application code. • Bottleneck will be the middleware.

Slide 158

Slide 158 text

by @gslin

Slide 159

Slide 159 text

Consistent Hashing

Slide 160

Slide 160 text

0 2^32

Slide 161

Slide 161 text

0 2^32 DB1 DB2

Slide 162

Slide 162 text

0 2^32 DB1 DB2 User(UUID=7fd7ed08-5da8-11e7-9bb5-3c15c2cb5a5a)

Slide 163

Slide 163 text

0 2^32 DB1 DB2 User(UUID=7fd7ed08-5da8-11e7-9bb5-3c15c2cb5a5a) Shard Key Hash => 33086 selected

Slide 164

Slide 164 text

0 2^32 DB1 DB2 Shard Key Hash => 420860

Slide 165

Slide 165 text

0 2^32 DB1 DB2 Shard Key Hash => 420860 selected

Slide 166

Slide 166 text

PHP CRC32 returns 2^32 positive integer on 64bit machine

Slide 167

Slide 167 text

And so we use the native PHP array to record them

Slide 168

Slide 168 text

foreach ($mapping->chunks as $i => $c) { $x = $c['index']; $this->buckets[$x] = $i; $this->targetIndexes[$i][] = $x; } ksort($this->buckets, SORT_REGULAR);

Slide 169

Slide 169 text

Chunks

Slide 170

Slide 170 text

Split the key space of the shard key into chunks

Slide 171

Slide 171 text

Chunks

Slide 172

Slide 172 text

And each chunk is assigned to a shard

Slide 173

Slide 173 text

• Chunk #1 • Chunk #2 • Chunk #3 • Chunk #4 • ........

Slide 174

Slide 174 text

• Key(0~100) Chunk #1 • Key(101~200) Chunk #2 • Key(201~300) Chunk #3 • Key(301~400) Chunk #4 • ........

Slide 175

Slide 175 text

• Key(0~100) Chunk #1 -> Shard #1 • Key(101~200) Chunk #2 -> Shard #1 • Key(201~300) Chunk #3 -> Shard #2 • Key(301~400) Chunk #4 -> Shard #2 • ........

Slide 176

Slide 176 text

$mapping = new ShardMapping('store_keyspace', [ 'key' => 'store_id', 'shards' => ['node1', 'node2', 'node3'], 'chunks' => [ ["from" => 0, "index" => 536870912, "shard" => "node1" ], ["from" => 536870912, "index" => 1073741824, "shard" => "node1" ], ["from" => 1073741824, "index" => 1610612736, "shard" => "node1" ], ["from" => 1610612736, "index" => 2147483648, "shard" => "node2" ], ["from" => 2147483648, "index" => 2684354560, "shard" => "node2" ], ["from" => 2684354560, "index" => 3221225472, "shard" => "node2" ], ["from" => 3221225472, "index" => 3758096384, "shard" => "node3" ], ["from" => 3758096384, "index" => 4294967296, "shard" => "node3" ], ] ], $dataSourceManager); $hasher = new FastHasher($mapping); $hash = $hasher->hash($key); // return 3221225472

Slide 177

Slide 177 text

Shard Balancing

Slide 178

Slide 178 text

Shard Migration

Slide 179

Slide 179 text

• Server #1 (10 shards) • Server #2 (10 shards) • Server #3 (10 shards)

Slide 180

Slide 180 text

• Server #1 (10 shards) • Server #2 (10 shards) • Server #3 (10 shards) Server load increases because server #1 has more rows than other servers.

Slide 181

Slide 181 text

• Server #1 (9 shards) • Server #2 (10 shards) • Server #3 (10 shards) • Server #4 (1 shard) Migrate the bottleneck shard from Server #1 to a new server.

Slide 182

Slide 182 text

The underlying tool mysqldbcopy

Slide 183

Slide 183 text

mysqldbcopy --replication=slave --locking=lock-all --source=root:pass@host1 --destination=root:pass@host2 database replication mode=slave: copy a database from one slave to another attached to the same master

Slide 184

Slide 184 text

No content

Slide 185

Slide 185 text

use Maghead\Sharding\Operations\CloneShard; $op = new CloneShard($config); // public function clone($mappingId, $instanceId, $newNodeId, $srcNodeId) $op->clone("store_keyspace", "server3", "shard1000", "shard0003");

Slide 186

Slide 186 text

maghead shard clone --drop-first --mapping store_key --instance server2 a01 a11 clone shard "a01" to the a new shard on server2 named "a11"

Slide 187

Slide 187 text

maghead shard allocate --mapping store_key --instance server2 a02 allocate an empty shard "a02" on server 2 and initialize all the tables

Slide 188

Slide 188 text

Shard Balancing API

Slide 189

Slide 189 text

$balancer = new ShardBalancer( new ConservativeShardBalancerPolicy(1, false, 1.3) ); $migrateInfo = $balancer->balance($mapping, [new OrderSchema]);

Slide 190

Slide 190 text

$balancer = new ShardBalancer( new ConservativeShardBalancerPolicy(1, false, 1.3) ); $migrateInfo = $balancer->balance($mapping, [new OrderSchema]); 保守式平衡規則

Slide 191

Slide 191 text

$balancer = new ShardBalancer( new ConservativeShardBalancerPolicy(1, false, 1.3) ); $migrateInfo = $balancer->balance($mapping, [new OrderSchema]); ⼀一次只找出⼀一個 Chunk 做 Migration

Slide 192

Slide 192 text

$balancer = new ShardBalancer( new ConservativeShardBalancerPolicy(1, false, 1.3) ); $migrateInfo = $balancer->balance($mapping, [new OrderSchema]); 找出 rows 總量量超過平均 1.3 倍的 Shard

Slide 193

Slide 193 text

Chunk Migration

Slide 194

Slide 194 text

• Key(0~100) Chunk #1 -> Shard #1 • Key(101~200) Chunk #2 -> Shard #1 • Key(201~300) Chunk #3 -> Shard #2 • Key(301~400) Chunk #4 -> Shard #2 • ........

Slide 195

Slide 195 text

• Key(0~100) Chunk #1 -> Shard #1 • Key(101~200) Chunk #2 -> Shard #1 • Key(201~300) Chunk #3 -> Shard #2 • Key(301~400) Chunk #4 -> Shard #2 • ........ When the load of shard #2 increases, we can assign some chunks to the other shard.

Slide 196

Slide 196 text

• Key(0~100) Chunk #1 -> Shard #1 • Key(101~200) Chunk #2 -> Shard #1 • Key(201~300) Chunk #3 -> Shard #2 • Key(301~400) Chunk #4 -> Shard #2 • ........ When the rows in Chunk #4 grow up too much, We can split the Chunk #4 into 2+ chunks and assign them to the other shards.

Slide 197

Slide 197 text

Chunk #1 Chunk #2 Chunk #3 Chunk #4 Chunk #5 Chunk #6 Chunk #7 Chunk #8 Shard #1 Shard #2 Chunk #4 Copy

Slide 198

Slide 198 text

Chunk #1 Chunk #2 Chunk #3 Chunk #4 Chunk #5 Chunk #6 Chunk #7 Chunk #8 Shard #1 Shard #2 Chunk #4 Verify

Slide 199

Slide 199 text

Chunk #1 Chunk #2 Chunk #3 Chunk #5 Chunk #6 Chunk #7 Chunk #8 Shard #1 Shard #2 Chunk #4 Remove

Slide 200

Slide 200 text

ChunkManager API

Slide 201

Slide 201 text

namespace Maghead\Sharding\Manager; class ChunkManager { public function __construct(ShardMapping $mapping) { } public function distribute(array $shardIds, $numberOfChunks = 32) { } public function verify(Chunk $chunk, Shard $dstShard, array $schemas) { } public function remove(Chunk $chunk, array $schemas) { } public function clone(Chunk $chunk, Shard $dstShard, array $schemas) { } public function migrate(Chunk $chunk, Shard $dstShard, array $schemas) { } public function move(Chunk $chunk, Shard $dstShard, array $schemas) { } public function split(Chunk $chunk, $n = 2) { } }

Slide 202

Slide 202 text

$result = $chunkManager->migrate($chunk2, $shards['node3'], $schemas); if (!$result->isSuccessful()) { // ... }

Slide 203

Slide 203 text

Jumbo Chunk

Slide 204

Slide 204 text

Chunks contain a large number of rows

Slide 205

Slide 205 text

Migrating Jumbo Chunks requires more time

Slide 206

Slide 206 text

Migration should start from the small chunks

Slide 207

Slide 207 text

And in the same zone

Slide 208

Slide 208 text

Start Sharding in Maghead

Slide 209

Slide 209 text

Define Shard Mapping or Keyspace

Slide 210

Slide 210 text

maghead shard mapping add --hash -s node1 -s node2 -s node3 --key store_id M_store_id

Slide 211

Slide 211 text

--- sharding: mappings: M_store_id: key: "store_id" shards: - node1 - node2 - node3 hash: true chunks: - { from: 0, index: 536870912, shard: node1 } - { from: 536870912, index: 1073741824, shard: node1 } - { from: 1073741824, index: 1610612736, shard: node1 } - { from: 1610612736, index: 2147483648, shard: node2 } - { from: 2147483648, index: 2684354560, shard: node2 } - { from: 2684354560, index: 3221225472, shard: node2 } - { from: 3221225472, index: 3758096384, shard: node3 } - { from: 3758096384, index: 4294967296, shard: node3 } ⾃自動分派的 Chunk 定義

Slide 212

Slide 212 text

class StoreSchema extends DeclareSchema { public function schema() { $this->globalTable("M_store_id"); Store table 需要在 Shards 中被 Join 所以被定義為 Global Table

Slide 213

Slide 213 text

namespace StoreApp\Model; use Maghead\Schema\DeclareSchema; class OrderSchema extends DeclareSchema { public function schema() { $this->column('uuid', 'uuid-pk'); {....cut....} $this->shardBy("M_store_id"); } } Order table 需要依照定義的 keyspace 做分片

Slide 214

Slide 214 text

namespace StoreApp\Model; use Maghead\Schema\DeclareSchema; class OrderSchema extends DeclareSchema { public function schema() { $this->column('uuid', 'uuid-pk'); {....cut....} $this->shardBy("M_store_id"); } } 快速定義 UUID primary key

Slide 215

Slide 215 text

maghead schema build

Slide 216

Slide 216 text

Use the model as usual

Slide 217

Slide 217 text

$ret = Order::create([ 'store_id' => $storeId, 'amount' => 600, ]); Automatically dispatch record to the shard by the store_id

Slide 218

Slide 218 text

$ret->shard; // Shard object The dispatched shard is returned in the result object.

Slide 219

Slide 219 text

Selecting Shards

Slide 220

Slide 220 text

$store = Store::masterRepo()->findByCode('TW002'); $shard = Order::shards()->dispatch($store->id); // $this->assertInstanceOf('Maghead\\Sharding\\Shard', $shard);

Slide 221

Slide 221 text

Deletion works as well

Slide 222

Slide 222 text

$order = Order::findByPrimaryKey($key); // $this->assertNotNull($order->repo, "found order repo"); // $this->assertNotNull($order->shard, "the shard"); $ret = $order->delete();

Slide 223

Slide 223 text

ShardCollection

Slide 224

Slide 224 text

$shards = Order::shards(); $shards->locateBy(function($repo, $shard) { return $repo->findByCode('X01'); });

Slide 225

Slide 225 text

$shards = Order::shards(); // ShardCollection $shards->map(function($repo, $shard) { // do something });

Slide 226

Slide 226 text

QueryWorker

Slide 227

Slide 227 text

• Maghead\Sharding\QueryMapper\GearmanQueryMapper • Maghead\Sharding\QueryMapper\PthreadQueryMapper

Slide 228

Slide 228 text

$query = new SelectQuery; $query->select(['SUM(amount)' => 'amount']); $query->from('orders'); $query->where()->in('store_id', [1,2]); $client = new GearmanClient; $client->addServer(); $mapper = new GearmanQueryMapper($client); $res = $mapper->map($shards, $query);

Slide 229

Slide 229 text

Reducer PHP7 Extension

Slide 230

Slide 230 text

Data Group By implemented in PHP Runtime

Slide 231

Slide 231 text

$rows = [ [ 'category' => 'Food', 'type' => 'pasta', 'amount' => 1 ], [ 'category' => 'Food', 'type' => 'pasta', 'amount' => 1 ], [ 'category' => 'Food', 'type' => 'juice', 'amount' => 1 ], [ 'category' => 'Food', 'type' => 'juice', 'amount' => 1 ], [ 'category' => 'Book', 'type' => 'programming', 'amount' => 5 ], [ 'category' => 'Book', 'type' => 'programming', 'amount' => 2 ], [ 'category' => 'Book', 'type' => 'cooking', 'amount' => 6 ], [ 'category' => 'Book', 'type' => 'cooking', 'amount' => 2 ], ]; $result = group_by($rows, ['category','type'], [ 'total_amount' => [ 'selector' => 'amount', 'aggregator' => REDUCER_AGGR_SUM, ], 'cnt' => REDUCER_AGGR_COUNT, ]); print_r($result);

Slide 232

Slide 232 text

See more on GitHub

Slide 233

Slide 233 text

Thank You

Slide 234

Slide 234 text

Join Us! https://github.com/maghead/maghead Follow me on twitter: @c9s