Slide 1

Slide 1 text

New features and enhancements of Spider Storage Engine for sharding MariaDB Corporation Kentoku SHIBA

Slide 2

Slide 2 text

1. what is SPIDER? 2. how long is SPIDER used in the big environment? 3. why SPIDER? what SPIDER can do for you? 4. when SPIDER is right for you? what cases should you use SPIDER? 5. SPIDER sharding architecture 6. how to get SPIDER working? 7. SPIDER’s other features 8. New features and enhancements Agenda

Slide 3

Slide 3 text

What is Spider

Slide 4

Slide 4 text

What is the Spider Storage Engine? Spider is a sharding solution and proxying solution. Spider Storage Engine is a plugin of MariaDB/MySQL. Spider tables can be used to federate from other servers MariaDB/MySQL/OracleDB tables as if they stand on local server. And Spider can create database sharding by using table partitioning feature.

Slide 5

Slide 5 text

What is the Spider Storage Engine? 1.request 2. Execute SQL 4.response AP All databases can be used as ONE database through Spider. AP AP AP AP SPIDER (MariaDB/MySQL) MariaDB tbl_a MySQL tbl_b SPIDER (MariaDB/MySQL) SPIDER (MariaDB/MySQL) OracleDB tbl_c 3. Distributed SQL 3. Distributed SQL 3. Distributed SQL

Slide 6

Slide 6 text

What is the Spider Storage Engine? Spider is bundled in MariaDB from 10.0 and all patches for MariaDB is applied in 10.3

Slide 7

Slide 7 text

How long is SPIDER used in the big environment?

Slide 8

Slide 8 text

How long is SPIDER used in the big environment? Siemens They handle 200 Billion records per quarter on 3 Spider nodes and 4 data nodes. They use this cluster for data quality analytics.

Slide 9

Slide 9 text

How long is SPIDER used in the big environment? Tencent Games They handle 100TB datas on 396 Spider nodes and 2800 data nodes. They use this cluster for produce online games.

Slide 10

Slide 10 text

Why SPIDER? What SPIDER can do for you?

Slide 11

Slide 11 text

Why Spider? What Spider can do for you? For federation You can attach tables from other servers or from local server by using Spider. For sharding You can divide huge tables and huge traffics to multiple servers by using Spider.

Slide 12

Slide 12 text

Why Spider? What Spider can do for you? Cross shard join You can join all tables by using Spider, even if tables are on different servers.

Slide 13

Slide 13 text

simple sharding solution Join operation with simple sharding solution (without Spider) DB1 tbl_a1 1.Request 2. Execute SQL with JOIN 3.Response DB2 AP Join operation requires that all joined tables are on same server. AP AP AP AP tbl_a2 tbl_b1 tbl_b2

Slide 14

Slide 14 text

Join operation with Spider 1.request 2. Execute SQL with JOIN 3.response AP You can JOIN all tables, even if tables are on different servers. AP AP AP AP SPIDER (MariaDB/MySQL) DB1 tbl_a1 DB2 tbl_a2 tbl_b1 tbl_b2

Slide 15

Slide 15 text

Why Spider? What Spider can do for you? Join push down If it is possible, Spider executes JOIN operation at data node directly.

Slide 16

Slide 16 text

JOIN push down 1.request 2. Execute SQL with JOIN 3.response AP If all tables are on same data node, Spider executes JOIN operation on data node directly. AP AP AP AP SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_c tbl_b tbl_d

Slide 17

Slide 17 text

JOIN push down Simple join operation are two times faster on simple JOIN pushdown test. Also, in this pushdown of JOIN, when aggregate functions are included in the query, since the aggregation processing is also executed at the data node, the amount of data transfer is greatly reduced and it becomes super high speed.

Slide 18

Slide 18 text

When SPIDER is right for you? What cases should you use SPIDER?

Slide 19

Slide 19 text

When SPIDER is right for you? What cases should you use SPIDER? You should better use Spider 1.when you have 2 or more services and the services needs to use data of other services. 2.when you need scaling out for huge data or write traffics.

Slide 20

Slide 20 text

When SPIDER is right for you? What cases should you use SPIDER? You should better use Spider 3.Unless some big data solutions you can benefit indexing on shards. 4.You need sharding using sharding key you want. 5.You need sharding and consistency.

Slide 21

Slide 21 text

SPIDER sharding architecture

Slide 22

Slide 22 text

SPIDER sharding architecture Spider stores partitions on different servers. This sharding design is done using the database native table partitioning. You can use all partitioning rules. (key, range, hash, and so on)

Slide 23

Slide 23 text

Sharding 1.request 2. Execute SQL 3.response AP Using table partitioning rules AP AP AP AP SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a DB3 tbl_a DB4 tbl_a

Slide 24

Slide 24 text

SPIDER sharding architecture You can federate multiple servers for the same partition to bring HA and load balancing per partition.

Slide 25

Slide 25 text

Duplicating 1.request 2. Execute SQL 3.response AP Duplicating for each partitions AP AP AP AP SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a DB3 tbl_a DB4 tbl_a DB5 tbl_a Duplicating Duplicating

Slide 26

Slide 26 text

SPIDER sharding architecture When writing multiple replicates or on multiple servers, Spider use 2 phase commit to preserve write consistency.

Slide 27

Slide 27 text

Write consistency 1.request 2. Execute SQL 3.response AP Using 2 phase commit AP AP AP AP SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a DB3 tbl_a DB4 tbl_a 2 phase commit

Slide 28

Slide 28 text

How to get SPIDER working?

Slide 29

Slide 29 text

How to get SPIDER working? (1/5) 1. Install Spider bundled with MariaDB/MySQL. 2. Login MariaDB/MySQL then install Spider as a plugin. (execute install_spider.sql) 3. Create Spider table.

Slide 30

Slide 30 text

How to get SPIDER working? (2/5) Create one to one Spider table. CREATE TABLE t1( c1 int, c2 varchar(100), PRIMARY KEY(c1) )ENGINE=spider DEFAULT CHARSET=utf8 COMMENT ' table "rt1", database "test", port "3306", host "host name of data node", user "user name for data node", password "password for data node" '; Set engine name to “Spider” and write connect information (and parameter) in the comment.

Slide 31

Slide 31 text

How to get SPIDER working? (3/5) You can create Spider tables without column definitions in MariaDB. In this case Spider gets the column definition from data node. CREATE TABLE t1 ENGINE=spider DEFAULT CHARSET=utf8 COMMENT ' table "rt1", database "test", port "3306", host "host name of data node", user "user name for data node", password "password for data node" ';

Slide 32

Slide 32 text

How to get SPIDER working? (4/5) Create one to many (sharding) Spider table CREATE TABLE t1( c1 int, c2 varchar(100), PRIMARY KEY(c1) )ENGINE=spider DEFAULT CHARSET=utf8 COMMENT 'table "rt1", database "test", port "3306", user "user name for data node", password "password for data node"' PARTITION BY RANGE(c1) ( PARTITION p0 VALUES LESS THAN (100000) COMMENT 'host "h1"', PARTITION p1 VALUES LESS THAN (200000) COMMENT 'host "h2"', PARTITION p2 VALUES LESS THAN (300000) COMMENT 'host "h3"', PARTITION p3 VALUES LESS THAN MAXVALUE COMMENT 'host "h4"' ); Write shared connect information to table comment, shard specific connect information to partition comment.

Slide 33

Slide 33 text

How to get SPIDER working? (5/5) You can use “CREATE SERVER” statement for defining connection information. CREATE SERVER srv1 FOREIGN DATA WRAPPER mysql HOST 'host name of data node', DATABASE 'test', USER 'user name for data node', PASSWORD 'password for data node', PORT 3306 ; You can use create server definition by writing “server” parameter into table/partition comment. CREATE TABLE t1( c1 int, c2 varchar(100), PRIMARY KEY(c1) )ENGINE=spider DEFAULT CHARSET=utf8 COMMENT 'table "rt1", server "srv1"';

Slide 34

Slide 34 text

Spider’s other features

Slide 35

Slide 35 text

Spider’s other features Fulltext/Geo search feature You can use backend Fulltext/Geo search feature transparently. NoSQL feature (not available for MariaDB yet) You can use HandlerSocket for Spider. OracleDB connecting You can use OracleDB for data node. Note: You need to build from source code for using this feature

Slide 36

Slide 36 text

Spider’s other features Parallel searching You can search sharded table by parallel. Direct updating Improve updating performance. Direct aggregating Improve aggregating(group by) performance.

Slide 37

Slide 37 text

Spider’s other features Engine condition pushdown Improve searching with full-scan performance. Multi Range Read (include Batched Key Access) Improve searching with join performance.

Slide 38

Slide 38 text

New features and enhancements

Slide 39

Slide 39 text

DDL(Create Table) Pushdown (working) Create Table When Spider gets DDL, Spider creates DDL for data nodes and pushes it. SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a Create Table Create Table

Slide 40

Slide 40 text

The setting of DDL pushdown Working…

Slide 41

Slide 41 text

Support more aggregate functions for direct join Support count(distinct), sum(distinct), avg() and avg(distinct). SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a Select count(distinct c2) from tbl_a where c1 = 1; Select count(distinct c2) from tbl_a where c1 = 1;

Slide 42

Slide 42 text

Support outer joins for direct join Spider can pushdown a query with outer joins if these tables are on same data node. SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a Select … from tbl_a left join tbl_b on … where tbl_a.c2 = tbl_b.c1; Select … from tbl_a left join tbl_b on … where tbl_a.c2 = tbl_b.c1;

Slide 43

Slide 43 text

Transaction isolation for slave sql threads Replication Spider changes transaction isolation level for slave sql threads. SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a Master tbl_a Repeatable Read Repeatable read for user threads Read committed only for slave sql threads (spider_slave_trx_isolation = 1)

Slide 44

Slide 44 text

Use disk by size of a resultset from a data node Spider uses a disk temporary table if a resultset from a data node is too big. SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a

Slide 45

Slide 45 text

Parallel checksum table Checksum Table SPIDER (MariaDB/MySQL) DB1 tbl_a DB2 tbl_a Checksum Table Checksum Table

Slide 46

Slide 46 text

Other new features and enhancements Adjusting wait_timeout A feature of keeping connections. Synchornizing sql_mode Support rules of sql_mode transparently. Support mixed charsets in a query Support multiple lingual use cases. Performance improvement of partition with a large amount of partitions (working) Improve performance of sharding environments.

Slide 47

Slide 47 text

Other new features and enhancements Bugfixes Thanks to Eric, Mattias, Simon at Booking.com Felix, Will at Tencent Games Michal at Fedora Project and others.

Slide 48

Slide 48 text

Thank you for taking your time!!