Slide 1

Slide 1 text

Sharding in modern [No]SQL databases
Konstantin Osipov, CTO, Tarantool
Novosibirsk, Russia, 01/04/2018

Slide 2

Slide 2 text

whoami
● MySQL core team member 2003-2010
● Tarantool core team member 2010-now
● http://t.me/tarantoolru
● http://tarantool.org

Slide 3

Slide 3 text

The plan
NoSQL systems are steadily shifting toward NewSQL. Let's look at how this shift is affected by the key technical decisions about sharding that are made at the start of a project.
● sharding introduction
● vBucket or not; hash or range!
● routing
● secondary keys are hard

Slide 4

Slide 4 text

Sharding: introduction
Sharding is the horizontal partitioning of a database's data across multiple machines. It allows the capacity and (possibly) the throughput of the DBMS to scale linearly.
It requires solving three problems:
● choosing how to partition the data
● storing the metadata
● routing queries to the data

Slide 5

Slide 5 text

Sharding: shard function
There are three main ways to define a shard function:
● Hash
● Range
● Range* (Stonebraker)
The shard function also affects the rebalancing algorithm: the unit that is moved can be a single key, a range, or a virtual bucket.
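To make the difference between the first two options concrete, here is a minimal sketch of a hash-based and a range-based shard function. The shard names, split points, and helper names are hypothetical and chosen only for illustration; they are not taken from any of the databases discussed in the talk.

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def stable_hash(key: str) -> int:
    # Python's built-in hash() of a str is randomized per process,
    # so a stable digest is used instead.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def hash_shard(key: str) -> str:
    """Hash sharding: the shard is derived from a hash of the key."""
    return SHARDS[stable_hash(key) % len(SHARDS)]


# Range sharding: sorted split points divide the key space.
# shard-0: key < "h"; shard-1: "h" <= key < "q"; shard-2: key >= "q"
RANGE_SPLIT_POINTS = ["h", "q"]


def range_shard(key: str) -> str:
    """Range sharding: the shard is found by locating the key's range."""
    return SHARDS[bisect.bisect_right(RANGE_SPLIT_POINTS, key)]
```

Note that plain modulo hashing, as used here, remaps most keys when the number of shards changes. That is the motivation for consistent hashing and virtual buckets.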

Slide 6

Slide 6 text

Sharding: consistent hash
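A minimal sketch of a consistent hash ring, assuming MD5 as the hash and 100 ring points per node (both are arbitrary choices made for this illustration). Each node is placed on the ring many times, and a key belongs to the first node point found clockwise from the key's hash.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative hash ring; class and parameter names are hypothetical."""

    def __init__(self, nodes, points_per_node=100):
        # Every node contributes several points to smooth the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

With this layout, adding a node only takes keys away from existing nodes: a key either stays where it was or moves to the new node.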

Slide 7

Slide 7 text

Sharding: Guava
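Guava exposes `Hashing.consistentHash(long input, int buckets)`, which assigns an input to one of `buckets` buckets so that growing the bucket count relocates only a small share of inputs. The Python sketch below implements jump consistent hash (Lamping and Veach) as an illustration of that contract. It is not a port of Guava's code and is not expected to return the same bucket numbers.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK_64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, used as a per-key random sequence.
        key = (key * 2862933555777941757 + 1) & MASK_64
        # Jump forward to the next bucket count at which this key would move.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

When the bucket count grows from N to N+1, a key either keeps its bucket or moves to the new bucket N. No key moves between two old buckets.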

Slide 8

Slide 8 text

Sharding: hash + virtual buckets in Couchbase
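The virtual bucket idea can be sketched as two levels of mapping: a fixed key-to-vBucket hash, and a vBucket-to-node table that changes on rebalance. The sketch assumes 1024 vBuckets and a plain CRC32 modulo; the exact formula used by Couchbase clients differs in detail, and the function names and the rebalancing strategy here are hypothetical.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # assumption for this sketch


def vbucket_for(key: str) -> int:
    # The key -> vBucket mapping never changes, whatever the cluster size.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes):
    # The vBucket -> node table is the only routing metadata (round-robin here).
    return [nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)]


def node_for(key: str, vbucket_map) -> str:
    return vbucket_map[vbucket_for(key)]


def add_node(vbucket_map, new_node):
    """Rebalance by handing whole vBuckets to the new node."""
    target = NUM_VBUCKETS // (len(set(vbucket_map)) + 1)
    counts = Counter(vbucket_map)
    new_map = list(vbucket_map)
    moved = 0
    for vb, owner in enumerate(vbucket_map):
        if moved == target:
            break
        if counts[owner] > target:  # only take from nodes above the target
            new_map[vb] = new_node
            counts[owner] -= 1
            moved += 1
    return new_map
```

Rebalancing moves whole vBuckets, so the routing table stays at a fixed, small size and individual keys are never rehashed.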

Slide 9

Slide 9 text

Sharding: ranges and chunks in MongoDB

Slide 10

Slide 10 text

Sharding: replica sets in MongoDB

Slide 11

Slide 11 text

Sharding: chunk splits and migrations in MongoDB
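A simplified model of range chunks, splits, and migrations, loosely following the MongoDB terminology. The `Chunk` and `ChunkTable` classes and the tiny split threshold are hypothetical; a real system splits by data size and copies the data during a migration, which this sketch omits.

```python
from dataclasses import dataclass, field

MAX_KEYS_PER_CHUNK = 4  # hypothetical threshold, tiny for illustration


@dataclass
class Chunk:
    low: int    # inclusive lower bound of the key range
    high: int   # exclusive upper bound of the key range
    shard: str
    keys: list = field(default_factory=list)


class ChunkTable:
    def __init__(self, low: int, high: int, shard: str):
        self.chunks = [Chunk(low, high, shard)]

    def find(self, key: int) -> Chunk:
        for chunk in self.chunks:
            if chunk.low <= key < chunk.high:
                return chunk
        raise KeyError(key)

    def insert(self, key: int) -> None:
        chunk = self.find(key)
        chunk.keys.append(key)
        if len(chunk.keys) > MAX_KEYS_PER_CHUNK:
            self.split(chunk)

    def split(self, chunk: Chunk) -> None:
        idx = self.chunks.index(chunk)
        keys = sorted(chunk.keys)
        mid = keys[len(keys) // 2]
        if mid == chunk.low:
            return  # all keys share the lowest value; no split point exists
        right = Chunk(mid, chunk.high, chunk.shard, [k for k in keys if k >= mid])
        chunk.high = mid
        chunk.keys = [k for k in keys if k < mid]
        self.chunks.insert(idx + 1, right)

    def migrate(self, chunk: Chunk, to_shard: str) -> None:
        # Only the metadata change is modelled here.
        chunk.shard = to_shard
```

A short usage example: inserting keys 0 through 4 into a single chunk covering [0, 100) triggers one split at key 2, after which the right-hand chunk can be migrated to another shard.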

Slide 12

Slide 12 text

Sharding: ranges in CockroachDB

Slide 13

Slide 13 text

MongoDB
For queries that don’t include the shard key, mongos must query all shards, wait for their response and then return the result to the application. These “scatter/gather” queries can be long running operations.
However, range based partitioning can result in an uneven distribution of data, which may negate some of the benefits of sharding. For example, if the shard key is a linearly increasing field, such as time, then all requests for a given time range will map to the same chunk, and thus the same shard. In this situation, a small set of shards may receive the majority of requests and the system would not scale very well.
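Both effects described above can be reproduced with a toy model. The shard names, the split points on a timestamp key, and the function names are assumptions made for this sketch only.

```python
import bisect
import hashlib
from collections import Counter

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical
BOUNDARIES = [1000, 2000, 3000]  # shard-3 owns every timestamp >= 3000


def range_shard(ts: int) -> str:
    return SHARDS[bisect.bisect_right(BOUNDARIES, ts)]


def hash_shard(ts: int) -> str:
    digest = hashlib.md5(str(ts).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def shards_to_query(ts=None):
    # Without the shard key the router cannot narrow the target set,
    # so it has to scatter the query to every shard and gather the results.
    return [range_shard(ts)] if ts is not None else list(SHARDS)


# Monotonically increasing timestamps, as in a time series workload.
new_writes = range(3000, 3400)
range_load = Counter(range_shard(ts) for ts in new_writes)
hash_load = Counter(hash_shard(ts) for ts in new_writes)
```

In this model every new write lands on the last range shard, while the hashed version of the same key spreads the writes across all shards.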

Slide 14

Slide 14 text

Sharding: range* schema in VoltDB & Tarantool

Slide 15

Slide 15 text

Sharding: metadata & routing
Sharding metadata consists of the table definitions and the routing table. It can be stored:
● on dedicated nodes, e.g. the MongoDB config server
● as regular data ("metadata is data"): Couchbase, CockroachDB, Tarantool
The choice of metadata storage determines the routing architecture. Routing can be done:
● on the client
● on dedicated nodes
● on the storage nodes

Slide 16

Slide 16 text

Sharding: MongoDB example

Slide 17

Slide 17 text

Sharding: Couchbase example

Slide 18

Slide 18 text

Sharding: cluster topology in Couchbase

Slide 19

Slide 19 text

Summary: design choices

| | Couchbase | MongoDB | CockroachDB | VoltDB / Tarantool |
|---|---|---|---|---|
| Shard function | Hash + vbuckets | Hash, range in 2.x, vbucket | Range | Range* + vbucket |
| Metadata | Metadata is data | Config server | Metadata is data* | Metadata is data |
| Routing | Client and server | Router | Server | Server |
| Non-sharded tables | No | Yes | No | Yes |
| Secondary keys | Global and local | Local | Global only | Local, Materialized views* |

Slide 20

Slide 20 text

Summary: hash vs. range based sharding

| | Hash | Range |
|---|---|---|
| Write queries / time series | Linear scaling | Hotspots |
| Primary key read | Linear scaling | Linear scaling |
| Partial key read | Map/reduce | Linear scaling |
| Range read | Map/reduce | Linear scaling |
| Chunk splits and migration | Easy | Difficult |
| Metadata size | Small | Large |
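The "Range read" row can be illustrated with a small sketch: under range sharding only the shards whose ranges intersect the query are contacted, while under hash sharding neighbouring keys are scattered and every shard must be asked (the map/reduce case). Shard names, split points, and function names are hypothetical.

```python
import bisect

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical
SPLIT_POINTS = [1000, 2000, 3000]  # shard-i owns keys below SPLIT_POINTS[i]


def shards_for_range_read_range_sharded(lo: int, hi: int):
    """Shards owning at least one key in the closed interval [lo, hi]."""
    first = bisect.bisect_right(SPLIT_POINTS, lo)
    last = bisect.bisect_right(SPLIT_POINTS, hi)
    return SHARDS[first:last + 1]


def shards_for_range_read_hash_sharded(lo: int, hi: int):
    # Hashing destroys key order, so any shard may hold keys from [lo, hi].
    return list(SHARDS)
```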

Slide 21

Slide 21 text

Summary: which database to choose?

Slide 22

Slide 22 text

Links
https://dzone.com/articles/principles-of-sharding-for-relational-databases
http://www.couchbase.com/binaries/content/assets/website/docs/whitepapers/couchbase-server-under-the-hood.pdf
http://www.oracle.com/technetwork/database/availability/oraclesharding-whitepaper-3675509.pdf
https://github.com/VoltDB/voltdb/blob/master/src/frontend/org/voltdb/Buckets.java#L30
https://docs.voltdb.com/tutorial/Part3.php
https://docs.voltdb.com/UsingVoltDB/DesignPartition.php
https://www.cockroachlabs.com/blog/automated-rebalance-and-repair/
https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mapping-table-data-to-key-value-storage/
https://developer.couchbase.com/documentation/server/3.x/developer/dev-guide-3.0/buckets.html
https://developer.couchbase.com/documentation/server/3.x/admin/Concepts/concept-vBucket.html
https://developer.couchbase.com/documentation/server/4.1/concepts/buckets-vbuckets.html
https://developer.couchbase.com/documentation/server/3.x/admin/Tasks/tasks-rebalance.html
https://docs.mongodb.com/manual/sharding