CodeFest 2018. Konstantin Osipov (mail.ru): Approaches to implementing sharding in modern [No]SQL systems

Sharding in modern [No]SQL databases
Konstantin Osipov, CTO, Tarantool
Novosibirsk, Russia, 01/04/2018

whoami
• MySQL core team member 2003-2010
• Tarantool core team member 2010-now
• http://t.me/tarantoolru
• http://tarantool.org

The plan
NoSQL systems are increasingly shifting toward NewSQL. Let's look at how the key technical decisions in the sharding implementation, made at the start of a project, affect that shift.
• sharding introduction
• vBucket or not; hash or range!
• routing
• secondary keys are hard

Sharding: introduction
Sharding is the horizontal partitioning of a DBMS's data across multiple machines. It allows linear scaling of capacity and (possibly) of throughput. It requires solving three problems:
• choose a data partitioning scheme
• store the metadata
• route queries to the data

Sharding: shard function
Three main ways to define a shard function:
• Hash
• Range
• Range* (Stonebraker)
The shard function also affects the rebalancing algorithm: the unit that moves can be a single key, a range, or a virtual bucket.
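The difference between the first two shard functions can be sketched in a few lines. This is a minimal illustration, not code from any of the systems discussed: the shard count, the CRC32 hash, and the range boundaries are all assumptions picked for the example.

```python
import bisect
import zlib

NUM_SHARDS = 4  # hypothetical cluster size


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash sharding: a stable hash of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Range sharding: sorted split points. Shard i owns the keys that sort
# below RANGE_BOUNDS[i]; the last shard owns everything from "t" upward.
RANGE_BOUNDS = ["g", "n", "t"]  # hypothetical split points, 4 shards


def range_shard(key: str, bounds=RANGE_BOUNDS) -> int:
    """Range sharding: binary search over the sorted split points."""
    return bisect.bisect_right(bounds, key)


print(range_shard("apple"), range_shard("mango"), range_shard("zebra"))  # 0 1 3
```

With the hash function, neighbouring keys land on unrelated shards; with the range function, they stay together, which is what makes range scans cheap and monotonically growing keys dangerous.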

Sharding: consistent hash
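As a rough illustration of the idea behind this slide, here is a minimal consistent hash ring. The class name, the SHA-256 based ring position, and the number of points per node are assumptions made for the sketch; real implementations differ in all three.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    # Ring position: first 8 bytes of SHA-256 (an arbitrary choice).
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical ring: each node owns several points on a circle."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place `replicas` points for the node to smooth the distribution.
        for i in range(self.replicas):
            p = _point(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def lookup(self, key):
        # First point clockwise from the key, wrapping around at the end.
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved")
```

The property worth noticing is that every key that moves, moves to the new node; keys never shuffle between the old nodes, unlike with a plain hash modulo the node count.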

Sharding: Guava

Sharding: hash + virtual buckets in Couchbase
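The virtual bucket scheme adds one level of indirection: a key hashes to a bucket out of a fixed number of buckets, and a separate map assigns buckets to nodes. The sketch below is a generic illustration under stated assumptions: 1024 buckets, CRC32 modulo the bucket count, and a greedy rebalance are choices made for the example and do not reproduce Couchbase's actual hash function or map format.

```python
import zlib

NUM_VBUCKETS = 1024  # fixed for the lifetime of the cluster


def vbucket_of(key: str) -> int:
    """Key -> virtual bucket. Never changes when nodes come and go."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(nodes):
    """Initial round-robin assignment: vbucket id -> node."""
    return [nodes[i % len(nodes)] for i in range(NUM_VBUCKETS)]


def rebalance(vbucket_map, nodes):
    """Greedy sketch: keep buckets in place up to a per-node quota,
    hand the surplus to the least loaded nodes."""
    new_map = list(vbucket_map)
    quota = -(-NUM_VBUCKETS // len(nodes))  # ceiling division
    load = {n: 0 for n in nodes}
    surplus = []
    for vb, node in enumerate(new_map):
        if node in load and load[node] < quota:
            load[node] += 1
        else:
            surplus.append(vb)  # owner is over quota or has left the cluster
    for vb in surplus:
        target = min(nodes, key=lambda n: load[n])
        new_map[vb] = target
        load[target] += 1
    return new_map


old_map = build_map(["a", "b", "c"])
new_map = rebalance(old_map, ["a", "b", "c", "d"])
moved = [vb for vb in range(NUM_VBUCKETS) if old_map[vb] != new_map[vb]]
print(len(moved))  # 256 buckets move, all of them to the new node
```

Rebalancing only rewrites the bucket map and moves whole buckets; the key-to-bucket function is untouched, so the metadata stays small (one entry per bucket, not per key or per range).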

Sharding: ranges and chunks in MongoDB

Sharding: replica sets in MongoDB

Sharding: chunk splits and migrations in MongoDB

Sharding: ranges in CockroachDB

mongodb
For queries that don’t include the shard key, mongos must query all shards, wait for their response and then return the result to the application. These “scatter/gather” queries can be long running operations.
However, range based partitioning can result in an uneven distribution of data, which may negate some of the benefits of sharding. For example, if the shard key is a linearly increasing field, such as time, then all requests for a given time range will map to the same chunk, and thus the same shard. In this situation, a small set of shards may receive the majority of requests and the system would not scale very well.
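The hotspot effect described in the quote is easy to reproduce on a toy model. The chunk boundaries, shard count, and CRC32 hash below are assumptions for the sketch and are unrelated to MongoDB's real chunk layout or hashed shard keys.

```python
import bisect
import zlib
from collections import Counter

NUM_SHARDS = 4
# Hypothetical chunk boundaries over a timestamp key; the last chunk is open-ended.
CHUNK_BOUNDS = [1000, 2000, 3000]


def range_shard(ts: int) -> int:
    """Range sharding on the timestamp itself."""
    return bisect.bisect_right(CHUNK_BOUNDS, ts)


def hash_shard(ts: int) -> int:
    """Hash sharding on the same key."""
    return zlib.crc32(str(ts).encode("ascii")) % NUM_SHARDS


# New writes carry timestamps that keep growing past the last boundary.
new_writes = range(3000, 4000)
range_load = Counter(range_shard(ts) for ts in new_writes)
hash_load = Counter(hash_shard(ts) for ts in new_writes)

print("range:", dict(range_load))  # every write lands on shard 3
print("hash: ", dict(hash_load))   # writes spread over several shards
```

Until the open-ended chunk is split and migrated, the range-sharded cluster writes to a single shard, while the hashed key spreads the same writes at the cost of losing cheap range reads.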

Sharding: range* schema in VoltDB & Tarantool

Sharding: metadata & routing
Sharding metadata: table definitions and the routing table
• On dedicated nodes, e.g. the MongoDB config server
• Metadata is data: Couchbase, CockroachDB, Tarantool
The routing architecture depends on how the metadata storage problem is solved:
• on the client
• on dedicated nodes
• on storage nodes
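Wherever the router lives, it does the same two things: it sends a request that carries the shard key to exactly one shard, and it fans out a request without the shard key to every shard. A minimal in-memory sketch, in which the Shard and Router classes and the CRC32 placement are hypothetical stand-ins rather than any product's API:

```python
import zlib


class Shard:
    """In-memory stand-in for a storage node."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class Router:
    """Holds the routing metadata (here: just an ordered list of shards)."""

    def __init__(self, shards):
        self.shards = shards

    def _shard_for(self, key: str) -> Shard:
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, row):
        self._shard_for(key).rows[key] = row

    def get(self, key):
        # Targeted read: the shard key is known, one shard is contacted.
        return self._shard_for(key).rows.get(key)

    def find(self, predicate):
        # Scatter/gather: no shard key, so every shard is asked.
        contacted, results = 0, []
        for shard in self.shards:
            contacted += 1
            results.extend(r for r in shard.rows.values() if predicate(r))
        return results, contacted


router = Router([Shard("s1"), Shard("s2"), Shard("s3")])
router.put("alice", {"name": "alice", "city": "Novosibirsk"})
router.put("bob", {"name": "bob", "city": "Moscow"})
router.put("carol", {"name": "carol", "city": "Novosibirsk"})

print(router.get("bob"))
rows, contacted = router.find(lambda r: r["city"] == "Novosibirsk")
print(len(rows), "rows from", contacted, "shards")
```

The second query is effectively a lookup by a secondary attribute: with only local indexes on each shard, it cannot avoid visiting all of them.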

Sharding: MongoDB example

Sharding: Couchbase example

Sharding: cluster topology in Couchbase

Summary: design choices
                    Couchbase          MongoDB                       CockroachDB        VoltDB | Tarantool
Shard function      Hash + vbuckets    Hash, range in 2.x, vbucket   Range              Range* + vbucket
Metadata            Metadata is data   Config server                 Metadata is data*  Metadata is data
Routing             Client and server  Router                        Server             Server
Non-sharded tables  No                 Yes                           No                 Yes
Secondary keys      Global and local   Local                         Global only        Local, Materialized views*

Summary: hash vs. range based sharding
                              Hash             Range
Write requests / time series  Linear scaling   Hotspots
Primary key read              Linear scaling   Linear scaling
Partial key read              Map/reduce       Linear scaling
Range read                    Map/reduce       Linear scaling
Chunk splits and migration    Easy             Difficult
Metadata size                 Small            Large

Summary: which database to choose?

Links
https://dzone.com/articles/principles-of-sharding-for-relational-databases
http://www.couchbase.com/binaries/content/assets/website/docs/whitepapers/couchbase-server-under-the-hood.pdf
http://www.oracle.com/technetwork/database/availability/oraclesharding-whitepaper-3675509.pdf
https://github.com/VoltDB/voltdb/blob/master/src/frontend/org/voltdb/Buckets.java#L30
https://docs.voltdb.com/tutorial/Part3.php
https://docs.voltdb.com/UsingVoltDB/DesignPartition.php
https://www.cockroachlabs.com/blog/automated-rebalance-and-repair/
https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mapping-table-data-to-key-value-storage/
https://developer.couchbase.com/documentation/server/3.x/developer/dev-guide-3.0/buckets.html
https://developer.couchbase.com/documentation/server/3.x/admin/Concepts/concept-vBucket.html
https://developer.couchbase.com/documentation/server/4.1/concepts/buckets-vbuckets.html
https://developer.couchbase.com/documentation/server/3.x/admin/Tasks/tasks-rebalance.html
https://docs.mongodb.com/manual/sharding
