Slide 1

Slide 1 text

Azure Cosmos DB Deep Dive ~ Partitioning, Global Distribution and Indexing ~ SATO Naoki (Neo) (@satonaoki) Azure Technologist, Microsoft

Slide 2

Slide 2 text

Agenda
• Overview
• Partitioning Strategies
• Global Distribution
• Indexing

Slide 3

Slide 3 text

Azure Cosmos DB Overview

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Partitioning Strategies

Slide 7

Slide 7 text

Overview of partitioning

Slide 8

Slide 8 text

Overview of partitioning [diagram: a client application (write) and another client application (read) against a container provisioned with 15,000 RUs, backed by physical partition 1 (7,500 RUs) and physical partition 2 (7,500 RUs)]

Slide 9

Slide 9 text

Overview of partitioning: the application writes data and provides a partition key value with every item.

Slide 10

Slide 10 text

Overview of partitioning: Cosmos DB uses the partition key value to route data to a partition.

Slide 11

Slide 11 text

Overview of partitioning: every partition can store up to 50 GB of data and serve up to 10,000 RU/s.

Slide 12

Slide 12 text

Overview of partitioning: the total throughput for the container will be divided evenly across all partitions.

Slide 13

Slide 13 text

Overview of partitioning: if more data or throughput is needed, Cosmos DB will add a new partition automatically. [diagram now shows three physical partitions at 5,000 RUs each]

Slide 14

Slide 14 text

Overview of partitioning: the data will be redistributed as a result.

Slide 15

Slide 15 text

Overview of partitioning: the total throughput capacity will be divided evenly between all partitions.
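The even split described on these slides is simple arithmetic; a one-line helper makes it concrete, using the numbers from the slides (15,000 RUs over two, then three, partitions):

```python
def per_partition_throughput(container_rus: int, partitions: int) -> float:
    # Cosmos DB divides the container's provisioned throughput evenly
    # across its physical partitions.
    return container_rus / partitions

print(per_partition_throughput(15_000, 2))  # 7500.0 — two partitions
print(per_partition_throughput(15_000, 3))  # 5000.0 — after a split adds a third
```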

Slide 16

Slide 16 text

Overview of partitioning: to read data efficiently, the app must provide the partition key of the documents it is requesting.
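Why the partition key matters on reads: with it, the request is routed to a single partition; without it, every partition must be consulted. A toy illustration of that fan-out (not SDK code):

```python
def partitions_touched(total_partitions: int, has_partition_key: bool) -> int:
    # A point read with a partition key is routed to exactly one physical
    # partition; a query without one fans out to all of them.
    return 1 if has_partition_key else total_partitions

print(partitions_touched(3, has_partition_key=True))   # 1
print(partitions_touched(3, has_partition_key=False))  # 3
```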

Slide 17

Slide 17 text

How is data distributed?

Slide 18

Slide 18 text

How is data distributed? [diagram: data with partition keys → hashing algorithm → range of partition addresses → physical partitions]

Slide 19

Slide 19 text

How is data distributed? Whenever a document is inserted, its partition key value (e.g. pk = 1) will be checked and the document assigned to a physical partition.

Slide 20

Slide 20 text

How is data distributed? The item will be assigned to a partition based on its partition key.

Slide 21

Slide 21 text

How is data distributed? All partition key values will be distributed amongst the physical partitions.

Slide 22

Slide 22 text

How is data distributed? However, items with the exact same partition key value (e.g. pk = 1) will be co-located.
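The hash-based routing these slides describe can be sketched in a few lines of Python. The hash function and partition count below are illustrative stand-ins, not Cosmos DB's actual internal algorithm:

```python
import hashlib

def route(partition_key: str, physical_partitions: int) -> int:
    """Map a partition key value to a physical partition (illustrative
    stand-in for Cosmos DB's internal hash-based routing)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % physical_partitions

# Items with the exact same partition key value always land together...
assert route("pk=1", 2) == route("pk=1", 2)

# ...while distinct key values spread across the physical partitions.
buckets = {route(f"user-{i}", 2) for i in range(100)}
```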

Slide 23

Slide 23 text

How are partitions managed?

Slide 24

Slide 24 text

First scenario: Splitting partitions

Slide 25

Slide 25 text

Partitioning dynamics, scenario 1 [diagram: a client application (write) inserting items (Sri, Tim, Thomas) into partitions]

Slide 26

Slide 26 text

Partitioning dynamics, scenario 1: all partitions are almost full of data.

Slide 27

Slide 27 text

Partitioning dynamics, scenario 1: in order to insert this document, we need to increase the total capacity.

Slide 28

Slide 28 text

Partitioning dynamics, scenario 1: we have added a new empty partition for the new document.

Slide 29

Slide 29 text

Partitioning dynamics, scenario 1: and now we will take the largest partition and re-balance it with the new one.

Slide 30

Slide 30 text

Partitioning dynamics, scenario 1: now that it's re-balanced, we can keep inserting new data.
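The split-and-rebalance sequence above can be simulated with a toy model. The capacity and the "move half the largest partition" rule here are illustrative only, not the service's actual split algorithm:

```python
def insert(partitions: list[list[str]], item: str, capacity: int) -> None:
    """Toy model of scenario 1: when all partitions are full, add a new
    empty partition, re-balance the largest one into it, then insert."""
    target = min(partitions, key=len)
    if len(target) >= capacity:                # all partitions almost full
        partitions.append([])                  # add a new empty partition
        largest = max(partitions[:-1], key=len)
        half = len(largest) // 2               # re-balance the largest one
        partitions[-1].extend(largest[half:])
        del largest[half:]
        target = partitions[-1]
    target.append(item)

parts = [["Sri", "Tim"], ["Thomas", "Ana"]]
insert(parts, "Neo", capacity=2)               # triggers a split
print(len(parts))  # 3
```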

Slide 31

Slide 31 text

Second scenario: Adding more throughput

Slide 32

Slide 32 text

Cosmos DB Data Explorer

Slide 33

Slide 33 text

All scale settings can be modified using the Data Explorer

Slide 34

Slide 34 text

All scale settings can be modified using the Data Explorer. They can also be modified programmatically via the SDK or Azure CLI.
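For the programmatic path, one option is the Azure CLI's throughput command; a sketch in which the resource names (my-rg, my-cosmos-account, my-database, my-container) and the 20,000 RU figure are placeholders to substitute with your own:

```shell
# Update provisioned throughput on a container (placeholder names).
az cosmosdb sql container throughput update \
  --resource-group my-rg \
  --account-name my-cosmos-account \
  --database-name my-database \
  --name my-container \
  --throughput 20000
```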

Slide 35

Slide 35 text

Throughput has a lower and upper limit

Slide 36

Slide 36 text

Throughput has a lower and upper limit. The lower limit is determined by the current number of physical partitions.

Slide 37

Slide 37 text

Throughput has a lower and upper limit. The lower limit is determined by the current number of physical partitions. Setting throughput above the upper limit adds new partitions.

Slide 38

Slide 38 text

When the limit is set beyond the current capacity, more physical partitions will be added. This process can take a few to several minutes.

Slide 39

Slide 39 text

Best practices

Slide 40

Slide 40 text

Best practices

Slide 41

Slide 41 text

Best practices

Slide 42

Slide 42 text

Best practices

Slide 43

Slide 43 text

Best practices

Slide 44

Slide 44 text

Best practices

Slide 45

Slide 45 text

Best practices

Slide 46

Slide 46 text

To do this, go to the Metrics blade in the Azure Portal

Slide 47

Slide 47 text

Then select the Storage tab and select your desired container

Slide 48

Slide 48 text

An efficient partitioning strategy has a close-to-even distribution.

Slide 49

Slide 49 text

An efficient partitioning strategy has a close-to-even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges.

Slide 50

Slide 50 text

An efficient partitioning strategy has a close-to-even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges. A random partition key can provide an even data distribution.
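One common way to get the "random partition key" the slide mentions is a synthetic key: append a bounded random suffix to a natural key, so one hot key spreads over several partition key values. A sketch (the suffix count of 10 is an arbitrary illustration):

```python
import random

def synthetic_partition_key(natural_key: str, suffixes: int = 10) -> str:
    """Spread one hot natural key across several partition key values
    by appending a bounded random suffix (a synthetic partition key)."""
    return f"{natural_key}-{random.randrange(suffixes)}"

# Writes for one device now fan out over up to 10 partition key values...
keys = {synthetic_partition_key("device42") for _ in range(1000)}

# ...and a read must query all candidate suffixes for that device.
candidates = [f"device42-{i}" for i in range(10)]
```

The trade-off: writes distribute evenly, but reads for a single natural key must fan out across every suffix.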

Slide 51

Slide 51 text

Best practices

Slide 52

Slide 52 text

Best practices

Slide 53

Slide 53 text

Best practices

Slide 54

Slide 54 text

How to deal with multi-tenancy?

Slide 55

Slide 55 text

Database Account (per tenant)
• Isolation knobs: independent geo-replication knobs; multiple throughput knobs (dedicated throughput, eliminating noisy neighbors)
• Throughput requirements: >400 RUs per tenant (> $24 per tenant)
• T-shirt size: Large. Example: premium offer for B2B apps

Container w/ Dedicated Throughput (per tenant)
• Isolation knobs: independent throughput knobs (dedicated throughput, eliminating noisy neighbors); group tenants within database account(s) based on regional needs
• Throughput requirements: >400 RUs per tenant (> $24 per tenant)
• T-shirt size: Large. Example: premium offer for B2B apps

Container w/ Shared Throughput (per tenant)
• Isolation knobs: share throughput across tenants grouped by database (great for lowering cost on "spiky" tenants); easy management of tenants (drop container when tenant leaves); mitigate noisy-neighbor blast radius (group tenants by database)
• Throughput requirements: >100 RUs per tenant (> $6 per tenant)
• T-shirt size: Medium. Example: standard offer for B2B apps

Partition Key (per tenant)
• Isolation knobs: share throughput across tenants grouped by container (great for lowering cost on "spiky" tenants); enables easy queries across tenants (containers act as boundary for queries); mitigate noisy-neighbor blast radius (group tenants by container)
• Throughput requirements: >0 RUs per tenant (> $0 per tenant)
• T-shirt size: Small. Example: B2C apps

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

Global Distribution

Slide 58

Slide 58 text

Consistency Latency Availability

Slide 59

Slide 59 text

ACID: A = Atomicity, C = Consistency, I = Isolation, D = Durability

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

Master Replica

Slide 63

Slide 63 text

Master Replica

Slide 64

Slide 64 text

In the case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C); else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C). (the PACELC theorem)

Slide 65

Slide 65 text

Master Replica

Slide 66

Slide 66 text

Master Replica

Slide 67

Slide 67 text

Read Latency

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

Demo Read Latency with single region, vs multi-region

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

Write Latency

Slide 72

Slide 72 text

[diagram: multi-master topology — Azure Traffic Manager routing to Regions A, B, and C, each region with a master (read/write), some with replicas (read)]

Slide 73

Slide 73 text

Demo Write latency for single-write vs. multi-write

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

Consistency

Slide 76

Slide 76 text

Strong, Bounded staleness, Session, Consistent prefix, Eventual

Slide 77

Slide 77 text

No content

Slide 78

Slide 78 text

Consistency Level | Quorum Reads | Quorum Writes
Strong | Local minority (2 RU) | Global majority (1 RU)
Bounded staleness | Local minority (2 RU) | Local majority (1 RU)
Session | Single replica using session token (1 RU) | Local majority (1 RU)
Consistent prefix | Single replica (1 RU) | Local majority (1 RU)
Eventual | Single replica (1 RU) | Local majority (1 RU)

[diagram: a replica set with one forwarder and two followers]
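The read-cost column of the table can be captured as a lookup; the RU figures below are taken directly from the slide, and the helper just multiplies them out:

```python
# Read cost in request units (RU) per consistency level, per the table above.
READ_COST_RU = {
    "strong": 2,             # local minority quorum read
    "bounded_staleness": 2,  # local minority quorum read
    "session": 1,            # single replica using the session token
    "consistent_prefix": 1,  # single replica
    "eventual": 1,           # single replica
}

def read_cost(level: str, reads: int) -> int:
    """Total RU charge for `reads` point reads at the given level."""
    return READ_COST_RU[level] * reads

print(read_cost("strong", 100))    # 200
print(read_cost("eventual", 100))  # 100
```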

Slide 79

Slide 79 text

Demo Consistency vs. Latency Consistency vs. Throughput

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

No content

Slide 82

Slide 82 text

Availability

Slide 83

Slide 83 text

[diagram: globally distributed app — devices (mobile, browser) on the Internet → Traffic Manager → regional stacks in West US 2, North Europe, and Southeast Asia, each with Application Gateway, web tier, middle tier, load balancer, and Cosmos DB]

Slide 84

Slide 84 text

[diagram: disaster timeline — lost data (RPO) before the disaster, downtime (RTO) after it]

Slide 85

Slide 85 text

Region(s) | Mode | Consistency | RPO | RTO
1 | Any | Any | < 240 minutes | < 1 week
>1 | Single master | Session, consistent prefix, eventual | < 15 minutes | < 15 minutes
>1 | Single master | Bounded staleness | K & T* | < 15 minutes
>1 | Single master | Strong | 0 | < 15 minutes
>1 | Multi-master | Session, consistent prefix, eventual | < 15 minutes | 0
>1 | Multi-master | Bounded staleness | K & T* | 0
>1 | Multi-master | Strong | N/A | < 15 minutes

*Number of "K" updates of an item or "T" time. In >1 regions, K = 100,000 updates or T = 5 minutes.

[diagram: PACELC decision tree — partition? yes: availability vs. consistency; no: latency vs. consistency]

Slide 86

Slide 86 text

Indexing

Slide 87

Slide 87 text

Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing-fast queries.

Item | Color | Microwave safe | Liquid capacity | CPU | Memory | Storage
Geek mug | Graphite | Yes | 16oz | ??? | ??? | ???
Coffee Bean mug | Tan | No | 12oz | ??? | ??? | ???
Surface Book | Gray | ??? | ??? | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD

• Automatic index management
• Synchronous auto-indexing
• No schemas or secondary indices needed
• Works across every data model

Slide 88

Slide 88 text

Custom Indexing Policies

Though all Azure Cosmos DB data is indexed by default, you can specify a custom indexing policy for your collections. Custom indexing policies allow you to design and customize the shape of your index while maintaining schema flexibility.

• Define trade-offs between storage, write and query performance, and query consistency
• Include or exclude documents and paths to and from the index
• Configure various index types

{
  "automatic": true,
  "indexingMode": "Consistent",
  "includedPaths": [{
    "path": "/*",
    "indexes": [
      { "kind": "Range", "dataType": "String", "precision": -1 },
      { "kind": "Range", "dataType": "Number", "precision": -1 },
      { "kind": "Spatial", "dataType": "Point" }
    ]
  }],
  "excludedPaths": [{ "path": "/nonIndexedContent/*" }]
}

Slide 89

Slide 89 text

{
  "locations": [
    { "country": "Germany", "city": "Berlin" },
    { "country": "France", "city": "Paris" }
  ],
  "headquarter": "Belgium",
  "exports": [
    { "city": "Moscow" },
    { "city": "Athens" }
  ]
}

[diagram: the inverted index tree for this document — paths locations, headquarter, exports with array indices 0/1 and leaf values Germany/Berlin, France/Paris, Belgium, Moscow, Athens]

Slide 90

Slide 90 text

{
  "locations": [
    { "country": "Germany", "city": "Bonn", "revenue": 200 }
  ],
  "headquarter": "Italy",
  "exports": [
    { "city": "Berlin", "dealers": [ { "name": "Hans" } ] },
    { "city": "Athens" }
  ]
}

[diagram: the inverted index tree for this second document — locations/0 with country, city, revenue; headquarter Italy; exports with cities Berlin (dealers/0/name Hans) and Athens]

Slide 91

Slide 91 text

[diagram: the index trees of both documents shown side by side — both contain the leaf value Athens]

Slide 92

Slide 92 text

[diagram: the combined inverted index after merging both documents' trees — common paths (locations, headquarter, exports) stored once, with leaf values from both documents]

Slide 93

Slide 93 text

No indexing:

{
  "indexingMode": "none",
  "automatic": false,
  "includedPaths": [],
  "excludedPaths": []
}

Index only /age and /gender:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/age/?",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 }
      ]
    },
    {
      "path": "/gender/?",
      "indexes": [
        { "kind": "Range", "dataType": "String", "precision": -1 }
      ]
    }
  ],
  "excludedPaths": [
    { "path": "/*" }
  ]
}

Slide 94

Slide 94 text

On-the-fly Index Changes In Azure Cosmos DB, you can make changes to the indexing policy of a collection on the fly. Changes can affect the shape of the index, including paths, precision values, and its consistency model. A change in indexing policy effectively requires a transformation of the old index into a new index.
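An on-the-fly change amounts to replacing the container's indexing policy document with an edited one. The sketch below only edits the policy JSON (the actual replacement goes through the SDK or REST API, which is not shown); the excluded path is borrowed from the earlier example:

```python
import copy
import json

# Default-style policy: index everything under the root.
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
}

# New policy: stop indexing a write-heavy subtree to cut write RU cost.
new_policy = copy.deepcopy(policy)
new_policy["excludedPaths"].append({"path": "/nonIndexedContent/*"})

print(json.dumps(new_policy, indent=2))
```

Submitting the new policy triggers the index transformation described above, from the old index shape to the new one.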

Slide 95

Slide 95 text

Metrics Analysis The SQL API provides information about performance metrics, such as the index storage used and the throughput cost (request units) for every operation. When running a HEAD or GET request against a collection resource, the x-ms-request-quota and x-ms-request-usage headers provide the storage quota and usage of the collection. You can use this information to compare various indexing policies, and for performance tuning.
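The two headers the slide names can be read off a HEAD/GET collection response. Here a plain dict stands in for the HTTP response headers, and the assumption that both values parse as plain integers is illustrative, not guaranteed by the service:

```python
def index_usage(headers: dict) -> tuple:
    """Return (quota, usage) parsed from the collection response headers.
    Assumes plain integer header values, which is an illustration only."""
    quota = int(headers["x-ms-request-quota"])
    usage = int(headers["x-ms-request-usage"])
    return quota, usage

# Stand-in for the headers of a HEAD request against a collection resource.
sample = {"x-ms-request-quota": "10485760", "x-ms-request-usage": "524288"}
quota, usage = index_usage(sample)
print(f"{usage / quota:.1%} of quota used")  # → 5.0% of quota used
```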

Slide 96

Slide 96 text

• Understand query patterns – which properties are being used?
• Understand impact on write cost – index update RU cost scales with the number of properties

Slide 97

Slide 97 text

Resources

http://cosmosdb.com/
https://azure.microsoft.com/try/cosmosdb/
https://docs.microsoft.com/learn/paths/work-with-nosql-data-in-azure-cosmos-db/

Slide 98

Slide 98 text

© 2019 Microsoft Corporation. All rights reserved. This information (including attached documents and linked content) is current as of the date of creation and is subject to change without notice.