[db tech showcase Tokyo 2019] Azure Cosmos DB Deep Dive ~ Partitioning, Global Distribution and Indexing ~

https://satonaoki.wordpress.com/2019/09/30/dbts2019-azure-cosmos-db-deep-dive/

SATO Naoki (Neo)

September 27, 2019

Transcript

  1. Azure Cosmos DB Deep Dive
    ~ Partitioning, Global Distribution and Indexing ~
    SATO Naoki (Neo) (@satonaoki)
    Azure Technologist, Microsoft

  2. Agenda
    Overview
    Partitioning Strategies
    Global Distribution
    Indexing

  3. Azure Cosmos DB
    Overview

  6. Partitioning Strategies

  7. Overview of partitioning

  8. Overview of partitioning
    (Diagram: a container provisioned with 15,000 RUs, backed by two
    physical partitions of 7,500 RUs each; one client application
    writes while another reads)

  9. Overview of partitioning
    Application writes data and provides a partition key value with
    every item

  10. Overview of partitioning
    Cosmos DB uses the partition key value to route data to a partition

  11. Overview of partitioning
    Every partition can store up to 50GB of data and serve up to
    10,000 RU/s

  12. Overview of partitioning
    The total throughput for the container will be divided evenly
    across all partitions
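
    As a sketch of what provisioning such a container looks like with
    the Python SDK (assuming the azure-cosmos v4 package and
    placeholder account endpoint, key, and names):

    from azure.cosmos import CosmosClient, PartitionKey

    # Placeholder endpoint and key; substitute your account's values.
    client = CosmosClient("https://<account>.documents.azure.com:443/",
                          credential="<key>")
    database = client.create_database_if_not_exists(id="demo-db")

    # 15,000 RU/s provisioned on the container; Cosmos DB decides how
    # many physical partitions back it and splits the throughput
    # evenly among them.
    container = database.create_container_if_not_exists(
        id="orders",
        partition_key=PartitionKey(path="/customerId"),
        offer_throughput=15000,
    )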

  13. Overview of partitioning
    If more data or throughput is needed, Cosmos DB will add a new
    partition automatically
    (Diagram: the container now has three physical partitions of
    5,000 RUs each)

  14. Overview of partitioning
    The data will be redistributed as a result

  15. Overview of partitioning
    And the total throughput capacity will be divided evenly between
    all partitions

  16. Overview of partitioning
    To read data efficiently, the app must provide the partition key
    of the documents it is requesting
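
    A sketch of the read-side difference, reusing the assumed Python
    SDK objects from the earlier example: a point read supplies id
    plus partition key and goes to one partition, while a query
    without the key fans out to all of them.

    # Efficient: id + partition key route the request to a single
    # physical partition.
    item = container.read_item(item="order-42", partition_key="customer-7")

    # Works, but scans every partition and costs far more RUs.
    items = list(container.query_items(
        query="SELECT * FROM c WHERE c.id = 'order-42'",
        enable_cross_partition_query=True,
    ))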

  17. How is data distributed?

  18. How is data distributed?
    (Diagram: data with partition keys passes through a hashing
    algorithm into a range of partition addresses, which map onto the
    physical partitions)

  19. How is data distributed?
    Whenever a document is inserted (e.g. pk = 1), the partition key
    value will be checked and assigned to a physical partition

  20. How is data distributed?
    The item will be assigned to a partition based on its partition key.

  21. How is data distributed?
    All partition key values will be distributed amongst the physical
    partitions

  22. How is data distributed?
    However, items with the exact same partition key value (e.g.
    pk = 1) will be co-located
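
    Conceptually the routing looks like the sketch below; this is an
    illustration only, not Cosmos DB's actual hash function or
    partition layout.

    import hashlib

    PHYSICAL_PARTITIONS = 3

    def route(partition_key: str) -> int:
        """Map a partition key to a physical partition via a stable hash."""
        digest = hashlib.md5(partition_key.encode()).hexdigest()
        return int(digest, 16) % PHYSICAL_PARTITIONS

    # The same key always lands on the same partition, which is why
    # items sharing a partition key value are co-located.
    assert route("customer-7") == route("customer-7")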

  23. How are partitions managed?

  24. First scenario: Splitting partitions

  25. Partitioning dynamics
    Scenario 1
    (Diagram: a client application writes documents such as Sri, Tim,
    and Thomas across the physical partitions)

  26. Partitioning dynamics
    Scenario 1: all partitions are almost full of data

  27. Partitioning dynamics
    Scenario 1: in order to insert this document, we need to increase
    the total capacity

  28. Partitioning dynamics
    Scenario 1: we have added a new empty partition for the new
    document

  29. Partitioning dynamics
    Scenario 1: and now we will take the largest partition and
    re-balance it with the new one

  30. Partitioning dynamics
    Scenario 1: now that it's re-balanced, we can keep inserting new
    data

  31. Second scenario: Adding more throughput

  32. Cosmos DB Data Explorer

  33. All scale settings can
    be modified using the
    Data Explorer

  34. All scale settings can
    be modified using the
    Data Explorer
    They can also be modified
    programmatically via the SDK
    or Azure CLI

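    A sketch of the programmatic route with the Python SDK (assuming
    the container object from the earlier example; the Azure CLI
    exposes the equivalent operation):

    # Read the current provisioned throughput, then scale up.
    current = container.get_throughput()
    print("current RU/s:", current.offer_throughput)

    # Raising throughput beyond current capacity triggers partition
    # splits, which can take several minutes in the background.
    container.replace_throughput(20000)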

  35. Throughput has a
    lower and upper limit

  36. Throughput has a
    lower and upper limit
    Lower limit is determined by
    the current number of
    physical partitions

  37. Throughput has a
    lower and upper limit
    Lower limit is determined by
    the current number of
    physical partitions
    Raising the upper limit adds new
    partitions

  38. When the limit is set beyond the
    current capacity, more physical
    partitions will be added
    This process can take a few
    to several minutes

  39. Best practices
    (Slides 39-45 are image-only)

  46. To do this, go to the Metrics
    blade in the Azure Portal

  47. Then select the Storage tab
    and select your desired
    container

  48. An efficient partitioning strategy
    has a close-to-even distribution

  49. An efficient partitioning strategy
    has a close-to-even distribution
    An inefficient partitioning strategy is the main source of cost
    and performance challenges

  50. An efficient partitioning strategy
    has a close-to-even distribution
    An inefficient partitioning strategy is the main source of cost
    and performance challenges
    A random partition key can provide an even data distribution
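
    One common trick is a random synthetic partition key, as in this
    sketch (the field name "pk" and the item shape are assumptions for
    illustration, reusing the container object from earlier):

    import uuid

    def new_order(customer_id: str, total: float) -> dict:
        """Build an item with a random partition key for even spread."""
        return {
            "id": str(uuid.uuid4()),
            "pk": str(uuid.uuid4()),  # random key -> near-uniform spread
            "customerId": customer_id,
            "total": total,
        }

    container.upsert_item(new_order("customer-7", 19.99))

    The trade-off: reads can no longer target a single partition, so
    this pattern fits write-heavy, query-light workloads.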

  51. Best practices
    (Slides 51-53 are image-only)

  54. How to deal with multi-tenancy?

  55. Multi-tenancy options

    Database Account (per tenant)
    Isolation knobs: independent geo-replication knobs; multiple
    throughput knobs (dedicated throughput, eliminating noisy
    neighbors); group tenants within database account(s) based on
    regional needs
    Throughput requirements: >400 RUs per tenant (> $24 per tenant)
    T-shirt size: Large (example: premium offer for B2B apps)

    Container w/ Dedicated Throughput (per tenant)
    Isolation knobs: independent throughput knobs (dedicated
    throughput, eliminating noisy neighbors); easy management of
    tenants (drop container when tenant leaves)
    Throughput requirements: >400 RUs per tenant (> $24 per tenant)
    T-shirt size: Large (example: premium offer for B2B apps)

    Container w/ Shared Throughput (per tenant)
    Isolation knobs: share throughput across tenants grouped by
    database (great for lowering cost on "spiky" tenants); mitigate
    noisy-neighbor blast radius (group tenants by database)
    Throughput requirements: >100 RUs per tenant (> $6 per tenant)
    T-shirt size: Medium (example: standard offer for B2B apps)

    Partition Key (per tenant)
    Isolation knobs: share throughput across tenants grouped by
    container (great for lowering cost on "spiky" tenants); enables
    easy queries across tenants (containers act as boundary for
    queries); mitigate noisy-neighbor blast radius (group tenants by
    container)
    Throughput requirements: >0 RUs per tenant (> $0 per tenant)
    T-shirt size: Small (example: B2C apps)
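
    For the partition-key-per-tenant option, a sketch (the /tenantId
    field and names are assumed conventions, reusing the database
    object from earlier):

    from azure.cosmos import PartitionKey

    # All tenants share one container; each tenant maps to one
    # partition key value.
    tenants = database.create_container_if_not_exists(
        id="tenants",
        partition_key=PartitionKey(path="/tenantId"),
    )

    # A query scoped to one tenant stays inside that tenant's partition.
    invoices = list(tenants.query_items(
        query="SELECT * FROM c WHERE c.type = 'invoice'",
        partition_key="tenant-0042",
    ))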

  57. Global Distribution

  58. Consistency, Latency, Availability

  59. ACID: Atomicity, Consistency, Isolation, Durability

  62. (Diagram: Master / Replica)

  63. (Diagram: Master / Replica)

  64. In the case of network Partitioning in a distributed
    computer system, one has to choose between Availability and
    Consistency; but Else, even when the system is running normally
    in the absence of partitions, one has to choose between Latency
    and Consistency. (The PACELC theorem)

  65. (Diagram: Master / Replica)

  66. (Diagram: Master / Replica)

  67. Read Latency

  69. Demo
    Read latency: single region vs. multi-region

  71. Write Latency

  72. (Diagram: Azure Traffic Manager routes traffic to Regions A, B,
    and C; each regional master accepts reads and writes, while
    replicas serve reads)

  73. Demo
    Write latency for single-write vs. multi-write

  75. Consistency

  76. Consistency levels, strongest to weakest: Strong, Bounded
    Staleness, Session, Consistent Prefix, Eventual
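
    The account-level default can be relaxed per client; a sketch with
    the Python SDK (endpoint and key are placeholders, and a client
    may only weaken, never strengthen, the account default):

    from azure.cosmos import CosmosClient

    client = CosmosClient(
        "https://<account>.documents.azure.com:443/",
        credential="<key>",
        consistency_level="Eventual",  # weaker than the account default
    )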

  78. Consistency Level | Quorum Reads                               | Quorum Writes
    Strong              | Local Minority (2 RU)                      | Global Majority (1 RU)
    Bounded Staleness   | Local Minority (2 RU)                      | Local Majority (1 RU)
    Session             | Single replica using session token (1 RU)  | Local Majority (1 RU)
    Consistent Prefix   | Single replica (1 RU)                      | Local Majority (1 RU)
    Eventual            | Single replica (1 RU)                      | Local Majority (1 RU)

  79. Demo
    Consistency vs. Latency
    Consistency vs. Throughput

  82. Availability

  83. (Diagram: multi-region deployment; Traffic Manager routes
    internet traffic from devices, mobile, and browsers to West US 2,
    North Europe, and Southeast Asia, each region running an
    Application Gateway, web tier, middle tier, load balancer, and
    Cosmos DB)

  84. (Timeline: a disaster strikes; data lost before it = RPO,
    downtime after it = RTO)

  85. RPO and RTO by configuration:

    Region(s) | Mode          | Consistency                          | RPO           | RTO
    1         | Any           | Any                                  | < 240 minutes | < 1 week
    >1        | Single Master | Session, Consistent Prefix, Eventual | < 15 minutes  | < 15 minutes
    >1        | Single Master | Bounded Staleness                    | K & T*        | < 15 minutes
    >1        | Single Master | Strong                               | 0             | < 15 minutes
    >1        | Multi Master  | Session, Consistent Prefix, Eventual | < 15 minutes  | 0
    >1        | Multi Master  | Bounded Staleness                    | K & T*        | 0
    >1        | Multi Master  | Strong                               | N/A           | < 15 minutes

    *Number of "K" updates of an item or "T" time. In >1 regions,
    K = 100,000 updates or T = 5 minutes.

    (PACELC recap: under a Partition, choose Availability vs.
    Consistency; else, choose Latency vs. Consistency)

  86. Indexing

  87. Azure Cosmos DB's schema-less service automatically indexes all
    your data, regardless of the data model, to deliver blazing-fast
    queries.

    Item            | Color    | Microwave safe | Liquid capacity | CPU                                 | Memory | Storage
    Geek mug        | Graphite | Yes            | 16oz            | ???                                 | ???    | ???
    Coffee Bean mug | Tan      | No             | 12oz            | ???                                 | ???    | ???
    Surface Book    | Gray     | ???            | ???             | 3.4 GHz Intel Skylake Core i7-6600U | 16GB   | 1 TB SSD

    • Automatic index management
    • Synchronous auto-indexing
    • No schemas or secondary indices needed
    • Works across every data model

  88. Custom Indexing Policies
    Though all Azure Cosmos DB data is indexed by default,
    you can specify a custom indexing policy for your
    collections. Custom indexing policies allow you to design
    and customize the shape of your index while maintaining
    schema flexibility.
    • Define trade-offs between storage, write and query
      performance, and query consistency
    • Include or exclude documents and paths to and from the
      index
    • Configure various index types

    {
      "automatic": true,
      "indexingMode": "consistent",
      "includedPaths": [{
        "path": "/*",
        "indexes": [{
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        }, {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        }, {
          "kind": "Spatial",
          "dataType": "Point"
        }]
      }],
      "excludedPaths": [{
        "path": "/nonIndexedContent/*"
      }]
    }
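
    A sketch of applying such a policy at container creation with the
    Python SDK (policy abbreviated; names are placeholders, reusing
    the database object from earlier):

    from azure.cosmos import PartitionKey

    indexing_policy = {
        "automatic": True,
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": "/nonIndexedContent/*"}],
    }

    container = database.create_container_if_not_exists(
        id="catalog",
        partition_key=PartitionKey(path="/category"),
        indexing_policy=indexing_policy,
    )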

  89. {
      "locations": [
        { "country": "Germany", "city": "Berlin" },
        { "country": "France", "city": "Paris" }
      ],
      "headquarter": "Belgium",
      "exports": [
        { "city": "Moscow" },
        { "city": "Athens" }
      ]
    }
    (Diagram: the document as an index tree, with paths such as
    locations/0/country = Germany, locations/1/city = Paris,
    headquarter = Belgium, exports/1/city = Athens)

  90. {
      "locations": [
        { "country": "Germany", "city": "Bonn", "revenue": 200 }
      ],
      "headquarter": "Italy",
      "exports": [
        { "city": "Berlin", "dealers": [ { "name": "Hans" } ] },
        { "city": "Athens" }
      ]
    }
    (Diagram: the corresponding index tree, including paths such as
    locations/0/revenue = 200 and exports/0/dealers/0/name = Hans)

  91. (Diagram: the index trees of the two documents shown side by side)

  92. (Diagram: the two index trees merged into one; shared paths such
    as locations/0/country appear once, while differing values like
    Berlin/Bonn and Belgium/Italy branch under the same path)

  93. {
      "indexingMode": "none",
      "automatic": false,
      "includedPaths": [],
      "excludedPaths": []
    }

    {
      "indexingMode": "consistent",
      "automatic": true,
      "includedPaths": [
        {
          "path": "/age/?",
          "indexes": [
            { "kind": "Range", "dataType": "Number", "precision": -1 }
          ]
        },
        {
          "path": "/gender/?",
          "indexes": [
            { "kind": "Range", "dataType": "String", "precision": -1 }
          ]
        }
      ],
      "excludedPaths": [
        { "path": "/*" }
      ]
    }

  94. On-the-fly Index Changes
    In Azure Cosmos DB, you can make changes to the
    indexing policy of a collection on the fly. Changes can
    affect the shape of the index, including paths,
    precision values, and its consistency model.
    A change in indexing policy effectively requires a
    transformation of the old index into a new index.

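    A sketch of such an on-the-fly change with the Python SDK (reusing
    the container and database objects from earlier; the partition key
    must be restated):

    from azure.cosmos import PartitionKey

    # Swap the indexing policy; Cosmos DB transforms the old index
    # into the new one in the background while the container stays
    # online.
    database.replace_container(
        container,
        partition_key=PartitionKey(path="/category"),
        indexing_policy={
            "indexingMode": "consistent",
            "automatic": True,
            "includedPaths": [{"path": "/age/?"}, {"path": "/gender/?"}],
            "excludedPaths": [{"path": "/*"}],
        },
    )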

  95. Metrics Analysis
    The SQL API provides information about performance metrics, such
    as the index storage used and the throughput cost (request units)
    of every operation. You can use this information to compare
    various indexing policies, and for performance tuning.
    When running a HEAD or GET request against a collection resource,
    the x-ms-resource-quota and x-ms-resource-usage headers provide
    the storage quota and usage of the collection.
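
    A sketch of reading the per-operation RU charge with the Python
    SDK (the x-ms-request-charge response header is the documented
    one; last_response_headers hangs off the client connection):

    # Any operation's RU cost can be read back from the last response.
    container.read_item(item="order-42", partition_key="customer-7")
    headers = container.client_connection.last_response_headers
    print("RU charge:", headers["x-ms-request-charge"])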

  96. Understand query patterns – which properties are being
    used?
    Understand impact on write cost – index update RU cost
    scales with # properties

  97. Resources
    http://cosmosdb.com/
    https://azure.microsoft.com/try/cosmosdb/
    https://docs.microsoft.com/learn/paths/work-with-nosql-data-in-azure-cosmos-db/

  98. © 2019 Microsoft Corporation. All rights reserved.
    The contents of this document (including attachments, linked
    pages, and so on) are current as of the date of writing and are
    subject to change without notice.