Modeling data and best practices for Azure Cosmos DB

Azure Cosmos DB is Microsoft's globally distributed, multi-model database service. This session covers data modeling with the Cosmos DB NoSQL database and how it helps distributed applications maintain high availability, scale across multiple regions, and sustain throughput.

Asif Waquar

July 17, 2019

Transcript

  1. Azure Cosmos DB

    A globally distributed, massively scalable, multi-model database service.
    • Turnkey global distribution
    • Elastic scale-out of storage & throughput
    • Guaranteed low latency at the 99th percentile
    • Five well-defined consistency models
    • Comprehensive SLAs
  2. Azure Cosmos DB

    A globally distributed, massively scalable, multi-model database service.
    • Turnkey global distribution
    • Elastic scale-out of storage & throughput
    • Guaranteed low latency at the 99th percentile
    • Five well-defined consistency models
    • Comprehensive SLAs
    Data models: column-family, document, graph, key-value
  3. Azure Cosmos DB

    A globally distributed, massively scalable, multi-model database service.
    • Turnkey global distribution
    • Elastic scale-out of storage & throughput
    • Guaranteed low latency at the 99th percentile
    • Five well-defined consistency models
    • Comprehensive SLAs
    Data models: column-family, document, graph, key-value
    APIs shown: Table API, Cosmos DB’s API for MongoDB
  4. Features

    • Multi-model data paradigm: key-value, document, graph, column-family
    • Low latency for 99% of queries: less than 10 ms for read operations and less than 15 ms for (indexed) write operations
    • Designed for high throughput
    • Availability, data consistency, and latency backed by SLAs at the 99.999% level
    • Configurable throughput
    • Automatic replication (master-slave)
    • Automatic data indexing
    • Configurable data consistency, with five levels: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual
  5. Resource Hierarchy

    CONTAINERS
    Logical resources “surfaced” to APIs as tables, collections or graphs, which are made up of one or more physical partitions or servers.
    RESOURCE PARTITIONS
    • Consistent, highly available, and resource-governed coordination primitives
    • Consist of replica sets, with each replica hosting an instance of the database engine
    Diagram labels: Tenants; Containers (Collections, Tables, Graphs); Resource Partitions; Replica Set (Leader, Follower, Follower, Forwarder to remote resource partition(s))
  6. Account URI and Credentials

    Hierarchy: Account > Database > Container > Item
    Account URI: ********.azure.com
    Credentials: pass…
  7. Item Representations

    Hierarchy: Account > Database > Container > Item
    Container: Collection, Graph, Table
    Item: Document, Vertices/Edges, Row
  8. Container-Level Resources

    Hierarchy: Account > Database > Container > Item
    Container-level resources: Conflict, Stored procedure, Trigger, UDF
  9. Data Modelling: Relational vs. Document

    User Table
    UserID | Name       | Dob
    1      | John Smith | 8/30/1964

    Holdings Table
    StockID | UserID | Qty | Symbol
    1       | 1      | 100 | MSFT
    2       | 1      | 75  | WMT

    Document
    { "id": 1, "name": "John Smith", "dob": "1964-08-30", "holdings": [ { "qty": 100, "symbol": "MSFT" }, { "qty": 75, "symbol": "WMT" } ] }

    Relational Store       | Document Store
    Rows                   | Documents
    Columns                | Properties
    Strongly-typed schemas | Schema-free
    Highly normalized      | Typically denormalized
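The mapping from the two relational tables to the single document can be sketched in a few lines of Python. This is only an illustration built from the slide's sample rows; the function name `to_documents` and the choice to store the date in ISO 8601 form are assumptions, not part of the original deck.

```python
from collections import defaultdict

# Rows as they appear in the two relational tables on the slide.
users = [{"UserID": 1, "Name": "John Smith", "Dob": "8/30/1964"}]
holdings = [
    {"StockID": 1, "UserID": 1, "Qty": 100, "Symbol": "MSFT"},
    {"StockID": 2, "UserID": 1, "Qty": 75, "Symbol": "WMT"},
]

def to_documents(users, holdings):
    """Embed each user's holdings rows inside that user's document."""
    by_user = defaultdict(list)
    for row in holdings:
        by_user[row["UserID"]].append({"qty": row["Qty"], "symbol": row["Symbol"]})
    docs = []
    for u in users:
        month, day, year = u["Dob"].split("/")
        docs.append({
            "id": u["UserID"],
            "name": u["Name"],
            # Assumption: dates are stored as ISO 8601 strings (YYYY-MM-DD).
            "dob": f"{year}-{int(month):02d}-{int(day):02d}",
            "holdings": by_user[u["UserID"]],
        })
    return docs

docs = to_documents(users, holdings)
```

The join key (`UserID`) and the surrogate key (`StockID`) disappear from the embedded holdings, because the parent document now carries that relationship.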
  10. Modelling challenges

    • How to de-normalize?
    • How to normalize?
    • To embed or reference?
    • Can I apply joins?
    • Should I put different data types in the same collection, or in different ones?
  11. Modelling challenges: To embed or reference?

    Document:
    { "id": 1, "name": "John Smith", "dob": "1964-08-30", "holdings": [ { "qty": 100, "symbol": "MSFT" }, { "qty": 75, "symbol": "WMT" } ] }
    Embed (comments live inside the post document):
    { "postid": "1", "title": "My blog post", "body": "Post content…", "comments": [ "comment #1", "comment #2", "comment #3", "comment #4", …, "comment #1598873", … ] }
    Reference (the post and each comment are separate documents):
    { "postid": "1", "title": "My blog post", "body": "Post content…" }
    { "postid": "1", "comment": "comment #3" }
  12. When to embed?

    o Data that is queried together should live together.
    o Child data is dependent on the parent.
    o 1:1 relationships, e.g. every customer has one email address, phone number and NRIC number.
    o Data that doesn’t change frequently, e.g. email and postal addresses.
    o Embedding usually gives better read performance at the cost of write performance, so it suits workloads that are not write-heavy.
  13. When to reference?

    o 1 : many (unbounded relationship)
    o many : many relationships
    o Data changes at different rates
    o What is referenced is heavily referenced by many others
    o Typically provides better write performance
    o But may require more network calls for reads
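The trade-off between the two layouts can be sketched with plain Python dictionaries standing in for documents. The helper names `split_post` and `read_post_with_comments` are hypothetical, and the in-memory lists are an assumption used in place of real database queries.

```python
def split_post(embedded_post):
    """Turn one post with embedded comments into a post document plus
    one document per comment that refers back to it by postid."""
    post = {k: v for k, v in embedded_post.items() if k != "comments"}
    comments = [
        {"postid": embedded_post["postid"], "comment": text}
        for text in embedded_post.get("comments", [])
    ]
    return post, comments

def read_post_with_comments(posts, comments, postid):
    """Referenced layout: one lookup for the post, a second for its comments."""
    post = next(p for p in posts if p["postid"] == postid)
    related = [c["comment"] for c in comments if c["postid"] == postid]
    return post, related

embedded = {
    "postid": "1",
    "title": "My blog post",
    "body": "Post content",
    "comments": ["comment #1", "comment #2", "comment #3"],
}
post, comment_docs = split_post(embedded)
_, texts = read_post_with_comments([post], comment_docs, "1")
```

With references, adding a comment writes one small document instead of rewriting the whole post, which is where the better write performance comes from; the price is the second lookup on reads.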
  14. Partitioning: why is the choice of partition key so important?

    o Enables your data in Cosmos DB to scale
    o Large impact on performance of the system
    What can go wrong?
    o Hot partitions
    o Choice forces many cross-partition queries for the workload
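One rough way to compare candidate partition keys is to look at how evenly sample data spreads over their values. The sketch below is an assumption-laden illustration: the `orders` records, the field names, and the `partition_skew` helper are all hypothetical, and item counts stand in for real storage and request load.

```python
from collections import Counter

def partition_skew(items, key):
    """Return (largest share of items under one key value, number of distinct values)."""
    counts = Counter(item[key] for item in items)
    return max(counts.values()) / len(items), len(counts)

# Hypothetical order records: 100 distinct users, only two countries.
orders = [
    {"userId": f"u{i}", "country": "SG" if i % 10 else "US"}
    for i in range(100)
]

user_share, user_values = partition_skew(orders, "userId")
country_share, country_values = partition_skew(orders, "country")
```

Here `country` concentrates 90% of the items under one value, a likely hot partition, while `userId` spreads them over 100 values. An even spread is only half the picture: the key should also appear in the filters of the most frequent queries so that they stay within one partition.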
  15. Partitioning

    Logical partition: stores all data associated with the same partition key value.
    Physical partition: fixed amount of reserved SSD-backed storage + compute.
    Cosmos DB distributes logical partitions among a smaller number of physical partitions.
    From your perspective: define 1 partition key per container.
  16. Partition Key: User Id

    Cosmos DB container (e.g. collection)
    Logical partitioning abstraction: hash(User Id)
    Behind the scenes: physical partition sets, with pseudo-random distribution of data over the range of possible hashed values
  17. Physical Partition Sets

    hash(User Id) maps users (Melvin, Karen, John, Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, …) to Range 1, Range 2, …, Range n, held by Physical Partition 1, Physical Partition 2, …, Physical Partition n.
    Frugal number of partitions based on actual storage and throughput needs (yielding scalability with low total cost of ownership).
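The idea of hashing a partition key into ranges owned by physical partitions can be sketched as follows. This is a toy model, not the service's actual algorithm: the MD5 hash, the 32-bit hash space, and the equal-width ranges are all assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2 ** 32  # assumed size of the hashed key space

def hash_key(partition_key):
    """Stable hash of a partition key value (MD5 is an illustrative choice)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def make_ranges(n):
    """Inclusive lower bounds of n equal-width hash ranges."""
    return [i * HASH_SPACE // n for i in range(n)]

def physical_partition(partition_key, range_starts):
    """Index of the range (physical partition) that owns this key's hash."""
    return bisect_right(range_starts, hash_key(partition_key)) - 1

starts = make_ranges(3)
names = ["John", "Dharma", "Shireesh", "Nilesh", "Sukhi", "Bob", "Milton"]
placement = {name: physical_partition(name, starts) for name in names}
```

Every item with the same User Id hashes to the same value, so a whole logical partition always lands on one physical partition.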
  18. Physical Partition Sets

    hash(User Id) maps users (Melvin, Karen, John, Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, …) to Range 1, Range 2, …, Range n, held by Physical Partition 1, Physical Partition 2, …, Physical Partition n.
    What happens when partitions need to grow?
  19. Physical Partition Sets

    Partition ranges can be dynamically sub-divided to seamlessly grow the database as the application grows, while sedulously maintaining high availability.
    Before: Partition X (Range X) holds Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, …
    After: Partition X1 (Range X1) holds Dharma, Shireesh, …; Partition X2 (Range X2) holds Nilesh, Sukhi, …
    Range 1 and Range 2 are unchanged.
  20. Physical Partition Sets

    Partition ranges can be dynamically sub-divided to seamlessly grow the database as the application grows, while sedulously maintaining high availability.
    Best of all: partition management is completely taken care of by the system. You don’t have to lift a finger… the database takes care of you.
    Before: Partition X holds Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, …
    After: Partition X1 (Range X1) holds Dharma, Shireesh, …; Partition X2 (Range X2) holds Nilesh, Sukhi, …
    Range 1 and Range 2 are unchanged.
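Sub-dividing one hash range while leaving the others untouched can be sketched like this. The numeric ranges, the midpoint split, and the helper names `split_range` and `owner` are assumptions for illustration; the service decides where and when to split on its own.

```python
def split_range(ranges, index):
    """Replace ranges[index] = (lo, hi) with two halves (lo, mid) and (mid, hi)."""
    lo, hi = ranges[index]
    mid = (lo + hi) // 2
    return ranges[:index] + [(lo, mid), (mid, hi)] + ranges[index + 1:]

def owner(ranges, h):
    """Index of the half-open range [lo, hi) that contains hash value h."""
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h < hi:
            return i
    raise ValueError("hash value outside the key space")

ranges = [(0, 100), (100, 200), (200, 300)]  # Range 1, Range 2, Range X
after = split_range(ranges, 2)               # Range X becomes X1 and X2
```

Only keys that hashed into Range X are redistributed; a key in Range 1 or Range 2 keeps its owner, which is why the split does not disturb the rest of the container.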
  21. Replication and Consistency

    How do you ensure consistent reads across replicas? Define a consistency level.
    Replication within a region: data moves extremely fast (typically within 1 ms) between neighboring racks.
    Global replication: it takes hundreds of milliseconds to move data across continents.
    Stronger consistency: higher latency, lower availability.
    Weaker consistency: lower latency, higher availability.
  22. Well-Defined Consistency Models

    Consistency level: guarantees
    Strong: Linearizability (once an operation is complete, it will be visible to all). No dirty reads.
    Bounded Staleness: Consistent prefix. Reads lag behind writes by at most k prefixes or t interval. Dirty reads possible, bounded by time and updates. Similar properties to strong consistency (except within the staleness window), while preserving 99.99% availability and low latency.
    Session: Consistent prefix. Within a session: predictable consistency, high read throughput and low latency. No dirty reads for writers (read your own writes); dirty reads possible for other users.
    Consistent Prefix: Reads will never see out-of-order writes (no gaps).
    Eventual: Potential for out-of-order reads. Lowest cost for reads of all consistency levels.
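Three of these guarantees can be expressed as simple checks in a toy model where every write gets an increasing sequence number and a replica has applied all writes up to some number. The model, the function names, and the use of a single sequence number are assumptions for illustration; the time-based bound (t) and the service's real replication protocol are not modeled.

```python
def satisfies_strong(replica_seq, latest_seq):
    """Strong: the read must reflect every completed write."""
    return replica_seq == latest_seq

def satisfies_bounded_staleness(replica_seq, latest_seq, k):
    """Bounded staleness: the replica may lag by at most k writes."""
    return latest_seq - replica_seq <= k

def satisfies_session(replica_seq, session_seq):
    """Session: the replica must have applied the client's own last write."""
    return replica_seq >= session_seq

latest = 10            # sequence number of the most recent write
lagging_replica = 8    # this replica has applied writes 1..8
checks = {
    "strong": satisfies_strong(lagging_replica, latest),
    "bounded(k=2)": satisfies_bounded_staleness(lagging_replica, latest, k=2),
    "session(own write=9)": satisfies_session(lagging_replica, session_seq=9),
}
```

In this model, Consistent Prefix holds by construction, because a replica that has applied writes 1..n never exposes a gap, and Eventual places no condition on the read at all.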
  23. Important Links

    Pricing Calculator: https://azure.microsoft.com/en-us/pricing/calculator/?service=cosmos-db#cosmos-db7aed2059-b457-48cc-a0e9-6744ce81096b
    SQL API Query: https://docs.microsoft.com/en-us/azure/cosmos-db/sql-query-getting-started
    Azure Cosmos Emulator: https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator#controlling-the-emulator
    Data Migration Tool: http://www.microsoft.com/en-us/download/details.aspx?id=46436