CosmosDB: Jack Of All Trades, Master Of Many

Daron Yondem
November 27, 2018

This is the presentation deck I used at ESPC in Copenhagen.

Transcript

  1. Azure Cosmos DB: a globally distributed, massively scalable, multi-model database service
    • Data models and APIs shown: key-value, column-family, document, graph, Table API, MongoDB
    • Turnkey global distribution
    • Elastic scale-out of storage and throughput
    • Guaranteed low latency at the 99th percentile
    • Comprehensive SLAs
    • Five well-defined consistency models
  2. Azure Cosmos DB Multi Master
    • Every region now writable
    • Single-digit millisecond latency
    • 99.999% availability
    • Tunable consistency levels
    • Flexible conflict resolution
    • Unlimited endpoint scalability
    • All Azure regions, all data models, all SDKs
    [Diagram: Regions A, B, and C behind Azure Traffic Manager, with master (read/write) and replica (read) nodes.]
  3. Flexible conflict management
    • Last-Writer-Wins (default mode)
    • User-defined procedure
    • Custom (asynchronous)
    • Only available for the SQL model
  4. Last-Writer-Wins
    • The default mode
    • Uses a numeric property to resolve conflicts
    • The property can be user defined or _ts
    • Available for all data models
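The rule on slide 4 can be sketched in a few lines of Python. This is only a local simulation under stated assumptions, not the Cosmos DB implementation: the function name `last_writer_wins`, the sample documents, and the tie handling (first candidate wins) are all hypothetical and chosen for illustration.

```python
from typing import Iterable


def last_writer_wins(versions: Iterable[dict], path: str = "_ts") -> dict:
    """Return the conflicting version with the largest numeric value at `path`."""
    candidates = list(versions)
    if not candidates:
        raise ValueError("at least one version is required")
    # max() keeps the first maximal element, so ties go to the earliest candidate.
    # That tie rule is an assumption of this sketch only.
    return max(candidates, key=lambda doc: doc[path])


# Two hypothetical writes to the same item in different regions.
region_a = {"id": "42", "name": "written in Region A", "_ts": 1543312800, "priority": 9}
region_b = {"id": "42", "name": "written in Region B", "_ts": 1543312805, "priority": 1}

print(last_writer_wins([region_a, region_b])["name"])                   # written in Region B
print(last_writer_wins([region_a, region_b], path="priority")["name"])  # written in Region A
```

Switching `path` from `_ts` to a user-defined numeric property changes which write survives, which is the choice the slide describes.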
  5. Custom: user-defined procedure
    • Register a stored procedure with a special signature.
    • The conflict is dropped after processing.
    • If the procedure fails, the conflict is written to the conflict feed and must be handled manually.
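The flow on slide 5 (resolve, drop on success, park in the conflict feed on error) can be simulated locally. This is a sketch under assumptions, not the stored-procedure signature that Cosmos DB expects: `resolve_conflicts`, `prefer_higher_stock`, and the `stock` field are hypothetical names invented for the example.

```python
from typing import Callable, List


def resolve_conflicts(conflicts: List[dict],
                      resolver: Callable[[dict], dict],
                      store: dict,
                      conflict_feed: List[dict]) -> None:
    """Apply `resolver` to each conflict; park failures in `conflict_feed`."""
    for conflict in conflicts:
        try:
            winner = resolver(conflict)
        except Exception:
            # Resolver failed: keep the conflict so it can be handled manually.
            conflict_feed.append(conflict)
            continue
        # Resolver succeeded: persist the winner; the conflict itself is dropped.
        store[winner["id"]] = winner


def prefer_higher_stock(conflict: dict) -> dict:
    """Hypothetical merge rule: keep the version with the larger `stock`."""
    return max(conflict["versions"], key=lambda doc: doc["stock"])


store, feed = {}, []
conflicts = [
    {"versions": [{"id": "a", "stock": 3}, {"id": "a", "stock": 7}]},
    {"versions": [{"id": "b"}, {"id": "b", "stock": 1}]},  # missing field raises KeyError
]
resolve_conflicts(conflicts, prefer_higher_stock, store, feed)
print(store)      # {'a': {'id': 'a', 'stock': 7}}
print(len(feed))  # 1
```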
  6. 99.99% SLA for low-latency reads and writes
    • Reads and writes served from the local region
    • Guaranteed millisecond latency worldwide
    • Write-optimized, latch-free database engine
    • Automatically indexed SSD storage
    • Synchronous and automatic indexing at sustained ingestion rates
    • No schema or index management needed
    • No schema versioning needed
    • No schema migration needed

    Operation              50th percentile   99th percentile
    Reads (1 KB)           < 2 ms            < 10 ms
    Indexed writes (1 KB)  < 6 ms            < 15 ms
  7. What is an RU?
    • RU stands for Request Unit.
    • Not all requests are equal.
    • A request unit is a normalized measure of the computation (CPU, memory, and IOPS) required to serve a request.
  8. How to calculate?

    Item size   Reads/second   Writes/second   Request units
    1 KB        500            100             (500 * 1) + (100 * 5) = 1,000 RU/s
    1 KB        500            500             (500 * 1) + (500 * 5) = 3,000 RU/s
    4 KB        500            100             (500 * 1.3) + (100 * 7) = 1,350 RU/s
    4 KB        500            500             (500 * 1.3) + (500 * 7) = 4,150 RU/s
    64 KB       500            100             (500 * 10) + (100 * 48) = 9,800 RU/s
    64 KB       500            500             (500 * 10) + (500 * 48) = 29,000 RU/s

    See: https://www.documentdb.com/capacityplanner
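The arithmetic on slide 8 is reads per second times the read cost plus writes per second times the write cost. A minimal sketch follows; the per-operation costs are the ones that appear in the slide's formulas, while the names `RU_COST` and `required_throughput` are made up for this example. Real costs depend on the factors the deck lists later, so treat the result as a rough estimate.

```python
# Per-operation RU costs taken from slide 8: item size in KB -> (read RU, write RU).
RU_COST = {1: (1.0, 5.0), 4: (1.3, 7.0), 64: (10.0, 48.0)}


def required_throughput(item_kb: int, reads_per_sec: int, writes_per_sec: int) -> float:
    """Estimate RU/s as reads * read cost + writes * write cost."""
    read_ru, write_ru = RU_COST[item_kb]
    return reads_per_sec * read_ru + writes_per_sec * write_ru


for size, reads, writes in [(1, 500, 100), (4, 500, 500), (64, 500, 100)]:
    print(f"{size} KB: {required_throughput(size, reads, writes):,.0f} RU/s")
```

The three printed rows match the 1,000, 4,150, and 9,800 RU/s entries from the slide.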
  9. For example
    • SELECT * FROM c (2.87 RU)
    • SELECT * FROM c WHERE CONTAINS(c.Name, "Sample") (2.45 RU)
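  10. Reading the request charge of a query in C#. T, client, DatabaseId, and CollectionId are assumed to be members of the enclosing class, which the slide does not show.

        using System;
        using System.Collections.Generic;
        using System.Diagnostics;
        using System.Linq;
        using System.Linq.Expressions;
        using System.Threading.Tasks;
        using Microsoft.Azure.Documents.Client;
        using Microsoft.Azure.Documents.Linq;

        public static async Task<IEnumerable<T>> GetItemsAsync(Expression<Func<T, bool>> predicate)
        {
            double queryCost = 0;

            // MaxItemCount = -1 lets the service choose the page size.
            IDocumentQuery<T> query = client.CreateDocumentQuery<T>(
                    UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId),
                    new FeedOptions { MaxItemCount = -1 })
                .Where(predicate)
                .AsDocumentQuery();

            List<T> results = new List<T>();
            while (query.HasMoreResults)
            {
                // Each page reports its own charge; add them up for the whole query.
                var response = await query.ExecuteNextAsync<T>();
                queryCost += response.RequestCharge;
                results.AddRange(response);
            }

            Debug.WriteLine(queryCost.ToString());
            return results;
        }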
  11. Request Unit Management

                          Single Partition Container   Partitioned Container
    Minimum throughput    400 RU/sec                   1,000 RU/sec
    Maximum throughput    10,000 RU/sec                Unlimited

    Changing the provisioned throughput of a collection in code:

        // Find the offer attached to the collection, then replace it with 12,000 RU/s.
        Offer offer = client.CreateOfferQuery()
            .Where(r => r.ResourceLink == collection.SelfLink)
            .AsEnumerable()
            .SingleOrDefault();
        offer = new OfferV2(offer, 12000);
        await client.ReplaceOfferAsync(offer); // await so failures surface; call from an async method

    Note shown on the slide: "A partition key is required to scale your collection's throughput beyond 2,500 request units in the future"
  12. Factors that affect RU consumption
    • Item size
    • Item property count (indexing)
    • Data consistency (Strong or Bounded Staleness)
    • Indexed properties (lazy indexing can help)
    • Document indexing (disable it if you don't need it)
    • Query patterns (predicates, UDFs, data source size)
    • Script usage (stored procedures, triggers)
  13. Choose Your Consistency Level
    • Levels: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual
    • Clear tradeoffs: latency, availability, throughput
    • Moving from Strong toward Eventual: lower latency, higher availability, better read scalability.
  14. Bounded Staleness
    • When choosing bounded staleness, the "staleness" can be configured in two ways: the number of versions K of the item by which reads lag behind writes, and the time interval t by which reads lag behind writes.
    • Spectrum: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual. Moving toward Eventual: lower latency, higher availability, better read scalability.
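The two knobs on slide 14 (K versions and time interval t) can be expressed as a simple predicate. This is a sketch under an assumption the slide does not state: that a read is acceptable only if it respects every bound that has been configured. The function and parameter names are hypothetical.

```python
from typing import Optional


def within_staleness_bound(version_lag: int, seconds_lag: float,
                           max_versions: Optional[int] = None,
                           max_seconds: Optional[float] = None) -> bool:
    """True if a read's lag respects every configured bound (K versions, t seconds)."""
    if max_versions is not None and version_lag > max_versions:
        return False
    if max_seconds is not None and seconds_lag > max_seconds:
        return False
    return True


# Hypothetical configuration: K = 100 versions, t = 5 seconds.
print(within_staleness_bound(80, 3.0, max_versions=100, max_seconds=5.0))   # True
print(within_staleness_bound(120, 3.0, max_versions=100, max_seconds=5.0))  # False
```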
  15. Consistent Prefix
    • Consistent prefix guarantees that reads never see out-of-order writes.
    • If writes were performed in the order A, B, C, then a client sees either A; A, B; or A, B, C, but never an out-of-order view such as A, C or B, A, C.
    • Spectrum: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual. Moving toward Eventual: lower latency, higher availability, better read scalability.
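The A, B, C example on slide 15 translates directly into a prefix test. The following is a minimal sketch with a hypothetical helper name, `is_consistent_prefix`; it checks a client's view against the write order and is not part of any SDK.

```python
from typing import Sequence


def is_consistent_prefix(writes: Sequence[str], observed: Sequence[str]) -> bool:
    """True if `observed` equals the first len(observed) entries of `writes`."""
    return list(observed) == list(writes[:len(observed)])


writes = ["A", "B", "C"]
for view in (["A"], ["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "A", "C"]):
    print(view, is_consistent_prefix(writes, view))
```

The first three views are accepted and the last two are rejected, matching the cases named on the slide.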
  16. Native Support for Multiple Data Models
    • The database engine operates on an atom-record-sequence (ARS) based type system
    • All data models are efficiently translated to ARS
    • API and wire protocols are supported via extensible modules
    • An instance of a given data model can be materialized as trees
    • Key-value, column-family, document, graph, … more to come
  17. Auto Indexing
    • IndexingMode
      • Consistent (collection consistency applies)
      • Lazy (ingest now, query later)
      • None (EnableScanInQuery)
    • Data types
      • String, Number, Point, Polygon
    • Index types
      • Hash (joins)
      • Range (<, >)
      • Spatial
    • Precision can be defined.
  18. Analyzing Query Performance
    • QueryMetrics
      • RetrievedDocumentCount
      • WriteOutputTime
      • DocumentLoadTime
      • IndexLookupTime
      • UserDefinedFunctionExecutionTime
      • SystemFunctionExecutionTime
  19. 4-Axis SLA
    • Latency SLA at the 99th percentile
    • Throughput SLA
    • Consistency SLA
    • Availability SLA
    • Cosmos DB: 99.99% high availability within a single region, 99.999% across regions
    • 99.99% SLAs for throughput, latency, and consistency, all at the 99th percentile
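To put the two availability figures on slide 19 in perspective, the sketch below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and is only back-of-the-envelope arithmetic, not how the SLA is measured; the helper name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def max_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, implied by an availability percentage."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


for sla in (99.99, 99.999):
    print(f"{sla}% -> {max_downtime_minutes(sla):.2f} minutes/year")
```

Under that assumption, going from 99.99% to 99.999% shrinks the budget from roughly 53 minutes to roughly 5 minutes per year.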
  20. Security
    • Documents and backups are encrypted at rest
    • IP-based access controls
    • Role-based access controls
    • Automated online backups
    • Attack monitoring
    • Geo-fencing
  21. Disclaimer
    • Cosmos DB is not a SQL database; there are no complex table joins. (If you need them, you are doing it wrong.)
    • Other NoSQL databases do one or two things really well but are not cloud native.
  22. Summary
    • Every region is writable!
    • <10 ms write latency
    • 99.999% availability
    • Flexible consistency levels
    • Flexible conflict resolution
    • Links: aka.ms/cosmosdb-mm-learn, aka.ms/cosmosdb-mm-samples, aka.ms/trycosmosdb