Azure Cosmos DB

This is a presentation I borrowed from Rimma Nehme and customized for a Cosmos DB webinar. Enjoy.

Daron Yondem

June 15, 2017


  1. Cosmos DB: Deeply Exploits Cloud Core Properties and Economies of Scale

    • Lowest cost
    • IaaS hosted managed offerings cannot beat this
    • Millions of trans/sec
    • Petabytes of data
    • Scale-out architecture
    • Global distribution from the ground up
    • Fully-managed and secure
  2. Azure Cosmos DB: Value to Customer

    • Become more productive
    • Save money
    • Become more flexible
    • Become more responsive
    • Become more innovative
    [Diagram labels: Global Business Store; Supplier Partner]
  3. Globally distribute data around the world

    Turn-key global distribution: automatically replicate all your data around the world, across more regions than Amazon and Google combined
  4. Global Distribution From The Ground-Up

    • Cosmos DB is a Foundational (Ring 0) Azure service
      – Available in all Azure regions by default, including sovereign/government clouds
    • Transparent and automatic multi-region replication
      – Associate any number of regions with your database account, at any time
      – Policy based geo-fencing
    • Multi-homing APIs
      – All endpoints are logical, by default
      – Apps don't need to be redeployed during regional failover
      – Apps can also access physical endpoints if needed
    • Support for both manual and automatic failover
    • Designed for high availability
      – Allows for dynamically setting priorities to regions
      – Simulate regional disasters via API
      – Test the end-to-end availability for the entire app (beyond just the database)
    • Comprehensive SLAs
      – First and only to offer comprehensive SLA for latency, throughput, availability and consistency
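The failover priorities and logical endpoints described on this slide can be illustrated with a small sketch. This is not the Cosmos DB SDK: the `Region` type, the `resolve_endpoint` helper, the region names, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    priority: int          # assumption: lower number = preferred region
    healthy: bool = True


def resolve_endpoint(regions):
    """Pick the healthy region with the best (lowest) priority.

    The app keeps calling one logical endpoint; only this resolution
    changes when a region goes down, so no redeployment is needed.
    """
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.priority)


# Hypothetical account with three associated regions.
regions = [
    Region("West US", 0),
    Region("North Europe", 1),
    Region("Southeast Asia", 2),
]
active = resolve_endpoint(regions)

# Simulate a regional disaster in the preferred region.
after_outage = [
    Region(r.name, r.priority, healthy=(r.name != "West US")) for r in regions
]
failover = resolve_endpoint(after_outage)
```

In this sketch, marking "West US" as unhealthy moves traffic to "North Europe", the next region in priority order, without the caller changing anything.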
  5. Guaranteed Low Latency

    • Reads and writes served from local region
    • Guaranteed millisecond latency worldwide
    • Write optimized, latch-free database engine
    • Automatically indexed SSD storage
    • Synchronous and automatic indexing at sustained ingestion rates
    • No schema or index management needed
    • No schema versioning needed
    • No schema migration needed
    • All of this is highly relevant for rapidly evolving apps in a globally distributed setup

    Percentile   Reads (1KB)   Indexed writes (1KB)
    50%          < 2 ms        < 6 ms
    99%          < 10 ms       < 15 ms
  6. Cosmos DB: Elastically Scalable Storage

    [Chart: data volume (log 10 scale) vs. data variety/complexity, covering transaction data, web/content data, and social/machine-generated data]
    • Single machine is never a bottleneck
    • A single table can scale from gigabytes to petabytes, across many machines and regions
    • Transparent server side partition management and routing
    • Optionally evict old data using built-in support for TTL
    • Policy based, automatic tiering to any HDFS compatible data lake (e.g. ADLS or Azure Storage)
    • Customers pay only for the throughput and storage they need
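To make the TTL bullet concrete, here is a minimal sketch of time-based eviction. It is a generic illustration, not the service's implementation: the field names `last_modified` and `ttl`, the use of seconds, and the rule that a missing `ttl` means "never expire" are assumptions.

```python
from typing import Optional


def is_expired(last_modified: float, ttl_seconds: Optional[float], now: float) -> bool:
    """An item expires once ttl_seconds have passed since its last write."""
    if ttl_seconds is None:      # assumption: no TTL set means keep forever
        return False
    return now - last_modified >= ttl_seconds


def evict_expired(items, now):
    """Return only the items that are still live at time `now`."""
    return [
        item for item in items
        if not is_expired(item["last_modified"], item.get("ttl"), now)
    ]


# Hypothetical items; timestamps are seconds on an arbitrary clock.
items = [
    {"id": "a", "last_modified": 0, "ttl": 60},    # written long ago
    {"id": "b", "last_modified": 100, "ttl": 60},  # written recently
    {"id": "c", "last_modified": 0},               # no TTL set
]
kept = evict_expired(items, now=120)
```

At `now=120`, item "a" is past its 60-second lifetime and is dropped, while "b" and "c" are kept.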
  7. Cosmos DB: Elastically Scalable Throughput

    • Elastically scale throughput from 10 to 100s of millions of requests/sec across multiple regions
    • Support for requests/sec and requests/min for different workloads
      – This ensures that you never have to provision for the peak
    • Customers pay only for the throughput and storage they need
    • Customers pay by the hour for the provisioned throughput
  8. RU/m - Predictable Performance For Unpredictable Needs

    [Chart: RU consumed vs. provisioned RU/sec and remaining RU/min budget, seconds 1 to 88; data labels 46,920, 10,000, 100,000, 98,990, 92,323, 55,403, 100,000]
    • Second 29: 36,920 RUs consumed above provisioned RU/sec (10k). Remaining RU/min budget: 55,403
    • Second 61: RU/min budget reset to 100,000

    Cosmos DB: Lowest TCO. Deeply exploits cloud core properties and economies of scale
    • Cosmos DB: 5-10X more cost-effective
    • Customers save 60-73% in provisioning cost!
    • Commodity hardware
    • Fine-grained multi-tenancy
    • End to end resource governance
    • Optimal utilization of resources
  9. RU/m - Predictable Performance For Unpredictable Needs

    [Chart: DocumentDB RU Consumption and Provisioning. RU consumed vs. provisioned RU/sec and remaining RU/min budget, seconds 1 to 89; data labels 46,920, 10,000, 100,000, 98,990, 92,323, 55,403, 100,000]
    • Second 29: 36,920 RUs consumed above provisioned RU/sec (10k). Remaining RU/min budget: 55,403
    • Second 61: RU/min budget reset to 100,000
    • Customers save 60-73% in provisioning cost!
    • Guaranteed low latency for spiky workloads
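The RU/min accounting shown on this slide can be reproduced with a short simulation. This is a sketch of the arithmetic only, not the service's actual governance logic. The function name `simulate_rum` is hypothetical, and so are the baseline load of 5,000 RU and the placement of the two smaller bursts at seconds 10 and 20. The slide gives only the budget values 98,990 and 92,323 and the 46,920 RU spike at second 29.

```python
def simulate_rum(consumption, ru_per_sec=10_000, ru_per_min_budget=100_000):
    """Track the per-minute burst budget second by second.

    Anything consumed above the provisioned RU/sec is drawn from the
    RU/min budget; the budget resets at seconds 1, 61, 121, ...
    Returns a list of (second, overflow, remaining_budget, throttled).
    """
    remaining = ru_per_min_budget
    trace = []
    for second, consumed in enumerate(consumption, start=1):
        if (second - 1) % 60 == 0:
            remaining = ru_per_min_budget      # new minute, fresh budget
        overflow = max(0, consumed - ru_per_sec)
        covered = min(overflow, remaining)     # part absorbed by the budget
        remaining -= covered
        throttled = overflow - covered         # part the budget cannot absorb
        trace.append((second, overflow, remaining, throttled))
    return trace


consumption = [5_000] * 61     # assumed baseline, below provisioned RU/sec
consumption[9] = 11_010        # second 10: placement is an assumption
consumption[19] = 16_667       # second 20: placement is an assumption
consumption[28] = 46_920       # second 29: spike from the slide

trace = simulate_rum(consumption)
second_29 = trace[28]
second_61 = trace[60]
```

With these inputs the budget steps through 98,990 and 92,323, the spike at second 29 draws 36,920 RU and leaves 55,403, and second 61 starts again from 100,000, matching the figures on the slide.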
  10. Programmable Data Consistency

    • Databases are divided into two categories
      – Provide extreme choices: strong vs. eventual consistency (e.g., DynamoDB)
      – Leave everything for developers to configure (e.g., Cassandra)
        • Read repair, hinted handoff, quorum sizes, replication topologies etc.
    • Developers have to make precise tradeoffs between
      – Consistency and availability (during failures)
      – Consistency and latency (during steady state)
      – Consistency and throughput (this is important for TCO reasons)
  11. Choices of Consistency

    5 well-defined consistency levels for low latency and high availability:
    Strong, Bounded staleness, Session, Consistent prefix, Eventual

    Most real-life applications do not fall into the two extremes (strong and eventual)
  12. Azure Cosmos DB: 5 well-defined consistency models

    Strong, Bounded Staleness, Session, Consistent Prefix, Eventual

    Clear tradeoffs
    • Latency
    • Availability
    • Throughput
  13. Industry-Leading, Comprehensive SLAs

    • Latency @ 99th percentile SLA
    • Throughput SLA
    • Consistency SLA
    • Availability SLA
  14. Comprehensive SLAs

    A globally distributed database needs to tackle:
      1. latency vs. consistency tradeoffs (in steady state)
      2. availability vs. consistency tradeoffs (during failures)
      3. throughput vs. consistency tradeoffs at all times
      4. throughput vs. latency tradeoffs at all times

    Simply offering high availability SLAs is not sufficient!

    Cosmos DB:
      – 99.99% HA within a single region
      – 99.999% across regions
      – 99.99% SLA for throughput, latency, and consistency, all at the 99th percentile
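As a worked example of what the availability figures on this slide mean in practice, the sketch below converts an availability percentage into the downtime it permits. The helper name `allowed_downtime_minutes` and the 365-day accounting period are assumptions; the actual SLA terms define their own measurement window.

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    """Minutes of downtime permitted over the period at a given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


single_region = allowed_downtime_minutes(99.99)    # figure quoted for one region
multi_region = allowed_downtime_minutes(99.999)    # figure quoted across regions
```

Over a 365-day year, 99.99% allows roughly 52.6 minutes of downtime and 99.999% roughly 5.3 minutes, a tenfold difference.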
  15. Only database with comprehensive SLAs across 4 dimensions

    • High Availability
    • Performance Latency
    • Performance Throughput
    • Data Consistency
  16. Schema-agnostic, automatic indexing

    • At global scale, schema/index management is hard
    • Automatic and synchronous indexing of all ingested content: hash, range, geospatial, and columnar
      – No schemas or secondary indices ever needed
    • Resource governed, write optimized database engine with latch free and log structured techniques
    • Online and in-situ index transformations
    • While the database is fully schema-agnostic, schema-extraction is built in
      – Customers can get Avro schemas from the database
  17. Why Multi-Model?

    [Chart: data volume (log 10 scale) vs. data variety/complexity, covering transaction data, web/content data, and social/machine-generated data]
    Who Wants to Have 3-5 Different Backend Databases?
  18. Azure Cosmos DB

    Global Distribution from the ground-up; Limitless Scale; Extremely Low Latency; Multiple Consistency Levels; ARS model; Comprehensive SLAs; Planet-Scale; Multi-Model; Multi-API; Versatile Workloads; Operational Workloads; Analytical Workloads; Key-Value; Tabular; Graph; Documents; Relational; ANSI SQL
  19. Native Support for Multiple Data Models

    • Database engine operates on atom-record-sequence (ARS) based type system
      – All data models are efficiently translated to ARS
    • API and wire protocols are supported via extensible modules
    • Instance of a given data model can be materialized as trees
    • Graph, documents, key-value, column-family, … more to come
    [Diagram labels: KEY-VALUE, COLUMN-FAMILY, DOCUMENT, GRAPH]
  20. Tables API in Azure Cosmos DB

    ✓ Premium experience (low latency, well-defined consistency)
    ✓ Globally Distributed
    ✓ Secondary Indexes for user-defined queries
    ✓ Millisecond latency, Guaranteed throughput
    ✓ We heard you: “Top user voice asks”
    [Diagram labels: Azure Cosmos DB: Table API; Azure Storage: Standard Table API; Azure Storage SDKs; 100% Backwards compatible, Seamless experience; Seamless migration]
    Coming Soon: Update for standard Tables, optimized for storage
  21. Gremlin API in Azure Cosmos DB

    • Model the real world
    • Relationships as first-class entities
    • Optimized for graph storage & traversal
    • Gremlin standard
    Azure Cosmos DB: Graph API
  22. Native Graph Processing

    • Globally distributed, elastically scalable, low latency, auto-indexed service
    • Independently scalable graph engine (using TinkerPop framework)
    • Gremlin and SQL query languages
  23. Security & Compliance

    Enterprise grade security

    Encryption at Rest
    • Always encrypted at rest and in motion
    • Data, index, backups, and attachments encrypted

    Encryption is enabled automatically by default
    • No impact on performance, throughput or availability
    • Transparent to your application

    Comprehensive Azure compliance certification
    • ISO 27001, ISO 27018, EUMC, HIPAA, PCI
    • SOC1 and SOC2 (Audit complete, Certification in Q2 2017)
    • FedRAMP, IRS 1075, UK Official (IL2) (Q2 2017)
    • HITRUST (H2 2017)
  24. Getting Started

    • Web
      – cosmosdb.com
      – portal.azure.com
      – aka.ms/cosmosdb
      – aka.ms/cosmosdb-Tables
      – aka.ms/cosmosdb-Graph
      – aka.ms/cosmosdb-MongoDB
      – aka.ms/cosmosdb-DocumentDB
      – cosmosdb.com/capacityplanner
    • Download
      – aka.ms/CosmosDB-emulator
    • Re-visit Build session recordings on Channel 9.
    • Continue your education at Microsoft Virtual Academy online.