Slide 1

#Azure #CosmosDB
Lessons learnt from building a globally distributed database from the ground up
Dharma Shukla (@dharmashukla), Distinguished Engineer, Microsoft

Slide 2

2010: Project Florence → 2017: Cosmos DB. Blood, sweat and tears.

Requirements (circa 2010)
• Turnkey global distribution
• Low latency at the 99th percentile, worldwide
• Guaranteed high availability
• Programmable consistency
• Elastically scale throughput and storage, globally, on demand
• Operate at low cost

Slide 3

Cosmos DB

Slide 4

Global distribution: exploiting the properties of the cloud to the extreme
• Millions of transactions/sec
• Petabytes of data
• Elastic and unlimited scalability
• Cost efficiencies with fine-grained multi-tenancy
IaaS-hosted managed database offerings cannot match this!

Slide 5

Global distribution from the ground up
• Cosmos DB as a foundational Azure service
  – Available in all Azure regions by default, including sovereign/government clouds
• Automatic multi-region replication
  – Associate any number of regions with your database account
  – Policy-based geo-fencing
• Multi-homing APIs (see the sketch below)
  – Apps don't need to be redeployed during regional failover
• Allows dynamically setting priorities for regions
  – Simulate a regional disaster via API
  – Test the end-to-end availability of the entire app (beyond just the database)
• First to offer comprehensive SLAs for latency, throughput, availability and consistency
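A minimal sketch of multi-homing from the application side, assuming the azure-cosmos Python SDK and hypothetical account endpoint, key and container names; the client lists regions in priority order and the SDK serves requests from the first available region, failing over down the list without a redeployment:

```python
from azure.cosmos import CosmosClient

# Hypothetical account endpoint and key; replace with your own values.
ENDPOINT = "https://myaccount.documents.azure.com:443/"
KEY = "<primary-key>"

# preferred_locations expresses regional priority: requests are served from
# the first reachable region in the list and fail over to the next one.
client = CosmosClient(
    ENDPOINT,
    credential=KEY,
    preferred_locations=["West US 2", "North Europe", "Southeast Asia"],
)

container = client.get_database_client("appdb").get_container_client("orders")
item = container.read_item(item="order-1", partition_key="customer-42")
```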

Slide 6

Guaranteed low latency @ P99
• Globally distributed, with reads and writes served from the local region
• Write-optimized, latch-free database engine designed for SSDs and low-latency access
• Synchronous and automatic indexing at sustained ingestion rates

Slide 7

Elastically scalable storage
• System designed to independently scale storage and throughput
• Transparent server-side partition management and routing
• Automatically indexed SSD storage
• Automatic global distribution of data across any number of Azure regions
• Optionally evict old data using built-in support for TTL (see the sketch below)
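A minimal sketch of TTL-based eviction, assuming the azure-cosmos Python SDK and hypothetical database/container names; with a default TTL set on the container, items are deleted automatically once the interval has elapsed since their last write:

```python
from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical endpoint, key and names; replace with your own values.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists("telemetry")

# default_ttl=86400 evicts items 24 hours after their last write;
# individual items can override it with their own "ttl" property.
container = database.create_container_if_not_exists(
    id="device-events",
    partition_key=PartitionKey(path="/deviceId"),
    default_ttl=86400,
)

container.upsert_item({"id": "evt-1", "deviceId": "dev-7", "reading": 21.5})
```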

Slide 8

Typical activity of an application

Slide 9

Elastically scaling throughput, anywhere, anytime
• Elastically scaling throughput from 10 to 100s of millions of transactions/sec across multiple regions
• Fully resource-governed stack
• Highly responsive partition management
• Modular, resource-governed nested consensus
• Multiple granularities of throughput (e.g. sec, min, hour) at different price points (see the sketch below)
[Diagram: throughput shifted between regions over the day, e.g. more in one region at 9 PM PST, less at 11 PM PST]
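A minimal sketch of scaling provisioned throughput up and down from the client, assuming the azure-cosmos Python SDK and the hypothetical container from earlier with container-level (dedicated) throughput; the change is applied online:

```python
from azure.cosmos import CosmosClient

# Hypothetical endpoint, key and names; replace with your own values.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("telemetry").get_container_client("device-events")

# Scale up ahead of the evening peak, then scale back down afterwards.
# Values are request units per second (RU/s); no application downtime.
container.replace_throughput(100_000)   # peak window
# ... serve peak traffic ...
container.replace_throughput(10_000)    # quiet window
```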

Slide 10

Scaling throughput at different granularities

Slide 11

US Open!

Slide 12

Real world consistency is not a binary choice

Slide 13

The wild west of consistency models…

Slide 14

The state of commercial databases
Strong consistency / high latency ↔ Eventual consistency / low latency

Slide 15

Consistency models in Cosmos DB
5 well-defined consistency levels with clear tradeoffs: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual
Most real-life applications do not fall into the two extremes (strong or eventual); see the sketch below for choosing a level per client.
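A minimal sketch of choosing a consistency level per client, assuming the azure-cosmos Python SDK and hypothetical connection values; a client can relax the account's default level for latency-sensitive paths:

```python
from azure.cosmos import CosmosClient

# Hypothetical account endpoint and key; replace with your own values.
ENDPOINT = "https://myaccount.documents.azure.com:443/"
KEY = "<primary-key>"

# The account has a default consistency level; a client may opt into a
# weaker one, e.g. Session for a user-facing path, Eventual for analytics.
session_client = CosmosClient(ENDPOINT, credential=KEY, consistency_level="Session")
eventual_client = CosmosClient(ENDPOINT, credential=KEY, consistency_level="Eventual")
```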

Slide 16

Insights from production workloads
Consistency distribution among customers (usage %): Strong 4, Bounded Staleness 18, Session 73, Consistent Prefix 2, Eventual 3
[Chart: consistency level vs. relative throughput]

Slide 17

High availability SLA is not good enough

Slide 18

Microsoft Azure

Slide 19

Retailer - Black Friday/Cyber Monday (11/18-11/30) 2016
[Chart: throughput (transactions/sec), scale up to ~12 million, 10/27/2016 through 1/5/2017]

Slide 20

Retailer - Black Friday/Cyber Monday (11/18-11/30) 2016
[Chart: total requests (up to ~3 billion) vs. availability (99.96%-100%)]

Slide 21

Retailer - Black Friday/Cyber Monday (11/18-11/30) 2016
[Chart: P99 read and write latency (ms), axis 0-14 ms, 11/1/2016 through 12/31/2016]

Slide 22

At global scale: CREATE INDEX, DROP INDEX, ALTER TABLE

Slide 23

Schema agnostic indexing
• Logical index layouts (inverted, tree, columnar, …)
• Automatic and synchronous indexing of all ingested content
• No schemas or secondary indices ever needed
• Resource-governed, write-optimized database engine with latch-free and log-structured techniques
Example document and its index tree (paths such as locations/0/country → Germany, headquarter → Belgium, exports/1/city → Athens); an indexing-policy sketch follows below:
{ "locations": [ { "country": "Germany", "city": "Berlin" }, { "country": "France", "city": "Paris" } ], "headquarter": "Belgium", "exports": [ { "city": "Moscow" }, { "city": "Athens" } ] }
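Indexing is automatic and synchronous by default; a minimal sketch of adjusting the policy, assuming the azure-cosmos Python SDK and hypothetical names, where one large path is excluded from the otherwise index-everything default:

```python
from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical endpoint, key and names; replace with your own values.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists("companydb")

# Consistent (synchronous) indexing of every path, except one raw blob
# that is stored but never queried.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/rawPayload/*"}],
}

container = database.create_container_if_not_exists(
    id="companies",
    partition_key=PartitionKey(path="/headquarter"),
    indexing_policy=indexing_policy,
)
```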

Slide 24

Meet developers where they are

Slide 25

Native support for multiple data models
• Database engine operates on an atom-record-sequence (ARS) based type system
• All data models are translated to ARS
• APIs and wire protocols are supported via extensible modules
• An instance of a given data model can be materialized as trees
• Graph, documents, key-value, column-family, … more to come

Slide 26

Azure Cosmos DB: Microsoft's globally distributed, multi-model database service
• Global distribution • Elastic scale-out • Guaranteed low latency • Comprehensive SLAs • Five consistency models
• Data models/APIs: SQL, Key-Value, Column-family, Graph, Documents

Slide 27

Specify, Verify, Test

Slide 28

Running the service
• Weekly deployments of the entire stack worldwide
• Quality gates
  – Chaos, component and functional test coverage
  – Automated performance, RG and consistency runs every 4 hours
  – 16+ hours of stress runs every day
  – Full-stack upgrades with customer workloads
  – Chaos tests
  – Automated linearizability checker and Jepsen tests
• Invariant checks
  – All invariant violations are traced
  – SEV2 alerts on any invariant violation, either pre- or post-production
  – Hot-fix all invariant violations within 5 days
• Transparently making all important metrics available to customers
  – SLA violations, workload metrics, PBS, etc.

Slide 29

Summary
• Global distribution, horizontal partitioning and fine-grained multi-tenancy cannot be an afterthought while building a cloud database
• Schema-agnostic database engine design is crucial for a globally distributed database
• Intermediate consistency models are extremely useful
• A globally distributed database must provide comprehensive SLAs beyond just high availability
  – Throughput, latency at the 99th percentile, consistency and high availability

Slide 30

References
• Getting started with Cosmos DB
  – cosmosdb.com
  – portal.azure.com
  – aka.ms/cosmosdb
  – Downloadable service emulator (aka.ms/CosmosDB-emulator)
• Technical Overview -> https://azure.microsoft.com/en-us/blog/a-technical-overview-of-azure-cosmos-db/
• Schema Agnostic Indexing, VLDB 2015 -> http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf
• Follow #CosmosDB on Twitter
  – @azurecosmosdb
  – @dharmashukla
  – @rimmanehme

Slide 31

Azure Cosmos DB: we are just getting started…
We are hiring! Bangalore, Redmond