Slide 1

Slide 1 text

Cosmos DB: Jack of All Trades, Master of Many
Daron Yöndem
http://daron.me | @daronyondem

Slide 2

Slide 2 text

A little history
• 2010 – Project Florence
• 2015 – DocumentDB
• 2017 – Cosmos DB

Slide 3

Slide 3 text

News Flash! “It is just JSON” https://www.documentdb.com/sql/demo

Slide 4

Slide 4 text

1-Click Global Replication • Ring 0 Service • Multi-Homing • Priorities for regions • Manual or automatic failover

Slide 5

Slide 5 text

99.99% SLA for low-latency reads and writes
• Reads and writes served from the local region
• Guaranteed millisecond latency worldwide
• Write-optimized, latch-free database engine
• Automatically indexed SSD storage
Percentile   Reads (1 KB)   Indexed writes (1 KB)
50%          < 2 ms         < 6 ms
99%          < 10 ms        < 15 ms
• Synchronous and automatic indexing at sustained ingestion rates
• No schema or index management needed
• No schema versioning needed
• No schema migration needed

Slide 6

Slide 6 text

Provisioned Throughput

Slide 7

Slide 7 text

What is RU?
• Request Unit
• Not all requests are equal.
• A normalized measure of request cost, based on the amount of computation (CPU, memory, and IOPS) required to serve the request.

Slide 8

Slide 8 text

How to calculate?
Item size   Reads/second   Writes/second   Request units
1 KB        500            100             (500 * 1) + (100 * 5) = 1,000 RU/s
1 KB        500            500             (500 * 1) + (500 * 5) = 3,000 RU/s
4 KB        500            100             (500 * 1.3) + (100 * 7) = 1,350 RU/s
4 KB        500            500             (500 * 1.3) + (500 * 7) = 4,150 RU/s
64 KB       500            100             (500 * 10) + (100 * 48) = 9,800 RU/s
64 KB       500            500             (500 * 10) + (500 * 48) = 29,000 RU/s
See: https://www.documentdb.com/capacityplanner
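The arithmetic in the table can be reproduced in a few lines. This is a minimal Python sketch, assuming the per-operation costs implied by the table (1, 1.3, and 10 RU per read; 5, 7, and 48 RU per write for 1 KB, 4 KB, and 64 KB items). The names `COST_PER_OP` and `estimate_rus` are illustrative and not part of any SDK, and real costs also depend on indexing and consistency settings.

```python
# Per-operation RU costs taken from the table above, keyed by item size in KB.
# Each value is (read cost, write cost); sizes not in the table are not covered.
COST_PER_OP = {
    1: (1.0, 5.0),
    4: (1.3, 7.0),
    64: (10.0, 48.0),
}


def estimate_rus(item_size_kb, reads_per_sec, writes_per_sec):
    """Return the RU/s needed for a steady mix of reads and writes."""
    read_cost, write_cost = COST_PER_OP[item_size_kb]
    return reads_per_sec * read_cost + writes_per_sec * write_cost


if __name__ == "__main__":
    for size, reads, writes in [(1, 500, 100), (4, 500, 500), (64, 500, 100)]:
        print(f"{size} KB: {estimate_rus(size, reads, writes):,.0f} RU/s")
```

For workloads with other item sizes, the capacity planner linked above is the better source; this sketch only covers the three sizes in the table.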

Slide 9

Slide 9 text

For example
• SELECT * FROM c (2.87 RU)
• SELECT * FROM c WHERE CONTAINS(c.Name, "Sample") (2.45 RU)

Slide 10

Slide 10 text

DEMO Calculating RU On-The-Fly

Slide 11

Slide 11 text

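// T is assumed to be the type parameter of the enclosing generic repository class;
// the angle-bracketed type arguments were lost when the slide text was extracted.
public static async Task<IEnumerable<T>> GetItemsAsync(Expression<Func<T, bool>> predicate)
{
    double queryCost = 0;
    IDocumentQuery<T> query = client.CreateDocumentQuery<T>(
            UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId),
            new FeedOptions { MaxItemCount = -1 })
        .Where(predicate)
        .AsDocumentQuery();
    List<T> results = new List<T>();
    while (query.HasMoreResults)
    {
        // Each page reports its own request charge; sum the charges for the total query cost.
        var response = await query.ExecuteNextAsync<T>();
        queryCost += response.RequestCharge;
        results.AddRange(response);
    }
    Debug.WriteLine(queryCost.ToString());
    return results;
}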

Slide 12

Slide 12 text

Request Unit Management
                     Single Partition Container   Partitioned Container
Minimum Throughput   400 RU/sec                   1,000 RU/sec
Maximum Throughput   10,000 RU/sec                Unlimited
// Look up the collection's current offer (its provisioned throughput).
Offer offer = client.CreateOfferQuery()
    .Where(r => r.ResourceLink == collection.SelfLink)
    .AsEnumerable()
    .SingleOrDefault();
// Replace it with a new offer provisioned at 12,000 RU/s.
offer = new OfferV2(offer, 12000);
await client.ReplaceOfferAsync(offer);
"A partition key is required to scale your collection's throughput beyond 2,500 request units in the future."

Slide 13

Slide 13 text

Factors that affect RU consumption:
• Item size
• Item property count (indexing)
• Data consistency (Strong or Bounded Staleness)
• Indexed properties (lazy indexing can help)
• Document indexing (disable it if you don't need it)
• Query patterns (predicates, UDFs, data source size)
• Script usage (stored procedures, triggers)

Slide 15

Slide 15 text

What if you exceed?
HTTP Status: 429
Status Line: RequestRateTooLarge
x-ms-retry-after-ms: 100
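A client that receives this response is expected to wait for the interval in `x-ms-retry-after-ms` before retrying. The following Python sketch shows that pattern in isolation. The `RequestRateTooLarge` exception and `call_with_retry` helper are hypothetical stand-ins, not SDK types, and the official SDKs typically perform a similar retry for you.

```python
import time


class RequestRateTooLarge(Exception):
    """Stand-in for an HTTP 429 response; carries the server's retry hint."""

    def __init__(self, retry_after_ms):
        super().__init__("429 RequestRateTooLarge")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation(), waiting out 429s for the interval the server asks for."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RequestRateTooLarge as err:
            if attempt == max_attempts:
                raise  # give up and surface the throttling error
            sleep(err.retry_after_ms / 1000.0)  # header value is in milliseconds
```

The `sleep` parameter is injectable so the wait can be replaced in tests. Frequent 429s are usually a sign that the provisioned RU/s is too low for the workload.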

Slide 16

Slide 16 text

Partitioning Partition schema is immutable

Slide 17

Slide 17 text

Reading data with partitions.

Slide 18

Slide 18 text

Reading data with partitions.
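To make the difference between single-partition and cross-partition reads concrete, here is a small Python sketch. It assumes a fixed number of partitions and uses MD5 as a stand-in hash. Cosmos DB's actual hashing and partition management are internal to the service, so `PARTITION_COUNT`, `partition_for`, and `partitions_to_query` are purely illustrative.

```python
import hashlib

PARTITION_COUNT = 4  # hypothetical number of physical partitions


def partition_for(partition_key, partition_count=PARTITION_COUNT):
    """Map a partition key value to a partition index with a stable hash."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partitions_to_query(partition_key=None, partition_count=PARTITION_COUNT):
    """Return the partitions a read must visit.

    With a partition key the read goes to exactly one partition; without one
    it fans out to every partition (a cross-partition query).
    """
    if partition_key is None:
        return list(range(partition_count))
    return [partition_for(partition_key, partition_count)]
```

A read that supplies the partition key touches one partition, while a read without it has to visit all of them, which is why queries filtered on the partition key tend to be cheaper.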

Slide 19

Slide 19 text

How to detect Hot Partitions?
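The slides that followed here were screenshots, so the speaker's exact method is not recoverable from the text. As one possible heuristic, the Python sketch below flags partitions whose RU consumption is well above the average. The function name, the input shape, and the `threshold` of 2x are all assumptions, and the per-partition numbers would have to come from your own metrics.

```python
def find_hot_partitions(ru_by_partition, threshold=2.0):
    """Return partition ids whose RU usage exceeds threshold x the mean usage."""
    if not ru_by_partition:
        return []
    mean = sum(ru_by_partition.values()) / len(ru_by_partition)
    if mean == 0:
        return []
    return sorted(
        pid for pid, used in ru_by_partition.items() if used > threshold * mean
    )


if __name__ == "__main__":
    usage = {"p0": 100, "p1": 120, "p2": 900, "p3": 80}  # made-up sample data
    print(find_hot_partitions(usage))
```

A partition that shows up repeatedly usually points to a partition key with too few distinct values or a skewed access pattern.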

Slide 22

Slide 22 text

Choose Your Consistency Level
Strong • Bounded Staleness • Session • Consistent Prefix • Eventual
Moving from Strong toward Eventual: lower latency, higher availability, better read scalability.
Clear Tradeoffs
• Latency
• Availability
• Throughput

Slide 23

Slide 23 text

Bounded Staleness
When choosing bounded staleness, the "staleness" can be configured in two ways:
• the number of versions K of the item by which reads lag behind writes
• the time interval t

Slide 24

Slide 24 text

Consistent Prefix
Consistent prefix guarantees that reads never see out-of-order writes. If writes were performed in the order A, B, C, then a client sees either A; A, B; or A, B, C, but never an out-of-order sequence such as A, C or B, A, C.
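The guarantee amounts to a prefix check: whatever a client observes must be a leading portion of the write sequence. A minimal Python sketch, with `is_consistent_prefix` as an illustrative name rather than anything from the service or its SDKs:

```python
def is_consistent_prefix(write_order, observed):
    """True if `observed` is a prefix of `write_order` (no gaps, no reordering)."""
    return list(observed) == list(write_order[: len(observed)])


if __name__ == "__main__":
    writes = ["A", "B", "C"]
    for seen in (["A"], ["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "A", "C"]):
        print(seen, is_consistent_prefix(writes, seen))
```

The first three observations pass and the last two fail, matching the examples on the slide.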

Slide 25

Slide 25 text

Multi-Model API

Slide 26

Slide 26 text

Native Support for Multiple Data Models
• Database engine operates on an atom-record-sequence (ARS) based type system
• All data models are efficiently translated to ARS
• API and wire protocols are supported via extensible modules
• An instance of a given data model can be materialized as trees
• Graph, documents, key-value, column-family, … more to come
Data models shown: Key-Value, Column-Family, Document, Graph

Slide 28

Slide 28 text

Auto Indexing
• IndexingMode
  • Consistent (Collection Consistency applies)
  • Lazy (ingest now, query later)
  • None (EnableScanInQuery)
• DataTypes
  • String, Number, Point, Polygon
• Index Types
  • Hash (joins)
  • Range (<, >)
  • Spatial
• Precision can be defined.
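As an illustration of how these options fit together, the Python sketch below builds an indexing policy as a plain dictionary and prints it as JSON. It assumes the policy shape used by the DocumentDB-era API (`indexingMode`, `includedPaths`, per-path `indexes` with `kind`, `dataType`, and `precision`); the specific values are examples only, and the format should be checked against the current service documentation before use.

```python
import json

# Hypothetical indexing policy in the JSON shape the DocumentDB-era API used.
indexing_policy = {
    "indexingMode": "consistent",  # alternatives on the slide: "lazy", "none"
    "automatic": True,
    "includedPaths": [
        {
            "path": "/*",  # apply to every property in the document
            "indexes": [
                # Range index on numbers supports <, > and ORDER BY.
                {"kind": "Range", "dataType": "Number", "precision": -1},
                # Hash index on strings supports equality lookups.
                {"kind": "Hash", "dataType": "String", "precision": 3},
                # Spatial index on GeoJSON points.
                {"kind": "Spatial", "dataType": "Point"},
            ],
        }
    ],
    "excludedPaths": [],
}

if __name__ == "__main__":
    print(json.dumps(indexing_policy, indent=2))
```

Narrowing `includedPaths` or adding `excludedPaths` is the usual lever for reducing the write cost that indexing adds.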

Slide 29

Slide 29 text

4 Axis SLA
• Latency SLA (at the 99th percentile)
• Throughput SLA
• Consistency SLA
• Availability SLA
Cosmos DB: 99.99% HA within a single region, 99.999% across regions. 99.99% SLA for throughput, latency, and consistency, all at the 99th percentile.
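To put the availability figures in perspective, the percentages can be converted into allowed downtime per year. A short Python sketch, assuming a 365-day year; `allowed_downtime_minutes` is an illustrative helper, and actual SLA accounting follows the terms of the service agreement rather than this simple formula.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes per year a service may be unavailable at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100.0)


if __name__ == "__main__":
    for sla in (99.99, 99.999):
        print(f"{sla}%: {allowed_downtime_minutes(sla):.1f} minutes/year")
```

That works out to roughly 52.6 minutes per year at 99.99% and about 5.3 minutes per year at 99.999%.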

Slide 30

Slide 30 text

Connecting via the Gateway (pyDocumentDB)

Slide 31

Slide 31 text

Connecting via the Data Nodes (azure-cosmosdb-spark)

Slide 32

Slide 32 text

Pushdown Predicate Filtering

Slide 33

Slide 33 text

Advantages: Blazing Fast IoT Scenarios

Slide 34

Slide 34 text

Updateable Columns

Slide 35

Slide 35 text

Updateable Columns

Slide 36

Slide 36 text

DEMO Accessing change feed through Cosmos DB Java SDK and Spark

Slide 37

Slide 37 text

https://github.com/Azure/azure-cosmosdb-spark

Slide 38

Slide 38 text

Change Feed Tracking Sample Scenario

Slide 39

Slide 39 text

Security • Documents and backups are encrypted at rest • IP-based access controls • Role-based access controls • Automated online backups • Attack monitoring • Geo-fencing

Slide 40

Slide 40 text

Disclaimer
• Cosmos DB is not a SQL database; there are no complex table joins. (If you need them, you are doing it wrong.)
• Other NoSQL databases do one or two things really well but are not native to the cloud.

Slide 41

Slide 41 text

Thanks! Slides: http://daron.me/decks Daron Yöndem http://daron.me | @daronyondem