Cosmos DB Operations

Cosmos DB Operations email : [email protected] Twitter - @azurecosmosdb

Managing Database Operations
• Performance management
  • Provisioned throughput is guaranteed by Cosmos DB
  • Latency for point reads/writes of a 1 KB item is guaranteed
• Capacity management
  • Cosmos DB is elastic for both your throughput and storage needs
• Availability management
  • Cosmos DB can distribute data for low-latency reads and availability

What to monitor
• Throughput
  • Throttles
  • Metrics: distribution of throughput
• Storage
  • Metrics: distribution of data
• Latency
  • Metrics: server latency
• Consistency/Availability: as per SLA

Performance
• Throughput management
  • RU (Request Unit) is the budget, provisioned per second
  • Throughput is equally distributed across partition key ranges
  • The number of partition key ranges is transient and changes to accommodate increased data
  • Throughput increases automatically when data increases, to serve that data
  • Every CRUD operation uses RU
  • TTL operation does not use RU
  • SQL query RU can change based on the number of entities that match the filter condition
  • Point lookup/write RU stays constant when you supply the partition key and the exact id
• Scale up/down
  • Scheduled
  • Web job code: CLI/SDK
  • Portal
• Alerts
  • Monitor (soon to have collection id)
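The point about throughput being split equally across partition key ranges can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names and the numbers are assumptions, not part of any Cosmos DB SDK.

```python
def per_range_throughput(provisioned_ru: float, range_count: int) -> float:
    """Share of the provisioned RU/s available to one partition key range."""
    if range_count <= 0:
        raise ValueError("range_count must be positive")
    return provisioned_ru / range_count


def is_rate_limited(range_demand_ru: float, provisioned_ru: float, range_count: int) -> bool:
    """True when the demand on a single range exceeds that range's equal share."""
    return range_demand_ru > per_range_throughput(provisioned_ru, range_count)


if __name__ == "__main__":
    # Hypothetical container: 10,000 RU/s spread over 5 partition key ranges.
    print(per_range_throughput(10_000, 5))
    # A range receiving 3,000 RU/s of demand is over its share even though
    # the container as a whole is far below 10,000 RU/s.
    print(is_rate_limited(3_000, 10_000, 5))
```

This is why a skewed key can be throttled while the container-level metric still looks healthy.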

Performance – Throughput
• Throughput management
  • Throttling: the SDK handles throttled requests by retrying them automatically
  • This behavior can be overridden
  • Side effect of automatic retry: higher perceived latency
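To see why automatic retries show up as extra latency, here is a minimal sketch of a retry loop that waits for the server-suggested interval before trying again. It does not use the real SDK: `ThrottledError`, `call_with_retries`, and the limits of 9 retries and 30 seconds are assumptions chosen for illustration.

```python
import time
from typing import Any, Callable, Tuple


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limited (HTTP 429) response."""

    def __init__(self, retry_after_ms: int):
        super().__init__(f"throttled, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retries(
    operation: Callable[[], Any],
    max_retries: int = 9,        # illustrative limit, not read from any SDK
    max_wait_s: float = 30.0,    # illustrative cumulative wait budget
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Any, int, float]:
    """Run operation, retrying on throttling.

    Returns (result, retries_used, seconds_spent_waiting).
    """
    retries = 0
    waited = 0.0
    while True:
        try:
            return operation(), retries, waited
        except ThrottledError as err:
            delay = err.retry_after_ms / 1000.0
            # Give up once either budget would be exceeded.
            if retries >= max_retries or waited + delay > max_wait_s:
                raise
            sleep(delay)
            waited += delay
            retries += 1
```

The third element of the return value is the latency the caller perceives on top of the operation itself. Lowering `max_retries` trades that hidden latency for errors that surface sooner.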

Performance – SQL API – query execution metrics
• https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query-metrics#query-execution-metrics

Performance
• Metadata requests
  • Use the canonicalized model to refer to “resources”
  • Do not query for them frequently: cache the reference

Performance
• Client-side log/ETL
  • For debugging retries or other issues, such as an unreachable host
  • Do not leave it switched on indefinitely
  • Check for high CPU on the client as the first measure: https://github.com/Azure/azure-cosmosdb-java#prerequisites

Performance
• Client-side latency
  • Latency is a function of
    • the operation itself, or
    • automatic retries
  • It can also be caused by
    • MaxDegreeOfParallelism
    • MaxBufferedItemCount
    • MaxItemCount
  • Colocate the client in the same datacenter as the Cosmos DB account
  • Use a static instance of the DocumentClient
  • Follow the performance tips: https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips , https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips-java

Performance
• Index management
  • Automatic indexing
  • Range can do hash’s job
  • Hash is useful for contains queries over arrays
• Indexing policy
  • Do not use lazy indexing
  • If you want to query on id, create another attribute with the same value and index it

Performance – Index
• Disable the index if all you need is a key-value store in the SQL API
• Id can be the partition key
• Index only what is required

Performance
• You found a performance issue
  • A query/operation is taking longer
• Options
  • Log time/RU yourself to Application Insights
  • Look at Log Analytics data, which arrives with a slight delay
• Log Analytics
  • Which query takes more RU
  • Which query/operation takes more time
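Logging time and RU yourself can be as small as a wrapper around each call. The sketch below assumes a hypothetical `operation` callable that returns `(result, headers)`, with the request charge exposed under the `x-ms-request-charge` response header; shipping the records to a telemetry service is left out.

```python
import time
from typing import Any, Callable, Dict, List, Mapping, Tuple

Record = Dict[str, Any]


def measure(
    name: str,
    operation: Callable[[], Tuple[Any, Mapping[str, str]]],
    log: List[Record],
    clock: Callable[[], float] = time.perf_counter,
) -> Any:
    """Run operation, then append its elapsed time and RU charge to log."""
    start = clock()
    result, headers = operation()
    elapsed_ms = (clock() - start) * 1000.0
    # Assumption: the caller surfaces response headers as a plain mapping.
    charge = float(headers.get("x-ms-request-charge", 0.0))
    log.append({"name": name, "elapsed_ms": elapsed_ms, "request_charge": charge})
    return result


def most_expensive(log: List[Record], key: str, n: int = 3) -> List[Record]:
    """Top-n records by 'request_charge' or 'elapsed_ms'."""
    return sorted(log, key=lambda record: record[key], reverse=True)[:n]
```

Sorting the same log by `request_charge` and by `elapsed_ms` answers the two questions on the slide: which query takes more RU, and which takes more time.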

Performance Summary
• In order of preference (latency and throughput)
  • GET
  • Single-partition query
  • Cross-partition query
  • Read feed (or) scan query
• Bulk insert (stored procedure) > POST > PUT
• TTL delete > Bulk delete (stored procedure) > DELETE > PUT
• Use change feed!
• Stored procedures are good for writing bulk/batch data in a transactional manner (do not use them for reads or bulk reads). Client reads will always get you more bang for the buck.

Storage management
• No capacity management
  • The platform takes care of storage growth and the growth in request units that it requires
• Ensure there is no data skew
  • Use metrics to detect skew, but design to avoid it
  • Ensure the right partition key
  • Rebuild the container
• Use TTL to expire old data
• Use change feed + Azure Functions to move data
• What if you need a different partition key for the same data (secondary indexing)?
  • Use change feed to populate another collection with a different partition key

Storage – large documents – how to
• Large documents
  • Consume high RUs due to I/O and indexing overhead
  • Can fill the partition key storage quota
  • Lead to rate limiting
• Patterns to manage large documents
  • Store large attributes in a separate linked document/collection
  • Store large attributes in Azure Blob Storage
  • Compress these attributes
  • Use a custom indexing policy that disables indexing on a subset of properties
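The "compress these attributes" pattern could look like the sketch below, which packs one large attribute into a compressed, base64-encoded string before the document is written. The `_z` suffix and the helper names are assumptions made for this example.

```python
import base64
import json
import zlib
from typing import Any, Dict


def compress_attribute(doc: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return a copy of doc with attribute `name` replaced by `name + '_z'`."""
    out = dict(doc)
    raw = json.dumps(out.pop(name)).encode("utf-8")
    # base64 keeps the compressed bytes JSON-safe at roughly 33% size overhead.
    out[name + "_z"] = base64.b64encode(zlib.compress(raw)).decode("ascii")
    return out


def decompress_attribute(doc: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Inverse of compress_attribute."""
    out = dict(doc)
    packed = out.pop(name + "_z")
    out[name] = json.loads(zlib.decompress(base64.b64decode(packed)))
    return out
```

A compressed attribute can no longer be filtered on in a query, so this fits payloads that are only read back whole. Excluding the packed property from the indexing policy avoids paying to index an opaque string.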

Storage – large partition keys > 10 GB
• Common scenarios
  • Multi-tenant applications where a few tenants are very large
  • A router publishes telemetry at a higher rate than sensors
  • A celebrity in a social networking app, a viral gaming tournament
• Patterns to manage large partition keys
  • Have a surrogate partition key like tenant ID + 0-100
  • Use a hybrid partitioning scheme: tenant ID alone for small tenants, tenant ID + 0-100 for large tenants
  • Move large tenants to their own collections
  • If the per-document size is large, use the patterns for large documents
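A surrogate key of the form "tenant ID + bucket" might be derived as in the sketch below. The bucket count of 100, the `-` separator, and hashing the document id to pick the bucket are all assumptions. A stable hash is used because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib
from typing import List


def surrogate_partition_key(tenant_id: str, document_id: str, buckets: int = 100) -> str:
    """Spread one tenant's documents over `buckets` partition key values."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return f"{tenant_id}-{bucket}"


def all_partition_keys(tenant_id: str, buckets: int = 100) -> List[str]:
    """Every key value a tenant-wide query has to fan out over."""
    return [f"{tenant_id}-{bucket}" for bucket in range(buckets)]
```

The trade-off is on the read side. A point read can recompute the key from the document id, but a query over the whole tenant has to cover every value returned by `all_partition_keys`.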

Storage – hot partition keys
• A subset of keys is accessed much more frequently than the others
  • Popular item in a retail catalog, common driver defect in Windows DnA telemetry
• Patterns to manage hot partition keys
  • Secondary cache collection with just the hot keys
  • Scale out across regions to isolate read and write RUs
  • Reduce RU consumption by converting critical-path queries to GETs
  • Materialized views for aggregates like COUNT into a document
  • Materialized view for latest state or a leaderboard into a document
  • Why? Amortize the cost at write time instead of read time
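The materialized-view idea for COUNT can be sketched in memory: each write updates a small counter document, so a read becomes a cheap lookup instead of an aggregate query. The example assumes an insert-only stream (a feed that also delivers updates would need de-duplication) and a hypothetical `category` field to group by.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def apply_inserts(
    view: Counter,
    inserted_docs: Iterable[Dict[str, Any]],
    group_by: str = "category",
) -> Counter:
    """Fold newly inserted documents into the per-group count view."""
    for doc in inserted_docs:
        view[doc[group_by]] += 1
    return view


def read_count(view: Counter, group: str) -> int:
    """Read-time cost is a single lookup, whatever the data volume."""
    return view[group]


if __name__ == "__main__":
    view = Counter()
    apply_inserts(view, [{"id": "1", "category": "shoes"}, {"id": "2", "category": "shoes"}])
    print(read_count(view, "shoes"))
```

In a real deployment the view would live in its own document and be fed by the change feed; that wiring is not shown here.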

Availability management
• Always add geo DR
• 99.99% within a region, 99.999% for reads
• Data is available in read regions for low-latency read workloads (change feed or just reads)
• Data abides by the consistency level provided
• Auto-homing SDK
• Use manual failover for DR testing or for follow-the-sun
• Leverage consistency settings to make the most of throughput (if required): strong consistency costs roughly 2x, so do reads at a lower consistency level

Error codes
• HTTP status codes: https://docs.microsoft.com/en-us/rest/api/documentdb/http-status-codes-for-documentdb
• 200
• 400
• 401/403 ..
• 404/413
• 429: the SDK will handle it
  • Retry policy: default
  • Override it if needed
• 500: file a support request
• 503: retryable
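The status-code guidance above can be turned into a small decision helper. The action labels are invented for this sketch, and treating every other 4xx as a request to fix on the client side is an assumption rather than something the slide states.

```python
def next_action(status: int) -> str:
    """Map an HTTP status code to the follow-up suggested on the slide."""
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        # Rate limited: normally retried by the SDK's default retry policy.
        return "retry-after-backoff"
    if status == 503:
        return "retry"
    if status == 500:
        return "file-support-request"
    if 400 <= status < 500:
        # Assumption: remaining client errors (400, 401/403, 404/413) need a
        # change on the caller's side rather than a blind retry.
        return "fix-request"
    return "investigate"


if __name__ == "__main__":
    for code in (200, 400, 403, 413, 429, 500, 503):
        print(code, next_action(code))
```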

Other
• Bulk load
  • Increase RU, shuffle the data, push in parallel
  • Use a tool that understands distributed databases
    • ODBC, for example, would not be a good way to connect
  • A tool/service is in the offing: reach out for the bulk load tool in Java/.NET
• Backup
  • Automatic: two snapshots, taken every 4 hours (for the "oops, I deleted it" scenario)
  • Restore on demand via a support call
• Create a copy of the database
  • Change feed + Azure Functions
• Paging
  • Continuation tokens in Cosmos DB never expire. Keep the continuation token corresponding to pages 1, 2, 3 and 4, so you can execute the query again to go back to that page. (WIP)
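The paging note relies on keeping the continuation token that leads to each page. The sketch below simulates that over an in-memory list: `query_page` is a hypothetical stand-in for a paged query, and its token is simply an offset, whereas a real continuation token is an opaque string returned by the service.

```python
from typing import Dict, List, Optional, Tuple


def query_page(
    items: List[int], page_size: int, continuation: Optional[str] = None
) -> Tuple[List[int], Optional[str]]:
    """Return one page plus the token for the next page (None at the end)."""
    start = int(continuation) if continuation is not None else 0
    page = items[start:start + page_size]
    next_start = start + page_size
    token = str(next_start) if next_start < len(items) else None
    return page, token


def walk_and_remember(items: List[int], page_size: int) -> Dict[int, Optional[str]]:
    """Walk all pages once, recording the token that opens each page."""
    tokens: Dict[int, Optional[str]] = {1: None}  # page 1 needs no token
    page_number = 1
    token: Optional[str] = None
    while True:
        _, token = query_page(items, page_size, token)
        if token is None:
            return tokens
        page_number += 1
        tokens[page_number] = token
```

Going back to page 2 is then `query_page(items, page_size, tokens[2])`, with no need to replay page 1.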

Managing Database Operations
• Performance
  • Throughput: choose the right partition key, operations and data model
    • T = reads + writes + updates + deletes + queries
  • Latency: ensure queries target the right partitions
    • GET < single partition < multiple partitions
• Storage management
  • Choose the right partition keys: query fan-out over all data vs. data size
  • Colocate data with the partition key
• Availability management
  • Always add geo-redundancy with one click
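The formula T = reads + writes + updates + deletes + queries can be worked through as a weighted sum of operations per second and RU per operation. Every rate and cost in the sketch below is a made-up placeholder; real per-operation charges have to be measured on your own workload.

```python
from typing import Dict, Tuple


def required_throughput(workload: Dict[str, Tuple[float, float]]) -> float:
    """Sum of (operations per second x RU per operation) over all operation types."""
    return sum(rate * cost for rate, cost in workload.values())


if __name__ == "__main__":
    # Hypothetical workload: (operations per second, RU per operation).
    workload = {
        "reads": (500, 1.0),
        "writes": (100, 5.0),
        "updates": (50, 10.0),
        "deletes": (10, 5.0),
        "queries": (20, 25.0),
    }
    print(required_throughput(workload))
```

Swapping a query for a point read in this table shows directly how much provisioned throughput the change would free up.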

Other
• Access to account-level activities can be controlled
• Log of activities in Log Analytics
• Token-based, time-limited access to resources
• Data at rest is encrypted
• Data in motion is encrypted

Links
• Metrics: https://docs.microsoft.com/en-us/azure/cosmos-db/use-metrics
• Diagnostic logging: https://docs.microsoft.com/en-us/azure/cosmos-db/logging
• Set throughput: https://docs.microsoft.com/en-us/azure/cosmos-db/set-throughput
• Access control: https://docs.microsoft.com/en-us/azure/cosmos-db/access-control
• Failover: https://docs.microsoft.com/en-us/azure/cosmos-db/regional-failover
• TTL: https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live
• Indexing: https://docs.microsoft.com/en-us/azure/cosmos-db/indexing-policies
• Change feed: https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed
• SQL query perf: https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query-metrics
• Partitioning: https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-partition-data
• Modelling: https://docs.microsoft.com/en-us/azure/cosmos-db/modeling-data
• Perf tips: https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips
• Throughput: https://docs.microsoft.com/en-us/azure/cosmos-db/request-units
• Azure CLI: https://docs.microsoft.com/en-us/azure/cosmos-db/cli-samples

Learn more
• www.azurecosmosdb.com
• GLOBAL APPS NEED GLOBAL DATA FROM A SERVICE THAT’S OUT OF THIS WORLD. WELCOME TO AZURE COSMOS DB
• Sign up to Azure for free: https://aka.ms/azureaccount
• Try Azure Cosmos DB: https://aka.ms/tryazurecosmosdb
• Join next week's session to learn how to build serverless apps, and register: https://aka.ms/CosmosDBlearn

Thank you for joining us.


Govind Kanshi
