
Cosmos DB Operations

Azure Cosmos DB is a globally distributed, multi-model database with SLAs for throughput, low read/write latency, and consistency. Operating Cosmos DB is a breeze because capacity management, performance management, and availability management are all taken care of by the platform. Model the data with partitioning for scale-out in mind and provision the right throughput, and there is little else you have to do.

Govind Kanshi

March 15, 2018

Transcript

  1. Managing Database Operations • Performance management • Provisioned throughput is guaranteed by Cosmos DB • Latency for point reads/writes of 1 KB is guaranteed • Capacity management • Cosmos DB is elastic for both your throughput and storage needs • Availability management • Cosmos DB provides the ability to distribute data for low-latency reads and availability
  2. What to monitor • Throughput • Throttles • Metrics: distribution of throughput • Storage • Metrics: distribution of data • Latency • Metrics: server latency • Consistency/Availability – as per SLA
  3. Performance • Throughput management • RU – a Request Unit is the budget per second • Throughput is distributed equally across partition key ranges • The number of partition key ranges is transient and changes to accommodate increased data • Throughput increases automatically when data increases to serve that data • Every CRUD operation uses RUs • TTL operations do not use RUs • SQL query RUs can change based on the number of entities matching the filter condition • Point lookup/write RUs never change when the partition key and exact id are supplied • Scale up/down • Scheduled • Web job code – CLI/SDK • Portal • Alerts • Monitor (soon to have collection id)
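
The point about throughput being split equally across partition key ranges can be sketched with a little arithmetic. This is a minimal illustration, not service code: the function names and the numbers in the usage note are assumptions chosen for the example.

```python
def per_range_throughput(provisioned_ru_per_sec: float, partition_key_ranges: int) -> float:
    """RU/s available to each partition key range under an equal split."""
    if partition_key_ranges < 1:
        raise ValueError("at least one partition key range is required")
    return provisioned_ru_per_sec / partition_key_ranges


def range_is_throttled(range_demand_ru_per_sec: float,
                       provisioned_ru_per_sec: float,
                       partition_key_ranges: int) -> bool:
    """True when demand on a single range exceeds that range's share of the budget."""
    share = per_range_throughput(provisioned_ru_per_sec, partition_key_ranges)
    return range_demand_ru_per_sec > share
```

For example, with 10,000 RU/s over 5 ranges each range gets 2,000 RU/s, so a hot key that needs 3,000 RU/s on one range is throttled even though the collection as a whole is under budget.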
  4. Performance – Throughput • Throughput management • Throttling – the SDK handles throttled requests by retrying them automatically • This behavior can be overridden • Side effect of automatic retry – higher perceived latency
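
The retry behavior and its latency side effect can be sketched as follows. This is a simplified stand-in, not the SDK's implementation: `ThrottledError`, `call_with_retries`, and the default of 9 retries are hypothetical names and values used only for illustration.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 response carrying a retry-after hint."""

    def __init__(self, retry_after_ms: int):
        super().__init__(f"429: retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retries(operation, max_retries: int = 9, sleep=time.sleep):
    """Run operation(), waiting out throttles; return (result, seconds spent waiting)."""
    waited = 0.0
    for attempt in range(max_retries + 1):
        try:
            return operation(), waited
        except ThrottledError as err:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the throttle to the caller
            delay = err.retry_after_ms / 1000.0
            sleep(delay)  # this wait is the extra latency the caller perceives
            waited += delay
```

The returned wait time makes the trade-off visible: the call succeeds, but the caller sees the operation latency plus every retry delay.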
  5. Performance • Metadata requests • Use the canonicalized model to refer to “resources” • Do not query for them frequently – cache the reference
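
One way to read the advice above: build the id-based link (of the form `dbs/<database>/colls/<collection>`) once and cache whatever metadata you fetch for it, instead of querying for the resource on every request. The sketch below assumes a hypothetical `fetch` callable standing in for the metadata read; `ResourceCache` is an illustrative name, not an SDK class.

```python
from typing import Callable, Dict


def collection_link(database_id: str, collection_id: str) -> str:
    """Build an id-based link to a collection without querying the service."""
    return f"dbs/{database_id}/colls/{collection_id}"


class ResourceCache:
    """Caches resource metadata per link so it is fetched at most once."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch  # hypothetical metadata read, e.g. a collection lookup
        self._cache: Dict[str, dict] = {}

    def get(self, link: str) -> dict:
        if link not in self._cache:
            self._cache[link] = self._fetch(link)
        return self._cache[link]
```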
  6. Performance • Client-side log/etl • For debugging retries or other issues such as an unreachable host • Do not leave it switched on indefinitely • Look for high CPU on the client as the first measure. https://github.com/Azure/azure-cosmosdb-java#prerequisites
  7. Performance • Client-side latency • Latency is a function of • the operation or • automatic retries • Also possible because of • MaxDegreeOfParallelism • MaxBufferedItemCount • MaxItemCount • Colocate the client in the same datacenter as the Cosmos DB account • Use a static instance of the DocumentClient • Follow the performance tips - https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips , https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips-java
  8. Performance • Index management • Automatic indexing • Range can do hash’s job • Hash useful for contains query over array • Indexing policy • Do not use Lazy indexing • If you want to query on id • Create another attribute with the same value and index it
  9. Performance – Index • Disable the index if all you need is a KV store in the SQL API • Id can be the partition key • Index only what is required
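
The two indexing choices above can be sketched as indexing policy documents. The property names follow the indexing policy JSON described at the Indexing link in the Links slide, as I understand that format; the `/customerId` path is a hypothetical example, and the policies should be checked against the current documentation before use.

```python
import json

# Key-value usage: no index at all, lookups go by partition key and id only.
KV_STORE_POLICY = {
    "indexingMode": "none",
    "automatic": False,
}

# Opt-in indexing: exclude everything, then include only the queried path.
SELECTIVE_POLICY = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {
            "path": "/customerId/?",  # hypothetical property used in queries
            "indexes": [{"kind": "Range", "dataType": "String", "precision": -1}],
        }
    ],
    "excludedPaths": [{"path": "/*"}],
}


def policy_json(policy: dict) -> str:
    """Serialize a policy for use in a create/replace collection request."""
    return json.dumps(policy, indent=2)
```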
  10. Performance • You found a performance issue • Query/operation taking longer • Options • Log time/RU yourself to Application Insights • Look at the slightly delayed Log Analytics data • Log Analytics • Which query takes more RUs • Which query/operation takes more time
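
Logging time and RU yourself can be sketched with a small wrapper. The sketch assumes the operation returns a `(result, headers)` pair and reads the charge from the `x-ms-request-charge` response header; the `measure` helper and the logger name are hypothetical, and forwarding the record to Application Insights is left out.

```python
import logging
import time

logger = logging.getLogger("cosmos.perf")


def measure(name: str, operation):
    """Time operation() and record its RU charge; returns (result, record)."""
    start = time.perf_counter()
    result, headers = operation()  # assumed to return (result, response headers)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    record = {
        "operation": name,
        "elapsed_ms": elapsed_ms,
        "request_charge": float(headers.get("x-ms-request-charge", 0.0)),
    }
    logger.info("cosmos_op %s", record)
    return result, record
```

Keeping the operation name in the record is what later lets you answer "which query takes more RUs" and "which operation takes more time" by grouping on it.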
  11. Performance Summary • In order of preference (latency and throughput) • GET • Single-partition query • Cross-partition query • Read feed (or) scan query • Bulk Insert (SP) > POST > PUT • TTL Delete > Bulk Delete (SP) > DELETE > PUT • Use change feed! • Stored procs are good for writing bulk/batch data in a transactional manner (do not use them for reads/bulk reads). Client reads will always get you more bang for the buck.
  12. Storage management • No capacity management • The platform takes care of growth and the required growth of request units • Ensure there is no data skew • Use metrics to detect skew, but design to avoid it • Ensure the right partition key • Rebuild the container • Use TTL to expire stuff • Use change feed + Azure Functions to move data • What if you need a different partition key for the same data (secondary indexing)? • Use change feed to populate another collection with a different partition key
  13. Storage – large documents – how to • Large documents • Consume high RUs due to IOs and indexing overhead • Lead to a full partition key quota • Lead to rate-limiting • Patterns to manage large documents • Storing large attributes in a separate linked document/collection • Storing large attributes in Azure Blob Storage • Compress these attributes • Custom indexing policy, disabled on a subset of properties
  14. Storage – large partition keys > 10 GB • Common scenarios: • Multi-tenant applications where a few tenants are very large • Router publishes telemetry at a higher rate than sensors • Celebrity in a social networking app, viral gaming tournament • Patterns to manage large partition keys • Have a surrogate partition key like tenant ID + 0-100 • Use a hybrid partitioning scheme: plain tenant ID for small tenants, and tenant ID + 0-100 for large tenants • Move large tenants to their own collections • If the per-document size is large, use the patterns for large documents
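
The surrogate and hybrid key patterns above can be sketched as below. The hashing choice, the `-` separator, the set of large tenants, and the use of 100 buckets (suffixes 0 to 99) are all assumptions for illustration; the slide only says "tenant ID + 0-100".

```python
import hashlib
from typing import List, Set


def surrogate_partition_key(tenant_id: str, document_id: str,
                            large_tenants: Set[str], buckets: int = 100) -> str:
    """Small tenants keep their tenant ID; large tenants are spread over buckets."""
    if tenant_id not in large_tenants:
        return tenant_id
    # A stable hash of the document id keeps the same document in the same bucket.
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{tenant_id}-{bucket}"


def keys_to_query(tenant_id: str, large_tenants: Set[str], buckets: int = 100) -> List[str]:
    """Partition keys a tenant-wide query must cover (fan-out for large tenants)."""
    if tenant_id not in large_tenants:
        return [tenant_id]
    return [f"{tenant_id}-{b}" for b in range(buckets)]
```

The trade-off is visible in `keys_to_query`: writes for a large tenant spread over many keys, but a tenant-wide read now has to fan out across all of them.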
  15. Storage – hot partition keys • A subset of keys is accessed much more frequently than others • Popular item in a retail catalog, common driver defect in Windows DnA telemetry • Patterns to manage hot partition keys • Secondary cache collection with just the hot keys • Scale out across regions for isolating read and write RUs • Reduce RU consumption by converting critical-path queries to GETs • Materialized views for aggregates like COUNT into a document • Materialized view for latest state, leaderboard into a document • Why? Amortize cost at write time vs. read time
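
The materialized-view idea, paying a little at write time so the read becomes a single point lookup, can be sketched in memory. The `OrderStore` class, its field names, and the order/customer domain are hypothetical; in practice the summary would be a document kept up to date by the writer or by a change feed consumer.

```python
class OrderStore:
    """In-memory sketch: each write also updates a per-customer summary document."""

    def __init__(self):
        self.documents = {}   # (customer_id, order_id) -> order document
        self.summaries = {}   # customer_id -> materialized summary document

    def write_order(self, customer_id: str, order_id: str, total: float) -> None:
        is_new = (customer_id, order_id) not in self.documents
        self.documents[(customer_id, order_id)] = {"id": order_id, "total": total}
        summary = self.summaries.setdefault(
            customer_id,
            {"id": f"summary-{customer_id}", "orderCount": 0, "latestOrderId": None},
        )
        if is_new:
            summary["orderCount"] += 1  # COUNT is maintained at write time
        summary["latestOrderId"] = order_id  # latest state is kept alongside it

    def order_count(self, customer_id: str) -> int:
        """Read side: one lookup of the summary instead of an aggregate query."""
        return self.summaries.get(customer_id, {}).get("orderCount", 0)
```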
  16. Availability management • Always add geo DR • 99.99% within a region, 99.999% for reads • Data available in read regions for low-latency read workloads (change feed or just reads) • Data abides by the consistency level provided • Auto-homing SDK • Leverage manual failover for DR testing or a follow-the-sun model • Leverage consistency settings to take advantage of throughput (if required) – strong consistency reads cost 2x, so read at a lower consistency
  17. Error codes • HTTP status codes - https://docs.microsoft.com/en-us/rest/api/documentdb/http-status-codes-for-documentdb • 200 • 400 • 401/403 .. • 404/413 • 429 – SDK will handle it • Retry policy – default • Override it if needed • 500 – file a support request • 503 – retryable
  18. Other • Bulk load • Increase RUs, shuffle the data, push in parallel • Use a tool which knows distributed databases • ODBC would not be a good way to connect, for example • A tool/service is on offer – reach out for the bulk load tool in Java/.NET • Backup • Automatic – two snapshots, taken every 4 hours (for the "oops, I deleted" scenario) • Restore on demand via support call • Create a copy of the database • Change feed + Azure Functions • Paging • Continuation tokens in Cosmos DB never expire. [...] continuation token corresponding to 1, 2, 3 and 4. So you can execute the query to go back to that page. – WIP
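
The bulk load advice (shuffle the data, push in parallel) can be sketched as below. `write_batch` is a hypothetical callable that writes one batch and returns how many documents it wrote; batch size and worker count are illustrative defaults, not tuned values.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def bulk_load(documents: Iterable[dict],
              write_batch: Callable[[List[dict]], int],
              batch_size: int = 100,
              workers: int = 8,
              seed: Optional[int] = None) -> int:
    """Shuffle documents, split them into batches, and write batches in parallel."""
    docs = list(documents)
    # Shuffling avoids sending long runs of one partition key to the same range.
    random.Random(seed).shuffle(docs)
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, batches))
```

Shuffling matters because input sorted by partition key would concentrate each burst of writes on one partition key range while the others sit idle.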
  19. Managing Database Operations • Performance • Throughput – choosing the right partition/operations/data model • T = Reads + Writes + Updates + Deletes + Queries • Latency – ensuring queries hit the right partitions • GET < Single partition < Multiple partitions • Storage management • Choose the right partition keys – query fan-out for all data vs data size • Colocate data with the partition key • Availability management • Always add geo-redundancy with one click
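
The formula T = Reads + Writes + Updates + Deletes + Queries can be turned into a rough estimate by multiplying each operation's rate by its RU cost. The function name and every number in the usage note are assumptions; measure the actual charge of your own operations before sizing.

```python
from typing import Dict


def required_throughput(ops_per_sec: Dict[str, float], ru_per_op: Dict[str, float]) -> float:
    """Sum of (operations per second x RU per operation) over all operation types."""
    missing = set(ops_per_sec) - set(ru_per_op)
    if missing:
        raise KeyError(f"no RU cost given for: {sorted(missing)}")
    return sum(rate * ru_per_op[op] for op, rate in ops_per_sec.items())
```

For instance, 500 reads/s at 1 RU, 100 writes/s at 5 RU, and 20 queries/s at 10 RU add up to 500 + 500 + 200 = 1,200 RU/s.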
  20. Other • Access to account-level activities can be controlled • Log of activities in Log Analytics • Token-based, time-limited access to resources • Data at rest encrypted • Data in motion encrypted
  21. Links • Metrics - https://docs.microsoft.com/en-us/azure/cosmos-db/use-metrics • Diagnostic logging - https://docs.microsoft.com/en-us/azure/cosmos-db/logging • Set throughput - https://docs.microsoft.com/en-us/azure/cosmos-db/set-throughput • Access control - https://docs.microsoft.com/en-us/azure/cosmos-db/access-control • Failover - https://docs.microsoft.com/en-us/azure/cosmos-db/regional-failover • TTL - https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live • Indexing - https://docs.microsoft.com/en-us/azure/cosmos-db/indexing-policies • Change feed - https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed • SQL Query perf - https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query-metrics • Partitioning - https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-partition-data • Modelling - https://docs.microsoft.com/en-us/azure/cosmos-db/modeling-data • Perf tips - https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips • Throughput - https://docs.microsoft.com/en-us/azure/cosmos-db/request-units • Azure CLI - https://docs.microsoft.com/en-us/azure/cosmos-db/cli-samples
  22. Learn more www.azurecosmosdb.com GLOBAL APPS NEED GLOBAL DATA FROM A SERVICE THAT’S OUT OF THIS WORLD WELCOME TO AZURE COSMOS DB Sign up to Azure for free https://aka.ms/azureaccount Try Azure Cosmos DB https://aka.ms/tryazurecosmosdb Join next week's session to learn how to build serverless apps, and register at https://aka.ms/CosmosDBlearn