Design and implementation of Cosmos DB Change Feed-centric architecture

Design and implementation of Cosmos DB Change Feed-centric architecture Kazuyuki
Miyake Tatsuro Shibamura

Agenda 1. Change Feed-centric architecture Design & Strategy Kazuyuki Miyake
(Microsoft MVP for Azure) @kazuyukimiyake 2. Change Feed-centric architecture Deep Dive Tatsuro Shibamura (Microsoft MVP for Azure) @shibayan

1. Cosmos DB Change Feed-centric architecture Design & Strategy

Massive data processing Needs and Challenges Balancing massive data writing
and complex queries No performance degradation under Massive writing Can handle different types of queries Cost model to pay as you go 5

Limitations of traditional architectures Try to handle everything in one
big datastore... Write-optimized datastore are weak on complex queries Query-optimized data stores are weak to massively concurrent writes -> As a result, rely on over-spec datastores 6

CQRS + Materialized-Views 1. Separate write and read to absorb
differences 2. Deploy a query-optimized Materialized-View 7

Cosmos DB ChangeFeed + Azure Functions No need to implement
mechanisms for CQRS Synchronized in near Real-time 8

Scalable ChangeFeed Centric Architecture Various services can be added starting
from Change Feed 9

ChangeFeed Centric Case Study - JFE Engineering 10

2. Cosmos DB Change Feed-centric architecture Deep Dive 11

Two Change Feed usage patterns 1. Push model -> Data
Transformation, Stream Processing 2. Pull model -> Batch Processing 12

Data Transformation, Stream Processing Used for processing to stream data
with low latency The best solution is to use CosmosDBTrigger in Azure Functions For write-fast storage such as SQL Database and Redis Cache Also used when writing back to Cosmos DB (creating materialized view) 13

Sample code - Push model public class Function1 { public
Function1(CosmosClient cosmosClient) { _container = cosmosClient.GetContainer("SampleDB", "MaterializedView"); } private readonly Container _container; [FunctionName("Function1")] public async Task Run([CosmosDBTrigger( databaseName: "SampleDB", collectionName: "TodoItems", LeaseCollectionName = "leases")] IReadOnlyList<Document> input, ILogger log) { var tasks = new Task[input.Count]; for (int i = 0; i < input.Count; i++) { // Change the partition key and write it back (actually, do advanced conversion) var partitionKey = new PartitionKey(input[i].GetPropertyValue<string>("anotherKey")); tasks[i] = _container.UpsertItemStreamAsync(new MemoryStream(input[i].ToByteArray()), partitionKey); } await Task.WhenAll(tasks); } } 14

Batch Processing Use when you need to process a large
amount of data at one time It is practical to implement it using TimerTrigger in Azure Functions Used for archiving to Blob Storage / Data Lake Storage Gen 2 Storage GPv2 and Data Lake Storage Gen 2 are charged by the number of write transactions, so writing stream data every time increases costs 15

Sample code - Pull model public class Function2 { public
Function2(CosmosClient cosmosClient) { _container = cosmosClient.GetContainer("SampleDB", "TodoItems"); } private readonly Container _container; [FunctionName("Function2")] public async Task Run([TimerTrigger("0 */5 * * * *")] TimerInfo myTimer, ILogger log) { var continuationToken = await LoadContinuationTokenAsync(); var changeFeedStartFrom = continuationToken != null ? ChangeFeedStartFrom.ContinuationToken(continuationToken) : ChangeFeedStartFrom.Now(); var changeFeedIterator = _container.GetChangeFeedIterator<TodoItem>(changeFeedStartFrom, ChangeFeedMode.Incremental); while (changeFeedIterator.HasMoreResults) { try { var items = await changeFeedIterator.ReadNextAsync(); // TODO: Implementation } catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotModified) { continuationToken ??= ex.Headers.ContinuationToken; break; } } await SaveContinuationTokenAsync(continuationToken); } } 16

For a reliable Change Feed-centric architecture Improving resiliency Idempotency and
eventual consistency Avoid inconsistent states 17

Improving resiliency - Retry policy CosmosDBTrigger proceeds to the next
Change Feed when an execution error occurs. Retry policy is used because data in case of failure will be lost without being processed again. Use FixedDelayRetry or ExponentialBackoffRetry with an unlimited ( -1 ) maximum number of retries. Change Feed will not proceed until successful, so no data will be lost. 18

Sample code - Retry policy public class Function1 { //
infinity retry with 10 sec interval [FixedDelayRetry(-1, "00:00:10")] [FunctionName("Function1")] public async Task Run([CosmosDBTrigger( databaseName: "SampleDB", collectionName: "TodoItems", LeaseCollectionName = "leases")] IReadOnlyList<Document> input) { // TODO: Implementation } } 19

Focus on idempotency and eventual consistency Coding for idempotency whenever
possible For storage that can be overwrite or delete (Cosmos DB / SQL Database / etc) When it is difficult to ensure idempotency, focus on eventual consistency. Focus on "At least once" For storage that can only be append (Blob Storage / Data Lake Storage Gen 2) 20

Avoid inconsistent states - Graceful shutdown Azure Functions will be
restarted when a new version is deployed or platform is updated If the host is restarted while executing a Function, the states may be inconsistent Implement Graceful shutdown to avoid inconsistent states Increase resiliency by combining with Retry policy 21

Sample code - Graceful shutdown public class Function1 { //
infinity retry with 10 sec interval [FixedDelayRetry(-1, "00:00:10")] [FunctionName("Function1")] public async Task Run([CosmosDBTrigger( databaseName: "SampleDB", collectionName: "TodoItems", LeaseCollectionName = "leases")] IReadOnlyList<Document> input, CancellationToken cancellationToken) { try { // Pass cancellation token await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken); } catch (OperationCanceledException) { // TODO: Implement rollback throw; } } } 22

References Azure Cosmos DB trigger for Functions 2.x and higher
| Microsoft Docs Azure/azure-cosmos-dotnet-v3: .NET SDK for Azure Cosmos DB for the core SQL API Change feed pull model | Microsoft Docs Azure Functions error handling and retry guidance | Microsoft Docs Cancellation tokens - Develop C# class library functions using Azure Functions | Microsoft Docs 23

Design and implementation of Cosmos DB Change F...

Design and implementation of Cosmos DB Change Feed-centric architecture

miyake

More Decks by miyake

Other Decks in Technology

Featured

Transcript