store optimized for analytical queries Transactional Store Row store optimized for transactional operations Azure Cosmos DB Azure Synapse Analytics Container Cloud-Native HTAP Azure Synapse Link Serverless SQL pool Auto-Sync Machine learning Big data analytics BI Dashboards Operational Data Generate near real-time insights on your operational data Apache Spark pool Existing SQL API Containers - GA in March 2022 Newly created SQL API containers Newly created Mongo DB API containers Existing Mongo DB API Containers - On roadmap
Change Feed with Data Factory / Synapse Mapping Data Flow Mapping Data Flow Sources Cosmos DB Targets Synapse Analytics Data Factory Change Feed Write data to target 15+ data stores are supported with Mapping Data Flow Read data as source
time analytics with Synapse Serverless SQL pool & Power BI Serverless SQL pool Sources Transactional Store Synapse Analytics Analytical Store Cosmos DB (Synapse Link enabled) Auto sync SQL database (As metastore) Power BI Read data via query DirectQuery or Import
based data processing with Synapse Spark pool Serverless SQL pool Sources Transactional Store Synapse Analytics Analytical Store Cosmos DB (Synapse Link enabled) Auto sync SQL database (As metastore) Power BI Apache Spark pool Lake database (As metastore) Data Lake Storage Read data via query DirectQuery or Import Write processed data Read data as source
free data processing with Synapse Mapping Data Flow Serverless SQL pool Sources Transactional Store Synapse Analytics Analytical Store Cosmos DB (Synapse Link enabled) Auto sync SQL database (As metastore) Power BI Apache Spark pool Lake database (As metastore) Mapping Data Flow Data Lake Storage Read data via query DirectQuery or Import Write processed data Read data as source
archival of historical data • Transactional Store and Analytical Store can have mutually independent Time-to-Live (TTL) • Analytical Store’s storage cost is cheaper than Transactional Store’s one: Enables to achieve cost-effective archival of historical data • Transaction Store’s TTL aka “TTTL” • Analytical Store’s TTL aka “ATTL” • If TTTL <= ATTL • Items in Transactional Store are deleted before items in Analytical Store • If ATTL is -1 • Analytical Store keeps all historical data • ATTL: On (no default) == ATTL -1 (See more info) https://docs.microsoft.com/en-us/azure/cosmos-db/analytical-store-introduction#cost-effective-archival-of-historical-data
and partition pruning is important to improve the efficiency of reading data using Apache Spark • Partitioning - Separate directories to store data according to the value of fields that frequently specified in query filter conditions • Partition pruning - Avoid reading unnecessary partition data in query Data in ADLS Gen2 Spark SQL SELECT * FROM example_logs WHERE event_year=2022 AND event_month=02 example_logs / event_year=2022 / event_month=01 / xxx.parquet example_logs / event_year=2022 / event_month=02 / xxx.parquet example_logs / event_year=2022 / event_month=03 / xxx.parquet (Note) Analytical Store is not partitioned by default
update or delete operations Slow data ingestion • Output data from Analytical Store partitioned by specified fields to ADLS Gen2 using Synapse Spark • The ADLS Gen2 is primary storage account linked to Synapse Workspace Benefits Reduced data scanning from partition pruning Query performance improvements Current limitations Only available for Synapse Spark, Cosmos DB SQL API Can only point to the primary storage account Additional costs Synapse Spark (As compute) ADLS Gen2 (As storage)
- Overview Custom partitioning in Azure Synapse Link for Azure Cosmos DB (Preview) | Microsoft Docs Product docs - How-to guide Configure custom partitioning to partition analytical store data (Preview) | Microsoft Docs Update published in Mar. 2022 Public preview: Azure Synapse Link for Azure Cosmos DB partitioning Spark 3.1 | Azure updates | Microsoft Azure Update published in Nov. 2021 Azure Synapse Link for Azure Cosmos DB: Custom partitioning support in public preview | Azure updates | Microsoft Azure
Flow 2. Near-real time analytics with Synapse Serverless SQL pool & Power BI 3. Code free data processing with Synapse Mapping Data Flow 4. Custom partitioning (Preview)
DB Change Feed with Data Factory / Synapse Mapping Data Flow • Prefer Low-code / No-code style • Don’t want to / can’t use Synapse Link for Cosmos DB • Need to process data with shorter latency than Synapse Link for Cosmos DB 2 Near-real time analytics with Synapse Serverless SQL pool & Power BI • Appropriate for near-real time analytics • BI with fully serverless, cost-effective way 3 Code based data processing with Synapse Spark pool • Good to use with #2 & Custom partitioning (Preview) • Large volume of data, which should be aggregated for analytics & BI 4 Code free data processing with Synapse Mapping Data Flow • Good to use with #2 & Custom partitioning (Preview) • Unlike #3, if Low-code / No-code style is preferable, use #4 5 Cost-effective archival of historical data • Need to retain read-only copy cost-effectively