Part 3 - Modern Data Warehouse with Azure

Nilesh Gule @nileshgule | www.HandsOnArchitect.com Modern Data Warehouse Using Azure

$whoami
{
  "name": "Nilesh Gule",
  "website": "https://www.HandsOnArchitect.com",
  "github": "https://github.com/NileshGule",
  "twitter": "@nileshgule",
  "linkedin": "https://www.linkedin.com/in/nileshgule",
  "email": "[email protected]",
  "likes": "Technical Evangelism, Cricket",
  "co-organizer": "Azure Singapore UG"
}

Credits: James Serra

Part 1 - Recap – ADLS & ADF

ADLS - Main features
• Petabyte scale storage
• Hierarchical namespace
• Hadoop compatible access with ABFS driver

ADF - Main features
• Cloud ETL service
• Scale-out serverless data integration & data transformation
• Code-free UI
• Monitoring & Management

Part 2 - Recap

Azure Databricks - Main features
• Collaborative Spark based Analytical service
• Different cluster types (automated / interactive / pool)
• Autoscale based on workloads
• Fine grained access controls

Azure Synapse
Limitless analytics service for enterprise data warehousing and Big Data analytics

Parallelism

MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes

SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN

Synapse Architecture
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture

Components
• Control Node
• Compute Node
• Data Movement Service (DMS)

Distributions
• Hash
• Round Robin
• Replicate
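The split of work between a control node and several compute nodes can be sketched in a few lines of Python. This is only an illustration of the shared-nothing idea, not how Synapse is implemented: the function names are hypothetical, threads stand in for separate machines, the sample data is made up, and each simulated compute node sees only its own slice of the rows.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_slices(rows, node_count):
    """Give each simulated compute node its own private slice of the rows."""
    return [rows[i::node_count] for i in range(node_count)]

def compute_node_sum(local_rows):
    """Work done on one compute node: aggregate only the local slice."""
    return sum(local_rows)

def control_node_total(rows, node_count):
    """Control node: fan the query out, then combine the partial results."""
    slices = split_into_slices(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    return sum(partials), partials

sales = list(range(1, 101))  # made-up sample data: the integers 1..100
total, partials = control_node_total(sales, node_count=4)
print(total, partials)
```

Only the small partial results travel back to the control node, which is why adding nodes (scale-out) helps for large scans and aggregations.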

Synapse Data Distributions
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture

Hash
• Highest query perf for joins & aggregations on large tables
• Rows per distribution varies

Replicated
• Fastest query performance for small tables
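Why "rows per distribution varies" under hash distribution can be seen with a small simulation. The sketch below is an assumption-laden illustration: `zlib.crc32` is a stand-in for the service's internal hash function, the customer data is made up, and the count of 60 distributions is taken from the D1 to D60 labels in the diagrams later in the deck.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # matches the D1..D60 labels shown in the deck's diagrams

def hash_distribution(key):
    """Deterministic placement: the same key always lands in the same distribution."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS

def round_robin_distribution(row_number):
    """Placement by arrival order, ignoring the row's content."""
    return row_number % DISTRIBUTIONS

# Made-up sample: 6,000 order rows, with customer 7 placing half of them (skew).
customer_ids = [7 if i % 2 == 0 else i for i in range(6000)]

hash_counts = Counter(hash_distribution(c) for c in customer_ids)
rr_counts = Counter(round_robin_distribution(i) for i in range(len(customer_ids)))

print("hash:        min", min(hash_counts.values()), "max", max(hash_counts.values()))
print("round robin: min", min(rr_counts.values()), "max", max(rr_counts.values()))
```

Round robin spreads rows evenly but scatters rows with the same key, so joins need data movement. Hashing keeps equal keys together, which helps joins and aggregations, at the cost of uneven distributions when one key value dominates.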

ALTER DATABASE ContosoDW MODIFY (service_objective = 'DW1000');

DWU: DW100, DW200, DW300, DW400, DW500, DW1000, DW1500, DW2000, DW2500, DW3000, DW5000, DW6000, DW7500, DW10000, DW15000, DW30000

[Diagram: Azure SQL Data Warehouse. The Engine and a single worker (Worker1) in front of Azure Storage Blob(s) holding distributions D1 to D60.]

[Diagram: Azure SQL Data Warehouse. The Engine and six workers (Worker1 to Worker6) in front of Azure Storage Blob(s) holding distributions D1 to D60.]
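The two diagrams above show the same 60 distributions (D1 to D60) attached first to one worker and then to six. A minimal sketch of that remapping follows; the function name is hypothetical, and assigning contiguous blocks of distributions to each worker is an assumption made for illustration, since the actual placement is decided by the service.

```python
DISTRIBUTIONS = [f"D{n}" for n in range(1, 61)]  # D1..D60, as in the diagrams

def assign_distributions(worker_count):
    """Spread the fixed set of distributions evenly over the workers.

    Contiguous blocks are an assumption of this sketch, not documented behavior.
    """
    if 60 % worker_count != 0:
        raise ValueError("this sketch only handles worker counts that divide 60")
    per_worker = 60 // worker_count
    return {
        f"Worker{w + 1}": DISTRIBUTIONS[w * per_worker:(w + 1) * per_worker]
        for w in range(worker_count)
    }

one_worker = assign_distributions(1)
six_workers = assign_distributions(6)
print(len(one_worker["Worker1"]))
print({name: len(ds) for name, ds in six_workers.items()})
```

The data itself stays in storage; scaling changes only how many distributions each worker is responsible for.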

Azure Databricks – SQL DW Connectivity

External Data Sources
• External Data Source
  • Hadoop, ADLS
• External File Format
  • File types
    • Delimited Text, Hive RCFile, Hive ORC file, Parquet
  • Data Compression
    • Gzip, Snappy
  • Field Delimiters
  • Date Format
• External Table

What workloads are NOT suitable?

Operational workloads (OLTP)
• High frequency reads and writes.
• Large numbers of singleton selects.
• High volumes of single row inserts.

Data Preparations
• Row by row processing needs.
• Incompatible formats (XML).

What Workloads are Suitable?

Analytics
• Store large volumes of data.
• Consolidate disparate data into a single location.
• Shape, model, transform and aggregate data.
• Batch/Micro-batch loads.
• Perform query analysis across large datasets.
• Ad-hoc reporting across large data volumes.
• All using simple SQL constructs.

Summary

Azure Synapse - Main features
• MPP Architecture
• Can be paused
• Optimized for analytics workloads
• Supports multiple external file formats
• Works with Polybase

• SQL Server & SQL Data Warehouse Differences
• Azure Synapse Workload Management
• External Data Source
• External File Formats
• External Table
• SQL Data Warehouse Benchmark

References – MS Learn https://docs.microsoft.com/en-us/learn/paths/implement-sql-data-warehouse

Thank you very much
Code with Passion and Strive for Excellence
https://www.slideshare.net/nileshgule/presentations
https://speakerdeck.com/nileshgule/

Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and Strive for Excellence”
nileshgule | @nileshgule | Nilesh Gule | NileshGule | www.handsonarchitect.com

Q&A