Slide 1

Slide 1 text

Battle Of Modern Data Architectures
Dave Ruijter, Azure Solution Architect, Macaw Netherlands
@DaveRuijter
ModernData.ai
dataMinds Connect 2019, 10/8/2019

Have you ever been in doubt about which Azure service to use for your 'Modern Data Warehousing' solution? There are so many good options, like the Mapping/Wrangling Data Flows capabilities in Azure Data Factory, or the Delta feature in Databricks! In this session we will look at the different services, compare them using real use-cases, and learn how to choose the best fit for each scenario.

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Our Partners

Slide 4

Slide 4 text

Datawarehousing in the cloud…

Slide 5

Slide 5 text

Datawarehousing in the cloud…
[Diagram labels: Ingest, Transform, Load, Result, Storage, Orchestration, Store]

Slide 6

Slide 6 text

Datawarehousing in the cloud…
[Diagram labels: Ingest, Transform, Load, Result, Storage, Compute Engine, Azure Data Factory, SQL DB / DW, Store, Azure Data Lake Storage (ADLS), Azure Data Lake Storage (ADLS), Compute Engine]

Slide 7

Slide 7 text

Why is it so complicated?
• Various sources/formats
• Schema drift
• Scalability / “big data”
• Monitoring & Auditing
• Keep up with the business needs
• Version control
• ALM / DevOps

Slide 8

Slide 8 text

Datawarehousing in the cloud…

Slide 9

Slide 9 text

Datawarehousing in the cloud…
• Azure SQL w/ PolyBase
• HDInsight
• Azure Data Lake Analytics
• Azure SSIS Integration Runtime
• Azure Data Factory (Mapping Data Flows)
• Azure Databricks
• Power BI dataflows

Slide 16

Slide 16 text

Datawarehousing in the cloud…
• Azure SQL w/ PolyBase
• HDInsight
• Azure Data Lake Analytics
• Azure SSIS Integration Runtime
• Azure Data Factory (Mapping Data Flows)
• Azure Databricks (Delta)
• Power BI dataflows

Slide 18

Slide 18 text

The battle!
• round #1: capabilities
• round #2: developer experience
• round #3: operator experience
• round #4: security
• round #5: roadmap / future readiness
• round #6: coolness

Slide 19

Slide 19 text

capabilities round.set(1)

Slide 20

Slide 20 text

Capabilities – SSIS in Azure

Slide 21

Slide 21 text

Capabilities – SSIS in Azure
• Pros:
  • Combination of simple & advanced options
  • Continue with existing solution / less initial investment
  • Mature tooling for meta-driven generation
  • Git integration in VS
• Cons:
  • Not serverless
  • No further development (continuity?)
  • Integration with Azure PaaS (data source connectivity)
  • No advanced analytics / data science
  • No streaming data support

Slide 22

Slide 22 text

Capabilities - ADF (MDF!)

Slide 23

Slide 23 text

Capabilities - ADF (MDF!)

Slide 24

Slide 24 text

Capabilities - ADF (MDF!)

Slide 25

Slide 25 text

Capabilities - ADF (MDF!)
• Pros:
  • Simplicity / easy to understand / no programming required
  • Low time to value
  • DevOps ready (bit clumsy design)
• Cons:
  • New / immature
  • Poor generation using meta-data driven frameworks
  • Limited set of (advanced) transformations
  • Not suited for complex tasks, fallback to functions/notebooks
  • Limited v-net support. MDF / Web Activities can’t access Key Vault.
  • Publishing is a manual action in the browser
  • No advanced analytics / data science
  • No streaming data support

Slide 26

Slide 26 text

Capabilities - Azure Databricks

Slide 27

Slide 27 text

Capabilities – Databricks Delta
Essentially, it’s an optimized Spark table with SQL-like features:
• ACID transactions
• DELETES / UPDATES / UPSERTS
• Statistics, data skipping and ZORDER clustering
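
As a rough illustration of the data skipping idea mentioned above: if minimum and maximum values are kept per file for a column, a filtered query can skip every file whose range cannot contain a match. The sketch below is a toy model in plain Python, not Databricks or Delta Lake code; the `FileStats` class, the `files_to_scan` helper, the file names and the values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics for one column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(stats, lower, upper):
    """Return the paths of files whose [min, max] range overlaps [lower, upper]."""
    return [
        s.path
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Made-up statistics for three data files.
stats = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# A filter such as "value BETWEEN 150 AND 160" only needs one of the three files.
selected = files_to_scan(stats, 150, 160)
print(selected)
```

Clustering related values into the same files, which is what ZORDER aims for, keeps these per-file ranges narrow so that more files can be skipped.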

Slide 28

Slide 28 text

Capabilities - Azure Databricks
• Pros:
  • Extremely versatile and scalable
  • Easily add streaming data
  • Not only applicable for data engineering (“unified analytics”)
  • Interactive notebook experience
  • Cloud agnostic / open source
• Cons:
  • Steep learning curve
  • Not serverless (don't underestimate cluster management)
  • Poor Git integration
  • Longer time to value
  • Poor Service Principal support

Slide 29

Slide 29 text

developer experience round.set(2)

Slide 30

Slide 30 text

Developer experience - SSIS
• Tool: Visual Studio (crashes / manual updates)
• Infrastructure-as-a-Service: Virtual Machine
• Good options to generate code
• Poor collaboration
• Testing via dataviewers
• Disconnected from ADF
• Source code XML
• Schema drift

Slide 31

Slide 31 text

Developer experience – ADF MDF
• Tool: Browser
• Platform-as-a-Service: Azure Portal
• Simple to start
• Can feel limited
• Breaking changes
• Poor collaboration
• Seamless integration ADF
• Source code is JSON
• Testing via debug-mode
• Schema drift

Slide 32

Slide 32 text

Developer experience – Databricks
• Tool: Browser
• Platform-as-a-Service: Azure Portal
• Just code
• Good collaboration
• Almost anything is possible
• Multilingual
• Future VS Code support
• Schema drift

Slide 33

Slide 33 text

operator experience round.set(3)

Slide 34

Slide 34 text

Operator experience – SSIS in Azure
• SQL Server Management Studio (SSMS)

Slide 35

Slide 35 text

Operator experience – SSIS in Azure
• Deployment can be complicated
• Debugging / troubleshooting can be intimidating
• Limited integration of monitoring (ADF / Databricks)

Slide 36

Slide 36 text

Operator experience - MDF
• Internet Browser (Azure Portal)
• (Azure) PowerShell

Slide 37

Slide 37 text

Operator experience - Databricks
• Deployment complicated (clusters/notebooks)
• Internet Browser (Azure Portal)
• (Azure) PowerShell
• Debugging / troubleshooting can be intimidating
• Limited integration of monitoring (ADF / Databricks)

Slide 38

Slide 38 text

pricing round.set(4)

Slide 39

Slide 39 text

Pricing – SSIS in Azure
• Integration Runtime costs (“it depends”)
• Benefit from existing SQL Server licensing

Slide 40

Slide 40 text

Pricing – SSIS in Azure
• Integration Runtime costs (“it depends”)
• License SQL Server
• License Visual Studio
https://azure.microsoft.com/en-us/pricing/details/data-factory/ssis/

Slide 41

Slide 41 text

Pricing – SSIS in Azure
• License Visual Studio (don’t forget about the DevOps server)

Slide 42

Slide 42 text

Pricing – SSIS in Azure
• Development workload (3 days a week)

Slide 43

Slide 43 text

Pricing – SSIS in Azure
• Production workload (2h a day)

Slide 44

Slide 44 text

ADF MDF - Pricing
• Debug Mode:
  • “Preview Pricing”
  • 8 cores default
  • $0.112 / hour
  • 60 minutes default Time To Live (TTL)
  • Example dev-day: 10 h x 8 (cores) x $0.112 = $8.96
• Transform data in Blob Store (scheduled):
  • “Preview Pricing”
  • 8 cores default
  • $0.112 / hour
  • 10 minutes default Time To Live (TTL)
  • Example: 10 min compute + 10 min TTL = 0.33 h x 8 (cores) x $0.112 = $0.299
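
The arithmetic in the two examples above can be reproduced with a small Python sketch. The $0.112 per core-hour rate, the 8-core default and the TTL values come from the slide (preview pricing at the time of the talk). The function name and its parameters are hypothetical and not part of any Azure SDK, and the sketch assumes, as the slide's example does, that TTL time is billed at the same rate as compute time.

```python
PRICE_PER_CORE_HOUR = 0.112  # USD, preview price quoted on the slide
DEFAULT_CORES = 8            # default core count quoted on the slide


def data_flow_cost(compute_minutes, ttl_minutes=0,
                   cores=DEFAULT_CORES, price=PRICE_PER_CORE_HOUR):
    """Estimate the cost of one run: billed hours x cores x price per core-hour."""
    billed_hours = (compute_minutes + ttl_minutes) / 60
    return billed_hours * cores * price


# Debug mode, example dev-day: 10 hours on 8 cores.
dev_day = data_flow_cost(compute_minutes=10 * 60)

# Scheduled run: 10 minutes of compute plus the 10 minute default TTL.
scheduled_run = data_flow_cost(compute_minutes=10, ttl_minutes=10)

print(f"dev day:       ${dev_day:.2f}")
print(f"scheduled run: ${scheduled_run:.3f}")
```

Changing `cores` or `ttl_minutes` gives a quick feel for how cluster size and idle time drive the bill.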

Slide 45

Slide 45 text

Azure Databricks - Pricing

Slide 46

Slide 46 text

Azure Databricks - Pricing
• Data Analytics:
  • Interactive Clusters only here
  • Power BI connection to data in cluster
  • Notebook collaboration experience
• ‘Data Engineering Light’:
  • Delta not available
  • Notebooks not available (also no scheduling of notebooks)
• Premium:
  • Role-based access control for notebooks, clusters, jobs, and tables
  • Audit Logs (preview)
  • JDBC/ODBC Endpoint Authentication

Slide 47

Slide 47 text

Azure Databricks - Pricing
• Development: Premium Tier – Data Analytics
• Production: Premium Tier – Data Engineering

Slide 48

Slide 48 text

Azure Databricks - Pricing
• Development: Premium Tier – Data Analytics
• Workload: 5 days a week

Slide 49

Slide 49 text

Azure Databricks - Pricing
• Production: Premium Tier – Data Engineering
• Workload: 2 hours a day

Slide 50

Slide 50 text

roadmap round.set(5)

Slide 51

Slide 51 text

Roadmap – SSIS in Azure
• ?

Slide 52

Slide 52 text

Roadmap – ADF MDF
• Active Monitoring (watch progress live)

Slide 53

Slide 53 text

Roadmap – Databricks
• C# as notebook language
• Integration with Visual Studio Code

Slide 54

Slide 54 text

coolness round.set(6)

Slide 55

Slide 55 text

Recruiting
• People are increasingly looking for new tooling in job offers:
  • Azure
  • Azure Databricks
  • Azure Data Factory
  • Data Lake
  • Datawarehouse
  • DevOps

Slide 56

Slide 56 text

Job offers - examples

Slide 57

Slide 57 text

Job offers - examples

Slide 58

Slide 58 text

Job offers - examples

Slide 59

Slide 59 text

The battle!
• round #1: capabilities
• round #2: developer experience
• round #3: operator experience
• round #4: security
• round #5: roadmap / future readiness
• round #6: coolness

Slide 60

Slide 60 text

Thank You

Slide 61

Slide 61 text

What do you think?
http://bit.ly/dataMindsConnectSession (bit.ly is CASE SENSITIVE!)
1. Open the form
2. Provide constructive feedback
3. Be eligible for an amazing prize!

Slide 62

Slide 62 text

Q&A

Slide 63

Slide 63 text

Our Partners