Apr2023 [Slides]: Azure Synapse by isolutions

https://www.meetup.com/de-DE/microsoft-azure-zurich-user-group/events/291570644/

Details
Please note: This meetup will be presented in GERMAN.

We look forward to this in-person meetup hosted by isolutions (https://isolutions.ch/) in their luxurious office in "The Circle" at Zurich Airport!

Azure Synapse is the major data & analytics tool in the Microsoft cloud. It offers a vast number of features and special characteristics. In these sessions we would like to present Synapse itself and share our experience and project examples on security, CI/CD, and data engineering with Azure Synapse.

Speakers: Laurent Christen (Senior Architect), Marc Rufer (Lead Software Developer), Maximiliano Luchsinger (Data Engineer), and Linus Niederhauser.

***
AGENDA

18:00 Welcome (isolutions and Azure Zürich User Group)
18:05 Session 1
Structure and components of Azure Synapse (Laurent Christen)
Flows, Pipelines & Notebooks (Max Luchsinger / Linus Niederhauser)
18:50 Drinks, Food and Networking
19:20 Session 2
CI/CD (Marc Rufer)
Security concept (Laurent Christen & Max Luchsinger)
20:30 End

This event will be held in GERMAN and is, as always, free of charge and open to everyone interested. Many thanks to isolutions for hosting and sponsoring this meetup.

Please reserve your spot as soon as possible. Thank you!

Azure Zurich User Group

April 04, 2023

Transcript

  1. Speakers
    • Laurent Christen, Senior Solution Architect
    • Marc Rufer, Senior Software Engineer
    • Maximiliano Luchsinger, Data Engineer
    • Linus Niederhauser, Data Analyst
  2. Azure Synapse Analytics: The first unified, cloud native platform for converged analytics
    • Data integration
    • Data warehousing
    • Big data analytics
    Azure Synapse is the only unified platform for analytics, blending big data, data warehousing, and data integration into a single cloud native service for end-to-end analytics at cloud scale.
  3. Azure Synapse Analytics: The first unified, cloud native platform for converged analytics
    (Diagram labels) On-prem, Cloud data, Applications, Streaming data, Data Warehouse, Data Lake, Event Brokers, Azure Synapse Link, Cosmos DB, SQL Server, Dataverse
  4. Azure Synapse Analytics
    • Powered by a new cloud native distributed SQL engine
    • Powered by Apache Spark using clustered compute power at scale in your preferred language: Python, Spark (Scala), .NET Spark, SQL, or R
  5. Cloud native ETL/ELT
    • 95+ connectors available
    • Secure connectivity to on-premises data sources, other clouds, and SaaS applications
    • Code-first and low/no-code design interfaces
    • Schedule-based and event-based triggering
  6. Synapse Studio
    • Toolbox, Editor, Management
    • Automatic code completion (IntelliSense)
    • Script collaboration within the workspace
    • Built-in visualizations
    • Easily switch between clusters
  7. Ready, Set, Go…
    • Fully ARM integrated
    • Minimal costs by default
    You get…
    • Synapse Workspace
    • Default Storage Account Gen2
    • Serverless SQL
    • Spark pool (paused)
    (Don’t forget the firewall)
  8. Role - Data Engineer
    • Pipelines
      • Get data from A to B
      • Orchestrate with Triggers
      • Monitoring & Alerting
    • Spark Notebooks
      • Complex data transformations – custom code
      • Different programming languages (Python, SQL, Scala, C#, R)
    • Data Flows
      • Simple data transformations – no code
  9. Struggles, Challenges and Learnings
    • Figuring out bottlenecks
      • Data source timeouts
      • VM sizing for the Self-Hosted Integration Runtime (SHIR)
    • Notebooks
      • Keep the notebooks tidy
      • No possibility of using SHIR from notebooks
    • Data Flow
      • Low-code debugging (e.g. breakpoints do not exist)
  10. Role - Data Analyst
    • SQL
      • Explorative Data Analytics (EDA) using SQL
      • Data aggregations
      • Data transformations
    • Power BI
      • Power BI integration into Synapse
      • Build visuals with (transformed and aggregated) data directly in Synapse
      • Check whether aggregations and transformations produce the required data
  11. Struggles, Challenges and Learnings
    Figuring out what kind of SQL pool to use (serverless vs. dedicated)
    • Serverless:
      • Great for data exploration
      • Quick and simple transformations
    • Dedicated:
      • Better performance when it comes to more complex queries and transformations
      • More functionality (SQL DWH in the background)
      • More expensive
    • Hints: For easy setup but higher query running time, choose serverless. For more elaborate setup but fast performance, choose dedicated.
    Cost management can be difficult for SQL pools
  12. Role - Data Scientist
    Train models
      • Apache Spark pool using MLlib
      • Azure Machine Learning automated ML
    Deployment and scoring
      • Apache Spark pools
      • T-SQL PREDICT function (dedicated SQL pool needed)
    SynapseML
      • APIs for pre-built intelligent services (e.g. Azure Cognitive Services)
  13. Struggles, Challenges and Learnings
    • ML integration is not as clear-cut as it seems
    • Cost management can be difficult
    • Limited functionality for data science
      • Primarily designed for data warehousing and business intelligence tasks
    • Limited customization
  14. Introduction
    • CI/CD not yet fully developed in the context of data services
    • Bugs / Known issues
    • Workarounds
    • Some differences to the CI/CD workflow software developers are used to
    • Focus: Continuous Delivery
    • Initial implementation of CD in Q1 2022
  15. Initial Situation
    • Real-life example of an implementation for one of our customers
    • 3 stages (DEV, TEST, PROD)
    • Separate resource groups per stage
    • 3 resource groups per stage
  16. Development Workflow
    • Development takes place in Synapse Studio in the Azure Portal
    Target environment: DEV
  17. Development Workflow
    • A Git repository must be associated with the Synapse workspace in DEV
    • Changes are published as soon as they are ready to be deployed to DEV
  18. Continuous Delivery with Az DevOps YAML pipeline
    Prerequisites
    • Synapse Workspace Deployment extension installed from Visual Studio Marketplace in the Azure DevOps organization
    • Service connection for each environment set up in the Azure DevOps project
    • Role Synapse Administrator assigned to the service principal of the Azure DevOps service connection
  19. Continuous Delivery with Az DevOps YAML pipeline
    • Secrets and connection strings stored in Azure Key Vault
    • Azure Key Vault access policy / RBAC set up for the service principal of the Azure DevOps service connection
    • Environments set up in the Azure DevOps project
    • Approval for PROD environment
  20. Continuous Delivery with Az DevOps YAML pipeline
    YAML pipeline
    • Location: workspace_publish branch
      Ideally in a separate directory (e.g. deploy or pipeline(s))
    • Trigger
      • Branch: workspace_publish
      • Path: Synapse workspace directory
    • Stages, jobs, steps
      • Deep dive
    Steps shown on the slide, in order:
      • Specify environment and set variables
      • Checkout repository
      • Stop triggers
      • Deploy Synapse Workspace
      • Restart triggers
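The step order on this slide (stop triggers, deploy, restart triggers) can be sketched in a few lines of Python. This is only an illustration of the ordering, not the speakers' pipeline: `FakeWorkspace` and `deploy` are hypothetical names, and nothing here calls Azure DevOps or Synapse.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeWorkspace:
    """Hypothetical in-memory stand-in for a Synapse workspace (not an SDK type)."""
    started_triggers: set = field(default_factory=set)
    deployed_template: Optional[str] = None
    log: list = field(default_factory=list)


def deploy(workspace: FakeWorkspace, environment: str, template: str) -> list:
    """Run the slide's five steps in order and return the restarted triggers."""
    workspace.log.append(f"environment={environment}")  # specify environment, set variables
    workspace.log.append("checkout")                    # checkout repository

    to_restart = sorted(workspace.started_triggers)     # remember what was running
    workspace.started_triggers.clear()                  # stop triggers
    workspace.log.append("stop triggers")

    workspace.deployed_template = template              # deploy Synapse workspace
    workspace.log.append("deploy")

    workspace.started_triggers.update(to_restart)       # restart triggers
    workspace.log.append("restart triggers")
    return to_restart
```

Recording which triggers were running before the deployment means only those are restarted afterwards; whether the original pipeline does the same is not stated on the slide.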
  21. Continuous Delivery with Az DevOps YAML pipeline
    (Diagram labels) Synapse Studio, DEV, workspace_publish branch, TemplateForWorkspace.json, TemplateParametersForWorkspace.json, YAML Pipeline, TEST, PROD
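The slide does not say how TEST and PROD receive their own values when the same template is deployed to each stage. One common approach, sketched here as an assumption, is to override entries of the parameter file per stage. The sketch assumes `TemplateParametersForWorkspace.json` follows the usual ARM parameter-file shape (`{"parameters": {name: {"value": ...}}}`); the parameter names and workspace names are hypothetical.

```python
import copy


def override_parameters(parameter_file: dict, overrides: dict) -> dict:
    """Return a copy of an ARM-style parameter file with stage-specific values."""
    result = copy.deepcopy(parameter_file)  # never mutate the published file
    for name, value in overrides.items():
        if name not in result["parameters"]:
            # Fail loudly on typos instead of silently adding a new parameter.
            raise KeyError(f"unknown parameter: {name}")
        result["parameters"][name] = {"value": value}
    return result


# Hypothetical content published from DEV to the workspace_publish branch.
dev_parameters = {
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {"value": "syn-example-dev"},
        "keyVaultUrl": {"value": "https://kv-example-dev.vault.azure.net/"},
    },
}

test_parameters = override_parameters(
    dev_parameters,
    {
        "workspaceName": "syn-example-test",
        "keyVaultUrl": "https://kv-example-test.vault.azure.net/",
    },
)
```

Rejecting unknown parameter names keeps a misspelled override from passing unnoticed into a TEST or PROD deployment.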
  22. Struggles, Challenges and Learnings
    • Start-AzSynapseTrigger throws error but starts triggers
      Known issue (open) -> workaround available
    • Spark pools and self-hosted integration runtimes aren't created in workspace deployment task
      Self-hosted integration runtime needs to be created manually in new workspaces
    • Pipeline debugging
      • Run manually to avoid publishing little changes
      • Set System.Debug variable to true
    • Wording ;) – DevOps Engineer vs. Data Engineer
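The slide mentions a workaround for the trigger-start error without describing it. The sketch below is therefore not the speakers' workaround, only one possible pattern for a command that reports failure although it succeeded: ignore the error and check the resulting state instead. The `start` and `is_started` callables are hypothetical placeholders for whatever starts a trigger and reads its state.

```python
def start_trigger_tolerantly(name, start, is_started):
    """Start a trigger, trusting the observed state rather than the raised error.

    Returns the swallowed exception (or None) so the caller can still log it.
    """
    error = None
    try:
        start(name)
    except Exception as exc:  # broad on purpose: the failure mode is unspecified
        error = exc
    if not is_started(name):
        # The trigger really is not running, so surface the failure.
        raise RuntimeError(f"trigger {name} did not start") from error
    return error
```

With this shape a spurious error no longer fails the deployment, while a trigger that truly failed to start still does.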
  23. Best-in-class security
    • Complete data protection
    • Customer & System Managed Keys
    • All data encrypted by default
    • Up to 3x levels of data encryption at rest
    • Democratize data at scale with fine-grained ACL
    • Proactive protection
    • Comprehensive compliance

    Category | Feature
    Data Protection | Data in transit; Data encryption at rest; Data discovery and classification
    Access Control | Object level security (tables/views); Row level security; Column level security; Dynamic data masking; Column level encryption
    Authentication | SQL login; Azure Active Directory; Multi-factor authentication
    Network Security | Managed virtual network; Custom virtual network; Firewall; Azure ExpressRoute; Azure Private Link
    Threat Protection | Threat detection; Auditing; Vulnerability assessment
    Isolation | Dedicated metadata store; Hosted in customer tenant
  24. Some Security recommendations…
    • Use managed identities as much as possible
    • Synapse RBAC != Azure AD RBAC
    • Caution: Synapse cannot verify nested Azure groups
    • Use Key Vault also as configuration store
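The caution about nested groups suggests assigning roles to direct members only. As a rough sketch of how a nested structure could be flattened beforehand, the following Python resolves a group to its users. The `members_of` mapping and all names are hypothetical; in practice the membership data would come from the directory, which is not shown here.

```python
def flatten_members(group: str, members_of: dict) -> set:
    """Resolve a group to its users, following nested groups and ignoring cycles.

    Every key of `members_of` is treated as a group; anything else is a user.
    """
    seen_groups, users, stack = set(), set(), [group]
    while stack:
        current = stack.pop()
        if current in seen_groups:
            continue  # already expanded, protects against circular nesting
        seen_groups.add(current)
        for member in members_of.get(current, []):
            if member in members_of:
                stack.append(member)  # nested group: expand later
            else:
                users.add(member)     # plain user
    return users


members_of = {
    "data-team": ["alice", "engineers"],
    "engineers": ["bob", "data-team"],  # circular nesting on purpose
}
direct_users = flatten_members("data-team", members_of)
```

The resulting set could then be placed in a single flat group that is used for the Synapse role assignment.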