Synapse 101

This is an Intro to Synapse session I hosted for the PowerBI Turkey Meetup group.


Daron Yondem

October 10, 2020


  1. Synapse 101 Daron Yöndem http://daron.me @daronyondem

  2. Platform diagram
    • Artificial Intelligence / Machine Learning / Internet of Things / Intelligent Apps / Business Intelligence
    • Experience: Azure Synapse Studio
    • Languages: SQL, Python, .NET, Java, Scala, R
    • Form Factors: Provisioned, Serverless
    • Analytics Runtimes
    • Platform: Management, Security, Monitoring, Metastore, Data Integration
    • Azure Data Lake Storage: Common Data Model, Enterprise Security, Optimized for Analytics
  3. Linked Services
    • Linked services define the connection information needed to connect to external resources.
    • Easy cross-platform data migration.
    • A linked service represents a data store or a compute resource.
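
As a rough illustration of what "connection information" means here, the sketch below builds a linked service definition for Azure Blob Storage as a Python dictionary and prints it as JSON. The name `ExampleBlobStorage` and the `<accountname>` / `<accountkey>` placeholders are assumptions for illustration, not values from the talk.

```python
import json

# Hypothetical linked service definition for an Azure Blob Storage account.
# The name and the placeholder credentials are illustrative assumptions.
linked_service = {
    "name": "ExampleBlobStorage",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<accountname>;AccountKey=<accountkey>"
            )
        },
    },
}

# Serialize to the JSON form that a definition like this is stored in.
as_json = json.dumps(linked_service, indent=2)
print(as_json)
```

In practice the secret would normally come from a key vault reference rather than an inline connection string.
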
  4. Datasets
    Once a dataset is defined, it can be used in pipelines as a source of data or as a sink of data.
  5. Pipelines
    • 90+ connectors
    • Various activities
    • Supports common loading patterns
    • Fully parallel loading into the data lake or SQL tables
    • Graphical development experience
  6. Data Flows
    • Handle upserts, updates, and deletes on SQL sinks
    • Commonly used ETL patterns (sequence generator, lookup transformation, SCD…)
    • Add file handling (move files after read, write files to file names described in rows, etc.)
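
To make the upsert / update / delete distinction concrete, here is a minimal plain-Python sketch of those semantics against an in-memory "sink". It is not how Data Flows are implemented or authored; the function name `apply_changes` and the row shape are assumptions for illustration.

```python
def apply_changes(sink, changes, key="id"):
    """Apply (action, row) changes to a list of row dicts and return the result.

    Hypothetical helper: models sink semantics only, not the Data Flow engine.
    """
    table = {row[key]: dict(row) for row in sink}
    for action, row in changes:
        k = row[key]
        if action == "upsert":
            # Insert the row if the key is new, otherwise merge into the existing row.
            table[k] = {**table.get(k, {}), **row}
        elif action == "update":
            # Only touch rows that already exist.
            if k in table:
                table[k].update(row)
        elif action == "delete":
            table.pop(k, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return sorted(table.values(), key=lambda r: r[key])


sink = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
changes = [
    ("upsert", {"id": 3, "name": "c"}),   # new key: inserted
    ("update", {"id": 1, "name": "A"}),   # existing key: modified
    ("delete", {"id": 2}),                # existing key: removed
    ("update", {"id": 9, "name": "z"}),   # missing key: ignored
]
result = apply_changes(sink, changes)
print(result)
```
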
  7. Triggers
    Triggers represent a unit of processing that determines when a pipeline execution needs to be kicked off.
    • Scheduled
    • Event based
    • Tumbling window
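
Of the three trigger types, the tumbling window is the least self-explanatory: it fires over a series of fixed-size, non-overlapping, contiguous time intervals. The sketch below only computes those intervals in plain Python; the function name `tumbling_windows` and the sample dates are assumptions, and no Synapse API is involved.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Return the complete (window_start, window_end) pairs between start and end.

    Hypothetical helper: each window begins exactly where the previous one ends.
    """
    windows = []
    window_start = start
    while window_start + size <= end:
        windows.append((window_start, window_start + size))
        window_start += size
    return windows


windows = tumbling_windows(
    start=datetime(2020, 10, 10, 0, 0),
    end=datetime(2020, 10, 10, 3, 0),
    size=timedelta(hours=1),
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

A scheduled trigger, by contrast, fires at wall-clock times without carrying a window, and an event-based trigger fires in response to something like a file arriving in storage.
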

  9. Exploratory Data Analysis Preparing To Transform

  10. CSV Files

  11. Parquet Files

  12. Go batch

  13. Applying Transformations

  14. Code based transformations - Spark
    Starting from a table, auto-generate a single line of PySpark code that makes it easy to load a SQL table into a Spark dataframe.
  15. Code based transformations - SQL

  16. Transform with Notebooks
    • Allows writing multiple languages in one notebook
    • Offers use of temporary tables across languages
    • Language support for syntax highlighting, syntax errors, code completion, smart indent, and code folding
    • Export results
  17. Transform with Pipelines and Data Flows

  18. Transform with Serverless
    • An interactive query service that provides T-SQL queries over high-scale data in Azure Storage
    • No infrastructure
    • Pay only for query execution
    • T-SQL syntax to query data
    • Supports data in various formats (Parquet, CSV, JSON)
    • Support for the BI ecosystem
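
A serverless query over files in storage is typically written with `OPENROWSET`. The sketch below is a small Python helper that assembles such a T-SQL statement as text so its shape is visible; the helper name `openrowset_query` and the storage URL are assumptions for illustration, and only the Parquet and CSV cases are covered.

```python
def openrowset_query(url, file_format="PARQUET", top=10):
    """Build a T-SQL OPENROWSET query string for a file in Azure Storage.

    Hypothetical helper for illustration; it does no escaping, so it is not
    suitable for untrusted input.
    """
    if file_format not in {"PARQUET", "CSV"}:
        raise ValueError("this sketch only covers PARQUET and CSV")
    options = f"BULK '{url}', FORMAT = '{file_format}'"
    if file_format == "CSV":
        # CSV files need parser options; here the first row is treated as a header.
        options += ", PARSER_VERSION = '2.0', HEADER_ROW = TRUE"
    return f"SELECT TOP {top} * FROM OPENROWSET({options}) AS [rows]"


# Placeholder URL, not a real storage account.
example_url = "https://<account>.dfs.core.windows.net/<container>/data/*.parquet"
query = openrowset_query(example_url)
print(query)
```

The resulting text can be pasted into a SQL script in Synapse Studio; nothing is provisioned beforehand, which is the "no infrastructure" point on the slide.
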
  19. Machine Learning in Azure Synapse Analytics

  20. Making Predictions with T-SQL
    • Create the model (Azure Machine Learning or Azure Synapse Spark Notebook): train model, convert to ONNX, export to Storage (models)
    • Register the model (Azure Synapse SQL, SQL script): read model, load into table
    • Use the model (Azure Synapse SQL, SQL script): load from table, use PREDICT, insert predictions
  21. From ingestion to visualization. Demo

  22. Databricks vs Synapse
    • If you are primarily looking for a data warehousing solution, go with Azure Synapse Analytics.
    • If you are looking for a Spark solution and don't have data warehousing needs, go with Azure Databricks. For Spark-based ML scenarios, include Azure Machine Learning from within Azure Databricks for experiment tracking, automated machine learning, and MLOps.
    • If you are heavily invested in Spark and have data warehousing needs, go with both Azure Databricks and Azure Synapse.
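
The three rules above can be read as a small decision function. This is only a restatement of the slide in Python; the function name `recommend` and its two boolean inputs are assumptions, and the case where neither need applies is not covered by the slide, so the sketch returns an empty list for it.

```python
def recommend(needs_data_warehouse, invested_in_spark):
    """Map the slide's rules of thumb to a list of suggested services."""
    if needs_data_warehouse and invested_in_spark:
        return ["Azure Databricks", "Azure Synapse"]
    if needs_data_warehouse:
        return ["Azure Synapse Analytics"]
    if invested_in_spark:
        return ["Azure Databricks"]
    # Not addressed by the slide: no recommendation.
    return []


print(recommend(needs_data_warehouse=True, invested_in_spark=False))
print(recommend(needs_data_warehouse=False, invested_in_spark=True))
print(recommend(needs_data_warehouse=True, invested_in_spark=True))
```
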
  23. Links worth sharing
    • Microsoft Cloud Workshop for Azure Synapse Analytics and AI: https://drn.fyi/2FkuMCw
    • Data and AI Engagement Accelerators for PoC: https://drn.fyi/3dgf87C
  24. Thanks
    http://daron.me | @daronyondem
    Download slides here: http://decks.daron.me