Synapse 101

This is an introductory Azure Synapse session I hosted for the PowerBI Turkey Meetup group.

Daron Yondem

October 10, 2020

Transcript

  1. Synapse 101
    Daron Yöndem
    http://daron.me
    @daronyondem

  2. Platform
    • Workloads: Artificial Intelligence / Machine Learning / Internet of Things, Intelligent Apps / Business Intelligence
    • Experience: Azure Synapse Studio
    • Languages: SQL, Python, .NET, Java, Scala, R
    • Analytics runtimes in two form factors: provisioned and serverless
    • Shared services: Metastore, Security, Management, Monitoring, Data Integration
    • Platform: Azure Data Lake Storage, Common Data Model, Enterprise Security, Optimized for Analytics

  3. Linked Services
    • Linked services define the connection information needed to connect to external resources.
    • Easy cross-platform data migration
    • A linked service represents either a data store or a compute resource.
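In Synapse pipelines, as in Azure Data Factory, a linked service is saved as a JSON definition. A minimal sketch, assuming an Azure Data Lake Storage Gen2 account: the Python below builds that JSON shape as a dict. The name `DataLakeLinkedService` and the account `mydatalake` are hypothetical, authentication settings are left out, and the `AzureBlobFS` type with a `url` property should be checked against the current connector documentation.

```python
import json


def adls_gen2_linked_service(name: str, account: str) -> dict:
    """Build a linked-service definition for a Data Lake Storage Gen2 account.

    `name` and `account` are hypothetical placeholders. Authentication
    settings (account key, service principal, managed identity) are omitted.
    """
    return {
        "name": name,
        "properties": {
            # "AzureBlobFS" is the connector type used for Data Lake Storage Gen2.
            "type": "AzureBlobFS",
            "typeProperties": {
                # Only connection information lives here; no data is read yet.
                "url": f"https://{account}.dfs.core.windows.net",
            },
        },
    }


linked_service = adls_gen2_linked_service("DataLakeLinkedService", "mydatalake")
print(json.dumps(linked_service, indent=2))
```

The definition says how to reach the store and nothing about which files to read; that is what a dataset adds on top of it.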

  4. Datasets
    Once a dataset is defined, it can be used in pipelines either as a source
    of data or as a sink of data.
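A dataset points at data inside the store that a linked service connects to. Under stated assumptions (a hypothetical linked service named `DataLakeLinkedService`, file system `raw`, folder `sales/2020`), a Parquet dataset definition might look like the dict below; the property names follow the Data Factory dataset JSON schema and are not taken from the slide.

```python
import json


def parquet_dataset(name: str, linked_service: str, file_system: str, folder: str) -> dict:
    """Build a Parquet dataset definition that references a linked service by name.

    All names passed in are hypothetical placeholders.
    """
    return {
        "name": name,
        "properties": {
            "type": "Parquet",
            # The dataset holds no credentials; it reuses the linked service's connection.
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                # Where the data lives inside the store the linked service connects to.
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": file_system,
                    "folderPath": folder,
                },
            },
        },
    }


sales = parquet_dataset("SalesParquet", "DataLakeLinkedService", "raw", "sales/2020")
print(json.dumps(sales, indent=2))
```

Because the dataset only references the connection, the same definition can be wired into a Copy activity as either its source or its sink.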

  5. Pipelines
    • 90+ connectors
    • A wide range of activities
    • Supports common loading patterns
    • Fully parallel loading into the data lake or SQL tables
    • Graphical development experience
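The slide does not say which loading patterns are meant; one widely used example is an incremental (delta) load driven by a high watermark, where each run copies only rows modified since the last recorded watermark and then advances it. This is a plain-Python sketch with in-memory rows; the `modified` column and the function name are hypothetical, and a real pipeline would typically keep the watermark in a control table and use Lookup and Copy activities instead.

```python
from datetime import datetime


def incremental_load(source_rows, sink_rows, watermark):
    """Copy rows changed after `watermark` into `sink_rows` and return the new watermark.

    source_rows: list of dicts with a hypothetical "modified" datetime column.
    sink_rows: list the new rows are appended to (stands in for a lake folder or SQL table).
    """
    # Only rows strictly newer than the last watermark are copied.
    delta = [row for row in source_rows if row["modified"] > watermark]
    sink_rows.extend(delta)
    # Advance the watermark to the newest copied row; keep it if nothing was copied.
    return max((row["modified"] for row in delta), default=watermark)


source = [
    {"id": 1, "modified": datetime(2020, 10, 1)},
    {"id": 2, "modified": datetime(2020, 10, 5)},
    {"id": 3, "modified": datetime(2020, 10, 9)},
]
sink = []
wm = incremental_load(source, sink, datetime(2020, 10, 3))
# A second run with the advanced watermark finds nothing new to copy.
wm2 = incremental_load(source, sink, wm)
```

Rerunning with the stored watermark is what makes the pattern safe to schedule repeatedly: rows already copied are not copied again.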

  6. Data Flows
    • Handle upserts, updates, and deletes on SQL sinks
    • Commonly used ETL patterns (sequence generator, lookup
    transformation, slowly changing dimensions (SCD), …)
    • File handling (move files after reading them, write output
    to file names taken from row values, etc.)
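Handling upserts, updates and deletes on a SQL sink comes down to applying a stream of rows, each marked with an action, to the target by key. In Mapping Data Flows that marking is done with an Alter Row transformation; the sketch below only imitates the effect in plain Python, with a dict standing in for the sink table. The action names and the `id` key column are assumptions made for illustration.

```python
def apply_changes(sink: dict, changes: list) -> dict:
    """Apply (action, row) pairs to `sink`, a dict keyed by each row's "id".

    insert: add the row only if the key is new.
    update: overwrite the row only if the key already exists.
    upsert: insert or overwrite, whichever applies.
    delete: remove the row if present.
    """
    for action, row in changes:
        key = row["id"]
        if action == "insert":
            sink.setdefault(key, row)
        elif action == "update":
            if key in sink:
                sink[key] = row
        elif action == "upsert":
            sink[key] = row
        elif action == "delete":
            sink.pop(key, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return sink


table = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Alan"}}
apply_changes(table, [
    ("update", {"id": 1, "name": "Ada L."}),
    ("upsert", {"id": 3, "name": "Grace"}),
    ("delete", {"id": 2}),
    ("update", {"id": 9, "name": "ignored"}),  # no row 9 exists, so nothing changes
])
```

The difference between `update` and `upsert` is the whole point: an update against a missing key is silently skipped, while an upsert creates the row.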

  7. Triggers
    Triggers represent a unit of processing that determines when
    a pipeline run needs to be kicked off.
    • Scheduled
    • Event-based
    • Tumbling window
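A tumbling window trigger fires for a series of fixed-size, non-overlapping, contiguous time intervals counted from a start time, unlike a schedule trigger, which only fires at wall-clock times. The sketch below just computes such windows in Python so the tiling is visible; it is not the trigger's API, the function name is hypothetical, and skipping a trailing partial window is a choice made in this sketch.

```python
from datetime import datetime, timedelta


def tumbling_windows(start: datetime, end: datetime, size: timedelta):
    """Yield (window_start, window_end) pairs that tile [start, end) without gaps or overlaps.

    Only complete windows are yielded; a trailing partial window is skipped
    in this sketch because its interval has not fully elapsed.
    """
    window_start = start
    while window_start + size <= end:
        window_end = window_start + size
        yield window_start, window_end
        # The next window starts exactly where this one ended.
        window_start = window_end


windows = list(tumbling_windows(datetime(2020, 10, 10, 0, 0),
                                datetime(2020, 10, 10, 3, 30),
                                timedelta(hours=1)))
for ws, we in windows:
    print(ws.isoformat(), "->", we.isoformat())
```

Each window's start and end would be handed to the pipeline run, so every run processes exactly one slice of time.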

  8. Modern Data Warehouse
    Azure Synapse Analytics:
    Data sources → Ingest → Store → Prepare → Transform & enrich → Serve → Visualize

  9. Exploratory Data Analysis
    Preparing To Transform

  10. CSV Files

  11. Parquet Files

  12. Go batch

  13. Applying Transformations

  14. Code based transformations - Spark
    Starting from a table, auto-generate a single line of PySpark code
    that makes it easy to load a SQL table into a Spark DataFrame.

  15. Code based transformations - SQL

  16. Transform with Notebooks
    • Write multiple languages in one notebook
    • Use temporary tables across languages
    • Language support for syntax highlighting, syntax error detection,
    code completion, smart indent, and code folding
    • Export results

  17. Transform with Pipelines and Data Flows

  18. Transform with Serverless
    • An interactive query service that runs T-SQL queries over
    large-scale data in Azure Storage
    • No infrastructure to set up or manage
    • Pay only for query execution
    • T-SQL syntax to query data
    • Supports data in various formats (Parquet, CSV, JSON)
    • Support for the BI ecosystem
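Serverless SQL pools query files in place with `OPENROWSET`. To show the T-SQL shape without claiming anything about the demo, the sketch below is a small Python helper that assembles such a query. The storage URL is a placeholder, only Parquet and CSV are covered (JSON needs a different `OPENROWSET` setup), and the URL is not escaped, so treat it as an illustration rather than a query builder to rely on.

```python
def openrowset_query(url: str, file_format: str, top: int = 10) -> str:
    """Return a serverless SQL pool query that reads files directly from storage."""
    file_format = file_format.upper()
    if file_format == "PARQUET":
        options = "FORMAT = 'PARQUET'"
    elif file_format == "CSV":
        # Parser version 2.0 with HEADER_ROW takes column names from the first line.
        options = "FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE"
    else:
        raise ValueError(f"unsupported format: {file_format}")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        f"    {options}\n"
        ") AS [result];"
    )


# Hypothetical account, container and path; wildcards let one query span many files.
print(openrowset_query(
    "https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet", "parquet"))
```

Nothing is provisioned ahead of time: the query reads the files when it runs, which is why billing is tied to query execution.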

  19. Machine Learning in Azure Synapse Analytics

  20. Making Predictions with T-SQL
    • Create the model: in Azure Machine Learning or an Azure Synapse Spark
    notebook, train the model, convert it to ONNX, and export it to storage.
    • Register the model: in an Azure Synapse SQL script, read the model
    from storage and load it into a table.
    • Use the model: in an Azure Synapse SQL script, load the model from the
    table, score data with PREDICT, and insert the predictions.
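The "use the model" step relies on the T-SQL `PREDICT` function with the ONNX runtime, available in Synapse dedicated SQL pools. The Python helper below only assembles that statement so its shape is visible. The table names, the `[id]`/`[model]` columns and the `Score` output are hypothetical, and the syntax should be checked against the current `PREDICT` documentation before use.

```python
def predict_query(models_table, model_id, data_table, score_columns, target_table=None):
    """Return T-SQL that scores `data_table` with an ONNX model stored in `models_table`.

    Assumes models_table has an [id] column and a [model] varbinary(max) column.
    score_columns maps each model output column to its SQL type, e.g. {"Score": "float"}.
    If target_table is given, its columns are assumed to match d.* plus the outputs.
    """
    with_clause = ", ".join(f"[{name}] {sql_type}" for name, sql_type in score_columns.items())
    # Load the registered model's bytes from the models table.
    load_model = (
        "DECLARE @model varbinary(max) = "
        f"(SELECT [model] FROM {models_table} WHERE [id] = {model_id});\n"
    )
    # Optionally persist the scored rows ("Insert Predictions" on the slide).
    insert = f"INSERT INTO {target_table}\n" if target_table else ""
    # Score every input row; WITH declares the columns the model returns.
    score = (
        "SELECT d.*, p.*\n"
        f"FROM PREDICT(MODEL = @model, DATA = {data_table} AS d, RUNTIME = ONNX)\n"
        f"WITH ({with_clause}) AS p;"
    )
    return load_model + insert + score


print(predict_query("[dbo].[Models]", 1, "[dbo].[InputData]", {"Score": "float"},
                    target_table="[dbo].[Predictions]"))
```

Keeping the model as a row in a table means scoring never leaves the SQL engine, which is the point of the register-then-use split on the slide.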

  21. From ingestion to visualization.
    Demo

  22. Databricks vs Synapse
    • If you are primarily looking for a Data Warehousing solution, go
    with Azure Synapse Analytics.
    • If you are looking for a Spark solution and don't have data warehousing
    needs, go with Azure Databricks. For Spark-based ML scenarios, use
    Azure Machine Learning from within Azure Databricks for experiment
    tracking, automated machine learning and MLOps.
    • If heavily invested in Spark and have data warehousing needs, go
    with both Azure Databricks and Azure Synapse.

  23. Links worth sharing
    Microsoft Cloud Workshop for Azure Synapse Analytics and AI
    • https://drn.fyi/2FkuMCw
    Data and AI Engagement Accelerators for PoC
    • https://drn.fyi/3dgf87C

  24. Thanks
    http://daron.me | @daronyondem
    Download slides here:
    http://decks.daron.me
