Synapse 101

Synapse 101 Daron Yöndem http://daron.me @daronyondem

Platform
• Azure Data Lake Storage
• Common Data Model
• Enterprise Security
• Optimized for Analytics
• Metastore, Security, Management, Monitoring, Data Integration
• Analytics Runtimes
• Form Factors: Provisioned, Serverless
• Languages: SQL, Python, .NET, Java, Scala, R
• Experience: Azure Synapse Studio
• Artificial Intelligence / Machine Learning / Internet of Things / Intelligent Apps / Business Intelligence

Linked Services
• Linked services define the connection information needed to connect to external resources.
• Easy cross-platform data migration.
• Each linked service represents a data store or a compute resource.

Datasets
Once a dataset is defined, it can be used in pipelines as a source of data or as a sink of data.
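To make the relationship between the two concepts concrete, here is a minimal sketch of a linked service and a dataset written as Python dictionaries that mirror the JSON shape these definitions usually take. The names `DataLakeLinkedService` and `SalesParquet`, the placeholder storage URL, the folder names, and the helper `resolve_connection` are hypothetical, and the `type` strings are assumptions that should be checked against the current documentation.

```python
import json

# Hypothetical linked service: holds only connection information.
linked_service = {
    "name": "DataLakeLinkedService",
    "properties": {
        "type": "AzureBlobFS",  # assumed type name for Azure Data Lake Storage Gen2
        "typeProperties": {"url": "https://<account>.dfs.core.windows.net"},
    },
}

# Hypothetical dataset: describes the data and points at the linked service by name.
dataset = {
    "name": "SalesParquet",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",  # assumed location type
                "fileSystem": "raw",
                "folderPath": "sales/2021",
            }
        },
    },
}


def resolve_connection(dataset, linked_services):
    """Return the linked service definition that a dataset refers to."""
    ref = dataset["properties"]["linkedServiceName"]["referenceName"]
    return linked_services[ref]


services = {linked_service["name"]: linked_service}
print(json.dumps(resolve_connection(dataset, services), indent=2))
```

The dataset never repeats the connection details. It only carries a reference, which is why one linked service can back many datasets.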

Pipelines
• 90+ Connectors
• Various activities
• Supports common loading patterns.
• Fully parallel loading into data lake or SQL tables.
• Graphical development experience.
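One way parallel loading might be expressed is a pipeline whose ForEach activity runs a Copy activity for several tables at once. The sketch below builds such a definition as a Python dictionary. The pipeline, activity, and dataset names are hypothetical, the helper `copy_activity` is illustrative only, and the `ParquetSource` / `SqlDWSink` type strings and the batch size of 4 are assumptions rather than values from the deck.

```python
def copy_activity(name, source_dataset, sink_dataset):
    """Build a Copy activity that moves data from one dataset to another."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "ParquetSource"},  # assumed source type
            "sink": {"type": "SqlDWSink"},        # assumed sink type
        },
    }


# Hypothetical pipeline: loops over a list of tables and copies them in parallel.
pipeline = {
    "name": "LoadSalesPipeline",
    "properties": {
        "parameters": {"tables": {"type": "Array"}},
        "activities": [
            {
                "name": "ForEachTable",
                "type": "ForEach",
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.tables",
                        "type": "Expression",
                    },
                    "isSequential": False,  # run iterations in parallel
                    "batchCount": 4,        # assumed degree of parallelism
                    "activities": [
                        copy_activity("CopyTable", "SalesParquet", "SalesSqlTable")
                    ],
                },
            }
        ],
    },
}

for_each = pipeline["properties"]["activities"][0]
print(for_each["name"], "parallel:", not for_each["typeProperties"]["isSequential"])
```

In a real pipeline the inner datasets would typically be parameterized per iteration. That detail is left out here to keep the sketch short.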

Data Flows
• Handle upserts, updates, and deletes on SQL sinks.
• Commonly used ETL patterns (sequence generator, lookup transformation, SCD…).
• Added file handling (move files after read, write files to file names described in rows, etc.).
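The difference between upsert, update, and delete on a sink is easy to blur, so here is a minimal sketch of the semantics using an in-memory dictionary as a stand-in for a SQL table. The function `apply_changes`, the `action` / `row` record layout, and the `id` key column are hypothetical. This shows the row-level behavior only, not how Data Flows implement it.

```python
def apply_changes(sink, changes, key="id"):
    """Apply upsert/update/delete row changes to a dict keyed by the key column."""
    for change in changes:
        action, row = change["action"], change["row"]
        k = row[key]
        if action == "upsert":
            # Insert the row, or merge into the existing row if the key exists.
            sink[k] = {**sink.get(k, {}), **row}
        elif action == "update":
            # Only touch rows that already exist; unknown keys are ignored.
            if k in sink:
                sink[k] = {**sink[k], **row}
        elif action == "delete":
            sink.pop(k, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return sink


sink = {1: {"id": 1, "name": "a"}}
changes = [
    {"action": "upsert", "row": {"id": 2, "name": "new"}},
    {"action": "update", "row": {"id": 1, "name": "b"}},
    {"action": "update", "row": {"id": 3, "name": "ignored"}},
    {"action": "delete", "row": {"id": 2}},
]
result = apply_changes(sink, changes)
print(result)
```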

Triggers
Triggers represent a unit of processing that determines when a pipeline execution needs to be kicked off.
• Scheduled
• Event based
• Tumbling window
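Of the three trigger types, the tumbling window is the least self-explanatory: it fires once per fixed-size, contiguous, non-overlapping time interval. The sketch below computes those intervals. The function name `tumbling_windows` and the example dates are hypothetical, and the service itself handles scheduling, retries, and state, which this does not model.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, interval):
    """Yield (window_start, window_end) pairs that tile [start, end) with no gaps or overlap."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    # Only complete windows are produced; a partial trailing window is skipped.
    while current + interval <= end:
        yield current, current + interval
        current += interval


windows = list(
    tumbling_windows(
        datetime(2021, 1, 1, 0, 0),
        datetime(2021, 1, 1, 3, 0),
        timedelta(hours=1),
    )
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each pipeline run would receive one window's start and end, so reprocessing a single slice of time does not affect its neighbors.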

Modern Data Warehouse
• Azure Synapse Analytics
• Stages: Ingest, Store, Prepare, Transform & Enrich, Serve, Visualize
• Input: Data Sources

Exploratory Data Analysis Preparing To Transform

CSV Files

Parquet Files

Go batch

Applying Transformations

Code based transformations - Spark
Starting from a table, auto-generate a single line of PySpark code that makes it easy to load a SQL table into a Spark dataframe.

Code based transformations - SQL

Transform with Notebooks
• Allows writing multiple languages in one notebook.
• Offers use of temporary tables across languages.
• Language support: syntax highlighting, syntax error detection, code completion, smart indent, code folding.
• Export results.

Transform with Pipelines and Data Flows

Transform with Serverless
• An interactive query service that provides T-SQL queries over high scale data in Azure Storage.
• No infrastructure
• Pay only for query execution
• T-SQL syntax to query data
• Supports data in various formats (Parquet, CSV, JSON)
• Support for BI ecosystem
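A serverless query over files in storage is typically written with `OPENROWSET`. As a rough sketch, the helper below assembles such a T-SQL statement as a string for Parquet or CSV files. The function name `openrowset_query`, the example storage URL, and the chosen CSV options are assumptions. JSON is left out because it needs extra parsing steps.

```python
def openrowset_query(url, file_format, top=10):
    """Build a T-SQL OPENROWSET query over files in Azure Storage (Parquet or CSV only)."""
    fmt = file_format.upper()
    if fmt == "PARQUET":
        options = "FORMAT = 'PARQUET'"
    elif fmt == "CSV":
        # Assumed CSV options: parser version 2.0 with a header row.
        options = "FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE"
    else:
        raise ValueError(f"unsupported format in this sketch: {file_format}")
    safe_url = url.replace("'", "''")  # escape single quotes for the T-SQL literal
    return (
        f"SELECT TOP {int(top)} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        f"    {options}\n"
        f") AS [rows];"
    )


# Hypothetical storage path, used only for illustration.
query = openrowset_query(
    "https://<account>.dfs.core.windows.net/raw/sales/*.parquet", "parquet"
)
print(query)
```

The generated text can be pasted into a SQL script against the serverless endpoint. Nothing has to be provisioned or loaded first.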

Machine Learning in Azure Synapse Analytics

Making Predictions with T-SQL
• Create the model (Azure Machine Learning or Azure Synapse Spark Notebook): Train Model, Convert to ONNX, Export to Storage
• Register the model (Azure Synapse SQL, SQL Script): Read model from Storage, Load into Table (Models)
• Use the model (Azure Synapse SQL, SQL Script): Load from Table, Use Predict, Insert Predictions
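The final "use the model" step can be sketched as a T-SQL `PREDICT` call that reads an ONNX model from a table and scores rows from another table. The helper below only assembles the statement text. The function name `predict_query`, the table names, the `Model` and `Id` column names, and the `Score` output column are hypothetical and would need to match the actual schema.

```python
def predict_query(model_table, model_id, data_table, output_column, output_type="float"):
    """Build a T-SQL PREDICT statement that scores data_table with a stored ONNX model."""
    return (
        f"SELECT d.*, p.{output_column}\n"
        f"FROM PREDICT(\n"
        # Assumes the model table has a binary Model column and an integer Id column.
        f"    MODEL = (SELECT Model FROM {model_table} WHERE Id = {int(model_id)}),\n"
        f"    DATA = {data_table} AS d,\n"
        f"    RUNTIME = ONNX\n"
        f") WITH ({output_column} {output_type}) AS p;"
    )


# Hypothetical table and column names, used only for illustration.
statement = predict_query("dbo.Models", 1, "dbo.SalesFeatures", "Score")
print(statement)
```

Wrapping the result in an `INSERT INTO ... SELECT` would cover the "Insert Predictions" step from the slide.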

From ingestion to visualization. Demo

Databricks vs Synapse
• If you are primarily looking for a data warehousing solution, go with Azure Synapse Analytics.
• If you are looking for a Spark solution and don't have data warehousing needs, go with Azure Databricks. For Spark-based ML scenarios, include Azure Machine Learning from within Azure Databricks for experiment tracking, automated machine learning, and MLOps.
• If you are heavily invested in Spark and have data warehousing needs, go with both Azure Databricks and Azure Synapse.
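The three rules above can be written down as a small decision function. This is only a literal encoding of the slide's guidance. The function name `recommend` and its boolean parameters are hypothetical, "heavily invested in Spark" and "looking for a Spark solution" are collapsed into one flag for simplicity, and the case where neither need applies is not covered by the slide, so the sketch returns an empty list there.

```python
def recommend(needs_data_warehouse, uses_spark, spark_ml=False):
    """Return the suggested Azure services for the given needs, following the slide's rules."""
    if needs_data_warehouse and uses_spark:
        return ["Azure Databricks", "Azure Synapse Analytics"]
    if needs_data_warehouse:
        return ["Azure Synapse Analytics"]
    if uses_spark:
        services = ["Azure Databricks"]
        if spark_ml:
            # Experiment tracking, automated ML, and MLOps from within Databricks.
            services.append("Azure Machine Learning")
        return services
    return []  # not addressed by the slide


print(recommend(needs_data_warehouse=True, uses_spark=False))
print(recommend(needs_data_warehouse=False, uses_spark=True, spark_ml=True))
print(recommend(needs_data_warehouse=True, uses_spark=True))
```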

Links worth sharing
• Microsoft Cloud Workshop for Azure Synapse Analytics and AI: https://drn.fyi/2FkuMCw
• Data and AI Engagement Accelerators for PoC: https://drn.fyi/3dgf87C

Thanks
http://daron.me | @daronyondem
Download slides here: http://decks.daron.me
