Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Melhores praticas para carga de dados para o Az...

sidney cirqueira
October 24, 2020
100

Melhores praticas para carga de dados para o Azure Synapse Analytics - SQL Pool

Nessa sessão apresentei um overview rapido do novo Azure Synapse Analytics com foco nas melhores praticas para carga de dados para o Azure Synapse Analytics - SQL Pool. Você irá entender os porquês de utilizar ELT em vez de ETL em ambiente de Big Data no Azure por meio de uma arquitetura escalável, com computação flexivel e recursos de armazenamento otimizados.

sidney cirqueira

October 24, 2020
Tweet

More Decks by sidney cirqueira

Transcript

  1. Explore your PASS community Free online webinar events Connect with

    the global data community Local user groups around the world Online special interest user groups Learning on-demand and delivered to you Get involved Own your career with interactive learning built by community and guided by data experts. Get involved. Get ahead. .org
  2. Sidney Cirqueira ✓ Data Analyst – Lanlink; ✓ Microsoft Certified

    Trainer; ✓ Microsoft Learn Student Ambassador - Beta; ✓ Post Graduation in Big Data & Machine Learning - FASAM; ✓ Experience focused on Data Engineering & Artificial Intelligence on Azure;
  3. ✓ What is Azure Synapse Analytics? ✓ Synapse Analytics (formerly

    SQL DW) – Architecture ✓ Table Designer ✓ Options for loading ✓ Best practices ✓ Demo ✓ Questions ✓ References Agenda:
  4. Azure Synapse Analytics Limitless analytics service with unmatched time to

    insight Platform Azure Data Lake Storage Common Data Model Enterprise Security Optimized for Analytics METASTORE SECURITY MANAGEMENT MONITORING DATA INTEGRATION Analytics Runtimes PROVISIONED ON-DEMAND Form Factors SQL Languages Python .NET Java Scala R Experience Synapse Analytics Studio Artificial Intelligence / Machine Learning / Internet of Things Intelligent Apps / Business Intelligence METASTORE SECURITY MANAGEMENT MONITORING
  5. Azure Data Lake Storage Store Azure Synapse Analytics Key Component

    Of Modern Data warehouse Power BI Azure Machine Learning
  6. Data Hub – Databases Explore the different kinds of databases

    that exist in a workspace. SQL pool SQL on-demand Spark
  7. Azure SQL Data Warehouse Architecture Control Node Compute Node Compute

    Node Compute Node Compute Node SQL DB SQL DB SQL DB SQL DB Blob storage [WASB(S)] Compute Scale compute up or down when required (SLA <= 60 seconds). Pause, Resume, Stop, Start. Storage Add\Load data to WASB(S) without incurring compute costs Massively Parallel Processing (MPP) Engine Azure Infrastructure and Storage 100 DWU < > 30000 DWU Storage and Compute are de-coupled, enabling a true elastic service and separate charging for both compute and storage Application or User connection Data Loading (SSIS, REST, OLE, ADO, ODBC, WebHDFS, AZCopy, PS) DM S DM S DM S DM S DM S DMS (Data Movement Service) executes across all database nodes
  8. Creating tables CREATE TABLE [dbo].[DimStore] ( [StoreKey] int NOT NULL

    , [GeographyKey] int NOT NULL , [StoreName] nvarchar(100) NOT NULL , [StoreType] nvarchar(15) NULL , [StoreDescription] nvarchar(300) NOT NULL , [Status] nvarchar(20) NOT NULL , [OpenDate] datetime NOT NULL , [CloseDate] datetime NULL , [ETLLoadID] int NULL , [LoadDate] datetime NULL , [UpdateDate] datetime NULL ) WITH ( CLUSTERED INDEX([StoreKey]) , DISTRIBUTION = REPLICATE ) ; CREATE TABLE [dbo].[FactOnlineSales] ( [OnlineSalesKey] int NOT NULL , [DateKey] datetime NOT NULL , [StoreKey] int NOT NULL , [ProductKey] int NOT NULL , [PromotionKey] int NOT NULL , [CurrencyKey] int NOT NULL , [CustomerKey] int NOT NULL , [SalesOrderNumber] nvarchar(20) NOT NULL , [SalesOrderLineNumber] int NULL , [SalesQuantity] int NOT NULL , [SalesAmount] money NOT NULL ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = HASH([ProductKey]) ) ;
  9. Creating tables CREATE TABLE [build].[FactOnlineSales] ( [OnlineSalesKey] int NOT NULL

    , [DateKey] datetime NOT NULL , [StoreKey] int NOT NULL , [ProductKey] int NOT NULL , [PromotionKey] int NOT NULL , [CurrencyKey] int NOT NULL , [CustomerKey] int NOT NULL , [SalesOrderNumber] nvarchar(20) NOT NULL , [SalesOrderLineNumber] int NULL , [SalesQuantity] int NOT NULL , [SalesAmount] money NOT NULL ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = ROUND_ROBIN ) ;
  10. Options For loading To load data, you can use any

    of these loading options: ✓ COPY statement ✓ PolyBase with T-SQL ✓ PolyBase and COPY statement with Azure Data Factory (ADF) ✓ PolyBase with SSIS ✓ PolyBase with Azure Databricks
  11. Best practices for loading large amount of data PolyBase benefits

    Variety of file formats PolyBase supports a variety of file formats including Parquet, RC, ORC and Gzip files. Azure Data Factory support Azure Data Factory also supports PolyBase loads and can achieve similar performance to running PolyBase manually Leverages MPP architecture PolyBase is designed to leverage the MPP (Massively Parallel Processing) architecture of Azure Synapse Analytics and will therefore load and export data magnitudes faster than any other tool.
  12. Create External Table - Using T-SQL Overview Creates an external

    table and then exports results of the Select statement. These operations will import data into the database for the duration of the query Steps: 1. Create Master Key 2. Create Credentials 3. Create External Data Source 4. Create External Data Format 5. Create External Table
  13. COPY command Overview Copies data from source to destination Benefits

    Retrieves data from all files from the folder and all its subfolders. Supports multiple locations from the same storage account, separated by comma Supports Azure Data Lake Storage (ADLS) Gen 2 and Azure Blob Storage. Supports CSV, PARQUET, ORC file formats
  14. Best practices To load data, you can use any of

    these Best Practices ✓ Preparing data in Azure Storage ✓ Running loads with enough compute ✓ Creating a loading user ✓ Loading to a staging table ✓ Loading to a columnstore index ✓ Creating statistics after the load ✓ Rotate storage keys (PolyBase)