Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Sofia UG Webinar - End to end implementation of...

Avatar for Ivan Donev Ivan Donev
January 10, 2020

Sofia UG Webinar - End to end implementation of Microsoft BI Solution

Sofia UG Webinar - End to end implementation of Microsoft BI Solution

Avatar for Ivan Donev

Ivan Donev

January 10, 2020
Tweet

More Decks by Ivan Donev

Other Decks in Technology

Transcript

  1. WHAT IS A DATAWAREHOUSE AND WHAT IS BI  Datawarehouse

     Designed for storing historical data from various source systems (supported by processes for collecting and managing data)  Supports analytical workloads  Supports answering questions like What happened?, Why it happened?, How it happened?  Not used for transactional processing  Not suitable for Operational reporting  Output: dimensional model/data vault model  Users: data engineers, data analysts, report developers  Business Intelligence  Uses data for exploring the business – how it performs, analyze trends, etc.  Answers questions like What happened?, Why it happened?, How it happened?  Output: Graphical representation (reports, charts, etc.)  Users: Business users
  2. THE TRADITIONAL DWH AND BI  Three tier architecture 

    Bottom tier – the DWH (SSIS, SQL DB Engine, MDS, DQS)  Middle tier – the semantic layer (SSAS/PowerBI)  Top tier – the presentation tier (SSRS/PowerBI)  ETL/ELT  ETL – 1/2/3 stage loading  ELT with Hadoop (big data scenarios)
  3. THE KEY POINTS IN DATA LOADING AND APPROACHES  Data

    loading is mostly two-fold operation  Full load  Delta/Incremental load (5 different strategies for change detection)  Full, CDC, changetracking, high watermark, checksum  Sematic layer requires additional processing/update  The “Realtime”-ness of the solution depends on:  Changes on the source  Frequency of detecting and importing changes  Frequency of updating the semantic layer  Report caching/snapshotting
  4. THE KEY POINTS IN DATA LOADING AND APPROACHES (2) 

    How often to incrementally load?  Not only time requirements matter!  How often you change the source?  Text files generated daily or direct import?  How many sources? Are they changing at the same time?  How easily accessible is the source?  Is there a proper extraction window?  How much and how fast can you transform the data?
  5. SOME TYPICAL ETL APPROACHES  Entirely contained in SQL Server

    DB  Source is SQL Server only and cross DB queries are allowed  Stored procedures are used  Pros: easy to code, modularize and maintain; CI/CD is easier and potentially easy to migrate to cloud  Cons: might require synonyms, cross-db transactions, excessive locking, difficult debugging; some use linked servers as well  Entirely contained in SSIS  Multiple types of sources  Flow and logic is controlled via packages  Pros: modules and ability to combine different sources, checkpoints, transaction management outside T-SQL, logging, debugging, scripting in C#, can be moved to a different server, scaleout, can use external processes (cmd, ftp, etc.)  Cons: additional knowledge, customizations are difficult, CI/CD is not straight forward  Two strategies  Master-child package (DEMO!)  Individual packages  Package vs Project deployment of SSIS
  6. IMPORTANT DECISION POINTS  Architecture  Data mart  Enterprise

    DWH  Hub-and-spoke  Model  Star or snowflake  Data vault 1.0/2.0  ETL  SSIS or SPs
  7. IMPORTANT DECISION POINTS (2)  Semantic layer  SSAS or

    PBI or Direct Reporting with T-SQL  AS – MD or Tabular  Reporting  SSRS, PBI, Excel  #Users, #Concurrent queries, knowledge of users, type of calculations, chosen frontend (AS is compatible with multiple FE reporting tools)  Delivery mechanism/Self-service BI  Front-end delivery  Embedding  Portal access  Security and refreshes/caching/report snapshots
  8. WHAT TO EXPECT  February SQL&BI UG meetup (virtual or

    in person) (free)  15.02.2020 – Power Platform Bootcamp (free)  Our DWH and BI training track is scheduled  PowerBI (Feb/March 2020)  BI Fundamentals (March 2020)  MOC 20767 – Implementing a Microsoft Datawarehouse (May 2020) (Ivan Donev)  More info @ sqlmasteracademy.com