Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Sofia UG Webinar - End to end implementation of...

Ivan Donev
January 10, 2020

Sofia UG Webinar - End to end implementation of Microsoft BI Solution

Sofia UG Webinar - End to end implementation of Microsoft BI Solution

Ivan Donev

January 10, 2020
Tweet

More Decks by Ivan Donev

Other Decks in Technology

Transcript

  1. WHAT IS A DATAWAREHOUSE AND WHAT IS BI  Datawarehouse

     Designed for storing historical data from various source systems (supported by processes for collecting and managing data)  Supports analytical workloads  Supports answering questions like What happened?, Why it happened?, How it happened?  Not used for transactional processing  Not suitable for Operational reporting  Output: dimensional model/data vault model  Users: data engineers, data analysts, report developers  Business Intelligence  Uses data for exploring the business – how it performs, analyze trends, etc.  Answers questions like What happened?, Why it happened?, How it happened?  Output: Graphical representation (reports, charts, etc.)  Users: Business users
  2. THE TRADITIONAL DWH AND BI  Three tier architecture 

    Bottom tier – the DWH (SSIS, SQL DB Engine, MDS, DQS)  Middle tier – the semantic layer (SSAS/PowerBI)  Top tier – the presentation tier (SSRS/PowerBI)  ETL/ELT  ETL – 1/2/3 stage loading  ELT with Hadoop (big data scenarios)
  3. THE KEY POINTS IN DATA LOADING AND APPROACHES  Data

    loading is mostly two-fold operation  Full load  Delta/Incremental load (5 different strategies for change detection)  Full, CDC, changetracking, high watermark, checksum  Sematic layer requires additional processing/update  The “Realtime”-ness of the solution depends on:  Changes on the source  Frequency of detecting and importing changes  Frequency of updating the semantic layer  Report caching/snapshotting
  4. THE KEY POINTS IN DATA LOADING AND APPROACHES (2) 

    How often to incrementally load?  Not only time requirements matter!  How often you change the source?  Text files generated daily or direct import?  How many sources? Are they changing at the same time?  How easily accessible is the source?  Is there a proper extraction window?  How much and how fast can you transform the data?
  5. SOME TYPICAL ETL APPROACHES  Entirely contained in SQL Server

    DB  Source is SQL Server only and cross DB queries are allowed  Stored procedures are used  Pros: easy to code, modularize and maintain; CI/CD is easier and potentially easy to migrate to cloud  Cons: might require synonyms, cross-db transactions, excessive locking, difficult debugging; some use linked servers as well  Entirely contained in SSIS  Multiple types of sources  Flow and logic is controlled via packages  Pros: modules and ability to combine different sources, checkpoints, transaction management outside T-SQL, logging, debugging, scripting in C#, can be moved to a different server, scaleout, can use external processes (cmd, ftp, etc.)  Cons: additional knowledge, customizations are difficult, CI/CD is not straight forward  Two strategies  Master-child package (DEMO!)  Individual packages  Package vs Project deployment of SSIS
  6. IMPORTANT DECISION POINTS  Architecture  Data mart  Enterprise

    DWH  Hub-and-spoke  Model  Star or snowflake  Data vault 1.0/2.0  ETL  SSIS or SPs
  7. IMPORTANT DECISION POINTS (2)  Semantic layer  SSAS or

    PBI or Direct Reporting with T-SQL  AS – MD or Tabular  Reporting  SSRS, PBI, Excel  #Users, #Concurrent queries, knowledge of users, type of calculations, chosen frontend (AS is compatible with multiple FE reporting tools)  Delivery mechanism/Self-service BI  Front-end delivery  Embedding  Portal access  Security and refreshes/caching/report snapshots
  8. WHAT TO EXPECT  February SQL&BI UG meetup (virtual or

    in person) (free)  15.02.2020 – Power Platform Bootcamp (free)  Our DWH and BI training track is scheduled  PowerBI (Feb/March 2020)  BI Fundamentals (March 2020)  MOC 20767 – Implementing a Microsoft Datawarehouse (May 2020) (Ivan Donev)  More info @ sqlmasteracademy.com