SQL Saturday 917 Vienna - Designing PowerBI and Tabular models for performance and scalability

Tabular design for performance and scalability Ivan Donev Doing data
in Microsoft’s realm for 13 years @ikdonev | https://www.linkedin.com/in/ivandonev/

The order of importance (personal opinion) Optimization patterns with Aggregations
Storage and formula optimization patterns Base layer design

Tabular internals

DAX Query Engine Formula engine Produce query plans Data request
from SE Single threaded Storage engine: Returns data to FE Multi-threaded https://insightquest.files.wordpress.com/2016/11/dax-query-engine.png

Storing and compressing data Data is encoded and stored compressed
on disk Encoded and uncompressed in memory The process when importing data: 1. Read the data and detect data types 2. Encode 3. Compress when saving on disk

Encoding and compression Automatic detection Based on data types Hash
and Value Hash – big data types Value – whole numbers/currency

Understanding value encoding in Vertipaq Modify the range Takes 194
as base Stores the difference Bytes decreased Only on Integer values

Understanding dictionary encoding 8 Column encoding Build a dictionary Replace
values This stores strings as integers Strings, floating points, etc. are dictionary encoded Distinct values determine size of model

Understanding Run Length Encoding (RLE) Additional compression technique Avoids repeating
values Sorting is important

Compression summary Order of importance Column cardinality Number of repetitions
Number of rows Data type Re-encoding Non-integer Integer

The overall process Data Values VALUE ENCODING HASH ENCODING Encoded
data RLE and BitPacking Compressed data Calculated columns Sorting patterns Processed data Reading in segments of 8 mln. (default) Data structures for rel. and hier. (mappings) Segment size - Tabular = 8 mln, configurable - PBI = 1 mln, not configurable Processing and compression are parallel processes => need additional memory Optimize by disabling hierarchies (discussed later in the deck) AS Instance property: Vertipaq\ProcessingTimeboxSecPerMRow Sorting matters!

Storage optimization patterns

Pattern #1 – Excessive columns Not all columns are needed
from the source Text and big numbers are with high priority If you need analysis – try to push it to the DB engine (if possible) Filter, filter, filter!!!

Pattern #2 – Date/time table Date table - Created automatically
by default in PBI - Required for Time Intelligence How to change it? - PowerBI settings - Choose an explicit date table (mind the gaps in dates and ranges)

Pattern #3 – Data types Data types - Define encoding
- Take more storage - High precision is not always required How to change it? - PowerBI: change data types - SSAS Tabular: SSDT or Tabular Editor or TSML/XMLA

Pattern #4 – Long values Big numbers - Imagine a
20-digit value - Can use DECIMAL - Not much of an effect How to fix it? - Split the column to parts 100200300 = 100*1000000 + 200*100000 + 300 - Numbers: use modulo and division - Text: use Left/Substring/Right

Pattern #4 – Long values – additional example * https://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/

A not so scientific experiment

Pattern #5 – Precision Precision - Affects encoding and compression
- Hits FE - Not always required How to fix it? - Split whole part from decimal part into columns 12.4 = 12 + 4/10 - Write explicit measures that first aggregate then devide SUM(WholePart) + SUM(DecimalPart)/10

Pattern #6 – Date, time Date and time - Usually
in the same column in the source - Usually date or time is needed for analysis How to fix it - Split date and time into separate columns - Define granularity for time (do you need HH, MM and SS or just HH); separate them in different columns with high cardinality - Store the date in int format (i.e. YYYYMMDD)

Pattern #7 – Source Key columns or IDs Keys and
IDs - High cardinality - Are they really needed How to fix it? - Denormalize - Remove them if not needed - Split them (if feasible)

DEMO – Auto date/time and AS hints

Big model aggregation patterns by Phil Seamark (@PhilSeamark)

Aggregation awareness in PowerBI Detailed rows Report Visual Option 1:
Import, useful for small to med size tables Detailed rows Report Visual Option 2: DQ/Dual for med-large size tables Detailed rows Report Visual Option 3: Usage of aggregated tables Aggregated tables

A sample model from Reza Rad

Agg pattern # 1 - Horizontal aggs

Horizontal aggs (2)

Horizontal aggs (3)

Horizontal Aggs Applicable in: - SSAS and PowerBI (no agg
awareness) Scenario - Transactional Fact table(s) - Queries aggregate data in ranges i.e. last X days, last month, last week, etc. The solutions - Aggregate table per week - Aggregate table per month - Lots of DAX (~120 lines per measure) - https://dax.tips/2019/10/21/creative-aggs-part-ii-horizontal-aggs/

DEMO – Row-based TI

Row-based Time Intelligence Applicable - PowerBI and SSAS Scenario -
Time intelligence calculations, multiple periods - Custom periods like last 28 days, last 9 weeks Solution - Additional time intelligence table calculated at ETL or DAX

Agg pattern #2 - Accordion aggregations Use TSQL to load
the Agg table in PowerBI Table Rows Base Table 8,231,630,715 Pure Week Agg 2,206,251 Pure Month Agg 628,175 Accordion Agg 2,558,871

Accordion aggregations (2) 1. Create a temp boundary table and
join it with the fact during import (ETL)

Accordion aggregates (3) Applicable - Best in PowerBI due to
Aggregation Awareness Scenario - Transactional fact table - Monthly, weekly or mixed reporting The solution - Accordion agg table - Can be combined with Row-Based TI

Pivoting data for performance ?

Pivoting data for performance (2) ETL 35% Performance improvement over
8 bln records table

Pivoting data for performance (3) Applicable - PowerBI and SSAS
Scenario - Push TI calculations to ETL Solution - Pivoting TI data at import time (LY, LM,etc.)

Agg pattern #3 – Filtered aggregations

DEMO – Filtered aggregations

Agg pattern #4 – Incremental Aggs You have a Fact
table You have an aggregation table How to keep sync between those? - New row in fact - Delete a row in fact - Update a row in fact

Incremental Aggs Add ETL logic to capture changes PBI Premium
– incremental refresh SSAS – ProcessAdd Process to consolidate the AGG table (part of the ETL)

Star schema specifics

Sample schema

Optimizing dimensions Minimize the number of columns. Reduce cardinality (data
type conversions). Filter out unused dimension values (unless a business scenario requires them). Integer Surrogate Keys (SK). Hint for VALUE encoding on numeric columns (AS ONLY). Hint for disabling hierarchies on SKs (AS ONLY). - speeds up processing - decreases model size (eventually)

Optimizing facts Replace dimension IDs with their surrogate keys. Reduce
cardinality (data type conversions). Consider moving calculations to the source (to use in compression evaluations). Ordered by less diverse SKs first (to maximize compression). Increased Tabular sample size for deciding Encoding, by considering segments and partitions(AS ONLY). Hint for VALUE encoding on numeric columns (AS ONLY). Hint for disabling hierarchies (AAS and AS2017 CU7 ONLY).

Conclusion Use only the data you need Star-schema is your
friend, and Bidirectional filtering is the enemy As small memory footprint as possible Decrease calculated tables and columns Tweak server and model settings Use custom aggregations tables DAX Studio, Vertipaq Analyzer and Tabular Editor FTW

Whitepapers and articles Performance tuning of tabular models in SQL
Server 2012 Analysis Services https://blogs.msdn.microsoft.com/karang/2013/08/02/sql-2012-tabular- performance-tuning-of-tabular-models-in-sql-server-2012-analysis-services/ Modeling AS for Scalability https://blogs.msdn.microsoft.com/analysisservices/2018/08/08/whitepaper- on-modeling-for-as-tabular-scalability/ Phil Seamarks’s series on smart aggregations https://dax.tips/2019/10/18/creative-aggs-part-i-introduction/

Used tools SQL Server Management Studio - DMVs SQL Server
Profiler - NOT deprecated for Analysis Services DAX Studio - FREE and extremely powerful (http://daxstudio.org/) Community excel templates - VertiPaq Analyzer (https://www.sqlbi.com/tools/vertipaq-analyzer/) Tabular Editor (https://tabulareditor.github.io/) - Design and management

SQL Saturday 917 Vienna - Designing PowerBI and...

SQL Saturday 917 Vienna - Designing PowerBI and Tabular models for performance and scalability

More Decks by Ivan Donev

Other Decks in Technology

Featured

Transcript