Tabular design for performance and scalability
Ivan Donev
Doing data in Microsoft’s realm for 13 years
@ikdonev | https://www.linkedin.com/in/ivandonev/
Slide 2
The order of importance (personal opinion)
1. Optimization patterns with Aggregations
2. Storage and formula optimization patterns
3. Base layer design
Slide 3
Tabular internals
Slide 4
DAX Query Engine
Formula engine (FE):
- Produces query plans
- Requests data from the SE
- Single-threaded
Storage engine (SE):
- Returns data to the FE
- Multi-threaded
https://insightquest.files.wordpress.com/2016/11/dax-query-engine.png
Slide 5
Storing and compressing data
Data is encoded and stored compressed on disk
In memory, data is kept encoded but uncompressed
The process when importing data:
1. Read the data and detect data types
2. Encode
3. Compress when saving to disk
Slide 6
Encoding and compression
Automatic detection, based on data types
Two encodings: Hash and Value
- Hash – big data types
- Value – whole numbers/currency
Slide 7
Understanding value encoding in Vertipaq
Modifies the range of values:
- Takes 194 as the base (in this example)
- Stores only the difference from the base
- Fewer bytes per value
- Works only on integer values
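To make the idea concrete, here is a minimal Python sketch of value encoding, using the same base (194) as the slide's example. The real VertiPaq implementation is not public, so this is illustrative only.

```python
# Illustrative sketch of value encoding: re-base integers on the
# column minimum so each value fits in fewer bits.

def value_encode(values):
    base = min(values)                    # 194 in the slide's example
    deltas = [v - base for v in values]   # store only the differences
    bits = max(deltas).bit_length() or 1  # bits needed per delta
    return base, deltas, bits

base, deltas, bits = value_encode([194, 198, 213, 201])
print(base, deltas, bits)  # 194 [0, 4, 19, 7] 5 -> 5 bits per value
```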
Slide 8
Understanding dictionary encoding
Column encoding:
- Build a dictionary of the distinct values
- Replace each value with its dictionary index
This stores strings as integers
Strings, floating points, etc. are dictionary encoded
Distinct values determine the size of the model
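A minimal Python sketch of the same idea, purely illustrative: every distinct value gets an integer id and the column stores only the ids, which is why the distinct count drives model size.

```python
# Illustrative sketch of dictionary (hash) encoding.

def dictionary_encode(values):
    dictionary = {}  # distinct value -> integer id
    ids = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return dictionary, ids

d, ids = dictionary_encode(["Red", "Green", "Red", "Blue", "Red"])
print(d)    # {'Red': 0, 'Green': 1, 'Blue': 2}
print(ids)  # [0, 1, 0, 2, 0]
```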
Slide 9
Understanding Run Length Encoding (RLE)
Additional compression technique
Avoids repeating values
Sorting is important
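A minimal Python sketch of RLE, illustrating why sort order matters: sorted input produces fewer, longer runs.

```python
# Illustrative sketch of Run Length Encoding: store (value, run length)
# pairs instead of repeating values.

def rle_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return runs

data = ["A", "A", "A", "B", "B", "A"]
print(rle_encode(data))          # [['A', 3], ['B', 2], ['A', 1]]
print(rle_encode(sorted(data)))  # [['A', 4], ['B', 2]] -- sorting helps
```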
Slide 10
Compression summary
Order of importance:
1. Column cardinality
2. Number of repetitions
3. Number of rows
4. Data type
Re-encoding – the engine may revise its initial encoding choice during processing:
- Non-integer
- Integer
Slide 11
The overall process
Data values → VALUE ENCODING or HASH ENCODING → encoded data → RLE and BitPacking → compressed data → calculated columns, sorting patterns, data structures for relationships and hierarchies (mappings) → processed data
Data is read in segments of 8 mln rows (default)
Segment size:
- Tabular = 8 mln, configurable
- PBI = 1 mln, not configurable
Processing and compression are parallel processes => need additional memory
Optimize by disabling hierarchies (discussed later in the deck)
AS instance property: Vertipaq\ProcessingTimeboxSecPerMRow
Sorting matters!
Slide 12
Storage optimization patterns
Slide 13
Pattern #1 – Excessive columns
Not all columns are needed from the source
Text and big-number columns are the top candidates to drop
If you need them for analysis – try to push the analysis to the DB engine (if possible)
Filter, filter, filter!!!
Slide 14
Pattern #2 – Date/time table
Date table
- Created automatically by default in PBI
- Required for Time Intelligence
How to change it?
- PowerBI settings
- Choose an explicit date table (mind the gaps in dates and ranges)
Slide 15
Pattern #3 – Data types
Data types
- Define the encoding
- Some take more storage than others
- High precision is not always required
How to change it?
- PowerBI: change data types
- SSAS Tabular: SSDT, Tabular Editor, or TMSL/XMLA
Slide 16
Pattern #4 – Long values
Big numbers
- Imagine a 20-digit value
- Can use DECIMAL, but with little effect
How to fix it?
- Split the column into parts:
  100200300 = 100*1000000 + 200*1000 + 300
- Numbers: use modulo and division
- Text: use Left/Substring/Right
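A minimal Python sketch of the split, using the slide's example value; the millions/thousands boundaries are just one possible choice.

```python
# Illustrative split of a long value into lower-cardinality parts.

value = 100200300

high = value // 1_000_000       # 100
mid = (value // 1_000) % 1_000  # 200
low = value % 1_000             # 300
assert high * 1_000_000 + mid * 1_000 + low == value

# Text variant (LEFT/SUBSTRING/RIGHT in T-SQL, slicing in Python):
s = "100200300"
high_s, mid_s, low_s = s[:3], s[3:6], s[6:]  # '100', '200', '300'
```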
Slide 17
Pattern #4 – Long values – additional example
* https://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/
Slide 18
A not-so-scientific experiment
Slide 19
Pattern #5 – Precision
Precision
- Affects encoding and compression
- Hits the FE
- Not always required
How to fix it?
- Split the whole part and the decimal part into separate columns:
  12.4 = 12 + 4/10
- Write explicit measures that first aggregate, then divide:
  SUM(WholePart) + SUM(DecimalPart)/10
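A minimal Python sketch of the split, assuming one decimal digit as in the slide's 12.4 example; the total matches because aggregation happens before the division.

```python
# Illustrative precision split: whole and decimal parts as integers,
# recombined only at aggregation time.

values = [12.4, 3.1, 7.5]

whole = [int(v) for v in values]                # [12, 3, 7]
decimal = [round(v * 10) % 10 for v in values]  # [4, 1, 5]

total = sum(whole) + sum(decimal) / 10          # aggregate, then divide
assert total == sum(values)                     # 23.0
```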
Slide 20
Pattern #6 – Date, time
Date and time
- Usually in the same column in the source
- Usually only the date or the time is needed for analysis
How to fix it?
- Split date and time into separate columns
- Define the granularity for time (do you need HH, MM and SS, or just HH?); keep high-cardinality parts in separate columns
- Store the date in int format (e.g. YYYYMMDD)
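A minimal Python sketch of the split; keeping only the hour assumes MM/SS are not needed for analysis.

```python
# Illustrative date/time split: an int date key (YYYYMMDD) plus a
# low-granularity time column instead of one high-cardinality datetime.

from datetime import datetime

ts = datetime(2019, 10, 21, 14, 37, 52)

date_key = ts.year * 10000 + ts.month * 100 + ts.day  # 20191021
hour = ts.hour                                        # 14 (drop MM/SS)

print(date_key, hour)
```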
Slide 21
Pattern #7 – Source Key columns or IDs
Keys and IDs
- High cardinality
- Are they really needed?
How to fix it?
- Denormalize
- Remove them if not needed
- Split them (if feasible)
Slide 22
DEMO – Auto date/time and AS hints
Slide 23
Big model aggregation patterns
by Phil Seamark (@PhilSeamark)
Slide 24
Aggregation awareness in PowerBI
Option 1: Import – the report visual queries the detailed rows directly; useful for small to medium size tables
Option 2: DQ/Dual – for medium to large size tables
Option 3: Aggregated tables – the report visual is answered from aggregated tables instead of the detailed rows
Slide 25
A sample model from Reza Rad
Slide 26
Agg pattern #1 – Horizontal aggs
Slide 27
Horizontal aggs (2)
Slide 28
Horizontal aggs (3)
Slide 29
Horizontal Aggs
Applicable in:
- SSAS and PowerBI (no agg awareness needed)
Scenario
- Transactional Fact table(s)
- Queries aggregate data in ranges, e.g. last X days, last month, last week, etc.
The solution
- Aggregate table per week
- Aggregate table per month
- Lots of DAX (~120 lines per measure)
- https://dax.tips/2019/10/21/creative-aggs-part-ii-horizontal-aggs/
Slide 30
DEMO – Row-based TI
Slide 31
Row-based Time Intelligence
Applicable
- PowerBI and SSAS
Scenario
- Time intelligence calculations, multiple periods
- Custom periods like last 28 days, last 9 weeks
Solution
- An additional time intelligence table, calculated at ETL time or in DAX
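A minimal Python sketch of what such a table could look like (the shape and period names are assumptions, not the presenter's exact design): one (PeriodName, Date) row per date covered by each custom period, built at ETL time.

```python
# Illustrative row-based TI table: selecting a PeriodName filters the
# related Date column to exactly the dates in that period, so the
# measure needs no DAX time intelligence.

from datetime import date, timedelta

anchor = date(2019, 10, 21)  # "today" at ETL time
periods = {"Last 7 Days": 7, "Last 28 Days": 28, "Last 9 Weeks": 63}

ti_rows = [
    (name, anchor - timedelta(days=i))
    for name, days in periods.items()
    for i in range(days)
]
print(ti_rows[:2])  # first ('Last 7 Days', ...) rows
```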
Slide 32
Agg pattern #2 – Accordion aggregations
Use TSQL to load the Agg table in PowerBI

Table            Rows
Base Table       8,231,630,715
Pure Week Agg    2,206,251
Pure Month Agg   628,175
Accordion Agg    2,558,871
Slide 33
Accordion aggregations (2)
1. Create a temp boundary table and join it with the fact during import (ETL) – see the sketch below
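A minimal pandas sketch of the idea (table and column names are illustrative, and pandas stands in for the TSQL used in the session): a boundary date decides which rows stay at week grain and which collapse to month grain.

```python
# Illustrative accordion aggregation: recent rows at week grain,
# older rows at month grain, unioned into one agg table.

import pandas as pd

fact = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-03", "2019-02-10", "2019-10-01"]),
    "Amount": [10.0, 20.0, 30.0],
})
boundary = pd.Timestamp("2019-09-01")  # the boundary table's cutoff

old = fact[fact["Date"] < boundary].copy()
new = fact[fact["Date"] >= boundary].copy()
old["Period"] = old["Date"].dt.to_period("M").dt.start_time  # month grain
new["Period"] = new["Date"].dt.to_period("W").dt.start_time  # week grain

agg = pd.concat([old, new]).groupby("Period", as_index=False)["Amount"].sum()
print(agg)
```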
Slide 34
Accordion aggregations (3)
Applicable
- Best in PowerBI due to Aggregation Awareness
Scenario
- Transactional fact table
- Monthly, weekly or mixed reporting
The solution
- Accordion agg table
- Can be combined with Row-Based TI
Slide 35
Pivoting data for performance
Slide 36
Pivoting data for performance (2)
Pivoting is done in the ETL
35% performance improvement over an 8 bln record table
Slide 37
Pivoting data for performance (3)
Applicable
- PowerBI and SSAS
Scenario
- Push TI calculations to ETL
Solution
- Pivot TI data at import time (LY, LM, etc.)
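A minimal pandas sketch of the pattern (names are illustrative): the fact is joined to a copy of itself shifted by one year, so LY becomes a plain column resolved at import time instead of a DAX calculation.

```python
# Illustrative TI pivot at ETL: LY as a physical column.

import pandas as pd

sales = pd.DataFrame({
    "MonthKey": [201809, 201810, 201909, 201910],
    "Amount":   [100.0, 110.0, 120.0, 130.0],
})

ly = sales.rename(columns={"Amount": "AmountLY"})
ly["MonthKey"] += 100  # shift YYYYMM forward one year: 201809 -> 201909

pivoted = sales.merge(ly, on="MonthKey", how="left")
print(pivoted)  # e.g. MonthKey 201909: Amount 120.0, AmountLY 100.0
```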
Slide 38
Agg pattern #3 – Filtered aggregations
Slide 39
DEMO – Filtered aggregations
Slide 40
Agg pattern #4 – Incremental Aggs
You have a Fact table
You have an aggregation table
How do you keep them in sync?
- New row in the fact
- Deleted row in the fact
- Updated row in the fact
Slide 41
Incremental Aggs
Add ETL logic to capture changes
PBI Premium – incremental refresh
SSAS – ProcessAdd
A process to consolidate the AGG table (part of the ETL) – sketched below
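A minimal pandas sketch of the consolidation step (names are illustrative): captured changes arrive as signed deltas, so an insert is a positive row, a delete a negative one, an update a pair, and the agg table is simply re-grouped.

```python
# Illustrative incremental agg consolidation from captured changes.

import pandas as pd

agg = pd.DataFrame({"MonthKey": [201909], "Amount": [120.0]})

changes = pd.DataFrame({           # change capture output since last run
    "MonthKey": [201909, 201910],
    "Amount":   [-20.0, 50.0],     # deleted 20 in 201909, new 50 in 201910
})

agg = pd.concat([agg, changes]).groupby("MonthKey", as_index=False)["Amount"].sum()
print(agg)  # 201909 -> 100.0, 201910 -> 50.0
```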
Slide 42
Star schema specifics
Slide 43
Sample schema
Slide 44
Optimizing dimensions
Minimize the number of columns.
Reduce cardinality (data type conversions).
Filter out unused dimension values (unless a business scenario requires them).
Integer Surrogate Keys (SK).
Hint for VALUE encoding on numeric columns (AS ONLY).
Hint for disabling hierarchies on SKs (AS ONLY):
- speeds up processing
- decreases model size (eventually)
Slide 45
Optimizing facts
Replace dimension IDs with their surrogate keys.
Reduce cardinality (data type conversions).
Consider moving calculations to the source (to use in compression evaluations).
Order by the less diverse SKs first (to maximize compression).
Increase the Tabular sample size for deciding encoding by considering segments and partitions (AS ONLY).
Hint for VALUE encoding on numeric columns (AS ONLY).
Hint for disabling hierarchies (AAS and AS2017 CU7 ONLY).
Slide 46
Conclusion
Use only the data you need
Star schema is your friend; bidirectional filtering is the enemy
Keep the memory footprint as small as possible
Decrease calculated tables and columns
Tweak server and model settings
Use custom aggregation tables
DAX Studio, VertiPaq Analyzer and Tabular Editor FTW
Slide 47
Whitepapers and articles
Performance tuning of tabular models in SQL Server 2012 Analysis Services
https://blogs.msdn.microsoft.com/karang/2013/08/02/sql-2012-tabular-performance-tuning-of-tabular-models-in-sql-server-2012-analysis-services/
Modeling AS for Scalability
https://blogs.msdn.microsoft.com/analysisservices/2018/08/08/whitepaper-on-modeling-for-as-tabular-scalability/
Phil Seamark's series on smart aggregations
https://dax.tips/2019/10/18/creative-aggs-part-i-introduction/
Slide 48
Used tools
SQL Server Management Studio
- DMVs
SQL Server Profiler
- NOT deprecated for Analysis Services
DAX Studio
- FREE and extremely powerful (http://daxstudio.org/)
Community Excel templates
- VertiPaq Analyzer (https://www.sqlbi.com/tools/vertipaq-analyzer/)
Tabular Editor (https://tabulareditor.github.io/)
- Design and management