Business Intelligence in microservice architecture
Evgenii Vinogradov
Yandex.Money
Software Engineering Conference Russia 2018
October 12-13
Moscow
Slide 4
Slide 4 text
From Greece to Moscow
› Boat to Athens
› Feet to Taxi
› Athens Taxi to Airport
› Flight
› Moscow Taxi to Digital October
› Profit!
Slide 5
Slide 5 text
How we built realtime BI
› High load
› Time to Show
› Request time
› Support architecture
Slide 6
Slide 6 text
We start with ETL
[Diagram: three Sources, ETL, Data Warehouse, Cubes, Reports]
Slide 7
Slide 7 text
We start with ETL
› Not many sources
› Data is highly aggregated
› Gaps are rare and noticeable
Slide 8
Slide 8 text
We start with ETL
[Diagram: three Sources, ETL, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 9
Slide 9 text
Then we add a back-office interface
› More frequent ETL runs
› Sources should be prepared
Slide 10
Slide 10 text
And SLA
› Data Update time
› Interface Update time
› Monitoring
Time To Show
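A minimal sketch of how such an SLA could be monitored, assuming Time To Show means the delay between a record appearing in a source and becoming visible in the interface. The function names, the sample timestamps and the 30-second threshold are illustrative assumptions, not values from the talk.

```python
from datetime import datetime, timedelta, timezone


def time_to_show(event_time: datetime, visible_time: datetime) -> timedelta:
    """Delay between a change in the source and its appearance in the interface."""
    return visible_time - event_time


def sla_violations(samples, threshold: timedelta):
    """Return the (event_time, visible_time) samples that exceed the threshold."""
    return [s for s in samples if time_to_show(*s) > threshold]


# Hypothetical measurements: (time in source, time visible in the interface).
t0 = datetime(2018, 10, 12, 12, 0, 0, tzinfo=timezone.utc)
samples = [
    (t0, t0 + timedelta(seconds=8)),
    (t0, t0 + timedelta(seconds=45)),
]

# Assumed SLA threshold of 30 seconds; a monitor would alert on every entry in `late`.
late = sla_violations(samples, threshold=timedelta(seconds=30))
print(f"{len(late)} of {len(samples)} samples violate the SLA")
```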
Slide 11
Slide 11 text
Testing
› Load testing – sources
› Load testing – ETL/API
› Load testing – together
› ETL – just pause it
› Such a test is not hard to create
Slide 12
Slide 12 text
Then we start microservicing
Slide 13
Slide 13 text
Then we start microservicing
[Diagram: three Sources, ETL, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 14
Slide 14 text
Then we start microservicing
[Diagram: six Sources, ETL, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 15
Slide 15 text
Then we start microservicing
[Diagram: six Sources, three ETL processes, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 16
Slide 16 text
We start with ETL
[Diagram: six Sources, three ETL processes, Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 17
Slide 17 text
Then we start microservicing
› Many sources provide less aggregated and less integrated data
› One (or more) aggregation layers
› Data Update Time = [delay] * [sources per aggregation]
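The formula above can be turned into a small worked example. This is only a sketch: the 10-second delay and the six sources per aggregation are illustrative assumptions, and the function name is hypothetical.

```python
def data_update_time(delay_seconds: float, sources_per_aggregation: int) -> float:
    """Data Update Time = [delay] * [sources per aggregation]."""
    return delay_seconds * sources_per_aggregation


# Assumed values: each source adds a 10-second delay, six sources feed one aggregation.
update_time = data_update_time(delay_seconds=10, sources_per_aggregation=6)
print(f"Data Update Time: {update_time} seconds")
```

Under this reading, the update time grows linearly with the number of sources behind each aggregation, so splitting one system into many services raises it unless the per-source delay shrinks.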
Slide 18
Slide 18 text
Then we start microservicing
› Unique Identifier Issue
› Aggregation rules sync
› ETL is still pretty good
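One way to picture the identifier issue is a join between two sources that name the same entity differently. The sketch below is an assumption-laden illustration: the field names `order_id` and `order_ref`, the sample records and the `integrate` helper are all hypothetical.

```python
# Hypothetical records from two independent services.
orders = [{"order_id": "A-1", "amount": 100}, {"order_id": "A-2", "amount": 40}]
payments = [{"order_ref": "A-1", "status": "paid"}, {"order_ref": "A-3", "status": "paid"}]


def integrate(orders, payments):
    """Join payments to orders on the shared identifier and collect unmatched rows."""
    by_id = {o["order_id"]: o for o in orders}
    matched, unmatched = [], []
    for p in payments:
        order = by_id.get(p["order_ref"])
        if order is None:
            unmatched.append(p)  # no common identifier found: a gap to investigate
        else:
            matched.append({**order, "status": p["status"]})
    return matched, unmatched


matched, unmatched = integrate(orders, payments)
print(len(matched), "matched,", len(unmatched), "unmatched")
```

Tracking the unmatched rows separately makes gaps visible instead of letting them silently drop out of the aggregates.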
Slide 19
Slide 19 text
Update strategies: trade request time for process time and load
› Load -> Store -> Show
› Load -> Aggregate -> Store -> Show
› Load -> Store -> Aggregate -> Show
› Load -> Aggregate -> Store -> Update -> Show
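The trade-off between two of these strategies can be sketched in a few lines. This is an illustration under assumptions: the in-memory lists and dictionaries stand in for the warehouse, and the data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical loaded rows: (key, amount).
rows = [("shop_a", 100), ("shop_b", 50), ("shop_a", 25)]

# Load -> Store -> Aggregate -> Show:
# raw rows are stored cheaply, every request pays for the aggregation.
raw_store = list(rows)


def show_from_raw(key):
    return sum(amount for k, amount in raw_store if k == key)


# Load -> Aggregate -> Store -> Show:
# aggregation happens once during processing, a request is a plain lookup.
agg_store = defaultdict(int)
for k, amount in rows:
    agg_store[k] += amount


def show_from_aggregate(key):
    return agg_store.get(key, 0)


print(show_from_raw("shop_a"), show_from_aggregate("shop_a"))
```

Both paths return the same number; they differ in where the work is done, which is the request-time versus process-time trade named in the slide title.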
Slide 20
Slide 20 text
Update strategies: trade request time for process time and load
› Load -> Store -> Show
› Load -> Aggregate -> Store -> Show
› Load -> Store -> Aggregate -> Show
› Load -> Aggregate -> Store -> Update -> Show
Slide 21
Slide 21 text
Drawbacks
› Sources still have to store a cache of data
› Sources have to preserve a timeline (replication point)
› Request portion can be huge
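The replication point and the portion-size concern can be illustrated with an incremental pull. This is a sketch under assumptions: the monotonically increasing `seq` field, the `pull_increment` helper and the batch size of 3 are hypothetical and not taken from the talk.

```python
def pull_increment(source_rows, replication_point, batch_size):
    """Return up to batch_size rows newer than replication_point, plus the new point."""
    newer = sorted(
        (r for r in source_rows if r["seq"] > replication_point),
        key=lambda r: r["seq"],
    )
    batch = newer[:batch_size]
    new_point = batch[-1]["seq"] if batch else replication_point
    return batch, new_point


# Hypothetical source cache: the source keeps rows ordered by a sequence number.
source_rows = [{"seq": i, "amount": i * 10} for i in range(1, 8)]

point = 0      # last sequence number already loaded into the warehouse
loaded = []
while True:
    batch, point = pull_increment(source_rows, point, batch_size=3)
    if not batch:
        break
    loaded.extend(batch)

print(len(loaded), "rows loaded, replication point =", point)
```

Capping the batch size bounds a single request, but the source still has to keep its rows and their order until the reader has caught up.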
Slide 22
Slide 22 text
Then we go to events
[Diagram: six Sources, three ETL processes, Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 23
Slide 23 text
Then we go to events
[Diagram: six Sources, Events, two ETL processes, Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 24
Slide 24 text
Then we go to events
[Diagram: six Sources, Events, two ETL processes, two Aggregation layers, Data Warehouse, Cubes/Data Marts, Reports, API]
Slide 25
Slide 25 text
Update strategies: trade request time for process time and load
› Load -> Store -> Show
› Load -> Aggregate -> Store -> Show
› Load -> Store -> Aggregate -> Show
› Load -> Aggregate -> Store -> Update -> Show
› Load -> Construct -> Aggregate -> Store -> Show
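The last strategy, the event-based one, can be sketched as follows, assuming that Construct means rebuilding the current state of each entity from its events before aggregating. The event shape, the `payment_id` field and both function names are hypothetical.

```python
# Hypothetical events, in the order they were read from the queue.
events = [
    {"payment_id": "p1", "type": "created", "amount": 100},
    {"payment_id": "p2", "type": "created", "amount": 40},
    {"payment_id": "p1", "type": "completed"},
]


def construct(events):
    """Fold events into the latest known state of each payment."""
    state = {}
    for e in events:
        record = state.setdefault(e["payment_id"], {"amount": 0, "status": None})
        if e["type"] == "created":
            record["amount"] = e["amount"]
        record["status"] = e["type"]
    return state


def aggregate(state):
    """Sum amounts per current status."""
    totals = {}
    for record in state.values():
        totals[record["status"]] = totals.get(record["status"], 0) + record["amount"]
    return totals


# Store: the aggregated result is what the reports and the API would read.
store = aggregate(construct(events))
print(store)
```

Unlike the ETL-based strategies, the sources only emit events here; the state they describe is reconstructed on the BI side.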
Slide 26
Slide 26 text
Now
› We still have ETL for aggregations
› Time To Show: 10 sec / source
› Source type count: close to 100
Slide 27
Slide 27 text
Starring
› Data Warehouse: MS SQL Server
› ETL: SSIS
› Events queue: Kafka
› Events reader: C#
› Other Sources: API/Java/etc.
› Out of scope: sharding, downtime, processing, etc.
Slide 28
Slide 28 text
Key takeaways
› ETL works fine with Events
› Find a sufficiently unique identifier across data sources
› Metrics: Time to Show and others
Slide 29
Slide 29 text
dm_jonny
[email protected]
Questions?
Evgenii Vinogradov
Head of BI development