Slide 1

Slide 1 text

Business Intelligence in microservice architecture
Evgenii Vinogradov, Yandex.Money
Software Engineering Conference Russia 2018, October 12-13, Moscow

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

From Greece to Moscow
› Boat to Athens
› Feet to Taxi
› Athens Taxi to Airport
› Flight
› Moscow Taxi to Digital October
› Profit!

Slide 5

Slide 5 text

How we built realtime BI
› High load
› Time to Show
› Request time
› Support architecture

Slide 6

Slide 6 text

We start with ETL
[Diagram labels: Source ×3, ETL, Data Warehouse, Cubes, Reports]

Slide 7

Slide 7 text

We start with ETL
› Not many sources
› Data is highly aggregated
› Gaps are rare and noticeable

Slide 8

Slide 8 text

We start with ETL
[Diagram labels: Source ×3, ETL, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 9

Slide 9 text

Then we add a Backoffice interface
› More frequent ETL runs
› Sources should be prepared

Slide 10

Slide 10 text

And SLA
› Data Update time
› Interface Update time
› Monitoring
[Callout: Time To Show]
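As a rough illustration of the SLA metrics on this slide, the sketch below treats Time To Show as the sum of Data Update time and Interface Update time. That sum is an assumption, since the slide only names the metrics; `SlaSample`, `sla_breaches`, and the 10-second limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaSample:
    data_update_s: float       # source change -> visible in the warehouse
    interface_update_s: float  # warehouse -> visible in the interface

    @property
    def time_to_show_s(self) -> float:
        # Assumption: Time To Show is the sum of the two stages.
        return self.data_update_s + self.interface_update_s


def sla_breaches(samples, limit_s):
    """Return the samples whose Time To Show exceeds the SLA limit."""
    return [s for s in samples if s.time_to_show_s > limit_s]


# Hypothetical measurements, in seconds.
samples = [SlaSample(4.0, 1.5), SlaSample(9.0, 3.0)]
breaches = sla_breaches(samples, limit_s=10.0)
```

A monitoring job could run a check like this on every measurement window and alert when `breaches` is not empty.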

Slide 11

Slide 11 text

Testing
› Load testing: sources
› Load testing: ETL/API
› Load testing: together
› ETL: just pause
› And it is not so hard to create
› (You can test ETL with just a pause)

Slide 12

Slide 12 text

Then we start microservicing

Slide 13

Slide 13 text

Then we start microservicing
[Diagram labels: Source ×3, ETL, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 14

Slide 14 text

Then we start microservicing
[Diagram labels: Source ×6, ETL, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 15

Slide 15 text

Then we start microservicing
[Diagram labels: Source ×6, ETL ×3, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 16

Slide 16 text

We start with ETL
[Diagram labels: Source ×6, ETL ×3, Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 17

Slide 17 text

Then we start microservicing
› Many sources provide less aggregated and less integrated data
› One (or more) aggregation layer
› Data Update Time = [delay] * [sources per aggregation]
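Read literally, the formula on this slide says the update delay grows linearly with the number of sources behind one aggregation layer. A minimal worked example follows; the 10-second delay and the six sources are hypothetical figures chosen for illustration, not measurements from the talk.

```python
def data_update_time(delay_s: float, sources_per_aggregation: int) -> float:
    """Data Update Time = [delay] * [sources per aggregation]."""
    return delay_s * sources_per_aggregation


# Hypothetical figures: each source adds a 10 s delay, six sources per aggregation.
update_time_s = data_update_time(10.0, 6)
print(f"Data Update Time: {update_time_s:.0f} s")
```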

Slide 18

Slide 18 text

Then we start microservicing
› Unique Identifier Issue
› Aggregation rules sync
› ETL is still pretty good
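One possible reading of the unique identifier issue: records about the same business operation arrive from different services and can only be integrated if they share a key. The sketch below joins records on such a key and sets aside those that lack it. The field name `operation_id`, the source names, and the function `join_by_shared_id` are all hypothetical.

```python
from collections import defaultdict


def join_by_shared_id(records, key="operation_id"):
    """Merge records from several sources that carry the same shared key.

    Returns (joined, unmatched): joined maps each key value to the merged
    fields, unmatched lists the records that have no shared key at all.
    """
    joined = defaultdict(dict)
    unmatched = []
    for rec in records:
        shared = rec.get(key)
        if shared is None:
            unmatched.append(rec)
            continue
        for field, value in rec.items():
            if field in (key, "source"):
                continue
            # Prefix each field with its source so equal field names do not clash.
            joined[shared][f'{rec["source"]}.{field}'] = value
    return dict(joined), unmatched


# Hypothetical records from three services.
records = [
    {"source": "payments", "operation_id": "op-1", "amount": 100},
    {"source": "antifraud", "operation_id": "op-1", "score": 0.2},
    {"source": "notifications", "message": "sent"},
]
joined, unmatched = join_by_shared_id(records)
```

Records that end up in `unmatched` are the ones for which a sufficient identifier still has to be found.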

Slide 19

Slide 19 text

Update strategies: trade request time for process time and load
› Load -> Store -> Show
› Load -> Aggregate -> Store -> Show
› Load -> Store -> Aggregate -> Show
› Load -> Aggregate -> Store -> Update -> Show
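To make the trade-off on this slide concrete, the sketch below contrasts two of the listed strategies on in-memory data. Both classes and the `shop`/`amount` fields are hypothetical; a real warehouse would do the same work in ETL packages and SQL rather than in Python objects.

```python
from collections import Counter


class StoreThenAggregate:
    """Load -> Store -> Aggregate -> Show: cheap loads, work done at request time."""

    def __init__(self):
        self.rows = []

    def load(self, row):
        self.rows.append(row)  # store raw rows as they arrive

    def show(self):
        totals = Counter()
        for row in self.rows:  # aggregation cost is paid on every request
            totals[row["shop"]] += row["amount"]
        return dict(totals)


class AggregateThenStore:
    """Load -> Aggregate -> Store -> Show: work done at load time, cheap requests."""

    def __init__(self):
        self.totals = Counter()

    def load(self, row):
        self.totals[row["shop"]] += row["amount"]  # aggregate while loading

    def show(self):
        return dict(self.totals)  # the request only reads stored totals


# Hypothetical input rows.
rows = [
    {"shop": "a", "amount": 10},
    {"shop": "b", "amount": 5},
    {"shop": "a", "amount": 20},
]
lazy, eager = StoreThenAggregate(), AggregateThenStore()
for row in rows:
    lazy.load(row)
    eager.load(row)
```

Both variants show the same totals; they differ only in where the aggregation cost lands, which is the request time versus process time trade named in the slide title.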

Slide 20

Slide 20 text

Update strategies: trade request time for process time and load
› Load -> Store -> Show
› Load -> Aggregate -> Store -> Show
› Load -> Store -> Aggregate -> Show
› Load -> Aggregate -> Store -> Update -> Show

Slide 21

Slide 21 text

Drawbacks
› Sources still have to store a cache of data
› Sources have to preserve a timeline (replication point)
› Request portion can be huge
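The drawbacks on this slide can be illustrated with a pull-based incremental load: the source must keep changed rows around, ordered along a timeline, until the consumer has moved its replication point past them. This is a sketch under assumptions: `pull_increment`, the `changed_at` field, and the batch size are hypothetical, and `changed_at` values are assumed to be unique and increasing.

```python
def pull_increment(source_rows, replication_point, batch_size):
    """Return (batch, new_replication_point) for rows changed after the point.

    Assumes changed_at values are unique; with ties, a batch boundary could
    split rows that share a timestamp and the later ones would be skipped.
    """
    pending = sorted(
        (row for row in source_rows if row["changed_at"] > replication_point),
        key=lambda row: row["changed_at"],
    )
    batch = pending[:batch_size]  # cap the portion requested at once
    new_point = batch[-1]["changed_at"] if batch else replication_point
    return batch, new_point


# Hypothetical source-side cache of changed rows.
rows = [
    {"id": 1, "changed_at": 1},
    {"id": 2, "changed_at": 2},
    {"id": 3, "changed_at": 3},
]
first, point = pull_increment(rows, replication_point=0, batch_size=2)
second, point = pull_increment(rows, replication_point=point, batch_size=2)
```

Without the `batch_size` cap, a consumer that has been paused for a long time would request everything changed since its last replication point in one portion.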

Slide 22

Slide 22 text

Then we go to events
[Diagram labels: Source ×6, ETL ×3, Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 23

Slide 23 text

Then we go to events
[Diagram labels: Source ×6, Events, ETL ×2, Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 24

Slide 24 text

Then we go to events
[Diagram labels: Source ×6, Events, ETL ×2, Aggregation ×2, Data Warehouse, Cubes/Data Marts, Reports, API]

Slide 25

Slide 25 text

Update strategies: trade request time for process time and load
› Load -> Store -> Show
› Load -> Aggregate -> Store -> Show
› Load -> Store -> Aggregate -> Show
› Load -> Aggregate -> Store -> Update -> Show
› Load -> Construct -> Aggregate -> Store -> Show
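The last strategy on this slide adds a Construct step. One plausible interpretation, assumed here rather than stated in the talk, is that entity state is rebuilt from a stream of events before it is aggregated. The event shape (`entity_id`, `ts`, `fields`) and both functions are hypothetical.

```python
from collections import defaultdict


def construct(events):
    """Fold events, in timestamp order, into the latest known state per entity."""
    state = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        state.setdefault(event["entity_id"], {}).update(event["fields"])
    return state


def aggregate(state, group_by, measure):
    """Sum one measure over the constructed entities, grouped by one field."""
    totals = defaultdict(int)
    for entity in state.values():
        totals[entity[group_by]] += entity[measure]
    return dict(totals)


# Hypothetical events; they may arrive out of order.
events = [
    {"entity_id": "p1", "ts": 2, "fields": {"status": "paid"}},
    {"entity_id": "p1", "ts": 1, "fields": {"status": "created", "amount": 100}},
    {"entity_id": "p2", "ts": 3, "fields": {"status": "created", "amount": 40}},
]
report = aggregate(construct(events), group_by="status", measure="amount")
```

Sorting by `ts` before folding keeps a late-arriving older event from overwriting newer state.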

Slide 26

Slide 26 text

Now
› We still have ETL for aggregations
› Time To Show: 10 sec / source
› Source type count: close to 100

Slide 27

Slide 27 text

Starring
› Data Warehouse: MS SQL Server
› ETL: SSIS
› Events queue: Kafka
› Events reader: C#
› Other Sources: API/Java/etc.
› Out of scope: sharding, downtime, processing, etc.

Slide 28

Slide 28 text

Key takeaways
› ETL works fine with Events
› Find a sufficient unique identifier across data sources
› Metrics: Time to Show and others

Slide 29

Slide 29 text

Questions?
Evgenii Vinogradov
Head of BI development
dm_jonny [email protected]