Business Intelligence in a Microservice Architecture

SECR 2018
October 13, 2018

Evgeny Vinogradov (Евгений Виноградов)
Yandex.Money (Яндекс.Деньги)

This talk covers how our approach to BI development changed as the company moved to a microservice architecture. I will touch on some aspects of building the data warehouse and data marts, as well as current trends in this area that turned out to be relevant for us.

Transcript

  1. 4

  2. From Greece to Moscow
     › Boat to Athens
     › Feet to Taxi
     › Athens Taxi to Airport
     › Flight
     › Moscow Taxi to Digital October
     › Profit!
  3. How we built realtime BI
     › High load
     › Time to Show
     › Request time
     › Support architecture
  4. We start with ETL
     › Not so many sources
     › Data is highly aggregated
     › Gaps are rare and noticeable
  5. Testing
     › Load testing – sources
     › Load testing – ETL/API
     › Load testing – together
     › ETL – just pause
     › And it is not so hard to create
     › (You can test ETL with just pause)
  6. Then we start microservicing
     [Diagram: Source (×6), ETL, Data Warehouse, Cubes/Data Marts, Reports, API]
  7. Then we start microservicing
     [Diagram: Source (×6), ETL (×3), Data Warehouse, Cubes/Data Marts, Reports, API]
  8. We start with ETL
     [Diagram: Source (×6), ETL (×3), Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]
  9. Then we start microservicing
     › A lot of sources give less aggregated and integrated data
     › One (or more) aggregation layer
     › Data Update Time = [delay] * [sources per aggregation]
  10. Update strategies: trade request time for process time and load
     › Load -> Store -> Show
     › Load -> Aggregate -> Store -> Show
     › Load -> Store -> Aggregate -> Show
     › Load -> Aggregate -> Store -> Update -> Show
     › Load -> Aggregate -> Store -> Update -> Show
  11. Update strategies: trade request time for process time and load
     › Load -> Store -> Show
     › Load -> Aggregate -> Store -> Show
     › Load -> Store -> Aggregate -> Show
     › Load -> Aggregate -> Store -> Update -> Show
     › Load -> Aggregate -> Store -> Update -> Show
  12. Drawbacks
     › Sources still have to store a cache of data
     › Sources have to preserve a timeline (replication point)
     › Request portion can be huge
  13. Then we go to events
     [Diagram: Source (×6), ETL (×3), Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]
  14. Then we go to events
     [Diagram: Source (×6), Events, ETL (×2), Aggregation, Data Warehouse, Cubes/Data Marts, Reports, API]
  15. Then we go to events
     [Diagram: Source (×6), Events, ETL (×2), Aggregation (×2), Data Warehouse, Cubes/Data Marts, Reports, API]
  16. Update strategies: trade request time for process time and load
     › Load -> Store -> Show
     › Load -> Aggregate -> Store -> Show
     › Load -> Store -> Aggregate -> Show
     › Load -> Aggregate -> Store -> Update -> Show
     › Load -> Aggregate -> Store -> Update -> Show
     › Load -> Construct -> Aggregate -> Store -> Show
  17. Now
     › We still have ETL for aggregations
     › Time to Show: 10 sec / source
     › Source type count: close to 100
  18. Starring
     › Data Warehouse: MS SQL Server
     › ETL: SSIS
     › Events queue: Kafka
     › Events reader: C#
     › Other Sources: API/Java/etc
     › Out of scope: sharding, downtime, processing, etc
  19. Key takeaways
     › ETL works fine with Events
     › Find a sufficiently unique identifier across data sources
     › Metrics: Time to Show and others
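
The formula on slide 9, Data Update Time = [delay] * [sources per aggregation], can be tried out with a minimal Python sketch. The function name `data_update_time` and the input values are assumptions made for illustration; they are not figures or code from the talk.

```python
def data_update_time(delay_seconds: float, sources_per_aggregation: int) -> float:
    """Slide 9 formula: Data Update Time = [delay] * [sources per aggregation].

    Both parameter names are hypothetical; the slide only gives the formula.
    """
    if delay_seconds < 0 or sources_per_aggregation < 0:
        raise ValueError("delay and source count must be non-negative")
    return delay_seconds * sources_per_aggregation


if __name__ == "__main__":
    # Hypothetical inputs: a 30-second delay per source and a growing
    # number of sources feeding one aggregation layer.
    for sources in (1, 6, 100):
        print(sources, data_update_time(30.0, sources))
```

Under this reading the update time grows linearly with the number of sources behind one aggregation, which is the pressure the later slides respond to by moving from polling sources to events.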