Access analysis of Data Platform users

Takahiro Moteki
LINE Data Infrastructure Team, Site Reliability Engineer

LINE DevDay 2020

November 27, 2020


  1. None
  2. Agenda › Introduction: Data Platform › Access analysis of Data Platform users › Design and Implementation
  3. Introduction Data Platform

  4. Introduction of Data Platform › Mission: Provide the data platform as a service to LINE employees › What do we do? Provide data infrastructure and BI tools › Services that we provide: Cluster, Storage, Query, Pipeline, Governance, Self-service portal
  5. Scale › SERVER: 2585 EV › CPU: 100K VCORES › RAM: 854 TB › STORAGE: 270 PB › STORAGE USED: 177 PB › INCOMING RECORDS: 661+ GB/day, 13 M/s (peak) › WORKLOAD: 300K+/DAY › TABLES: 56000+ › MEMBERS: 76
  6. Access analysis of Data Platform users

  7. Access analysis (KPI) › DAU (Daily Active User) › MAU (Monthly Active User) › DAC (Daily Active User Action Count) › MAC (Monthly Active User Action Count) › RR (Retention Rate) › DR (Dormant Rate)
  8. DAU (Daily Active User) › [Charts: YARN, Presto; legend: Batch, High SLA, Adhoc]

  9. DAC (Daily Active User Action Count) › [Charts: Presto, YARN; legend: Batch, High SLA, Adhoc]
  10. Case example

  11. Online cluster migrations › Hadoop Cluster A, Hadoop Cluster B, Hadoop Cluster C › Create KGI: Migration Rate (MR) › MR (%) of C = DAU(C) / DAU(A+B+C)
  12. Segments › Accounts: Personal Account, System Account › Components: Hive, YARN, HDFS › Actions › Hive: Read, Write, DDL › YARN: Admin-queue, Submit-app › HDFS: Read, Write, Execute
  13. Design and Implementation

  14. Which logs? › Hadoop Clusters: A, B, C › Ecosystems: Zookeeper, HDFS, Hive, YARN, Presto, Spark › Logs: Ranger audit logs, Presto query logs
  15. Implementation › 1. Collect: Ranger audit logs, Presto query logs › 2. Aggregate › 3. Visualize › Components: HDFS, Hive, Presto, Airflow, BI Tools
  16. Future Prospects › Add other KPIs and other segments › Real-time risk detection › Cost (disk/cpu/memory) visualization and report
  17. Thank you
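
The DAU and DAC KPIs and the Migration Rate KGI from the slides (MR of C = DAU(C) / DAU(A+B+C)) can be sketched roughly as below. This is a minimal illustration, not the speaker's implementation: the `AccessRecord` fields and function names are hypothetical, each record is assumed to be one user action, and DAU(A+B+C) is assumed to mean distinct users across all three clusters.

```python
from collections import namedtuple
from datetime import date

# Hypothetical, simplified access record. Real Ranger audit logs and Presto
# query logs carry many more fields; these field names are assumptions.
AccessRecord = namedtuple(
    "AccessRecord", ["day", "user", "cluster", "component", "action"]
)


def _matches(record, day, clusters):
    """True if the record falls on `day` and, when given, in `clusters`."""
    return record.day == day and (clusters is None or record.cluster in clusters)


def dau(records, day, clusters=None):
    """Daily Active User: distinct users with at least one action on `day`."""
    return len({r.user for r in records if _matches(r, day, clusters)})


def dac(records, day, clusters=None):
    """Daily Active User Action Count: actions on `day` (one per record, assumed)."""
    return sum(1 for r in records if _matches(r, day, clusters))


def migration_rate(records, day, target, all_clusters):
    """MR (%) of `target` = DAU(target) / DAU(all clusters), as a percentage."""
    total = dau(records, day, all_clusters)
    if total == 0:
        return 0.0  # no active users that day; avoid division by zero
    return 100.0 * dau(records, day, {target}) / total


# Made-up sample data for illustration only.
day = date(2020, 11, 1)
records = [
    AccessRecord(day, "alice", "A", "Hive", "Read"),
    AccessRecord(day, "alice", "C", "Hive", "Read"),
    AccessRecord(day, "bob", "B", "HDFS", "Write"),
    AccessRecord(day, "carol", "C", "YARN", "Submit-app"),
    AccessRecord(day, "dave", "A", "HDFS", "Read"),
]

print("DAU:", dau(records, day))  # 4 distinct users
print("DAC:", dac(records, day))  # 5 actions
print(f"MR(C): {migration_rate(records, day, 'C', {'A', 'B', 'C'}):.1f}%")  # 50.0%
```

Counting distinct users with a set means a user who is active on both an old and the new cluster during migration (like `alice` here) counts once in the denominator. Summing per-cluster DAU instead would double-count such users and understate the migration rate.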