Access analysis of Data Platform users

Takahiro Moteki
LINE Data Infrastructure Team Site Reliability Engineer
https://linedevday.linecorp.com/2020/ja/sessions/0974
https://linedevday.linecorp.com/2020/en/sessions/0974

LINE DevDay 2020

November 27, 2020

Transcript

  1. Agenda › Introduction: Data Platform › Access analysis of Data Platform users › Design and Implementation
  2. Introduction of Data Platform
     What do we do? › Provide data infrastructure and BI tools
     Services that we provide › Cluster, Storage, Query, Pipeline, Governance, Self-service portal
     Mission › Provide the data platform as a service to LINE employees
  3. Scale
     SERVER: 2585 EV
     CPU: 100K VCORES
     RAM: 854 TB
     STORAGE: 270 PB
     INCOMING RECORDS: 661+ GB/day, 13 M/s (peak)
     STORAGE USED: 177 PB
     WORKLOAD: 300K+/DAY
     TABLES: 56000+
     MEMBERS: 76
  4. Access analysis (KPI)
     › DR (Dormant Rate)
     › DAC (Daily Active User Action Count)
     › MAC (Monthly Active User Action Count)
     › MAU (Monthly Active User)
     › RR (Retention Rate)
     › DAU (Daily Active User)
  5. Online cluster migrations
     Clusters › Hadoop Cluster A, Hadoop Cluster B, Hadoop Cluster C
     Create KGI › Migration Rate (MR)
     MR (%) for cluster C = DAU (C) / DAU (A+B+C)
  6. Actions Segments
     Accounts › Personal Account, System Account
     Components
       Hive › Read, Write, DDL
       YARN › Admin-queue, Submit-app
       HDFS › Read, Write, Execute
  7. Which logs?
     Hadoop Clusters › A, B, C
     Ecosystems › Zookeeper, HDFS, Hive, YARN, Presto, Spark
     Logs › Ranger audit logs, Presto query logs
  8. Future Prospects › Add other KPIs and other segments › Real-time risk detection › Cost (disk/CPU/memory) visualization and report
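
The Migration Rate on slide 5 can be illustrated with a short sketch. This is not the speaker's implementation: the record layout `(day, cluster, user)`, the sample data, and the function names `daily_active_users` and `migration_rate` are all hypothetical. It also assumes that DAU (A+B+C) means the number of distinct users active on any of the three clusters, which the slide does not state, and it multiplies by 100 to express the ratio as a percentage.

```python
from collections import defaultdict
from datetime import date

# Hypothetical access records: (day, cluster, user). Illustrative values only.
records = [
    (date(2020, 11, 1), "A", "alice"),
    (date(2020, 11, 1), "A", "alice"),  # repeated action by the same user
    (date(2020, 11, 1), "B", "bob"),
    (date(2020, 11, 1), "C", "carol"),
    (date(2020, 11, 1), "C", "dave"),
]


def daily_active_users(records, day):
    """Return {cluster: set of distinct users with at least one action on `day`}."""
    users = defaultdict(set)
    for rec_day, cluster, user in records:
        if rec_day == day:
            users[cluster].add(user)
    return users


def migration_rate(records, day, target, clusters):
    """MR (%) for `target` = DAU(target) / DAU(all clusters) * 100."""
    users = daily_active_users(records, day)
    dau_target = len(users[target])
    # Assumption: DAU(A+B+C) counts distinct users across the union of clusters.
    all_users = set().union(*(users[c] for c in clusters))
    if not all_users:
        return 0.0  # no activity that day, so report 0 instead of dividing by zero
    return 100.0 * dau_target / len(all_users)


print(migration_rate(records, date(2020, 11, 1), "C", ["A", "B", "C"]))  # 50.0
```

In the sample data two of the four distinct users are active on cluster C, so MR for C is 50%. If DAU (A+B+C) were instead defined as the sum of per-cluster DAU, a user active on two clusters would be counted twice and the rate would come out lower.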