The future of LINE data platform we are aiming for

Tasuku Okuda (Manager, Data Engineering1 team, LINE Corporation)


Data Platform Open Position at LINE Corporation


May 19, 2021

  1. Tasuku OKUDA, Engineering Manager, Data Engineering1 team, Data Platform dept., Data Engineering Center. Joined LINE as a new graduate in 2013. LINE Game DBA (MySQL, MongoDB) → ETL engineer for LINE app → Ingestion Pipeline developer for server log → Hadoop migration project leader
  2. Agenda 1. Mission 2. Data Platform in LINE i. Architecture ii. KPI for Scale 3. Challenges 4. Public Activity 5. Conclusion
  5. Tool/API Compute Storage Data Governance HDFS HBase Elasticsearch Kafka YARN Kubernetes Hive Spark Trino Flink Ranger Yanagishima OASIS LINE Analytics Portal Tableau Jupyter RStudio Datahub Central Dogma Kibana Grafana Prometheus
  6. Data Analyzing HDFS YARN / Kubernetes Hive Spark Trino Yanagishima OASIS LINE Analytics Tableau Jupyter RStudio Datahub
  7. Data Discovery What data do we have? What kind of data is it? How much does it cost? Who is the data owner? Universal Catalog Hive Kafka HBase MySQL MongoDB ObjStorage Deltalake Iceberg Hudi Streaming Snapshot CDC Core ML DS Service Client External Storage Computing Users Daily Monthly Budget
  8. Public activity – LINE DEVDAY https://linedevday.linecorp.com/
    • Access analysis of Data Platform users https://linedevday.linecorp.com/2020/en/sessions/0974
    • 100+PB scale Unified Hadoop cluster Federation with 2k+ nodes https://linedevday.linecorp.com/jp/2019/sessions/D1-5
  9. Public activity – LINE Engineering Blog https://engineering.linecorp.com/blog/
    • Introduce Data Platform Department https://engineering.linecorp.com/ja/blog/introduce-data-platform-department/
    • Introducing the team that develops and operates the middleware and data ingestion pipeline for LINE's company-wide data platform https://engineering.linecorp.com/ja/blog/data-infrastructure-ingestion-pipeline/
    • Introducing Frey: LINE’s new self-service batch ingestion system https://engineering.linecorp.com/en/blog/introducing-frey-lines-new-self-service-batch-ingestion-system/
    • How we migrated a Hadoop cluster without downtime https://engineering.linecorp.com/ja/blog/migrating-a-hadoop-cluster-without-downtime/
  10. Public activity – OSS contribution • HDFS • Hive • Spark • Ranger • Flink • Trino (formerly PrestoSQL)
  11. Data Platform Open Position https://linecorp.com/ja/career/ja/all?text=data%20platform • Data Platform Engineer • Software Engineer • Site Reliability Engineer • Distributed System Administrator • Elasticsearch Engineer • Solution Engineer • Product Manager