LINEのデータプラットフォームが目指すべき未来 / The future of LINE data platform we are aiming for

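Tasuku Okuda (Manager, Data Engineering1 team, LINE Corporation)
Presentation slides from DEIM2021 (the 13th Forum on Data Engineering and Information Management / the 19th Annual Conference of the Database Society of Japan).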
https://db-event.jpn.org/deim2021/

LINE Developers

March 02, 2021
Transcript

  1. The future of LINE data platform we are aiming for. Tasuku Okuda, Data Platform, LINE Corporation. 2021-03-01, DEIM2021

  2. Agenda: 1. Introduction 2. Mission 3. Data Platform in LINE (i. Architecture, ii. KPI) 4. Challenges 5. Conclusion

  3. Introduction

  4. Tasuku Okuda (奥田輔): Engineering Manager, Data Engineering1 team, Data Platform dept., Data Engineering Center. Joined LINE as a new graduate in 2013. LINE Game DBA (MySQL, MongoDB) → ETL engineer for LINE app → Ingestion Pipeline developer for server log → Hadoop migration project leader

  5. Mission

  6. Mission - LINE wide

  7. CLOSING THE DISTANCE https://linecorp.com/ja/company/mission

  8. LINE STYLE

  9. Always Data-driven (LINE STYLE 04): Trust the facts in the data, not intuition

  10. Mission - Data Platform

  11. Make Data-driven easy

  12. Make Data-driven easy 🤔

  13. Governed, Integrated, Self-service data platform

  14. ~2020

  15. 2021~ 🤔

  16. Data Democracy

  17. As ML infrastructure

  18. Data Platform in LINE

  19. (no transcribed text)

  20. (no transcribed text)

  21. Architecture

  22. Tool/API Compute Storage Data Governance HDFS HBase Elasticsearch Kafka YARN Kubernetes Hive Spark Trino Flink Ranger Yanagishima OASIS LINE Analytics Portal Tableau Jupyter RStudio Datahub Central Dogma Kibana Grafana Prometheus

  23. Data Collecting: Kafka Flink HDFS Elasticsearch External System Kubernetes

  24. Data Analyzing: HDFS YARN / Kubernetes Hive Spark Trino Yanagishima OASIS LINE Analytics Tableau Jupyter RStudio Datahub

  25. KPI

  26. HDFS Capacity: 270 PB

  27. HDFS Daily Increase: 410 TB/day

  28. Managing servers (PM/VM): 5,436 servers

  29. Hive tables: 56,000 tables

  30. YARN/Presto jobs: 300,000 jobs/day

  31. Pipeline incoming records: 13,000,000 records/sec

  32. Engineers in Data Platform (JP/KR): 75

  33. Challenges

  34. Data Democracy

  35. Data Observability Data Democracy

  36. Data Discovery: What data do we have? What kind of data? How much cost? Who is the data owner? Universal Catalog Hive Kafka HBase MySQL MongoDB ObjStorage Deltalake Iceberg Hudi Streaming Snapshot CDC Core ML DS Service Client External Storage Computing Users Daily Monthly Budget

  37. Capacity planning Archival Storage IDC design Network planning Resource optimization Kubernetes ObjStorage Erasure Coding

  38. As ML infrastructure

  39. Data Reactivity As ML infrastructure

  40. Online Storage Offline Storage E2E pipeline latency HDFS TiDB HBase Elasticsearch CockroachDB Kafka Flink

  41. Data mutation/versioning Iceberg Deltalake Hudi Schema Evolution ACID Time Travel Partition Evolution

  42. Conclusion

  43. CLOSING THE DISTANCE Data Reactivity Data Democracy Data Observability Always Data-driven As ML infrastructure LINE CODE 04

  44. One more thing…

  45. We are hiring!

  46. LINE recruitment information https://linecorp.com/ja/career/

  47. LINE new graduate recruitment 2022 https://linecorp.com/ja/career/newgrads/

  48. Data Platform Open Position • Software Engineer - https://linecorp.com/ja/career/position/1750 • Site Reliability Engineer - https://linecorp.com/ja/career/position/1751 • Full-Stack Engineer - https://linecorp.com/ja/career/position/2282 • Solution Engineer - https://linecorp.com/ja/career/position/2215 • Product Manager - https://linecorp.com/ja/career/position/2229

  49. Public activity • LINE DEVDAY - https://linedevday.linecorp.com/ • LINE Engineering Blog - https://engineering.linecorp.com/blog/ • Hadoop/Trino OSS commitment • DEIM2021 Day3 • In-depth Q&A ("Nehorin Pahorin") with former research interns! • [Technical report] Research and development of privacy-first data utilization technology at LINE

  50. Thank you!