DATA PLATFORM
[Diagram: LINE Ads Platform, LINE Creators Market, LINE NEWS, LINE Pay, LINE LIVE, LINE MOBILE → Hadoop Cluster (Data Lake), holding each service's data → ETL, Analysis, BI / Reporting]

DATA OPEN
• Makes the Hadoop cluster public within LINE
• Enables employees to analyze their service’s data as they like
• Speeds up their data analysis process and decision making
[Diagram: Multi-tenant Hadoop Cluster shared by LINE Ads Platform and LINE Creators Market, each with its own data]

1. SECURITY
• Strict access control
• Allows employees to access only their service’s data
[Diagram: Multi-tenant Hadoop Cluster shared by LINE Ads Platform and LINE Creators Market, each with its own data]

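
The access rule above can be sketched as a path-prefix check. This is a minimal illustration, not the mechanism used on the actual cluster: the user names, service names, and directory layout in `USER_SERVICES` and `SERVICE_ROOTS` are hypothetical, and `can_access` is a helper invented for this example.

```python
import posixpath

# Hypothetical membership: which service(s) each employee works on.
USER_SERVICES = {
    "alice": {"ads_platform"},
    "bob": {"creators_market"},
}

# Hypothetical layout: each service's data lives under its own root directory.
SERVICE_ROOTS = {
    "ads_platform": "/data/ads_platform",
    "creators_market": "/data/creators_market",
}


def can_access(user: str, path: str) -> bool:
    """Return True only if `path` lies under the root of one of the user's services."""
    # Normalize first so that ".." segments cannot escape a service root.
    target = posixpath.normpath(path)
    for service in USER_SERVICES.get(user, ()):
        root = SERVICE_ROOTS[service]
        # Match the root itself or anything below it; the trailing "/" keeps
        # "/data/ads_platform_private" from matching "/data/ads_platform".
        if target == root or target.startswith(root + "/"):
            return True
    return False


print(can_access("alice", "/data/ads_platform/logs/2018"))
print(can_access("alice", "/data/creators_market/sales"))
```

In a multi-tenant cluster the same idea would normally be enforced by the storage layer's policy engine rather than by application code; the sketch only shows the shape of the rule.
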
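3. FEATURES
(O = has the skill, X = does not)

SQL | Programming | Data Science | Role           | Required Features
----|-------------|--------------|----------------|---------------------------
X   | X           | X            | Manager        | Result Sharing
O   | X           | X            | Planner        | Query Result Visualization
O   | O           | X            | Engineer       | ETL
O   | O           | O            | Data Scientist | Ad Hoc Data Analysis
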
APACHE ZEPPELIN 0.7.3 : SECURITY
• A user can launch a Spark application under another user's account
• This bypasses Apache Ranger's access control
[Diagram: User A → Apache Zeppelin → Spark Application (running as User B) → HDFS / Apache Ranger]

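
Why this defeats the access control can be shown with a toy model. The storage-side check only sees the account the Spark application runs as, so if the notebook server launches under any requested account, the check passes for the wrong person. Nothing below is Zeppelin's or Ranger's actual API: `POLICY`, `authorize`, `launch_unchecked`, and `launch_checked` are hypothetical names, and the "checked" variant is just one possible mitigation, assumed for illustration.

```python
# Toy policy: the path prefix each account may read (hypothetical values).
POLICY = {
    "user_a": "/data/service_a",
    "user_b": "/data/service_b",
}


def authorize(effective_user: str, path: str) -> bool:
    """Storage-side check: it only knows the account the application runs as."""
    root = POLICY.get(effective_user)
    return root is not None and (path == root or path.startswith(root + "/"))


def launch_unchecked(login_user: str, run_as: str) -> str:
    """Models the problem: the application runs as whatever account is requested."""
    return run_as


def launch_checked(login_user: str, run_as: str) -> str:
    """One possible mitigation: only run as the account that actually logged in."""
    if run_as != login_user:
        raise PermissionError(f"{login_user} may not run as {run_as}")
    return run_as


# User A logs in but launches the application as User B, so the check passes.
effective = launch_unchecked("user_a", run_as="user_b")
print(authorize(effective, "/data/service_b/table"))
```
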
APACHE ZEPPELIN 0.7.3 : STABILITY
• Runs only on a single server
• Does not support the “yarn-cluster” mode
• Freezes easily
[Diagram: a single Apache Zeppelin Server hosting Apache Zeppelin together with Driver Programs 1 to 5]

NOTEBOOK SHARING
• Notebooks can be shared within a “space”
• “space” : root directory of notebooks for each LINE service
• Access rights: “read write”, “read only”
[Diagram: Space 1 and Space 2, each with its own Read Write Users, Read Only Users, and Notebooks]
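
The space model above might be represented roughly as follows. This is a sketch under stated assumptions rather than the actual implementation: the `Space` class, the `Access` enum, and all user and notebook names are hypothetical, and the rule that "read write" takes precedence when a user appears in both groups is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    NONE = 0
    READ_ONLY = 1
    READ_WRITE = 2


@dataclass
class Space:
    """Root directory of notebooks for one service, with two groups of users."""
    name: str
    read_write_users: set = field(default_factory=set)
    read_only_users: set = field(default_factory=set)
    notebooks: list = field(default_factory=list)

    def access_for(self, user: str) -> Access:
        # Assumption: "read write" wins if a user is listed in both groups.
        if user in self.read_write_users:
            return Access.READ_WRITE
        if user in self.read_only_users:
            return Access.READ_ONLY
        return Access.NONE

    def can_read(self, user: str) -> bool:
        return self.access_for(user) is not Access.NONE

    def can_write(self, user: str) -> bool:
        return self.access_for(user) is Access.READ_WRITE


space_1 = Space(
    name="space_1",
    read_write_users={"alice"},
    read_only_users={"bob"},
    notebooks=["daily_report"],
)
print(space_1.access_for("alice"), space_1.access_for("bob"), space_1.access_for("carol"))
```

Users outside both groups get no access at all, which keeps notebooks in one space invisible to members of another.
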