Vinitus Web: A Job Management System for Data Quality and Lineage that Confronts Data Platform Complexity
Ingestion Management by "Vinitus Web": Data Quality and Lineage for Users of a Complex Data Platform
Speakers:
• (name truncated), LY Corporation. Joined as a new grad in 2020. Data Engineering > Data Pipeline Engineering > Batch Department. Silver @ League of Legends.
• Macquang Huy (Hanoi, Vietnam), Data Engineer @ LY Corporation. Joined in 2022. LINE Technology Vietnam > Data Platform Dev Team. Master @ League of Legends.
[Diagram] Vinitus Web sits between Services A/B/C and the Data Users. Through it, Job Owners can:
• Create scheduled jobs (cron expressions such as 0 1 * * *, 0 2 * * *, 0 * * * *)
• See execution history
• Restart failed jobs
[Diagram] Pain point 1: Data Users ask, "Where does this data come from? Can we use it?" Each source service (A, B, C) has its own User Agreement governing use of its data.
[Diagram] Pain point 2: Data Users ask, "How is the data ingested?" Sources differ (MySQL vs. MongoDB), columns are cast (e.g. CAST(`col` AS CHAR)), fields are mapped (name → `name`, spec.color → `color`), and it is unclear whether a partition like /dt=20250630 is in JST or UTC.
[Diagram] Without this information, users must guess: which User Agreement applies? Is the source MySQL or MongoDB? Does /dt=20250630 store 2025-06-30 00:00 ~ 2025-07-01 00:00 JST, or 2025-06-30 00:00 ~ 2025-07-01 00:00 UTC?
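To make the last ambiguity concrete, here is a minimal Python sketch (not from the talk) showing that the two readings of the same partition label are nine hours apart:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# One partition label, two possible 24-hour windows.
label = datetime(2025, 6, 30)  # from /dt=20250630

jst_start = label.replace(tzinfo=ZoneInfo("Asia/Tokyo"))
utc_start = label.replace(tzinfo=ZoneInfo("UTC"))

# Compared in UTC, the two interpretations differ by 9 hours.
print(jst_start.astimezone(ZoneInfo("UTC")))  # 2025-06-29 15:00:00+00:00
print(utc_start)                              # 2025-06-30 00:00:00+00:00
print(jst_start + timedelta(days=1))          # JST window ends 2025-07-01 00:00 JST
```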
How Vinitus Web Helps Data Users
1. Clarify the origin of the data
• By providing detailed Data Lineage information
• This allows users to find the applicable rules and quickly identify the necessary checks and approvals.
2. Ensure consistency in data structure
• By a functionality that absorbs differences in database types during ingestion (a minimal sketch follows below)
• This makes it easier to use the data without worrying about differences in its origin.
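A minimal sketch of what "absorbing differences in database types" could look like, assuming hypothetical mapping tables; the talk does not show Vinitus's actual mappings:

```python
# Hypothetical mappings from source-specific types to one common platform type.
MYSQL_TO_COMMON = {"TINYINT": "INT", "DATETIME": "TIMESTAMP", "VARCHAR": "STRING"}
MONGO_TO_COMMON = {"objectId": "STRING", "date": "TIMESTAMP", "string": "STRING"}

def normalize_type(source: str, source_type: str) -> str:
    """Map a source column type to the platform's common type."""
    table = MYSQL_TO_COMMON if source == "mysql" else MONGO_TO_COMMON
    return table.get(source_type, "STRING")  # fall back to string, cf. CAST(`col` AS CHAR)

def flatten_field(path: str) -> str:
    """Flatten a nested MongoDB path, e.g. 'spec.color' -> 'color'."""
    return path.rsplit(".", 1)[-1]
```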
[Diagram] With Vinitus Web, the Data Lineage information answers the earlier questions directly: "It is from Service A's DB", "The Job Owner is …", "Columns are CAST-ed like …", "This is how it is processed."
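As an illustration only, the lineage record exposed to Data Users might carry fields like these; the shape is an assumption, not Vinitus's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class LineageRecord:
    """Hypothetical shape of the lineage info shown to Data Users."""
    source: str                 # e.g. "Service A's MySQL DB"
    job_owner: str              # whom to ask for checks and approvals
    user_agreement: str         # which agreement governs the data
    schedule: str               # cron expression, e.g. "0 1 * * *"
    column_casts: dict[str, str] = field(default_factory=dict)
    # e.g. {"col": "CAST(`col` AS CHAR)", "spec.color": "color"}
```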
Key Point: Job-as-Config
• Users create an ingestion job through the UI; Vinitus takes care of the rest (Vinitus → Airflow), reducing the Job Owner's workload
• Vinitus provides a fully managed ETL workflow engine
• Non-expert users can create ingestion jobs in minutes
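A sketch of the Job-as-Config idea, assuming an Airflow 2.x setup; the config field names and build_dag helper are invented for illustration, not Vinitus's API:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# The UI would emit a small config like this; field names are assumptions.
job_config = {
    "name": "service_a_users_ingestion",
    "source": {"type": "mysql", "table": "users"},
    "destination": {"db": "service_a", "table": "users"},
    "schedule": "0 1 * * *",  # the same cron syntax shown in the UI
}

def build_dag(cfg: dict) -> DAG:
    """Render one ingestion config into an Airflow DAG."""
    dag = DAG(
        dag_id=cfg["name"],
        schedule=cfg["schedule"],
        start_date=datetime(2025, 1, 1),
        catchup=False,
    )
    PythonOperator(task_id="ingest", python_callable=lambda: None, dag=dag)
    return dag

globals()[job_config["name"]] = build_dag(job_config)
```

The point of the pattern is that non-experts only ever touch the config; the DAG code is generated and operated by the platform.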
Daily Operations (1) Schema Evolution
Data schemas change over time: modifying data types, deleting columns, adding columns, changing partitions.
How a schema change is applied (Vinitus → Airflow → Hadoop):
1. CREATE a temp table
2. INSERT data from the original table into the temp table
3. Back up the original data to HDFS (viewfs://iu/gov/{DB}/.IUBackup/{TABLE}/{TIME}/...)
4. DROP the original table
5. CREATE the original table with the new schema
6. Move data from the temp table back to the original table
Backup retention: bi-monthly, backup data is removed (moved to the Trash directory); every 6 hours, trashed data is deleted completely.
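The sequence above, written out as a Python sketch; run_sql() and hdfs_backup() are stub helpers standing in for the Hive/Hadoop calls the slide implies, and column remapping for the new schema is elided:

```python
def run_sql(stmt: str) -> None:
    """Stub: submit a statement to the warehouse (e.g. via a Hive client)."""
    print(stmt)

def hdfs_backup(path: str) -> None:
    """Stub: copy the table's files to the HDFS backup location."""
    print(f"backup -> {path}")

def evolve_schema(db: str, table: str, new_schema_ddl: str, ts: str) -> None:
    tmp = f"{table}__tmp"
    run_sql(f"CREATE TABLE {db}.{tmp} AS SELECT * FROM {db}.{table}")  # steps 1-2
    hdfs_backup(f"viewfs://iu/gov/{db}/.IUBackup/{table}/{ts}/")       # step 3
    run_sql(f"DROP TABLE {db}.{table}")                                # step 4
    run_sql(new_schema_ddl)                                            # step 5
    run_sql(f"INSERT INTO {db}.{table} SELECT * FROM {db}.{tmp}")      # step 6
    run_sql(f"DROP TABLE {db}.{tmp}")                                  # cleanup
```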
Daily Operations (3) Backfill Support
Users want data for long past periods. How?
• Each backfill request is sent to Airflow as an HTTP request
• Each backfill request is handled in a dedicated pod
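A minimal sketch of the HTTP path, using Airflow's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns); the host, DAG id, and credentials are placeholders, and whether Vinitus calls this exact endpoint is an assumption:

```python
import requests

AIRFLOW_API = "https://airflow.example.com/api/v1"  # hypothetical host

def request_backfill(dag_id: str, logical_date: str) -> None:
    """Trigger one DAG run for one past date via Airflow's REST API."""
    resp = requests.post(
        f"{AIRFLOW_API}/dags/{dag_id}/dagRuns",
        json={"logical_date": logical_date},
        auth=("user", "password"),  # placeholder credentials
    )
    resp.raise_for_status()

# One request per day of the requested period; each resulting run can be
# scheduled into its own worker pod (e.g. with the KubernetesExecutor).
for day in ("2025-06-28", "2025-06-29", "2025-06-30"):
    request_backfill("service_a_users_ingestion", f"{day}T00:00:00Z")
```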
Recap & Next
With Vinitus Web:
• Ensure Data Lineage and Data Consistency: support users in understanding the Data Lineage information
• Reduce the burden on the Job Owner when maintaining data in an accessible state
What's next?
• Labels for data, AI
• Data as Product, Data SLO