I moved to ZOZO, Inc. through a complicated business event (an M&A). Immediately afterward I was suddenly summoned by my boss at the time and started launching the data infrastructure in a state of complete trial and error, and here I am. I was originally a backend engineer, but when the infrastructure specialist I had been working with was assigned to another project, they left me the message "Shio-chan can handle infrastructure just fine," and I took over the infrastructure work. ZOZO, Inc. Technology Division, ML / Data Department, Data Platform Section, Tech Lead: Takehiro Shiozaki
[Chart: number of data infrastructure users over time, 2018/6 through 2022/8, on a scale of roughly 200 to 600]
Hundreds of employees are currently using it, which makes it indispensable for ZOZOTOWN.
The technical choice mistake we made in the beginning
• The data infrastructure was small in the very early days
• We started building a data infrastructure on GCP in April 2018
• The "current" architecture is a simple one, but the early architecture carried its "original sin"
• From here, I will explain the original sin we committed and how we atoned for it
• I hope whoever builds a data infrastructure in the future does not commit the same sin
Many more systems appear than in the current architecture
• Both AWS and GCP are used
• Copies of copies are made repeatedly
• The architecture was complicated and difficult to operate
• Why did this happen…?
[Architecture diagram of the early days, spanning SQL Server, EC2, S3, another SQL Server, and BigQuery across AWS and GCP]
The very moment of committing the original sin
• Initially, data was collected into Redshift
• However, since we had neither the know-how nor the personnel for database tuning, we migrated to BigQuery
• Our understanding of the data flow at the time:
  - S3: we had a reasonable understanding
  - SQL Server (intermediate database): we knew it existed, but we were not familiar with it
  - Upstream: unknown land; it must have existed somewhere
• Our original sin: we acquired data where it was easiest to do so
• Under normal circumstances, we should have bravely stepped into the unknown land
[Diagram of the data flow at the time, involving an upstream SQL Server, an intermediate SQL Server, EC2, S3, and Redshift]
Various negative effects caused by the Original Sin
• Some tables' rows were physically deleted, while other tables' rows were not
Long transfer time
• The transfer process, which should normally be done only once, was being performed multiple times
Long lead time to add new tables
• A different team managed the intermediate system, so it was necessary to work across multiple teams
These were barriers that prevented data utilization
It was not even clear whether the data infrastructure would really be useful
• We had limited man-hours and personnel
• We did not have the know-how to build a data infrastructure
• I was originally a backend engineer who wrote web APIs in Ruby on Rails
• I believed the self-proclaimed "complete copy"
• It was a complete copy only "within the data used in their work"
• In the end everyone was a liar, but no one was to blame
In the "give it a shot" phase, it was a rational decision to build it that way (or so I want to believe)
The data infrastructure was rebuilt from scratch
• In a "New Game+" state
• From ETL to ELT
  - Abolished the transformation step during transfer
  - Transform after storing the raw data in BigQuery (see the sketch below)
• Redesigned the security level
  - For systems that handle PII, such as the mail magazine distribution system
  - Implemented pinpointed viewing restrictions with column-level permission management
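To make the ETL-to-ELT shift above concrete, here is a minimal sketch of the pattern using the google-cloud-bigquery Python client. All names in it (my-data-platform, my-raw-bucket, raw_zozotown, dwh, orders) are hypothetical placeholders, and it illustrates the general pattern rather than ZOZO's actual pipeline.

```python
# Minimal ELT sketch with the google-cloud-bigquery client. All project,
# bucket, dataset, and table names below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-data-platform")

# 1) "EL": land the raw file in BigQuery as-is; nothing is transformed in transit.
load_job = client.load_table_from_uri(
    "gs://my-raw-bucket/orders/2022-08-01.csv",   # hypothetical GCS path
    "my-data-platform.raw_zozotown.orders",       # hypothetical raw table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition="WRITE_APPEND",
    ),
)
load_job.result()  # wait for the load job to finish

# 2) "T": transform inside BigQuery, producing the cleansed table that users query.
transform_sql = """
CREATE OR REPLACE TABLE `my-data-platform.dwh.orders` AS
SELECT
  order_id,
  user_id,
  CAST(ordered_at AS TIMESTAMP) AS ordered_at,
  price
FROM `my-data-platform.raw_zozotown.orders`
WHERE order_id IS NOT NULL
"""
client.query(transform_sql).result()
```

The point is that nothing is transformed in transit: the raw table always holds the data exactly as it arrived, and every cleansing rule lives in SQL inside BigQuery, where it can be rerun or fixed later. The column-level viewing restrictions mentioned above would typically be layered on top of such tables, for example with BigQuery policy tags.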
Verifying the new environment
- Created a correspondence table mapping tables in the old environment to tables in the new environment
- Unsophisticated, table-by-table work to check for differences (see the sketch below)
- When a difference from the old environment was found, the state matching the core database was defined as correct
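The table-by-table check can be as plain as comparing row counts across the correspondence table. The sketch below assumes hypothetical project and dataset names (old-project.legacy and new-project.dwh); a real check would usually also compare checksums or column-wise aggregates, not just counts.

```python
# Sketch of the table-by-table difference check between the old and new
# environments. The correspondence mapping and all names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical "old table -> new table" correspondence table.
CORRESPONDENCE = {
    "old-project.legacy.orders": "new-project.dwh.orders",
    "old-project.legacy.users": "new-project.dwh.users",
}

def row_count(table: str) -> int:
    """Return the row count of a fully qualified BigQuery table."""
    rows = client.query(f"SELECT COUNT(*) AS n FROM `{table}`").result()
    return next(iter(rows)).n

for old_table, new_table in CORRESPONDENCE.items():
    old_n, new_n = row_count(old_table), row_count(new_table)
    status = "OK" if old_n == new_n else "DIFF -> check against the core database"
    print(f"{old_table} vs {new_table}: {old_n} / {new_n} [{status}]")
```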
Because there were too many users and user departments, it was impossible to identify the exact scope of impact
• A system created by an engineer who had left the company was still running, with no handover
• Aggregation processing that even its own users were not aware of
Using audit logs
- JOBS_BY_ORGANIZATION was used to accurately grasp where the old system was still being used (see the sketch below)
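As a rough illustration of that audit-log approach, the sketch below queries BigQuery's INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION view for recent jobs that still read tables in the old environment. The region qualifier, the 30-day window, and the old-project ID are hypothetical.

```python
# Sketch: find who is still reading tables in the old environment by querying
# BigQuery's INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION view. The region
# qualifier, look-back window, and 'old-project' ID are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  user_email,
  ref.dataset_id,
  ref.table_id,
  COUNT(*) AS job_count
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION,
  UNNEST(referenced_tables) AS ref
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND ref.project_id = 'old-project'
GROUP BY user_email, ref.dataset_id, ref.table_id
ORDER BY job_count DESC
"""

# Each row tells us which user (or service account) still touches which
# old-environment table, and how often.
for row in client.query(sql).result():
    print(row.user_email, row.dataset_id, row.table_id, row.job_count)
```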
- Found the "keystones" by visualizing the data aggregation processing (see the sketch below)
- That is, found the marts (keystones) referenced by many aggregation jobs
- Separated the data marts that should be migrated with priority from the data marts whose migration could safely be postponed
- Batches and BI dashboards that had no administrator were terminated as soon as they were found
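One way to surface such keystones, assuming the same JOBS_BY_ORGANIZATION view is available, is to build source-to-destination edges from recent jobs and rank mart tables by how many distinct downstream tables they feed. The sketch below does that; the mart dataset name and the 90-day window are hypothetical.

```python
# Sketch: rank candidate "keystone" marts by how many distinct downstream
# tables are built from them, using the same JOBS_BY_ORGANIZATION view.
# The 'mart' dataset name and the 90-day window are hypothetical.
from collections import Counter
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  CONCAT(ref.dataset_id, '.', ref.table_id) AS source_table,
  CONCAT(destination_table.dataset_id, '.', destination_table.table_id) AS dest_table
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION,
  UNNEST(referenced_tables) AS ref
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
  AND ref.dataset_id = 'mart'
  AND destination_table.table_id IS NOT NULL
"""

# Each (source, destination) pair is one edge of the aggregation graph;
# a mart table feeding many distinct downstream tables is a keystone.
edges = {(row.source_table, row.dest_table) for row in client.query(sql).result()}
fan_out = Counter(src for src, _ in edges)
for table, n_downstream in fan_out.most_common(20):
    print(f"{table}: feeds {n_downstream} downstream tables")
```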
Retired the old systems as gradually and safely as possible
- Tables that had not been referenced for a certain period were deleted from the old system (see the sketch below)
- A separate GCP project was provided for those who still wanted to refer to the old system's data
  - Only a limited number of people were granted viewing permission
  - This was to avoid people referencing data from the old system without realizing it
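A sketch of the "delete what nobody references anymore" step, again leaning on the job history: tables in the old dataset with no reference inside the retention window are dropped. The project and dataset names and the 90-day threshold are hypothetical, and a real run would obviously want a dry run and backups first.

```python
# Sketch: delete old-system tables that nobody has referenced for a given
# period. Project/dataset names, the region, and the 90-day threshold are
# hypothetical; a real run would log and double-check before deleting.
from google.cloud import bigquery

client = bigquery.Client()
OLD_PROJECT = "old-project"   # hypothetical old GCP project
OLD_DATASET = "legacy"        # hypothetical old dataset
RETENTION_DAYS = 90

# Tables referenced by at least one job inside the retention window.
sql = f"""
SELECT DISTINCT ref.table_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION,
  UNNEST(referenced_tables) AS ref
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {RETENTION_DAYS} DAY)
  AND ref.project_id = '{OLD_PROJECT}'
  AND ref.dataset_id = '{OLD_DATASET}'
"""
recently_used = {row.table_id for row in client.query(sql).result()}

# Everything in the old dataset that nobody touched recently gets removed.
for table in client.list_tables(f"{OLD_PROJECT}.{OLD_DATASET}"):
    if table.table_id not in recently_used:
        print(f"deleting unused table: {table.table_id}")
        client.delete_table(table.reference, not_found_ok=True)
```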
For specific examples that could not be introduced here, please see the tech blog
- There are many examples of using GCP and BigQuery in addition to the migration stories
- https://techblog.zozo.com/entry/data-infrastructure-replacement
Copy only data that is absolutely guaranteed to be correct
• Acquire data from upstream whenever possible
• Make friends with the people in charge of the systems that generate the data
Don't ask people, ask the audit logs
• People make mistakes and leave the company
• Audit logs illuminate the truth
Don't mix them up, it's dangerous
• A spoonful of dirty water in a barrel full of wine leaves you with a barrel of dirty water
• Schopenhauer's law of entropy
• Distinguish between quality-assured and non-quality-assured data
• However, we suffered for several years because of a big technology-selection mistake (the original sin) we made in the early stages of building it
• Atonement took years
• I hope this story serves as a cautionary example for those who will build data infrastructures in the future