The Original Sin and the Current Situation of ZOZO TOWN's Data Infrastructure

Takehiro Shiozaki (ZOZO / Technology Division ML / Data Department Data Platform Section / Tech Lead)

https://tech-verse.me/ja/sessions/230
https://tech-verse.me/en/sessions/230
https://tech-verse.me/ko/sessions/230

Tech-Verse2022

November 17, 2022

Transcript

  1. Self introduction

    Takehiro Shiozaki, Tech Lead, Data Platform Section, ML / Data Department, Technology Division, ZOZO, Inc.

    I joined VASILY as a new graduate and moved to ZOZO, Inc. through a complicated business circumstance (an M&A). Right after that, my boss at the time suddenly called me in, and I began building the data infrastructure by pure trial and error, which brings me to today. I was originally a backend engineer, but the infrastructure specialist I worked with back then was reassigned to another project and left me with the message "Shio-chan can handle infrastructure just fine," so I started working on infrastructure.
  2. Agenda

    - Introduction of ZOZOTOWN data infrastructure
    - The Original Sin and the Current Situation
    - Atonement
    - Lesson & Summary
  3. Introduction of ZOZOTOWN data infrastructure: Batch Transfer

    • Data transfer: Dataflow
    • Data warehouse: BigQuery
    • Workflow orchestration: Cloud Composer

    (Diagram components: SQL Server, Dataflow, BigQuery, Cloud Composer)
  4. Introduction of ZOZOTOWN data infrastructure: Realtime Transfer

    • CDC data transfer: Fluentd on GKE
    • Streaming insert: Cloud Pub/Sub & Dataflow
    • Data warehouse: BigQuery

    (Diagram components: SQL Server, Fluentd on GKE, Cloud Pub/Sub, Dataflow, BigQuery)
  5. Introduction of ZOZOTOWN data infrastructure: Data volume

    • Data volume: several hundred TB
    • # of tables: several million tables
    • # of records: several trillion records

    We hold a large volume of data!
  6. Introduction of ZOZOTOWN data infrastructure: # of users

    (Chart: # of users over time, x-axis from 2018/6/1 to 2022/8/1, y-axis from 0 to 600)

    Hundreds of employees currently use it, which makes it indispensable for ZOZOTOWN.
  7. Agenda

    - Introduction of ZOZOTOWN data infrastructure
    - The Original Sin and the Current Situation
    - Atonement
    - Lesson & Summary
  8. Original sin and current situation

    • Original sin means "a mistaken technical choice made at the very beginning."
    • The data infrastructure was small in the very early days
    • We started building a data infrastructure on GCP in April 2018
    • The "current" architecture is simple, but the early architecture carried its "original sin"
    • From here, I will explain the original sin we committed and how we atoned for it
    • I hope whoever builds a data infrastructure in the future does not commit the same sin
  9. Introduction of ZOZOTOWN data infrastructure: early days architecture

    • Many more systems appear than in the current architecture
    • Both AWS and GCP were used
    • Copies of copies were made repeatedly
    • The architecture was complicated, and operation was difficult
    • Why did this happen…?

    (Diagram components: SQL Server, SQL Server, EC2, S3, EC2, BigQuery)
  10. The prequel to the launch of the data infrastructure: the very moment of committing the original sin

    • Initially, data was collected into Redshift
    • However, since we had neither the know-how nor the personnel for database tuning, we migrated to BigQuery
    • Our understanding of the data flow at that time
        • S3: We had a reasonable understanding
        • SQL Server (intermediate database): We knew it existed, but we were not familiar with it
        • Upstream: Unknown land, which must have existed somewhere
    • Our Original Sin: We acquired data from wherever it was easy to do so
        • We should have stepped bravely into the unknown land

    (Diagram components: SQL Server, SQL Server, EC2, S3, Redshift)
  11. Various negative effects caused by the Original Sin

    Poor data quality
    • Various conversion processes had been included unintentionally
    • Rows were physically deleted in some tables but not in others

    Long transfer time
    • The transfer process, which should normally run only once, was being performed multiple times

    Long lead time to add new tables
    • A different team managed the intermediate system
    • It was necessary to work across multiple teams

    These were barriers that prevented data utilization.
  12. Why we selected problematic technologies

    • It was not clear whether the data infrastructure would really be useful
    • We had limited man-hours and personnel
    • We did not have the know-how to build a data infrastructure
        • I was originally a backend engineer who wrote Web APIs in Ruby on Rails
    • I believed in a self-proclaimed copy
        • It was a complete copy only "within the data used in their work"
        • Everyone was a liar in the end, but no one was to blame

    For a give-it-a-shot phase, it was a rational decision (I want to believe so).
  13. Agenda

    - Introduction of ZOZOTOWN data infrastructure
    - The Original Sin and the Current Situation
    - Atonement
    - Lesson & Summary
  14. Migration of data transfer batch

    • Focusing on GCP, we rebuilt from scratch
        • In a "New Game+" state
    • From ETL to ELT
        • Abolished conversion processing during transfer
        • Transform after storing raw data in BigQuery
    • Redesigned at the security level
        • For systems that handle PII, such as mail magazine distribution systems
        • Implemented pinpoint viewing restrictions through column-level permission management
  15. Migration of data aggregation processing: correspondence table for old vs. new environment

    - Created a table-to-table correspondence between the old and new environments
    - Did the unglamorous work of checking the differences for each table
    - When the new environment differed from the old one, matching the state of the core database was defined as the correct outcome
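
The per-table difference check on slide 15 could be sketched roughly as follows. This is a minimal illustration, not the speaker's actual tooling: the function names and the sample rows are hypothetical, and rows are assumed to be small enough to compare in memory as tuples. A difference counts as a problem only when the new environment disagrees with the core database.

```python
from collections import Counter


def diff_rows(old_rows, new_rows):
    """Multiset difference: rows found only in the old table, and only in the new one."""
    old_counts = Counter(old_rows)
    new_counts = Counter(new_rows)
    only_old = list((old_counts - new_counts).elements())
    only_new = list((new_counts - old_counts).elements())
    return only_old, only_new


def unexplained_diffs(old_rows, new_rows, core_rows):
    """Keep only the differences where the NEW environment disagrees with the core database."""
    only_old, only_new = diff_rows(old_rows, new_rows)
    core = set(core_rows)
    extra_in_new = [row for row in only_new if row not in core]
    missing_in_new = [row for row in only_old if row in core]
    return extra_in_new + missing_in_new


# Hypothetical sample data: (order_id, status)
old_rows = [(1, "shipped"), (2, "cancelled"), (3, "shipped")]
new_rows = [(1, "shipped"), (3, "shipped"), (4, "ordered")]
core_rows = [(1, "shipped"), (3, "shipped"), (4, "ordered")]

only_old, only_new = diff_rows(old_rows, new_rows)
problems = unexplained_diffs(old_rows, new_rows, core_rows)
```

Here the old and new tables differ, but the new table matches the core database, so `problems` is empty and the difference is accepted.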
  16. Migration of data aggregation processing: using audit logs

    - With so many users and user departments, it was impossible to identify the exact scope of impact by asking people
        • A system built by an engineer who had left the company was still running without any handover
        • Some aggregation processing had gone unnoticed even by its users
    - Using audit logs
        • JOBS_BY_ORGANIZATION was used to accurately identify where the old system was being used
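
As a rough sketch of the audit-log approach on slide 16, the snippet below groups job records by the old-system tables they touched. The field names `user_email` and `referenced_tables` mirror columns of BigQuery's `INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION` view, but the records here are hand-written samples, and the project name `old-dwh-project` and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def users_of_old_tables(jobs, old_project):
    """Map each table in the old project to the sorted list of users who queried it."""
    usage = defaultdict(set)
    for job in jobs:
        for ref in job["referenced_tables"]:
            if ref["project_id"] == old_project:
                table = f'{ref["dataset_id"]}.{ref["table_id"]}'
                usage[table].add(job["user_email"])
    return {table: sorted(users) for table, users in usage.items()}


# Hypothetical rows shaped like the audit-log view's output
jobs = [
    {
        "user_email": "analyst@example.com",
        "referenced_tables": [
            {"project_id": "old-dwh-project", "dataset_id": "sales", "table_id": "orders"},
        ],
    },
    {
        "user_email": "batch-sa@example.com",
        "referenced_tables": [
            {"project_id": "old-dwh-project", "dataset_id": "sales", "table_id": "orders"},
            {"project_id": "new-dwh-project", "dataset_id": "sales", "table_id": "members"},
        ],
    },
]

old_usage = users_of_old_tables(jobs, "old-dwh-project")
```

The result lists every account, including forgotten batch service accounts, that still reads from the old system, without relying on anyone's memory.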
  17. Migration of data aggregation processing: visualization of dependencies between data marts

    - Found the keystones by visualizing the data aggregation processing
        - A keystone is a mart referenced by many data aggregation processes
    - Separated the data marts that should be migrated with priority from those that could be postponed without problems
    - Batches and BIs with no administrator were terminated as soon as they were found
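
One simple way to find the keystone marts described on slide 17 is to count how many aggregation processes reference each mart. The sketch below assumes the dependencies are already available as a mapping from process name to the marts it reads; all names are hypothetical, and the talk does not say which tool was actually used for the visualization.

```python
from collections import Counter


def rank_keystones(dependencies):
    """Rank data marts by the number of distinct processes that reference them."""
    counts = Counter()
    for marts in dependencies.values():
        counts.update(set(marts))  # count each mart once per process
    # Most-referenced first; ties broken alphabetically for stable output
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical dependency map: process -> marts it reads
dependencies = {
    "daily_sales_report": ["mart_orders", "mart_members"],
    "ltv_batch": ["mart_orders"],
    "coupon_dashboard": ["mart_orders", "mart_coupons"],
}

ranking = rank_keystones(dependencies)
```

Marts at the top of `ranking` are migration priorities; marts near the bottom can be postponed.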
  18. Migration of data aggregation processing: delete it as soon as possible

    - Retired the old systems gradually
        - Tables that had not been referenced for a certain period were deleted from the old system
    - Provided a separate GCP project for those who still wanted to refer to the old system's data
        - Only a limited number of people were granted viewing permission
        - This prevented anyone from referencing old-system data without realizing it
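
The "not referenced for a certain period" rule on slide 18 might be expressed like this. It is only a sketch: the 90-day threshold, the table names, and the function name are assumptions, since the talk does not state the actual period or how last-reference dates were collected.

```python
from datetime import date, timedelta


def stale_tables(last_referenced, today, max_idle_days):
    """Return tables never referenced, or last referenced before the cutoff date."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        table
        for table, last in last_referenced.items()
        if last is None or last < cutoff
    )


# Hypothetical last-reference dates (None means no reference was ever recorded)
last_referenced = {
    "sales.orders": date(2022, 10, 30),
    "sales.old_campaigns": date(2022, 3, 1),
    "tmp.scratch": None,
}

candidates = stale_tables(last_referenced, today=date(2022, 11, 17), max_idle_days=90)
```

Tables in `candidates` would be the ones to delete from the old system first.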
  19. Migration of data aggregation processing: for more information

    - For specific examples that cannot be introduced here, please see the tech blog
    - It has many examples of using GCP and BigQuery in addition to migration stories
    - https://techblog.zozo.com/entry/data-infrastructure-replacement
  20. Agenda

    - Introduction of ZOZOTOWN data infrastructure
    - The Original Sin and the Current Situation
    - Atonement
    - Lesson & Summary
  21. Lesson

    • Do not believe in self-proclaimed copies
        • Source only data that is absolutely guaranteed to be correct
        • Acquire data from upstream whenever possible
        • Make friends with the person in charge of the system that generates the data
    • Don't ask people, ask the audit logs
        • People make mistakes and leave the company
        • Audit logs illuminate the truth
    • Don't mix, it's dangerous
        • A spoonful of dirty water in a barrel full of wine is still dirty water
        • Schopenhauer's Law of Entropy
        • Distinguish between quality-assured and non-quality-assured data
  22. Summary

    • The current data infrastructure is indispensable for ZOZOTOWN
    • However, we suffered for several years because of a major selection mistake (the original sin) made in the early stages of building it
    • Atonement took years
    • I hope this story serves as a cautionary example for those who will build data infrastructure in the future