Slide 1

Slide 1 text

How we replaced a 10-year-old Perl product using Scala
Scala Matsuri 2019, 2019/06/29

Slide 2

Slide 2 text

©Hatena Co., Ltd.
Legacy project?
“A project that has become difficult to maintain or extend”
● Long-running, with poor documentation
● Inflexible code
● Outdated dependencies
● ...

Slide 3

Slide 3 text

Escape from being legacy
● Refactoring
● Re-architecting
● Rewrite

Slide 4

Slide 4 text

Escape from being legacy
● Refactoring
● Re-architecting
● Rewrite
○ Why and how we replaced a 10-year-old Perl product (Hatena-Bookmark) using Scala
This talk covers why we used Scala to replace it, and how we did it.

Slide 5

Slide 5 text

About me
● Rikito Taniguchi
● @tanishiking / GitHub: tanishiking
● id:tanishiking24
● Hatena (2017~)
○ Hatena-Bookmark team
Since joining Hatena in 2017, I have worked on the replacement of Hatena-Bookmark.

Slide 6

Slide 6 text

Agenda
● What we did
● Why did we decide on a full rewrite?
● Why did we choose Scala?
● The big rewrite / data migration
Today's agenda.

Slide 7

Slide 7 text

Agenda
● What we did
● Why did we decide on a full rewrite?
● Why did we choose Scala?
● The big rewrite / data migration
Let's start with an overview.

Slide 8

Slide 8 text

Hatena-Bookmark
● Social bookmarking platform operating in Japan
● Launched in 2005

Slide 9

Slide 9 text

Hatena-Bookmark
● Built as a monolithic Perl application (as of November 2016)
○ 400,000 lines of Perl code (excluding tests)
○ 270,000 lines of tests
○ About 70,000 lines of HTML templates
● And many git submodules...

Slide 10

Slide 10 text

Hatena-Bookmark was a “legacy” project
● Inconsistent wording
● Homegrown ORM and web framework
● The models no longer reflected reality
● Fat models / fat controllers
● Tests were too slow
● Complicated release process
● Difficult to set up a development environment
The growth and aging of the source code and the database made the software hard to optimize or change.

Slide 11

Slide 11 text

Rewrite Hatena-Bookmark using Scala
We decided to rewrite Hatena-Bookmark using Scala in 2015!!!
● Reduce maintenance costs.
● Optimize the application.
Source: Hatena Bookmark in Scala, https://www.slideshare.net/oarat/2015-0801-scala (Scala Kansai Summit 2015)

Slide 12

Slide 12 text

Team members (engineers)
● Server-side / web frontend engineers
○ 2 to 4 members
○ Develop the new system while also maintaining the original one.
● Infrastructure engineer
○ 1 member
Structure of the Hatena-Bookmark (web) development team.

Slide 13

Slide 13 text

Brand new database
● Create a new database for the new application
○ (Do not share the existing database with the old application)
○ Requires data migration
(Figure: Original App and Original DB; New App and New DB.)
The new system uses a new DB and does not reuse the old one.

Slide 14

Slide 14 text

New software architecture (overview)
Split into:
● Backend For Frontend (Perl)
● Core App Server (Scala)
● (and some microservices)
(Figure: CDN, reverse proxy, BFF (Perl), Core App Server (Scala) and microservices (Go/Python/Perl).)
Scala is used for the Core App Server, and Perl for the layer that handles user requests.

Slide 15

Slide 15 text

4 YEARS LATER...

Slide 16

Slide 16 text

Now, Hatena-Bookmark’s Core App Server runs on Scala!!!

Slide 17

Slide 17 text

The original app server is no longer running!
(Figure: CPU usage on the original app hosts.)
The old system has been completely shut down.

Slide 18

Slide 18 text

Benefits of the rewrite
Improved performance.
On the comment list page, the proportion of requests answered in under 250 ms rose from about 40% to about 90% after the rewrite.

Slide 19

Slide 19 text

Benefits of the rewrite
● Adding or changing features became much easier
○ We now release to production almost every day.
● We save a substantial amount of the computation resources needed to run the application

Slide 20

Slide 20 text

Was a rewrite the best option?
● Were there options other than a rewrite for revitalizing the project?
● A rewrite is not the only way to revitalize a project.
○ Refactoring
○ Re-architecting
○ Full rewrite

Slide 21

Slide 21 text

A rewrite is basically undesirable...
● Risk
○ It usually takes months or even years.
○ Risk of regressions.
● Overhead
○ We may have to freeze development on the original software during the rewrite.

Slide 22

Slide 22 text

So why did we decide to rewrite, despite the risks?

Slide 23

Slide 23 text

Agenda
● What we did
● Why did we decide on a full rewrite?
● Why did we choose Scala?
● The big rewrite / data migration
Why did we choose to rewrite?

Slide 24

Slide 24 text

Hatena-Bookmark was a “legacy” project
● Homegrown ORM
● The models no longer reflected reality
● Fat models / fat controllers
● and more ...
The growth and aging of the source code and the database made the software hard to optimize or change.

Slide 25

Slide 25 text

(Homegrown) ORM and web application framework
● Designed around “convention over configuration”
● They had been useful for rapid development, but…
○ They were no longer maintained.
○ People had started to deviate from the “conventions”...
● Tightly coupled with the system.
○ This hindered large-scale refactoring and optimization.
We depended on in-house frameworks that were no longer maintained.

Slide 26

Slide 26 text

The difference between model and reality (example)
In the real world, a single piece of content (an entry) may have multiple URLs.
(Figure: http://example.com/, https://example.com/ and https://foo.bar/, linked by a 301 redirect / canonical, all belong to a single Entry with its bookmarks.)

Slide 27

Slide 27 text

The difference between model and reality (example)
In the old system, each URL was modeled as a separate entry.
(Figure: http://example.com/, https://example.com/ and https://foo.bar/ each have their own Entry and bookmarks, although they have the same contents.)
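One way to picture the corrected model is a single entry that owns several URLs. The Scala sketch below is illustrative only: `EntryId`, `Entry` and `EntryUrlIndex` are hypothetical names, not the actual Hatena-Bookmark model. Every known URL maps to one entry id, so bookmarks made on any of the three URLs land on the same content.

```scala
// Hypothetical model: one entry owns several URLs (scheme variants,
// 301 redirects, canonical links) instead of one entry per URL.
final case class EntryId(value: Long)
final case class Entry(id: EntryId, canonicalUrl: String)

final case class EntryUrlIndex(byUrl: Map[String, EntryId]) {
  // Find the entry that a URL belongs to, if the URL is known.
  def resolve(url: String): Option[EntryId] = byUrl.get(url)

  // Register `url` as another address of an existing entry.
  def withAlias(url: String, entry: EntryId): EntryUrlIndex =
    copy(byUrl = byUrl.updated(url, entry))
}

val entry = Entry(EntryId(1L), "https://example.com/")
val index = EntryUrlIndex(Map(entry.canonicalUrl -> entry.id))
  .withAlias("http://example.com/", entry.id)
  .withAlias("https://foo.bar/", entry.id)
```

With this shape, discovering a new redirect or canonical link only adds an alias; the entry and its bookmarks stay in one place.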

Slide 28

Slide 28 text

Fat model / fat controller
● Fat model
○ A model that holds more logic than its own behavior requires.
○ $ wc -l lib/Hatena/Bookmark/MoCo/Entry.pm
■ 4611 lib/Hatena/Bookmark/MoCo/Entry.pm
● Fat controller
○ Controllers sometimes contain logic that represents a model’s behavior.

Slide 29

Slide 29 text

and more ...
● Inconsistent wording
○ “favorite” and “follow” mean the same thing.
● Tests take too long
● The release process is too complicated
● Setting up the development environment is difficult
There were many other problems as well...

Slide 30

Slide 30 text

So why did we decide to rewrite, despite the risks?
● Fundamental changes
● Past failures at refactoring
There were two main reasons.

Slide 31

Slide 31 text

Fundamental changes
Fundamental changes were necessary to keep the software thriving...
● Revise the DB schema / models.
● Remove the dependency on the homegrown ORM and framework.
We knew that the software needed fundamental changes.

Slide 32

Slide 32 text

Past failures at refactoring
We had seen several large-scale refactoring attempts end in failure.
● We tried to replace the framework, and gave up.
● We tried to refactor the database architecture / connection handling, and failed.

Slide 33

Slide 33 text

Why rewrite
It was virtually impossible to keep the system thriving through refactoring alone…
=> Full rewrite
For these reasons, we judged that a rewrite was the best option.

Slide 34

Slide 34 text

Agenda
● What we did
● Why did we decide on a full rewrite?
● Why did we choose Scala?
● The big rewrite / data migration
Why did we choose Scala?

Slide 35

Slide 35 text

Why Scala for the Core App Server?
● Well suited to a complex problem domain
○ Expressive type system
○ Scalability
○ Type safety
● Concise syntax
● We had already adopted Scala in other projects
It has a track record inside the company and can express a complex domain concisely.

Slide 36

Slide 36 text

New software architecture (overview, repeated)
Split into:
● Backend For Frontend (Perl)
● Core App Server (Scala)
● (and some microservices)
(Figure: CDN, reverse proxy, BFF (Perl), Core App Server (Scala) and microservices (Go/Python/Perl).)

Slide 37

Slide 37 text

Why Perl for the BFF?
● Hatena has a lot of Perl developers
● Rapid development
○ Easy to use / learn
○ No compilation required
● Thin layer
It has a track record inside the company, and we have many Perl engineers.

Slide 38

Slide 38 text

Learning curve for Scala
Scala isn’t easy to learn…
To lower the barrier to onboarding onto the project, we:
● Prepared learning materials
● Tried to avoid “difficult” libraries
○ Monocle / cats / scalaz …
○ Though they are quite useful, they make it harder for non-Scala engineers to onboard.

Slide 39

Slide 39 text

Tech stack for Scala
● Libraries
○ Scalatra
○ Slick (plain SQL queries)
○ circe
○ Elastic4s
○ etc.
● Cake pattern

Slide 40

Slide 40 text

Domain Driven Design
● To avoid the problems of the old system, we designed the architecture based on Domain Driven Design, and applied it thoroughly.
● Problems in the old system
○ The gap between the models and the real world.
○ Fat models / fat controllers.
○ Inconsistent wording.

Slide 41

Slide 41 text

Ubiquitous language
We discussed and re-defined the ubiquitous language, and shared it with the team.
● A common, rigorous language shared between developers and all members involved in the project.
● Domain models are named after the ubiquitous language.
✅ inconsistent wording

Slide 42

Slide 42 text

Layered architecture
Adopt a layered architecture and make the responsibilities of each layer explicit.
✅ separation of concerns

Slide 43

Slide 43 text

Dependency inversion principle
Limit the impact that changes in the infrastructure layer have on the other layers.
✅ ease of database refactoring ...

Slide 44

Slide 44 text

package domain.repository

// Cake pattern
trait BookmarkComponent {
  // Wrap the repository interface
  def bookmarkLoader: BookmarkLoader

  trait BookmarkLoader {
    // The domain repository has only the interface.
    def find(bookmarkId: BookmarkId): Option[BookmarkEntity]
  }
}

// In a separate source file:
package infrastructure

trait BookmarkComponent extends domain.repository.BookmarkComponent {
  // Concrete implementations here
  // (BookmarkLoaderImpl, not shown on the slide, implements BookmarkLoader)
  def bookmarkLoader: BookmarkLoader = BookmarkLoaderImpl
}
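To see how the layers connect, here is a self-contained sketch of the same cake pattern. `BookmarkId`, `BookmarkEntity`, the in-memory loader and `BookmarkService` are hypothetical stand-ins written for illustration, not code from Hatena-Bookmark. The service depends only on the domain interface through a self-type, so replacing the infrastructure component (for example, swapping a DB-backed loader for an in-memory one in tests) does not touch it.

```scala
// Hypothetical domain types for this sketch only.
final case class BookmarkId(value: Long)
final case class BookmarkEntity(id: BookmarkId, comment: String)

// Domain layer: the component exposes only the repository interface.
trait BookmarkComponent {
  def bookmarkLoader: BookmarkLoader

  trait BookmarkLoader {
    def find(bookmarkId: BookmarkId): Option[BookmarkEntity]
  }
}

// Application logic depends on the domain interface through a self-type.
trait BookmarkService { self: BookmarkComponent =>
  def commentOf(id: BookmarkId): Option[String] =
    bookmarkLoader.find(id).map(_.comment)
}

// Infrastructure layer: an in-memory stand-in for a DB-backed loader.
trait InMemoryBookmarkComponent extends BookmarkComponent {
  protected def rows: Map[BookmarkId, BookmarkEntity]

  object InMemoryBookmarkLoader extends BookmarkLoader {
    def find(bookmarkId: BookmarkId): Option[BookmarkEntity] = rows.get(bookmarkId)
  }

  def bookmarkLoader: BookmarkLoader = InMemoryBookmarkLoader
}

// Wiring: mix a concrete component into the service.
object BookmarkApp extends BookmarkService with InMemoryBookmarkComponent {
  protected val rows: Map[BookmarkId, BookmarkEntity] =
    Map(BookmarkId(1L) -> BookmarkEntity(BookmarkId(1L), "nice article"))
}
```

`BookmarkApp.commentOf(BookmarkId(1L))` then returns `Some("nice article")`. The wiring object is the only place that names a concrete infrastructure component.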

Slide 45

Slide 45 text

Relations between entities
● In the old system, a model had methods for retrieving and resolving its relationships with other models.
○ => Fat model
● In the new system, these are defined as extension methods in a domain service (domain relation).

Slide 46

Slide 46 text

Extension method in domain service

package domain.relation

trait BookmarkLocationComponent { self: repository.LocationComponent =>
  implicit class BookmarkSeqLocationsRelation(
    bookmarks: Seq[BookmarkEntity]
  ) {
    // In the real system, the return value is something like
    // Bookmark with { def location: Location }
    def withLocations: Stream[(BookmarkEntity, Location)] = …
  }
}
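A runnable approximation of this relation follows. `LocationId`, `Location`, the fields of `BookmarkEntity`, `LocationComponent.findLocations` and the in-memory store are all hypothetical, and the sketch returns a `Seq` instead of the slide's `Stream` (which Scala 2.13 deprecates in favor of `LazyList`). The point it illustrates is that the relation lives in a domain-service trait as an implicit class, so the entity itself carries no relation-resolving methods.

```scala
// Hypothetical domain types for this sketch only.
final case class LocationId(value: Long)
final case class Location(id: LocationId, url: String)
final case class BookmarkEntity(id: Long, locationId: LocationId)

// Hypothetical repository component that resolves locations in one batch.
trait LocationComponent {
  def findLocations(ids: Seq[LocationId]): Map[LocationId, Location]
}

// Domain relation: an extension method on Seq[BookmarkEntity].
trait BookmarkLocationComponent { self: LocationComponent =>
  implicit class BookmarkSeqLocationsRelation(bookmarks: Seq[BookmarkEntity]) {
    // One batched lookup for all bookmarks; bookmarks whose location
    // is not found are dropped.
    def withLocations: Seq[(BookmarkEntity, Location)] = {
      val locations = findLocations(bookmarks.map(_.locationId).distinct)
      bookmarks.flatMap(b => locations.get(b.locationId).map(loc => (b, loc)))
    }
  }
}

// Wiring with an in-memory location store.
object Relations extends BookmarkLocationComponent with LocationComponent {
  private val store: Map[LocationId, Location] =
    Map(LocationId(10L) -> Location(LocationId(10L), "https://example.com/"))

  def findLocations(ids: Seq[LocationId]): Map[LocationId, Location] =
    store.filter { case (id, _) => ids.contains(id) }
}
```

Callers write `import Relations._` and then `bookmarks.withLocations`. Resolving the whole sequence at once also avoids issuing one lookup per bookmark.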

Slide 47

Slide 47 text

Why Scala for the Core App Server?
● Well suited to a complex problem domain
○ Expressive type system
○ Scalability
○ Type safety
● Concise syntax
● We had already adopted Scala in other projects
It has a track record inside the company and can express a complex domain concisely.

Slide 48

Slide 48 text

Agenda
● What we did
● Why did we decide on a full rewrite?
● Why did we choose Scala?
● The big rewrite / data migration
The full rewrite and data migration.

Slide 49

Slide 49 text

Project goals
● Make the system maintainable and easy to change.
● Revise the models and the DB schema.
● Optimize the system and save computation resources.

Slide 50

Slide 50 text

Project scope
● Don’t add any big new features during the rewrite.
● Continue to provide the main features.
● Retire some minor features.

Slide 51

Slide 51 text

THE BIG REWRITE
● Rewrite everything at once?
or
● Rewrite incrementally?

Slide 52

Slide 52 text

Incremental rewrite
Rather than replacing everything at once, split the rewrite into a number of smaller phases.
● Aug 2017: Replace the comment list page
● Nov 2017: Replace the user page
● Mar 2018: Replace the top page
● Mar 2018: Replace the search feature
● ...

Slide 53

Slide 53 text

Incremental rewrite
● Pros
○ Each release phase makes the progress and business value visible.
○ Safer than a big-bang rewrite.
● Cons
○ We have to run both the new and the original system until the rewrite is complete.

Slide 54

Slide 54 text

Thorough investigation of the old system
LIST ALL THE FEATURES, and LIST ALL THE RESOURCES EACH FEATURE DEPENDS ON (BY READING THE SOURCE CODE)
● Choose which features to re-implement and which to drop.
● Prioritize based on dependencies and business impact.
● Group the features into components.
○ Rewrite each group one by one.

Slide 55

Slide 55 text

Thorough investigation of the old system
● Where are we in the project?
○ The list helps make the progress clear.
● Running into unexpected features / dependencies in the middle of the rewrite…
○ The only way to avoid this is to list all features and dependencies thoroughly before the rewrite...

Slide 56

Slide 56 text

Switch upstreams on the reverse proxy
The reverse proxy routes each request to either the new or the old system.
(Figure: nginx routes each feature, such as listing user comments, the user page, settings and recommendations, to either the old system or the new system.)
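In production this switch lives in the nginx configuration. The Scala function below only models the routing decision so the idea is concrete; the path prefixes are hypothetical and this is not the actual configuration. Each release phase moves a feature's prefix into the migrated set, and every other request keeps going to the original system.

```scala
// Which system serves a request.
sealed trait Upstream
case object OldSystem extends Upstream
case object NewSystem extends Upstream

// Hypothetical path prefixes of features that have already been rewritten.
// Each release phase adds prefixes here; everything else stays on the old system.
val migratedPrefixes: Seq[String] = Seq("/entry/", "/user/")

def route(path: String): Upstream =
  if (migratedPrefixes.exists(prefix => path.startsWith(prefix))) NewSystem
  else OldSystem
```

Because the default is the old system, a phase can be rolled back by removing its prefix.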

Slide 57

Slide 57 text

Split a component out as a microservice
Some features could also be split out as microservices.
(Figure: the same nginx routing, with one of the features split out as a microservice.)

Slide 58

Slide 58 text

Data migration
Since we created a new database with a brand-new structure for the new application, we had to migrate all the data from the old database to the new one.
(Figure: Original App and Original DB; New App and New DB.)

Slide 59

Slide 59 text

Downtime for maintenance vs zero-downtime
Downtime for maintenance
● Stop the service for each data migration.
● Maintenance might last several hours.
○ Large scale
○ Complex ETL process
Zero-downtime
● No downtime
● Requires real-time data replication.
● Replication delay.

Slide 60

Slide 60 text

Data migration with zero downtime
● Given the number of downtimes that would be required, stopping the service repeatedly was not acceptable.
● Replication delay was not critical.
We decided to migrate the data with zero downtime.

Slide 61

Slide 61 text

Real-time and batch data migration
● 1. Start real-time data migration
○ Replicate writes on the old system to the new system.
● 2. Batch data migration
○ Copy all existing data into the new database.
● 3. Data verification
● 4. Replace
Combining real-time and batch data migration achieves zero downtime.
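Step 3 (data verification) can start as a row-by-row comparison. The sketch below assumes both stores can be loaded as maps keyed by id and that `transform` stands for the old-to-new conversion the migration uses; the row types and the conversion are hypothetical. It reports rows that never arrived and rows whose migrated value differs from what the conversion would produce.

```scala
// Hypothetical row shapes in the old and new schemas.
final case class OldRow(id: Long, title: String)
final case class NewRow(id: Long, title: String)

// Hypothetical old-to-new conversion, the same one the migration uses.
def transform(old: OldRow): NewRow = NewRow(old.id, old.title.trim)

sealed trait Mismatch extends Product with Serializable
final case class Missing(id: Long) extends Mismatch
final case class Different(id: Long, expected: NewRow, actual: NewRow) extends Mismatch

// List rows that were not migrated, or whose migrated value differs
// from what the conversion would produce.
def verify(oldRows: Map[Long, OldRow], newRows: Map[Long, NewRow]): Seq[Mismatch] =
  oldRows.toSeq.sortBy(_._1).flatMap { case (id, old) =>
    val expected = transform(old)
    newRows.get(id) match {
      case None                               => Some(Missing(id))
      case Some(actual) if actual != expected => Some(Different(id, expected, actual))
      case Some(_)                            => None
    }
  }
```

The mismatch list doubles as the input for a targeted re-run of the migration.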

Slide 62

Slide 62 text

Finally, all the replacements were released!!
● Aug 2017: Replace the comment list page
● Nov 2017: Replace the user page
● Mar 2018: Replace the top page
● …
● May 2019: Stop the old system
All data migration and replacement work was completed in May 2019, and the old system was shut down.

Slide 63

Slide 63 text

Review
● Great improvements in non-functional requirements
○ Faster response times
○ Improved algorithms
● We went over the estimated development cost
○ It is hard to estimate the exact cost of a rewrite.
○ Rewriting big legacy software always takes years.
● We didn’t have any big rework
○ Thanks to the thorough investigation and ...
It took longer than estimated, but we proceeded without major rework.

Slide 64

Slide 64 text

Summary
● Refactoring or rewrite?
○ Consider carefully / refactor first
○ A rewrite is really powerful but tough
● We solved the problems of the old system thanks to Scala!
○ Thank you!!!
● Consider an incremental rewrite for a big rewrite
○ Clearer progress / safer / cost
● Research the original system thoroughly
○ Prevents big rework / lists all the tasks

Slide 65

Slide 65 text

Questions?

Slide 66

Slide 66 text

If we have time, I’ll go deeper into data migration.

Slide 67

Slide 67 text

Real-time data migration
● Options
○ Push from the application
○ Push from the datastore
○ Poll the old datastore periodically

Slide 68

Slide 68 text

Real-time data migration (from app)
The original app pushes every update made on the original system to the new app.
(Figure: the Original App writes to the Original DB and enqueues each update; the New App writes it to the New DB.)
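A minimal sketch of this flow, assuming a `java.util.concurrent.LinkedBlockingQueue` in place of a real message queue, plus hypothetical in-memory stand-ins for both databases and a hypothetical event type. The original app enqueues an event after each write; the new side drains the queue, transforms each event into the new schema, and writes it.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

// Hypothetical change event the original app emits after each write.
final case class BookmarkWritten(userName: String, url: String, comment: String)
// Hypothetical row shape in the new schema.
final case class NewBookmark(userName: String, url: String, comment: String)

val queue = new LinkedBlockingQueue[BookmarkWritten]()
val originalDb = mutable.Map.empty[(String, String), String] // (user, url) -> comment
val newDb = mutable.Map.empty[(String, String), NewBookmark]

// Original app: write to its own DB, then enqueue the change.
// Any write path that skips `queue.put` is never replicated.
def originalAppWrite(e: BookmarkWritten): Unit = {
  originalDb((e.userName, e.url)) = e.comment
  queue.put(e)
}

// New side: drain the queue, transforming each event into the new schema.
def drainQueue(): Unit = {
  var event = queue.poll()
  while (event != null) {
    newDb((event.userName, event.url)) =
      NewBookmark(event.userName, event.url, event.comment.trim)
    event = queue.poll()
  }
}
```

The transformation step is where the new schema's validation fits naturally, and the comment on `originalAppWrite` is why every write path in the original app has to be found.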

Slide 69

Slide 69 text

Real-time data migration (from app)
● Pros
○ Easy to validate and transform the data so that it fits the new DB structure.
● Cons
○ Code must be added to the original app to send updates to the queue.
○ Every source of updates must be identified (otherwise, some updates will be lost).

Slide 70

Slide 70 text

Real-time data migration (from DB)
Define triggers on the old database that write each update to the queue.
(Figure: the Original App writes to the Original DB; the database enqueues each update; the New App writes it to the New DB.)

Slide 71

Slide 71 text

Real-time data migration (from DB)
● Pros
○ No need to modify the old application
○ Comprehensive (no risk of missing writes to any table)
● Cons
○ Complex triggers and UDFs that write the updates to the queue have to be maintained.
○ The migration logic is limited by the expressiveness of SQL.

Slide 72

Slide 72 text

Real-time data migration (poll)
Periodically fetch data from the original system and migrate it to the new system.
(Figure: the Original App writes to the Original DB; a cron job polls it, and the New App writes the data to the New DB.)
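A minimal polling sketch, assuming every old row carries an `updated_at` timestamp (the lost-update slides later in this deck rely on the same column) and using a hypothetical in-memory table. Each cron run copies rows changed since the last checkpoint and returns the next checkpoint, so replication lags by roughly one polling interval.

```scala
import scala.collection.mutable

// Hypothetical old-table row; updatedAt is an epoch-millis timestamp.
final case class OldRow(id: Long, value: String, updatedAt: Long)

// One cron run: copy rows updated at or after the checkpoint, upserting by id,
// and return the next checkpoint. Using `>=` re-copies rows that share the
// checkpoint timestamp, which is harmless because the write is an upsert.
// Deleted rows are invisible to this approach unless they are soft-deleted.
def pollOnce(oldTable: Seq[OldRow], newDb: mutable.Map[Long, String], since: Long): Long = {
  val changed = oldTable.filter(_.updatedAt >= since)
  changed.foreach(r => newDb(r.id) = r.value)
  if (changed.isEmpty) since else changed.map(_.updatedAt).max
}
```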

Slide 73

Slide 73 text

Real-time data migration (poll)
● Pros
○ No need to modify the original system
○ The migration system can be built independently of the original system.
● Cons
○ Large replication delay.

Slide 74

Slide 74 text

Our choice
Push from the application
● We needed to synchronize the data between the original and new DBs with little delay.
● The data transformation process was complex.

Slide 75

Slide 75 text

Batch data migration
While real-time data migration replicates new updates from the original system to the new one, batch data migration copies all the existing data in the original system to the new system.

Slide 76

Slide 76 text

Tips for writing a batch data migration script
● Write an idempotent script
○ It is hard to migrate all the data to the new system in a single attempt.
○ We will need to re-run the migration to complete the job.
○ Idempotency supports this cycle of trial and error.
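Idempotency usually comes down to writing with an upsert keyed by each row's identity, so replaying the same input leaves the target unchanged (in MySQL, for example, `INSERT ... ON DUPLICATE KEY UPDATE`). The sketch below contrasts an upsert with a plain append, using hypothetical in-memory targets in place of the new database.

```scala
import scala.collection.mutable

final case class Bookmark(id: Long, comment: String)

// Idempotent: upsert by primary key, so a re-run rewrites the same rows
// with the same values instead of adding new ones.
def migrateUpsert(source: Seq[Bookmark], target: mutable.Map[Long, Bookmark]): Unit =
  source.foreach(b => target.update(b.id, b))

// Not idempotent, for contrast: a plain insert appends duplicates on every re-run.
def migrateAppend(source: Seq[Bookmark], target: mutable.ArrayBuffer[Bookmark]): Unit =
  target ++= source
```

Running `migrateUpsert` twice on the same input leaves two rows; running `migrateAppend` twice leaves four.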

Slide 77

Slide 77 text

Tips for writing a batch data migration script
● Estimate the execution time of the batch script
○ Estimate how long the script will take to run.
○ If it is too long, consider:
■ Running the script on a dedicated server.
■ Scaling up the original or the new database server.
■ Optimizing the performance of the script.
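A back-of-the-envelope estimate divides the total row count by the throughput measured on a trial run. The figures below are hypothetical, not numbers from the Hatena-Bookmark migration.

```scala
// Extrapolate the total running time from a trial run on a sample.
def estimateSeconds(totalRows: Long, sampleRows: Long, sampleSeconds: Double): Double = {
  val rowsPerSecond = sampleRows / sampleSeconds
  totalRows / rowsPerSecond
}

// Hypothetical figures: a trial run copied 10,000 rows in 2 seconds,
// and the table holds 100 million rows.
val seconds = estimateSeconds(totalRows = 100000000L, sampleRows = 10000L, sampleSeconds = 2.0)
val hours = seconds / 3600 // 20,000 s, roughly 5.6 hours
```

Throughput often drops as tables and indexes grow, so a sample from the start of the table may be optimistic.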

Slide 78

Slide 78 text

Tips for writing a batch data migration script
● Retry plan
○ The script may stop in the middle of the migration because of an unexpected error.
○ Designing the script so that it can resume from a specific point of the migration saves time.
(Figure: rows already migrated, then "Re-run from here", then rows not yet migrated.)
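One common way to make a run resumable is to migrate in primary-key order, in fixed-size chunks, and to persist the last migrated id after each chunk. The sketch below assumes that approach, with a hypothetical row type, an in-memory target and a caller-supplied `saveCheckpoint` function standing in for wherever the checkpoint is stored.

```scala
import scala.collection.mutable

final case class Row(id: Long, value: String)

// Migrate rows with id > checkpoint in ascending id order, chunk by chunk.
// `saveCheckpoint` persists the last migrated id after each chunk, so after a
// crash the script resumes from there and redoes at most one chunk.
def migrateFrom(
    source: Seq[Row],
    target: mutable.Map[Long, String],
    checkpoint: Long,
    chunkSize: Int,
    saveCheckpoint: Long => Unit
): Unit =
  source.filter(_.id > checkpoint).sortBy(_.id).grouped(chunkSize).foreach { chunk =>
    chunk.foreach(r => target(r.id) = r.value) // upsert: redoing a chunk is harmless
    saveCheckpoint(chunk.last.id)
  }
```

Combining the checkpoint with idempotent writes means a chunk that was half-written before a crash can simply be written again.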

Slide 79

Slide 79 text

Steps of data migration
1. Start real-time data migration
2. Batch data migration
3. Replace the application
(Figure: timeline in which real-time data migration starts first and batch data migration runs afterwards.)

Slide 80

Slide 80 text

Steps of data migration - otherwise...
1. Start real-time data migration
2. Batch data migration
3. Replace the application
If steps 1 and 2 are reversed, some data will not be migrated.
(Figure: batch data migration runs first and real-time migration starts afterwards; data written between the two is lost.)
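The ordering argument can be checked with a tiny simulation. It rests on simplifying assumptions: the batch copy is treated as a single snapshot of the old DB taken at `snapshotAt`, timestamps are hypothetical logical times, and real-time migration forwards every write made at or after `realtimeStartAt`. A write made after the snapshot reaches the new DB only if real-time migration was already running when it happened.

```scala
// Hypothetical model: a write to the old DB at logical time `at`.
final case class Write(key: String, value: Int, at: Int)

// New DB contents under the assumptions above.
def simulate(writes: Seq[Write], realtimeStartAt: Int, snapshotAt: Int): Map[String, Int] = {
  val ordered = writes.sortBy(_.at)
  // Batch copy: old DB state as of the snapshot (later writes overwrite earlier ones).
  val batch = ordered.filter(_.at <= snapshotAt).map(w => w.key -> w.value).toMap
  // Writes after the snapshot reach the new DB only through real-time migration.
  val forwarded = ordered.filter(w => w.at > snapshotAt && w.at >= realtimeStartAt)
  forwarded.foldLeft(batch)((db, w) => db.updated(w.key, w.value))
}

val writes = Seq(Write("a", 1, 1), Write("b", 2, 5), Write("c", 3, 9))
val correctOrder = simulate(writes, realtimeStartAt = 0, snapshotAt = 3) // a, b, c
val reversed = simulate(writes, realtimeStartAt = 7, snapshotAt = 3)     // "b" is lost
```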

Slide 81

Slide 81 text

Data collision between real-time and batch migration
For update-intensive data, there is a risk of data collision (the lost update anomaly).
Suppose we are migrating data “X” from the original DB to the new DB.
(Figure: the Original DB holds X = 1, next to the New DB.)

Slide 82

Slide 82 text

Data collision between real-time and batch migration
First, the batch data migration script reads X from the original DB.
(Figure: the batch data migration script reads X = 1 from the Original DB.)

Slide 83

Slide 83 text

Data collision between real-time and batch migration
Then, before the batch script writes to the new DB, X on the original DB is updated to 2 and real-time data migration synchronizes it to the new DB.
(Figure: X is updated to 2 on the Original DB and real-time data migration writes X = 2 to the New DB, while the batch script still holds X = 1.)

Slide 84

Slide 84 text

Data collision between real-time and batch migration
Finally, the batch data migration script overwrites X in the new DB with X = 1, leaving the data inconsistent.
(Figure: the batch script writes X = 1 to the New DB while the Original DB holds X = 2.)
X in the new DB should equal X in the original DB... The update made on the original DB is lost.

Slide 85

Slide 85 text

To avoid the lost update
Compare the updated_at values before writing to the new DB, and keep the newer value.
(Figure: the Original DB and the New DB both hold X = 2 with updated_at = 1970-01-01 12:00:01; the batch script holds X = 1 with updated_at = 1970-01-01 12:00:00, so it does not update, because the existing data is newer.)
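The rule can be written as a conditional write: apply the incoming value only if it is newer than what the new DB already holds. The sketch replays the race from the previous slides against a hypothetical in-memory store, with timestamps simplified to 0 and 1. In a real database the comparison and the write must happen atomically (for example as a conditional `UPDATE`); the in-memory version below does not guard against concurrent callers.

```scala
import scala.collection.mutable

// A value together with the time it was last updated (simplified timestamps).
final case class Versioned(value: Int, updatedAt: Long)

// Write only when the key is absent or the stored row is strictly older.
// Returns whether the write was applied.
def writeIfNewer(db: mutable.Map[String, Versioned], key: String, incoming: Versioned): Boolean =
  db.get(key) match {
    case Some(existing) if existing.updatedAt >= incoming.updatedAt => false
    case _ =>
      db(key) = incoming
      true
  }

// Replaying the race:
val newDb = mutable.Map.empty[String, Versioned]
val batchRead = Versioned(1, updatedAt = 0L)           // 1. batch script reads X = 1
writeIfNewer(newDb, "X", Versioned(2, updatedAt = 1L))  // 2. real-time migration writes X = 2
val applied = writeIfNewer(newDb, "X", batchRead)       // 3. stale batch write is rejected
```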

Slide 86

Slide 86 text

Should we always implement it?
The lost update anomaly can occur on update-intensive data, but in most cases the probability of a collision is negligible. It is usually enough to validate the data and re-run the migration (only if the migration went wrong).