How we replaced a 10-year-old Perl product using Scala

Rikito Taniguchi

June 29, 2019

Transcript

  1. How we replaced a
    10-year-old Perl
    product using Scala
    Scala Matsuri 2019
    2019/06/29

  2. ©Hatena Co., Ltd.
    “A project that is difficult to maintain or extend”
    ● Long-running and poorly documented
    ● Inflexible code
    ● Outdated dependencies
    ● ...
    Legacy project?
    Legacy project: a project that has become hard to maintain or extend

  3. ©Hatena Co., Ltd.
    ● Refactoring
    ● Re-architecting
    ● Rewrite
    Escape from being legacy
    Legacy project: a project that has become hard to maintain or extend

  4. ©Hatena Co., Ltd.
    ● Refactoring
    ● Re-architecting
    ● Rewrite
    ○ Why and How we replaced a 10-year-old Perl
    product (Hatena-Bookmark) using Scala
    Escape from being legacy
    I will talk about why and how we replaced it using Scala

  5. ©Hatena Co., Ltd.
    ● Rikito Taniguchi
    ● @tanishiking / github: tanishiking
    ● id:tanishiking24
    ● Hatena (2017~)
    ○ Hatena-Bookmark team
    About me
    Since joining in 2017, I have been working on the Hatena Bookmark replacement.

  6. ©Hatena Co., Ltd.
    ● What we did
    ● Why did we decide on a full rewrite?
    ● Why did we choose Scala?
    ● The Big Rewrite / data migration.
    Agenda
    Today's agenda

  7. ©Hatena Co., Ltd.
    ● What we did
    ● Why did we decide on a full rewrite?
    ● Why did we choose Scala?
    ● The Big Rewrite / data migration.
    Agenda
    Starting with an overview

  8. ©Hatena Co., Ltd.
    Hatena-Bookmark
    A social bookmarking platform serving users in Japan
    ● Social bookmarking platform in Japan
    ● Launched in 2005

  9. ©Hatena Co., Ltd.
    ● Monolithic Perl application
    ○ 400,000 lines of Perl code (excluding tests)
    ○ 270,000 lines of tests
    ○ About 70,000 lines of HTML templates
    ○ (as of November 2016)
    ● And many git submodules...
    Hatena-Bookmark
    It was built as a monolithic Perl application

  10. ©Hatena Co., Ltd.
    ● Inconsistent wording
    ● Homegrown ORM and Web framework
    ● The models no longer reflect reality
    ● Fat model / Fat controller
    ● Tests are too slow
    ● Complicated release process
    ● Difficult to set up the development environment
    Hatena-Bookmark was a “legacy” project
    Bloated, aging source code and databases made the software hard to optimize or change

  11. ©Hatena Co., Ltd.
    We decided to rewrite Hatena-Bookmark using
    Scala in 2015 !!!
    Rewrite Hatena-Bookmark using Scala
    In 2015 we decided to rewrite Hatena Bookmark, and we adopted Scala.
    Source: Hatena Bookmark in Scala
    https://www.slideshare.net/oarat/2015-0801-scala
    (Scala Kansai Summit 2015)
    ● Reduce maintenance costs.
    ● Optimize the application.

  12. ©Hatena Co., Ltd.
    ● Server-side / web frontend engineers
    ○ 2 - 4 members
    ○ Develop the new system, and take care of the
    original system.
    ● Infrastructure engineer
    ○ 1 member
    Team members (engineers)
    Composition of the Hatena Bookmark (Web) development team

  13. ©Hatena Co., Ltd.
    ● Create a new database for the new application
    ○ (Not sharing the existing database with the old
    application)
    ○ Requires data migration
    Brand New Database
    The new system uses a new database; the old database is not reused
    [Diagram: Original App with Original DB, New App with New DB]

  14. ©Hatena Co., Ltd.
    New software architecture (overview)
    Scala for the Core App Server, Perl for the layer that handles requests from users
    [Diagram: CDN, Reverse proxy, BFF (Perl), Core App Server (Scala), Microservices (Go/Python/Perl)]
    Split into
    ● Backend For Frontend (Perl)
    ● Core App Server (Scala)
    ● (and some microservices)

  15. ©Hatena Co., Ltd.
    4 YEARS LATER

  16. ©Hatena Co., Ltd.
    Now, Hatena-Bookmark’s
    Core App Server is built on
    Scala !!!
    Hatena Bookmark's Core App Server runs on Scala!

  17. ©Hatena Co., Ltd.
    The original app server is no longer running!
    The old system has been shut down completely
    [Chart: CPU usage on the original app hosts]

  18. ©Hatena Co., Ltd.
    Improvements in performance.
    Benefits of Rewrite
    The rewrite raised the share of requests answered within 250 ms from about 40% to about 90%
    The proportion of requests on the comment list page with a response time
    under 250 ms rose from 40% to 90%.

  19. ©Hatena Co., Ltd.
    ● Adding and changing features became much easier
    ○ We now release the software to production almost
    every day.
    ● Substantial savings in the computing resources
    needed to run the application
    Benefits of Rewrite
    Changes to the service became very easy
    Significant savings in computing resources

  20. ©Hatena Co., Ltd.
    ● Were there options other than a rewrite for
    revitalizing the project?
    ● A rewrite is not the only way to revitalize a
    project.
    ○ Refactoring
    ○ Re-architecting
    ○ Full Rewrite
    Was a rewrite the best option?
    Rewriting the software from scratch is not the only option

  21. ©Hatena Co., Ltd.
    ● Risk
    ○ Usually takes months or even years.
    ○ Risk of regressions.
    ● Overhead
    ○ We may have to freeze development on the
    original software while rewriting.
    Rewrite is basically undesirable...
    A rewrite can take years, and it can mean halting development on the existing project

  22. ©Hatena Co., Ltd.
    So why did we decide to
    rewrite, in spite of
    the risks?
    Why did we choose to rewrite, knowing those risks?

  23. ©Hatena Co., Ltd.
    ● What we did
    ● Why did we decide on a full rewrite?
    ● Why did we choose Scala?
    ● The Big Rewrite / data migration.
    Agenda
    Why we chose to rewrite

  24. ©Hatena Co., Ltd.
    ● Homegrown ORM
    ● The models no longer reflect reality
    ● Fat model / Fat controller
    ● and more ...
    Hatena-Bookmark was a “legacy” project
    Bloated, aging source code and databases made the software hard to optimize or change

  25. ©Hatena Co., Ltd.
    ● Designed based on “convention over configuration”
    ● They had been useful for rapid development, but…
    ○ No longer maintained.
    ○ People started to deviate from the “convention”...
    ● Tightly coupled with the system.
    ○ Hinders large-scale refactoring and optimization.
    (Homegrown) ORM, Web App Framework
    Dependence on in-house frameworks that are no longer maintained.

  26. ©Hatena Co., Ltd.
    In the real world, a
    single piece of content
    (an entry) may have
    multiple URLs.
    The difference between model and reality (example)
    In the real world, one piece of content can have multiple URLs
    [Diagram: http://example.com/, https://example.com/ and https://foo.bar/, linked by 301 redirect / canonical, all belong to one Entry with its Bookmarks]

  27. ©Hatena Co., Ltd.
    In the old system, each
    URL was modeled as
    a separate entry.
    The difference between model and reality (example)
    In the old system, each URL points to a different entry.
    [Diagram: http://example.com/, https://example.com/ and https://foo.bar/ each have their own Entry and Bookmarks]
    Same contents!
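
The revised model, in which several URLs resolve to one entry, might be sketched in Scala as follows. This is only an illustration: `EntryId`, `EntryResolver`, `canonicalOf` and `entryOf` are hypothetical names, and treating https://example.com/ as the canonical URL is an assumption, since the slides do not show the real schema.

```scala
// Hypothetical names throughout; not the actual Hatena-Bookmark schema.
final case class EntryId(value: Long)

object EntryResolver {
  // Each known alias points at its canonical URL
  // (learned from a 301 redirect or a rel=canonical link).
  val canonicalOf: Map[String, String] = Map(
    "http://example.com/" -> "https://example.com/",
    "https://foo.bar/"    -> "https://example.com/"
  )

  // Exactly one entry per canonical URL.
  val entryOf: Map[String, EntryId] = Map("https://example.com/" -> EntryId(1L))

  // Resolve any URL to the entry of its canonical URL.
  def resolve(url: String): Option[EntryId] =
    entryOf.get(canonicalOf.getOrElse(url, url))
}
```

With this shape, bookmarks made on any of the three URLs end up attached to the same entry instead of three separate ones.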

  28. ©Hatena Co., Ltd.
    ● Fat model
    ○ A model that holds more logic than its own
    behavior requires.
    ○ $ wc -l lib/Hatena/Bookmark/MoCo/Entry.pm
    ■ 4611 lib/Hatena/Bookmark/MoCo/Entry.pm
    ● Fat controller
    ○ Controllers sometimes contain logic that
    belongs to the model's behavior.
    Fat model / Fat controller
    Models appeared that held logic beyond the model's own behavior

  29. ©Hatena Co., Ltd.
    ● Inconsistent wording
    ○ “favorite” and “follow” mean the same thing.
    ● Tests take too long
    ● Overly complicated release process
    ● Difficult to set up the development environment.
    and more ...
    There were various other problems as well...

  30. ©Hatena Co., Ltd.
    So why did we decide to rewrite, in
    spite of the risks?
    Fundamental changes
    Past failures at refactoring
    Why did we choose to rewrite?
    There were two main reasons

  31. ©Hatena Co., Ltd.
    Fundamental changes were
    necessary to keep the software
    thriving...
    ● Revise DB schema / model
    ● Remove the dependency on the homegrown ORM and
    framework.
    Fundamental changes
    We knew that the software needed fundamental changes

  32. ©Hatena Co., Ltd.
    Several of our past attempts at
    large-scale refactoring had ended in
    failure.
    ● Tried to replace the framework and gave up.
    ● Tried to refactor around the database architecture /
    connection and failed.
    Past failures at refactoring
    We had attempted large-scale refactoring in the past and failed

  33. ©Hatena Co., Ltd.
    It was virtually impossible to keep the system thriving
    through refactoring alone…
    => Full Rewrite
    Why Rewrite
    For these reasons we judged a rewrite to be the best option

  34. ©Hatena Co., Ltd.
    ● What we did
    ● Why did we decide on a full rewrite?
    ● Why did we choose Scala?
    ● The Big Rewrite / data migration.
    Agenda
    Why we chose Scala

  35. ©Hatena Co., Ltd.
    ● Well suited to a complex problem domain
    ○ Expressive type system
    ○ Scalability
    ○ Type safety
    ● Concise syntax
    ● Scala was already in use in other projects
    Why Scala for Core App Server ?
    A track record of in-house use, and the ability to express a complex domain concisely.

  36. ©Hatena Co., Ltd.
    New software architecture (overview)
    Overview of the new architecture (shown again)
    [Diagram: CDN, Reverse proxy, BFF (Perl), Core App Server (Scala), Microservices (Go/Python/Perl)]
    Split into
    ● Backend For Frontend (Perl)
    ● Core App Server (Scala)
    ● (and some microservices)

  37. ©Hatena Co., Ltd.
    ● Hatena has a lot of Perl developers
    ● Rapid development
    ○ Easy to use / learn
    ○ No compilation step required
    ● Thin layer
    Why Perl for BFF?
    A track record of in-house use, and many Perl engineers

  38. ©Hatena Co., Ltd.
    Scala isn’t easy to learn…
    To lower the barrier to joining the project:
    ● Prepare learning materials
    ● Try to avoid using “difficult” libraries
    ○ Monocle / cats / scalaz …
    ○ Though they are quite useful, they make onboarding
    harder for engineers who are new to Scala.
    Learning curve for Scala
    Prepare Scala learning materials and avoid “difficult” libraries as far as possible to lower the barrier to entry

  39. ©Hatena Co., Ltd.
    ● Library
    ○ Scalatra
    ○ Slick (Plain SQL Query)
    ○ circe
    ○ Elastic4s
    ○ etc
    ● Cake pattern
    Tech stacks for Scala
    The technology stack we use for Scala development

  40. ©Hatena Co., Ltd.
    ● To avoid the problems in the old system, design the
    architecture based on Domain Driven Design.
    ● Problems in the old system
    ○ The gap between models and real world.
    ○ Fat model / Fat controller.
    ○ Inconsistent wording.
    Domain Driven Design
    Applying domain-driven design thoroughly to solve the problems of the old system

  41. ©Hatena Co., Ltd.
    ● A common, rigorous language shared by developers
    and everyone else involved in the project.
    ● Domain models are named after the ubiquitous language.
    Discuss and redefine the ubiquitous language, and share
    it across the team.
    Ubiquitous Language
    Redefining the ubiquitous language
    ✅ inconsistent wording

  42. ©Hatena Co., Ltd.
    Layered architecture
    Adopt a layered architecture to make the responsibility of each layer clear.
    ✅ separation of concerns

  43. ©Hatena Co., Ltd.
    Dependency inversion principle
    Dependency inversion principle: limits the impact of infrastructure-layer changes on the other layers
    ✅ ease of database refactoring ...

  44. ©Hatena Co., Ltd.
    package domain.repository // Cake pattern
    trait BookmarkComponent { // Wraps the repository interface
      def bookmarkLoader: BookmarkLoader
      trait BookmarkLoader {
        // The domain repository declares only the interface.
        def find(bookmarkId: BookmarkId): Option[BookmarkEntity]
      }
    }
    // (separate file)
    package infrastructure
    trait BookmarkComponent
        extends domain.repository.BookmarkComponent {
      // Concrete implementation goes here
      def bookmarkLoader: BookmarkLoader = BookmarkLoaderImpl
    }
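
The cake-pattern fragment above can be turned into something runnable roughly as follows. This is a sketch under assumptions, not the production code: the fields of `BookmarkId` and `BookmarkEntity`, plus `BookmarkService`, `InMemoryBookmarkComponent` and `Registry`, are hypothetical, and an in-memory map stands in for the real database.

```scala
final case class BookmarkId(value: Long)
final case class BookmarkEntity(id: BookmarkId, comment: String)

// Domain layer: declares the repository interface only.
trait BookmarkComponent {
  def bookmarkLoader: BookmarkLoader
  trait BookmarkLoader {
    def find(bookmarkId: BookmarkId): Option[BookmarkEntity]
  }
}

// Application layer: depends on the interface through a self-type,
// so it never names a concrete implementation.
trait BookmarkService { self: BookmarkComponent =>
  def commentOf(id: BookmarkId): Option[String] =
    bookmarkLoader.find(id).map(_.comment)
}

// Infrastructure layer: an in-memory stand-in for a database-backed loader.
trait InMemoryBookmarkComponent extends BookmarkComponent {
  private val rows =
    Map(BookmarkId(1L) -> BookmarkEntity(BookmarkId(1L), "nice article"))
  val bookmarkLoader: BookmarkLoader = new BookmarkLoader {
    def find(bookmarkId: BookmarkId): Option[BookmarkEntity] = rows.get(bookmarkId)
  }
}

// Wiring happens once, at the edge of the application.
object Registry extends BookmarkService with InMemoryBookmarkComponent
```

Swapping `InMemoryBookmarkComponent` for a database-backed component changes only the last line, which is the dependency inversion the slide describes.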

  45. ©Hatena Co., Ltd.
    ● In the old system, the model had methods for retrieving and
    resolving its relationships with other models
    ○ Fat Model
    ● In the new system, these are defined as extension methods in a
    domain service (domain relation).
    Relations between entities
    Resolving relations between entities

  46. ©Hatena Co., Ltd.
    Extension method in domain service
    package domain.relation
    trait BookmarkLocationComponent {
      self: repository.LocationComponent =>
      implicit class BookmarkSeqLocationsRelation(
        bookmarks: Seq[BookmarkEntity]
      ) {
        // In the real system, the return value is something like
        // Bookmark with { def location: Location }
        def withLocations: Stream[(BookmarkEntity, Location)] = …
      }
    }
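
A self-contained sketch of such a relation, with the elided body filled in by one plausible implementation. Everything beyond the names on the slide is an assumption: the entity fields, `LocationLoader.findAll`, the `Relations` object, and the choice to return a `Seq` and drop bookmarks whose location is missing.

```scala
final case class LocationId(value: Long)
final case class Location(id: LocationId, url: String)
final case class BookmarkEntity(user: String, locationId: LocationId)

// Hypothetical repository interface for locations.
trait LocationComponent {
  def locationLoader: LocationLoader
  trait LocationLoader {
    def findAll(ids: Seq[LocationId]): Map[LocationId, Location]
  }
}

trait BookmarkLocationComponent { self: LocationComponent =>
  implicit class BookmarkSeqLocationsRelation(bookmarks: Seq[BookmarkEntity]) {
    // One batched lookup for all bookmarks; bookmarks without a
    // stored location are dropped from the result.
    def withLocations: Seq[(BookmarkEntity, Location)] = {
      val found = locationLoader.findAll(bookmarks.map(_.locationId).distinct)
      bookmarks.flatMap(b => found.get(b.locationId).map(l => (b, l)))
    }
  }
}

// In-memory wiring, for illustration only.
object Relations extends BookmarkLocationComponent with LocationComponent {
  private val stored =
    Map(LocationId(1L) -> Location(LocationId(1L), "https://example.com/"))
  val locationLoader: LocationLoader = new LocationLoader {
    def findAll(ids: Seq[LocationId]): Map[LocationId, Location] =
      stored.filter { case (id, _) => ids.contains(id) }
  }
}
```

After `import Relations._`, any `Seq[BookmarkEntity]` gains `withLocations`, so the entity class itself stays free of lookup logic.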

  47. ©Hatena Co., Ltd.
    ● Well suited to a complex problem domain
    ○ Expressive type system
    ○ Scalability
    ○ Type safety
    ● Concise syntax
    ● Scala was already in use in other projects
    Why Scala for Core App Server ?
    A track record of in-house use, and the ability to express a complex domain concisely.

  48. ©Hatena Co., Ltd.
    ● What we did
    ● Why did we decide on a full rewrite?
    ● Why did we choose Scala?
    ● The Big Rewrite / data migration.
    Agenda
    On the full rewrite and the data migration

  49. ©Hatena Co., Ltd.
    ● Make the system maintainable and easy to change.
    ● Revise models and DB schema.
    ● Optimize the system and save computing
    resources.
    Project goal
    Goals of the project

  50. ©Hatena Co., Ltd.
    ● Don’t add any big new features while rewriting.
    ● Continue to provide the main features.
    ● Retire some minor features.
    Project scope
    No new features; existing features are basically kept (with a few retired)

  51. ©Hatena Co., Ltd.
    ● Rewrite all at once ? or
    ● Incremental rewrite ?
    THE BIG REWRITE
    Replace everything at once,
    or replace incrementally?

  52. ©Hatena Co., Ltd.
    Split the rewriting process into a number of smaller phases.
    ● Aug 2017: Replace comment list page
    ● Nov 2017: Replace user page
    ● Mar 2018: Replace top page
    ● Mar 2018: Replace search feature
    ● ...
    Incremental Rewrite
    Rewrite gradually in several stages instead of replacing everything at once

  53. ©Hatena Co., Ltd.
    ● Pros
    ○ Each phase of release clarifies the progress and
    business value.
    ○ Safer than a big-bang rewrite.
    ● Cons
    ○ We have to run both the new and the original system
    until the rewrite is complete.
    Incremental Rewrite
    Pros: progress and results are visible at each phase
    Cons: both the old and the new system must be kept running

  54. ©Hatena Co., Ltd.
    LIST ALL THE FEATURES and LIST ALL THE RESOURCES
    EACH FEATURE DEPENDS ON (BY READING SOURCE CODE)
    ● Decide which features to re-implement and which to drop.
    ● Prioritize based on the dependencies and business
    impact.
    ● Group them into components.
    ○ Rewrite each group one by one.
    Thorough investigation of the old system
    Identify every feature of the existing system and the resources it depends on

  55. ©Hatena Co., Ltd.
    ● Where are we in the project?
    ○ The list helps clarify the progress.
    ● Running into unexpected features / dependencies
    during the rewrite project…
    ○ There’s no way to avoid it other than listing all features
    and dependencies thoroughly before the rewrite...
    Thorough investigation of the old system
    Makes the progress of the project visible
    Prevents unexpected specifications from surfacing later

  56. ©Hatena Co., Ltd.
    Switch upstream on reverse proxy
    The reverse proxy routes each request to either the new or the old system
    [Diagram: nginx routes each feature (Listing user comments, User page, Setting, Recommend) to either the old or the new system]
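
The talk performs this switch in nginx. The Scala sketch below only models the routing decision itself, under the assumption that features can be told apart by path prefix; `Router`, `migratedPrefixes` and the paths are hypothetical.

```scala
sealed trait Upstream
case object OldSystem extends Upstream
case object NewSystem extends Upstream

object Router {
  // Path prefixes already served by the new system (hypothetical paths).
  // Each release of the incremental rewrite would add a prefix here.
  val migratedPrefixes: Seq[String] = Seq("/entry/", "/user/")

  // Anything not yet migrated keeps going to the old system.
  def upstreamFor(path: String): Upstream =
    if (migratedPrefixes.exists(p => path.startsWith(p))) NewSystem else OldSystem
}
```

Because unknown paths default to the old system, a feature moves over only when its prefix is added explicitly, and removing the prefix rolls it back.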

  57. ©Hatena Co., Ltd.
    Split a component as a microservice
    Some features can also be split out as microservices
    [Diagram: nginx routes features (Listing user comments, User page, Setting, Recommend) to the old or the new system; one component is split out as a microservice]

  58. ©Hatena Co., Ltd.
    Because we created a new database with a brand-new
    structure, all the data in the old database had to be
    migrated to the new one.
    Data migration
    We created a new DB for the new application, so data migration was required
    [Diagram: Original App with Original DB, New App with New DB]

  59. ©Hatena Co., Ltd.
    Downtime for maintenance
    ● Stop the service for each
    data migration.
    ● Each maintenance window might
    last several hours.
    ○ Large scale
    ○ Complex ETL process
    Downtime for maintenance vs zero-downtime
    Data migration with maintenance windows versus data migration with zero downtime
    Zero-downtime
    ● No downtime
    ● Requires real-time data
    replication.
    ● Replication delay.

  60. ©Hatena Co., Ltd.
    Data migration with zero-downtime
    We decided to migrate the data with zero downtime
    ● Given how many maintenance windows would be needed, it
    wasn’t acceptable to stop the service repeatedly.
    ● Replication delay was not so critical.

  61. ©Hatena Co., Ltd.
    ● 1. Start real-time data migration
    ○ Replicate writes on the old system to the new
    system.
    ● 2. Batch data migration
    ○ Copy all existing data into the new database.
    ● 3. Data verification
    ● 4. Replace
    Real-time and batch data migration
    Combining real-time and batch data migration achieves zero downtime
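
The slides do not say how step 3 (data verification) was carried out. One simple approach, shown here purely as an assumed sketch, is to compare the old data, already transformed into the new shape, with what the new database holds and report the keys that are missing or different. `Verification.diff` is a hypothetical helper.

```scala
object Verification {
  // `expected` is the old data transformed into the new shape,
  // `actual` is what the new database holds.
  // Returns the keys that are missing from `actual` or hold a different value.
  def diff[K, V](expected: Map[K, V], actual: Map[K, V]): Set[K] =
    expected.keySet.filter(k => actual.get(k) != expected.get(k))
}
```

An empty result means every migrated record matches; a non-empty one gives the list of keys to re-run through the migration.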

  62. ©Hatena Co., Ltd.
    ● Aug 2017: Replace comment list page
    ● Nov 2017: Replace user page
    ● Mar 2018: Replace top page
    ● …
    ● May 2019: Stop the old system
    Finally, all the replacements were released!!
    In May 2019 all data migration and replacement work was completed and the old system was shut down

  63. ©Hatena Co., Ltd.
    ● Great improvements in non-functional
    requirements
    ■ Faster response time
    ■ Improved algorithms
    ● Development cost exceeded the estimate
    ○ It is hard to estimate the exact cost of a rewrite.
    ○ Rewriting big legacy software always takes years.
    ● We didn’t have any big rework
    ○ Thanks to the thorough investigation and.
    Review
    It took longer than estimated,
    but we proceeded without any major rework

  64. ©Hatena Co., Ltd.
    ● Refactoring or Rewrite?
    ■ Consider carefully / Refactoring first
    ■ Rewrite is really powerful but tough
    ● Solved problems in the old system thanks to Scala!
    ○ Thank you!!!
    ● Consider incremental rewrite for big rewrite
    ○ Clarify the progress / safer / cost
    ● Thorough research on the original system
    ○ Prevent big-rework / listing all tasks
    Summary

  65. ©Hatena Co., Ltd.
    Questions?

  66. ©Hatena Co., Ltd.
    If we have time,
    I’ll go deeper into
    data migration.
    If there is still time, I will talk about data migration in a little more detail.

  67. ©Hatena Co., Ltd.
    ● Options
    ○ Push from Application
    ○ Push from Datastore
    ○ Poll old datastore periodically
    Real-time data migration
    Ways to do real-time data migration:
    push from the app or the DB, or polling

  68. ©Hatena Co., Ltd.
    Push all the updates on the original system to the new app,
    from the original app.
    Real-time data migration (From App)
    Writes to the old system are synchronized from the old app to the new app
    [Diagram: Original App writes to Original DB and enqueues the update; New App receives it and writes to New DB]

  69. ©Hatena Co., Ltd.
    ● Pros
    ○ Easy to validate and transform data so that it fits
    the new DB structure.
    ● Cons
    ○ Necessary to add code to the original app to send
    updates to the queue.
    ○ Need to identify every source of updates
    (otherwise, some updates will be lost).
    Real-time data migration (From App)
    Every write path in the old system must be identified.
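
A minimal sketch of the push-from-application flow, assuming an in-memory queue and map in place of the real message queue and new database. The row shapes, the `RealtimeMigration` object and the http-to-https normalization are all hypothetical; they only illustrate where validation and transformation can happen.

```scala
import scala.collection.mutable

// Hypothetical shapes for the old and new schemas.
final case class OldBookmarkRow(userId: Long, url: String, comment: String)
final case class NewBookmark(userId: Long, canonicalUrl: String, comment: String)

object RealtimeMigration {
  // Stand-in for a real message queue between the two applications.
  val queue: mutable.Queue[OldBookmarkRow] = mutable.Queue.empty
  // Stand-in for the new database, keyed by (user, canonical URL).
  val newDb: mutable.Map[(Long, String), NewBookmark] = mutable.Map.empty

  // Called from every write path of the original app, after its own DB write.
  def enqueue(row: OldBookmarkRow): Unit = queue += row

  // Validate and reshape an old row so that it fits the new model.
  def transform(row: OldBookmarkRow): Either[String, NewBookmark] =
    if (row.url.isEmpty) Left(s"empty url for user ${row.userId}")
    else Right(NewBookmark(row.userId, row.url.replaceFirst("^http://", "https://"), row.comment.trim))

  // Consumer on the new-app side: drain the queue, collecting validation errors.
  def drain(): Seq[String] = {
    val errors = mutable.Buffer.empty[String]
    while (queue.nonEmpty) {
      transform(queue.dequeue()) match {
        case Right(b)  => newDb.update((b.userId, b.canonicalUrl), b)
        case Left(err) => errors += err
      }
    }
    errors.toSeq
  }
}
```

The drawback named on the slide shows up directly: any write path in the original app that forgets to call `enqueue` silently drops updates.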

  70. ©Hatena Co., Ltd.
    Have a trigger on the old database write the updates to the
    queue.
    Real-time data migration (From DB)
    Define triggers on the old DB and write to the queue from there
    [Diagram: Original App writes to Original DB; a trigger on Original DB enqueues the update; New App writes it to New DB]

  71. ©Hatena Co., Ltd.
    ● Pros
    ○ Don’t have to work on the old application
    ○ Comprehensive (no worry about missing updates)
    ● Cons
    ○ Need to maintain complex triggers and UDFs that
    write the updates to the queue.
    ○ The migration logic is limited by what SQL can
    express.
    Real-time data migration (From DB)
    No worry about missed writes to any table, but operating complex triggers is unavoidable

  72. ©Hatena Co., Ltd.
    Fetch the data from the original system periodically.
    Real-time data migration (Poll)
    Periodically fetch data from the old system and migrate it to the new system
    [Diagram: Original App writes to Original DB; a cron job polls Original DB and New App writes the data to New DB]

  73. ©Hatena Co., Ltd.
    ● Pros
    ○ Don’t need to work on the original system
    ○ Can build the migration system independently.
    ● Cons
    ○ Delayed replication.
    Real-time data migration (Poll)
    The migration system can be built independently of the old system, but synchronization suffers large delays.

  74. ©Hatena Co., Ltd.
    Push from Application
    ● The original and new DBs had to stay synchronized
    with only a small delay.
    ● The data transformation process was complex.
    Our choice
    We adopted push from the application, for its low delay and for transforming the data structure

  75. ©Hatena Co., Ltd.
    While real-time data migration replicates new updates
    made to the original system, batch data migration copies
    all the data that already exists in the original system.
    Batch data migration
    Batch data migration moves all existing data to the new system

  76. ©Hatena Co., Ltd.
    ● Write an idempotent script
    ○ It is hard to migrate all the data to the new system
    in a single attempt.
    ○ We will need to re-run the migration to
    complete the job.
    ○ Idempotency makes this trial-and-error cycle safe.
    Tips for writing a batch data migration script
    Make the migration script idempotent so that it is easy to re-run.
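
One common way to get idempotency is to upsert by primary key rather than insert blindly. The sketch below assumes that approach, which the slides do not confirm; `Row`, `BatchMigration` and the in-memory map standing in for the new database are hypothetical.

```scala
import scala.collection.mutable

final case class Row(id: Long, value: String)

object BatchMigration {
  // Stand-in for the new database, keyed by primary key.
  val newDb: mutable.Map[Long, Row] = mutable.Map.empty

  // Upsert by primary key instead of a blind insert, so running the same
  // batch twice leaves the new database in the same state as running it once.
  def migrate(rows: Seq[Row]): Unit =
    rows.foreach(r => newDb.update(r.id, r))
}
```

In SQL terms this corresponds to an insert-or-update statement keyed on the primary key, instead of a plain insert that fails or duplicates on the second run.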

  77. ©Hatena Co., Ltd.
    ● Estimate the execution time of the batch script
    ○ Estimate how long the script will take to run.
    ○ If it takes too long, consider
    ■ Running the script on a dedicated server.
    ■ Scaling up the original or new database server.
    ■ Optimizing the performance of the script.
    Tips for writing a batch data migration script
    Calculate how long the run will take. If it is too long, consider ways to speed it up.
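
A back-of-the-envelope estimate can be as simple as dividing the row count by a throughput measured on a sample run. The helper below is a hypothetical sketch that assumes throughput scales linearly with the number of parallel workers, which real databases rarely deliver exactly.

```scala
object MigrationEstimate {
  // Estimated wall-clock hours to migrate `totalRows` at a measured
  // `rowsPerSecond`, optionally split across `workers` parallel processes
  // (assumes linear scaling, which is optimistic).
  def hours(totalRows: Long, rowsPerSecond: Double, workers: Int = 1): Double = {
    require(rowsPerSecond > 0 && workers > 0)
    totalRows / (rowsPerSecond * workers) / 3600.0
  }
}
```

For illustration only, 360 million rows at 1,000 rows per second come to about 100 hours with one worker and about 25 hours with four; these figures are invented, not taken from the talk.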

  78. ©Hatena Co., Ltd.
    ● Retry plan
    ○ The script may stop in the
    middle of the migration because
    of an unexpected error.
    ○ Designing the script so that it
    can resume from a specific point
    of the migration will save you
    time.
    Tips for writing a batch data migration script
    Making the script resumable from an arbitrary point saves time on re-runs
    [Diagram: a progress bar split into "Already migrated" and "Not yet migrated", with "Re-run from here" at the boundary]
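
A resumable batch can be sketched as a chunked loop that remembers the last id it finished. This assumes rows have a monotonically increasing id; `ResumableBatch.run` and its parameters are hypothetical, and a real script would persist the checkpoint (in a file or a table) and report the error rather than swallow it.

```scala
object ResumableBatch {
  final case class Row(id: Long, value: String)

  // Migrate rows in ascending id order, in chunks, starting after `lastDoneId`.
  // `write` may throw; the returned id is the last one known to be migrated,
  // which the caller passes back in as `lastDoneId` on the next run.
  def run(source: Seq[Row], lastDoneId: Long, chunkSize: Int)(write: Seq[Row] => Unit): Long = {
    var checkpoint = lastDoneId
    val pending = source.filter(_.id > lastDoneId).sortBy(_.id)
    try {
      pending.grouped(chunkSize).foreach { chunk =>
        write(chunk)
        checkpoint = chunk.last.id // advance only after the chunk is written
      }
    } catch {
      // Stop at the first failure; the caller re-runs from `checkpoint`.
      case scala.util.control.NonFatal(_) => ()
    }
    checkpoint
  }
}
```

Combined with idempotent writes, re-running from the checkpoint is safe even if the failed chunk was partially written.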

  79. ©Hatena Co., Ltd.
    1. Start real-time data migration
    2. Batch data migration
    3. Replace the application
    Steps of data migration
    The order of real-time and batch data migration
    [Timeline: real-time data migration starts first, then the batch data migration runs]

  80. ©Hatena Co., Ltd.
    1. Start real-time data migration
    2. Batch data migration
    3. Replace the application
    If steps 1 and 2 are
    reversed, some data
    won’t be migrated.
    Steps of data migration - Otherwise...
    The order of real-time and batch data migration
    [Timeline: the batch data migration runs first and real-time data migration starts later; data written in the period between them will be lost]

  81. ©Hatena Co., Ltd.
    Risk of data collision (lost update anomaly) for
    update-intensive data.
    Suppose we are trying to migrate data “X” from the original DB
    to the new DB.
    Data collision between real-time and batch
    For frequently updated data there is a risk of conflicts between batch and real-time migration
    [Diagram: Original DB holds X = 1; New DB is empty]

  82. ©Hatena Co., Ltd.
    First, the batch data migration script reads data X from the
    original DB.
    Data collision between real-time and batch
    First, the batch data migration script reads the data from the old DB
    [Diagram: Original DB holds X = 1; the batch data migration script has read X = 1; nothing has been written to New DB yet]

  83. ©Hatena Co., Ltd.
    The X on the original DB is updated to 2, and synchronized
    to the new DB, before the batch script writes the data to the
    new DB.
    Data collision between real-time and batch
    Next, suppose a real-time migration happens before the batch script writes the data to the new DB
    [Diagram: X is updated to 2 on Original DB; real-time data migration writes X = 2 to New DB; the batch data migration script still holds X = 1]

  84. ©Hatena Co., Ltd.
    Finally, the batch data migration script overwrites value X
    in the new DB with X = 1.
    Data collision between real-time and batch
    Finally, when the batch migration script writes to the new DB, the data becomes inconsistent.
    [Diagram: Original DB holds X = 2, but the batch data migration script writes its stale X = 1 to New DB. The value X should be equal to the X in the original DB; the update X = 2 on the original DB is lost.]

  85. ©Hatena Co., Ltd.
    Compare the updated_at values before writing to the new DB,
    and adopt the newer value as the resulting data.
    To avoid the Lost Update
    Comparing update timestamps and adopting the newer value prevents the inconsistency.
    [Diagram: New DB already holds X = 2 (updated_at = 1970-01-01 12:00:01) from the real-time migration; the batch data migration script holds X = 1 (updated_at = 1970-01-01 12:00:00) and does not update because the existing data is newer]
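
The updated_at comparison can be sketched as a conditional write. This is an assumed illustration with an in-memory map in place of the new database; `Versioned`, `NewDb` and `writeIfNewer` are hypothetical names.

```scala
import java.time.Instant
import scala.collection.mutable

final case class Versioned(value: Int, updatedAt: Instant)

object NewDb {
  private val rows = mutable.Map.empty[String, Versioned]

  // Apply the write only if nothing is stored yet or the incoming row is
  // strictly newer than the stored one. Returns true when the write was applied.
  def writeIfNewer(key: String, incoming: Versioned): Boolean =
    rows.get(key) match {
      case Some(existing) if !incoming.updatedAt.isAfter(existing.updatedAt) => false
      case _ =>
        rows.update(key, incoming)
        true
    }

  def get(key: String): Option[Versioned] = rows.get(key)
}
```

Replaying the scenario from the slides, the real-time write of X = 2 at 12:00:01 is applied and the later batch write of X = 1 at 12:00:00 is rejected. Against a real database the check and the write would need to be atomic, for example a single conditional update or a transaction; otherwise the same race reappears.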

  86. ©Hatena Co., Ltd.
    Although the Lost Update anomaly can occur on
    update-intensive data, in most cases the probability of a
    collision is negligible, and it is sufficient to validate the
    data and re-run the migration only if something went
    wrong.
    Should we always implement it?
    It rarely happens for infrequently updated data, so it can largely be ignored