Data Migration with Confidence

At RedDotRubyConf 2017

Juanito Fatas

June 22, 2017

Transcript

  1. Data Migration
    with Confidence
    Juanito Fatas
    RedDotRubyConf 2017

  2. @JuanitoFatas
    Ramen Specialist

  3. Spanish name
    From Taiwan
    Lives in Tokyo

  4. I became a salaryman

  5. Cookpad Global
    Cookpad Japan

  6. Made in Japan

  7. Reserved for
    local jokes

  8. Data Migration?

  9. Schema Migration
    vs.
    Data Migration

  10. Schema Migration
    Alter Schemas
    over time
    https://en.wikipedia.org/wiki/Schema_migration

  11. Data Migration
    Transfer data
    from System A
    to System B
    https://en.wikipedia.org/wiki/Data_migration

  12. Existing Data
    Migration

  13. Data Migration to
    Existing System

  14. Data Migration
    with Confidence
    @JuanitoFatas
    Specialist at Cookpad
    RedDotRubyConf 2017

  15. Why Migration?

  16. Rewrote
    for Clients

  17. New Partner
    joins company

  18. Data Migration

  19. Simple goal:
    Get all data
    into our system

  20. Import data
    Modeling
    Migrate
    After migration

  21. Get the data:
    Provider API
    or Data Dump

  22. Provider API
    & Generic
    Migration Code

  23. Data Dump
    & Generic
    Migration Code

  24. HOWTO
    Data Migration

  25. Start with
    a rake task

  26. lib/tasks/data_migration.rake

  27. lib/data_migration.rb

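Slides 25 to 27 suggest starting from a rake task that delegates to a plain Ruby module. A minimal sketch, assuming a hypothetical `DataMigration.run` entry point and hypothetical step names (`users`, `recipes`, `photos`) that are not from the talk; in a Rails app the task would also depend on `:environment`.

```ruby
require "rake"

# lib/data_migration.rb (hypothetical contents): keep the real work in a
# plain module so it can be unit-tested without going through rake.
module DataMigration
  # Hypothetical migration steps, run in this order.
  STEPS = %i[users recipes photos].freeze

  def self.run(steps: STEPS)
    steps.map do |step|
      # A real step would call one migrator object per record here.
      "migrated #{step}"
    end
  end
end

# lib/tasks/data_migration.rake (hypothetical contents): a thin wrapper.
include Rake::DSL

namespace :data_migration do
  desc "Migrate provider data into our system"
  task :run do # in Rails: task run: :environment do
    DataMigration.run.each { |line| puts line }
  end
end
```

Run it with `rake data_migration:run`; keeping the task this thin means the tests can exercise `DataMigration` directly.
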
  28. Import the dump
    locally

  29. GB-sized file

  30. monthly users
    62
    30M
    countries

  31. Add a delay to the
    SQL dump
    for production

  32. Add sleep()
    before INSERT INTO

  33. Editing a huge file

  34. Enumerable#lazy
    https://ruby-doc.org/core-2.4.1/Enumerable.html#method-i-lazy

  35. Set the delay accordingly for
    staging & production

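Slides 31 to 35 throttle the replay of a multi-GB SQL dump by adding a sleep before each INSERT INTO, streaming the file with Enumerable#lazy so it never has to fit in memory. A minimal sketch, assuming a MySQL dump in which every INSERT statement starts at the beginning of a line (PostgreSQL would need `SELECT pg_sleep(n);` instead); `throttle_dump` and the per-environment delay values are hypothetical.

```ruby
# Hypothetical per-environment delays in seconds: none on staging,
# a short pause on production so the live database keeps up.
DELAYS = { "staging" => 0, "production" => 0.05 }.freeze

# Streams the dump line by line (File.foreach without a block returns an
# Enumerator, and .lazy keeps memory flat), writing a MySQL SLEEP before
# every INSERT INTO statement when a positive delay is given.
def throttle_dump(input_path, output_path, delay_seconds)
  File.open(output_path, "w") do |out|
    File.foreach(input_path)
        .lazy
        .flat_map do |line|
          if delay_seconds > 0 && line.start_with?("INSERT INTO")
            ["SELECT SLEEP(#{delay_seconds});\n", line]
          else
            [line]
          end
        end
        .each { |line| out.write(line) }
  end
end
```

For example, `throttle_dump("dump.sql", "dump.throttled.sql", DELAYS.fetch("production"))` produces a copy that pauses before each insert, while the staging value leaves the dump unchanged.
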
  36. Modeling
    Database

  37. With these 5 methods, you can model anything.

  38. Map data to
    your current
    system

  39. Sometimes as easy as

  40. Sometimes…
    as HTML in the
    ‘steps’ field of the
    recipes table

  41. Set up a
    Test Suite

  42. Why tests?
    The migration
    code is only used once

  43. Better code
    through
    boring tests

  44. TDD to
    Get Things Done

  45. Modeling
    Tests
    Repeat

  46. Use methods
    that raise
    exceptions

  47. Fail Fast
    to find all errors

  48. Example:
    Migrate Recipes

  49. Add more
    migrators to
    migrate more data

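Slides 46 to 49 recommend methods that raise (for ActiveRecord, `create!`/`save!` rather than `create`/`save`) so the first bad record stops the run, with one small migrator object per kind of data. A plain-Ruby sketch of such a recipe migrator; the row shape, the `Recipe` struct and the user-id mapping are hypothetical stand-ins for real models.

```ruby
# Hypothetical target record; in the app this would be an ActiveRecord
# model and #call would end in Recipe.create!(...).
Recipe = Struct.new(:legacy_id, :title, :user_id, keyword_init: true)

class RecipeMigrator
  # legacy_row: a Hash from the provider dump (ids assumed to be strings)
  # user_ids:   mapping of legacy user id => already-migrated user id
  def initialize(legacy_row, user_ids)
    @row = legacy_row
    @user_ids = user_ids
  end

  # Every lookup raises instead of returning nil (Hash#fetch raises
  # KeyError, Integer() raises ArgumentError), so broken data fails fast.
  def call
    Recipe.new(
      legacy_id: Integer(@row.fetch("id"), 10),
      title: @row.fetch("title").strip,
      user_id: @user_ids.fetch(@row.fetch("user_id"))
    )
  end
end
```

Migrating another kind of data means adding another small migrator with the same `call` shape.
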
  50. Data Integrity

  51. Transaction

  52. Idempotent
    Operation

  53. Run Migration
    many times

  54. Produce the
    Same Result

  55. f(f(x)) = f(x)

  56. Upsert
    Update or Insert

  57. MySQL: ON DUPLICATE KEY UPDATE
    PostgreSQL 9.5+: ON CONFLICT DO UPDATE
    seamusabshere/upsert

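Slides 52 to 57 define idempotence as f(f(x)) = f(x) and get it by upserting on a unique key: MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` or PostgreSQL 9.5+'s `INSERT ... ON CONFLICT (...) DO UPDATE`. The sketch below models that contract with an in-memory table keyed on a hypothetical `legacy_id`, so running the same migration twice leaves exactly the same rows.

```ruby
# In-memory stand-in for a table with a unique index on legacy_id.
class UpsertTable
  def initialize
    @rows = {}
  end

  def rows
    @rows.values
  end

  # Insert when the key is new, otherwise update the existing row in place:
  # the same contract as ON DUPLICATE KEY UPDATE / ON CONFLICT DO UPDATE.
  def upsert(row)
    key = row.fetch(:legacy_id)
    @rows[key] = @rows.fetch(key, {}).merge(row)
  end
end

# Running this any number of times with the same input yields the same table.
def migrate(table, legacy_rows)
  legacy_rows.each { |row| table.upsert(row) }
  table
end
```

A plain INSERT would instead duplicate every row (or fail on the unique key) on the second run.
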
  58. Data
    Accuracy

  59. Manually
    Check

  60. Automated
    Check

  61. Example:
    Check users with
    the most recipes

  62. To check more things,
    add more
    Checker objects

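Slides 61 and 62 build automated accuracy checks from small Checker objects, starting with the users who have the most recipes. A sketch, assuming per-user recipe counts have already been loaded into hashes (`{ user_id => count }`) from the legacy dump and from the new database; `TopRecipeUsersChecker` and `run_checks` are hypothetical names.

```ruby
# Compares the top-N users by recipe count in the legacy data against the
# migrated data and returns readable mismatches (an empty array means pass).
class TopRecipeUsersChecker
  def initialize(limit: 100)
    @limit = limit
  end

  def call(legacy_counts, migrated_counts)
    legacy_counts
      .max_by(@limit) { |_user_id, count| count }
      .reject { |user_id, count| migrated_counts[user_id] == count }
      .map do |user_id, count|
        "user #{user_id}: expected #{count} recipes, got #{migrated_counts[user_id].inspect}"
      end
  end
end

# Checking more things means passing in more small checker objects.
def run_checks(checkers, legacy_counts, migrated_counts)
  checkers.flat_map { |checker| checker.call(legacy_counts, migrated_counts) }
end
```

Each checker only needs a `call(legacy, migrated)` method returning a list of problems, which keeps new checks cheap to add.
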
  63. Use many
    small objects
    to compose

  64. Objects
    Everywhere

  65. For better object design

  66. For better object design

  67. Background
    Jobs

  68. Workers =
    CPU cores

  69. Designated
    Queues

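Slides 67 to 69 run the migration as background jobs, with as many workers as CPU cores and designated queues. In production that means a job backend's worker processes; the sketch below only illustrates the shape with a thread pool draining one designated queue, sized with `Etc.nprocessors`. Under MRI, threads overlap I/O such as database calls but do not run Ruby code in parallel, which is why real setups use processes.

```ruby
require "etc"

# Runs every job (a callable) on a fixed pool of worker threads that all
# pull from one designated queue; returns results in completion order.
def run_pool(jobs, workers: Etc.nprocessors)
  queue = Queue.new
  jobs.each { |job| queue << job }
  workers.times { queue << :stop } # one stop signal per worker

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (job = queue.pop) != :stop
        results << job.call
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end
```

Giving each kind of work its own queue (and pool) keeps a slow migration step from starving the others.
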
  70. Log Every
    Unexpected Error

  71. For better
    error handling

  72. Run against
    all data to be
    migrated

  73. Fix every error
    you can before
    the real migration

  74. Retry
    mechanism

  75. Foreign Key
    Constraints
    Locks

  76. Results in
    MySQL
    deadlocks

  77. Automatic Retry
    # Rails 4: ActiveRecord::StatementInvalid

  78. Make sure what
    should be retried

  79. retry_on
    discard_on
    ActiveJob::Exceptions

  80. Automatic Retry

  81. Examine & Retry

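Slides 74 to 81 separate errors worth retrying automatically (such as MySQL deadlocks caused by foreign-key locks) from errors that should be logged, examined and retried by hand. Rails 5.1+ expresses this per job with ActiveJob's `retry_on` and `discard_on`; the plain-Ruby sketch below shows the same idea, with a hypothetical `DeadlockError` standing in for the adapter's deadlock exception.

```ruby
# Hypothetical stand-in for the database adapter's deadlock error
# (on Rails 4 it arrives wrapped in ActiveRecord::StatementInvalid).
class DeadlockError < StandardError; end

# Retries only errors known to be transient, pausing a little longer each
# time. Anything else propagates at once so it can be logged and examined.
def with_deadlock_retry(attempts: 3, wait: 0.1)
  tries = 0
  begin
    tries += 1
    yield
  rescue DeadlockError
    raise if tries >= attempts
    sleep(wait * tries)
    retry
  end
end
```

Wrapping only the transaction that can deadlock, rather than the whole job, keeps retries cheap and avoids repeating work that already succeeded.
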
  82. Status
    Reporting

  83. Report
    every minute

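Slides 82 and 83 report migration status every minute. A minimal sketch of a progress reporter that logs at most once per interval; making the monotonic clock and the log callable injectable is a hypothetical design choice so the reporter can be tested without waiting a minute.

```ruby
# Counts processed records and emits a status line at most once per
# interval (default: every 60 seconds).
class ProgressReporter
  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(total:, interval: 60, clock: MONOTONIC, log: ->(line) { puts line })
    @total = total
    @interval = interval
    @clock = clock
    @log = log
    @done = 0
    @last_report = clock.call
  end

  # Call after each processed record (or batch); returns the line it
  # logged, or nil when the interval has not elapsed yet.
  def tick(count = 1)
    @done += count
    now = @clock.call
    return nil if now - @last_report < @interval

    @last_report = now
    line = format("%d/%d migrated (%.1f%%)", @done, @total, 100.0 * @done / @total)
    @log.call(line)
    line
  end
end
```

In a job setup the counter would live somewhere shared (a database row or Redis) rather than in one process; the throttling logic stays the same.
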
  84. Monitoring
    CPU Usage

  85. Performance

  86. Performance
    is a Rabbit hole

  87. Preload
    associations

  88. Minimize the scope
    of transactions

  89. Transaction
    Isolation Levels
    https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html
    https://www.postgresql.org/docs/current/static/transaction-iso.html

  90. Avoid
    unnecessary
    callbacks

  91. Example: no_touching
    You can touch records
    after migration
    http://api.rubyonrails.org/classes/ActiveRecord/NoTouching/ClassMethods.html#method-i-no_touching

  92. Process multiple
    records in one job

  93. Cache data
    in Memory

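Slides 92 and 93 cut per-job overhead by processing several records in one job and caching shared lookups in memory. A sketch with hypothetical names: `batch_ids` groups ids so one job handles a whole batch, and `LookupCache` memoizes an expensive lookup (say, legacy user id to new user id) so each key is loaded at most once per process.

```ruby
# Group record ids so one background job handles a whole batch.
def batch_ids(ids, batch_size: 1_000)
  ids.each_slice(batch_size).to_a
end

# Memoizes loader results per key. Hash.new's default block runs only on a
# cache miss and stores the value, so repeated keys skip the loader.
class LookupCache
  def initialize(&loader)
    @cache = Hash.new { |hash, key| hash[key] = loader.call(key) }
  end

  def [](key)
    @cache[key]
  end
end
```

An in-process cache like this is lost when the worker restarts and is not shared between workers, which is where the Redis cache of the next slide comes in.
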
  94. Cache data
    in Redis

  95. Migrate
    Important
    things first

  96. First: the 10,000 users
    with the most recipes

  97. Scale up the
    Database

  98. Decrease the
    number of workers

  99. Bulk Insert
    Bulk Upsert*
    * Only MySQL supports bulk upsert
    zdennis/activerecord-import

  100. Every change to
    make it fast

  101. Run the WHOLE
    migration again

  102. Keep CPU usage
    max at 75%
    all

  103. Post-Migration

  104. Update all
    necessities

  105. Redirect tables
    Cookpad
    Redirect programs
    Server redirects
    Provider

  106. Redirection Service
    cookpad/mirin

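Slides 105 and 106 keep old URLs working through redirect tables and a redirection service (cookpad/mirin). Without assuming anything about mirin's own API, the core idea can be sketched as a lookup from legacy ids, recorded by the migrators, to new ids; the `/recipes/<id>` URL shape and the names below are hypothetical.

```ruby
# Hypothetical legacy URL shape: /recipes/<legacy id>.
LEGACY_RECIPE_PATH = %r{\A/recipes/(\d+)\z}

# recipe_ids: redirect table filled by the migrators, legacy id => new id.
# Returns the new path, or nil when there is nothing to redirect to.
def redirect_path(legacy_path, recipe_ids)
  match = LEGACY_RECIPE_PATH.match(legacy_path)
  return nil unless match

  new_id = recipe_ids[Integer(match[1], 10)]
  new_id && "/recipes/#{new_id}"
end
```

Filling the table during migration, rather than afterwards, means redirects are available as soon as each record lands.
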
  107. Cases of
    Email

  108. Remove duplicate
    emails before
    migration

  109. Remove invalid
    emails before
    migration

  110. downcase
    all the emails

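Slides 107 to 110 clean emails before migrating: downcase them, then find invalid and duplicate addresses so they can be resolved first. Downcasing comes before duplicate detection because `Foo@example.com` and `foo@example.com` would collide on a case-insensitive unique index. A sketch, assuming emails arrive as `{ user_id => email }`; the regular expression is a deliberately loose, hypothetical shape check, not a full RFC 5322 validator.

```ruby
# Loose "something@something.tld" check; real validation rules will differ.
EMAIL_SHAPE = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Returns a report instead of silently dropping data, so invalid and
# duplicate addresses can be resolved before the real migration.
def email_report(emails_by_user_id)
  normalized = emails_by_user_id.transform_values { |email| email.to_s.strip.downcase }
  invalid, valid = normalized.partition { |_id, email| !EMAIL_SHAPE.match?(email) }
  duplicates = valid.group_by { |_id, email| email }
                    .select { |_email, pairs| pairs.size > 1 }
                    .transform_values { |pairs| pairs.map(&:first) }
  { normalized: valid.to_h, invalid: invalid.map(&:first), duplicates: duplicates }
end
```
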
  111. Get the
    Site Dump

  112. 100GB generated
    on EC2*
    * EC2 has bandwidth limits

  113. scp takes days
    (and that is ONLY if nothing fails during those days)

  114. delivers an
    encrypted disk

  115. Migrate Millions
    of records

  116. AR + transaction
    bulk insert/upsert
    activerecord-import
    LOAD DATA INFILE

  117. Weeks
    Month-ish

  118. Run low priority
    job to migrate
    them

  119. When a migrated
    user signs in

  120. Migrate their data
    with high priority

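Slides 95, 96 and 117 to 120 migrate users in the background with low priority, most important users first (the 10,000 with the most recipes), and move a user to high priority when they sign in before their data has been migrated. An in-memory sketch of that ordering with two designated queues; in production these would be job-backend queues, and `MigrationScheduler` is a hypothetical name.

```ruby
# Two designated queues: everyone starts in :low (pre-sorted so the most
# important users come first); a sign-in promotes that user to :high.
class MigrationScheduler
  def initialize(user_ids_by_importance)
    @low = user_ids_by_importance.dup
    @high = []
    @migrated = {}
  end

  # Already-migrated or already-promoted users are left alone.
  def on_sign_in(user_id)
    return if @migrated[user_id] || @high.include?(user_id)

    @low.delete(user_id)
    @high << user_id
  end

  # Next user to migrate: the high-priority queue always drains first.
  def next_user
    user_id = @high.shift || @low.shift
    @migrated[user_id] = true if user_id
    user_id
  end
end
```

The initial order can come from recipe counts, e.g. `counts.sort_by { |_id, n| -n }.map(&:first)`.
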
  121. Migrate
    100K
    photos

  122. How our images work

  123. Design so it
    produces
    the same hash

  124. Set the
    designated hash
    during migration
    instead of generating a hash on upload

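Slides 123 and 124 make photo migration repeatable: instead of the hash normally generated on upload, the migration sets a designated hash so every run produces the same one, which is what allows photos to be migrated days in advance. A sketch, assuming (hypothetically) that the designated hash is derived from the provider name and the legacy photo id; the talk does not say what the real input is.

```ruby
require "digest"

# Deterministic: the same provider + legacy id always yields the same key,
# so re-running the migration finds the existing photo instead of
# uploading a second copy.
def designated_image_hash(provider:, legacy_photo_id:)
  Digest::SHA256.hexdigest("#{provider}:#{legacy_photo_id}")
end
```
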
  125. Benchmark how
    long it takes to finish all

  126. Migrate them
    X days in advance
    with low priority

  127. 99% of photos
    won’t change

  128. Migrate users’
    passwords to
    secure auth

  129. Figure out
    which algorithm(s)
    were used

  130. When a
    migrated user
    signs in

  131. System’s password
    auth will fail

  132. Fall back to
    Legacy Auth

  133. When the password
    matches via
    legacy auth

  134. Set their password
    through the secure
    password scheme

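Slides 128 to 134 move users to a secure password scheme lazily: when a migrated user signs in, our own check has nothing to compare against, so the system falls back to the provider's algorithm and, on a match, re-hashes the password securely. A sketch under loud assumptions: the legacy scheme is taken to be unsalted MD5 purely for illustration (slide 129 says the real algorithm(s) must be figured out), and PBKDF2 via `OpenSSL::KDF.pbkdf2_hmac` (Ruby 2.4+) stands in for whatever secure scheme the app actually uses, such as bcrypt through `has_secure_password`.

```ruby
require "digest"
require "openssl"
require "securerandom"

# Hypothetical user record; a freshly migrated user has only legacy_digest.
User = Struct.new(:legacy_digest, :password_salt, :password_digest)

ITERATIONS = 100_000 # illustrative work factor; tune for your hardware

def secure_digest(password, salt)
  OpenSSL::KDF.pbkdf2_hmac(password, salt: salt, iterations: ITERATIONS,
                                     length: 32, hash: "sha256").unpack1("H*")
end

# Constant-time comparison so timing does not leak how many chars matched.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

def authenticate(user, password)
  if user.password_digest
    return secure_compare(secure_digest(password, user.password_salt), user.password_digest)
  end

  # Our scheme has nothing for this user yet: fall back to legacy auth.
  return false unless user.legacy_digest &&
                      secure_compare(Digest::MD5.hexdigest(password), user.legacy_digest)

  # Legacy match: re-hash with the secure scheme and drop the legacy digest.
  user.password_salt = SecureRandom.hex(16)
  user.password_digest = secure_digest(password, user.password_salt)
  user.legacy_digest = nil
  true
end
```

After the first successful sign-in the weak legacy digest is gone, so over time the legacy path only serves users who have not signed in since the migration.
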
  135. Migration:
    The Future

  136. Migration done
    in Ghost Table
    fashion

  137. Data Dump
    & Generic
    Migration Code
    Only need to model the database

  138. Rails provides
    sharp tools
    thanks to the Rails core team

  139. Use small objects
    to make your code more
    readable &
    maintainable

  140. “Abstraction is the
    God of Programming!”
    — Matthew Mongeau @halogenandtoast

  141. Schedule >> Fast

  142. Schedule >> Fast
    Integrity >> schedule

  143. Data
    Migration
    sounds hard

  144. Keep it Simple
    Make it Easy

  145. “Do the Simplest Things”
    — Winston Teo Yon Wei @winstonyw

  146. Enjoy ☕
    Thank you!
