Data Migration with Confidence

At RedDotRubyConf 2017

Juanito Fatas

June 22, 2017

Transcript

  1. Data Migration with Confidence Juanito Fatas RedDotRubyConf 2017

  2. @JuanitoFatas Ramen Specialist

  3. Spanish name, from Taiwan, live in Tokyo

  4. I became a salaryman

  5. Cookpad Global Cookpad Japan

  6. Made in Japan

  7. Reserve for local jokes

  8. Data Migration?

  9. Schema Migration, Data Migration

  10. Schema Migration Alter Schemas over time https://en.wikipedia.org/wiki/Schema_migration

  11. Data Migration: Transfer data from System A to System B https://en.wikipedia.org/wiki/Data_migration

  12. Existing Data Migration

  13. None
  14. Data Migration to Existing System

  15. Data Migration with Confidence @JuanitoFatas Specialist of cookpad RedDotRubyConf 2017

  16. Why Migration?

  17. Rewrote for Clients

  18. New Partner joins company

  19. Data Migration

  20. ~ Simple Goal: Get all data to our system

  21. ~ Import data, Modeling, Migrate, After migrate

  22. ~ Get the Data: Provider API, Data Dump

  23. Provider API & Generic Migration Code ✅

  24. Data Dump & Generic Migration Code

  25. HOWTO Data Migration

  26. ~ Start with a rake task

  27. ~

  28. ~ lib/tasks/data_migration.rake

  29. ~ lib/data_migration.rb
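A minimal sketch of the layout these two file names suggest: a thin rake task that only delegates to a plain Ruby entry point, which is easier to test than task code. The module body, the `STEPS` list, and the task name `data_migration:run` are assumptions for illustration, not the code shown in the talk.

```ruby
# lib/data_migration.rb (hypothetical contents)
module DataMigration
  # Each step is a callable; the names and bodies are placeholders.
  STEPS = {
    users:   -> { :users_migrated },
    recipes: -> { :recipes_migrated }
  }.freeze

  # Runs every step in order and returns the results keyed by step name.
  def self.run(steps: STEPS)
    steps.each_with_object({}) { |(name, step), results| results[name] = step.call }
  end
end

# lib/tasks/data_migration.rake could then stay tiny (assumed wiring):
#
#   namespace :data_migration do
#     desc "Run the data migration"
#     task run: :environment do
#       DataMigration.run
#     end
#   end
```

Keeping the logic in `lib/` means the test suite can call `DataMigration.run` directly without invoking rake.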

  30. ~ Import dump to local

  31. ~ GBs-size file

  32. ~ 30M monthly users, 62 countries

  33. ~ Add delay to the SQL dump for production

  34. ~ Add sleep() before INSERT INTO

  35. ~ Editing huge file

  36. ~ +

  37. ~

  38. ~ Enumerable#lazy https://ruby-doc.org/core-2.4.1/Enumerable.html#method-i-lazy

  39. ~

  40. ~

  41. ~ set accordingly for staging & production
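The steps above (add a delay before each INSERT, avoid opening a GBs-size file in an editor, use `Enumerable#lazy`) could be sketched as a streaming rewrite of the dump. This assumes a MySQL dump whose insert statements start with `INSERT INTO` at the beginning of a line; the helper name `throttle_dump`, the file names, and the `INSERT_DELAY` variable are hypothetical.

```ruby
# Streams a SQL dump line by line and writes a MySQL SLEEP statement
# before every INSERT, without loading the whole file into memory.
def throttle_dump(input, output, delay_seconds:)
  input.each_line.lazy.flat_map do |line|
    if line.start_with?("INSERT INTO")
      ["SELECT SLEEP(#{delay_seconds});\n", line]
    else
      [line]
    end
  end.each { |line| output.write(line) }
end

# Possible usage, with the delay set per environment (names are assumptions):
#
#   File.open("dump.sql") do |src|
#     File.open("dump.throttled.sql", "w") do |dst|
#       throttle_dump(src, dst, delay_seconds: Float(ENV.fetch("INSERT_DELAY", "0.1")))
#     end
#   end
```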

  42. ~ Modeling Database

  43. ~

  44. ~ With these 5 methods, you can model anything.

  45. ~ Map data to your current system

  46. ~ Sometimes as easy as

  47. ~ Sometimes… as HTML in the recipes table field ‘steps’

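To illustrate the two cases, here is a rough sketch of a mapper in which one field is a simple rename and another has to be pulled out of HTML. The class name, the legacy column names, and the assumption that every step is an `<li>` element are all hypothetical; a real dump would likely need a proper HTML parser.

```ruby
# Hypothetical mapper from a legacy recipe row (a Hash) to attributes
# of the current system.
class RecipeMapper
  def initialize(legacy_row)
    @row = legacy_row
  end

  def attributes
    {
      title: @row.fetch("name").strip,             # sometimes as easy as a rename
      steps: steps_from_html(@row.fetch("steps"))  # sometimes buried in HTML
    }
  end

  private

  # Assumes each step is an <li> element and strips any nested tags.
  def steps_from_html(html)
    html.scan(%r{<li[^>]*>(.*?)</li>}m)
        .flatten
        .map { |step| step.gsub(/<[^>]+>/, "").strip }
        .reject(&:empty?)
  end
end
```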
  48. ~

  49. ~

  50. ~ Setup Test Suite

  51. ~

  52. ~

  53. ~ Why tests? The migration code is only used once

  54. ~ Better code through boring tests

  55. ~ TDD to Get Things Done

  56. ~ Modeling, Tests, Repeat

  57. Migration

  58. ~ Use methods that raise exceptions

  59. ~ Fail Fast to find all errors
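A small plain-Ruby illustration of the fail-fast idea: lenient calls silently turn bad data into `0` or `nil`, while strict calls raise where the problem is. In ActiveRecord the analogous choice would be `save!`, `create!`, `update!`, and `find_by!` over their non-bang counterparts. The row layout and method names here are assumptions.

```ruby
# Lenient conversion: String#to_i returns 0 for garbage and Hash#[]
# returns nil for a missing key, so bad rows slip through unnoticed.
def lenient_attributes(row)
  { servings: row["servings"].to_i, title: row["title"] }
end

# Strict conversion: Integer() and Hash#fetch raise on bad or missing
# data, so the migration stops at the first broken row.
def strict_attributes(row)
  { servings: Integer(row.fetch("servings")), title: row.fetch("title") }
end
```

Running the strict version against the whole dump before the real migration surfaces every malformed row as an exception that can be logged and fixed.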

  60. ~ Example Migrate Recipes

  61. ~

  62. ~

  63. None
  64. ~

  65. ~ Add more migrators to migrate
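One way the "add more migrators" idea could look, sketched with plain objects and in-memory hashes instead of a database. The class names and the `call(dump, target)` interface are assumptions for illustration, not the code from the slides.

```ruby
# Hypothetical migrators: each one migrates a single kind of record.
class UserMigrator
  def call(dump, target)
    target[:users] = dump.fetch(:users).map { |u| { name: u.fetch(:name) } }
  end
end

class RecipeMigrator
  def call(dump, target)
    target[:recipes] = dump.fetch(:recipes).map { |r| { title: r.fetch(:title) } }
  end
end

# Runs the migrators in order; migrating a new kind of record means
# adding one more object to the list.
class Migration
  def initialize(migrators)
    @migrators = migrators
  end

  def run(dump)
    @migrators.each_with_object({}) { |migrator, target| migrator.call(dump, target) }
  end
end
```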

  66. ~ Data Integrity

  67. ~ Transaction

  68. ~ Idempotent Operation

  69. ~

  70. ~ Run Migration many times

  71. ~ Produce the Same Result

  72. ~ f(f(x)) = f(x)

  73. ~ Upsert Update or Insert

  74. ~ MySQL: ON DUPLICATE KEY UPDATE. PostgreSQL 9.5+: ON CONFLICT ... DO UPDATE. seamusabshere/upsert
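The idempotency property can be sketched without a database: an upsert keyed on a unique identifier gives the same rows no matter how many times the migration runs. `RecipeTable` below is an in-memory stand-in for a table with a unique key on `legacy_id` (both names are assumptions); in SQL the same role is played by MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` or PostgreSQL's `INSERT ... ON CONFLICT ... DO UPDATE`.

```ruby
# In-memory stand-in for a table with a unique key on :legacy_id.
class RecipeTable
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Update the row if the key already exists, insert it otherwise.
  def upsert(legacy_id:, **attrs)
    (@rows[legacy_id] ||= { legacy_id: legacy_id }).merge!(attrs)
  end
end

# Running this any number of times leaves the table in the same state.
def migrate(dump, table)
  dump.each { |row| table.upsert(**row) }
  table.rows
end
```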
  75. ~

  76. ~ Data Accuracy

  77. ~ Manually Check

  78. ~ Automated Check

  79. ~ Example Check users with most Recipes

  80. ~

  81. ~

  82. ~

  83. ~

  84. ~

  85. ~ To check more things Add more Checker objects
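A possible shape for the automated check, assuming each checker is a small object that responds to `mismatches(source, target)`. The class names and the row format (`user`, `recipes_count`) are hypothetical; the real checkers in the talk are not reproduced here.

```ruby
# Hypothetical checker: compares the users with the most recipes in the
# source data against the migrated data.
class TopUsersChecker
  def initialize(limit: 3)
    @limit = limit
  end

  # source and target are arrays of { user:, recipes_count: } hashes.
  def mismatches(source, target)
    top(source) - top(target)
  end

  private

  def top(rows)
    rows.max_by(@limit) { |r| r[:recipes_count] }
        .map { |r| [r[:user], r[:recipes_count]] }
  end
end

# Composes many small checkers; checking more things means adding one
# more object to the list.
class AccuracyReport
  def initialize(checkers)
    @checkers = checkers
  end

  # Returns a hash of checker name => mismatches, empty when all pass.
  def failures(source, target)
    @checkers.each_with_object({}) do |checker, report|
      diff = checker.mismatches(source, target)
      report[checker.class.name] = diff unless diff.empty?
    end
  end
end
```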

  86. ~ Use many small objects to compose

  87. ~ Objects Everywhere

  88. For better object design

  89. For better object design

  90. Background Jobs

  91. Workers = CPU cores

  92. Designated Queues

  93. None
  94. None
  95. Log Every Unexpected Error

  96. None
  97. None
  98. For Better handling of Errors

  99. Run against all data to be migrated

  100. Fix every error you can before real migration

  101. ~ Tools

  102. Retry mechanism

  103. Foreign Key Constraints Locks

  104. MySQL deadlock Results in

  105. Automatic Retry # Rails 4: ActiveRecord::StatementInvalid

  106. Make sure what should be retried

  107. retry_on discard_on ActiveJob::Exceptions

  108. Automatic Retry
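A framework-free sketch of the retry idea: retry only the error classes known to be transient (for example a deadlock) and let everything else fail immediately. `DeadlockError` and `with_retries` are hypothetical names; inside ActiveJob the `retry_on` and `discard_on` class methods named on the earlier slide play this role.

```ruby
# Stand-in for a transient database error such as a deadlock.
class DeadlockError < StandardError; end

# Retries the block only for the listed error classes, up to `attempts`
# times in total. Any other error propagates on the first failure.
def with_retries(on:, attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts
    sleep(wait)
    retry
  end
end
```

Usage might look like `with_retries(on: [DeadlockError], wait: 0.5) { migrate_recipe(id) }`, where `migrate_recipe` is whatever unit of work the job performs.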

  109. Examine & Retry

  110. In Resque

  111. None
  112. Status Reporting

  113. None
  114. None
  115. Report every minute

  116. Monitoring CPU Usage

  117. None
  118. ~ Performance

  119. Performance is a Rabbit hole

  120. Preload associations

  121. Minimize scope of transaction

  122. Transaction Isolation Levels https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html https://www.postgresql.org/docs/current/static/transaction-iso.html

  123. Avoid unnecessary callbacks

  124. None
  125. Example: no_touching. You can touch after migration http://api.rubyonrails.org/classes/ActiveRecord/NoTouching/ClassMethods.html#method-i-no_touching

  126. Process multiple records in one job
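As a rough illustration, record ids can be sliced so that each job carries a batch rather than a single id, which cuts per-job overhead. The job class, the batch size of 100, and the payload format are assumptions.

```ruby
# Hypothetical job: migrates a whole slice of records instead of one.
class MigrateRecipesJob
  def self.perform(ids)
    ids.map { |id| "recipe-#{id}" } # placeholder for the real per-record work
  end
end

# Splits ids into slices and builds one job payload per slice.
def enqueue_in_batches(ids, batch_size: 100)
  ids.each_slice(batch_size).map { |slice| [MigrateRecipesJob, slice] }
end
```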

  127. None
  128. Cache data in Memory

  129. Cache data in Redis
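A minimal sketch of the in-memory variant: a Hash with a default block memoizes a lookup that would otherwise hit the database for every record. `UserIdCache` and its loader block are hypothetical; a Redis-backed version would keep the same interface but share the cache across worker processes.

```ruby
# Hypothetical lookup of the new user id for a legacy id, memoized in a Hash.
class UserIdCache
  attr_reader :lookups

  # The loader block stands in for the expensive query.
  def initialize(&loader)
    @lookups = 0
    @cache = Hash.new do |hash, legacy_id|
      @lookups += 1
      hash[legacy_id] = loader.call(legacy_id)
    end
  end

  def [](legacy_id)
    @cache[legacy_id]
  end
end
```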

  130. Migrate Important things first

  131. First 10000 users w/ most recipes

  132. IO bound

  133. Scale up the Database

  134. Decrease the workers

  135. Bulk Insert, Bulk Upsert* zdennis/activerecord-import (* Only MySQL supports bulk upsert)
  136. Every change to make it fast

  137. Run the WHOLE migration again

  138. Keep CPU usage max at 75% all

  139. ~ Post Migration

  140. Update all necessities

  141. Redirects

  142. Redirect tables Cookpad Redirect programs Server redirects Provider

  143. Redirection Service cookpad/mirin

  144. ~ Stories

  145. Cases of Email

  146. Remove duplicate emails before migration

  147. Remove invalid emails before migration

  148. downcase all the emails
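The three email steps can be combined into one cleaning pass, sketched here with the standard library only. Downcasing first lets duplicates that differ only in case collapse; keeping the first occurrence of a duplicate and using `URI::MailTo::EMAIL_REGEXP` as the validity rule are assumptions, and the real rules may have been stricter.

```ruby
require "uri"

# Returns cleaned emails: stripped and downcased, invalid ones dropped,
# duplicates removed (the first occurrence wins).
def clean_emails(emails)
  emails
    .map { |email| email.to_s.strip.downcase }
    .select { |email| email.match?(URI::MailTo::EMAIL_REGEXP) }
    .uniq
end
```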

  149. ~ Get Site Dump

  150. ~ 100GB generated on EC2* (* EC2 has bandwidth limits)

  151. ~ scp takes days ONLY if nothing failed within days

  152. ~ delivers encrypted disk

  153. Migrate Millions of records

  154. AR + transaction, bulk insert/upsert (activerecord-import), LOAD DATA INFILE

  155. Weeks Month-ish

  156. Run low priority job to migrate them

  157. When migrated User signed in

  158. Migrate their data in high priority

  159. None
  160. Migrate 100K photos

  161. How our images work

  162. Design so it produces the same hash

  163. Set the designated hash during migration instead of upload, generate hash

  164. Benchmark how long to finish all

  165. X days?

  166. Migrate them X days before in low priority

  167. 99% photos won’t change

  168. Migrate users' passwords to secure auth

  169. Figure out which algorithm(s) were used

  170. When migrated user signed in

  171. System’s password auth will fail

  172. Fallback to Legacy Auth

  173. None
  174. When password matched from legacy auth

  175. Set their password through the secure password scheme
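The fallback flow in the slides above could be sketched as follows. Everything specific is an assumption: the legacy algorithm is taken to be unsalted MD5, PBKDF2 from the standard library stands in for the secure scheme (a Rails app would more likely use bcrypt via `has_secure_password`), and `User` is a bare Struct. The digests are compared with `==` for brevity; production code should use a constant-time comparison.

```ruby
require "digest"
require "openssl"
require "securerandom"

# Stand-in secure scheme built on PBKDF2-HMAC-SHA256.
module SecurePassword
  ITERATIONS = 10_000

  def self.digest(password, salt = SecureRandom.hex(16))
    raw = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, ITERATIONS, 32, OpenSSL::Digest.new("SHA256"))
    "#{salt}$#{raw.unpack1('H*')}"
  end

  def self.match?(password, stored)
    salt, = stored.split("$", 2)
    digest(password, salt) == stored
  end
end

# Hypothetical user record carrying both the legacy and the new digest.
User = Struct.new(:legacy_md5, :secure_digest)

def authenticate(user, password)
  # 1. Try the system's own secure auth first.
  return true if user.secure_digest && SecurePassword.match?(password, user.secure_digest)

  # 2. Fall back to legacy auth (assumed to be plain MD5).
  return false unless user.legacy_md5 == Digest::MD5.hexdigest(password)

  # 3. Legacy auth matched: store the password under the secure scheme
  #    and drop the legacy digest.
  user.secure_digest = SecurePassword.digest(password)
  user.legacy_md5 = nil
  true
end
```

After the first successful sign-in the user no longer depends on the legacy digest, so the old hashes disappear gradually as users return.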

  176. None
  177. Migration ~ The Future

  178. Migration done in Ghost Table fashion

  179. Data Dump & Generic Migration Code: only need to model the database

  180. ~ Takeaways

  181. Rails provides sharp tools, thanks to the Rails core team

  182. Use small objects to make your code more readable & maintainable

  183. “Abstraction is the God of Programming!” (Matthew Mongeau, @halogenandtoast)

  184. Schedule >> Fast

  185. Schedule >> Fast, Integrity >> Schedule

  186. Data Migration sounds hard

  187. Keep it Simple, Make it Easy

  188. “Do the Simplest Things” (Winston Teo Yon Wei, @winstonyw)

  189. Enjoy ☕ Thank you!