Slide 1
Slide 1 text
Data Migration with Confidence Juanito Fatas RedDotRubyConf 2017
Slide 2
Slide 2 text
@JuanitoFatas Ramen Specialist
Slide 3
Slide 3 text
Spanish name, from Taiwan, live in Tokyo
Slide 4
Slide 4 text
I became a salaryman
Slide 5
Slide 5 text
Cookpad Global Cookpad Japan
Slide 6
Slide 6 text
Made in Japan
Slide 7
Slide 7 text
Reserved for local jokes
Slide 8
Slide 8 text
Data Migration?
Slide 9
Slide 9 text
Schema Migration Data Migration
Slide 10
Slide 10 text
Schema Migration: alter schemas over time https://en.wikipedia.org/wiki/Schema_migration
Slide 11
Slide 11 text
Data Migration: transfer data from System A to System B https://en.wikipedia.org/wiki/Data_migration
Slide 12
Slide 12 text
Existing Data Migration
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
Data Migration to Existing System
Slide 15
Slide 15 text
Data Migration with Confidence @JuanitoFatas Specialist of cookpad RedDotRubyConf 2017
Slide 16
Slide 16 text
Why Migration?
Slide 17
Slide 17 text
Rewrote for Clients
Slide 18
Slide 18 text
New Partner joins company
Slide 19
Slide 19 text
Data Migration
Slide 20
Slide 20 text
Simple Goal: get all data into our system
Slide 21
Slide 21 text
Import data, Modeling, Migrate, After migrate
Slide 22
Slide 22 text
Get the Data: Provider API, Data Dump
Slide 23
Slide 23 text
Provider API & Generic Migration Code ✅
Slide 24
Slide 24 text
Data Dump & Generic Migration Code
Slide 25
Slide 25 text
HOWTO Data Migration
Slide 26
Slide 26 text
Start with a rake task
Slide 27
Slide 27 text
No content
Slide 28
Slide 28 text
lib/tasks/data_migration.rake
Slide 29
Slide 29 text
lib/data_migration.rb
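The two file names above suggest a thin rake task that delegates to a plain Ruby module. The slides do not show the code, so this is only a minimal sketch: the `DataMigration.run` entry point, the `data_migration:run` task name, and the lambda "migrators" are all hypothetical.

```ruby
require "rake"

# lib/data_migration.rb (hypothetical layout): a plain Ruby entry point
# that tests can call without going through rake.
module DataMigration
  # Runs each migrator (anything responding to #call) in order and
  # returns their results.
  def self.run(migrators)
    migrators.map(&:call)
  end
end

# lib/tasks/data_migration.rake (hypothetical task name).
# In a Rails app this would usually be `task run: :environment`.
namespace :data_migration do
  desc "Run the data migration"
  task :run do
    results = DataMigration.run([-> { :users_migrated }, -> { :recipes_migrated }])
    puts "Finished: #{results.inspect}"
  end
end
```

With this layout the task would be started with `rake data_migration:run`, while the logic itself stays in an ordinary module.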
Slide 30
Slide 30 text
Import dump to local
Slide 31
Slide 31 text
GBs-size file
Slide 32
Slide 32 text
62 countries, 30M monthly users
Slide 33
Slide 33 text
Add delay to the SQL dump for production
Slide 34
Slide 34 text
Add sleep() before INSERT INTO
Slide 35
Slide 35 text
Editing huge file
Slide 36
Slide 36 text
+
Slide 37
Slide 37 text
No content
Slide 38
Slide 38 text
Enumerable#lazy https://ruby-doc.org/core-2.4.1/Enumerable.html#method-i-lazy
Slide 39
Slide 39 text
No content
Slide 40
Slide 40 text
No content
Slide 41
Slide 41 text
Set accordingly for staging & production
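The preceding slides describe adding a delay before each INSERT INTO in a multi-gigabyte dump without opening it in an editor, using Enumerable#lazy. The original code is not in the transcript, so the following is a sketch under assumptions: the helper name is hypothetical, and MySQL's `SELECT SLEEP(n);` is assumed as the delay statement.

```ruby
require "stringio"

# Hypothetical helper: copies a SQL dump line by line and writes a
# MySQL `SELECT SLEEP(n);` statement before every INSERT INTO line.
# Thanks to `lazy`, only one line is held in memory at a time.
def add_delay_to_dump(input, output, delay_seconds:)
  input.each_line.lazy.flat_map { |line|
    if line.start_with?("INSERT INTO")
      ["SELECT SLEEP(#{delay_seconds});\n", line]
    else
      [line]
    end
  }.each { |line| output.write(line) }
  output
end

# Usage with in-memory IO objects; real dumps would use File.open instead.
dump = StringIO.new("CREATE TABLE t (id INT);\nINSERT INTO t VALUES (1);\n")
puts add_delay_to_dump(dump, StringIO.new, delay_seconds: 1).string
```

The `delay_seconds` value is the knob that would be set differently for staging and production.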
Slide 42
Slide 42 text
Modeling Database
Slide 43
Slide 43 text
No content
Slide 44
Slide 44 text
With these 5 methods, you can model anything.
Slide 45
Slide 45 text
Map data to your current system
Slide 46
Slide 46 text
Sometimes as easy as
Slide 47
Slide 47 text
Sometimes… as HTML in the recipes table field ‘steps’
Slide 48
Slide 48 text
No content
Slide 49
Slide 49 text
No content
Slide 50
Slide 50 text
Setup Test Suite
Slide 51
Slide 51 text
No content
Slide 52
Slide 52 text
No content
Slide 53
Slide 53 text
Why tests? The migration code is only used once
Slide 54
Slide 54 text
Better code through boring tests
Slide 55
Slide 55 text
TDD to Get Things Done
Slide 56
Slide 56 text
Modeling, Tests, Repeat
Slide 57
Slide 57 text
Migration
Slide 58
Slide 58 text
Use methods that raise exceptions
Slide 59
Slide 59 text
Fail Fast to find all errors
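In ActiveRecord, the raising variants are presumably the bang methods such as `save!`, `create!` and `find_by!`. The same idea can be shown in plain Ruby: prefer `Hash#fetch` and `Integer()` over `[]` and `to_i`, so bad source data stops the run instead of silently turning into `nil` or `0`. The row layout and helper name below are hypothetical.

```ruby
# Hypothetical mapping from a legacy row (string keys, string values)
# to attributes for the new system. Every lookup and conversion raises
# on unexpected input instead of returning a quiet default.
def recipe_attributes(row)
  {
    legacy_id: Integer(row.fetch("id")),      # raises on nil or "abc"
    title: row.fetch("title"),                # raises KeyError if missing
    user_id: Integer(row.fetch("user_id"))
  }
end

p recipe_attributes({ "id" => "7", "title" => "Ramen", "user_id" => "3" })
```

A row without a "title" key raises `KeyError`, and a non-numeric id raises `ArgumentError`, which makes the broken record visible immediately.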
Slide 60
Slide 60 text
Example: Migrate Recipes
Slide 61
Slide 61 text
No content
Slide 62
Slide 62 text
No content
Slide 63
Slide 63 text
No content
Slide 64
Slide 64 text
No content
Slide 65
Slide 65 text
Add more migrators to migrate
Slide 66
Slide 66 text
Data Integrity
Slide 67
Slide 67 text
Transaction
Slide 68
Slide 68 text
Idempotent Operation
Slide 69
Slide 69 text
No content
Slide 70
Slide 70 text
Run Migration many times
Slide 71
Slide 71 text
Produce the Same Result
Slide 72
Slide 72 text
f(f(x)) = f(x)
Slide 73
Slide 73 text
Upsert: Update or Insert
Slide 74
Slide 74 text
MySQL: ON DUPLICATE KEY UPDATE. PostgreSQL 9.5+: ON CONFLICT ... DO UPDATE. Gem: seamusabshere/upsert
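To illustrate why upsert makes a migration idempotent, here is a minimal in-memory sketch. `InMemoryStore` and `migrate_users` are hypothetical stand-ins; a real migration would rely on the database clauses named above rather than a Ruby hash.

```ruby
# Hypothetical stand-in for a table keyed by the legacy id.
class InMemoryStore
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Update the row if the key exists, insert it otherwise.
  def upsert(key, attributes)
    @rows[key] = (@rows[key] || {}).merge(attributes)
  end
end

# Running this any number of times leaves the store in the same state.
def migrate_users(source_rows, store)
  source_rows.each do |row|
    store.upsert(row.fetch(:legacy_id), name: row.fetch(:name))
  end
  store
end

source = [{ legacy_id: 1, name: "Ann" }, { legacy_id: 2, name: "Bob" }]
store = InMemoryStore.new
migrate_users(source, store)
migrate_users(source, store) # second run: no duplicates, same rows
p store.rows
```

With a plain insert the second run would either fail on the unique key or create duplicates; with upsert it is a no-op.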
Slide 75
Slide 75 text
No content
Slide 76
Slide 76 text
Data Accuracy
Slide 77
Slide 77 text
Manually Check
Slide 78
Slide 78 text
Automated Check
Slide 79
Slide 79 text
Example: Check users with most Recipes
Slide 80
Slide 80 text
No content
Slide 81
Slide 81 text
No content
Slide 82
Slide 82 text
No content
Slide 83
Slide 83 text
No content
Slide 84
Slide 84 text
No content
Slide 85
Slide 85 text
To check more things, add more Checker objects
Slide 86
Slide 86 text
Use many small objects to compose
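The checker code itself is not in the transcript. One way the idea could look is sketched below, with hypothetical class names and plain hashes (user id to recipe count) standing in for queries against the source dump and the migrated database.

```ruby
# Compares recipe counts for the users with the most recipes.
class TopRecipeCountChecker
  def initialize(source_counts, target_counts, limit: 3)
    @source_counts = source_counts
    @target_counts = target_counts
    @limit = limit
  end

  # Returns a list of human-readable mismatches (empty when all is well).
  def call
    top = @source_counts.max_by(@limit) { |_user_id, count| count }
    top.map { |user_id, expected|
      actual = @target_counts.fetch(user_id, 0)
      next if actual == expected
      "user #{user_id}: expected #{expected} recipes, found #{actual}"
    }.compact
  end
end

# Compares the total number of users on both sides.
class UserCountChecker
  def initialize(source_counts, target_counts)
    @source_counts = source_counts
    @target_counts = target_counts
  end

  def call
    return [] if @source_counts.size == @target_counts.size
    ["user count: expected #{@source_counts.size}, found #{@target_counts.size}"]
  end
end

# Composes any number of checkers that respond to #call.
class MigrationCheck
  def initialize(checkers)
    @checkers = checkers
  end

  def call
    @checkers.flat_map(&:call)
  end
end

source = { 1 => 50, 2 => 40, 3 => 1 }
target = { 1 => 50, 2 => 39, 3 => 1 }
check = MigrationCheck.new([
  TopRecipeCountChecker.new(source, target, limit: 2),
  UserCountChecker.new(source, target)
])
puts check.call
```

Checking something new then means writing one more small class with a `call` method and adding it to the list.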
Slide 87
Slide 87 text
Objects Everywhere
Slide 88
Slide 88 text
For better object design
Slide 89
Slide 89 text
For better object design
Slide 90
Slide 90 text
Background Jobs
Slide 91
Slide 91 text
Workers = CPU cores
Slide 92
Slide 92 text
Designated Queues
Slide 93
Slide 93 text
No content
Slide 94
Slide 94 text
No content
Slide 95
Slide 95 text
Log Every Unexpected Error
Slide 96
Slide 96 text
No content
Slide 97
Slide 97 text
No content
Slide 98
Slide 98 text
For Better handling of Errors
Slide 99
Slide 99 text
Run against all data to be migrated
Slide 100
Slide 100 text
Fix every error you can before real migration
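A rehearsal run over all the data could collect failures instead of stopping at the first one. This is a sketch only: `dry_run` is a hypothetical helper, and the block stands in for whatever migrates a single record.

```ruby
require "logger"

# Runs the block for every record, logs each unexpected error together
# with the offending record, and returns the records that failed.
def dry_run(records, logger:)
  failed = []
  records.each do |record|
    begin
      yield record
    rescue StandardError => e
      logger.error("record=#{record.inspect} #{e.class}: #{e.message}")
      failed << record
    end
  end
  failed
end

failed = dry_run([1, 0, 2], logger: Logger.new($stderr)) { |n| 10 / n }
p failed
```

The returned list gives a concrete to-do list of records to fix before the real migration.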
Slide 101
Slide 101 text
Tools
Slide 102
Slide 102 text
Retry mechanism
Slide 103
Slide 103 text
Foreign Key Constraints Locks
Slide 104
Slide 104 text
Results in MySQL deadlock
Slide 105
Slide 105 text
Automatic Retry # Rails 4: ActiveRecord::StatementInvalid
Slide 106
Slide 106 text
Make sure you know what should be retried
Slide 107
Slide 107 text
retry_on discard_on ActiveJob::Exceptions
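ActiveJob's `retry_on` and `discard_on` declare this behaviour per job class. The underlying idea, retrying only a known list of transient errors and letting everything else surface, can be sketched in plain Ruby. `DeadlockError` is a hypothetical stand-in for whatever exception the database adapter raises on a deadlock, and `with_retry` is not part of any library.

```ruby
# Hypothetical stand-in for the adapter's deadlock exception.
class DeadlockError < StandardError; end

# Retries the block when one of the listed errors is raised, up to
# `attempts` tries in total. Any other error propagates immediately.
def with_retry(on:, attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts
    sleep(wait)
    retry
  end
end

calls = 0
result = with_retry(on: [DeadlockError]) do
  calls += 1
  raise DeadlockError, "deadlock found" if calls < 3
  :migrated
end
p result
```

Keeping the retry list explicit is what prevents a genuine bug, such as a validation failure, from being retried forever.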
Slide 108
Slide 108 text
Automatic Retry
Slide 109
Slide 109 text
Examine & Retry
Slide 110
Slide 110 text
In Resque
Slide 111
Slide 111 text
No content
Slide 112
Slide 112 text
Status Reporting
Slide 113
Slide 113 text
No content
Slide 114
Slide 114 text
No content
Slide 115
Slide 115 text
Report every minute
Slide 116
Slide 116 text
Monitoring CPU Usage
Slide 117
Slide 117 text
No content
Slide 118
Slide 118 text
Performance
Slide 119
Slide 119 text
Performance is a Rabbit hole
Slide 120
Slide 120 text
Preload associations
Slide 121
Slide 121 text
Minimize scope of transaction
Slide 122
Slide 122 text
Transaction Isolation Levels https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html https://www.postgresql.org/docs/current/static/transaction-iso.html
Slide 123
Slide 123 text
Avoid unnecessary callbacks
Slide 124
Slide 124 text
No content
Slide 125
Slide 125 text
Example: You can touch after migration http://api.rubyonrails.org/classes/ActiveRecord/NoTouching/ClassMethods.html#method-i-no_touching
Slide 126
Slide 126 text
Process multiple records in one job
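Batching ids with `each_slice` is one simple way to give each job several records. In this sketch the `queue` array is a hypothetical stand-in for a real job queue, and the batch size is arbitrary.

```ruby
# Pushes one entry per batch of ids instead of one entry per id.
def enqueue_in_batches(ids, batch_size:, queue:)
  ids.each_slice(batch_size) { |batch| queue << batch }
  queue
end

p enqueue_in_batches((1..5).to_a, batch_size: 2, queue: [])
```

Fewer, larger jobs reduce per-job overhead, at the cost of redoing a whole batch when one record in it fails.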
Slide 127
Slide 127 text
No content
Slide 128
Slide 128 text
Cache data in Memory
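A small memoizing lookup is often enough for data that is read repeatedly during a run, such as mapping legacy ids to new ids. `LegacyIdCache` is a hypothetical name, and the block stands in for the expensive lookup (for example a database query).

```ruby
# Remembers the result of the lookup block per legacy id, so the
# expensive lookup runs at most once for each id.
class LegacyIdCache
  def initialize(&lookup)
    @cache = Hash.new { |hash, legacy_id| hash[legacy_id] = lookup.call(legacy_id) }
  end

  def [](legacy_id)
    @cache[legacy_id]
  end
end

lookups = 0
cache = LegacyIdCache.new { |legacy_id| lookups += 1; legacy_id + 1000 }
cache[7]
cache[7]
p lookups
```

The cache lives only as long as the process, which suits a single worker; sharing it between workers is where an external store comes in.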
Slide 129
Slide 129 text
Cache data in Redis
Slide 130
Slide 130 text
Migrate Important things first
Slide 131
Slide 131 text
First 10000 users w/ most recipes
Slide 132
Slide 132 text
IO bound
Slide 133
Slide 133 text
Scale up the Database
Slide 134
Slide 134 text
Decrease the workers
Slide 135
Slide 135 text
Bulk Insert, Bulk Upsert* (* Only MySQL supports bulk upsert) zdennis/activerecord-import
Slide 136
Slide 136 text
Every change to make it fast
Slide 137
Slide 137 text
Run the WHOLE migration again
Slide 138
Slide 138 text
Keep CPU usage at 75% max
Slide 139
Slide 139 text
Post Migration
Slide 140
Slide 140 text
Update all necessities
Slide 141
Slide 141 text
Redirects
Slide 142
Slide 142 text
Redirect tables Cookpad Redirect programs Server redirects Provider
Slide 143
Slide 143 text
Redirection Service cookpad/mirin
Slide 144
Slide 144 text
Stories
Slide 145
Slide 145 text
Cases of Email
Slide 146
Slide 146 text
Remove duplicate emails before migration
Slide 147
Slide 147 text
Remove invalid emails before migration
Slide 148
Slide 148 text
downcase all the emails
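The three email rules can be combined into one cleaning pass. This sketch assumes `URI::MailTo::EMAIL_REGEXP` from Ruby's standard library is an acceptable validity check, which may be stricter or looser than what the talk used. It downcases first so that duplicates differing only in case are caught.

```ruby
require "uri"

# Normalizes, validates and de-duplicates a list of email addresses.
def clean_emails(emails)
  emails
    .map { |email| email.strip.downcase }
    .select { |email| email.match?(URI::MailTo::EMAIL_REGEXP) }
    .uniq
end

p clean_emails(["A@x.com", "a@x.com ", "bad", "b@y.org"])
```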
Slide 149
Slide 149 text
Get Site Dump
Slide 150
Slide 150 text
100GB generated on EC2* (* EC2 has bandwidth limits)
Slide 151
Slide 151 text
scp takes days ONLY if nothing failed within days
Slide 152
Slide 152 text
delivers encrypted disk
Slide 153
Slide 153 text
Migrate Millions of records
Slide 154
Slide 154 text
AR + transaction, bulk in/upsert (activerecord-import), LOAD DATA INFILE
Slide 155
Slide 155 text
Weeks Month-ish
Slide 156
Slide 156 text
Run low priority job to migrate them
Slide 157
Slide 157 text
When migrated User signed in
Slide 158
Slide 158 text
Migrate their data in high priority
Slide 159
Slide 159 text
No content
Slide 160
Slide 160 text
Migrate 100K photos
Slide 161
Slide 161 text
How our image work
Slide 162
Slide 162 text
Design so it produces the same hash
Slide 163
Slide 163 text
Set the designated hash during migration instead of uploading and generating the hash
Slide 164
Slide 164 text
Benchmark how long to finish all
Slide 165
Slide 165 text
X days?
Slide 166
Slide 166 text
Migrate them X days before in low priority
Slide 167
Slide 167 text
99% photos won’t change
Slide 168
Slide 168 text
Migrate users password to secure auth
Slide 169
Slide 169 text
Figure out which algorithm(s) were used
Slide 170
Slide 170 text
When migrated user signed in
Slide 171
Slide 171 text
System’s password auth will fail
Slide 172
Slide 172 text
Fallback to Legacy Auth
Slide 173
Slide 173 text
No content
Slide 174
Slide 174 text
When password matched from legacy auth
Slide 175
Slide 175 text
Set their password through the secure password scheme
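The sign-in flow described in the slides above can be sketched as follows. Everything concrete here is an assumption: MD5 as the legacy algorithm, PBKDF2 (via Ruby's OpenSSL bindings) as the stand-in for the "secure password scheme", and the `User` struct. A Rails app would more likely use `has_secure_password` with bcrypt.

```ruby
require "digest"
require "openssl"
require "securerandom"

# Hypothetical user record: legacy_md5 comes from the old system,
# salt and secure_hash belong to the new scheme.
User = Struct.new(:legacy_md5, :salt, :secure_hash, keyword_init: true)

ITERATIONS = 20_000

def secure_digest(password, salt)
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, ITERATIONS, 32, OpenSSL::Digest.new("SHA256")).unpack1("H*")
end

# Tries the secure scheme first; on a migrated user without a secure
# hash it falls back to the legacy check and, on a match, upgrades the
# stored credentials. Production code should also compare digests in
# constant time.
def authenticate(user, password)
  return user.secure_hash == secure_digest(password, user.salt) if user.secure_hash

  return false unless user.legacy_md5 == Digest::MD5.hexdigest(password)

  user.salt = SecureRandom.hex(16)
  user.secure_hash = secure_digest(password, user.salt)
  user.legacy_md5 = nil # the legacy hash is no longer needed
  true
end

user = User.new(legacy_md5: Digest::MD5.hexdigest("secret"))
p authenticate(user, "secret") # first sign-in upgrades the password
p user.legacy_md5.nil?
```

Users who never sign in again keep only their legacy hash, so the fallback path has to stay in place for as long as such accounts exist.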
Slide 176
Slide 176 text
No content
Slide 177
Slide 177 text
Migration: The Future
Slide 178
Slide 178 text
Migration done in Ghost Table fashion
Slide 179
Slide 179 text
Data Dump & Generic Migration Code: only need to model the database
Slide 180
Slide 180 text
Takeaways
Slide 181
Slide 181 text
Rails provides sharp tools, thanks to the Rails core team
Slide 182
Slide 182 text
Use Small objects to make your code more readable & maintainable
Slide 183
Slide 183 text
“Abstraction is the God of Programming!” (Matthew Mongeau, @halogenandtoast)
Slide 184
Slide 184 text
Schedule >> Fast
Slide 185
Slide 185 text
Schedule >> Fast Integrity >> schedule
Slide 186
Slide 186 text
Data Migration sounds hard
Slide 187
Slide 187 text
Keep it Simple Made it Easy
Slide 188
Slide 188 text
“Do the Simplest Things” (Winston Teo Yon Wei, @winstonyw)
Slide 189
Slide 189 text
Enjoy ☕ Thank you!