Data Migration with Confidence
Juanito Fatas
June 22, 2017
Programming
At RedDotRubyConf 2017
Transcript
Data Migration with Confidence Juanito Fatas RedDotRubyConf 2017
@JuanitoFatas Ramen Specialist
Spanish name, from Taiwan, live in Tokyo
I became a salaryman
Cookpad Global, Cookpad Japan
Made in Japan
Reserved for local jokes
Data Migration?
Schema Migration vs. Data Migration
Schema Migration: alter schemas over time. https://en.wikipedia.org/wiki/Schema_migration
Data Migration: transfer data from System A to System B. https://en.wikipedia.org/wiki/Data_migration
Existing Data Migration
Data Migration to Existing System
Data Migration with Confidence @JuanitoFatas Specialist of cookpad RedDotRubyConf 2017
Why Migration?
Rewrote for Clients
New Partner joins company
Data Migration
Simple goal: get all data into our system
Import data → Modeling → Migrate → After migration
Get the data: Provider API or Data Dump
Provider API & Generic Migration Code ✅
Data Dump & Generic Migration Code
HOWTO Data Migration
Start with a rake task
lib/tasks/data_migration.rake
lib/data_migration.rb
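The two files named above could be laid out roughly as follows. This is only a sketch: the `DataMigration.run` signature, the step labels, and the empty lambdas are assumptions for illustration, not the code shown in the talk. In a Rails app the task would normally also depend on `:environment`.

```ruby
require "rake"

# lib/data_migration.rb: plain Ruby entry point, easy to call from tests.
module DataMigration
  # `steps` maps a label to a callable; both are placeholders for real migrators.
  def self.run(steps, out: $stdout)
    steps.each do |label, step|
      out.puts "Migrating #{label}..."
      step.call
    end
    steps.keys
  end
end

# lib/tasks/data_migration.rake: thin wrapper that only delegates.
namespace :data_migration do
  desc "Run the data migration"
  task :run do
    DataMigration.run({ "users" => -> {}, "recipes" => -> {} })
  end
end
```

Keeping the logic in `lib/` and the task as a one-line wrapper means the migration can be exercised from a test suite without going through rake.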
Import dump to local
Gigabytes-sized file
30M monthly users, 62 countries
Add delay to the SQL dump for production
Add sleep() before INSERT INTO
Editing a huge file
Enumerable#lazy https://ruby-doc.org/core-2.4.1/Enumerable.html#method-i-lazy
Set accordingly for staging & production
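A minimal sketch of rewriting a huge dump line by line with `Enumerable#lazy`, so the file is never loaded into memory at once. The helper name `throttle_dump` and the default `SELECT SLEEP(0.1);` statement (MySQL syntax) are assumptions; the delay would be tuned per environment.

```ruby
require "stringio"

# Hypothetical helper: copies a SQL dump from `input` to `output`,
# emitting a delay statement before every INSERT INTO line.
def throttle_dump(input, output, delay_sql: "SELECT SLEEP(0.1);")
  input.each_line.lazy
       .flat_map { |line| line.start_with?("INSERT INTO") ? ["#{delay_sql}\n", line] : [line] }
       .each { |line| output.write(line) }
  output
end
```

With real files this could be called as `File.open("dump.sql") { |src| File.open("throttled.sql", "w") { |dst| throttle_dump(src, dst) } }`, where both file names are placeholders.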
Modeling Database
With these 5 methods, you can model anything.
Map data to your current system
Sometimes as easy as
Sometimes… as HTML in the ‘steps’ field of the recipes table
Set up Test Suite
Why tests? The migration code is only used once
Better code through boring tests
TDD to Get Things Done
Modeling → Tests → Migration → Repeat
Use methods that raise exceptions
Fail fast to find all errors
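One way to read "use methods that raise" in plain Ruby: prefer `Hash#fetch` and `Integer()` over `Hash#[]` and `String#to_i`, so bad source data stops the run instead of slipping through as `nil` or `0`. The row shape and `build_recipe_attributes` are hypothetical. In ActiveRecord the same idea applies to `create!`, `save!`, and `find_by!` versus their non-bang counterparts.

```ruby
# Builds target attributes from one source row (a Hash with string keys).
def build_recipe_attributes(row)
  {
    title: row.fetch("title"),                # KeyError if the column is missing
    servings: Integer(row.fetch("servings"))  # ArgumentError on "four"; "four".to_i would silently return 0
  }
end
```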
Example: Migrate Recipes
Add more migrators to migrate more data
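The slides with the actual migrator code did not survive in this transcript, so the following is only a sketch of the "one migrator per kind of data" idea. Class names, row keys, and the Hash standing in for the target database are all assumptions.

```ruby
# Migrates recipe rows into `target`, a Hash standing in for the recipes table.
class RecipeMigrator
  def initialize(rows, target)
    @rows = rows
    @target = target
  end

  def migrate
    @rows.each { |row| @target[row.fetch("id")] = { title: row.fetch("name").strip } }
  end
end

# Migrates user rows; same interface, different mapping.
class UserMigrator
  def initialize(rows, target)
    @rows = rows
    @target = target
  end

  def migrate
    @rows.each { |row| @target[row.fetch("id")] = { email: row.fetch("email").downcase } }
  end
end

# Runs any number of objects that respond to #migrate, in order.
class Migration
  def initialize(migrators)
    @migrators = migrators
  end

  def run
    @migrators.each(&:migrate)
  end
end
```

Covering another table then means writing one more small class and adding it to the list passed to `Migration.new`.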
Data Integrity
Transaction
Idempotent Operation
Run the migration many times
Produce the same result
f(f(x)) = f(x)
Upsert: Update or Insert
MySQL: ON DUPLICATE KEY UPDATE; PostgreSQL 9.5+: ON CONFLICT ... DO UPDATE
seamusabshere/upsert
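A small in-memory illustration of why upsert makes a migration idempotent: running it twice leaves the same rows. `RecipeStore` and `migrate` are hypothetical stand-ins; in a real database the key would be a unique index and the write would be one of the SQL forms noted in the comments.

```ruby
# In-memory table keyed by the legacy id, standing in for a unique index.
# SQL equivalents of #upsert:
#   MySQL:      INSERT ... ON DUPLICATE KEY UPDATE title = VALUES(title)
#   PostgreSQL: INSERT ... ON CONFLICT (legacy_id) DO UPDATE SET title = EXCLUDED.title
class RecipeStore
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Updates the row when the key exists, inserts it otherwise.
  def upsert(legacy_id, attributes)
    @rows[legacy_id] = (@rows[legacy_id] || {}).merge(attributes)
  end
end

# Safe to run repeatedly: existing rows are overwritten, never duplicated.
def migrate(source, store)
  source.each { |row| store.upsert(row.fetch(:id), { title: row.fetch(:title) }) }
  store.rows
end
```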
Data Accuracy
Manual check
Automated check
Example: Check users with the most recipes
To check more things, add more Checker objects
Use many small objects to compose
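The checker code from the slides is not in the transcript, so this is a sketch under assumptions: recipe counts per user are available for both systems as Hashes, and a checker exposes `ok?` and `mismatches`. The class name and interface are hypothetical.

```ruby
# Compares recipe counts for the users with the most recipes in the source system.
class TopUsersChecker
  def initialize(source_counts, target_counts, limit: 3)
    @source_counts = source_counts
    @target_counts = target_counts
    @limit = limit
  end

  # Users among the top `limit` whose count differs after migration.
  def mismatches
    top_users.reject { |user, count| @target_counts[user] == count }.map(&:first)
  end

  def ok?
    mismatches.empty?
  end

  private

  def top_users
    @source_counts.max_by(@limit) { |_user, count| count }
  end
end
```

Several such checkers can be collected in an array and reported together, for example with `checkers.reject(&:ok?)`.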
~ Objects Everywhere
For better object design
Background Jobs
Workers = CPU cores
Designated Queues
Log Every Unexpected Error
For better handling of errors
Run against all data to be migrated
Fix every error you can before the real migration
Tools
Retry mechanism
Foreign key constraints, locks
A MySQL deadlock raises an exception (Rails 4: ActiveRecord::StatementInvalid)
Automatic retry
Be sure about what should be retried
retry_on / discard_on (ActiveJob::Exceptions)
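A framework-free sketch of automatic retry that retries only a named error class and lets everything else propagate. `DeadlockError` is a hypothetical stand-in for the database adapter's deadlock exception, and `with_retry` is an illustrative helper; inside ActiveJob, `retry_on` and `discard_on` cover the same ground declaratively.

```ruby
# Hypothetical error class standing in for the adapter's deadlock exception.
class DeadlockError < StandardError; end

# Re-runs the block when `on` is raised, up to `attempts` times in total,
# waiting a little longer before each new attempt. Other errors are not retried.
def with_retry(attempts: 3, on: DeadlockError, base_delay: 0.1)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue on
    raise if tries >= attempts
    sleep(base_delay * tries)
    retry
  end
end
```

Retrying only transient failures such as deadlocks keeps genuine data errors visible instead of hiding them behind repeated attempts.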
Automatic Retry
Examine & Retry
In Resque
Status Reporting
Report every minute
Monitoring CPU Usage
Performance
Performance is a rabbit hole
Preload associations
Minimize the scope of transactions
Transaction Isolation Levels https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html https://www.postgresql.org/docs/current/static/transaction-iso.html
Avoid unnecessary callbacks
Example: no_touching; you can touch after migration http://api.rubyonrails.org/classes/ActiveRecord/NoTouching/ClassMethods.html#method-i-no_touching
Process multiple records in one job
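A sketch of batching with `each_slice`, so each background job handles many records and the per-job overhead is paid once per batch. The job name `MigrateRecipesJob`, the payload shape, and the batch size of 500 are assumptions.

```ruby
# Groups record ids into batches; each Hash describes one job to enqueue.
def build_jobs(record_ids, batch_size: 500)
  record_ids.each_slice(batch_size).map do |ids|
    { job: "MigrateRecipesJob", ids: ids }
  end
end
```

The right batch size depends on how long one record takes and on how much work is acceptable to redo when a job fails.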
Cache data in Memory
Cache data in Redis
Migrate Important things first
First 10,000 users with the most recipes
IO bound
Scale up the Database
Decrease the workers
Bulk Insert, Bulk Upsert* (*only MySQL supports bulk upsert)
zdennis/activerecord-import
For every change made to speed things up
Run the WHOLE migration again
Keep CPU usage at 75% max at all times
Post Migration
Update all necessities
Redirects
Redirect tables Cookpad Redirect programs Server redirects Provider
Redirection service: cookpad/mirin
Stories
Cases of Email
Remove duplicate emails before migration
Remove invalid emails before migration
Downcase all the emails
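The three email steps above can be sketched in a few lines of Ruby. The helper name is hypothetical, and `URI::MailTo::EMAIL_REGEXP` from the standard library is used here as one reasonable validity check, not necessarily the one used in the talk. Downcasing happens first so that addresses differing only in case collapse into one.

```ruby
require "uri"

# Normalizes, validates, and de-duplicates a list of email strings.
def clean_emails(emails)
  emails
    .map { |email| email.strip.downcase }
    .select { |email| email.match?(URI::MailTo::EMAIL_REGEXP) }
    .uniq
end
```

For real accounts, deciding which of two duplicate users survives is a separate decision; this sketch only covers the address list.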
Get Site Dump
100GB generated on EC2* (*EC2 has bandwidth limits)
scp takes days, and ONLY if nothing fails within those days
delivers encrypted disk
Migrate Millions of records
AR + transaction; bulk insert/upsert (activerecord-import); LOAD DATA INFILE
Weeks Month-ish
Run low-priority jobs to migrate them
When a migrated user signs in
Migrate their data with high priority
Migrate 100K photos
How our images work
Design so it produces the same hash
Set the designated hash during migration instead of uploading and generating the hash
Benchmark how long it takes to finish them all
X days?
Migrate them X days beforehand with low priority
99% of photos won’t change
Migrate users’ passwords to secure auth
Figure out which algorithm(s) were used
When a migrated user signs in
System’s password auth will fail
Fall back to legacy auth
When the password matches via legacy auth
Set their password through the secure password scheme
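The sign-in flow above, sketched with the standard library only. Everything specific here is an assumption: unsalted MD5 stands in for whatever the legacy system used, PBKDF2 stands in for the secure scheme (a Rails app would more likely use `has_secure_password` with bcrypt), and the user is a plain Hash.

```ruby
require "digest"
require "openssl"
require "securerandom"

# Stand-in for the secure scheme: PBKDF2-HMAC-SHA256, hex-encoded.
def secure_digest(password, salt, iterations: 100_000)
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, 32, OpenSSL::Digest.new("SHA256")).unpack1("H*")
end

# Constant-time comparison for equal-length strings.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# `user` is a Hash holding :legacy_md5 until upgraded, then :salt and :password_digest.
def authenticate(user, password)
  if user[:password_digest]
    return secure_compare(secure_digest(password, user[:salt]), user[:password_digest])
  end

  # Secure auth has nothing to check yet, so fall back to the legacy digest.
  legacy = user[:legacy_md5]
  return false unless legacy && secure_compare(Digest::MD5.hexdigest(password), legacy)

  # Legacy check passed: re-hash with the secure scheme and drop the old digest.
  user[:salt] = SecureRandom.hex(16)
  user[:password_digest] = secure_digest(password, user[:salt])
  user.delete(:legacy_md5)
  true
end
```

The upgrade can only happen at sign-in because that is the one moment the plaintext password is available.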
Migration: The Future
Migration done in Ghost Table fashion
Data Dump & Generic Migration Code: only need to model the database
Takeaways
Rails provides sharp tools, thanks to the Rails core team
Use small objects to make your code more readable & maintainable
“Abstraction is the God of Programming!” (Matthew Mongeau, @halogenandtoast)
Schedule >> Fast
Integrity >> Schedule
Data Migration sounds hard
Keep it Simple, Make it Easy
“Do the Simplest Things” (Winston Teo Yon Wei, @winstonyw)
Enjoy ☕ Thank you!