Testing Rails at Scale
Emil Stolarsky
May 04, 2016
Transcript
Testing Rails at Scale, by @EmilStolarsky
2
3 Shopify
• 243,000+ shops
• $14B+ total GMV
• 300M+ unique visits/month
• 1000+ employees
4 CI Systems
5 Scheduler Compute
6 Scheduler Compute
7 Scheduler Compute
8 Managed Provider
9 Managed Provider
• Multi-tenant
• Closed system
• Examples: CircleCI, Codeship, Hosted TravisCI
10 Unmanaged Provider
11 Unmanaged Provider
• Self-hosted
• Open system
• Examples: Jenkins, TravisCI, Strider
12 Daily CI Stats
• 50,000+ containers booted for testing
• 700 builds
• 42,000+ tests per build
• 5 min build time
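The daily figures above can be combined into a few rough derived numbers. This is a back-of-envelope sketch that assumes every booted container served one of the day's builds and that tests were spread evenly across containers. Neither assumption comes from the talk, and the results are only approximate because the slide's figures are themselves "+" estimates.

```python
# Back-of-envelope figures derived from the "Daily CI Stats" slide.
# Assumptions (not from the talk): every booted container served one of
# the day's builds, and tests were spread evenly across containers.


def daily_ci_estimates(containers: int, builds: int, tests_per_build: int) -> dict:
    containers_per_build = containers / builds
    return {
        "containers_per_build": containers_per_build,
        "tests_per_day": builds * tests_per_build,
        "tests_per_container": tests_per_build / containers_per_build,
    }


if __name__ == "__main__":
    est = daily_ci_estimates(containers=50_000, builds=700, tests_per_build=42_000)
    print(f"~{est['containers_per_build']:.0f} containers per build")  # ~71
    print(f"{est['tests_per_day']:,} tests run per day")                # 29,400,000
    print(f"~{est['tests_per_container']:.0f} tests per container")     # ~588
```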
13 Shopify Using a Hosted Provider
• 20+ minute build times
• Flakiness from resource starvation
• Expensive
14 A NEW HOPE
15 Beginning of a Journey
• Bring build times under 5 minutes
• Restore confidence in our CI
• Maintain current budget
16
17 [Diagram. Labels: Code push, Webhooks, Agent Instructions, c4.8xlarge]
18 Compute Cluster
• 5.4 TB memory
• 3240 CPU cores
• Fleet size at peak: 90
• Instance type: c4.8xlarge
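The cluster figures are consistent with AWS's published size for a c4.8xlarge, which has 36 vCPUs and 60 GiB of memory. The sketch below checks that arithmetic for the 90-node peak fleet. It assumes that the slide's "CPU cores" means vCPUs and that "5.4 TB" is a loose rounding of 5,400 GiB.

```python
# Cross-check of the "Compute Cluster" slide against AWS's published
# c4.8xlarge size: 36 vCPUs and 60 GiB of memory per instance.
# Assumption: "CPU cores" on the slide counts vCPUs, and "5.4 TB" is
# 5,400 GiB rounded loosely.
from typing import Tuple

C4_8XLARGE_VCPUS = 36
C4_8XLARGE_MEMORY_GIB = 60


def fleet_capacity(instances: int) -> Tuple[int, int]:
    """Return (total vCPUs, total memory in GiB) for a fleet of c4.8xlarge nodes."""
    return instances * C4_8XLARGE_VCPUS, instances * C4_8XLARGE_MEMORY_GIB


if __name__ == "__main__":
    vcpus, memory_gib = fleet_capacity(90)  # fleet size at peak
    print(vcpus, memory_gib)  # 3240 5400
```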
19 Instances
• AWS hosted
• Managed with Chef
• Memory bound
• IO optimizations
20 SCROOGE
21 Auto Scaling with Scrooge
[Diagram. Labels: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, c4.8xlarge]
22 Optimizing Cost
• AWS-specific optimizations
• Improve utilization
• Not one size fits all
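The Scrooge slides show capacity requirements going in and node boots/shutdowns coming out, but they do not show the decision logic. The following is a hypothetical sketch of one simple policy. It provisions enough nodes for the requested build agents and clamps the result to fleet limits. `agents_per_node`, `min_nodes`, and the ceiling-division rule are assumptions made for illustration. The default `max_nodes=90` mirrors the peak fleet size from the compute cluster slide.

```python
# Hypothetical sketch of a Scrooge-style scaling decision. Scrooge's real
# logic is not shown in the talk; agents_per_node, min_nodes and the
# ceiling-division policy are assumptions made for illustration.
import math
from dataclasses import dataclass


@dataclass
class ScalingDecision:
    boot: int      # nodes to start
    shutdown: int  # nodes to stop


def plan_fleet(required_agents: int, agents_per_node: int, running_nodes: int,
               min_nodes: int = 0, max_nodes: int = 90) -> ScalingDecision:
    """Turn a capacity requirement (build agents wanted) into node boots/shutdowns."""
    if agents_per_node <= 0:
        raise ValueError("agents_per_node must be positive")
    # Enough nodes to host every requested agent, clamped to the fleet limits.
    desired = math.ceil(required_agents / agents_per_node)
    desired = max(min_nodes, min(max_nodes, desired))
    delta = desired - running_nodes
    return ScalingDecision(boot=max(delta, 0), shutdown=max(-delta, 0))
```

For example, `plan_fleet(100, 8, 5)` needs 13 nodes, so it returns `ScalingDecision(boot=8, shutdown=0)`. A cost-focused version could also weigh billing granularity or idle time before shutting nodes down, which is the kind of "AWS-specific optimization" the slide alludes to without detailing.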
23-27 Graphing Productivity: Active Buildkite Agents
[Graph of active Buildkite agents over time; across slides 24-27 its peaks are annotated one by one as Lunch rush #1, Commit + Push, and Lunch rush #2]
28 Docker
• Boot speedup
• Test isolation
• Distribution
29 Building Containers with Locutus
• Implements a custom Docker build API
• Single EC2 machine
• Forced debt repayment
30 Test Distribution
• Tests allocated based on container index
• Ruby tests and browser tests run on separate containers
• Outliers inflated build times
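Allocating tests by container index means each container can work out its own share with no coordination. It also explains the outlier problem: the build finishes only when the slowest shard does. The following is a minimal sketch, assuming a round-robin rule over a shared sorted test list. The talk does not specify the exact rule, and the test names and durations are made up.

```python
# Sketch of static test allocation by container index: every container sees
# the same sorted test list and keeps the tests whose position modulo the
# container count equals its own index. Test names and durations below are
# made up for illustration.
from typing import Dict, List


def allocate_by_index(tests: List[str], index: int, count: int) -> List[str]:
    return [t for pos, t in enumerate(tests) if pos % count == index]


def shard_durations(durations: Dict[str, float], count: int) -> List[float]:
    tests = sorted(durations)  # identical order on every container
    return [sum(durations[t] for t in allocate_by_index(tests, i, count))
            for i in range(count)]


if __name__ == "__main__":
    durations = {"test_a": 10.0, "test_b": 1.0, "test_c": 10.0, "test_d": 1.0}
    print(shard_durations(durations, count=2))  # [20.0, 2.0]
```

With two containers, both slow tests land on container 0. The build therefore takes 20.0 time units even though container 1 is idle after 2.0.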
31 Artifacts
• Buildkite agents upload artifacts to S3
• Events are logged to Kafka & StatsD
• Data tools are used to identify flaky tests
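The slide does not say how the data tools decide that a test is flaky. A common heuristic is to flag a test that both passed and failed on the same revision, because the code under test did not change between those runs. The sketch below applies that heuristic, assuming a hypothetical `(revision, test, passed)` event shape rather than Shopify's actual event schema.

```python
# Hypothetical sketch of flaky-test detection over test events like those the
# talk says are logged to Kafka & StatsD. The (revision, test, passed) event
# shape and the "passed and failed on the same revision" rule are assumptions.
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, str, bool]  # (revision, test name, passed?)


def find_flaky_tests(events: Iterable[Event]) -> Set[str]:
    """A test is flagged if it both passed and failed on the same revision."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for revision, test, passed in events:
        outcomes[(revision, test)].add(passed)
    return {test for (_, test), seen in outcomes.items() if seen == {True, False}}
```

A test that fails on one revision and passes on the next is not flagged, since the change in between may have fixed a real bug.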
32 [Architecture diagram. Labels: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, Agent Instructions, Webhooks, Pull Containers, Pull Revision]
33 DOCKER STRIKES BACK
34 Rebel Base Is Under Attack
• Shipping a second provider brought confusion
• Locutus capacity issues
• Test times were still high
35 Battling Confusion
• Botched rollout
• Instability further eroded developer confidence
36 Clustering Locutus
• Make it linearly scalable
• Keep it stateless(-ish)
37 Locutus Diagram
[Labels: Workers, Worker Pool, Cache Ring, Coordinator, Docker Registry, Container push, New containers]
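The Locutus diagram's labels mention a ring and a coordinator. One common way to make a build cluster linearly scalable while keeping it mostly stateless is consistent hashing. In that scheme each build key (for example a repository or branch) maps to a worker, so repeat builds hit a warm cache, and adding or removing a worker reassigns only a small fraction of keys. This is a hypothetical sketch of that technique. It is an assumption based on the labels, not Locutus's documented design.

```python
# Hypothetical sketch: a consistent-hash ring that routes each build key to a
# worker, so the same key keeps landing on the same worker's warm cache and
# workers can be added or removed with little reshuffling. This design is an
# assumption based on the diagram labels, not Locutus's actual implementation.
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    def __init__(self, workers: List[str], replicas: int = 100):
        self.replicas = replicas  # virtual points per worker, smooths the spread
        self._points: List[Tuple[int, str]] = []  # sorted (hash, worker)
        for worker in workers:
            self.add(worker)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def add(self, worker: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._points, (self._hash(f"{worker}#{i}"), worker))

    def remove(self, worker: str) -> None:
        self._points = [p for p in self._points if p[1] != worker]

    def lookup(self, key: str) -> str:
        """Return the worker owning the first ring point at or after the key's hash."""
        if not self._points:
            raise LookupError("ring has no workers")
        idx = bisect.bisect_left(self._points, (self._hash(key), ""))
        return self._points[idx % len(self._points)][1]  # wrap around the ring
```

When a worker joins, a key either keeps its old worker or moves to the new one. No key moves between two existing workers, so their caches stay useful.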
38 Test Distribution v2
• Loads all tests into Redis
• Containers pull work off queue
• No more container specialization
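With a shared queue, no container is assigned a fixed share up front. Each container pulls the next test whenever it becomes free, so one slow test no longer drags a whole precomputed shard behind it. The following is a minimal simulation of that behavior. A `collections.deque` stands in for the Redis list; with Redis, loading would be `RPUSH` and pulling would be `LPOP`. The durations are made up, and the simulation is not Shopify's code.

```python
# Sketch of queue-based distribution: all test names go into one shared
# queue and each container pulls the next test as soon as it is free. A
# collections.deque stands in for the Redis list the talk mentions (with
# Redis this would be RPUSH to load and LPOP to pull); durations are made up.
import heapq
from collections import deque
from typing import Dict, List


def simulate_queue(durations: Dict[str, float], count: int) -> List[float]:
    """Return the time at which each container finishes its last test."""
    queue = deque(sorted(durations))
    free_at = [(0.0, cid) for cid in range(count)]  # (time container is free, id)
    heapq.heapify(free_at)
    finished = [0.0] * count
    while queue:
        now, cid = heapq.heappop(free_at)  # earliest-free container pulls next
        test = queue.popleft()
        finished[cid] = now + durations[test]
        heapq.heappush(free_at, (finished[cid], cid))
    return finished


if __name__ == "__main__":
    durations = {"test_a": 10.0, "test_b": 1.0, "test_c": 10.0, "test_d": 1.0}
    print(simulate_queue(durations, count=2))  # [11.0, 11.0]
```

Container 1 finishes `test_b` quickly and pulls `test_c` itself, so both containers end at 11.0 instead of one of them carrying both slow tests.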
39 [Architecture diagram. Labels: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, Agent Instructions, Webhooks, Pull Containers, Code push webhook]
40 RETURN OF THE STABLE BUILD
41 Docker
• No one tests starting tens of thousands of containers a day
• Instability further eroded developer confidence
• Every new version of Docker had major bugs
42 Handling Infrastructure Failures
• At non-trivial scale, you’re guaranteed failures
• Swallow infrastructure failures, never test failures
• We still see 100+ container failures a day
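"Swallow infrastructure failures, never test failures" amounts to keeping two kinds of error apart. A container that never booted gets a silent retry, while a failing test is always reported. This is a minimal sketch of that split. `InfrastructureError` and the retry limit are hypothetical names chosen for illustration.

```python
# Hypothetical sketch of "swallow infrastructure failures, never test
# failures": infrastructure errors are retried on a fresh attempt, while a
# test result (pass or fail) is always reported as-is. InfrastructureError
# and the retry limit are illustrative assumptions.
from typing import Callable, Optional


class InfrastructureError(Exception):
    """Raised for failures outside the code under test, e.g. a container that never booted."""


def run_with_infra_retries(job: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Run job (returns True if tests passed); retry only on InfrastructureError."""
    last_error: Optional[InfrastructureError] = None
    for _ in range(max_attempts):
        try:
            return job()  # a failing test returns False and is never retried
        except InfrastructureError as err:
            last_error = err  # swallow it and try again
    raise RuntimeError(f"infrastructure failed {max_attempts} times") from last_error
```

Capping the attempts keeps a persistent infrastructure outage visible instead of looping forever.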
43 Treating Servers as Pets
1. Wait for reports of build issues to stream in
2. Flag the node as in maintenance
3. Manually take the node out of rotation
4. SSH into the node and follow playbook steps to clean up the disk
44 Treating Servers as Cattle
1. Auto-detect the failures
2. Node removes itself from rotation
3. Node runs a script to clean up the disk
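The "cattle" steps above can be expressed as a small check that each node runs against itself. The talk does not say how failures are detected. This hypothetical sketch uses disk usage as the health signal, since the cleanup step targets the disk, and a 90% threshold. The threshold and the callback names are assumptions.

```python
# Hypothetical sketch of the "cattle" loop: the node checks its own health,
# takes itself out of rotation and runs its own cleanup. The 90% disk
# threshold and the callback names are assumptions; the talk does not say
# how failures are detected.
import shutil
from typing import Callable

DISK_USAGE_LIMIT = 0.90


def disk_usage_fraction(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def self_heal(usage_fraction: float,
              leave_rotation: Callable[[], None],
              cleanup_disk: Callable[[], None],
              limit: float = DISK_USAGE_LIMIT) -> bool:
    """Return True if the node detected a problem and started healing itself."""
    if usage_fraction < limit:
        return False  # healthy: keep taking builds
    leave_rotation()  # stop accepting new builds
    cleanup_disk()    # hypothetical cleanup, e.g. removing stale containers
    return True
```

On a real node this check might run periodically as `self_heal(disk_usage_fraction(), ...)`, with callbacks that talk to the scheduler and run the cleanup script. The point is that no human has to wait for build reports or SSH in.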
45 I love the internet.
46
47
48 Test Distribution v3
• Containers record the tests they ran
• Allow flaky tests to be rerun
• Ensure no tests are lost
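The three v3 goals fit together in one data structure. A queue records which container reserved each test, returns a dead container's reservations to the queue, and gives a failing test a bounded number of reruns. This is a hypothetical in-memory sketch of that idea. A production version would keep this state in a shared store such as Redis, and the class and method names here are illustrative.

```python
# Hypothetical in-memory sketch of the v3 ideas: the queue records which
# container reserved each test, requeues work from containers that die, and
# gives a failing test a bounded number of retries. Class and method names
# are illustrative; the talk does not show Shopify's implementation.
from collections import deque
from typing import Dict, Iterable, Optional, Set


class WorkQueue:
    def __init__(self, tests: Iterable[str], max_retries: int = 1):
        self.pending = deque(tests)
        self.running: Dict[str, Set[str]] = {}  # container -> reserved, unfinished tests
        self.attempts: Dict[str, int] = {}      # test -> times handed out
        self.results: Dict[str, bool] = {}      # test -> final pass/fail
        self.max_retries = max_retries

    def reserve(self, container: str) -> Optional[str]:
        """Hand the next test to a container and record the reservation."""
        if not self.pending:
            return None
        test = self.pending.popleft()
        self.running.setdefault(container, set()).add(test)
        self.attempts[test] = self.attempts.get(test, 0) + 1
        return test

    def complete(self, container: str, test: str, passed: bool) -> None:
        self.running[container].discard(test)
        if not passed and self.attempts[test] <= self.max_retries:
            self.pending.append(test)  # possibly flaky: run it once more
        else:
            self.results[test] = passed

    def container_lost(self, container: str) -> None:
        """Requeue a dead container's reservations so no test is lost."""
        for test in sorted(self.running.pop(container, set())):
            self.attempts[test] -= 1  # an infrastructure failure is not a test attempt
            self.pending.append(test)

    def finished(self) -> bool:
        return not self.pending and not any(self.running.values())
```

Because every reservation is recorded, "ensure no tests are lost" becomes checkable. The build is finished only when nothing is pending and nothing is still reserved.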
49 [Architecture diagram. Labels: Capacity Requirements, Scrooge, Boot/Shutdown Nodes, Agent Instructions, Webhooks, Pull Containers, Code push webhook]
50 CONCLUSION
51 Don’t build your own CI
• Build times <10 minutes
• Small application
52 Build your own CI
• Build times >15 minutes
• Monolithic application
• Parallelization limits
53 Lessons Learned
• Commit 100%
• Beware of rabbit holes
• Pets vs. Cattle
54 Thanks! Follow me on Twitter @EmilStolarsky
55 Credits
• Image of shipping containers: https://goo.gl/bXCn1X, https://goo.gl/cDDnYy
• Images of Google DCs: https://goo.gl/UHVRc
• Image of bank vault: https://goo.gl/fFN5EJ
• Locutus: http://goo.gl/UyoJxx
• Warehouse: https://goo.gl/5DiiR1
• Egyptian Temple: https://goo.gl/GjbLcq
• Star Wars: http://goo.gl/474wYG
• Sinking container ship: http://goo.gl/U7rdR8, http://goo.gl/wlzlrm
• Cats: http://goo.gl/9p2JXo, https://goo.gl/Ylhl60
• Cattle: http://goo.gl/IBdXmx
• Star Wars: http://goo.gl/LatPEj
• Creative Commons License: https://goo.gl/sZ7V7x