Testing Rails at Scale
Emil Stolarsky
May 04, 2016
Transcript
Testing Rails at Scale, by @EmilStolarsky
2
3 Shopify • 243,000+ shops • $14B+ total GMV • 300M+ unique visits/month • 1000+ employees
4 CI Systems
5 Scheduler Compute
6 Scheduler Compute
7 Scheduler Compute
8 Managed Provider
9 Managed Provider • Multi-tenant • Closed system • Examples
– CircleCI, Codeship, Hosted TravisCI
10 Unmanaged Provider
11 Unmanaged Provider • Self-hosted • Open system • Examples
– Jenkins, TravisCI, Strider
12 Daily CI Stats • 50,000+ containers booted for testing • 700 builds • 42,000+ tests per build • 5 min build time
13 Shopify using a Hosted Provider • 20+ minute build
times • Flakiness from resource starvation • Expensive
14 A New Hope
15 Beginning of a Journey • Bring build times under
5 minutes • Restore confidence in our CI • Maintain current budget
16
17 c4.8xlarge Webhooks Code push Agent Instructions
18 Compute Cluster • 5.4 TB memory • 3240 CPU cores • 90 fleet size at peak • c4.8xlarge instance type
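The cluster totals on this slide follow from the peak fleet size. A minimal sketch of the arithmetic, assuming the published c4.8xlarge specification of 36 vCPUs and 60 GiB of memory per instance (the per-instance figures are not on the slide):

```python
from __future__ import annotations

# Assumption: published c4.8xlarge specs, not stated in the deck.
VCPUS_PER_INSTANCE = 36
MEMORY_GIB_PER_INSTANCE = 60


def fleet_totals(instances: int) -> tuple[int, int]:
    """Return (total vCPUs, total memory in GiB) for a fleet of c4.8xlarge nodes."""
    return instances * VCPUS_PER_INSTANCE, instances * MEMORY_GIB_PER_INSTANCE


cores, memory_gib = fleet_totals(90)
print(cores, memory_gib)  # 3240 5400
```

90 instances give 3240 cores and 5400 GiB, which matches the slide's 3240 CPU cores and roughly 5.4 TB of memory.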
19 Instances • AWS Hosted • Managed with Chef •
Memory bound • IO Optimizations
20 SCROOGE
21 Auto Scaling with Scrooge c4.8xlarge Capacity Requirements Scrooge Boot/Shutdown
Nodes c4.8xlarge
22 Optimizing Cost • AWS-specific optimizations • Improve utilization • Not one size fits all
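The deck shows Scrooge turning capacity requirements into boot and shutdown decisions but does not show its logic. The following is only a hypothetical sketch of that kind of decision: `agents_per_node`, `min_nodes`, and the function names are assumptions, and `max_nodes=90` is borrowed from the peak fleet size on slide 18.

```python
import math


def desired_nodes(pending_jobs: int, agents_per_node: int,
                  min_nodes: int, max_nodes: int) -> int:
    """Nodes needed to run every pending job at once, clamped to fleet limits."""
    needed = math.ceil(pending_jobs / agents_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes: int, pending_jobs: int,
                   agents_per_node: int = 8,   # hypothetical agents per node
                   min_nodes: int = 1,         # hypothetical floor
                   max_nodes: int = 90):       # peak fleet size from the deck
    """Return ("boot" | "shutdown" | "hold", number of nodes)."""
    target = desired_nodes(pending_jobs, agents_per_node, min_nodes, max_nodes)
    delta = target - current_nodes
    if delta > 0:
        return ("boot", delta)
    if delta < 0:
        return ("shutdown", -delta)
    return ("hold", 0)


print(scaling_action(current_nodes=10, pending_jobs=200))  # ('boot', 15)
```

A real autoscaler would also need the AWS-specific optimizations the slide mentions; those are not described in the deck, so they are left out here.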
23 Graphing Productivity Active Buildkite Agents
24 Graphing Productivity Active Buildkite Agents ? ? ?
25 Graphing Productivity Active Buildkite Agents ? ? Lunch rush
#1
26 Graphing Productivity Active Buildkite Agents ? Commit + Push
Lunch rush #1
27 Graphing Productivity Active Buildkite Agents Lunch rush #2 Commit
+ Push Lunch rush #1
28 Docker • Boot speedup • Test isolation • Distribution
29 Building Containers with Locutus • Implements custom docker build
API • Single EC2 machine • Forced debt repayment
30 Test Distribution • Tests allocated based on container index
• Ruby tests and browser tests are run on separate containers • Outliers inflated build times
31 Artifacts • Artifacts are uploaded to S3 by Buildkite
Agents • Events log into Kafka & StatsD • Data tools are used to identify flaky tests
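The deck does not say how the data tools identify flaky tests. One common heuristic, sketched here as an assumption, is to flag any test that both passed and failed on the same revision, since the code did not change between those runs. The record layout `(test, revision, passed)` is hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return tests that both passed and failed on at least one revision.

    `results` is an iterable of (test_name, revision, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test, revision, passed in results:
        outcomes[(test, revision)].add(bool(passed))
    flaky = {test for (test, _revision), seen in outcomes.items()
             if seen == {True, False}}
    return sorted(flaky)


events = [
    ("checkout_test", "abc123", True),
    ("checkout_test", "abc123", False),  # same revision, different outcome
    ("cart_test", "abc123", False),
    ("cart_test", "def456", True),       # different revisions: not flagged
]
print(find_flaky_tests(events))  # ['checkout_test']
```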
32 Capacity Requirements Scrooge Boot/Shutdown Nodes Agent Instructions Webhooks Pull
Containers Pull Revision
33 Docker Strikes Back
34 Rebel Base Is Under Attack • Shipping second provider
brought confusion • Locutus capacity issues • Test times were still high
35 Battling Confusion • Botched rollout • Instability further eroded
developer confidence
36 Clustering Locutus • Make it linearly scalable • Keep
it stateless(-ish)
37 Locutus Diagram Worker Worker Worker Worker Worker Pool Cache
Ring Coordinator Docker Registry Container push New containers
38 Test Distribution v2 • Loads all tests into Redis
• Containers pull work off queue • No more container specialization
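A small simulation illustrates the pull-based model. It is a sketch under assumptions: an in-process `deque` stands in for the Redis queue, test durations are invented, and each container takes the next test as soon as it is free. It requires at least one container.

```python
import heapq
from collections import deque


def simulate_queue(durations, containers):
    """Return the build time when containers pull tests off a shared queue."""
    queue = deque(durations)                       # stand-in for the Redis queue
    free_at = [(0, cid) for cid in range(containers)]
    heapq.heapify(free_at)                         # next container to become free
    finish = 0
    while queue:
        now, cid = heapq.heappop(free_at)          # earliest free container
        now += queue.popleft()                     # it runs the next test
        finish = max(finish, now)
        heapq.heappush(free_at, (now, cid))
    return finish


# One 300-second outlier and five 10-second tests on two containers.
print(simulate_queue([300, 10, 10, 10, 10, 10], 2))  # 300
```

While one container is busy with the outlier, the other drains the rest of the queue, so the build time is bounded by the slowest single test rather than by a fixed slice of the suite.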
39 Capacity Requirements Scrooge Boot/Shutdown Nodes Agent Instructions Webhooks Pull
Containers Code push webhook
40 Return of the Stable Build
41 Docker • No one tests starting 10,000s of containers/day
• Instability further eroded developer confidence • Every new version of Docker had major bugs
42 Handling Infrastructure Failures • At non-trivial scale, you’re guaranteed
failures • Swallow infrastructure failures, never test failures • We still see 100+ container failures a day
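"Swallow infrastructure failures, never test failures" can be sketched as a retry wrapper that distinguishes the two. The exception classes and the retry limit are hypothetical; the deck does not describe how failures were classified.

```python
class InfrastructureError(Exception):
    """Hypothetical: the container or node failed, not the code under test."""


class GenuineTestFailure(Exception):
    """Hypothetical: a test ran and failed; this must reach the developer."""


def run_with_infra_retries(job, max_attempts=3):
    """Run `job`, retrying only when the infrastructure fails.

    Genuine test failures propagate immediately and are never retried here.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except InfrastructureError:
            if attempt == max_attempts:
                raise  # infrastructure is persistently broken: surface it
```

The key design choice is that only `InfrastructureError` is caught. Catching a broader exception type would risk hiding real test failures, which is exactly what the slide warns against.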
43 Treating Servers as Pets 1. Wait for reports to
stream in of build issues 2. Flag node as in maintenance 3. Manually take node out of rotation 4. SSH into the node and follow playbook steps to clean up disk
44 Treating Servers as Cattle 1. Auto detect the failures
2. Node removes itself from rotation 3. Node runs script to clean up disk
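The three cattle steps above can be sketched as a self-check that a node runs on itself. The 90% threshold and the `leave_rotation` and `clean_disk` callbacks are hypothetical placeholders; the deck does not show the actual detection or cleanup scripts.

```python
import shutil


def disk_used_fraction(path="."):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def self_heal(used_fraction, leave_rotation, clean_disk, threshold=0.9):
    """Detect a full disk, leave rotation, then clean up. Returns True if it acted."""
    if used_fraction < threshold:     # 1. auto-detect the failure condition
        return False
    leave_rotation()                  # 2. node removes itself from rotation
    clean_disk()                      # 3. node runs its cleanup script
    return True


# Usage with stand-in callbacks that only record what happened.
log = []
acted = self_heal(0.95, lambda: log.append("left rotation"),
                  lambda: log.append("cleaned disk"))
print(acted, log)  # True ['left rotation', 'cleaned disk']
```

In practice `used_fraction` would come from `disk_used_fraction()` on a timer, so no human has to wait for build reports or SSH into the node.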
45 I love the internet.
46
47
48 Test Distribution v3 • Containers record the tests they
ran • Allow flaky tests to be rerun • Ensure no tests are lost
49 Capacity Requirements Scrooge Boot/Shutdown Nodes Agent Instructions Webhooks Pull
Containers Code push webhook
50 Conclusion
51 Don’t build your own CI • Build times <10
minutes • Small application
52 Build your own CI • Build times >15 minutes
• Monolithic Application • Parallelization Limits
53 Lessons Learned • Commit 100% • Beware of Rabbit
holes • Pets vs. Cattle
54 Thanks! Follow me on Twitter: @EmilStolarsky
55 Credits • Image of shipping containers: https://goo.gl/bXCn1X, https://goo.gl/cDDnYy •
Images of Google DCs: https://goo.gl/UHVRc • Image of bank vault: https://goo.gl/fFN5EJ • Locutus: http://goo.gl/UyoJxx • Warehouse: https://goo.gl/5DiiR1 • Egyptian Temple: https://goo.gl/GjbLcq • Star wars: http://goo.gl/474wYG • Sinking container ship: http://goo.gl/U7rdR8, http://goo.gl/wlzlrm • Cats: http://goo.gl/9p2JXo, https://goo.gl/Ylhl60 • Cattle: http://goo.gl/IBdXmx • Star Wars: http://goo.gl/LatPEj • Creative Commons License: https://goo.gl/sZ7V7x