Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How Square Stays Up
Search
pui
July 18, 2012
Technology
3
280
How Square Stays Up
I talk I gave on the tools and processes Square uses to stay stable and available
pui
July 18, 2012
Tweet
Share
Other Decks in Technology
See All in Technology
はじめての転職講座/The Guide of First Career Change
kwappa
5
4.3k
ユーザー課題を愛し抜く――AI時代のPdM価値
kakehashi
PRO
1
130
AI関数が早くなったので試してみよう
kumakura
0
330
Amazon Qで2Dゲームを作成してみた
siromi
0
160
生成AIによるソフトウェア開発の収束地点 - Hack Fes 2025
vaaaaanquish
34
15k
UDDのススメ - 拡張版 -
maguroalternative
1
600
プロジェクトマネジメントは不確実性との対話だ
hisashiwatanabe
0
100
GISエンジニアよ 現場に行け!
sudataka
1
140
全員が手を動かす組織へ - 生成AIが変えるTVerの開発現場 / everyone-codes-genai-transforms-tver-development
tohae
0
230
LLM 機能を支える Langfuse / ClickHouse のサーバレス化
yuu26
9
2.5k
九州の人に知ってもらいたいGISスポット / gis spot in kyushu 2025
sakaik
0
180
Autonomous Database Serverless 技術詳細 / adb-s_technical_detail_jp
oracle4engineer
PRO
18
52k
Featured
See All Featured
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
460
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Docker and Python
trallard
45
3.5k
Building Applications with DynamoDB
mza
96
6.5k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.3k
The World Runs on Bad Software
bkeepers
PRO
70
11k
Making Projects Easy
brettharned
117
6.3k
Faster Mobile Websites
deanohume
309
31k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
33
2.4k
A designer walks into a library…
pauljervisheath
207
24k
Transcript
How Square Stays Up Tools and Processes Square Uses to
Maintain Stability and Availability
@pui_ling Erica Kwan
1 2 3 4 Developing Deploying Monitoring On-calling
Developing 1
We pair program (sometimes)
We solo, then get a code review (other times)
Why?
PCI Compliance Read all about it: http://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard
It is also good practice
git checkout -b topic-branch do work* git checkout master git
merge --no-ff topic-branch
A clean commit history helps
A super good git workflow: http://sandofsky.com/blog/git-workflow.html
git rebase --interactive
git rebase protip: config rebase.autosquash = true
git commit -m “squash! Monkeys”
pick 8374d8e Monkeys squash 8374d8e squash! Monkeys pick 259a7e6 Better
monkeys
Deploying 2
We deploy lots
but there are processes around deploys
Some history
We do canary deploys
None
Our full deploys do rolling restarts
And automatically run integration tests
Monitoring 3
We use common monitoring tools
We have application level checks
We have custom metrics dashboards
Graphite (whisper) + Cubism.js http://square.github.com/cubism/ http://d3js.org/ More info:
Horizon Graph http://vis.berkeley.edu/papers/horizon/
None
On-Calling 4
Engineers are responsible for their work
Ad-hoc at first
First real on-call rotations were simple
Original escalation path:
Engineer 1
Engineer 2
@jack
General on-call could not be responsible for everything
Now, every engineering team has an on-call rotation
Process is still evolving
Do these 4 things well all the time
@pui_ling /pui