Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How Square Stays Up
Search
pui
July 18, 2012
Technology
3
280
How Square Stays Up
I talk I gave on the tools and processes Square uses to stay stable and available
pui
July 18, 2012
Tweet
Share
Other Decks in Technology
See All in Technology
小さく始めるBCP ― 多プロダクト環境で始める最初の一歩
kekke_n
1
380
Bill One急成長の舞台裏 開発組織が直面した失敗と教訓
sansantech
PRO
2
340
SREじゃなかった僕らがenablingを通じて「SRE実践者」になるまでのリアル / SRE Kaigi 2026
aeonpeople
6
2.2k
Kiro IDEのドキュメントを全部読んだので地味だけどちょっと嬉しい機能を紹介する
khmoryz
0
180
We Built for Predictability; The Workloads Didn’t Care
stahnma
0
140
クレジットカード決済基盤を支えるSRE - 厳格な監査とSRE運用の両立 (SRE Kaigi 2026)
capytan
6
2.7k
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
3.8k
Tebiki Engineering Team Deck
tebiki
0
24k
10Xにおける品質保証活動の全体像と改善 #no_more_wait_for_test
nihonbuson
PRO
2
230
AI駆動PjMの理想像 と現在地 -実践例を添えて-
masahiro_okamura
1
110
usermode linux without MMU - fosdem2026 kernel devroom
thehajime
0
230
Data Hubグループ 紹介資料
sansan33
PRO
0
2.7k
Featured
See All Featured
How to build an LLM SEO readiness audit: a practical framework
nmsamuel
1
640
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
51
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
350
Agile that works and the tools we love
rasmusluckow
331
21k
From π to Pie charts
rasagy
0
120
Leading Effective Engineering Teams in the AI Era
addyosmani
9
1.6k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
2.1k
HU Berlin: Industrial-Strength Natural Language Processing with spaCy and Prodigy
inesmontani
PRO
0
210
Navigating Weather and Climate Data
rabernat
0
100
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.2k
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
120
Transcript
How Square Stays Up Tools and Processes Square Uses to
Maintain Stability and Availability
@pui_ling Erica Kwan
1 2 3 4 Developing Deploying Monitoring On-calling
Developing 1
We pair program (sometimes)
We solo, then get a code review (other times)
Why?
PCI Compliance Read all about it: http://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard
It is also good practice
git checkout -b topic-branch do work* git checkout master git
merge --no-ff topic-branch
A clean commit history helps
A super good git workflow: http://sandofsky.com/blog/git-workflow.html
git rebase --interactive
git rebase protip: config rebase.autosquash = true
git commit -m “squash! Monkeys”
pick 8374d8e Monkeys squash 8374d8e squash! Monkeys pick 259a7e6 Better
monkeys
Deploying 2
We deploy lots
but there are processes around deploys
Some history
We do canary deploys
None
Our full deploys do rolling restarts
And automatically run integration tests
Monitoring 3
We use common monitoring tools
We have application level checks
We have custom metrics dashboards
Graphite (whisper) + Cubism.js http://square.github.com/cubism/ http://d3js.org/ More info:
Horizon Graph http://vis.berkeley.edu/papers/horizon/
None
On-Calling 4
Engineers are responsible for their work
Ad-hoc at first
First real on-call rotations were simple
Original escalation path:
Engineer 1
Engineer 2
@jack
General on-call could not be responsible for everything
Now, every engineering team has an on-call rotation
Process is still evolving
Do these 4 things well all the time
@pui_ling /pui