Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How Square Stays Up
Search
pui
July 18, 2012
Technology
3
280
How Square Stays Up
I talk I gave on the tools and processes Square uses to stay stable and available
pui
July 18, 2012
Tweet
Share
Other Decks in Technology
See All in Technology
CZII - CryoET Object Identification 参加振り返り・解法共有
tattaka
0
240
開発者が自律的に AWS Security Hub findings に 対応する仕組みと AWS re:Invent 2024 登壇体験談 / Developers autonomously report AWS Security Hub findings Corresponding mechanism and AWS re:Invent 2024 presentation experience
kaminashi
0
190
リアルタイム分析データベースで実現する SQLベースのオブザーバビリティ
mikimatsumoto
0
950
まだ間に合う! エンジニアのための生成AIアプリ開発入門 on AWS
minorun365
PRO
4
580
ビジネスと現場活動をつなぐソフトウェアエンジニアリング~とあるスタートアッププロダクトの成長記録より~
mizunori
0
210
AWSでRAGを実現する上で感じた3つの大事なこと
ymae
3
1k
テストアーキテクチャ設計で実現する高品質で高スピードな開発の実践 / Test Architecture Design in Practice
ropqa
3
710
Building Products in the LLM Era
ymatsuwitter
10
4.4k
Classmethod AI Talks(CATs) #15 司会進行スライド(2025.02.06) / classmethod-ai-talks-aka-cats_moderator-slides_vol15_2025-02-06
shinyaa31
0
170
Culture Deck
optfit
0
330
【Developers Summit 2025】プロダクトエンジニアから学ぶ、 ユーザーにより高い価値を届ける技術
niwatakeru
2
890
Datadog APM におけるトレース収集の流れ及び Retention Filters のはなし / datadog-apm-trace-retention-filters
k6s4i53rx
0
320
Featured
See All Featured
Testing 201, or: Great Expectations
jmmastey
41
7.2k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
The Invisible Side of Design
smashingmag
299
50k
Navigating Team Friction
lara
183
15k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
366
25k
Thoughts on Productivity
jonyablonski
69
4.5k
Making the Leap to Tech Lead
cromwellryan
133
9.1k
RailsConf 2023
tenderlove
29
1k
How GitHub (no longer) Works
holman
313
140k
Producing Creativity
orderedlist
PRO
343
39k
The Cost Of JavaScript in 2023
addyosmani
47
7.3k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
29
4.6k
Transcript
How Square Stays Up Tools and Processes Square Uses to
Maintain Stability and Availability
@pui_ling Erica Kwan
1 2 3 4 Developing Deploying Monitoring On-calling
Developing 1
We pair program (sometimes)
We solo, then get a code review (other times)
Why?
PCI Compliance Read all about it: http://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard
It is also good practice
git checkout -b topic-branch do work* git checkout master git
merge --no-ff topic-branch
A clean commit history helps
A super good git workflow: http://sandofsky.com/blog/git-workflow.html
git rebase --interactive
git rebase protip: config rebase.autosquash = true
git commit -m “squash! Monkeys”
pick 8374d8e Monkeys squash 8374d8e squash! Monkeys pick 259a7e6 Better
monkeys
Deploying 2
We deploy lots
but there are processes around deploys
Some history
We do canary deploys
None
Our full deploys do rolling restarts
And automatically run integration tests
Monitoring 3
We use common monitoring tools
We have application level checks
We have custom metrics dashboards
Graphite (whisper) + Cubism.js http://square.github.com/cubism/ http://d3js.org/ More info:
Horizon Graph http://vis.berkeley.edu/papers/horizon/
None
On-Calling 4
Engineers are responsible for their work
Ad-hoc at first
First real on-call rotations were simple
Original escalation path:
Engineer 1
Engineer 2
@jack
General on-call could not be responsible for everything
Now, every engineering team has an on-call rotation
Process is still evolving
Do these 4 things well all the time
@pui_ling /pui