Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How Square Stays Up
Search
pui
July 18, 2012
Technology
3
280
How Square Stays Up
I talk I gave on the tools and processes Square uses to stay stable and available
pui
July 18, 2012
Tweet
Share
Other Decks in Technology
See All in Technology
Hyperledger Fabric(再)入門
gakumura
3
6.6k
CysharpのOSS群から見るModern C#の現在地
neuecc
2
4.2k
New Relicを活用したSREの最初のステップ / NRUG OKINAWA VOL.3
isaoshimizu
3
730
Android 15 でウィジェットピッカーのプレビュー画像をGlanceで魅せたい/nikkei-tech-talk-27-1
nikkei_engineer_recruiting
0
110
DynamoDB でスロットリングが発生したとき/when_throttling_occurs_in_dynamodb_short
emiki
0
320
OS 標準のデザインシステムを超えて - より柔軟な Flutter テーマ管理 | FlutterKaigi 2024
ronnnnn
1
430
20241120_JAWS_東京_ランチタイムLT#17_AWS認定全冠の先へ
tsumita
2
340
複雑なState管理からの脱却
sansantech
PRO
1
190
【Startup CTO of the Year 2024 / Audience Award】アセンド取締役CTO 丹羽健
niwatakeru
0
2.2k
SREが投資するAIOps ~ペアーズにおけるLLM for Developerへの取り組み~
takumiogawa
5
1.3k
日経電子版のStoreKit2フルリニューアル
shimastripe
3
180
LLMの気持ちになってRAGのことを考えてみよう
john_smith
0
160
Featured
See All Featured
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
329
21k
Teambox: Starting and Learning
jrom
133
8.8k
Thoughts on Productivity
jonyablonski
67
4.3k
Put a Button on it: Removing Barriers to Going Fast.
kastner
59
3.5k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
246
1.3M
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
131
33k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
8
1.1k
Rebuilding a faster, lazier Slack
samanthasiow
79
8.7k
What's in a price? How to price your products and services
michaelherold
243
12k
No one is an island. Learnings from fostering a developers community.
thoeni
19
3k
Why You Should Never Use an ORM
jnunemaker
PRO
54
9.1k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
229
52k
Transcript
How Square Stays Up Tools and Processes Square Uses to
Maintain Stability and Availability
@pui_ling Erica Kwan
1 2 3 4 Developing Deploying Monitoring On-calling
Developing 1
We pair program (sometimes)
We solo, then get a code review (other times)
Why?
PCI Compliance Read all about it: http://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard
It is also good practice
git checkout -b topic-branch do work* git checkout master git
merge --no-ff topic-branch
A clean commit history helps
A super good git workflow: http://sandofsky.com/blog/git-workflow.html
git rebase --interactive
git rebase protip: config rebase.autosquash = true
git commit -m “squash! Monkeys”
pick 8374d8e Monkeys squash 8374d8e squash! Monkeys pick 259a7e6 Better
monkeys
Deploying 2
We deploy lots
but there are processes around deploys
Some history
We do canary deploys
None
Our full deploys do rolling restarts
And automatically run integration tests
Monitoring 3
We use common monitoring tools
We have application level checks
We have custom metrics dashboards
Graphite (whisper) + Cubism.js http://square.github.com/cubism/ http://d3js.org/ More info:
Horizon Graph http://vis.berkeley.edu/papers/horizon/
None
On-Calling 4
Engineers are responsible for their work
Ad-hoc at first
First real on-call rotations were simple
Original escalation path:
Engineer 1
Engineer 2
@jack
General on-call could not be responsible for everything
Now, every engineering team has an on-call rotation
Process is still evolving
Do these 4 things well all the time
@pui_ling /pui