Slide 1

Slide 1 text

Scaling Deployment at Etsy
Daniel Schauenberg
@mrtazz
Thursday, October 10, 2013

Slide 2

Slide 2 text


Slide 3

Slide 3 text

August 2013
• 1.8 billion page views
• 5,483,399 items sold
• $109.1 million of goods sold
• > 30 million members
• > 1 million active shops
http://www.etsy.com/blog/news/2013/etsy-statistics-august-2013-weather-report/
Items by RockerDollJewellery, ZulamimiLand, codice, 42Things

Slide 4

Slide 4 text

LAMMP
Item by TheBackPackShoppe

Slide 5

Slide 5 text

Monolithic App
Item by FrankelPhotos

Slide 6

Slide 6 text

No Branching
Item by NurseryWallArt

Slide 7

Slide 7 text

Deploy Frequency

Slide 8

Slide 8 text

First Day
Item by flowersandfleurons

Slide 9

Slide 9 text


Slide 10

Slide 10 text

IRC

Slide 11

Slide 11 text


Slide 12

Slide 12 text

Developer VMs
• KVM
• Dev version of full Etsy stack
• Chef
• DevTools

Slide 13

Slide 13 text


Slide 14

Slide 14 text

Item by codecards

Slide 15

Slide 15 text

% review -r dschauenberg

Slide 16

Slide 16 text

automatically assigned

Slide 17

Slide 17 text

Try
Item by CSSDesign

Slide 18

Slide 18 text

Actually ...

Slide 19

Slide 19 text

The Bobs
Item by Signz

Slide 20

Slide 20 text

The Bobs
• LXC containers on buildtests
• Multiple SSDs
• Labels for heavy/any execution
• One heavy executor per disk

Slide 21

Slide 21 text

CI/Try
• ~260 Bobs
• Mostly for try
• Constant monitoring for slow tests

Slide 22

Slide 22 text

push train
Item by decomodwalls

Slide 23

Slide 23 text

#push
• IRC channel to organize push trains
• Join a train if you want to deploy changes
• Schedule is planned via the channel topic
• First in the train is the driver (controls the deploy)
• Opening hours: 7am - 10pm NYC time

Slide 24

Slide 24 text

#push
kseever* + jameslee | jpaul | DanielConvissor (c)

Slide 25

Slide 25 text

#push
bateman* + krunal* + enorris* | tristan (c) + jameslee (c) + jlaster (c) | dawa + corey + sandosh + jklein + magera + seth_home + mpascual + nathan | bateman | russp (c)
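
The topic strings on these two slides follow a visible pattern: trains are separated by `|`, riders within a train by `+`, and some names carry a trailing `*` or `(c)`. A minimal Python sketch of reading such a topic follows. The slides do not say what `*` and `(c)` mean, so the sketch only records them as flags; `Rider` and `parse_topic` are hypothetical names for illustration, not part of pushbot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rider:
    name: str
    starred: bool  # name was followed by "*" in the topic (meaning not stated on the slides)
    config: bool   # name was followed by "(c)" in the topic (meaning not stated on the slides)


def parse_topic(topic: str) -> List[List[Rider]]:
    """Split a #push topic into trains ("|") and riders ("+")."""
    trains = []
    for train in topic.split("|"):
        riders = []
        for entry in train.split("+"):
            entry = entry.strip()
            if not entry:
                continue
            config = entry.endswith("(c)")
            if config:
                entry = entry[:-3].strip()
            starred = entry.endswith("*")
            if starred:
                entry = entry[:-1].strip()
            riders.append(Rider(entry, starred, config))
        if riders:
            trains.append(riders)
    return trains


trains = parse_topic("kseever* + jameslee | jpaul | DanielConvissor (c)")
# Per the #push slide, the first rider in the first train is the driver.
driver = trains[0][0].name
```

Here `trains` holds three trains, and `driver` is the first name in the first one.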

Slide 26

Slide 26 text

pushbot
• .join
• .in
• .good
• .done

Slide 27

Slide 27 text

pushbot

Slide 28

Slide 28 text

Item by EsalonPhotography

Slide 29

Slide 29 text

Deployinator

Slide 30

Slide 30 text


Slide 31

Slide 31 text


Slide 32

Slide 32 text

stale commits

Slide 33

Slide 33 text

version checks
buttons disabled

Slide 34

Slide 34 text

version checks

Slide 35

Slide 35 text

lock down deploys

Slide 36

Slide 36 text

https://github.com/etsy/deployinator

Slide 37

Slide 37 text

Downsides
• Deploys not atomic on the request level
• Limbo during the time of the local rsync
• Common strategy was to split commits into 3 deploys

Slide 38

Slide 38 text

Item by Geographicsart

Slide 39

Slide 39 text

Atomic Deploys

Slide 40

Slide 40 text

Basic Idea
[Diagram: Yin, Yang, Active Docroot]

Slide 41

Slide 41 text

Basic Idea
[Diagram: Yin, Yang, Active Docroot, rsync]

Slide 42

Slide 42 text

Basic Idea
[Diagram: Yin, Yang, Active Docroot]

Slide 43

Slide 43 text

Basic Idea
[Diagram: Yin, Yang, Active Docroot]

Slide 44

Slide 44 text

Problems
• Symlink swap during requests
• Code needs to be guaranteed to finish on the docroot it started
• Code inclusion mid request

Slide 45

Slide 45 text

etsy/mod_realdoc
• Apache post_read_request hook
• Whole request works on realpath of docroot
• Caches realpath for 2s
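
The "caches realpath for 2s" behaviour can be modelled in a few lines. The sketch below is a Python illustration of that one idea only (resolve the symlinked docroot once, then reuse the answer until a time-to-live expires). It is not mod_realdoc's code, which is an Apache module; `RealpathCache` and its `clock` parameter are hypothetical names introduced here.

```python
import os
import time


class RealpathCache:
    """Resolve a symlinked path, reusing the result for ttl_seconds."""

    def __init__(self, ttl_seconds=2.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry can be tested
        self._cache = {}     # path -> (resolved_path, expires_at)

    def resolve(self, path):
        now = self.clock()
        hit = self._cache.get(path)
        if hit is not None and now < hit[1]:
            return hit[0]    # still fresh: no filesystem call
        resolved = os.path.realpath(path)
        self._cache[path] = (resolved, now + self.ttl)
        return resolved
```

A request would call `resolve()` once at its start and use the returned real path throughout, so a symlink swap mid-request does not change the directory it works in. The cost of the cache is that a swap can take up to the TTL to become visible to new requests.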

Slide 46

Slide 46 text

ini_set('include_path', $_SERVER['DOCUMENT_ROOT'].'/../include');

Slide 47

Slide 47 text

etsy/incpath
• PHP module to set the incpath
• Gets docroot from Apache or realpath() itself
• Looks for a pattern to replace in include_path
• Restores include_path at the end of the request
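
The replace-then-restore cycle in the last two bullets can be sketched as follows. This is a Python model of the idea, not the incpath extension itself. The slide does not name the pattern that gets replaced, so `PLACEHOLDER` is an invented marker, and `request_include_path` and the `settings` dict are likewise hypothetical.

```python
import os
from contextlib import contextmanager

PLACEHOLDER = "{DOCROOT}"  # hypothetical marker; the slide does not name the real pattern


@contextmanager
def request_include_path(settings, docroot):
    """Swap the placeholder for the resolved docroot for one request, then restore."""
    original = settings["include_path"]
    settings["include_path"] = original.replace(PLACEHOLDER, os.path.realpath(docroot))
    try:
        yield settings
    finally:
        # Restore the unexpanded value so the next request resolves it afresh.
        settings["include_path"] = original
```

Restoring the original value matters because the next request may run after a deploy has flipped the docroot, so the placeholder has to be expanded again rather than reused.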

Slide 48

Slide 48 text

What did we get?
• Remove functions and call site in same deploy
• No restarts necessary
• Opcode caches stay warm for files that don’t change between 2 deploys

Slide 49

Slide 49 text

Things to watch out for
• Code that uses full path names to scripts
• Atomic symlink swapping with `mv -T`
• Realpath caching to not stress the filesystem
• Opcode cache needs to fit 2x code size
• Only request atomicity
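
The `mv -T` bullet refers to replacing the docroot symlink in a single rename instead of deleting and recreating it. A minimal Python sketch of the same technique, assuming POSIX rename semantics: build the new link under a temporary name, then rename it over the live one. `swap_symlink` and the `.tmp` suffix are illustrative choices, not Etsy's tooling.

```python
import os


def swap_symlink(link_path, new_target):
    """Point link_path at new_target without a moment where the link is missing."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)               # clear leftovers from an interrupted swap
    os.symlink(new_target, tmp)      # create the new link beside the live one
    os.replace(tmp, link_path)       # rename over the old link in one step
```

The rename acts on the link itself rather than on the directory it points to, which is the behaviour `-T` asks of `mv`. Without it, `mv` would move the temporary link into the directory the existing link points to.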

Slide 50

Slide 50 text

The Plateau
Item by finandfancy

Slide 51

Slide 51 text

The Plateau
• Regular deploys took ~15 mins
• Config deploys about half
• 10am - 6pm => ~ 32 deploys
• Long waiting times
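
The ~32 figure follows directly from the other two numbers on the slide: an 8-hour window divided by 15 minutes per deploy. A short Python sketch of that arithmetic, where `deploys_per_window` is a hypothetical helper and the halved case is an illustration rather than a figure from the talk:

```python
def deploys_per_window(start_hour, end_hour, minutes_per_deploy):
    """Upper bound on back-to-back deploys that fit in a daily window."""
    window_minutes = (end_hour - start_hour) * 60
    return int(window_minutes // minutes_per_deploy)


regular = deploys_per_window(10, 18, 15)   # figures from the slide
halved = deploys_per_window(10, 18, 7.5)   # hypothetical: deploy time cut in half
```

With a single queue, this ceiling is shared by everyone who wants to deploy that day, which is where the long waiting times come from.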

Slide 52

Slide 52 text

Item by KlaireWarren

Slide 53

Slide 53 text

Waiting for push queue

Slide 54

Slide 54 text

Split The Queues
Item by KlaireWarren

Slide 55

Slide 55 text

HELLO SPLIT QUEUES

Slide 56

Slide 56 text

Dashboards
deploy lines

Slide 57

Slide 57 text

Supergrep

Slide 58

Slide 58 text


Slide 59

Slide 59 text

Summary
• Current setup has scaled to ~150 people
• Constantly trying to improve the speed of deployment
• Find weak parts in the process and make them more robust/faster
• Bring Dev closer to Prod
• Not being able to deploy has the same status as the site being down

Slide 60

Slide 60 text

codeascraft.etsy.com
www.etsy.com/codeascraft/talks
etsy.github.com
www.etsy.com/careers

Slide 61

Slide 61 text

Questions?

Slide 62

Slide 62 text

Scaling Deployment at Etsy
Daniel Schauenberg
@mrtazz