Slide 1

Slide 1 text

How Gogobot Works?
Avi Tzurel
Kenso
Monday, November 12, 12

Slide 2

Slide 2 text

Who Am I?
http://twitter.com/kensodev
http://github.com/kensodev
http://avi.io
http://kensodev.com

Slide 3

Slide 3 text

I Scale Gogobot

Slide 4

Slide 4 text

What is Gogobot?

Slide 5

Slide 5 text

Social Recommendation Engine For Travel

Slide 6

Slide 6 text

Personalized Recommendations From Friends, not strangers

Slide 7

Slide 7 text

The Gogobot Architecture

Slide 8

Slide 8 text

How do we make shit work?

Slide 9

Slide 9 text

CDN + Reverse Proxy

Slide 10

Slide 10 text

CDN
• Edge servers across the world (hundreds)
• Caches static assets
  • CSS, JS, Images
• Caches full pages (with smart expire API)

Slide 11

Slide 11 text

Reverse Proxy
• Super fast connection from edges means users get a local experience, with minimal latency
• Cache miss? Get the page from the load balancer
• Logged out traffic almost never hits or affects logged in traffic
• Scale differently, control differently
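
A minimal sketch of the cache-miss path and the expire call mentioned on the CDN and Reverse Proxy slides. It assumes an in-process dictionary stands in for an edge server and a plain function stands in for the load balancer. `EdgeCache`, `fetch_from_origin`, `expire`, and the 60-second TTL are hypothetical names and values chosen for illustration, not Gogobot's code or any CDN vendor's API.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy full-page cache: serve hits locally, fetch misses from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl_seconds: float = 60.0) -> None:
        self._fetch = fetch_from_origin          # stand-in for the load balancer
        self._ttl = ttl_seconds                  # assumed TTL, not from the talk
        self._pages: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        entry = self._pages.get(path)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                      # cache hit: origin is not touched
        body = self._fetch(path)                 # cache miss: go to the origin
        self._pages[path] = (time.monotonic() + self._ttl, body)
        return body

    def expire(self, path: str) -> None:
        """Drop a page early, e.g. after the underlying content changed."""
        self._pages.pop(path, None)


origin_hits = []


def origin(path: str) -> str:
    origin_hits.append(path)
    return f"<html>{path}</html>"


edge = EdgeCache(origin)
first = edge.get("/paris")    # miss, fetched from the origin
second = edge.get("/paris")   # hit, served from the cache
edge.expire("/paris")         # explicit expiry
third = edge.get("/paris")    # miss again, fetched from the origin
```

Only the first and third requests reach the origin function, which is the property that keeps cached (mostly logged out) traffic away from the application servers.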

Slide 12

Slide 12 text

CDN + Reverse Proxy

Slide 13

Slide 13 text

CDN + Reverse Proxy Front End Back End

Slide 14

Slide 14 text

Front End
• Serves user facing content
• Communicates with cache layer and the DB
• No heavy lifting, respond to the user as fast as possible

Slide 15

Slide 15 text

Back End
• Serves realtime services (Facebook, Twitter)
• Heavy lifting thrown at it from the Front End Machines
• Hosts workers for the queue service
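
A minimal sketch of the split described on the Front End and Back End slides: the front end enqueues slow work and answers immediately, and a back end worker drains the queue. It assumes Python's in-process `queue.Queue` as a stand-in for the real queue service (which, per the Redis slide, is hosted on Redis). `handle_request` and the `import_friends` job name are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for the Redis-backed queue service
results = []


def handle_request(user_id: int) -> str:
    """Front end: hand the heavy lifting to the queue and respond right away."""
    jobs.put(("import_friends", user_id))
    return "202 Accepted"


def worker() -> None:
    """Back end: pull jobs off the queue and do the slow work."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel value: stop the worker
            jobs.task_done()
            break
        name, user_id = job
        results.append(f"{name} done for user {user_id}")
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = handle_request(42)   # returns before the job has to finish
jobs.put(None)                # ask the worker to stop after pending jobs
jobs.join()                   # wait until every queued item is processed
```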

Slide 16

Slide 16 text

CDN + Reverse Proxy Front End Back End

Slide 17

Slide 17 text

Front End Back End Cache

Slide 18

Slide 18 text

Cache
• Memcached cluster
• Memcached 1.4+
• 6+ machines
• 4.5 - 15K operations per second
• Hosted on Amazon ElastiCache
• Solved tons of problems with memcached dying

Slide 19

Slide 19 text

Front End Back End Cache

Slide 20

Slide 20 text

Cache MySQL MongoDB Redis SOLR

Slide 21

Slide 21 text

MySQL
• Master + 2 Slaves
• 64G memory for each machine with 400G storage
• Hourly backups
• Used as the main persistence layer for the site
• Reads go to the slaves, writes go to the master
• Logged out traffic never has access to the master
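
A minimal sketch of the read/write split described on the MySQL slide, assuming a hypothetical `ConnectionRouter` that returns a server name rather than a real database connection. The round-robin choice between slaves and the `startswith("select")` check are simplifications made for the example, not details from the talk.

```python
import itertools
from typing import List


class ConnectionRouter:
    """Send reads to the slaves and writes to the master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self._master = master
        self._slaves = itertools.cycle(slaves)   # assumed: simple round-robin

    def pick(self, sql: str, logged_in: bool) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read:
            return next(self._slaves)
        if not logged_in:
            # Logged out traffic is never allowed to reach the master.
            raise PermissionError("logged out traffic must not reach the master")
        return self._master


router = ConnectionRouter("master", ["slave-1", "slave-2"])
read_targets = [router.pick("SELECT * FROM places", logged_in=False) for _ in range(3)]
write_target = router.pick("UPDATE places SET name = 'x'", logged_in=True)
```

Keeping logged out requests on the slaves means a spike in anonymous traffic cannot slow down writes for signed-in users.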

Slide 22

Slide 22 text

MySQL
• EBS snapshots are used as backups
• Multi availability zone support on Amazon (1a, 1b, 1c)

Slide 23

Slide 23 text

MongoDB
• 9 shards
• 3 replica sets in each shard
• 16 HD (100G) RAID on each machine
• 64G memory for each machine, so indexes are served out of memory for all queries
• Used for the Graph Engine + Scoring system

Slide 24

Slide 24 text

Redis
• Used for key+value store
• Cache tagging solution on top of memcached
• Queue service is hosted on Redis
• Master / Slave replica
• Different Redis clusters for different things
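
A minimal sketch of what a cache tagging layer on top of memcached can look like: values live in the cache, and a separate index maps each tag to the keys written under it, so one call can expire every key that shares a tag. Plain dictionaries stand in for memcached and for Redis sets here, and `TaggedCache`, `write`, `read`, and `expire_tag` are hypothetical names. The talk does not show the actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    def __init__(self) -> None:
        self._values: Dict[str, str] = {}                          # stand-in for memcached
        self._tag_index: Dict[str, Set[str]] = defaultdict(set)    # stand-in for Redis sets

    def write(self, key: str, value: str, tags: Iterable[str]) -> None:
        self._values[key] = value
        for tag in tags:
            self._tag_index[tag].add(key)      # remember which keys carry this tag

    def read(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def expire_tag(self, tag: str) -> None:
        """Delete every cached value that was written with this tag."""
        for key in self._tag_index.pop(tag, set()):
            self._values.pop(key, None)


cache = TaggedCache()
cache.write("page:/paris", "<html>Paris</html>", tags=["city:paris"])
cache.write("widget:paris-hotels", "<ul>...</ul>", tags=["city:paris", "hotels"])
cache.write("page:/rome", "<html>Rome</html>", tags=["city:rome"])

cache.expire_tag("city:paris")   # e.g. after new content about Paris arrives
```

After the call, both Paris entries are gone while the Rome page is still cached, which is the behaviour plain key-based expiry cannot give you without knowing every key up front.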

Slide 25

Slide 25 text

Redis
• Different Redis clusters for each need
  • Indexer
  • Cache tagging
  • Realtime push with Node.js to the client
• When one is down, the others behave normally

Slide 26

Slide 26 text

SOLR
• Search index
• NoSQL (Schema Less)
• Master/Slave
• Slave on each app machine, single master
• Eventual consistency

Slide 27

Slide 27 text

Numbers
• 400m+ graph users
• 10K triggers per user in the scoring system
• Grew ~170X in the last 18 months
• Announced 1m registered users 4 months ago
• Hit 2m registered users 2 weeks after
• ~70K registrations avg per day

Slide 28

Slide 28 text

Numbers
• HUGE growth in a relatively short time

Slide 29

Slide 29 text

99.9% uptime

Slide 30

Slide 30 text

~200ms avg server response time

Slide 31

Slide 31 text

Logged out user gets a page in 30ms

Slide 32

Slide 32 text

How can you manage all of this?

Slide 33

Slide 33 text

Not too technical

Slide 34

Slide 34 text

Embed it in your culture
• Developers should support end users
  • Get Satisfaction for example
• Bugs / Problems must be engaged early and often
• Be completely and utterly open

Slide 35

Slide 35 text

Communication
• Single and agreed line of communication for everything
  • Chatroom throughout the day
  • Pull requests for code review
  • Email for updates
  • SMS notification for urgent stuff

Slide 36

Slide 36 text

No QA!

Slide 37

Slide 37 text

TDD

Slide 38

Slide 38 text

Twitter Driven Deployment

Slide 39

Slide 39 text

Use uTest for manual testing daily

Slide 40

Slide 40 text

Chatroom

Slide 41

Slide 41 text


Slide 42

Slide 42 text

Keep others involved

Slide 43

Slide 43 text

Ask Questions

Slide 44

Slide 44 text

AUTOMATE!

Slide 45

Slide 45 text

Build and Deploy

Slide 46

Slide 46 text


Slide 47

Slide 47 text

the Gbot

Slide 48

Slide 48 text

gbot deploy production

Slide 49

Slide 49 text

What can he do? (if you ask him nicely) Or sudo

Slide 50

Slide 50 text

Gbot
• Deploy anything, anywhere, anytime
• Tell jokes
• Remind us about bugs
• Run custom builds
• Alert on server issues
• Cheer us up

Slide 51

Slide 51 text


Slide 52

Slide 52 text

tweet - Returns a link to a tweet about

Slide 53

Slide 53 text

Monitor

Slide 54

Slide 54 text

Monitor
• Monit
  • CPU, Memory, Server health
• God
  • Process lifecycle

Slide 55

Slide 55 text

Not just Up / Down

Slide 56

Slide 56 text

Measure

Slide 57

Slide 57 text

Measure
• New Relic
• Measure performance of everything
  • Ruby
  • MySQL
  • Memcached
  • External Services

Slide 58

Slide 58 text


Slide 59

Slide 59 text


Slide 60

Slide 60 text

Share!

Slide 61

Slide 61 text

Share internally and externally

Slide 62

Slide 62 text


Slide 63

Slide 63 text

github.com/gogobot/laptop

Slide 64

Slide 64 text

Chef Recipes

Slide 65

Slide 65 text

Sharing
• Share your ideas
• Talk about features, future plans
• Plan together
• Open source your brain

Slide 66

Slide 66 text

Thank you!

Slide 67

Slide 67 text

Questions?