Slide 1

Slide 1 text

Thank you to Jonathan, Duncan, Jonathan, Marije, Jennifer, Lessfuss, and Cape Town. ROCKS! Friday, January 27, 12

Slide 2

Slide 2 text

Mark Phillips, Basho
@pharkmillups
themarkphillips.com
[email protected]

Slide 3

Slide 3 text

Building Healthy Distributed Systems
ScaleConf, January 27, 2012
Basho

Slide 4

Slide 4 text

What is a distributed system?
“A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal.” [1]

Slide 5

Slide 5 text

Distributed, Scalable, Fault Tolerant
No central coordinator; easy to set up and operate

Slide 6

Slide 6 text

Distributed, Scalable, Fault Tolerant
Horizontally scalable; add commodity hardware to get more [throughput | processing | storage].

Slide 7

Slide 7 text

Distributed, Scalable, Fault Tolerant
• Always Available
• No Single Point of Failure
• Self-healing

Slide 8

Slide 8 text


Slide 9

Slide 9 text

Basho
• Founded in 2007
• Collapsed in 2008
• “Pivoted” in 2009
• Commercial Sponsors of Riak, an Open Source, NoSQL Database
• Sells Closed Source Extensions to Riak in the form of licenses

Slide 10

Slide 10 text

Year on Year Growth
[Chart: years 2009, 2010, 2011; values as extracted: 14, 60, 25]

Slide 11

Slide 11 text

Basho Office Locations

Slide 12

Slide 12 text

Actual Employee Distribution

Slide 13

Slide 13 text

What is a distributed [company]?
“A distributed [company] consists of multiple autonomous [team members] that communicate [and collaborate] through various [channels]. The [team members] interact with each other in order to achieve a common goal.”

Slide 14

Slide 14 text

Hiring where the talent is means we don’t sacrifice great hires for location, but it also presents various hurdles when attempting to build culture and community.

Slide 15

Slide 15 text

Common Goals for Basho
1. Make Basho into a Powerhouse
2. Professional Development
3. Employee Happiness
4. Deliver Exceptional Product

Slide 16

Slide 16 text

Internal Communication and Collaboration
• Real-time Chat (Jabber, Campfire)
• Skype (or some form of video chat)
• Yammer
• GitHub
• AgileZen
• Email (sort of)
• Documentation

Slide 17

Slide 17 text

Good Meetings
• Quarterly In-person “Summits”
• Bi-Monthly, Non-Mandatory Company All Hands
• Stand-ups, Scrum

Slide 18

Slide 18 text

Make Documentation Part of Your Culture
• Inside Jokes
• Internal Talks
• Design Documents
• Product Ideas
• Product Feedback
• New Hire Processes
• Everything Else

Slide 19

Slide 19 text

Open Source Your Code. And Use GitHub.
• Contributes Directly to Developer Happiness
• Makes Your Company’s Product Better
• Great Marketing
• Use a Permissive License
“Open Source Almost Everything”; “Why Your Company Should Have a Permissive Open Source Policy”
(http://bit.ly/clJyDO) (http://bit.ly/v3OMEf)

Slide 20

Slide 20 text


Slide 21

Slide 21 text

Hiring Should Not Happen In A Vacuum

Slide 22

Slide 22 text

Poor Culture Rots a Company from within and Lessens its Resiliency

Slide 23

Slide 23 text

Company Fault Tolerance
• New CEO + Massive Growth = New Challenges
• Our System is Constantly Improving

Slide 24

Slide 24 text

Planned Growth (2012): 1**

Slide 25

Slide 25 text

DS2: The Riak Community

Slide 26

Slide 26 text

What is a distributed [community]?
“A distributed [community] consists of multiple autonomous [members] that communicate [and collaborate] through various [channels]. The [members] interact with each other in order to achieve a common goal.”

Slide 27

Slide 27 text

Community

Slide 28

Slide 28 text

Why Build A Community?

Slide 29

Slide 29 text

Grassroots Marketing, Branding, Awareness:

Slide 30

Slide 30 text

Code Contributions and Bug Fixes:
• 176 names in our THANKS file
• 1600 hours contributed from Oct 2010 - Sept 2011

Slide 31

Slide 31 text

Support:

Slide 32

Slide 32 text

Revenue: 75% of new customers in 2011 came from the Open Source Community

Slide 33

Slide 33 text

Importance of Community for Community Members
• Working, Quality Code
• Recognition and Praise
• Desire to Contribute
• Jobs (whether they like it or not)
• Skills Acquisition

Slide 34

Slide 34 text

Communication and Collaboration in a Distributed [Community]
• IRC
• Mailing List
• Twitter
• Riak Recap
• Meetups
• Q & A Sites
• Blogs
• Books
• Conferences
• Actual Meetings
• GitHub
• Drinking

Slide 35

Slide 35 text

Riak Recap

Slide 36

Slide 36 text

Books
http://riakhandbook.com/

Slide 37

Slide 37 text

Meetups and Drinking

Slide 38

Slide 38 text

GitHub

Slide 39

Slide 39 text

Give Things Away

Slide 40

Slide 40 text

Build Communities Regardless

Slide 41

Slide 41 text

Community Fault Tolerance

Slide 42

Slide 42 text

DS3: Riak-based Distributed System

Slide 43

Slide 43 text

What is a distributed system?
“A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal.”

Slide 44

Slide 44 text

Riak is:
• a database
• a key/value store
• distributed
• fault-tolerant
• scalable
• Dynamo-inspired
• used by startups
• used by FORTUNE 100 companies
• written (primarily) in Erlang
• pronounced “REE-awk”
• not the right fit for every project and app
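To make "distributed" and "Dynamo-inspired" a little more concrete, here is a minimal sketch of how a Dynamo-style key/value store can place keys on a hash ring without a central coordinator. It is an illustration only, not Riak's actual implementation: the `HashRing` class, the round-robin partition assignment, the `bucket/key` string encoding, and the node names are all assumptions made for this example.

```python
import hashlib


class HashRing:
    """Toy Dynamo-style ring: fixed partitions, each owned by one node."""

    def __init__(self, nodes, partitions=64, replicas=3):
        self.ring_size = 2 ** 160          # size of the SHA-1 output space
        self.partitions = partitions       # ring is cut into equal slices
        self.replicas = replicas           # copies stored per key
        # Assumption: partitions are handed out round-robin. Real systems
        # use a smarter claim algorithm to balance and spread replicas.
        self.owners = [nodes[i % len(nodes)] for i in range(partitions)]

    def partition_for(self, bucket, key):
        """Hash bucket/key onto the ring and return its partition index."""
        digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
        position = int.from_bytes(digest, "big")
        return position * self.partitions // self.ring_size

    def preference_list(self, bucket, key):
        """Nodes owning the key's partition and the next replicas - 1 partitions."""
        start = self.partition_for(bucket, key)
        return [
            self.owners[(start + i) % self.partitions]
            for i in range(self.replicas)
        ]


# Hypothetical five-node cluster.
ring = HashRing(["node1", "node2", "node3", "node4", "node5"])
print(ring.preference_list("users", "mark"))
```

Because every node can compute the same preference list from the key alone, any node can route a request, which is one way to avoid a single point of failure.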

Slide 45

Slide 45 text

1000s of Deployments

Slide 46

Slide 46 text


Slide 47

Slide 47 text

Common Goals for Voxer’s System
1. Serve and Receive App Traffic
2. Perform Queries When Needed
3. Don’t Go Down
4. Scale Out to Meet Demand
5. Low, Consistent Response Times

Slide 48

Slide 48 text

Voxer’s Initial Riak Cluster Stats (Oct 2011)
• 11 Riak Nodes
• Modest Data Set Size (100s of GB)
• ~20,000 Peak Concurrent Users
• ~4,000,000 Daily Total Requests
Then something happened...
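As a rough back-of-the-envelope check on these figures (a sketch, not Voxer's own math), the daily request total can be turned into an average rate. The function names are made up for this example, and the assumption that requests are spread evenly over the day and across nodes is a simplification; real traffic peaks well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(daily_requests: int) -> float:
    """Average rate, assuming load is spread evenly across the day."""
    return daily_requests / SECONDS_PER_DAY


def average_per_node(daily_requests: int, nodes: int) -> float:
    """Average rate per node, assuming load is spread evenly across nodes."""
    return average_requests_per_second(daily_requests) / nodes


# Figures from the slide: ~4,000,000 requests/day on 11 nodes.
cluster_rate = average_requests_per_second(4_000_000)
per_node_rate = average_per_node(4_000_000, 11)
print(f"{cluster_rate:.1f} req/s cluster-wide, {per_node_rate:.1f} req/s per node")
```

Under those assumptions the initial cluster averaged roughly 46 requests per second in total, which is the "modest" baseline the later numbers should be compared against.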

Slide 49

Slide 49 text


Slide 50

Slide 50 text


Slide 51

Slide 51 text

Voxer’s Current Riak Cluster Stats
• 41 Node Cluster for User Data
• 37 Node Cluster to serve app traffic
• ~350 GB/day of user data being added
• 100,000s of concurrent users at peak
• Went from 11 to about 80 nodes in a month
• At one point adding three nodes/day
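The growth figures above lend themselves to a quick capacity estimate. The sketch below is illustrative only: the 30-day month, the replication factor of 3, the assumption that all ~350 GB/day lands on the 41-node user data cluster, and the 1,000 GB of free disk per node are assumptions for the example, not numbers from the talk.

```python
def nodes_added_per_day(start_nodes: int, end_nodes: int, days: int) -> float:
    """Average number of nodes added per day over the period."""
    return (end_nodes - start_nodes) / days


def ingest_per_node_gb(daily_ingest_gb: float, nodes: int, replicas: int) -> float:
    """GB written per node per day once every object is stored `replicas` times."""
    return daily_ingest_gb * replicas / nodes


def days_until_full(free_gb_per_node: float, per_node_ingest_gb: float) -> float:
    """Days of headroom left at the current ingest rate."""
    return free_gb_per_node / per_node_ingest_gb


# From the slide: 11 -> ~80 nodes in a month, ~350 GB/day, 41-node cluster.
growth = nodes_added_per_day(11, 80, days=30)          # assumed 30-day month
per_node = ingest_per_node_gb(350, 41, replicas=3)     # assumed 3 replicas
headroom = days_until_full(1_000, per_node)            # assumed 1,000 GB free

print(f"{growth:.1f} nodes/day on average")
print(f"{per_node:.1f} GB/day written per node")
print(f"{headroom:.0f} days until disks fill")
```

Even with generous assumptions, headroom measured in weeks rather than months is the kind of result that makes adding nodes a daily activity.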

Slide 52

Slide 52 text

Voxer’s Fault Tolerance
• Have lost a lot of nodes in production
• TCP Incast Problem [2]
• LevelDB merge issues
• Lots of other shit went wrong but it’s still running :)

Slide 53

Slide 53 text

“Scalability is the ability of a system, network, or process, to handle growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.” [3]

Slide 54

Slide 54 text

Present System Health Dictates Future Ability to Scale

Slide 55

Slide 55 text

Distributed [ Companies | Communities | Systems ] are all susceptible to downtime.
credit: http://blogs.ajc.com/jeff-schultz-blog/files/2009/06/closedsign.png

Slide 56

Slide 56 text

Capacity Plan or Perish

Slide 57

Slide 57 text

Everything Is Distributed Now

Slide 58

Slide 58 text

Questions?
Mark Phillips, Basho
@pharkmillups
themarkphillips.com
[email protected]

Slide 59

Slide 59 text

References
1. http://en.wikipedia.org/wiki/Distributed_computing
2. http://www.snookles.com/slf-blog/2012/01/05/tcp-incast-what-is-it/
3. http://en.wikipedia.org/wiki/Scalability