Slide 1

Slide 1 text

HOW TO STAY UP WHEN YOUR SERVER GOES DOWN (that includes AWS too)

Slide 2

Slide 2 text

WHY SHOULD YOU CARE?

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

• Customers don’t care

Slide 5

Slide 5 text

• Customers don’t care • Lost sales

Slide 6

Slide 6 text

• Customers don’t care • Lost sales • Bad rep

Slide 7

Slide 7 text

WHAT THIS TALK IS NOT • A step-by-step magical formula that will ensure 100% uptime

Slide 8

Slide 8 text

YOU CAN ONLY Reduce your chances of going down

Slide 9

Slide 9 text

FIND THE RIGHT BALANCE Between costs and how much you need to stay up.

Slide 10

Slide 10 text

THAT SAID There are cheap sources for low-powered backup servers

Slide 11

Slide 11 text

WHAT THIS TALK REALLY IS

Slide 12

Slide 12 text

WHAT THIS TALK REALLY IS Introduction to distributed systems

Slide 13

Slide 13 text

DISTRIBUTED SYSTEMS

Slide 14

Slide 14 text

DISTRIBUTED SYSTEMS A collection of independent computers that appear to the users of the system as a single computer - Tanenbaum

Slide 15

Slide 15 text

CORE IDEAS

Slide 16

Slide 16 text

YUP It is one of those hardcore CS topics that require both practical ability and theoretical knowledge

Slide 17

Slide 17 text

DESIGN GOALS

Slide 18

Slide 18 text

DESIGN GOALS • Fault tolerant

Slide 19

Slide 19 text

DESIGN GOALS • Fault tolerant • Transparency

Slide 20

Slide 20 text

DESIGN GOALS • Fault tolerant • Transparency • Scalable

Slide 21

Slide 21 text

DESIGN GOALS • Fault tolerant • Transparency • Scalable • Open

Slide 22

Slide 22 text

DESIGN GOALS • Fault tolerant • Transparency • Scalable • Open • Secure

Slide 23

Slide 23 text

FAULT TOLERANT The system should keep functioning even when some of its parts fail. Also known as keeping resources available.

Slide 24

Slide 24 text

TRANSPARENCY Same response no matter where you access it from

Slide 25

Slide 25 text

SCALABLE Easily adaptable to increased load

Slide 26

Slide 26 text

OPENNESS Easy to extend and reimplement certain parts. A set of rules describes the standard behaviour.

Slide 27

Slide 27 text

SECURE Network, authentication, access control

Slide 28

Slide 28 text

DISTRIBUTED SYSTEMS for your webapp

Slide 29

Slide 29 text

KEY CONCEPTS

Slide 30

Slide 30 text

REPLICATION

Slide 31

Slide 31 text

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. From Wikipedia [http://en.wikipedia.org/wiki/Replication_(computing)]

Slide 32

Slide 32 text

Human-speak : Storing your data multiple times in multiple locations and praying hard that all of them do not die at the same time
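The "praying hard" part can be put into rough numbers. The sketch below assumes every copy fails independently of the others, which real outages often violate (a whole data centre can go dark at once), and the 99% per-node figure is purely illustrative, not a measurement from this talk.

```python
def combined_availability(node_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` independent nodes is up."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies must be down at the same time for the data to be unreachable.
    return 1.0 - (1.0 - node_availability) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    for copies in (1, 2, 3):
        a = combined_availability(0.99, copies)  # 0.99 is an assumed figure
        print(f"{copies} copies: {downtime_hours_per_year(a):.3f} h/year down")
```

Each extra copy multiplies the chance of total failure by the per-node failure rate, which is why a cheap second server already helps a lot.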

Slide 33

Slide 33 text

1 2 3 4 Database Cluster

Slide 34

Slide 34 text

1 2 3 4 Database Cluster Master Slave Slave Slave

Slide 35

Slide 35 text

1 2 3 4 Database Cluster Down Slave Slave Slave

Slide 36

Slide 36 text

1 2 3 4 Database Cluster Down Slave Master Slave

Slide 37

Slide 37 text

• Reads on any of the nodes • Writes only on the master node • Works on the principle of eventual consistency: the other nodes catch up with the master after a short delay
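The rules on this slide and the failover shown on the previous ones can be sketched as a toy cluster. This is an illustration only, not how MongoDB or any real database implements replication: the class names, the explicit `replicate()` step and the "promote the first live node" rule are all assumptions made for the example (real systems hold an election). The code calls the master "primary".

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicaSet:
    """Toy primary/secondary cluster for illustration only."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]
        self.pending = []  # writes not yet copied to the secondaries

    def write(self, key, value):
        # Writes are only accepted by the primary.
        if not self.primary.up:
            raise RuntimeError("no primary available; call failover() first")
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read(self, key, node_index):
        # Reads may hit any live node, so they can return stale data.
        node = self.nodes[node_index]
        if not node.up:
            raise RuntimeError(f"{node.name} is down")
        return node.data.get(key)

    def replicate(self):
        """Copy pending writes to every live secondary."""
        for node in self.nodes:
            if node is not self.primary and node.up:
                node.data.update(self.pending)
        self.pending.clear()

    def failover(self):
        """Promote the first live node; unreplicated writes are dropped."""
        live = [n for n in self.nodes if n.up]
        if not live:
            raise RuntimeError("all nodes are down")
        self.primary = live[0]
        self.pending.clear()
        return self.primary.name
```

A read from a secondary returns nothing until `replicate()` has run, which is the "eventually" in eventual consistency. Marking the first node as down and calling `failover()` mirrors the Down / Master transition on the cluster slides.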

Slide 38

Slide 38 text

SHARDING

Slide 39

Slide 39 text

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than splitting by columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

Slide 40

Slide 40 text

How does MongoDB do it? It splits data into logical chunks based on a predefined shard key.

Slide 41

Slide 41 text

TLDR VERSION
Machine 1                  Machine 2                    Machine 3
Alabama → Arizona          Colorado → Florida           Arkansas → California
Indiana → Kansas           Idaho → Illinois             Georgia → Hawaii
Maryland → Michigan        Kentucky → Maine             Minnesota → Missouri
Montana → Montana          Nebraska → New Jersey        Ohio → Pennsylvania
New Mexico → North Dakota  Rhode Island → South Dakota  Tennessee → Utah
Vermont → West Virginia    Wisconsin → Wyoming
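Routing a record to its machine by shard key range can be sketched as a sorted lookup. The chunk-to-machine assignment below assumes the slide's table is read row by row, and `machine_for` is a hypothetical helper for this example, not a MongoDB API; in MongoDB the router and config servers do this work for you.

```python
import bisect

# (lower bound of chunk, machine), sorted by lower bound.
# Assignment assumes the slide's table is read row by row.
CHUNKS = [
    ("Alabama", "Machine 1"),
    ("Arkansas", "Machine 3"),
    ("Colorado", "Machine 2"),
    ("Georgia", "Machine 3"),
    ("Idaho", "Machine 2"),
    ("Indiana", "Machine 1"),
    ("Kentucky", "Machine 2"),
    ("Maryland", "Machine 1"),
    ("Minnesota", "Machine 3"),
    ("Montana", "Machine 1"),
    ("Nebraska", "Machine 2"),
    ("New Mexico", "Machine 1"),
    ("Ohio", "Machine 3"),
    ("Rhode Island", "Machine 2"),
    ("Tennessee", "Machine 3"),
    ("Vermont", "Machine 1"),
    ("Wisconsin", "Machine 2"),
]
LOWER_BOUNDS = [start for start, _ in CHUNKS]


def machine_for(state: str) -> str:
    """Return the machine whose chunk covers the shard key `state`."""
    # Find the last chunk whose lower bound is <= state.
    index = bisect.bisect_right(LOWER_BOUNDS, state) - 1
    if index < 0:
        raise KeyError(f"{state!r} sorts before the first chunk")
    return CHUNKS[index][1]
```

For example, `machine_for("Texas")` lands in the Tennessee → Utah chunk, so that user's data lives on Machine 3 only.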

Slide 42

Slide 42 text

WHAT HAS THIS GOT TO DO WITH STAYING UP?

Slide 43

Slide 43 text

Machine 1                  Machine 2                    Machine 3
Alabama → Arizona          Colorado → Florida           Arkansas → California
Indiana → Kansas           Idaho → Illinois             Georgia → Hawaii
Maryland → Michigan        Kentucky → Maine             Minnesota → Missouri
Montana → Montana          Nebraska → New Jersey        Ohio → Pennsylvania
New Mexico → North Dakota  Rhode Island → South Dakota  Tennessee → Utah
Vermont → West Virginia    Wisconsin → Wyoming
These users are still fine!

Slide 44

Slide 44 text

MERGING THE TWO

Slide 45

Slide 45 text

Machine 1                  Machine 2                    Machine 3
Alabama → Arizona          Colorado → Florida           Arkansas → California
Indiana → Kansas           Idaho → Illinois             Georgia → Hawaii
Maryland → Michigan        Kentucky → Maine             Minnesota → Missouri
Montana → Montana          Nebraska → New Jersey        Ohio → Pennsylvania
New Mexico → North Dakota  Rhode Island → South Dakota  Tennessee → Utah
Vermont → West Virginia    Wisconsin → Wyoming

Machine 4                  Machine 5                    Machine 6
Alabama → Arizona          Colorado → Florida           Arkansas → California
Indiana → Kansas           Idaho → Illinois             Georgia → Hawaii
Maryland → Michigan        Kentucky → Maine             Minnesota → Missouri
Montana → Montana          Nebraska → New Jersey        Ohio → Pennsylvania
New Mexico → North Dakota  Rhode Island → South Dakota  Tennessee → Utah
Vermont → West Virginia    Wisconsin → Wyoming

Slide 46

Slide 46 text

Machine 1                  Machine 2                    Machine 3
Alabama → Arizona          Colorado → Florida           Arkansas → California
Indiana → Kansas           Idaho → Illinois             Georgia → Hawaii
Maryland → Michigan        Kentucky → Maine             Minnesota → Missouri
Montana → Montana          Nebraska → New Jersey        Ohio → Pennsylvania
New Mexico → North Dakota  Rhode Island → South Dakota  Tennessee → Utah
Vermont → West Virginia    Wisconsin → Wyoming

Machine 4                  Machine 5                    Machine 6
Alabama → Arizona          Colorado → Florida           Arkansas → California
Indiana → Kansas           Idaho → Illinois             Georgia → Hawaii
Maryland → Michigan        Kentucky → Maine             Minnesota → Missouri
Montana → Montana          Nebraska → New Jersey        Ohio → Pennsylvania
New Mexico → North Dakota  Rhode Island → South Dakota  Tennessee → Utah
Vermont → West Virginia    Wisconsin → Wyoming
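Combining sharding with replication means every shard lives on more than one machine, so a request can fall back to the copy. A minimal sketch, assuming Machine 4 mirrors Machine 1, Machine 5 mirrors Machine 2 and Machine 6 mirrors Machine 3 as the slide suggests; the shard names and helper functions are made up for the example.

```python
# Each shard is stored on two machines (assumed pairing, see above).
SHARD_REPLICAS = {
    "shard-1": ["Machine 1", "Machine 4"],
    "shard-2": ["Machine 2", "Machine 5"],
    "shard-3": ["Machine 3", "Machine 6"],
}


def pick_machine(shard, down):
    """Return the first machine holding `shard` that is not in `down`."""
    for machine in SHARD_REPLICAS[shard]:
        if machine not in down:
            return machine
    return None  # every copy of this shard is unreachable


def reachable_shards(down):
    """List the shards that still have at least one live copy."""
    return sorted(s for s in SHARD_REPLICAS if pick_machine(s, down) is not None)
```

Losing Machine 1 alone costs nothing, because `pick_machine("shard-1", {"Machine 1"})` falls back to Machine 4. Only losing both copies of the same shard takes those users offline, and even then the other shards keep serving.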

Slide 47

Slide 47 text

HOW DO SOME COMPANIES DO IT?

Slide 48

Slide 48 text

NETFLIX

Slide 49

Slide 49 text

SELF INFLICTED PAIN

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

NETFLIX SIMIAN ARMY • Chaos Monkey • Latency Monkey • Conformity Monkey • Doctor Monkey • Janitor Monkey • Security Monkey • Chaos Gorilla

Slide 52

Slide 52 text

CHAOS MONKEY • Shuts down instances randomly • Allows the team to learn about weaknesses in the system • Gets the team to build auto recovery systems
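The idea can be tried on a toy model before pointing anything at real servers. The sketch below is not the Simian Army code: the instance names, the state strings and the `auto_recover` helper are all invented for the example, and instances are assumed to be named `web-0`, `web-1`, and so on.

```python
import random


def chaos_monkey(instances, rng):
    """Terminate one randomly chosen running instance and return its name."""
    running = [name for name, state in instances.items() if state == "running"]
    if not running:
        return None
    victim = rng.choice(running)
    instances[victim] = "terminated"
    return victim


def auto_recover(instances, desired):
    """Launch replacements until `desired` instances are running."""
    running = sum(1 for state in instances.values() if state == "running")
    launched = []
    next_id = len(instances)  # assumes names follow the web-<n> scheme
    while running < desired:
        name = f"web-{next_id}"
        instances[name] = "running"
        launched.append(name)
        next_id += 1
        running += 1
    return launched


if __name__ == "__main__":
    fleet = {"web-0": "running", "web-1": "running", "web-2": "running"}
    victim = chaos_monkey(fleet, random.Random())
    print("terminated:", victim, "launched:", auto_recover(fleet, desired=3))
```

The interesting part is not the random kill but whether something like `auto_recover` exists and works without a human being paged.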

Slide 53

Slide 53 text

LATENCY MONKEY • Introduces artificial delays in the API client-server communication layer • Simulates server degradation • Useful for testing fault tolerance of new services
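A small simulation shows what a client has to do to survive injected latency. To keep the example fast, no real sleeping happens: the delay is modelled as a number returned next to the response. Both function names and the timeout-with-fallback policy are assumptions for this sketch, not part of Netflix's tooling.

```python
import random


def with_injected_latency(handler, rng, max_delay_ms):
    """Wrap `handler` so each call also reports a random simulated delay."""
    def wrapped(request):
        delay_ms = rng.uniform(0, max_delay_ms)
        return handler(request), delay_ms
    return wrapped


def call_with_timeout(service, request, timeout_ms, fallback):
    """Use the service response unless its delay exceeds the timeout budget."""
    response, delay_ms = service(request)
    if delay_ms > timeout_ms:
        return fallback  # degrade gracefully instead of hanging
    return response


if __name__ == "__main__":
    slow_service = with_injected_latency(str.upper, random.Random(), 400)
    print(call_with_timeout(slow_service, "hello", 200, "cached reply"))
```

A service that has no sensible `fallback` for a slow dependency is exactly the weakness this kind of test is meant to surface.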

Slide 54

Slide 54 text

CONFORMITY MONKEY • Shuts down servers that don’t adhere to best practices • Forces the service owner to relaunch them properly

Slide 55

Slide 55 text

DOCTOR MONKEY • Runs checks on health of servers • Removes unhealthy servers from service and alerts the service owner
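The check-remove-alert loop is simple enough to sketch in a few lines. The health check and the alert channel are passed in as plain functions here, which is an assumption of this example; a real setup would probe an HTTP endpoint and page someone.

```python
def doctor_monkey(servers, is_healthy, alert):
    """Split `servers` into (in_service, removed), alerting once per removal."""
    in_service, removed = [], []
    for server in servers:
        if is_healthy(server):
            in_service.append(server)
        else:
            removed.append(server)
            alert(f"{server} failed its health check and was removed")
    return in_service, removed


if __name__ == "__main__":
    # Hypothetical server names; "web-1" is pretended to be unhealthy.
    pool, out = doctor_monkey(
        ["web-0", "web-1", "web-2"],
        is_healthy=lambda name: name != "web-1",
        alert=print,
    )
    print("serving:", pool, "removed:", out)
```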

Slide 56

Slide 56 text

SECURITY MONKEY • Scans for security violations or vulnerabilities • Ensures that SSL and DRM certs are not expiring

Slide 57

Slide 57 text

CHAOS GORILLA • Shuts down all instances within an availability zone • Tests that the system survives losing a whole zone, for example during an electrical storm
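Whether a fleet would survive a zone outage can be checked on paper first. The sketch below assumes a simple mapping of instance name to zone and a fixed minimum number of instances needed to serve traffic; both the names and the threshold are illustrative.

```python
from collections import Counter


def chaos_gorilla(instances, zone):
    """Return the instances that survive losing every instance in `zone`."""
    return {name: z for name, z in instances.items() if z != zone}


def survives_any_zone_loss(instances, minimum):
    """True if at least `minimum` instances remain whichever zone fails."""
    per_zone = Counter(instances.values())
    return all(len(instances) - count >= minimum for count in per_zone.values())


if __name__ == "__main__":
    fleet = {"web-0": "zone-a", "web-1": "zone-a", "web-2": "zone-b"}
    print(chaos_gorilla(fleet, "zone-a"))
    print(survives_any_zone_loss(fleet, minimum=2))
```

In the demo fleet, two of three instances share a zone, so losing that zone leaves a single server; spreading instances evenly across zones is what makes the check pass.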

Slide 58

Slide 58 text

WHAT CAN YOU DO?

Slide 59

Slide 59 text

RUN YOUR OWN SIMIAN ARMY https://github.com/Netflix/SimianArmy

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

• Data storage is geo-replicated • Built-in high availability features • Tech support hotline at no extra cost

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

• mongod --fork --replSet demo --logpath /tmp/mongo.log

Slide 65

Slide 65 text

• rs.initiate() • rs.add('mongodb2.cloudapp.net:27017')
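Instead of adding members one at a time, `rs.initiate()` can also be given a configuration document listing all members up front. The sketch below only builds that document; it does not talk to MongoDB. The set name `demo` matches the `--replSet` value above, while `mongodb1.cloudapp.net` is a hypothetical name for the first host, which the slides do not show.

```python
import json


def replica_set_config(set_name, hosts):
    """Build a config document of the shape rs.initiate() accepts."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


if __name__ == "__main__":
    config = replica_set_config(
        "demo",
        [
            "mongodb1.cloudapp.net:27017",  # hypothetical first host
            "mongodb2.cloudapp.net:27017",  # host from the slide
        ],
    )
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the mongo shell as the argument to `rs.initiate(...)`.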

Slide 66

Slide 66 text

DEMO

Slide 67

Slide 67 text

STUFF I HAVE RUNNING ON HA

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

Q&A

Slide 71

Slide 71 text

THE END Enjoy the rest of your day :D