Slide 1

Slide 1 text

Architecting & Launching the Halo 4 Services, SREcon ’15

Slide 2

Slide 2 text

Caitie McCaffrey, Distributed Systems Engineer
@Caitie
CaitieM.com

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

• Halo Services Overview
• Architectural Challenges
• Orleans Basics
• Tales From Production

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

• Presence
• Statistics
• Title Files
• Cheat Detection
• User-Generated Content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

• Halo: CE - 6.43 million
• Halo 2 - 8.49 million
• Halo 3 - 11.87 million
• Halo 3: ODST - 6.22 million
• Halo Reach - 9.52 million

Slide 12

Slide 12 text

Day One: $220 million in sales, 1 million players online

Slide 13

Slide 13 text

Week One: $300 million in sales, 4 million players online, 31.4 million hours

Slide 14

Slide 14 text

Overall: 11.6 million players, 1.5 billion games, 270 million hours

Slide 15

Slide 15 text

Architectural Challenges

Slide 16

Slide 16 text

Load Patterns

Slide 17

Slide 17 text

• Azure Worker Roles
• Azure Table
• Azure Blob
• Azure Service Bus

Slide 18

Slide 18 text

Always Available

Slide 19

Slide 19 text

Low Latency & High Concurrency

Slide 20

Slide 20 text

Stateless 3-Tier Architecture

Slide 21

Slide 21 text

Latency Issues

Slide 22

Slide 22 text

Add A Cache
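Adding a cache in front of storage is the usual fix for read latency in a stateless tier. The slide does not say what kind of cache was considered, so the following is only a minimal cache-aside sketch in Python, assuming a hypothetical `SlowStore` that stands in for table storage. It shows both the latency win (repeated reads skip the store) and the catch: each front-end server holds its own copy, so a write invalidates only the local cache and another server can keep serving stale data.

```python
class SlowStore:
    """Hypothetical stand-in for a remote table; counts round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value


class CacheAside:
    """Check the local cache first; on a miss, read the store and remember the result."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, key):
        if key in self._cache:
            return self._cache[key]          # hit: no round trip to storage
        value = self._store.read(key)        # miss: pay the storage latency once
        self._cache[key] = value
        return value

    def put(self, key, value):
        self._store.write(key, value)
        self._cache.pop(key, None)           # invalidates only THIS server's copy


store = SlowStore({"player-42": {"rank": 30}})
front_end = CacheAside(store)
front_end.get("player-42")
front_end.get("player-42")
reads_after_two_gets = store.reads           # 1: the second get was a cache hit

other_front_end = CacheAside(store)
other_front_end.get("player-42")             # warms a second, independent cache
front_end.put("player-42", {"rank": 31})
stale = other_front_end.get("player-42")     # {"rank": 30}: this cache never heard about the write
fresh = front_end.get("player-42")           # {"rank": 31}: refetched after invalidation
```

The stale read on the second server is the concurrency and consistency problem that a per-server cache adds on top of a stateless tier.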

Slide 23

Slide 23 text

Concurrency Issues
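In a stateless tier every update is a read-modify-write against shared storage, and two servers handling requests for the same player can interleave. The Python sketch below (`VersionedStore` and the `"kills"` key are hypothetical) reproduces the resulting lost update, then shows one common mitigation: optimistic concurrency with a version check, similar in spirit to ETag-conditioned writes in Azure Table storage. This is not presented as what the Halo team did; the talk's own answer, on the following slides, is the actor model.

```python
class VersionedStore:
    """Stand-in for a storage table; each row carries a version (like an ETag)."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        return self._rows.get(key, (0, 0))

    def write(self, key, value):
        """Unconditional write: last writer wins."""
        version, _ = self.read(key)
        self._rows[key] = (version + 1, value)

    def write_if_version(self, key, expected_version, value):
        """Conditional write: succeeds only if nobody wrote since our read."""
        version, _ = self.read(key)
        if version != expected_version:
            return False
        self._rows[key] = (version + 1, value)
        return True


def increment_with_retry(store, key, amount):
    """Read-modify-write loop that re-reads and retries when another writer won."""
    while True:
        version, value = store.read(key)
        if store.write_if_version(key, version, value + amount):
            return value + amount


# Two front-end servers both read the counter before either one writes.
naive = VersionedStore()
_, a = naive.read("kills")
_, b = naive.read("kills")
naive.write("kills", a + 1)
naive.write("kills", b + 1)                  # silently overwrites the first update
lost_update_total = naive.read("kills")[1]   # 1, not 2

safe = VersionedStore()
v1, a = safe.read("kills")
v2, b = safe.read("kills")
first_ok = safe.write_if_version("kills", v1, a + 1)   # True
second_ok = safe.write_if_version("kills", v2, b + 1)  # False: its version is stale
increment_with_retry(safe, "kills", 1)                 # re-read and apply again
safe_total = safe.read("kills")[1]                     # 2
```

Under heavy contention on a hot key, the retry loop burns extra storage round trips, which adds to the latency the cache was meant to remove.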

Slide 24

Slide 24 text

Data Locality

Slide 25

Slide 25 text

The Actor Model: a framework & basis for reasoning about concurrency
“A Universal Modular Actor Formalism for Artificial Intelligence”, Carl Hewitt, Peter Bishop, Richard Steiger (1973)

Slide 26

Slide 26 text

• Send a Message
• Create a New Actor
• Change Internal State
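The three actor primitives (send a message, create a new actor, change internal state) fit in a very small runtime. The following is a toy, single-threaded Python model with hypothetical names, not Orleans' API. Each actor is a behavior function plus private state, sends are queued rather than executed immediately, and the runtime delivers one message at a time, so no actor's state is ever touched by two messages at once. For brevity it uses one shared queue instead of a mailbox per actor.

```python
from collections import deque


class ActorSystem:
    """Toy single-threaded actor runtime: one message is processed at a time."""

    def __init__(self):
        self._actors = {}          # actor id -> (behavior, state)
        self._mailbox = deque()    # queued (target id, message) pairs
        self._next_id = 0

    def spawn(self, behavior, state):
        """Primitive: create a new actor and return its address."""
        actor_id = self._next_id
        self._next_id += 1
        self._actors[actor_id] = (behavior, state)
        return actor_id

    def send(self, target, message):
        """Primitive: send a message; it is queued, not run immediately."""
        self._mailbox.append((target, message))

    def run(self):
        """Deliver messages until the queue is empty."""
        while self._mailbox:
            target, message = self._mailbox.popleft()
            behavior, state = self._actors[target]
            # Primitive: the handler returns the behavior and state for the next message.
            self._actors[target] = behavior(self, target, state, message)

    def state_of(self, actor_id):
        return self._actors[actor_id][1]


def collector(system, me, state, message):
    return collector, state + [message]            # remember every reply


def counter(system, me, state, message):
    kind, payload = message
    if kind == "add":
        return counter, state + payload            # change internal state
    if kind == "report":
        system.send(payload, ("total", state))     # send a message
    return counter, state


def parent(system, me, state, message):
    child = system.spawn(counter, 0)               # create a new actor
    system.send(child, ("add", 2))
    system.send(child, ("add", 3))
    system.send(child, ("report", message[1]))     # reply goes to the given address
    return parent, state + [child]


system = ActorSystem()
inbox = system.spawn(collector, [])
root = system.spawn(parent, [])
system.send(root, ("start", inbox))
system.run()
replies = system.state_of(inbox)                   # [("total", 5)]
```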

Slide 27

Slide 27 text

Stateful Services

Slide 28

Slide 28 text

Orleans: Distributed Virtual Actors for Programmability and Scalability
Philip A. Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, Jorgen Thelin
eXtreme Computing Group, MSR

Slide 29

Slide 29 text

“Orleans is a runtime and programming model for building distributed systems, based on the actor model”

Slide 30

Slide 30 text

Virtual Actors: “An Orleans actor always exists, virtually. It cannot be explicitly created or destroyed”

Slide 31

Slide 31 text

Virtual Actors
• Perpetual Existence
• Automatic Instantiation
• Location Transparency
• Automatic Scale Out
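The difference from classic actors is that callers never create or destroy a virtual actor; they only address it by ID. What follows is a minimal single-process sketch in Python, assuming a plain dict as durable storage and a hypothetical `PlayerActor`. Orleans calls these actors grains, but none of the names below are Orleans APIs. `get` activates an actor on first use, `deactivate` persists and drops it (as an idle-collection policy might), and the next `get` reactivates it from storage without the caller noticing. A real runtime would also choose which server hosts each activation, which is where location transparency and scale-out come from; the sketch leaves that out.

```python
class PlayerActor:
    """Hypothetical per-player actor; state is loaded on activation."""

    def __init__(self, player_id, saved_state):
        self.player_id = player_id
        self.games_played = saved_state or 0   # None means never persisted

    def record_game(self):
        self.games_played += 1
        return self.games_played

    def save(self):
        return self.games_played


class VirtualActorRuntime:
    """Activates actors on demand; callers only ever see IDs."""

    def __init__(self, actor_class, storage):
        self._actor_class = actor_class
        self._storage = storage                # dict standing in for durable storage
        self._active = {}                      # actor id -> in-memory activation
        self.activations = 0

    def get(self, actor_id):
        actor = self._active.get(actor_id)
        if actor is None:                      # automatic instantiation
            actor = self._actor_class(actor_id, self._storage.get(actor_id))
            self._active[actor_id] = actor
            self.activations += 1
        return actor

    def deactivate(self, actor_id):
        actor = self._active.pop(actor_id, None)
        if actor is not None:
            self._storage[actor_id] = actor.save()   # persist before dropping from memory


storage = {}
runtime = VirtualActorRuntime(PlayerActor, storage)
runtime.get("player-42").record_game()
runtime.get("player-42").record_game()
runtime.deactivate("player-42")                  # e.g. evicted after being idle
total = runtime.get("player-42").record_game()   # reactivated from storage: 3
```

From the caller's side the actor simply "always exists": it never issued a create, and the state survived the deactivation.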

Slide 32

Slide 32 text

Runtime
• Messaging
• Hosting
• Execution

Slide 33

Slide 33 text

Orleans Programming Model

Slide 34

Slide 34 text

Reliability: “Orleans manages all aspects of reliability automatically”

Slide 35

Slide 35 text

TOO!

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

TOO!

Slide 38

Slide 38 text

TOO!

Slide 39

Slide 39 text

Performance & Scalability

Slide 40

Slide 40 text

“Orleans applications run at very high CPU Utilization. We have run load tests with full saturation of 25 servers for many days at 90%+ CPU utilization without any instability”

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Load Patterns

Slide 43

Slide 43 text

Orleans is AP

Slide 44

Slide 44 text

• Stateful Services
• Virtual Actor Abstraction
• Self-Healing Frameworks

Slide 45

Slide 45 text

Orleans & Halo

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Get Orleans: https://github.com/dotnet/orleans


Slide 49

Slide 49 text

Tales From Production

Slide 50

Slide 50 text

DevOps (noun): 1. The decisions you make now will affect the quality of sleep you get later.

Slide 51

Slide 51 text

Load Patterns

Slide 52

Slide 52 text

Story: No Data Like Prod Data, aka Halo 4 launch night was not the first time Azure & Orleans saw production data

Slide 53

Slide 53 text

New Technology
• Orleans: MSR Technology
• Azure
• Dispatcher

Slide 54

Slide 54 text

Halo Reach: Presence Service

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

Memory Leak

Slide 57

Slide 57 text

Practice DevOps

Slide 58

Slide 58 text

Story: Validate Dependencies, aka the time we broke Azure Service Bus

Slide 59

Slide 59 text

STOP WHAT YOU’RE DOING!!!!

Slide 60

Slide 60 text

WHAT WERE YOU DOING???

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

Backup the Backup

Slide 64

Slide 64 text

Story: Clients Are Jerks, aka remember that time the game DoS’d us at launch

Slide 65

Slide 65 text

Different Priorities

Slide 66

Slide 66 text

Release Valves
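The slides do not describe how Halo's release valves were built. One generic reading, sketched here in Python under that assumption: an operator-controlled knob, changeable at runtime without a deploy, that sheds a percentage of non-critical traffic while critical calls always pass. `ReleaseValve`, `open_percent`, and the request keys are hypothetical. Hashing the request key with `zlib.crc32` keeps the decision stable for a given caller instead of flipping randomly on every retry.

```python
import zlib


class ReleaseValve:
    """Admit all critical work and only `open_percent`% of non-critical work."""

    def __init__(self, open_percent=100):
        self.open_percent = open_percent      # operators turn this down under load

    def allow(self, request_key, critical):
        if critical:
            return True                       # never shed critical traffic
        # Deterministic bucket in [0, 100) derived from the request key.
        bucket = zlib.crc32(request_key.encode("utf-8")) % 100
        return bucket < self.open_percent


valve = ReleaseValve(open_percent=100)
all_open = valve.allow("stats:player-42", critical=False)    # True: valve fully open
valve.open_percent = 0                                       # shed all optional work
shed = valve.allow("stats:player-42", critical=False)        # False
kept = valve.allow("matchmaking:player-42", critical=True)   # True
```

Deciding ahead of time which calls count as critical is the hard part; the knob itself is trivial.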

Slide 67

Slide 67 text

Back Pressure
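Back pressure means a service tells callers to slow down instead of queueing unbounded work until it falls over. Below is a minimal Python sketch, assuming a hypothetical `BoundedIntake` in front of a worker. The queue has a fixed capacity, `try_submit` refuses work when it is full, and a well-behaved client responds with capped exponential backoff (`backoff_schedule`, also hypothetical). The signal only helps if clients honor it, which matters when the client is the game itself.

```python
import queue


class BoundedIntake:
    """Fixed-capacity work queue that rejects instead of growing without bound."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def try_submit(self, item):
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1            # caller should back off and retry later
            return False

    def take(self):
        """Called by the worker; returns None when there is nothing to do."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


def backoff_schedule(base_seconds, cap_seconds, attempts):
    """Capped exponential backoff delays a client waits between retries."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts)]


intake = BoundedIntake(capacity=2)
accepted = [intake.try_submit(n) for n in range(3)]   # [True, True, False]
intake.take()                                          # worker frees one slot
retry_ok = intake.try_submit(2)                        # True
delays = backoff_schedule(0.5, 4.0, 5)                 # [0.5, 1.0, 2.0, 4.0, 4.0]
```

Production clients usually add random jitter to these delays so that rejected callers do not all retry at the same instant; it is omitted here to keep the example deterministic.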

Slide 68

Slide 68 text

Protect Your Services

Slide 69

Slide 69 text

Let’s Wrap it Up

Slide 70

Slide 70 text

Distributed Systems Are Hard

Slide 71

Slide 71 text

CAP Theorem, aka why we can’t have nice things

Slide 72

Slide 72 text

Know Your Tradeoffs (hint: you are making one whether you know it or not)

Slide 73

Slide 73 text

Consistency or Availability

Slide 74

Slide 74 text

Questions? @Caitie