Slide 1

Slide 1 text

BUILDING THE WORLD'S LARGEST WEBSITES with Consul and Terraform

Slide 2

Slide 2 text

SETH VARGO @sethvargo

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

CHALLENGE #0 EVOLUTION OF THE MODERN DATACENTER

Slide 5

Slide 5 text

RISING DATACENTER COMPLEXITY DC

Slide 7

Slide 7 text

RISING DATACENTER COMPLEXITY
[Diagram: one datacenter (DC) containing a grid of VMs]

Slide 8

Slide 8 text

RISING DATACENTER COMPLEXITY
[Diagram: one datacenter (DC) containing a grid of VMs and many boxes labeled C]

Slide 9

Slide 9 text

RISING DATACENTER COMPLEXITY DC DNS Database CDN

Slide 10

Slide 10 text

RISING DATACENTER COMPLEXITY DC-01 DC-02

Slide 11

Slide 11 text

RISING DATACENTER COMPLEXITY
[Diagram: two datacenters, DC-01 and DC-02, containing VMs and boxes labeled C]

Slide 12

Slide 12 text

RISING DATACENTER COMPLEXITY IaaS PaaS SaaS

Slide 13

Slide 13 text

RISING DATACENTER COMPLEXITY

Slide 14

Slide 14 text

CHALLENGE #1 DECENTRALIZED SERVICE CONFIG

Slide 15

Slide 15 text

TRADITIONAL SERVICE CONFIGURATION
Pull-based, long intervals, computationally expensive
[Diagram: CONFIG MGMT SERVER with nodes WEB 1, WEB 2, WEB N; timestamps 14:00, 14:07, 14:03]

Slide 16

Slide 16 text

CONSUL

Slide 17

Slide 17 text

SOLVES THE 4 BASIC PROBLEMS
SERVICE DISCOVERY
LOAD BALANCING
HEALTH CHECKING
KEY-VALUE CONFIGURATION

Slide 18

Slide 18 text

CONSUL K/V + CONSUL-TEMPLATE
Push-based, “instant”, predictable computational cost
[Diagram: CONSUL with nodes WEB 1, WEB 2, WEB N; timestamps 14:00:00.311, 14:00:00.731, 14:00:00.415]

Slide 19

Slide 19 text

DISTRIBUTED K/V STORE Allows for per-datacenter configuration

Slide 20

Slide 20 text

CONSUL-TEMPLATE Template Example

    global
        daemon
        maxconn {{key "haproxy/maxconn"}}

    defaults
        mode {{key "haproxy/mode"}}{{range ls "haproxy/timeouts"}}
        timeout {{.Key}} {{.Value}}{{end}}

    listen http-in
        bind *:8000{{range service "release.web"}}
        server {{.Node}} {{.Address}}:{{.Port}}{{end}}
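To make the substitutions in the template concrete, here is a minimal Python sketch of what it computes. It does not call consul-template or Consul: the `kv` and `services` dictionaries are hypothetical stand-ins for the K/V store and the service catalog, `render_haproxy` is an illustrative name, and the sample values are invented.

```python
def render_haproxy(kv, services):
    """Render an HAProxy config from dicts standing in for Consul data.

    kv       -- hypothetical K/V data, e.g. {"haproxy/maxconn": "256"}
    services -- hypothetical catalog: query string -> list of instance dicts
    """
    lines = ["global", "    daemon", f"    maxconn {kv['haproxy/maxconn']}", ""]
    lines += ["defaults", f"    mode {kv['haproxy/mode']}"]
    # Mirrors {{range ls "haproxy/timeouts"}}: one line per key under the prefix.
    prefix = "haproxy/timeouts/"
    for key in sorted(k for k in kv if k.startswith(prefix)):
        lines.append(f"    timeout {key[len(prefix):]} {kv[key]}")
    lines += ["", "listen http-in", "    bind *:8000"]
    # Mirrors {{range service "release.web"}}; the query is treated as an opaque key.
    for inst in services.get("release.web", []):
        lines.append(f"    server {inst['node']} {inst['address']}:{inst['port']}")
    return "\n".join(lines) + "\n"


# Invented sample data for illustration only.
kv = {
    "haproxy/maxconn": "256",
    "haproxy/mode": "http",
    "haproxy/timeouts/connect": "5000ms",
    "haproxy/timeouts/client": "50000ms",
}
services = {
    "release.web": [
        {"node": "web1", "address": "10.0.1.4", "port": 80},
        {"node": "web2", "address": "10.0.1.5", "port": 80},
    ]
}
config = render_haproxy(kv, services)
print(config)
```

Adding or removing an entry in `services` changes only the `server` lines, which is the property the template relies on.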

Slide 21

Slide 21 text

CONSUL-TEMPLATE Execute (as a service)

    $ consul-template \
        -consul demo.consul.io \
        -template "haproxy.ctmpl:/etc/haproxy/haproxy.conf:restart haproxy" \
        -dry

Slide 22

Slide 22 text

STEP BY STEP
1. Config management tooling lays down the configuration template
2. consul-template runs as a service
3. Edge-triggered changes re-render the config and restart the service
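The third step is the interesting one: work happens only when the rendered output actually changes. A minimal Python sketch of that idea follows, assuming a `render` callable that produces the config text. The names `apply_if_changed`, `store`, `written` and `restarts` are illustrative and not part of consul-template.

```python
def apply_if_changed(render, current, write, restart):
    """Render the config; write it and restart only when the output changed.

    Returns the config text that is now in effect.
    """
    new = render()
    if new != current:
        write(new)    # lay down the new file
        restart()     # e.g. restart haproxy
    return new


# Simulation: three change notifications arrive, the second changes nothing.
store = {"maxconn": "256"}
written, restarts = [], []


def render():
    return f"maxconn {store['maxconn']}\n"


current = None
for update in ({"maxconn": "256"}, {"maxconn": "256"}, {"maxconn": "512"}):
    store.update(update)
    current = apply_if_changed(
        render, current, written.append, lambda: restarts.append("haproxy")
    )

print(len(written), "writes,", len(restarts), "restarts")
```

Comparing rendered output rather than raw inputs keeps the service from restarting on notifications that do not affect its config.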

Slide 23

Slide 23 text

CHALLENGE #2 SCALABLE SERVICE DISCOVERY

Slide 24

Slide 24 text

ZERO TTL DNS
Long-held connections to minimize DNS overhead
Zero TTL ensures most up-to-date information

Slide 25

Slide 25 text

RESILIENCY
Low-TTL DNS records
Ensures availability even if Consul is unavailable
Required for short-held connections since DNS lookup overhead is too high with zero TTL

Slide 26

Slide 26 text

OPTION #1: CONSUL SETTINGS
Per-service, stale reads on non-leaders
[Diagram: WEB PROCESS, DNS query, CONSUL AGENT, CONSUL LEADER, CONSUL STANDBY]

Slide 27

Slide 27 text

OPTION #2: DNSMASQ + CONSUL
Global, works if Consul is down
[Diagram: WEB PROCESS, DNS query, DNSMASQ, CONSUL AGENT, CONSUL LEADER, CONSUL STANDBY]

Slide 29

Slide 29 text

OPTION #3: APPLICATION-LEVEL CACHE
Works if almost everything is down, strict control over cache times
[Diagram: WEB PROCESS, IN-MEM CACHE, DNS query, CONSUL AGENT, CONSUL LEADER, CONSUL STANDBY]
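An application-level cache of this kind can be sketched in a few lines of Python. This is one possible shape, not the speaker's implementation: `CachedResolver` is a hypothetical name, the resolver is any callable that returns addresses or raises `OSError`, and the clock is injectable so the behavior can be shown without waiting.

```python
import time


class CachedResolver:
    """Cache lookups for `ttl` seconds; serve the stale entry if a refresh fails."""

    def __init__(self, resolve, ttl, clock=time.monotonic):
        self._resolve = resolve   # callable: name -> list of addresses (may raise)
        self._ttl = ttl
        self._clock = clock
        self._cache = {}          # name -> (expires_at, addresses)

    def lookup(self, name):
        now = self._clock()
        entry = self._cache.get(name)
        if entry and now < entry[0]:
            return entry[1]               # fresh hit, no DNS query at all
        try:
            addresses = self._resolve(name)
        except OSError:
            if entry:
                return entry[1]           # resolver is down: serve stale data
            raise                         # nothing cached, cannot help
        self._cache[name] = (now + self._ttl, addresses)
        return addresses


# Demonstration with a fake clock and a resolver that can be switched off.
now = [0.0]
healthy = [True]
calls = []


def fake_resolve(name):
    calls.append(name)
    if not healthy[0]:
        raise OSError("resolver unavailable")
    return ["10.0.1.5", "10.0.1.6"]


resolver = CachedResolver(fake_resolve, ttl=5, clock=lambda: now[0])
first = resolver.lookup("web.service.consul")    # miss: queries the resolver
second = resolver.lookup("web.service.consul")   # hit: served from memory
now[0] = 10.0                                    # the entry has expired
healthy[0] = False                               # and the resolver is down
third = resolver.lookup("web.service.consul")    # falls back to the stale entry
print(first, len(calls))
```

The `ttl` argument is where the "strict control over cache times" lives; the stale fallback is what keeps lookups working when the layers below are unavailable.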

Slide 30

Slide 30 text

CHALLENGE #3 MONITORING AT SCALE

Slide 31

Slide 31 text

MONITORING SERVICE TRADITIONAL MONITORING Pushes information into a silo WEB 1 WEB 2 WEB N

Slide 35

Slide 35 text

MONITORING SERVICE TRADITIONAL MONITORING Pushes information into a silo WEB 1 WEB 2 WEB N U

Slide 36

Slide 36 text

MONITORING SERVICE TRADITIONAL MONITORING Pushes information into a silo WEB 1 WEB 2 WEB N U F F

Slide 37

Slide 37 text

CONSUL CONSUL MONITORING Removes unhealthy nodes from service discovery layer WEB 1 WEB 2 WEB N

Slide 39

Slide 39 text

CONSUL CONSUL MONITORING Removes unhealthy nodes from service discovery layer WEB 1 WEB 2 WEB N dig web.service.consul 10.0.1.4 10.0.1.5 10.0.1.6

Slide 41

Slide 41 text

CONSUL CONSUL MONITORING Removes unhealthy nodes from service discovery layer WEB 1 WEB 2 WEB N dig web.service.consul 10.0.1.5 10.0.1.6

Slide 42

Slide 42 text

CONSUL CONSUL MONITORING Removes unhealthy nodes from service discovery layer WEB 1 WEB 2 WEB N dig web.service.consul 10.0.1.5 10.0.1.6 host: web.service.consul

Slide 46

Slide 46 text

CONSUL CONSUL MONITORING Removes unhealthy nodes from service discovery layer WEB 1 WEB 2 WEB N dig web.service.consul 10.0.1.4 10.0.1.5 10.0.1.6 host: web.service.consul
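The sequence above (three addresses, then two while one node is unhealthy, then three again) can be reproduced with a small Python sketch. The `catalog` structure and the `healthy_addresses` function are hypothetical, and as a simplification any check status other than "passing" is treated as unhealthy.

```python
def healthy_addresses(catalog, service):
    """Return addresses of instances whose checks all pass, like a DNS answer."""
    return [
        inst["address"]
        for inst in catalog.get(service, [])
        if all(status == "passing" for status in inst["checks"].values())
    ]


# Invented catalog mirroring the addresses shown on the slides.
catalog = {
    "web": [
        {"address": "10.0.1.4", "checks": {"http": "passing"}},
        {"address": "10.0.1.5", "checks": {"http": "passing"}},
        {"address": "10.0.1.6", "checks": {"http": "passing"}},
    ]
}

before = healthy_addresses(catalog, "web")
catalog["web"][0]["checks"]["http"] = "critical"   # first node fails its check
during = healthy_addresses(catalog, "web")
catalog["web"][0]["checks"]["http"] = "passing"    # node recovers
after = healthy_addresses(catalog, "web")
print(before, during, after)
```

Because the filter sits in the discovery layer, clients that resolve `web.service.consul` stop seeing the failed node without any client-side change.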

Slide 47

Slide 47 text

ATLAS CONSUL CONSUL MONITORING + ALERTING via Atlas WEB 1 WEB 2 WEB N

Slide 48

Slide 48 text

CONSUL MONITORING + ALERTING Atlas UI

Slide 52

Slide 52 text

CONSUL MONITORING + ALERTING History

Slide 53

Slide 53 text

CHALLENGE #4 SERVICE RESILIENCY VIA DISTRIBUTED LOCKING

Slide 54

Slide 54 text

CONSUL LOCK
Allows for a new kind of "HA"

    $ consul lock [options] prefix child...

Slide 55

Slide 55 text

CONSUL LOCK Making standby HA much simpler CONSUL VAULT 1 VAULT 2 VAULT 3

Slide 56

Slide 56 text

CONSUL LOCK Making standby HA much simpler CONSUL VAULT 1 VAULT 2 VAULT 3 L L

Slide 57

Slide 57 text

CONSUL LOCK Making standby HA much simpler CONSUL VAULT 1 VAULT 2 VAULT 3 L

Slide 58

Slide 58 text

CONSUL LOCK Making standby HA much simpler CONSUL VAULT 1 VAULT 2 VAULT 3 L LEADER ELECTION

Slide 59

Slide 59 text

CONSUL LOCK Solves the "exactly one of these must always be running" problem
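The "exactly one" guarantee comes from every candidate racing for the same lock and only the winner running. Below is a minimal in-memory Python sketch of that pattern. `LockStore` and `elect` are hypothetical names and this is not Consul's API; a real lock would also be released automatically when the holder's session dies, which is simulated here by an explicit `release`.

```python
class LockStore:
    """In-memory stand-in for a lock kept in a K/V store (not Consul's API)."""

    def __init__(self):
        self._holder = {}   # lock key -> name of the current holder

    def acquire(self, key, who):
        # Succeeds only if the lock is free or already held by `who`.
        return self._holder.setdefault(key, who) == who

    def release(self, key, who):
        # Only the current holder can release the lock.
        if self._holder.get(key) == who:
            del self._holder[key]


def elect(store, key, candidates):
    """Return the candidates that hold the lock (expected: exactly one)."""
    return [c for c in candidates if store.acquire(key, c)]


store = LockStore()
vaults = ["vault-1", "vault-2", "vault-3"]
key = "service/vault/leader"            # hypothetical lock path

leaders = elect(store, key, vaults)     # all three race, one wins
store.release(key, "vault-1")           # the leader goes away
survivors = ["vault-2", "vault-3"]
new_leaders = elect(store, key, survivors)
print(leaders, new_leaders)
```

The standbys need no knowledge of each other; they only retry the lock, which is what makes this form of standby HA simple.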

Slide 60

Slide 60 text

ROLLING RESTARTS/UPGRADES
[Diagram: five VMs, each with four boxes labeled C]

Slide 61

Slide 61 text

CHALLENGE #5 SERVICE ORCHESTRATION VIA EVENTS & WATCHES

Slide 62

Slide 62 text

CONSUL EVENTS
Edge-triggered, sent to all nodes, computationally cheap
[Diagram: CONSUL, WEB 1, WEB 2, DATABASE]
consul event -name=deploy

Slide 63

Slide 63 text

CONSUL WATCH
Watch and execute script for specific events
[Diagram: CONSUL, WEB 1, WEB 2, DATABASE]
consul event -name=deploy

Slide 64

Slide 64 text

CONSUL EXEC
Run arbitrary commands on nodes
[Diagram: CONSUL, WEB 1, WEB 2, DATABASE]
consul exec -service=web ./script.sh

Slide 65

Slide 65 text

CONSUL WATCH
Wait for event, then do something

    $ consul watch -type=event -name=deploy ./deploy.sh
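The watch pattern pairs a named event with a handler that runs once per occurrence. A rough Python sketch of that dispatch follows, with an in-memory `EventBus` standing in for Consul's event delivery. The class, the node names and the payload are all hypothetical.

```python
class EventBus:
    """In-memory stand-in for event delivery; handlers run once per fired event."""

    def __init__(self):
        self._handlers = {}   # event name -> list of callables

    def watch(self, name, handler):
        """Register `handler` to run whenever event `name` fires."""
        self._handlers.setdefault(name, []).append(handler)

    def fire(self, name, payload=""):
        """Deliver the event to every registered watcher; return how many ran."""
        handlers = self._handlers.get(name, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
log = []

# Each node registers its own handler, analogous to running ./deploy.sh.
for node in ("web-1", "web-2", "database"):
    bus.watch("deploy", lambda payload, node=node: log.append((node, payload)))

delivered = bus.fire("deploy", "v1.2.3")   # every watcher runs once
ignored = bus.fire("restart")              # nobody watches this event
print(delivered, ignored, log)
```

Nothing runs between events, which is why the approach is cheap compared with polling on an interval.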

Slide 66

Slide 66 text

USE CASES
Deploys
Operational tasks
Configuring external services

Slide 67

Slide 67 text

CHALLENGE #6 DETERMINISTIC LARGE-SCALE INFRASTRUCTURE CHANGE

Slide 68

Slide 68 text

LARGE SCALE UPDATE PROBLEMS
UNEXPECTED INTER-DEPENDENCIES
CROSS-CLOUD CHANGES
ORDERING FOR MINIMAL DISRUPTION
EXPECTED TIME FOR COMPLETE ROLLOUT

Slide 69

Slide 69 text

WHAT IF I ASKED YOU TO...

Slide 70

Slide 70 text

WHAT IF I ASKED YOU TO... CREATE AN EPHEMERAL ENVIRONMENT (STAGING, ETC)?

Slide 71

Slide 71 text

WHAT IF I ASKED YOU TO... CREATE AN EPHEMERAL ENVIRONMENT (STAGING, ETC)? UPDATE AN EXISTING COMPLEX APPLICATION?

Slide 72

Slide 72 text

WHAT IF I ASKED YOU TO... CREATE AN EPHEMERAL ENVIRONMENT (STAGING, ETC)? UPDATE AN EXISTING COMPLEX APPLICATION? DOCUMENT YOUR INFRASTRUCTURE ARCHITECTURE?

Slide 73

Slide 73 text

WHAT IF I ASKED YOU TO... CREATE AN EPHEMERAL ENVIRONMENT (STAGING, ETC)? UPDATE AN EXISTING COMPLEX APPLICATION? DOCUMENT YOUR INFRASTRUCTURE ARCHITECTURE? DELEGATE SOME OPS TO SMALLER TEAMS (CORE VS. APP IT)?

Slide 74

Slide 74 text

TERRAFORM

Slide 75

Slide 75 text

TERRAFORM'S GOAL

Slide 76

Slide 76 text

PROVIDE A SINGLE WORKFLOW

Slide 77

Slide 77 text

WITH A UNIFIED VIEW

Slide 78

Slide 78 text

USING INFRASTRUCTURE AS CODE

Slide 79

Slide 79 text

THAT CAN BE ITERATED AND CHANGED SAFELY

Slide 80

Slide 80 text

CAPABLE OF COMPLEX N-TIER APPLICATIONS

Slide 81

Slide 81 text

HOW?

Slide 82

Slide 82 text

DIGITALOCEAN DROPLET WITH DNS USING DNSIMPLE

    resource "digitalocean_droplet" "web" {
      name   = "tf-web"
      size   = "512mb"
      image  = "centos-5-8-x32"
      region = "sfo1"
    }

    resource "dnsimple_record" "hello" {
      domain = "example.com"
      name   = "test"
      value  = "${digitalocean_droplet.web.ipv4_address}"
      type   = "A"
    }

Slide 86

Slide 86 text

HUMAN-FRIENDLY CONFIG* * JSON-COMPATIBLE FOR NON-HUMANS

Slide 87

Slide 87 text

VCS-FRIENDLY FORMAT

Slide 88

Slide 88 text

ENTIRE INFRASTRUCTURE... IN A SINGLE TEXT FILE

Slide 89

Slide 89 text

TERRAFORM PLAN
What are you going to do?

    $ terraform plan
    + digitalocean_droplet.web
        backups:              "" => "<computed>"
        image:                "" => "centos-5-8-x32"
        ipv4_address:         "" => "<computed>"
        ipv4_address_private: "" => "<computed>"
        name:                 "" => "tf-web"
        private_networking:   "" => "<computed>"
        region:               "" => "sfo1"
        size:                 "" => "512mb"
        status:               "" => "<computed>"
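Conceptually, a plan is a diff between the recorded state and the desired configuration. The Python sketch below illustrates that idea only; it is not how Terraform is implemented. The `plan` function and its tuple format are hypothetical, and attributes dropped from the desired config are ignored for brevity.

```python
def plan(current, desired):
    """Diff two {resource: {attr: value}} maps.

    Returns (action, name, changes) tuples, where action is "+" (create),
    "-" (destroy) or "~" (update) and changes maps attr -> (old, new).
    """
    steps = []
    for name in sorted(set(current) | set(desired)):
        old, new = current.get(name), desired.get(name)
        if old is None:
            steps.append(("+", name, {k: ("", v) for k, v in new.items()}))
        elif new is None:
            steps.append(("-", name, {}))
        else:
            changed = {
                k: (old.get(k, ""), v) for k, v in new.items() if old.get(k) != v
            }
            if changed:
                steps.append(("~", name, changed))
    return steps


desired = {
    "digitalocean_droplet.web": {
        "image": "centos-5-8-x32",
        "name": "tf-web",
        "region": "sfo1",
        "size": "512mb",
    }
}
create = plan({}, desired)   # nothing exists yet, so everything is created

# Later the size changes (the "1gb" value is invented for illustration).
resized = {"digitalocean_droplet.web": dict(desired["digitalocean_droplet.web"], size="1gb")}
update = plan(desired, resized)

for action, name, changes in create + update:
    print(action, name)
    for attr, (old, new) in changes.items():
        print(f'    {attr}: "{old}" => "{new}"')
```

Showing the diff before applying it is what lets a reviewer answer "what are you going to do?" without touching real infrastructure.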

Slide 90

Slide 90 text

TERRAFORM GRAPH
What order are you going to do things?

    $ terraform graph
    digraph {
      compound = "true"
      newrank = "true"
      subgraph "root" {
        "[root] aws_instance.haproxy" [label = "aws_instance.haproxy", shape = "box"]
        "[root] aws_instance.web" [label = "aws_instance.web", shape = "box"]
        "[root] aws_internet_gateway.terraform-tutorial" [label = "aws_internet_gateway.terraform-tutorial", shape = "box"]
        "[root] aws_route_table.terraform-tutorial" [label =
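A dependency graph answers the ordering question through a topological sort: a resource is created only after everything it depends on. The sketch below uses Kahn's algorithm in Python. The resource names come from the slide, but the edges between them are assumptions, since the slide's graph output is cut off before any edges appear.

```python
import heapq


def apply_order(depends_on):
    """Kahn's algorithm: order resources so each follows its dependencies."""
    nodes = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    remaining = {n: set(depends_on.get(n, ())) for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(node)
    ready = [n for n, deps in remaining.items() if not deps]
    heapq.heapify(ready)                 # alphabetical tie-break keeps output stable
    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for child in dependents[node]:
            remaining[child].discard(node)
            if not remaining[child]:
                heapq.heappush(ready, child)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical edges: each resource maps to the resources it depends on.
graph = {
    "aws_instance.haproxy": ["aws_instance.web"],
    "aws_instance.web": ["aws_route_table.terraform-tutorial"],
    "aws_route_table.terraform-tutorial": ["aws_internet_gateway.terraform-tutorial"],
    "aws_internet_gateway.terraform-tutorial": [],
}
order = apply_order(graph)
for step, name in enumerate(order, start=1):
    print(step, name)
```

Resources with no path between them could be created in parallel; this sketch serializes them alphabetically only to keep the output deterministic.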

Slide 91

Slide 91 text

CHALLENGE #7 DELEGATING OPS TO MULTIPLE TEAMS

Slide 92

Slide 92 text

OPERATIONS DELEGATION
"CORE" OPERATIONS TEAMS
APPLICATION OPERATIONS TEAMS
ELIMINATE SHADOW OPS
SAFELY MAKE CHANGES
SHARE OPERATIONS KNOWLEDGE

Slide 93

Slide 93 text

TERRAFORM MODULES

    module "consul" {
      source  = "github.com/hashicorp/consul/terraform/aws"
      servers = 5
      version = "0.4.0"
    }

Slide 94

Slide 94 text

TERRAFORM MODULES

    module "consul" {
      source  = "github.com/hashicorp/consul/terraform/aws"
      servers = 5
      version = "0.4.0"
    }

    resource "dnsimple_record" "consul" {
      domain = "example.com"
      name   = "consul"
      value  = "${module.consul.ip_address}"
      type   = "A"
    }

Slide 95

Slide 95 text

TERRAFORM REMOTE STATE

    resource "terraform_remote_state" "consul" {
      backend = "atlas"
      config {
        path = "hashicorp/consul-prod"
      }
    }

    output "consul-address" {
      value = "${terraform_remote_state.consul.addr}"
    }

Slide 96

Slide 96 text

CHALLENGE #8 SERVICE COMPOSITION, INFRASTRUCTURE ORCHESTRATION

Slide 97

Slide 97 text

SERVICE COMPOSITION
Modern infrastructures are almost always "multi-provider": DNS in CloudFlare, compute in AWS, etc.
Infrastructure change requires composing data from multiple services and executing changes across them

Slide 98

Slide 98 text

SERVICE COMPOSITION

    // Terraform allows you to combine multiple external providers and
    // their outputs into a single pipeline
    resource "aws_instance" "web" {
      // Existing resource attributes
    }

    resource "cloudflare_record" "www" {
      domain = "foo.com"
      name   = "www"
      value  = "${aws_instance.web.private_ip}"
      type   = "A"
    }
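The `${aws_instance.web.private_ip}` reference is what ties the two providers together: an attribute produced by one resource becomes an input to another. A simplified Python sketch of resolving such references against known state is shown below. The `interpolate` function, the regular expression and the IP address are illustrative assumptions, not Terraform's actual interpolation engine.

```python
import re

# Matches references of the form ${type.name.attribute}.
REF = re.compile(r"\$\{([a-z_0-9]+)\.([a-z_0-9]+)\.([a-z_0-9]+)\}")


def interpolate(value, state):
    """Replace ${type.name.attr} references with values recorded in `state`.

    Raises KeyError when a reference points at an unknown resource or attribute.
    """
    def lookup(match):
        rtype, name, attr = match.groups()
        return str(state[f"{rtype}.{name}"][attr])

    return REF.sub(lookup, value)


# State as it might look after the AWS instance exists (address is invented).
state = {"aws_instance.web": {"private_ip": "10.0.1.4"}}

record = {
    "domain": "foo.com",
    "name": "www",
    "value": "${aws_instance.web.private_ip}",
    "type": "A",
}
resolved = {key: interpolate(val, state) for key, val in record.items()}
print(resolved)
```

The reference also implies an ordering: the DNS record cannot be resolved until the instance's attributes are known.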

Slide 99

Slide 99 text

LOGICAL RESOURCES

    // In addition to physical resources, Terraform also has logical
    // resources such as templates
    resource "template_file" "data" {
      filename = "data.tpl"
      vars {
        address = "${var.addr}"
      }
    }

    resource "aws_instance" "web" {
      user_data = "${template_file.data.rendered}"
    }
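A template resource has no physical counterpart; it only turns a file plus variables into a rendered string for another resource to consume. Python's `string.Template` happens to use the same `${name}` placeholder style, so it can illustrate the idea. The contents of `data_tpl` and the address are invented here, since the slide does not show what `data.tpl` contains.

```python
from string import Template


def render_template(text, variables):
    """Substitute ${name} placeholders; raises KeyError if a variable is missing."""
    return Template(text).substitute(variables)


# Hypothetical template body standing in for data.tpl.
data_tpl = "#!/bin/sh\necho 'consul address: ${address}' > /tmp/address\n"

# The rendered result is what would be handed to the instance as user_data.
user_data = render_template(data_tpl, {"address": "10.0.1.10"})
print(user_data)
```

Failing loudly on a missing variable is deliberate in this sketch, so that an incomplete render never reaches an instance.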

Slide 100

Slide 100 text

CHALLENGE #9 HISTORY OF CHANGES

Slide 101

Slide 101 text

HISTORY OF INFRASTRUCTURE CHANGE Atlas by HashiCorp

Slide 102

Slide 102 text

HISTORY OF INFRASTRUCTURE CHANGE Atlas by HashiCorp Who is making changes?

Slide 103

Slide 103 text

HISTORY OF INFRASTRUCTURE CHANGE Atlas by HashiCorp How did changes occur?

Slide 104

Slide 104 text

HISTORY OF INFRASTRUCTURE CHANGE Atlas by HashiCorp SCM-like workflow

Slide 105

Slide 105 text

CHALLENGE #10 INFRASTRUCTURE COLLABORATION

Slide 106

Slide 106 text

INFRASTRUCTURE COLLABORATION Approve plans - similar to pull requests, but for infrastructure SCM integration

Slide 107

Slide 107 text

INFRASTRUCTURE COLLABORATION Approve plans - similar to pull requests, but for infrastructure Infrastructure change review

Slide 108

Slide 108 text

INFRASTRUCTURE COLLABORATION Approve plans - similar to pull requests, but for infrastructure Ability to "gate" process

Slide 109

Slide 109 text

SETH VARGO @sethvargo QUESTIONS?