
HashiConf 2016 - Consul @ Target

Danny Parker
September 08, 2016


Slides from Target's talk at HashiConf 2016.



Transcript

  1. Danny Parker - @dcparker88
    Matt Helgen - @matt_helgen


  2. Agenda
    • Before Consul
    • Kicking the Tires
    • Commitment and Scaling
    • Interesting Use Cases


  3. About Target
    • 2 Enterprise Data Centers
    • 38 Distribution Centers
    • 1795 Stores
    • ~$70 billion revenue
    • target.com / mobile
    • $1 billion investment


  4. The Dark Ages (pre Consul)


  5. Target’s Engineering Journey
    • DevOps: Product-driven teams
    • Focus on in-house engineering talent
    • On track to hire 1000 engineers this year


  6. 2014 - Guest Facing Cloud POC
    • Small, low risk POC Application
    • How fast can we rebuild and deploy a public facing app?
    • Prove the value of speed to the business
    • IaaS templates + Chef cookbooks


  7. Cloud Architecture
    • 2 Regions x 2 Environments – target.com applications
    • Shared Platform Infrastructure - Product teams deploy at will
    • The cloud platform manages:
        • Deployment pipeline
        • Service discovery
        • Log aggregation
        • Metrics collection
        • Data delivery
    [Diagram labels: Consul, Log Aggregation, Metrics, C* / Kafka, VPN, Applications (x4)]


  8. Consul POC
    • Compelling Features
        • Service registry
        • Service health checks
        • Real-time updates
    • Installed it over a weekend and started integrating it into our demo
    • 4x faster deploy time
    • Parallel deployment of all VMs
    • Chef runs no longer had issues with dependencies
    • Health checks / Consul template (haproxy demo)
    • A++++ Would recommend
    • Started integrating it everywhere


  9. Monitoring Stack
    • Sensu / Graphite for alerts and metrics
    • Consul DNS on all servers on the platform:
        • monitoring-rabbitmq.service.consul:4567
        • servicewatch: service sensu-client restart
        • graphite.service.consul:2003
    • Result:
        • Rebuild the monitoring stack 10-20x a day to rapidly iterate
        • All clients automatically reconnect, no noticeable impact


  10. Medieval Times (Consul on-prem)


  11. Old Internal Architecture
    • Started 4 years ago with Chef and 30 VMs
    • Chef (and lots of Chef search) configured:
        • Nginx
        • API -> API traffic
        • Logging/Monitoring (Sensu/ELK)
        • Database (Cassandra)


  12. Current Internal Architecture
    • 100+ APIs, microservices architecture
    • 3k+ instances
    • Nginx
        • consul-template for upstreams
        • Consul KV for specific API settings
    • Microservices
        • One service might talk to many (aggregators)
        • Local HAProxy + consul-template
    • Consul has become the source of truth
    [Diagram labels: Nginx/Varnish, Fastly, API, Consul, Cassandra]
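
The "consul-template for upstream" idea above can be sketched in Python. consul-template itself renders Go templates from live Consul data; this is only a swapped-in illustration of the same step. The `Instance` shape, the `render_upstream` helper, and the service name and addresses in the usage note are all hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One service instance as a catalog lookup might report it (hypothetical shape)."""
    address: str
    port: int
    healthy: bool


def render_upstream(name: str, instances: list[Instance]) -> str:
    """Render an nginx upstream block containing only the healthy instances."""
    # Sort the server lines so that re-rendering the same set of instances
    # yields identical text and does not trigger a needless reload.
    servers = sorted(
        f"    server {i.address}:{i.port};" for i in instances if i.healthy
    )
    if not servers:
        # nginx refuses an upstream with no servers, so fail loudly
        # instead of writing a config that will not load.
        raise ValueError(f"no healthy instances for {name}")
    return "\n".join([f"upstream {name} {{", *servers, "}"])
```

Calling `render_upstream("cart_api", [...])` with two healthy instances and one unhealthy one would produce an upstream block listing only the two healthy servers.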


  13. Preparing our APIs
    • Better health checks
        • /health
        • up/down
        • database connections
        • disk
        • Be careful - you could DDoS yourself with consul
    • Consistency
        • Same ports, config locations, etc
    • Our APIs should be like “legos”
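
A minimal sketch of what a `/health` endpoint along these lines might compute, assuming each dependency check is a plain callable that returns `True` when healthy. The function name `evaluate_health` and the check names are hypothetical. Consul's HTTP checks treat any 2xx response as passing, so the sketch returns 503 when any check fails.

```python
import json
from typing import Callable, Dict, Tuple


def evaluate_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each named check and return (HTTP status code, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "up" if check() else "down"
        except Exception:
            # A check that raises counts as a failed check, not a crashed endpoint.
            results[name] = "down"
    # 200 only when every check passed; 503 otherwise.
    status = 200 if all(state == "up" for state in results.values()) else 503
    return status, json.dumps(results, sort_keys=True)
```

Because every agent polls this endpoint on an interval, the checks behind it should stay cheap. Caching the result for a few seconds is one possible way to avoid hammering a database from thousands of health probes.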


  14. Data Platform
    • Cassandra/ELK/Kafka
    • Provide teams with a consul DNS name
        • We can alter the underlying nodes
        • No more chef search or static code
    • Cassandra seed discovery
        • Ask consul KV if the cluster exists
        • If not, you are the seed
    • Be careful with heavy health checks
        • nodetool status can take a long time
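
The seed discovery rule above ("ask the KV store if the cluster exists; if not, you are the seed") can be sketched as follows. This is an illustration under stated assumptions, not the speakers' implementation: `InMemoryKV` is a hypothetical stand-in for Consul KV, and the key layout `cassandra/<cluster>/seed` is invented. Against a real Consul cluster, the create-only-if-absent write would correspond to a KV check-and-set with `cas=0`.

```python
from typing import Dict, Optional


class InMemoryKV:
    """Hypothetical stand-in for a KV store with a create-only-if-absent write."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put_if_absent(self, key: str, value: str) -> bool:
        """Write the key only if it does not exist; return True when the write won."""
        if key in self._data:
            return False
        self._data[key] = value
        return True


def discover_seed(kv: InMemoryKV, cluster: str, my_address: str) -> str:
    """Return the seed address for the cluster, claiming the role if nobody has."""
    key = f"cassandra/{cluster}/seed"  # hypothetical key layout
    if kv.put_if_absent(key, my_address):
        # The cluster did not exist yet, so this node becomes the seed.
        return my_address
    seed = kv.get(key)
    assert seed is not None  # the key exists because the write above lost
    return seed
```

Using a single atomic write rather than a read followed by a write avoids two nodes booting at the same moment and both deciding they are the seed.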


  15. Deployment Statistics
    • Prod
        • 3600+ consul agents
        • 450+ services
    • Non-Prod
        • 4000 consul agents
        • 500 services
    • Every server deployed takes advantage of consul


  16. Interesting Consul Use Cases


  17. “Immutable” Cassandra
    • Config files generated from consul kv
        • cassandra-env.sh
        • cassandra.yaml
    • KV path from user data
    • Allows a generic image to be very flexible
        • Speeds build time
        • Reduces errors
    • ~100 node cluster
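
As a rough illustration of generating a config file from KV data, the sketch below renders a tiny `cassandra.yaml` fragment from a dictionary that stands in for Consul KV. The key prefix, the `render_config` helper, and the choice of three settings are assumptions made for the example; in the setup described on the slide, the prefix would come from instance user data.

```python
from string import Template
from typing import Dict

# A deliberately small fragment; a real cassandra.yaml has many more settings.
CASSANDRA_YAML = Template(
    "cluster_name: '$cluster_name'\n"
    "num_tokens: $num_tokens\n"
    "listen_address: $listen_address\n"
)


def render_config(kv: Dict[str, str], prefix: str, listen_address: str) -> str:
    """Render the fragment from every KV entry found under the given prefix."""
    prefix = prefix.rstrip("/") + "/"
    settings = {
        key[len(prefix):]: value
        for key, value in kv.items()
        if key.startswith(prefix)
    }
    # The node's own address is local knowledge, not cluster-wide KV data.
    settings["listen_address"] = listen_address
    # substitute() raises KeyError when a setting is missing, which stops
    # a half-configured node from starting with a broken file.
    return CASSANDRA_YAML.substitute(settings)
```

The same generic image can then serve any cluster: only the KV prefix handed to it at boot changes.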


  18. Scaling Elasticsearch


  19. Consul Certificate Signing
    • Utilizes Consul KV
        • A server managed by the security team runs a Consul key watch
        • Consul clients post a CSR to a particular KV path
        • The server responds with a signed cert in a different KV path
    • Scriptable certificate signing
    • Valid certificates signed by internal PKI, security approved


  20. Network Considerations
    • Nodes that couldn’t satisfy full mesh
        • Internal
            • caused 500mb/s+ network traffic
            • disabled consul agent
        • Cloud
            • Lots of flapping services due to raft protocol and network routing
    • Result: Make all private addresses routable to all others


  21. Consul DNS
    • Dnsmasq
        • resolv.conf points to dnsmasq
        • Dnsmasq directs consul zones to consul cluster
            • .service.consul
        • Allow separate consul-template hosts file
        • Supports de-duplication


  22. Immutable Image Pipeline
    • Bake all consul templates into an image - immutable / autoscalable deployment
    • Envconsul => set environment variables at runtime and drive all config
    • Push Consul KV to clusters automatically through Jenkins and GitHub
    • Vault integration - trustme API
        • trustme provides a Consul token with the proper ACL based on tags / environment variables
    • Cloud and environment agnostic
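
To illustrate the "KV drives environment variables" idea, here is a small sketch of how keys under a prefix could be mapped to variable names. It does not reproduce Envconsul's actual behavior or options. The naming rule (upper-case, with every non-alphanumeric character replaced by an underscore), the `kv_to_env` helper, and the example keys are all assumptions.

```python
import re
from typing import Dict


def kv_to_env(kv: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Map KV entries under a prefix to environment variable names and values."""
    prefix = prefix.rstrip("/") + "/"
    env = {}
    for key, value in kv.items():
        if not key.startswith(prefix):
            continue
        # Hypothetical naming rule: strip the prefix, replace anything that is
        # not a letter or digit with "_", then upper-case the result.
        name = re.sub(r"[^A-Za-z0-9]", "_", key[len(prefix):]).upper()
        env[name] = value
    return env
```

A launcher could merge the returned mapping into `os.environ` before starting the application, so the same image picks up different settings in each environment.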


  23. Lessons Learned
    • Work on your elevator pitch
    • Use Consul everywhere
    • Know what to do if you lose your masters
    • Focus on fundamentals


  24. Come find us!
