Traffic Spikes - BA Tech Talk #1

Nahuel Oyhanarte on how to handle traffic spikes on AWS.
Best Practices and Case Study.

Edrans Social

August 24, 2017

Transcript

  1. Traffic spikes
    - How do we handle "a lot" of traffic?
    - How do we avoid failures and bottlenecks?
    - How do we ensure 100% uptime?
    - How do we guarantee a good experience for everyone?
    It depends... we don't have all the answers, but we do have some suggestions :)
  2. Best Practices: STATIC ASSETS? GET THEM OUT!
    - S3 to host assets
    - CloudFront to serve them
    - Speeds up site load times
    - Reduces load on the front ends
  3. Best Practices: SCALING
    - Servers should be stateless
    - Real data persisted only on DBs
    - Temp data persisted on cache tier
    - Treat your servers as cattle
    - Scale OUT > Scale UP
    - Auto-scaling by CPU/RAM/IO usage
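
The "auto-scaling by CPU usage" bullet can be illustrated with a simple proportional rule: size the group so that average CPU returns to a target value. This is only a sketch under stated assumptions. The function name, the 60% target, and the group limits of 2 and 50 are hypothetical, and this is not claimed to be the exact algorithm AWS Auto Scaling uses.

```python
import math


def desired_instances(current: int, cpu_percent: float,
                      target_percent: float = 60.0,
                      min_size: int = 2, max_size: int = 50) -> int:
    """Return the instance count that would bring average CPU back to the target.

    Assumes load is spread evenly, so average CPU scales inversely with
    the number of instances (hypothetical simplification).
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_percent <= 0:
        raise ValueError("target_percent must be positive")
    # Proportional rule: more CPU than the target means more instances.
    raw = math.ceil(current * cpu_percent / target_percent)
    # Never leave the configured group limits.
    return max(min_size, min(max_size, raw))
```

With these assumed defaults, 10 instances at 90% CPU would scale out to 15, and 10 instances at 30% CPU would scale in to 5. Scaling out (more instances) rather than up (bigger instances) is what makes a rule like this possible in the first place.
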
  4. Best Practices: LOAD BALANCE
    AWS ELB:
    - Routes traffic only to healthy instances
    - Registers and deregisters instances automagically
    Other options:
    - Nginx / HAProxy
    - Traefik + Consul
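
To make "routes traffic only to healthy instances" concrete, here is a minimal round-robin sketch that skips backends marked unhealthy. The `Backend` and `HealthAwareBalancer` names are hypothetical, and how the `healthy` flag gets updated (for example by periodic health checks) is left out. ELB, Nginx, HAProxy and Traefik each implement this in their own way.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Backend:
    name: str
    healthy: bool = True  # assumed to be updated by an external health check


class HealthAwareBalancer:
    """Round-robin over whichever backends are currently healthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._counter = count()

    def pick(self) -> Backend:
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            raise RuntimeError("no healthy backends available")
        # Rotate through the healthy subset only.
        return healthy[next(self._counter) % len(healthy)]
```

Marking a backend unhealthy takes it out of rotation on the next `pick()` call, and marking it healthy again puts it back, which mirrors the register/deregister behaviour described on the slide.
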
  5. Best Practices: AUTOMATE
    - Repeatability and reproducibility
    - HW: Terraform / CloudFormation
    - SW: Ansible / Puppet / Chef / etc.
  6. Best Practices: DECOUPLE
    - Synchronicity is bad
    - Use queues to avoid coupling
    - Tiers should be "independent"
    - Fallbacks in place!
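
The queue and fallback bullets can be sketched with an in-process queue standing in for a managed one such as SQS. Everything here is an illustrative assumption: the function names, the "order" jobs, and the fallback message are hypothetical. The point is only that the request handler returns immediately and degrades gracefully when the queue is full.

```python
import queue
import threading


def handle_request(jobs: queue.Queue, order_id: int) -> str:
    """Front tier: enqueue the work and return without waiting for it."""
    try:
        jobs.put_nowait(order_id)
        return "accepted"
    except queue.Full:
        # Fallback: shed load instead of blocking the caller.
        return "try again later"


def start_worker(jobs: queue.Queue, results: list) -> threading.Thread:
    """Back tier: drain the queue at its own pace until a None sentinel arrives."""

    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:
                    return
                results.append(f"processed order {job}")
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

Because the two tiers only share the queue, the worker can be slow, restarted, or scaled independently without the front tier noticing anything beyond queue depth.
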
  7. Best Practices: CACHE
    - Avoid expensive trips to DB
    - Save common queries
    - Centralize session data
    - RAM beats HD
    - Memcached / Redis / etc.
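
"Avoid expensive trips to DB" and "save common queries" describe what is often called the cache-aside pattern. The sketch below uses an in-memory dict where a real deployment would use Memcached or Redis. The `TTLCache` class, its `loader` callback, and the injectable clock are hypothetical names chosen for illustration.

```python
import time


class TTLCache:
    """Cache-aside with expiry: look in the cache first, hit the DB only on a miss."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store = {}    # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # hit: no trip to the database
        value = loader(key)  # miss or expired: run the expensive query once
        self._store[key] = (value, now + self.ttl)
        return value
```

A per-process dict like this does not centralize anything, so session data would still need a shared cache tier. The pattern itself stays the same when the dict is swapped for a network cache.
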
  8. Best Practices: SPOF
    - Avoid Single Points of Failure
    - Plan for failure
    - Auto-remediation & self-healing
    - Failover procedures
    "Everything fails, all the time." Werner Vogels, CTO of Amazon (not actually a meme)
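
A failover procedure can be as simple as trying redundant endpoints in order and surfacing an error only when all of them fail. This is a rough sketch, assuming each endpoint is a plain callable. The function name and the endpoints used with it are hypothetical, and real code would usually add timeouts and backoff.

```python
def call_with_failover(endpoints, request):
    """Try each redundant endpoint in order and return the first success."""
    last_error = None
    for endpoint in endpoints:
        try:
            return endpoint(request)
        except Exception as exc:  # plan for failure: any endpoint may be down
            last_error = exc
    # Only reached when every endpoint failed (or none were given).
    raise RuntimeError(f"all {len(endpoints)} endpoints failed") from last_error
```

With a single endpoint in the list this helper adds nothing, which is the single point of failure the slide warns about.
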
  9. Best Practices: HA
    - Embrace multi AZ, multi Region or even multi Cloud!
    - Duplicate (triplicate?) critical components
    - Use highly available and fault tolerant services
    “If an individual EC2 instance or an entire AWS Availability Zone fails, your app should stay up. This is the essence of architecting for High Availability (HA).”
  10. Best Practices: METRICS
    You can’t improve what you don’t measure.
    - Host-level metrics
    - Aggregate-level metrics
    - Log analysis
    - External site performance
    Plan, Test, Improve
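
The difference between host-level and aggregate-level metrics can be shown with latency percentiles: one number per host, and one across the whole fleet. This sketch uses the nearest-rank percentile method. The function names, host names, and sample latencies are made up for illustration.

```python
import math


def percentile(samples, p: int):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_by_host: dict, p: int = 95) -> dict:
    """Return the p-th percentile per host and across all hosts combined."""
    all_samples = [ms for samples in latencies_by_host.values() for ms in samples]
    return {
        "hosts": {host: percentile(s, p) for host, s in latencies_by_host.items()},
        "aggregate": percentile(all_samples, p),
    }
```

The per-host view shows which machine is misbehaving, while the aggregate view shows what users experience overall, so both are worth tracking.
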
  11. Best Practices: STRESS TEST
    - Benchmark how many RPS you can support
    - Test your infra under heavy load
    - Stress every component
    - Fail early, fail fast, fail again
    Tools:
    - ab / Siege / JMeter
    - Goad / Vegeta
    - etc.
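
The idea behind "benchmark how many RPS you can support" can be sketched with a tiny load generator: fire a fixed number of calls with a fixed concurrency, then divide by the elapsed time. The `run_load` function and its `target` callable are hypothetical stand-ins. Tools like ab, Siege, JMeter, Goad, or Vegeta do this over real HTTP and far more rigorously.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(target, total_requests: int, concurrency: int) -> dict:
    """Call `target` total_requests times using `concurrency` threads; report throughput."""

    def one_call(_):
        try:
            target()
            return True
        except Exception:
            return False  # count failures instead of aborting the run

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(one_call, range(total_requests)))
    elapsed = time.perf_counter() - started

    ok = sum(outcomes)
    rps = total_requests / elapsed if elapsed > 0 else float("inf")
    return {"ok": ok, "failed": total_requests - ok, "rps": rps}
```

Repeating the run while raising `concurrency` until failures appear or RPS stops growing gives a rough ceiling for the component under test.
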
  12. Best Practices: STRESS TEST -> Goad
    - Tool for distributed load testing
    - Written in Go
    - Uses AWS Lambda
    - Up to 100k requests
  13. HotSale 2017: GOAL
    - Infrastructure for hotsale.com.ar
    - Automated / auto-scaling
    - High availability (HA)
    - Avoiding SPOFs
    - No heads should roll!
  14. HotSale 2017: THE RUN-UP
    - Traffic warm-up
    - Scale out EC2 instances
    - Add ES nodes
    - Add an RDS read replica
    - Food && mate!
    Monitoring from the Edrans "bunker"...
  15. HotSale 2017: THE PEAK
    - May 15, 12:16 AM
    - 40k+ concurrent users at peak
    - 65k+ requests per minute
    - 50 EC2 instances (WEB + API)
    - 55% mobile access
    - 0% downtime =)
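
The peak figures above translate into a per-instance load with simple arithmetic. The sketch assumes requests were spread evenly and that every request reached an EC2 instance. Neither is stated in the slides, and CloudFront likely absorbed part of the traffic, so treat the result as a rough upper bound on the average.

```python
def per_instance_rps(requests_per_minute: float, instances: int) -> float:
    """Average requests per second each instance would see under an even spread."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return requests_per_minute / 60 / instances


# Figures from the slide: 65k requests per minute over 50 instances.
peak = per_instance_rps(65_000, 50)
# Hypothetical what-if: the same traffic with 10 instances lost.
degraded = per_instance_rps(65_000, 40)
print(f"{peak:.1f} RPS per instance, {degraded:.1f} with 40 instances")
```

That works out to roughly 1,083 requests per second overall, or about 22 per instance, which is the kind of number a stress test should confirm each instance can sustain with headroom.
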
  17. HotSale 2017: IN SUMMARY
    - 2+ million "unique" users over the 3 days
    - 2 sessions and 10 pageviews each, on average
    - 2 ECS clusters (WEB + API)
    - Route 53 + CloudFront + S3
    - Elasticsearch
    - SQS + Lambda + RDS
    - CloudWatch + New Relic
    - Everything in containers (Docker + ECS)
    - Provisioned with Terraform (IaC)