Growing Up With AWS

Michael Hamrah
January 24, 2019

Transcript

  1. Growing Up With AWS, Michael Hamrah / Riadh Amari

  2. Namely: An HR SaaS, 550+ Employees, 140+ Engineers

  3. A Story of Growth

  4. AWS Bills Payments

  5. Observability
    Continuous Integration
    Continuous Delivery/Deployment
    Runtime (Kubernetes/Docker/App Servers)
    Server Infrastructure (AWS)
    Operations
    Configuration Management
    Environment Management
    Foundational Infrastructure
    Ease of Development, Testing and Delivery
    Meeting Production SLOs
    Security
  6. SignalFx, Logz.io, New Relic
    Jenkins
    Spinnaker/Octopus
    Kubernetes + Istio
    AWS
    Spinnaker, kubectl, Kubernetes, ad-hoc
    Kubernetes/Octopus
    Terraform
    Foundational Infrastructure
    Ease of Development, Testing and Delivery
    Meeting Production SLOs
    IAM, RBAC, Networking, VPN, Secrets
  7. There is no right answer, only various degrees of wrong.

    We experiment, learn, decide, act, rinse, repeat and improve!
  8. Welcome to Namely Infra
    AWS (Virginia): Production 10.50.0.0/16, Int 10.52.0.0/16, Stage 10.51.0.0/16, VendorX 10.53.0.0/16, Ops 10.54.0.0/16, Portal IT 172.16.0.0/16
    An environment is:
    • An AWS account and permissions
    • A VPC
    • Route tables
    • Everything required to run Namely
    • The ability to deploy components
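Peered VPCs must use non-overlapping CIDR blocks, so an address plan like the one on slide 8 can be sanity-checked before a new environment is added. The sketch below is a minimal illustration, not Namely's tooling: the name-to-block pairing is read from the flattened slide text and is an assumption, and `overlapping_pairs` is a hypothetical helper.

```python
import ipaddress
from itertools import combinations

# CIDR blocks as they appear on the slide. The pairing of names to blocks
# is an assumption; the "Portal IT" label is kept as written.
ENVIRONMENTS = {
    "Production": "10.50.0.0/16",
    "Stage": "10.51.0.0/16",
    "Int": "10.52.0.0/16",
    "VendorX": "10.53.0.0/16",
    "Ops": "10.54.0.0/16",
    "Portal IT": "172.16.0.0/16",
}


def overlapping_pairs(cidrs):
    """Return every pair of environment names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(block) for name, block in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    clashes = overlapping_pairs(ENVIRONMENTS)
    print("overlaps:", clashes if clashes else "none")
```

Running a check like this before creating a VPC is cheap, whereas renumbering a live environment after a failed peering request is not.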
  9. None
  10. None
  11. None
  12. Environment Basics
    Peering, Public ELB(s), Internet Gateway, Jumpboxes, Server1, Server2, Server3, VPC
    A bunch of RDS, a lotta ElastiCache, some Aurora, CloudFront, S3
    Kubernetes: 15 workers, 3 masters, 5 etcd
  13. $17,000/mo on NAT Gateways (380 TB)

  14. VPC Endpoints

  15. $1,500/mo on NAT Gateways (33 TB)
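As a rough check on the two NAT Gateway figures in slides 13 and 15, the sketch below multiplies traffic volume by a per-GB data processing rate. The rate of $0.045/GB (assumed to be the us-east-1 list price at the time) and the 1 TB = 1,000 GB conversion are assumptions, not numbers from the slides, and hourly gateway charges are ignored.

```python
# Assumed NAT Gateway data processing price; not stated in the slides.
NAT_PROCESSING_USD_PER_GB = 0.045
GB_PER_TB = 1000  # decimal terabytes, also an assumption


def nat_processing_cost(tb_per_month: float) -> float:
    """Estimate the monthly data processing charge for traffic through NAT Gateways."""
    return tb_per_month * GB_PER_TB * NAT_PROCESSING_USD_PER_GB


if __name__ == "__main__":
    for label, tb in (("before VPC endpoints", 380), ("after VPC endpoints", 33)):
        print(f"{label}: {tb} TB -> ${nat_processing_cost(tb):,.0f}/mo")
```

Under those assumptions the estimates come to roughly $17,100 and $1,485 per month, close to the quoted $17,000 and $1,500, which suggests the bill was dominated by data processing rather than gateway hours.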

  16. Kubernetes Cluster
    Etcd0, Etcd1, Etcd2, Etcd3, Etcd4: State is stored here
    Master0, Master1, Master2: Does most of the k8s work
    Worker0, Worker1, Worker2, Worker3, Worker..., Worker15: Where stuff runs
  17. EKS Cluster
    Worker0, Worker1, Worker2, Worker3, Worker..., Worker15: Where stuff runs
    Better networking
  18. Latency Improvements with EKS

  19. OOPS, I IOPS’ED
    Data warehouse: EC2 + MSSQL Server (data across multiple EBS volumes)
    I/O concurrency issues => latency
    Transition volume type from gp2 to io1 with no downtime
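Why a gp2 volume can become an I/O bottleneck is easier to see with numbers: gp2 baseline performance scales with volume size (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000), while io1 lets IOPS be provisioned independently of size. The sketch below is a minimal illustration of slide 19 under those documented limits as I understand them; the volume size and required IOPS in the example are hypothetical, not figures from the talk.

```python
# gp2 baseline limits (assumed from the EBS documentation, not from the slides).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS a gp2 volume of the given size sustains without burst credits."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, size_gib * GP2_IOPS_PER_GIB))


def needs_provisioned_iops(size_gib: int, required_iops: int) -> bool:
    """True when the workload needs more IOPS than gp2 provides at this size."""
    return required_iops > gp2_baseline_iops(size_gib)


if __name__ == "__main__":
    size_gib, required = 500, 5000  # hypothetical data warehouse volume
    print("gp2 baseline:", gp2_baseline_iops(size_gib))
    print("move to io1?", needs_provisioned_iops(size_gib, required))
```

The volume type itself can be changed in place on an attached volume through the EC2 ModifyVolume API, which is what makes a no-downtime transition possible.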
  20. RDS: Multiple DB flavors (MySQL, Postgres, Aurora)
    Issue: Replication lag too high (Read Replica = Latency) for the nightly refreshes.
    Solution: Enabling provisioned IOPS (Master + Read Replicas)!
    I IOPS’ED
  21. DB Encryption
    Master KMS keys shared across accounts!
    EBS volumes encrypted at rest for EC2.
    RDS encryption enabled!
    Encryption of data in transit
    Compliance: NY State Cybersecurity Requirements for Financial Services
  22. ElastiCache: Redis
    Heavy usage of Redis: 50+ instances (millions of keys)!
    Transitioning from Redis instances to Redis clusters: sharding + data partitioning
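Redis Cluster shards data by mapping each key to one of 16,384 hash slots (CRC16 of the key, modulo 16384), with an optional `{hash tag}` that forces related keys into the same slot. The sketch below is a minimal illustration of that mapping for slide 22, using Python's `binascii.crc_hqx` with an initial value of 0, which is assumed here to match the CRC16 variant Redis uses. The key names are hypothetical.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


if __name__ == "__main__":
    for key in ("session:42", "{user1000}.following", "{user1000}.followers"):
        print(key, "->", key_slot(key))
```

Keys that must be touched together in a multi-key command need a shared hash tag, which is usually the main application change when moving from standalone instances to a cluster.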
  23. Automation: Terraform

  24. None
  25. 200+ Load Balancers

  26. Reserved Instances

  27. None
  28. Things we’re curious about
    • Container evolution (EKS -> Fargate)
    • Transit Gateway
    • Aurora Postgres HA
    • Lambda
    • EC2 Optimizations
    • Better Account Management
  29. Everything is an Investment
    We want a return. We must build upon what we’ve done.
  30. None
  31. None
  32. None
  33. None
  34. None