Slide 1

Slide 1 text

Growing Up With AWS
Michael Hamrah / Riadh Amari

Slide 2

Slide 2 text

Namely
● An HR SaaS
● 550+ Employees
● 140+ Engineers

Slide 3

Slide 3 text

A Story of Growth

Slide 4

Slide 4 text

AWS Bills, Payments

Slide 5

Slide 5 text

● Observability
● Continuous Integration
● Continuous Delivery/Deployment
● Runtime (Kubernetes/Docker/App Servers)
● Server Infrastructure (AWS)
● Operations
● Configuration Management
● Environment Management
● Security

● Foundational Infrastructure
● Ease of Development, Testing and Delivery
● Meeting Production SLOs

Slide 6

Slide 6 text

● SignalFx, Logz.io, New Relic
● Jenkins
● Spinnaker/Octopus
● Kubernetes + Istio
● AWS
● Spinnaker, kubectl, Kubernetes, ad-hoc
● Kubernetes/Octopus
● Terraform
● IAM, RBAC, Networking, VPN, Secrets

● Foundational Infrastructure
● Ease of Development, Testing and Delivery
● Meeting Production SLOs

Slide 7

Slide 7 text

There is no right answer, only various degrees of wrong. We experiment, learn, decide, act, rinse, repeat and improve!

Slide 8

Slide 8 text

Welcome to Namely Infra

AWS (Virginia)
● Production 10.50.0.0/16
● Int 10.52.0.0/16
● Stage 10.51.0.0/16
● VendorX 10.53.0.0/16
● Ops 10.54.0.0/16
● Portal IT 172.16.0.0/16

An environment is:
● An AWS account and permissions
● A VPC
● Route tables
● Everything required to run Namely
● The ability to deploy components
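
Each environment on this slide has its own /16, and VPC peering only works between VPCs whose CIDR blocks do not overlap. The following is a minimal sketch of that check using Python's standard `ipaddress` module. It assumes each label on the slide belongs to the CIDR that follows it; the function names `find_overlaps` and `environment_for` are illustrative and not from the talk.

```python
import ipaddress
from itertools import combinations

# CIDR blocks as listed on the slide. Pairing each label with the CIDR
# that follows it is an assumption about the original diagram.
ENVIRONMENT_CIDRS = {
    "Production": "10.50.0.0/16",
    "Int": "10.52.0.0/16",
    "Stage": "10.51.0.0/16",
    "VendorX": "10.53.0.0/16",
    "Ops": "10.54.0.0/16",
    "Portal IT": "172.16.0.0/16",
}


def find_overlaps(cidrs):
    """Return every pair of environment names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


def environment_for(address, cidrs=ENVIRONMENT_CIDRS):
    """Return the name of the environment containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, cidr in cidrs.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    return None
```

An empty result from `find_overlaps(ENVIRONMENT_CIDRS)` means any pair of these VPCs could be peered without an address conflict, and `environment_for` maps an arbitrary private IP back to the environment it belongs to.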

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Environment Basics
● Peering
● Public ELB(s)
● Internet Gateway
● Server1, Server2, Server3
● Jumpboxes
● VPC
● A bunch of RDS
● A lotta ElastiCache
● Some Aurora
● CloudFront
● S3
● Kubernetes: 15 workers, 3 masters, 5 etcd

Slide 13

Slide 13 text

$17,000/mo on NAT Gateways (380 TB)

Slide 14

Slide 14 text

VPC Endpoints

Slide 15

Slide 15 text

$1,500/mo on NAT Gateways (33 TB)
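
The two NAT Gateway figures in this part of the deck ($17,000/mo for 380 TB before VPC Endpoints, $1,500/mo for 33 TB after) can be compared with a little arithmetic. The sketch below uses only those numbers from the slides; it assumes 1 TB = 1000 GB and that the bill scales with data processed, which the slides do not state explicitly.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def cost_per_gb(monthly_cost_usd, traffic_tb):
    """Effective dollars per GB pushed through the NAT Gateways."""
    return monthly_cost_usd / (traffic_tb * GB_PER_TB)


# Figures taken from the slides.
before_cost, before_tb = 17_000, 380
after_cost, after_tb = 1_500, 33

rate_before = cost_per_gb(before_cost, before_tb)
rate_after = cost_per_gb(after_cost, after_tb)

monthly_savings = before_cost - after_cost
cost_reduction = monthly_savings / before_cost
traffic_reduction = (before_tb - after_tb) / before_tb

print(f"rate before: ${rate_before:.4f}/GB, after: ${rate_after:.4f}/GB")
print(f"monthly savings: ${monthly_savings:,}")
print(f"cost down {cost_reduction:.0%}, NAT traffic down {traffic_reduction:.0%}")
```

The effective per-GB rate is nearly identical before and after, which suggests the savings come from moving traffic off the NAT Gateways rather than from a cheaper rate.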

Slide 16

Slide 16 text

Kubernetes Cluster
● Etcd0 to Etcd4: State is stored here
● Master0 to Master2: Does most of the k8s work
● Worker0 ... Worker15: Where stuff runs

Slide 17

Slide 17 text

EKS Cluster
● Worker0 ... Worker15: Where stuff runs
● Better networking

Slide 18

Slide 18 text

Latency Improvements with EKS

Slide 19

Slide 19 text

OOPS, I IOPS’ED
● Data warehouse: EC2 + MSSQL Server (data spread across multiple EBS volumes)
● I/O concurrency issues => latency
● Transitioned the volume type from gp2 to io1 with no downtime
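
To illustrate why a busy gp2 volume can run out of I/O headroom, the sketch below computes a gp2 volume's baseline IOPS from its size and builds the parameters for an in-place change to io1. The gp2 constants (3 IOPS per GiB, floor of 100, ceiling of 16,000) are AWS's published figures as I understand them and should be checked against current documentation. The slide does not say how the change was made; using boto3's EC2 `modify_volume` call is an assumption, and the volume ID and target IOPS in the usage note are hypothetical.

```python
# Published gp2 characteristics (verify against current AWS docs).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib):
    """Baseline IOPS a gp2 volume of this size gets, tied to its capacity."""
    return min(max(GP2_IOPS_PER_GIB * size_gib, GP2_MIN_IOPS), GP2_MAX_IOPS)


def io1_modify_request(volume_id, target_iops):
    """Keyword arguments shaped for boto3's EC2 client `modify_volume`.

    Nothing is sent to AWS here; the dict would be passed as
    ec2_client.modify_volume(**request) by the caller.
    """
    return {"VolumeId": volume_id, "VolumeType": "io1", "Iops": target_iops}
```

For example, `gp2_baseline_iops(500)` gives 1500, so a 500 GiB gp2 volume that needs several thousand sustained IOPS would need to grow or change type; `io1_modify_request("vol-0123456789abcdef0", 6000)` builds the request for the latter with a made-up volume ID.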

Slide 20

Slide 20 text

I IOPS’ED
● RDS: Multiple DB flavors (MySQL, Postgres, Aurora)
● Issue: Replication lag on the read replicas was too high for the nightly refreshes.
● Solution: Enabling Provisioned IOPS (master + read replicas)!

Slide 21

Slide 21 text

DB Encryption
● Master KMS keys shared across accounts!
● EBS volumes encrypted at rest for EC2.
● RDS Encryption enabled!
● Encryption of data in transit

Compliance
● NY State Cybersecurity Requirements for Financial Services

Slide 22

Slide 22 text

ElastiCache: Redis
● Heavy usage of Redis: 50+ instances (millions of keys)!
● Transitioning from Redis instances to Redis clusters: sharding + data partitioning
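
Sharding in Redis Cluster works by mapping every key to one of 16384 hash slots, which are then divided among the shards. The sketch below shows that mapping as described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384, with `{...}` hash tags forcing related keys into the same slot. It is an illustration of the general mechanism, not Namely's configuration; the key names in the usage note are made up.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for `key`."""
    data = key.encode("utf-8")
    # Hash tag rule: if the key contains "{", and a "}" follows it with at
    # least one character in between, only that substring is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC-16/XMODEM,
    # the variant the Redis Cluster specification names.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS
```

Keys such as `{user1000}.following` and `{user1000}.followers` land in the same slot because only `user1000` is hashed, which matters when moving from standalone instances to a cluster: multi-key operations only work when all keys involved share a slot.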

Slide 23

Slide 23 text

Automation: Terraform

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

200+ Load Balancers

Slide 26

Slide 26 text

Reserved Instances

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Things we’re curious about
● Container evolution (EKS -> Fargate)
● Transit Gateway
● Aurora Postgres HA
● Lambda
● EC2 Optimizations
● Better Account Management

Slide 29

Slide 29 text

Everything is an Investment
We want a return. We must build upon what we’ve done.