Slide 1

From bare metal boxes to Amazon ECS: our journey to the cloud.

Slide 2

Who am I?

● I’m the Lead Developer at DealTrak.
● I’ve been a developer for around 16 years, and in the last few years I have become increasingly interested in ops and system architecture.
● I work in an agile environment at our HQ in Leeds Dock with a team of 13 devs and 3 QA testers.
● I love cats!

Slide 3

Who are DealTrak?

● DealTrak provides a platform that helps car dealerships offer compliant motor F&I products to their customers.
● DealTrak acts as a connector between hundreds of finance and insurance providers, and offers customer relationship management and reporting.
● We processed over 2.5 million car finance proposals in 2017.
● We are on target to process 3.5 million proposals in 2018.
● We currently have 16% of the market and are aiming for 40% by 2020.
● From an infrastructure point of view, this means we need to change radically in order to scale.

Slide 4

No content

Slide 5

The previous state of affairs

● All bare metal servers, managed by a friendly York-based hosting company.
● We had a shared load balancer and firewall, with no direct control over either.
● 4 application servers running Ubuntu Server.
● 1 UAT server running Docker containers.
● 2 Redis cache servers in a master-slave configuration, monitored by Sentinel.
● MySQL 5.6 with one master and 3 replicated slaves.
● We used a GlusterFS data store for documents.
● We did, and still do, use Docker to power the Dev and UAT environments.
● It was very difficult to scale because of the time overhead involved!

Slide 6

Why did we move to AWS?

● Massively improved scalability with minimal time overhead.
● Auto scaling when demand requires it.
● Flexibility! Dozens of services at our fingertips.
● Increased performance - Aurora is up to 5x faster than vanilla MySQL.
● Increased resiliency - multiple availability zones in the UK region.
● Self-healing architecture - CloudWatch alarms trigger self-repair.
● Reduced costs, yet much more functionality is available.
● Compliant, following security best practices.
● We have direct control over our infrastructure.
● Improved automation for deploying AWS infrastructure via its APIs.

Slide 7

AWS Services Used

● Virtual Private Cloud (VPC) - like a virtual network.
● Route 53 for DNS.
● Aurora - a highly available database with MySQL compatibility (using encryption).
● ElastiCache - with Redis compatibility (cluster mode).
● Elastic Load Balancer (ELB) for routing traffic.
● EC2 (for both ECS and standalone instances).
● Elastic Container Service (ECS).
● Elastic Container Registry (ECR).
● S3 for encrypted document storage.
● Simple Queue Service (SQS).
● Cloud​Watch for logs and auditing.
● IAM for role management.
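
To give a feel for how these pieces are stood up, here is a minimal sketch of provisioning the VPC and one private subnet with Ansible’s AWS modules. All names, CIDR ranges and the region are placeholder assumptions, and exact module parameters vary between Ansible versions.

- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create the VPC the rest of the stack lives in
      ec2_vpc_net:
        name: dealtrak-vpc            # hypothetical name
        cidr_block: 10.0.0.0/16       # placeholder address range
        region: eu-west-2             # the UK (London) region
        state: present
      register: vpc

    - name: Add a private subnet in one availability zone (repeated per AZ)
      ec2_vpc_subnet:
        vpc_id: "{{ vpc.vpc.id }}"    # id returned by the task above
        cidr: 10.0.1.0/24
        az: eu-west-2a
        region: eu-west-2
        state: present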

Slide 8

Logical Infrastructure Diagram

Slide 9

Amazon Elastic Container Service (ECS)

● It uses a special variant of an EC2 AMI that contains the necessary tools, such as Docker and the ECS agent.
● The ECS service interacts with the agent on each EC2 instance in order to control containers.
● A task definition is a blueprint for your application, much like a Docker Compose file (see the sketch below).
● A service maintains copies of the task definition in your EC2 cluster, automatically recovering any stopped tasks and maintaining the desired number of running tasks.
● By default, containers are deployed evenly across availability zones for maximum redundancy. Different placement strategies are available.
● Containers are placed on an EC2 instance based on the RAM and CPU units available. You assign these requirements in the task definition.
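
A minimal sketch of the task definition / service relationship, expressed as two Ansible tasks using the ecs_taskdefinition and ecs_service modules. The family, cluster, image and sizing values are placeholders rather than our real configuration.

- name: Register a task definition - the blueprint for the application
  ecs_taskdefinition:
    family: dealtrak-web                 # hypothetical family name
    region: eu-west-2
    state: present
    containers:
      - name: web
        image: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/app:latest   # placeholder ECR image
        cpu: 256                         # CPU units used for placement
        memory: 512                      # MiB used for placement
        essential: true
        portMappings:
          - containerPort: 80
            hostPort: 0                  # dynamic host port behind the load balancer

- name: Keep the desired number of copies running in the cluster
  ecs_service:
    name: dealtrak-web                   # hypothetical service name
    cluster: production                  # hypothetical cluster name
    task_definition: dealtrak-web
    desired_count: 4
    region: eu-west-2
    state: present

ECS then spreads those tasks across instances and availability zones according to the chosen placement strategy.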

Slide 10

Our Docker Environment

● On Docker build, everything is packaged into the image, including production code.
● The same image is used for the Dev, UAT and Live environments.
● Images are stored in ECR, authenticated via the AWS CLI’s ecr get-login.
● For dev, a volume is mounted via Docker Compose (see the sketch below).
● We use environment variables for environment config (via DotEnv), now integrated into Symfony, which makes life easier.
● Config files are stored in a restricted, encrypted S3 bucket and retrieved on deployment.
● We automate deployments via an Ansible playbook.
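
For dev, the Compose file looks roughly like the sketch below: the production image, with the local source tree mounted over the baked-in code and environment config supplied as DotEnv-style variables. The image name, paths and ports are placeholder assumptions.

version: "3"
services:
  app:
    image: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/app:latest   # placeholder ECR image
    env_file:
      - .env                     # DotEnv-style environment config
    volumes:
      - ./:/var/www/html         # dev only: mount local code over the image's copy
    ports:
      - "8080:80"                # placeholder host:container ports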

Slide 11

Ansible powered deployment process

1. Code is merged into master.
2. The master branch is cloned locally, and any config files or certs are pulled in from S3.
3. Docker is authenticated with ECR.
4. The Docker image is built (including source code and Composer packages) and then pushed up.
5. A new ECS task definition is created, specifying the latest image.
6. The ECS service definition is updated with the new task definition.
7. The new image is deployed to EC2 instances by ECS.
8. Running containers are cycled based on the defined deployment strategy. Only healthy containers make it into the mix.
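
A condensed sketch of steps 2 to 4 as Ansible tasks. The repository, bucket and image names are placeholders, and the ECR login shown uses the older aws ecr get-login form mentioned on the previous slide (newer CLI versions use get-login-password instead).

- name: Clone the master branch locally
  git:
    repo: git@github.com:example/app.git       # placeholder repository
    dest: /tmp/build
    version: master

- name: Pull environment config down from the restricted S3 bucket
  aws_s3:
    bucket: example-config-bucket              # placeholder bucket name
    object: /production/.env
    dest: /tmp/build/.env
    mode: get

- name: Authenticate Docker with ECR
  shell: "$(aws ecr get-login --no-include-email --region eu-west-2)"

- name: Build and push the image (source code and Composer packages baked in)
  shell: |
    docker build -t 123456789012.dkr.ecr.eu-west-2.amazonaws.com/app:latest /tmp/build
    docker push 123456789012.dkr.ecr.eu-west-2.amazonaws.com/app:latest

Steps 5 and 6 then follow the ecs_taskdefinition / ecs_service pattern sketched on the ECS slide, and ECS itself handles steps 7 and 8.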

Slide 12

Infrastructure is managed with Ansible

● Ansible is flexible and can be used for all of our various infrastructure needs.
● It can be used for both infrastructure deployments and host automation.
● Playbooks are idempotent: a run only makes the changes required to reach the declared state, rather than repeating everything (see the example below).
● A wide range of modules is available for various tasks.
● No agent is required on remote hosts - only SSH and Python.
● It’s easy for people to run the playbooks from any host.
● We already had some knowledge and experience.
● The learning curve for bringing other team members on board is gentle.
● Ansible is mature and owned by Red Hat.
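
A small illustration of the idempotence point, using the SQS queue from the services slide. Running this task repeatedly only reports a change the first time; afterwards the declared state already matches reality. The queue name and timeout are placeholders.

- name: Ensure the proposal events queue exists
  sqs_queue:
    name: proposal-events              # hypothetical queue name
    region: eu-west-2
    default_visibility_timeout: 60     # seconds
    state: present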

Slide 13

The zero downtime switcharoo (well, almost zero)

● Replication was set up from a dedicated MySQL slave to Aurora over a secure VPN connection.
● We moved our DNS name servers over to Route 53.
● We made sure the DNS TTL was as small as possible.

On the day (in the wee early hours):

● Stop all web server instances and cron jobs.
● Stop master replication to the slaves and firewall them off.
● Stop replication on Aurora.
● Deploy the production application image with the correct env vars, e.g. the DB host.
● Swap the DNS entries over (see the sketch below).
● Test like crazy.
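
The DNS side of the swap, sketched with Ansible’s route53 module. The zone, record and ELB target are placeholders; the real trick is the low TTL set well beforehand so the change propagates quickly.

- name: Swap the application record over to the AWS load balancer
  route53:
    state: present
    zone: example.com                                  # placeholder hosted zone
    record: app.example.com                            # placeholder record
    type: CNAME
    ttl: 60                                            # kept as low as possible beforehand
    value: my-elb-123456.eu-west-2.elb.amazonaws.com   # placeholder ELB DNS name
    overwrite: true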

Slide 14

Secure by design

● Separate private subnets exist in our VPC for the application layer and the data store layer.
● Access into a private subnet is only possible from our site-to-site VPN, with a dynamically created bastion host as a backup.
● For example, only hosts in the application tier can connect to the database and cache hosts (see the sketch below).
● Security groups, assigned to resources such as EC2 instances, implement access control like a firewall.
● Only the required routes exist, restricting data flow.
● Our only public hosts are our mail server and the bastion host, which is a backup access route created on the fly only when required.
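
A sketch of how that tier-to-tier restriction is expressed as a security group: the data store layer accepts MySQL traffic only from the application tier’s group and nothing else. The group names, port and VPC id are placeholder assumptions.

- name: Data store tier security group
  ec2_group:
    name: db-tier                        # hypothetical group name
    description: Aurora access from the application tier only
    vpc_id: vpc-0123456789abcdef0        # placeholder VPC id
    region: eu-west-2
    rules:
      - proto: tcp
        from_port: 3306
        to_port: 3306
        group_name: app-tier             # only this group may connect
    state: present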

Slide 15

What is next?

● An API-centric architecture.
● Continuous integration.
● A Redshift-powered data warehouse.
● More rewriting of our legacy code base!
● Move to Symfony 4 and PHP 7 (we have a mix).
● Fix all the bugs.
● Fix the bugs arising from fixing the bugs.
● Go to the pub.

Slide 16

No content