
From bare metal boxes to Amazon ECS

My talk tells the story of how DealTrak moved from over a decade of bare metal hosting to AWS with practically zero downtime, and the technologies we opted for. We use Docker consistently across all our environments, and manage our infrastructure, including deployments and spinning up UAT instances, via Ansible playbooks.

You can find me at https://www.linkedin.com/in/marktaylor78/

Mark Taylor

June 21, 2018

Transcript

1. Who am I?
• I’m the Lead Developer at DealTrak.
• I’ve been a developer for around 16 years and in the last few years have become increasingly interested in ops and system architecture.
• I work in an agile environment at our HQ in Leeds Dock with a team of 13 devs and 3 QA testers.
• I love cats!
2. Who are DealTrak?
• DealTrak provides a platform that helps car dealerships offer compliant motor F&I (finance and insurance) products to their customers.
• DealTrak acts as a connector between hundreds of finance and insurance providers, and offers customer relationship management and reporting.
• We processed over 2.5 million car finance proposals in 2017.
• We are on target to process 3.5 million proposals in 2018.
• We currently have 16% of the market and are aiming for 40% by 2020.
• From an infrastructure point of view, this means we need to change radically in order to scale.
3. The previous state of affairs
• All bare metal servers, managed by a friendly York-based hosting company.
• We had a shared load balancer and firewall with no direct control.
• 4 application servers running Ubuntu Server.
• 1 UAT server running Docker containers.
• 2 Redis cache servers in a master-slave configuration, monitored by Sentinel.
• MySQL 5.6 with one master and 3 replicated slaves.
• We used a GlusterFS data store for documents.
• We did, and still do, use Docker to power the Dev and UAT environments.
• It was very difficult to scale due to the time overhead!
4. Why did we move to AWS?
• Massively improved scalability with a minimal time overhead.
• Auto scaling when demand requires it.
• Flexibility! Dozens of services at our fingertips.
• Increased performance: Amazon claims Aurora delivers up to 5x the throughput of vanilla MySQL.
• Increased resiliency: multiple availability zones in the UK region.
• Self-healing architecture: CloudWatch alarms trigger self-repair.
• Reduced costs, yet much more functionality is available.
• Compliant and built on security best practices.
• We have direct control over our infrastructure.
• Improved automation for deploying AWS infrastructure via their APIs.
5. AWS Services Used
• Virtual Private Cloud (VPC) - like a virtual network.
• Route 53 for DNS.
• Aurora - a high availability DB with MySQL compatibility (using encryption).
• ElastiCache - with Redis compatibility (cluster mode).
• Elastic Load Balancer (ELB) for routing traffic.
• EC2 (for both ECS and standalone hosts).
• Elastic Container Service (ECS).
• Elastic Container Registry (ECR).
• S3 for encrypted document storage.
• Simple Queue Service (SQS).
• CloudWatch for logs and auditing.
• IAM for role management.
6. Amazon Elastic Container Service (ECS)
• It uses a special variant of an EC2 AMI that contains the necessary tools, like Docker and the ECS agent.
• The ECS service interacts with the agent on each EC2 instance in order to control containers.
• A task definition is a blueprint for your application (like Docker Compose; see the sketch below).
• A service maintains copies of the task definition in your EC2 cluster, auto-recovers any stopped tasks and maintains the desired number of running tasks.
• By default, containers are deployed evenly across availability zones for maximum redundancy. Different placement strategies are available.
• Containers are fitted onto an EC2 instance based on the RAM and CPU units available. You assign the requirements in the task definition.
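To make the task definition and service concepts concrete, here is a minimal sketch of how they might be registered through Ansible's ecs_taskdefinition and ecs_service modules (our playbooks drive ECS in roughly this way). The family, cluster, image URI and region are placeholders, and the exact module parameters vary between Ansible versions.

    # Illustrative only: placeholder names and region.
    - hosts: localhost
      connection: local
      tasks:
        - name: Register a task definition (the "blueprint" for the application)
          ecs_taskdefinition:
            family: dealtrak-web
            state: present
            region: eu-west-2
            containers:
              - name: web
                image: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/dealtrak-web:latest
                essential: true
                memory: 512            # MiB used when fitting containers onto instances
                cpu: 256               # CPU units (1024 = one vCPU)
                portMappings:
                  - containerPort: 80
                    hostPort: 0        # dynamic host port behind the load balancer

        - name: Keep two copies of the latest revision running in the cluster
          ecs_service:
            name: dealtrak-web
            cluster: dealtrak-cluster
            task_definition: dealtrak-web   # family name resolves to the latest revision
            desired_count: 2
            state: present
            region: eu-west-2

If a task stops, the service notices the running count has dropped below desired_count and starts a replacement, which is the auto-recovery behaviour described above.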
7. Our Docker Environment
• On Docker build, everything is packaged into the image, including production code.
• The same image is used for the Dev, UAT and Live environments.
• Images are stored in ECR and authenticated via the AWS CLI's ecr get-login.
• For dev, a volume is mounted via Docker Compose (example below).
• We use environment variables to power environment config (via DotEnv), which is now integrated into Symfony, making life easier.
• Config files are stored in a restricted, encrypted S3 bucket and retrieved on deployment.
• We automate deployments via an Ansible playbook.
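As an illustration of the dev setup, a Docker Compose file along these lines reuses the production image and mounts the local working copy over the code baked into it. The image URI, paths, ports and variable names are placeholders rather than our actual configuration.

    # docker-compose.yml (dev) - placeholders throughout.
    version: "3"
    services:
      app:
        # The same image that UAT and Live run, pulled from ECR.
        image: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/dealtrak-web:latest
        ports:
          - "8080:80"
        volumes:
          # Dev only: mount the local checkout over the code inside the image.
          - ./:/var/www/html
        environment:
          # Read through DotEnv / Symfony environment config.
          APP_ENV: dev
          DB_HOST: db
      db:
        image: mysql:5.6
        environment:
          MYSQL_ROOT_PASSWORD: secret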
8. Ansible powered deployment process
1. Code is merged into master.
2. The master branch is cloned locally, and any config files or certs are pulled in from S3.
3. Docker is authenticated with ECR.
4. A Docker image is built (including source code and Composer packages) and then pushed up.
5. A new ECS task definition is created specifying the latest image.
6. The ECS service definition is updated with the new task definition.
7. The new image is deployed to EC2 instances by ECS.
8. Running containers are cycled based on the defined deployment strategy. Only healthy containers make it into the mix. (A sketch of the playbook follows.)
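Steps 2 to 4 of that process might look roughly like the playbook below; the repository URL, S3 bucket, image name and region are placeholders, and module parameters (git, aws_s3, docker_image) differ between Ansible versions. Steps 5 and 6 then register a new task definition and update the service exactly as in the earlier ECS sketch, and ECS itself takes care of steps 7 and 8.

    # Illustrative only: placeholder repo, bucket, image name and region.
    - hosts: localhost
      connection: local
      tasks:
        - name: Clone the master branch locally
          git:
            repo: git@github.com:example/dealtrak.git
            dest: /tmp/dealtrak-build
            version: master

        - name: Pull environment config from the restricted, encrypted S3 bucket
          aws_s3:
            bucket: example-dealtrak-config
            object: live/.env
            dest: /tmp/dealtrak-build/.env
            mode: get

        - name: Authenticate Docker with ECR
          shell: "$(aws ecr get-login --no-include-email --region eu-west-2)"

        - name: Build the image (source code and Composer packages included) and push it
          docker_image:
            path: /tmp/dealtrak-build
            name: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/dealtrak-web
            tag: "{{ build_tag | default('latest') }}"
            push: yes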
9. Infrastructure is managed with Ansible
• Ansible is flexible and can be used for all our various infrastructure needs.
• It can be used for both infrastructure deployments and host automation.
• Playbooks are idempotent: running one only makes the changes required to reach the declared state, rather than repeating everything (see the example below).
• A wide range of modules is available for various tasks.
• No agent is required on remote hosts; only SSH and Python are needed.
• It’s easy for people to run the playbooks from any host.
• We already had some knowledge and experience.
• It is an easy learning curve to bring other team members on board.
• Ansible is mature and owned by Red Hat.
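As a trivial illustration of that idempotence (the host group and package are placeholders): each module checks the current state before acting, so the first run reports changes and a second run reports every task as already OK, over nothing more than SSH and Python.

    # Placeholder host group; nothing is needed on the target beyond SSH and Python.
    - hosts: uat_servers
      become: yes
      tasks:
        - name: Ensure Docker is installed
          apt:
            name: docker.io
            state: present
            update_cache: yes

        - name: Ensure the Docker service is running and enabled at boot
          service:
            name: docker
            state: started
            enabled: yes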
10. The zero down time switcharoo (well, almost zero)
• Replication was set up from a dedicated MySQL slave to Aurora over a secure VPN connection.
• Moved the DNS name servers over to Route 53.
• Made sure the DNS TTL was as small as possible.
On the day (in the wee early hours):
• Stop all web server instances and cron jobs.
• Stop master replication to the slaves and firewall them off.
• Stop replication on Aurora.
• Deploy the application production image with the correct env vars, e.g. the DB host.
• Swap the DNS entries over (sketched below).
• Test like crazy.
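The DNS swap itself can be scripted too; a hedged sketch using Ansible's route53 module, with a placeholder zone, record and load balancer name, and a deliberately short TTL so stale answers fall out of caches quickly.

    # Placeholder zone, record and ELB name; option names vary by Ansible version.
    - hosts: localhost
      connection: local
      tasks:
        - name: Point the application record at the new AWS load balancer
          route53:
            state: present
            zone: example.com
            record: app.example.com
            type: CNAME
            ttl: 60                  # keep the TTL small around the cutover
            value: my-elb-123456.eu-west-2.elb.amazonaws.com
            overwrite: yes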
11. Secure by design
• Separate private subnets exist in our VPC for the application layer and the data store layer.
• Access into a private subnet is only possible from our site-to-site VPN, with a dynamic bastion host as a backup.
• For example, only hosts in the application tier can connect to the database and cache hosts.
• Security groups implement access control, like a firewall, and are assigned to resources such as an EC2 instance (example below).
• Only the required routes exist, restricting data flow.
• Our only public hosts are our mail server and the bastion host, which provides backup access and is only created on the fly when required.
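A sketch of that tiering with Ansible's ec2_group module: the data tier only accepts MySQL traffic from members of the application tier's security group, never from an arbitrary IP range. The group names, VPC ID, CIDR and region are placeholders.

    # Illustrative only: placeholder names, VPC ID, CIDR and region.
    - hosts: localhost
      connection: local
      tasks:
        - name: Application tier security group
          ec2_group:
            name: app-tier
            description: Application servers
            vpc_id: vpc-0123456789abcdef0
            region: eu-west-2
            rules:
              - proto: tcp
                from_port: 443
                to_port: 443
                cidr_ip: 10.0.0.0/16     # traffic arriving via the VPN or load balancer
          register: app_sg

        - name: Data tier security group - only the app tier may reach MySQL
          ec2_group:
            name: data-tier
            description: Database and cache hosts
            vpc_id: vpc-0123456789abcdef0
            region: eu-west-2
            rules:
              - proto: tcp
                from_port: 3306
                to_port: 3306
                group_id: "{{ app_sg.group_id }}"   # reference the group, not an IP range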
12. What is next?
• An API centric architecture.
• Continuous integration.
• A Redshift powered data warehouse.
• More rewriting of our legacy code base!
• Move to Symfony 4 and PHP 7 (we have a mix).
• Fix all the bugs.
• Fix the bugs arising from fixing the bugs.
• Go to the pub.