NAVCLOUD • A cloud-based storage service that allows users to seamlessly synchronize trip information between devices as well as share and receive navigation information with other people (e.g., friends or companies). • NavCloud aims to be scalable and reactive while ensuring privacy and security.
Our AWS approach • We allocate resources with CloudFormation stacks. • We build stacks inside a VPC. • We use S3 to store files, backups and logs. • We manage our DNS records with Route53.
Streaming on AWS • All requests go via ELB. • We use a TCP/SSL listener instead of an HTTP(S) one, with Proxy Protocol support. • ELB closes idle connections after a timeout (60 s). • We send ‘heartbeat’ messages to keep connections alive. • HTTP(S) listener: issue with TCP RST packets.
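A minimal sketch of the two mechanics on this slide, under stated assumptions: with a TCP listener and Proxy Protocol enabled, the backend receives a one-line PROXY v1 header before the client's data, and heartbeats have to go out more often than the 60 s idle timeout. The names `parse_proxy_v1`, `ProxyInfo` and `heartbeat_due` are hypothetical, and the "half the timeout" interval is an assumed safety margin, not a value from the talk.

```python
from typing import NamedTuple

ELB_IDLE_TIMEOUT_S = 60.0  # idle timeout quoted on the slide


class ProxyInfo(NamedTuple):
    protocol: str
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> ProxyInfo:
    # Expected shape: PROXY TCP4 <src> <dst> <src_port> <dst_port> CRLF
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    # This sketch only accepts the TCP4/TCP6 forms, not "PROXY UNKNOWN".
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a TCP PROXY v1 header")
    _, proto, src, dst, sport, dport = parts
    return ProxyInfo(proto, src, dst, int(sport), int(dport))


def heartbeat_due(last_sent: float, now: float,
                  interval: float = ELB_IDLE_TIMEOUT_S / 2) -> bool:
    # Assumed policy: send a heartbeat once half the idle timeout has elapsed.
    return now - last_sent >= interval
```

A streaming handler would call `parse_proxy_v1` on the first line of each new connection to recover the real client address, then check `heartbeat_due` in its write loop.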
Your Own Load Balancer? • HAProxy, Nginx, Apache. • Full and real-time access to logs. • Configurability. • But: HA setup? Multi-AZ? • But: Security setup. • Trade-off: configurability vs. simplicity.
Streaming on AWS Improved • Amazon’s good practice. • But: API node should be directly accessible. • More improvements: distributed events (across API nodes) instead of polling storage for updates. • Message Queue (RabbitMQ cluster) with Fanout pattern. • AWS alternative: SNS + SQS fanout pattern.
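The fanout idea can be sketched in memory: every API node binds its own queue, and a published event is copied to all of them, so no node has to poll storage. This is only a stand-in for a RabbitMQ fanout exchange (or an SNS topic with one SQS queue per node); `FanoutExchange`, the node names and the event shape are hypothetical.

```python
import queue


class FanoutExchange:
    """In-memory stand-in for a fanout exchange: one queue per subscriber."""

    def __init__(self):
        self._queues = {}

    def bind(self, node_id):
        # Each API node gets its own queue, as it would with a real broker.
        q = queue.Queue()
        self._queues[node_id] = q
        return q

    def publish(self, event):
        # Fanout: every bound queue receives its own copy of the event.
        for q in self._queues.values():
            q.put(dict(event))


exchange = FanoutExchange()
inbox_a = exchange.bind("api-node-a")
inbox_b = exchange.bind("api-node-b")
exchange.publish({"type": "trip_updated", "trip_id": 42})
```

Each node then pushes the events from its own inbox to the clients streaming from it.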
ELB ‘features’ :) • Performance tests. Pre-warming. • It is really easy to hit ELB limits beyond ~10K concurrent connections. • Ask Amazon support to pre-warm the ELB, or just run the tests for some time before you start measuring. • Log access. Improved lately (export to S3) …
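One way to read the "run for a while before trusting the numbers" advice is to drop every sample taken during a warm-up window. The sketch below assumes samples are `(seconds_since_start, latency_ms)` pairs; the function name, the 300 s window and the sample values are all made up for illustration.

```python
from statistics import mean


def steady_state_latencies(samples, warmup_s):
    """Keep only latencies recorded after the warm-up window has passed."""
    return [latency for t, latency in samples if t >= warmup_s]


# Illustrative values only: slow responses while the ELB is still scaling up.
samples = [(0, 900.0), (30, 700.0), (300, 120.0), (330, 110.0)]
steady = steady_state_latencies(samples, warmup_s=300)
average_ms = mean(steady)
```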
Monitoring • We are investigating StackDriver (stackdriver.com). • Third-party monitoring tool with rich and customizable UI. • Custom application metrics. • Supports monitoring of a lot of standard services out of the box: Riak, Message Queue services, App containers.
Provisioning • CloudFormation! • A JSON script that describes the whole stack. • Automatic resource lifecycle management. • VPC, Security, Route53 records, S3, EC2 -> everything is managed inside CF scripts. • Currently we are stuck with a monolithic CF script of 3000 LOC, which is not very manageable.
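For readers who have not seen a CloudFormation script, here is a deliberately tiny template built as a Python dict and serialized to JSON. It is a sketch, not an excerpt from the NavCloud script: the logical ID `LogsBucket` and the helper `minimal_template` are hypothetical, and a real stack would declare VPC, EC2 and Route53 resources in the same `Resources` map.

```python
import json


def minimal_template(bucket_logical_id="LogsBucket"):
    # Smallest useful shape: format version, one resource, one output.
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack with a single S3 bucket",
        "Resources": {
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
        "Outputs": {
            # Ref on an S3 bucket resolves to the generated bucket name.
            "BucketName": {"Value": {"Ref": bucket_logical_id}},
        },
    }


template_body = json.dumps(minimal_template(), indent=2)
```

Generating the JSON from code like this is one possible way to break a large script into smaller functions, though the slides do not say which approach the team chose.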
Deployment • We use the Python boto library to talk to AWS services, including calling our own scripts during CloudFormation stack setup. • Python scripts + shell scripts (AWS CLI). • Capistrano for running distributed tasks.
Capistrano • A remote server automation and deployment tool written in Ruby. • Agent-less: needs only SSH and a POSIX-compatible shell. That’s it. • Routing out of the box (connecting via an SSH router).
Capistrano with CF stacks • Problem: dynamic nature of AWS resources. IP addresses can’t be hardcoded. • Solution: auto-discovery of CF resources (e.g. stacks, hosts) is part of the Capistrano job.
Capistrano with CF stacks • lsfleet is a simple shell script that queries the CloudFormation API and returns the IP addresses of the instances within the supplied Auto Scaling group. • Could be done even more easily with the Ruby AWS SDK.
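The lsfleet script itself is not shown in the slides, so the following is only a rough Python equivalent of the discovery step. It assumes boto3-style client objects and goes through the Auto Scaling and EC2 APIs rather than CloudFormation; `list_fleet_ips` is a hypothetical name, and the clients are passed in so the function can be exercised without AWS access.

```python
def list_fleet_ips(autoscaling, ec2, asg_name):
    """Return the private IPs of all instances in one Auto Scaling group.

    `autoscaling` and `ec2` are assumed to behave like boto3 clients.
    """
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    instance_ids = [i["InstanceId"] for g in groups for i in g["Instances"]]
    if not instance_ids:
        return []  # unknown or empty group: nothing to deploy to

    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    return sorted(
        inst["PrivateIpAddress"]
        for r in reservations
        for inst in r["Instances"]
        if "PrivateIpAddress" in inst  # instances still starting may lack one
    )
```

With real credentials the clients would come from `boto3.client("autoscaling")` and `boto3.client("ec2")`, and the resulting list would be handed to the Capistrano job as its host list.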
Capistrano Use Cases • Distributing the application across the whole App stack (deploying to different ‘dev’ CF stacks). • Gathering log files. • Getting some OS-related stats from all nodes. • Interactively invoking commands on all nodes of an ASG.
‘Switching’ The Stacks • Allows us to fully automate dev environment updates. Can be a Continuous Integration job! • Decreases the downtime. • Procedure: 1. Provision the new CF stack using a Python boto script. 2. Download & apply the latest backup from S3 using a shell script & the s3cmd tool. 3. Switch the Route53 DNS record using the AWS API.
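The three-step procedure can be sketched as a small orchestration function. Everything here is illustrative: `switch_stacks` and `dns_switch_batch` are hypothetical names, the three steps are injected as callables (standing in for the boto script, the s3cmd restore and the Route53 call), and the CNAME record type and 60 s TTL are assumptions. The dict follows the change-batch shape that the Route53 `ChangeResourceRecordSets` API accepts.

```python
def dns_switch_batch(record_name, new_target, ttl=60):
    # UPSERT creates the record if missing, otherwise repoints it.
    return {
        "Comment": "Point the environment at the freshly provisioned stack",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",  # assumed record type
                "TTL": ttl,       # assumed short TTL so the switch is quick
                "ResourceRecords": [{"Value": new_target}],
            },
        }],
    }


def switch_stacks(provision, restore_backup, update_dns, record_name):
    new_target = provision()        # 1. provision the new CF stack
    restore_backup(new_target)      # 2. apply the latest backup from S3
    update_dns(dns_switch_batch(record_name, new_target))  # 3. switch DNS
    return new_target
```

Because DNS is only updated after the restore step returns, a failure in step 1 or 2 leaves the old stack serving traffic.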