TomTom NavCloud on AWS

NAVCLOUD ON AWS
AWS Amsterdam Meetup! April 29th 2014

NAVCLOUD
• A cloud-based storage service that allows users to seamlessly synchronize trip information between devices, as well as share and receive navigation information with other people (e.g., friends or companies).
• NavCloud aims to be scalable and reactive while ensuring privacy and security.

The Team
Full stack developers
• Server
• Mobile / SDKs
• Systems / AWS

Architecture
[Diagram: Clients, HTTP(s) API nodes (…), Riak Cluster]

Architecture
Horizontal scaling
• Stateless* API nodes.
• No direct interconnection between API nodes.
• Riak scales horizontally very well.

But Why AWS?
• Embracing DevOps.
• Embracing (horizontal) scalability.
• A whole ecosystem of different services helping to solve (almost) any task.

Our AWS approach
• We allocate resources with CloudFormation stacks.
• We build stacks inside a VPC.
• We use S3 to store files, backups and logs.
• We manage our DNS records with Route53.

So What About AWS?

Syncing Data
Problem: Client wants to receive updates in the background.
Solutions (each illustrated with a Client / Server diagram):
• Polling
• Streaming

Streaming
[Diagram: Clients, HTTP API nodes (…), Riak Cluster; each client connection uses HTTP 1.1 chunked]
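The "HTTP 1.1 chunked" labels in the diagram refer to chunked transfer encoding, where the server keeps the response open and sends each update as a separately framed chunk. A minimal sketch of that framing follows. The function names are illustrative and not taken from the NavCloud code base.

```python
LAST_CHUNK = b"0\r\n\r\n"  # a zero-length chunk terminates the response body


def encode_chunk(payload: bytes) -> bytes:
    """Frame one payload as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(payload), payload)


def stream_body(updates):
    """Yield a chunked body: one chunk per update, then the terminating chunk."""
    for update in updates:
        if update:  # an empty payload would be read as the terminating chunk
            yield encode_chunk(update)
    yield LAST_CHUNK
```

A streaming endpoint would write each yielded chunk to the socket as soon as the corresponding update becomes available, rather than joining them up front.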

Streaming on AWS

Streaming on AWS
• All requests go via ELB.
• Using a TCP/SSL listener instead. Proxy protocol support.
• ELB closes idle connections after a timeout (60s).
• Sending ‘heartbeat’ messages to keep the connection alive.
• HTTP(s) listener: issue with the TCP RST message.
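The heartbeat idea can be sketched as a generator that forwards pending updates and emits a filler message whenever nothing has been sent for a while. This is only an illustration under assumptions: the heartbeat payload, the function name and the `max_messages` cut-off (there so the example terminates) are hypothetical, and the deck does not say how NavCloud implements it.

```python
import queue

HEARTBEAT = b"\n"  # hypothetical payload; the deck does not specify one


def events_with_heartbeat(events, idle_timeout, max_messages):
    """Yield queued events, or HEARTBEAT when the queue stays empty for idle_timeout seconds."""
    sent = 0
    while sent < max_messages:
        try:
            message = events.get(timeout=idle_timeout)
        except queue.Empty:
            message = HEARTBEAT  # nothing to send: keep the connection busy
        yield message
        sent += 1
```

For this to help, `idle_timeout` has to be comfortably shorter than the 60s ELB timeout mentioned above.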

Your Own Load Balancer?
• HAProxy, Nginx, Apache.
• Full and real-time access to logs.
• Configurability.
• But: HA setup? Multi-AZ?
• But: Security setup.
Tradeoff: configurability vs simplicity

Streaming on AWS Improved

Streaming on AWS Improved
• Amazon’s good practice.
• But: API node should be directly accessible.
• More improvements: distributed events (across API nodes) instead of polling storage for updates.
• Message Queue (RabbitMQ cluster) with Fanout pattern.
• AWS alternative: SNS + SQS fanout pattern.
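The fanout pattern means every published event is copied to one queue per consumer, so each API node sees every change without polling storage. The sketch below models that in-process with standard-library queues. It is not RabbitMQ or SNS/SQS client code, and the class and node names are hypothetical.

```python
import queue


class FanoutExchange:
    """In-memory stand-in for a fanout exchange: every bound queue gets every event."""

    def __init__(self):
        self._queues = {}

    def bind(self, node_name):
        """Create and return a dedicated queue for one API node."""
        node_queue = queue.Queue()
        self._queues[node_name] = node_queue
        return node_queue

    def publish(self, event):
        """Deliver a copy of the event reference to every bound queue."""
        for node_queue in self._queues.values():
            node_queue.put(event)
```

With a real broker the roles are the same: the node that handles a write publishes once, and each API node consumes from its own queue to push updates to its connected clients.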

Streaming on AWS Improved

ELB ‘features’ :)
• Performance tests. Pre-warming.
• It is really easy to hit its limits beyond ~10K concurrent connections.
• Request Amazon support to pre-warm, or just run tests for some time without measuring.
• Logs access. Improved lately (export to S3) …

Monitoring
• We are investigating StackDriver (stackdriver.com).
• Third-party monitoring tool with rich and customizable UI.
• Custom application metrics.
• Supports monitoring of a lot of standard services out of the box: Riak, Message Queue services, App containers.

Monitoring

Provisioning
CloudFormation!
• JSON script that describes the whole stack.
• Automatic resource lifecycle management.
• VPC, Security, Route53 records, S3, EC2 -> everything is managed inside CF scripts.
• Currently we are stuck with a monolithic CF script -> 3000 LOC. Not very manageable.
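For readers unfamiliar with the format, a CloudFormation template is a JSON document with a `Resources` map keyed by logical IDs. The sketch below generates a tiny one from Python dictionaries and assembles it from separate fragments, which is one possible way to keep a large template readable. The deck does not describe this approach, and the logical IDs and CIDR block are hypothetical.

```python
import json


def merge_resources(*fragments):
    """Combine resource fragments, refusing duplicate logical IDs."""
    merged = {}
    for fragment in fragments:
        for logical_id, resource in fragment.items():
            if logical_id in merged:
                raise ValueError("duplicate logical ID: %s" % logical_id)
            merged[logical_id] = resource
    return merged


def build_template(description, resources):
    """Wrap resources in the top-level CloudFormation template structure."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": description,
        "Resources": resources,
    }


# Hypothetical fragments; names and values are illustrative only.
network = {"AppVpc": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": "10.0.0.0/16"}}}
storage = {"BackupBucket": {"Type": "AWS::S3::Bucket"}}

template_json = json.dumps(
    build_template("Illustrative stack", merge_resources(network, storage)),
    indent=2,
    sort_keys=True,
)
```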

Deployment
• We use the Python boto library to talk to AWS services, including calling our own scripts during CloudFormation stack setup.
• Python scripts + shell scripts (AWS SDK CLI).
• Capistrano for doing distributed tasks.

Capistrano
Capistrano - a remote server automation and deployment tool written in Ruby.
• Agent-less: Needs ssh and POSIX-compatible shell. That’s it.
• Routing out of the box (connecting via ssh router).

Capistrano with CF stacks
Problem: dynamic nature of AWS resources. IP addresses can’t be hardcoded.
Solution: Auto-discovery of CF resources (e.g. stacks, hosts) is a part of Capistrano job.

Capistrano with CF stacks

Capistrano with CF stacks
• lsfleet is a simple shell script that queries the CloudFormation API and returns the IP addresses of instances within the supplied Auto-Scaling Group.
• Could be done even more easily with the Ruby AWS SDK.
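The lsfleet script itself is not shown in the deck. As a rough illustration of the same discovery step, the sketch below extracts IP addresses from a response shaped like the EC2 DescribeInstances result returned by boto3, matching on the `aws:autoscaling:groupName` tag that Auto Scaling sets on the instances it launches. Using the EC2 response instead of the CloudFormation API, and the function name, are assumptions made for the example.

```python
ASG_TAG = "aws:autoscaling:groupName"  # tag Auto Scaling attaches to its instances


def fleet_ips(describe_instances_response, asg_name):
    """Return private IPs of instances that belong to the given Auto Scaling group."""
    ips = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
            ip = instance.get("PrivateIpAddress")
            if tags.get(ASG_TAG) == asg_name and ip:
                ips.append(ip)
    return ips
```

The resulting list is what a deployment job would use as its host list instead of hardcoded addresses.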

Capistrano Use Cases
• Distributing application across the whole App stack (Deploying to different ‘dev’ CF stacks).
• Gathering log files.
• Getting some OS-related stats from all nodes.
Interactively invoke commands on all nodes of ASG

Capistrano: why bother?
Before!
• Huge (480+304 LOC) shell scripts for app deployment.
• Doing manual ssh routing, etc.
After!
• Capfile ~70 LOC & helper shell scripts (50+153 LOC)
• Easier to maintain.
• Capistrano params: easier to configure.

‘Switching’ The Stacks
• Allows fully automating dev environment updates. Can be a Continuous Integration job!
• Decreasing the downtime.
• Procedure:
1. Provision the new CF stack using Python boto script.
2. Download & Apply the latest backup from S3 using shell script & s3cmd tool.
3. Switch the Route53 DNS record using AWS API.
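The ordering of the three steps matters: DNS should only move once the new stack is provisioned and its data restored. The sketch below captures that ordering with the actual AWS work passed in as callables. All names are hypothetical placeholders for the boto script, the backup shell script and the Route53 call, and nothing here talks to AWS.

```python
def switch_stacks(provision_stack, restore_backup, switch_dns, stack_name, dns_name):
    """Run the three switch steps in order; any exception stops before the DNS change."""
    endpoint = provision_stack(stack_name)  # 1. provision the new CF stack
    restore_backup(stack_name)              # 2. apply the latest backup from S3
    switch_dns(dns_name, endpoint)          # 3. point the DNS record at the new stack
    return endpoint
```

Because a failure in step 1 or 2 raises before `switch_dns` is called, the old stack keeps serving traffic until the new one is ready.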

Questions?
• Dmitry Ivanov @idajantis
• Vincenzo Vitale @sicilianamente
• Nami Nasserazad @nami4552
