TomTom NavCloud on AWS

A talk from AWS Amsterdam meetup

Dmitry Ivanov

April 29, 2014

Transcript

  1. NAVCLOUD
    • A cloud-based storage service that allows users to seamlessly synchronize trip information between devices, and to share and receive navigation information with other people (e.g., friends or companies).
    • NavCloud aims to be scalable and reactive while ensuring privacy and security.
  2. Architecture: Horizontal scaling
    • Stateless* API nodes.
    • No direct interconnection between API nodes.
    • Riak scales horizontally very well.
  3. But Why AWS?
    • Embracing DevOps.
    • Embracing (horizontal) scalability.
    • A whole ecosystem of different services helping to solve (almost) any task.
  4. Our AWS approach
    • We allocate resources with CloudFormation stacks.
    • We build stacks inside a VPC.
    • We use S3 to store files, backups and logs.
    • We manage our DNS records with Route53.
  5. Syncing Data
    Problem: the client wants to receive updates in the background.
    Solutions (each shown as a client/server diagram):
    • Polling
    • Streaming
  6. Streaming
    [Diagram: clients hold HTTP 1.1 chunked connections to several HTTP API nodes, which are backed by a Riak cluster.]
  7. Streaming on AWS
    • All requests go via ELB.
    • HTTP(S) listener: issue with TCP RST messages.
    • Using a TCP/SSL listener instead. Proxy Protocol support.
    • ELB closes idle connections after a timeout (60s).
    • Sending ‘heartbeat’ messages to keep the connection alive.
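The heartbeat idea on this slide can be sketched roughly as follows. This is a minimal Python illustration, not NavCloud's code: the function names, the newline heartbeat payload, and the 30-second interval are all assumptions chosen for the example. It frames each update as an HTTP/1.1 chunk and emits a small heartbeat chunk whenever no update arrives in time, so the connection never sits idle long enough for the load balancer to close it.

```python
import queue


def encode_chunk(payload: bytes) -> bytes:
    """Frame a payload as one HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n"


def stream_with_heartbeats(updates, heartbeat_interval=30.0, heartbeat=b"\n"):
    """Yield chunks from `updates`, a queue.Queue of bytes (None ends the stream).

    If no update arrives within `heartbeat_interval` seconds, a heartbeat
    chunk is emitted instead. The interval and payload are hypothetical;
    the interval only needs to stay below the balancer's idle timeout.
    """
    while True:
        try:
            item = updates.get(timeout=heartbeat_interval)
        except queue.Empty:
            yield encode_chunk(heartbeat)
            continue
        if item is None:
            yield b"0\r\n\r\n"  # terminating zero-length chunk
            return
        yield encode_chunk(item)
```

A server would write each yielded value to the client socket as-is; clients are assumed to ignore the heartbeat payload.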
  8. Your Own Load Balancer?
    • HAProxy, Nginx, Apache.
    • Full and real-time access to logs.
    • Configurability.
    • But: HA setup? Multi-AZ?
    • But: security setup.
    Tradeoff: configurability vs. simplicity.
  9. Streaming on AWS Improved
    • Amazon’s good practice.
    • But: API node should be directly accessible.
    • More improvements: distributed events (across API nodes) instead of polling storage for updates.
    • Message Queue (RabbitMQ cluster) with the fanout pattern.
    • AWS alternative: the SNS + SQS fanout pattern.
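The fanout pattern named above delivers a copy of every event to each subscriber, so every API node can notify its own connected clients without polling storage. The sketch below is an in-memory stand-in written for illustration only; it uses no RabbitMQ, SNS, or SQS calls, and the class and method names are hypothetical.

```python
import queue


class FanoutExchange:
    """In-memory stand-in for a fanout exchange: every bound queue gets a copy."""

    def __init__(self):
        self._queues = []

    def bind(self):
        """Create and return a new queue, e.g. one per API node."""
        q = queue.Queue()
        self._queues.append(q)
        return q

    def publish(self, event):
        """Deliver the event to all bound queues."""
        for q in self._queues:
            q.put(event)
```

With two nodes bound, one `publish` call puts the same event on both queues; in a real deployment the queues would live in the message broker rather than in process memory.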
  10. ELB ‘features’ :)
    • Performance tests. Pre-warming.
    • It is really easy to hit ELB limits beyond ~10K concurrent connections.
    • Ask Amazon support to pre-warm the ELB, or just run the tests for some time without measuring.
    • Log access. Improved lately (export to S3) …
  11. Monitoring
    • We are investigating StackDriver (stackdriver.com).
    • Third-party monitoring tool with a rich and customizable UI.
    • Custom application metrics.
    • Supports monitoring of a lot of standard services out of the box: Riak, Message Queue services, App containers.
  12. Provisioning: CloudFormation!
    • JSON script that describes the whole stack.
    • Automatic resource lifecycle management.
    • VPC, Security, Route53 records, S3, EC2 -> everything is managed inside CF scripts.
    • Currently we are stuck with a monolithic CF script -> 3000 LOC. Not very manageable.
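To make "a JSON script that describes the whole stack" concrete, here is a small sketch that assembles a CloudFormation-style template from separate Python functions. Generating the template this way is one possible answer to the monolithic-script problem and is not something the talk describes; the logical IDs, the zone name, and the record values are hypothetical.

```python
import json


def storage_fragment():
    """Resources for file, backup and log storage (logical ID is hypothetical)."""
    return {"BackupBucket": {"Type": "AWS::S3::Bucket"}}


def dns_fragment(zone, name, target):
    """A Route53 CNAME record pointing `name` at `target`."""
    return {
        "ApiDnsRecord": {
            "Type": "AWS::Route53::RecordSet",
            "Properties": {
                "HostedZoneName": zone,
                "Name": name,
                "Type": "CNAME",
                "TTL": "60",
                "ResourceRecords": [target],
            },
        }
    }


def build_template(*fragments):
    """Merge resource fragments into one template, rejecting duplicate IDs."""
    resources = {}
    for fragment in fragments:
        for logical_id, resource in fragment.items():
            if logical_id in resources:
                raise ValueError("duplicate logical ID: " + logical_id)
            resources[logical_id] = resource
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
```

Calling `json.dumps(build_template(...), indent=2)` yields the JSON text that would be handed to CloudFormation.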
  13. Deployment
    • We use the Python boto library to talk to AWS services, including calling our own scripts during CloudFormation stack setup.
    • Python scripts + shell scripts (AWS SDK CLI).
    • Capistrano for doing distributed tasks.
  14. Capistrano
    Capistrano: a remote server automation and deployment tool written in Ruby.
    • Agent-less: needs only ssh and a POSIX-compatible shell. That’s it.
    • Routing out of the box (connecting via an ssh router).
  15. Capistrano with CF stacks
    Problem: dynamic nature of AWS resources. IP addresses can’t be hardcoded.
    Solution: auto-discovery of CF resources (e.g. stacks, hosts) is a part of the Capistrano job.
  16. Capistrano with CF stacks
    • lsfleet is a simple shell script that queries the CloudFormation API and returns the IP addresses of instances within the supplied Auto Scaling group.
    • Could be done even more easily with the Ruby AWS SDK.
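The lsfleet script itself is not shown in the deck. As a rough illustration of the lookup it performs, the sketch below filters instance data by Auto Scaling group and returns private IP addresses. It assumes the input has the shape of an EC2 DescribeInstances response as returned by current AWS SDKs, rather than the CloudFormation query the slide mentions; the function name is hypothetical.

```python
ASG_TAG = "aws:autoscaling:groupName"  # tag Auto Scaling sets on its instances


def ips_in_group(describe_instances_response, group_name):
    """Return private IPs of running instances tagged with the given ASG name."""
    ips = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            running = instance.get("State", {}).get("Name") == "running"
            if running and tags.get(ASG_TAG) == group_name:
                ips.append(instance["PrivateIpAddress"])
    return ips
```

The resulting list is what a Capistrano job would consume as its set of target hosts.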
  17. Capistrano Use Cases
    • Distributing the application across the whole App stack (deploying to different ‘dev’ CF stacks).
    • Gathering log files.
    • Getting some OS-related stats from all nodes.
    • Interactively invoking commands on all nodes of an ASG.
  18. Capistrano: why bother?
    Before!
    • Huge (480+304 LOC) shell scripts for app deployment.
    • Doing manual ssh routing, etc.
    After!
    • Capfile ~70 LOC & helper shell scripts (50+153 LOC).
    • Easier to maintain.
    • Capistrano params: easier to configure.
  19. ‘Switching’ the Stacks
    • Allows dev environment updates to be fully automated. Can be a Continuous Integration job!
    • Decreases the downtime.
    • Procedure:
      1. Provision the new CF stack using a Python boto script.
      2. Download and apply the latest backup from S3 using a shell script and the s3cmd tool.
      3. Switch the Route53 DNS record using the AWS API.
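The three-step procedure can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the scripts from the talk: the three callables are placeholders for the boto, shell, and Route53 steps, and the change batch follows the shape that the Route 53 ChangeResourceRecordSets API accepts in current AWS SDKs. The record name, TTL, and comment text are hypothetical.

```python
def cname_upsert(record_name, target, ttl=60):
    """Build a Route53 change batch that points record_name at target."""
    return {
        "Comment": "switch to new stack",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


def switch_stacks(provision, restore_backup, update_dns, record_name):
    """Run the three steps in order; DNS moves only after the restore succeeds."""
    endpoint = provision()  # 1. create the new stack and get its endpoint
    restore_backup(endpoint)  # 2. load the latest backup into the new stack
    update_dns(cname_upsert(record_name, endpoint))  # 3. repoint the DNS record
    return endpoint
```

Because the DNS switch is the last step, an exception in provisioning or restore leaves the old stack serving traffic, which is what keeps the downtime low.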