
Running Cassandra in AWS

In this presentation (first delivered at the Boston AWS meetup on April 8th, 2013), we explain why we chose Cassandra over other platforms and describe the novel architecture that lets us tolerate the inevitable failures one should expect when running at scale on AWS.

Stackdriver

April 08, 2013

Transcript

  1. Running Cassandra in AWS
    Patrick Eaton, PhD
    [email protected]
    @PatrickREaton
    Joey Imbasciano
    [email protected]
    @_joeyi

  2. Stackdriver at a Glance
    Stackdriver's hosted monitoring service helps SaaS companies innovate more by reducing the burden of day-to-day operations
    ● Focus on complex distributed systems
    ● Founded by cloud/infrastructure industry veterans (Microsoft, VMware, EMC, Endeca, Red Hat) with deep systems and DevOps expertise
    ● Team of 15, based in downtown Boston
    ● Private beta underway; let us know if you want to get involved

  3. Problem Domain
    Monitor customer cloud-hosted applications
    ● Inventory
    ● Services
    ● Performance data
    Analyze
    ● Groups
    ● Aggregation
    ● Report, recommend, alert, optimize...

  4. Lambda Architecture
    ● Typical of modern architectures for online applications
    ● Formalized by Nathan Marz
    ● Composed of "batch", "speed", and "serving" layers
    ● Batch layer
    ○ Store of record
    ○ Compute arbitrary views
    ● Speed layer
    ○ Low latency updates
    ○ Streaming algorithms
    ● Serving layer
    ○ Combine data from batch and speed layers to answer queries
    [Diagram: Data feeds the Speed and Batch layers, which both feed the Serving layer]
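
A minimal Python sketch of the serving-layer idea, assuming both layers produce per-key counts and that the speed layer only holds data that arrived after the last batch run (so adding the two never double counts). The names `serve`, `batch_view`, and `speed_view` are hypothetical illustrations, not part of the talk.

```python
from collections import Counter


def serve(batch_view: dict[str, int], speed_view: dict[str, int]) -> dict[str, int]:
    """Answer a query by merging the batch view with the speed view.

    Assumes the speed view holds only increments received after the most
    recent batch computation, so summing the two does not double count.
    """
    merged = Counter(batch_view)
    merged.update(speed_view)  # Counter.update adds counts instead of replacing them
    return dict(merged)


# Batch layer: counts computed from the full store of record, up to the last run.
batch_view = {"web-1": 120, "web-2": 95}
# Speed layer: low-latency increments for data received since that run.
speed_view = {"web-1": 3, "db-1": 7}

print(serve(batch_view, speed_view))  # {'web-1': 123, 'web-2': 95, 'db-1': 7}
```

Neither layer has to be perfect alone: the batch view is complete but stale, while the speed view is fresh but partial.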

  5. Stackdriver Architecture
    ● Shares characteristics of lambda architecture
    ● Analysis path
    ○ Compute aggregations
    ○ Create recommendations
    ● Indexing path
    ○ Make "live" data available before analysis
    ● Query layer
    ○ Combine "live" and analyzed data to answer queries
    ○ May require on-the-fly analysis
    ● Alerting path
    ○ Stream processing to detect policy-based anomalies (not discussed here)
    [Diagram components: Data; Analysis (Batch); Indexing (Speed); Alerting (Speed); Database; Query (Serving); Notification (Serving)]
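
The "on-the-fly analysis" point can be illustrated with a hedged Python sketch of a query that prefers analyzed roll-ups and falls back to raw "live" points for windows the analysis path has not reached yet. The storage shapes (`rollups` keyed by window start, holding `(sum, count)`, and `raw_points` as `(timestamp, value)` pairs) and the function name are assumptions made for illustration.

```python
def query_average(rollups, raw_points, start, end, window=60):
    """Average of all values with start <= ts < end.

    rollups: {window_start: (sum, count)} written by the analysis path (hypothetical shape).
    raw_points: [(ts, value)] made available by the indexing path, not yet analyzed.
    Windows that have a roll-up use it; other windows are analyzed on the fly.
    Assumes start and end are aligned to window boundaries.
    """
    total, count = 0.0, 0
    for w in range(start, end, window):
        if w in rollups:
            s, c = rollups[w]  # analyzed data
        else:
            vals = [v for ts, v in raw_points if w <= ts < w + window]  # live data
            s, c = sum(vals), len(vals)
        total += s
        count += c
    return total / count if count else None


rollups = {0: (30.0, 3)}  # window [0, 60) already analyzed
raw_points = [(65, 20.0), (70, 40.0)]  # window [60, 120) only indexed so far
print(query_average(rollups, raw_points, 0, 120))  # 18.0
```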

  6. Database Options
    ● We chose Cassandra!
    ○ True P2P architecture
    ○ Good support for write-heavy workloads
    ○ Compatible data model
    ● Why not MySQL?
    ○ Experience with operating large, sharded deployments
    ○ Relational data model not a good match
    ● Why not HBase?
    ○ Operational complexity: ZooKeeper, Hadoop, HDFS, ...
    ○ Special "Master" role
    ● Why not DynamoDB?
    ○ Avoid vendor lock-in and high cost

  7. Stackdriver Architecture ++
    ● Critical archival pipeline has a very small surface area
    ● Data path has multiple recovery options
    ● Scales out easily
    ● Cassandra consolidates results of analysis (batch) and indexing (speed)
    ● Cassandra stores immutable data, so consistency is not a problem
    ● Cassandra is "soft state": it can be repopulated from the archive
    [Pipeline diagram labels: Data, Replicate, Archive, Analyze, Cleanse, Roll-ups, Recs, Index, Analysis, Inventory, Data Series, Query]

  8. Cassandra at Stackdriver: Cluster Configuration
    ● Version: DataStax Community Edition 1.2.3
    ● Replication Factor: 3
    ● Vnodes (virtual nodes)
    ● Murmur3Partitioner
    ● Ec2Snitch
    ○ Aids in request efficiency
    ○ Enables Cassandra to ensure replicas are in different Availability Zones
    ● phi_convict_threshold: raised from 8 to 12
    ○ Used to determine when nodes are down
    ○ AWS network can be spotty
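
To see what raising phi_convict_threshold does, here is a small Python sketch of the phi accrual calculation. It assumes heartbeat inter-arrival times are exponentially distributed (the approximation we believe Cassandra's failure detector uses) and that heartbeats arrive roughly once per second; both are simplifying assumptions, and the function names are illustrative.

```python
import math


def phi(seconds_since_heartbeat: float, mean_interval: float) -> float:
    """Suspicion level for a node, assuming exponential inter-arrival times.

    Under that assumption P(no heartbeat yet) = exp(-t / mean), so
    phi = -log10(P) = t / (mean * ln 10).
    """
    return seconds_since_heartbeat / (mean_interval * math.log(10))


def seconds_until_convicted(threshold: float, mean_interval: float) -> float:
    """Silence needed before phi reaches the threshold and the node is marked down."""
    return threshold * mean_interval * math.log(10)


for threshold in (8, 12):
    # Assumes a mean heartbeat interval of about 1 second.
    print(threshold, round(seconds_until_convicted(threshold, 1.0), 1))
# 8 18.4
# 12 27.6
```

Under these assumptions, moving from 8 to 12 means a node must be silent for roughly 28 seconds instead of roughly 18 before it is convicted, which tolerates brief network hiccups at the cost of slower detection of real failures.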

  9. Cassandra Topology in AWS
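    ● Where we started: three nodes, one per Availability Zone
    ○ us-east-1a: node 1
    ○ us-east-1b: node 2
    ○ us-east-1c: node 3
    ● Where we are: six nodes, two per Availability Zone
    ○ us-east-1a: nodes 1 and 4
    ○ us-east-1b: nodes 2 and 5
    ○ us-east-1c: nodes 3 and 6
    Keep it balanced!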
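
Ec2Snitch reports each node's Availability Zone as its rack, and rack-aware placement then tries to put each replica in a different zone. The Python sketch below is a simplified illustration of why alternating zones around the ring keeps replicas spread out. It assumes one token per node (no vnodes) and uses hypothetical tokens and node names; Cassandra's actual NetworkTopologyStrategy logic is more involved.

```python
import bisect

# Hypothetical single-token ring: (token, node, availability zone), zones alternating.
RING = [
    (0, "node1", "us-east-1a"),
    (100, "node2", "us-east-1b"),
    (200, "node3", "us-east-1c"),
    (300, "node4", "us-east-1a"),
    (400, "node5", "us-east-1b"),
    (500, "node6", "us-east-1c"),
]


def replicas(key_token: int, ring=RING, rf: int = 3) -> list[str]:
    """Walk clockwise from the key's position, preferring zones not yet used.

    Simplified rack-aware placement: each replica goes to a new zone while
    distinct zones remain; leftover slots are filled with skipped nodes in
    ring order.
    """
    tokens = [t for t, _, _ in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    ordered = ring[start:] + ring[:start]
    chosen, skipped, zones = [], [], set()
    for _, node, az in ordered:
        if len(chosen) == rf:
            break
        if az not in zones:
            chosen.append(node)
            zones.add(az)
        else:
            skipped.append(node)
    # Fewer distinct zones than rf: fall back to skipped nodes.
    chosen.extend(skipped[: rf - len(chosen)])
    return chosen


print(replicas(150))  # ['node3', 'node4', 'node5']
```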

  10. Cassandra EC2 Node Configuration
    ● m1.xlarge (4 cores, 15 GB RAM)
    ○ 4 ephemeral disks available
    ● 1 disk used for CommitLog
    ○ ext4 - defaults,noatime
    ○ Sequential Writes
    ● 3 disks in RAID-0 for the data volume
    ○ ext4 - defaults,noatime
    ○ mdadm RAID-0
    ○ Compactions
    ○ Heavy Read/Write IO
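
A hedged Python sketch that only builds the shell commands for this disk layout: one ext4 disk for the sequentially written commit log, and three disks striped with mdadm RAID-0 for data, both mounted with `defaults,noatime`. The device names (`/dev/xvdb` through `/dev/xvde`) and the default paths are assumptions; real ephemeral device names depend on the AMI and kernel. Returning the commands instead of running them keeps the sketch safe to review, or to hand to a tool such as Fabric.

```python
def ephemeral_disk_commands(commitlog_dev, data_devs,
                            commitlog_dir="/var/lib/cassandra/commitlog",
                            data_dir="/var/lib/cassandra/data",
                            md_dev="/dev/md0"):
    """Return shell commands for the layout on this slide (nothing is executed).

    Device names and mount points are placeholders chosen for illustration.
    """
    opts = "defaults,noatime"
    return [
        # Dedicated disk for the sequentially written commit log.
        f"mkfs.ext4 {commitlog_dev}",
        # Stripe the remaining disks for compaction-heavy read/write I/O.
        f"mdadm --create {md_dev} --level=0 --raid-devices={len(data_devs)} "
        + " ".join(data_devs),
        f"mkfs.ext4 {md_dev}",
        f"mkdir -p {commitlog_dir} {data_dir}",
        # noatime avoids an extra metadata write on every file read.
        f"mount -o {opts} {commitlog_dev} {commitlog_dir}",
        f"mount -o {opts} {md_dev} {data_dir}",
    ]


for cmd in ephemeral_disk_commands("/dev/xvdb", ["/dev/xvdc", "/dev/xvdd", "/dev/xvde"]):
    print(cmd)
```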

  11. Cassandra Automation and Operations
    ● Combination of Boto, Fabric, & Puppet
    ○ Boto for AWS API
    ○ Fabric + Puppet for Bootstrapping
    ○ Fabric for Operations
    ● One command to:
    ○ Launch a new cluster
    ○ Upsize a cluster
    ○ Replace a dead node
    ○ Remove existing nodes
    ○ List nodes in a cluster
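
A hypothetical sketch of two helpers such tooling might contain: listing a cluster's running nodes, and picking the Availability Zone for the next node so that upsizing keeps the cluster balanced. It operates on a dictionary shaped like the EC2 DescribeInstances response (as returned by the newer boto3 library's `describe_instances`), so it runs without AWS access; the `cassandra-cluster` tag key is an invented convention, not something from the talk.

```python
from collections import Counter


def list_cluster_nodes(response: dict, cluster: str, tag_key: str = "cassandra-cluster"):
    """Return sorted (instance_id, zone, private_ip) tuples for running cluster members.

    `response` follows the DescribeInstances shape; `tag_key` is hypothetical.
    """
    nodes = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if tags.get(tag_key) == cluster and inst["State"]["Name"] == "running":
                nodes.append((inst["InstanceId"],
                              inst["Placement"]["AvailabilityZone"],
                              inst.get("PrivateIpAddress")))
    return sorted(nodes)


def zone_for_next_node(nodes, zones):
    """Pick the zone with the fewest nodes; ties go to the earliest zone listed."""
    counts = Counter({z: 0 for z in zones})
    counts.update(zone for _, zone, _ in nodes)
    return min(zones, key=lambda z: counts[z])
```

With two running nodes in us-east-1a and us-east-1b, `zone_for_next_node` returns us-east-1c, matching the "keep it balanced" goal from the topology slide.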

  12. Our (Internal) Slogan

  13. Cassandra Backups using S3
    ● No Cassandra-powered backups
    ● Restore from S3
    ● Useful for major version upgrades
    [Diagram components: Data, S3, Bulk Loader, Elastic MapReduce, Cassandra]
    1. Data is archived when it is received
    2. Bulk loader reads from S3
    3. EMR re-analyzes data
    4. Cassandra is repopulated
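
Because raw data is archived on receipt and never modified, everything in Cassandra can be recomputed from the archive. The Python sketch below illustrates that idea with an in-memory dict standing in for Cassandra and a per-minute mean standing in for the real roll-ups; it is not the bulk loader or the EMR job from the talk, and all names are hypothetical. Replaying the same archive twice produces identical rows, which is what makes repopulation safe.

```python
from collections import defaultdict


def rebuild(archive, store=None, window=60):
    """Replay archived raw points and (re)write per-window roll-ups.

    archive: iterable of (series, ts, value) read back from the archive.
    store: dict standing in for Cassandra; keys are (series, window_start),
    values are (count, mean).
    """
    store = {} if store is None else store
    sums = defaultdict(lambda: [0, 0.0])  # (series, window_start) -> [count, total]
    for series, ts, value in archive:
        bucket = sums[(series, ts - ts % window)]
        bucket[0] += 1
        bucket[1] += value
    for key, (count, total) in sums.items():
        # Immutable input yields the same row every time, so this overwrite is idempotent.
        store[key] = (count, total / count)
    return store


archive = [("cpu", 5, 10.0), ("cpu", 30, 20.0), ("cpu", 65, 40.0)]
first = rebuild(archive)
again = rebuild(archive, dict(first))
print(first == again)  # True
```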

  14. Cassandra Bulk Load In Action

  15. Thank you!
    Yes, we are hiring!
    Patrick Eaton - [email protected] - @PatrickREaton
    Joey Imbasciano - [email protected] - @_joeyi