Slide 1

Slide 1 text

Running Cassandra in AWS
Patrick Eaton, PhD - [email protected] - @PatrickREaton
Joey Imbasciano - [email protected] - @_joeyi

Slide 2

Slide 2 text

Stackdriver at a Glance
Stackdriver's hosted monitoring service helps SaaS companies innovate more by reducing the burden of day-to-day operations
● Focus on complex distributed systems
● Founded by cloud/infrastructure industry veterans (Microsoft, VMware, EMC, Endeca, Red Hat) with deep systems and DevOps expertise
● Team of 15, based in Downtown Boston
● Private beta underway, let us know if you want to get involved

Slide 3

Slide 3 text

Problem Domain
Monitor customer cloud-hosted applications
● Inventory
● Services
● Performance data
Analyze
● Groups
● Aggregation
● Report, recommend, alert, optimize...

Slide 4

Slide 4 text

Lambda Architecture
● Typical of modern architectures for on-line applications
● Formalized by Nathan Marz
● Composed of "batch", "speed", and "serving" layers
● Batch layer
  ○ Store of record
  ○ Compute arbitrary views
● Speed layer
  ○ Low latency updates
  ○ Streaming algorithms
● Serving layer
  ○ Combine data from batch and speed layers to answer queries
[Diagram labels: Data, Batch, Speed, Serving]
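The three layers can be sketched in a few lines of Python. This is a toy illustration, not Stackdriver's code: the names `batch_view`, `SpeedView`, and `serve`, and the choice of a per-key sum as the "view", are all assumptions made for the example.

```python
from collections import defaultdict


def batch_view(master_records):
    """Batch layer: recompute per-key totals from the full store of record."""
    totals = defaultdict(int)
    for key, value in master_records:
        totals[key] += value
    return dict(totals)


class SpeedView:
    """Speed layer: incrementally updated totals for records the batch
    layer has not processed yet (hypothetical helper for this sketch)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def update(self, key, value):
        self.totals[key] += value


def serve(key, batch, speed):
    """Serving layer: combine the batch view and the speed view for a query."""
    return batch.get(key, 0) + speed.totals.get(key, 0)


# Records already in the store of record, plus one recent arrival.
master = [("cpu", 3), ("cpu", 4), ("disk", 1)]
batch = batch_view(master)
speed = SpeedView()
speed.update("cpu", 5)
cpu_total = serve("cpu", batch, speed)
```

Once the batch layer catches up with the recent records, the speed view for them can be discarded, which is what keeps the speed layer small.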

Slide 5

Slide 5 text

Stackdriver Architecture
● Shares characteristics of lambda architecture
● Analysis path
  ○ Compute aggregations
  ○ Create recommendations
● Indexing path
  ○ Make "live" data available "pre-analysis"
● Query layer
  ○ Combine "live" and analyzed data to answer queries
  ○ May require on-the-fly analysis
● Alerting path
  ○ Stream processing to detect policy-based anomalies (not discussed here)
[Diagram labels: Data, Database, Analysis (Batch), Indexing (Speed), Alerting (Speed), Query (Serving), Notification (Serving)]

Slide 6

Slide 6 text

Database Options
● We chose Cassandra!
  ○ True P2P architecture
  ○ Good support for write-heavy workloads
  ○ Compatible data model
● Why not MySQL?
  ○ Experience with operating large, sharded deployments
  ○ Relational data model not a good match
● Why not HBase?
  ○ Operational complexity - ZooKeeper, Hadoop, HDFS, ...
  ○ Special "Master" role
● Why not Dynamo?
  ○ Avoid vendor lock-in and high cost

Slide 7

Slide 7 text

Stackdriver Architecture ++
● Critical archival pipeline has very small surface area
● Data path has multiple recovery options
● Scales out easily
● Cassandra consolidates results of analysis (batch) and indexing (speed)
● Cassandra stores immutable data, so consistency is not a problem
● Cassandra is "soft state"
[Diagram labels: Replicate, Analyze, Archive, Index, Cleanse, Roll-ups, Recs, Analysis, Inventory Data, Series Data, Query]

Slide 8

Slide 8 text

Cassandra at Stackdriver
Cluster Configuration
● Version: DataStax Community Edition 1.2.3
● Replication Factor: 3
● Vnodes
● Murmur3Partitioner
● Ec2Snitch
  ○ Aids in request efficiency
  ○ Enables Cassandra to ensure replicas are in different Availability Zones
● phi_convict_threshold: 8 -> 12
  ○ Used to determine when nodes are down
  ○ AWS network can be spotty
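To see why raising phi_convict_threshold helps on a spotty network, here is a simplified model of a phi accrual failure detector. It assumes heartbeat gaps are exponentially distributed and uses a 1-second mean interval purely for illustration; Cassandra's real detector tracks a window of observed intervals, and the function names below are hypothetical.

```python
import math


def phi(elapsed_s, mean_interval_s):
    """Suspicion level after `elapsed_s` seconds without a heartbeat.

    Under an exponential model, P(gap > t) = exp(-t / mean), so
    phi = -log10(P) = t / (mean * ln 10).
    """
    return elapsed_s / (mean_interval_s * math.log(10))


def seconds_until_convicted(threshold, mean_interval_s):
    """Silence needed before phi reaches the convict threshold."""
    return threshold * mean_interval_s * math.log(10)


# Assumed mean heartbeat interval of 1 second (illustrative only).
old_limit = seconds_until_convicted(8, 1.0)
new_limit = seconds_until_convicted(12, 1.0)
```

In this model the tolerated silence grows linearly with the threshold, so going from 8 to 12 lets a node stay quiet 1.5 times longer before it is marked down.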

Slide 9

Slide 9 text

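Cassandra Topology in AWS
[Diagram: node placement across Availability Zones]
Where we started...
● us-east-1a: node 1
● us-east-1b: node 2
● us-east-1c: node 3
Where we are...
● us-east-1a: nodes 1, 4
● us-east-1b: nodes 2, 5
● us-east-1c: nodes 3, 6
Keep it balanced!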
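The "keep it balanced" rule can be expressed as a small placement helper. This is a minimal sketch, assuming the cluster is described as a list of zone names, one per node; `next_zone` and `is_balanced` are hypothetical names, not part of any tool mentioned in the talk.

```python
from collections import Counter

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]


def next_zone(node_zones, zones=ZONES):
    """Pick the zone with the fewest nodes (ties broken by zone name)."""
    counts = Counter(node_zones)
    return min(zones, key=lambda z: (counts[z], z))


def is_balanced(node_zones, zones=ZONES):
    """True when every zone holds the same number of nodes."""
    counts = Counter(node_zones)
    return len({counts[z] for z in zones}) == 1


# Grow a three-node cluster to six nodes, one node at a time.
cluster = ["us-east-1a", "us-east-1b", "us-east-1c"]
for _ in range(3):
    cluster.append(next_zone(cluster))
```

Adding nodes in multiples of the zone count is what keeps each zone holding an equal share of the replicas.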

Slide 10

Slide 10 text

Cassandra EC2 Node Configuration
● m1.xlarge (4 cores, 15 GB RAM)
  ○ 4 ephemeral disks available
● 1 disk used for CommitLog
  ○ ext4 - defaults,noatime
  ○ Sequential Writes
● 3 disks RAID-0 for Data Volume
  ○ ext4 - defaults,noatime
  ○ mdadm RAID-0
  ○ Compactions
  ○ Heavy Read/Write IO
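The disk layout above maps to a short sequence of shell commands. The sketch below only builds the command lists and does not run anything. The device names (`/dev/xvdb` through `/dev/xvde`), the array name `/dev/md0`, and the mount points are assumptions for illustration; the talk does not state them.

```python
def disk_setup_commands(commitlog_dev, data_devs, md_dev="/dev/md0",
                        commitlog_dir="/var/lib/cassandra/commitlog",
                        data_dir="/var/lib/cassandra/data"):
    """Return the commands for one commit log disk plus a RAID-0 data volume.

    All paths and device names are assumed values, not taken from the talk.
    """
    opts = "defaults,noatime"
    return [
        # Dedicated disk for the commit log (sequential writes).
        ["mkfs.ext4", commitlog_dev],
        ["mount", "-o", opts, commitlog_dev, commitlog_dir],
        # Stripe the remaining ephemeral disks for the data volume.
        ["mdadm", "--create", md_dev, "--level=0",
         f"--raid-devices={len(data_devs)}", *data_devs],
        ["mkfs.ext4", md_dev],
        ["mount", "-o", opts, md_dev, data_dir],
    ]


commands = disk_setup_commands(
    "/dev/xvdb", ["/dev/xvdc", "/dev/xvdd", "/dev/xvde"])
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the commit log on its own disk means its sequential writes do not compete with compaction and read traffic on the striped data volume.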

Slide 11

Slide 11 text

Cassandra Automation and Operations
● Combination of Boto, Fabric, & Puppet
  ○ Boto for AWS API
  ○ Fabric + Puppet for Bootstrapping
  ○ Fabric for Operations
● One command to:
  ○ Launch a new cluster
  ○ Upsize a cluster
  ○ Replace a dead node
  ○ Remove existing nodes
  ○ List nodes in a cluster

Slide 12

Slide 12 text

Our (Internal) Slogan

Slide 13

Slide 13 text

Cassandra Backups using S3
● No Cassandra Powered Backups
● Restore from S3
● Useful for major version upgrades
[Diagram labels: Data, S3, Bulk Loader, Elastic Map Reduce, Cassandra]
1. Data is archived when it is received
2. Bulk loader reads from S3
3. EMR re-analyzes data
4. Cassandra is repopulated
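The four steps amount to treating the archive as the store of record and Cassandra as a rebuildable cache. The toy model below uses in-memory stand-ins: a list plays the role of S3, a dict plays the role of Cassandra, and a per-series count and sum stands in for the real analysis. Every name here is hypothetical.

```python
from collections import defaultdict


def analyze(records):
    """Stand-in for the EMR analysis: per-series (count, sum) roll-ups."""
    rollups = defaultdict(lambda: [0, 0])
    for series, _timestamp, value in records:
        rollups[series][0] += 1
        rollups[series][1] += value
    return {series: tuple(cs) for series, cs in rollups.items()}


class Pipeline:
    def __init__(self):
        self.archive = []    # stand-in for S3, append-only
        self.cassandra = {}  # stand-in for the cluster, "soft state"

    def receive(self, record):
        # Step 1: archive on receipt, then refresh the derived view.
        self.archive.append(record)
        self.cassandra = analyze(self.archive)

    def restore(self):
        # Steps 2-4: read the archive, re-analyze, repopulate.
        self.cassandra = analyze(list(self.archive))


pipeline = Pipeline()
for rec in [("cpu", 1, 10), ("cpu", 2, 30), ("disk", 1, 5)]:
    pipeline.receive(rec)

before = dict(pipeline.cassandra)
pipeline.cassandra = {}  # simulate losing the cluster
pipeline.restore()
```

Because the archived records are immutable, re-running the analysis reproduces the same derived data, which is why no Cassandra-level backup is needed in this design.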

Slide 14

Slide 14 text

Cassandra Bulk Load In Action

Slide 15

Slide 15 text

Thank you!
Yes, we are hiring!
Patrick Eaton - [email protected] - @PatrickREaton
Joey Imbasciano - [email protected] - @_joeyi