
NoSQL Databases at Stackdriver

Choose carefully and re-evaluate often. This presentation was given at the first event of the AWS Super Users Online Meetup Group. Join here: http://www.meetup.com/AWS-Super-Users-Online-Meetup-Group/

Stackdriver

March 12, 2014

Transcript

1. Why We Need a Database
   • Stackdriver provides an intelligent monitoring service
   • Acquire billions of time series data points per day
   • Must write data at wire speeds
   • Read slices of data for graphing and analysis
   • Also write various aggregations and summarizations
2. We Chose Cassandra
   Key Cassandra features
   • True P2P architecture
   • Replicates data across fault domains
   • EC2-aware data placement strategies
   • Good support for write-heavy workloads
   • Compatible data model for time series data
   • Automatic data expiration with TTLs (sketched below)
   Why not MySQL?
   • Relational data model not a good match
   • Experience with operating large, sharded deployments
   Why not HBase?
   • Operational complexity: ZooKeeper, Hadoop, HDFS, ...
   • Special "master" role
   Why not Dynamo?
   • Avoid vendor lock-in and high cost
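   The TTL feature is worth making concrete. Below is a minimal sketch of a time-series table with automatic expiration, written as CQL issued through the Python cassandra-driver; the keyspace, table, and column names are hypothetical, not Stackdriver's actual schema.

       # Hypothetical time-series schema with TTL-based expiration.
       # Requires the Python cassandra-driver; the "metrics" keyspace and
       # "datapoints" table are illustrative names, not Stackdriver's.
       from datetime import datetime
       from cassandra.cluster import Cluster

       session = Cluster(["10.0.0.1"]).connect("metrics")

       # Partition by (metric, day) to keep partitions bounded; cluster
       # rows by timestamp so graphing reads are contiguous slices.
       session.execute("""
           CREATE TABLE IF NOT EXISTS datapoints (
               metric_id text,
               day       text,
               ts        timestamp,
               value     double,
               PRIMARY KEY ((metric_id, day), ts)
           ) WITH default_time_to_live = 2592000
       """)

       # Individual writes can also carry their own TTL (30 days here).
       session.execute(
           "INSERT INTO datapoints (metric_id, day, ts, value) "
           "VALUES (%s, %s, %s, %s) USING TTL 2592000",
           ("cpu.user", "2014-03-12", datetime(2014, 3, 12, 12, 0), 42.0),
       )

   Once the TTL elapses, the data simply disappears at compaction time, with no delete jobs to run.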
3. Cassandra at Stackdriver
   Usage
   • Primary: 15 TB of data online, 50k+ writes/s
   • (Alerting: 1 GB of data online, 700 writes/s)
   EC2 node configuration
   • m1.xlarge instances
     ◦ 8 ECUs (4 cores x 2 ECUs), 15 GB RAM
     ◦ 4 spinning disks via mdadm RAID-0
     ◦ 1.7 TB of available storage per node
   Cassandra configuration
   • 36 nodes
   • Ec2Snitch (availability-zone aware)
   • Replication factor: 3
   • Vnodes
   • Cost = ~$12,500/month (checked below)
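   The quoted cost is plain on-demand arithmetic, assuming the era's us-east-1 rate of $0.48/hr for m1.xlarge and a 720-hour (30-day) month:

       # Sanity check on the ~$12,500/month figure.
       nodes = 36
       hourly_rate = 0.48       # USD/hr, m1.xlarge on-demand (assumed rate)
       hours_per_month = 720    # 30-day month

       print(nodes * hourly_rate * hours_per_month)   # 12441.6, i.e. ~$12,500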
4. Growing Cassandra in AWS
   [Diagram: cluster layouts across us-east-1a, us-east-1b, and us-east-1c, contrasting "Where we started…" with "Where we are..."]
5. Automation in AWS
   • Combination of Boto, Fabric, & Puppet
     ◦ Boto for AWS API
     ◦ Fabric + Puppet for bootstrapping
     ◦ Fabric for operations
   • One CLI tool (sketched below)
     ◦ Launch a new cluster
     ◦ Upsize a cluster
     ◦ Replace a dead node
     ◦ Remove existing nodes
     ◦ List nodes in a cluster
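   The tool itself isn't shown in the deck, so the following is only a sketch of what its "replace a dead node" path might look like with boto 2 (the Boto generation current in 2014). The AMI ID, key pair, security group, and tag convention are assumptions.

       # Sketch: launch a replacement Cassandra node with boto 2.
       # Only the boto calls are real; every identifier is hypothetical.
       import boto.ec2

       conn = boto.ec2.connect_to_region("us-east-1")
       reservation = conn.run_instances(
           image_id="ami-xxxxxxxx",        # pre-baked Cassandra AMI (assumed)
           instance_type="m1.xlarge",
           placement="us-east-1a",         # match the dead node's AZ
           key_name="cassandra-ops",
           security_groups=["cassandra"],
       )
       instance = reservation.instances[0]
       instance.add_tag("Name", "cassandra-prod-replacement")
       print("Launched %s in %s" % (instance.id, instance.placement))

   From there, Fabric and Puppet would bootstrap Cassandra on the new host, per the slide.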
6. Today: Next Phase of Scale
   Option 1: Upsize the cluster from 36 nodes to 48 nodes
   • Total cost: ~$16,500/month (vs. ~$12,500 currently)
   • Pros: known configuration; grows the existing cluster
   • Cons: more nodes, more problems; bootstrapping takes day(s)
   Option 2: Build a new cluster using 9 hi1.4xlarge
   • Total cost: ~$20,000/month
   • 4x compute, 4x memory, SSD vs. spinning rust
   • Everybody is doing it (Netflix, Instagram)
   • Pros: fewer nodes, fewer problems; SSDs remove the I/O bottleneck for compaction; faster reads
   • Cons: unknown configuration; requires a data reload
   (Both totals are checked below.)
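   Both totals fall out of straight on-demand math, assuming roughly $0.48/hr for m1.xlarge and $3.10/hr for hi1.4xlarge (the approximate us-east-1 rates of the time) over a 720-hour month:

       # Rough monthly on-demand cost of each scaling option.
       HOURS = 720
       option1 = 48 * 0.48 * HOURS    # 48 m1.xlarge  -> ~$16,600/month
       option2 = 9 * 3.10 * HOURS     # 9 hi1.4xlarge -> ~$20,100/month
       print(option1, option2)        # 16588.8 20088.0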
7. Dynamo as an Alternative?
   Pros
   • Hosted: automatic tuning, automatic upgrades, full-time operations
   • "Infinitely" scalable; automatic scaling
   • Likely decreasing costs
     ◦ AWS has a history of aggressively reducing prices
     ◦ Last Dynamo price reduction: March 2013
   Cons
   • Vendor lock-in
   • Complicated cost model (sized out below)
     ◦ Based on "write units" and "read units"
     ◦ Depends on request rate, data size, and consistency model
   • No organizational experience
     ◦ Must endure the growing pains of new-service adoption
   • No TTL for data
     ◦ Impacts costs
     ◦ Efficient data deletion requires engineering investment
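   To make the write-unit/read-unit point concrete: DynamoDB's documented sizing rules say one write unit covers one write per second of an item up to 1 KB, and one read unit covers one strongly consistent read per second of up to 4 KB (eventually consistent reads cost half). A small helper shows how quickly units add up at Stackdriver's rates:

       import math

       def write_units(writes_per_sec, item_kb):
           # One write unit = one 1 KB write per second.
           return writes_per_sec * int(math.ceil(item_kb / 1.0))

       def read_units(reads_per_sec, item_kb, eventually_consistent=False):
           # One read unit = one strongly consistent 4 KB read per second;
           # eventually consistent reads need half the units.
           units = reads_per_sec * int(math.ceil(item_kb / 4.0))
           return units / 2.0 if eventually_consistent else units

       print(write_units(45000, 1))       # 45000 units for the primary cluster
       print(read_units(2500, 1))         # 2500 units for alerting reads
       print(read_units(2500, 1, True))   # 1250 units if eventually consistent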
8. Dynamo Versus Cassandra Costs
   Cassandra costs
   • Ongoing management = ¼ engineer, ~$3,000/month
   • Primary cluster: ~5 TB data, ~45k w/s
     ◦ 36 m1.xlarge @ $0.48/hr = ~$12,500/month
   • Alerting cluster: ~1 GB data, ~700 w/s, ~2,500 r/s
     ◦ 3 c3.2xlarge @ $0.60/hr = ~$1,300/month
   Dynamo costs (reconstructed below)
   • Ongoing management = ~$0/month
   • Primary cluster
     ◦ Total = ~$22,400/month + reads, without reserved capacity
       ▪ Storage = ~$1,380/month, writes = ~$21,000/month, reads = ???
   • Alerting cluster
     ◦ Total = ~$600/month without reserved capacity (or ~$475 with eventually consistent reads)
       ▪ Storage = ~$0, writes = ~$350, reads = ~$250
   • Save ~53% with 1-year reserved capacity, ~76% with 3-year
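   Those Dynamo figures reconstruct from the post-March-2013 us-east-1 prices, which, as best recalled here, were $0.0065/hr per 10 write units, $0.0065/hr per 50 read units, and $0.25 per GB-month of storage; treat the exact rates as assumptions, though the results land within rounding of the slide's numbers:

       # Reconstructing the DynamoDB estimate (assumed early-2014 rates).
       HOURS = 720
       WRITE_RATE = 0.0065 / 10     # USD per write unit per hour
       READ_RATE = 0.0065 / 50      # USD per read unit per hour
       STORAGE = 0.25               # USD per GB-month

       primary_writes = 45000 * WRITE_RATE * HOURS   # ~$21,060 (~$21,000)
       primary_storage = 5 * 1024 * STORAGE          # ~$1,280 (+ per-item overhead, ~$1,380)
       alert_writes = 700 * WRITE_RATE * HOURS       # ~$328 (~$350)
       alert_reads = 2500 * READ_RATE * HOURS        # ~$234 (~$250)

       print(primary_writes + primary_storage)       # ~$22,300 before reads
       print(alert_writes + alert_reads)             # ~$560 for alerting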