
Cassandra @ Cayova


Presentation to the Dublin Cassandra Users 'Cassandra at Scale' meetup.

Bill de hÓra

July 11, 2013


Transcript

  1. What does it do?
     Feed: public timeline posting
     Chat: private and group messaging
     Hub: group discussion and sharing
     Content: upload & share photo/video/music/files
     Box: personal web tracking and dashboard
  2. What’s C* being used for?
     Posts, Chat, Inbox, Notifications, Object Metadata, Counters, Timelines,
     Hashtags, Browser Metrics, System Events, Likes
  3. Why C*?
     Bypass the “startup migrates off RDBMS” war story
     Excellent robustness & availability
     Predictable scaling & cost model
     Domain and access pattern fit
     In-house experience at scale
     Strong community
  4. Node loss - still 100% available
     [11:40am] dehora: lol, ec2 killed one of the events nodes
     [11:40am] dehora: 10.53.53.155   eu-west 1a  Up    Normal  895.21 MB  75.00%  0
     [11:40am] dehora: 10.64.110.63   eu-west 1b  Up    Normal  851.05 MB  75.00%  42535295865117307932921825928971026432
     [11:40am] dehora: 10.55.65.71    eu-west 1a  Up    Normal  892.55 MB  75.00%  85070591730234615865843651857942052864
     [11:40am] dehora: 10.251.39.177  eu-west 1b  Down  Normal  430.73 MB  75.00%  127605887595351923798765477786913079296
     [11:40am] matthew: :O
     [11:40am] dehora: the last node’s instance doesn't exist anymore, but system’s fine
     [11:41am] dehora: asg spun up a new node, but it has a random token so didn't autojoin
     [11:41am] matthew: eugh >:(
     [11:41am] matthew: should Priam not have handled that?
     [11:41am] dehora: yes, but it can't
     [11:42am] dehora: the ami we're using here has a bug/feature (apache .deb starts cassandra which means priam can't assign)
     [11:42am] dehora: the latest ami (0.2.3) has a fix for that
     [11:45am] dehora: k, i'll remove that node and bring in a new one on c, done with testing 2 zone evac anyway
  5. Node Setup
     Cassandra: 1.1.11, Apache .deb, JDK6
     AWS: eu-west-1, per-cluster LC/SG, 3 x AZ
     AMI: Parsel, derived from Ubuntu 12.04 base AMI
     Servers: m1.xlarge, 1.6T 4x eph RAID0, 8G ebs
     Conf: 8GB/800M heap, RF=3, 100M keycache, -rowcache
     Management: Priam, Graphite, Boundary, jmxtrans
     Client: Astyanax, Quorum, Metrics, TokenAware, Backoff
  6. Node Setup (diagram of a single node)
     m1.xlarge: zoned, static ASG, SG
     1.6T: 4x eph mdadm RAID0, XFS
     Parsel + base AMI, Ubuntu 12.04 LTS, Oracle JDK6
     Cassandra 1.1.11, Priam/Tomcat, S3
     jmxtrans, Graphite, bprobe, Boundary, supervisord, puppet agent
     Astyanax clients
  7. Cluster Setup!
     (diagram: Astyanax clients, writes, flush, repair and load against cluster
     lc: cass_metrics_0001 in eu-west-1, spread across asg: 1a, asg: 1b, asg: 1c)
  8. Astyanax Client (Groovy)

     @Override
     Keyspace get() {
       AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
           .forCluster(cluster)
           .forKeyspace(keyspace)
           .withAstyanaxConfiguration(createAstyanaxSettings())
           .withConnectionPoolConfiguration(createConnectionPoolSettings())
           .withConnectionPoolMonitor(new YammerConnectionPoolMonitor())
           .buildKeyspace(ThriftFamilyFactory.getInstance())
       context.start()
       addShutdownHook { context.shutdown() }
       context.entity
     }
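     For context, a minimal sketch of how a Keyspace built this way might be consumed.
     Nothing below is from the deck: the PostStore class, the 'posts' column family and
     the key names are invented for illustration; consistency and retry behaviour come
     from the defaults set in createAstyanaxSettings().

     import com.netflix.astyanax.Keyspace
     import com.netflix.astyanax.model.ColumnFamily
     import com.netflix.astyanax.serializers.StringSerializer

     class PostStore {
       // hypothetical layout: rows keyed by user id, columns keyed by post id
       static final ColumnFamily<String, String> POSTS =
           ColumnFamily.newColumnFamily('posts', StringSerializer.get(), StringSerializer.get())

       private final Keyspace keyspace

       PostStore(Keyspace keyspace) { this.keyspace = keyspace }

       void put(String userId, String postId, String body) {
         // single-column write via the shared keyspace
         keyspace.prepareColumnMutation(POSTS, userId, postId)
                 .putValue(body, null)   // null = no TTL
                 .execute()
       }

       String get(String userId, String postId) {
         // single-column read for the same row/column
         keyspace.prepareQuery(POSTS)
                 .getKey(userId)
                 .getColumn(postId)
                 .execute()
                 .result
                 .stringValue
       }
     }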
  9. Astyanax Client (Groovy)

     private AstyanaxConfigurationImpl createAstyanaxSettings() {
       new AstyanaxConfigurationImpl()
           .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
           .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)
           .setAsyncExecutor(createExecutor())
           .setClock(clock)
           .setDefaultReadConsistencyLevel(ConsistencyLevel.valueOf(defaultReadConsistencyLevel))
           .setDefaultWriteConsistencyLevel(ConsistencyLevel.valueOf(defaultWriteConsistencyLevel))
           .setRetryPolicy(new ExponentialBackoff(baseSleepTime, maxAttempts))
     }
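     The deck doesn't show createConnectionPoolSettings(); a plausible shape, assuming
     Astyanax's ConnectionPoolConfigurationImpl and Thrift on port 9160. The pool name,
     seed list and sizes below are placeholders, and the cluster and seeds fields are
     assumed to exist alongside the builder code above, not taken from the deck.

     import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl

     private ConnectionPoolConfigurationImpl createConnectionPoolSettings() {
       new ConnectionPoolConfigurationImpl("${cluster}_pool")
           .setPort(9160)              // Thrift port
           .setMaxConnsPerHost(10)     // placeholder; tune per cluster
           .setConnectTimeout(2000)    // ms
           .setSocketTimeout(5000)     // ms
           .setSeeds(seeds)            // e.g. "10.0.0.1:9160,10.0.0.2:9160"
     }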
  10. General Guidance
      Pair deploy for destructive operations
      Automate as much as possible & burn AMIs
      Use a management tool (DSE, OpsCenter, Priam)
      Set consistencylevel as QUORUM in the CLI
      Monitor growth in load
      Consider getting support
      Ask for help - mailing list, lots of community expertise
  11. Watch all the things
      Repair/Compaction spikes: nodetool compactionstats (see the JMX polling sketch after this slide)
      Disk load: nodetool info, du, iostat -x, backups off-node
      Write pressure: tpstats FlushWriter stage, memtables
      GC: grep the logs for GCInspector, top -H; we enable GC logging in prod
      Baseline heap: phat nodes might want more than 8G
      Cache hits & resizing: nodetool info, warnings in logs
      Dropped/Pending: nodetool tpstats, cfhistograms
      Flapping: up/down log messages, client markdowns ಠ_ಠ
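      As a rough illustration of the kind of polling jmxtrans feeds into Graphite, a
      Groovy sketch that reads two Cassandra 1.1-era MBean attributes over JMX. The
      host, port and bean/attribute names are assumptions that vary by version; the
      setup in the deck relies on jmxtrans configuration rather than hand-rolled scripts.

      import javax.management.ObjectName
      import javax.management.remote.JMXConnectorFactory
      import javax.management.remote.JMXServiceURL

      // assumed node address; 7199 is Cassandra's default JMX port
      def url = new JMXServiceURL('service:jmx:rmi:///jndi/rmi://10.0.0.1:7199/jmxrmi')
      def connector = JMXConnectorFactory.connect(url)
      try {
        def mbsc = connector.getMBeanServerConnection()

        // pending compactions (CompactionManager MBean, 1.1-era name)
        def compactions = mbsc.getAttribute(
            new ObjectName('org.apache.cassandra.db:type=CompactionManager'), 'PendingTasks')

        // pending mutations (MutationStage thread pool)
        def mutations = mbsc.getAttribute(
            new ObjectName('org.apache.cassandra.request:type=MutationStage'), 'PendingTasks')

        println "compaction pending=${compactions}, mutation pending=${mutations}"
      } finally {
        connector.close()
      }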
  12. Centralise Repair (Fabric)

      def repair(hostname, keyfile, username='', passwd=''):
          with settings(warn_only=True):
              env.host_string = hostname
              env.key_filename = keyfile
              # call Priam's repair endpoint on the node
              result = run("curl -v http://localhost:8080/Priam/REST/v1/cassadmin/repair")
          if result.failed:
              send_error_mail(hostname, username, passwd)
              abort("Failing Cassandra repair call on %s" % hostname)
          send_happy_mail(hostname, username, passwd)
  13. It’s all in the grind, Sizemore
      Understand row width and data access (see the timeline sketch after this slide)
      Avoid heavy delete-after-write (queues)
      Avoid read-before-write (usually)
      Understand your client
      Model, don’t prototype
      RDBMS if you need a lock
      Redis/RDBMS if you need precision counters
      Allow time to relearn & get productive
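      To make the row-width point concrete, a sketch of the classic wide-row timeline
      shape in Astyanax. Nothing here is Cayova-specific: the user_timeline column
      family, class and method names are invented. One row per user, a TimeUUID column
      appended per post, reads as a reversed slice of the newest columns.

      import com.netflix.astyanax.Keyspace
      import com.netflix.astyanax.model.ColumnFamily
      import com.netflix.astyanax.serializers.StringSerializer
      import com.netflix.astyanax.serializers.TimeUUIDSerializer
      import com.netflix.astyanax.util.RangeBuilder
      import com.netflix.astyanax.util.TimeUUIDUtils

      class TimelineStore {
        // one wide row per user; TimeUUID columns sort by time within the row
        static final ColumnFamily<String, UUID> TIMELINE =
            ColumnFamily.newColumnFamily('user_timeline', StringSerializer.get(), TimeUUIDSerializer.get())

        private final Keyspace keyspace

        TimelineStore(Keyspace keyspace) { this.keyspace = keyspace }

        void append(String userId, String postId) {
          // append-only write: a new time-ordered column per post, no read-before-write
          def batch = keyspace.prepareMutationBatch()
          batch.withRow(TIMELINE, userId)
               .putColumn(TimeUUIDUtils.getUniqueTimeUUIDinMicros(), postId, null)  // null = no TTL
          batch.execute()
        }

        List<String> latest(String userId, int count) {
          // reversed slice: newest 'count' columns from the user's row
          keyspace.prepareQuery(TIMELINE)
                  .getKey(userId)
                  .withColumnRange(new RangeBuilder().setReversed(true).setLimit(count).build())
                  .execute()
                  .result
                  .collect { it.stringValue }
        }
      }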
  14. Future
      Multi-region deploys, SSD/hi1.4xlarge
      Use cases: Graph, Recording, Tags, Thumbs
      C* 1.2/2.0: better disk density, CQL
      #1311: Triggers
      #5062: ConsistencyLevel.SERIAL
      #4865: Off-heap Bloom filters