Cost of 100% processing and crashstorage options for Socorro

Selena Deckelmann

March 26, 2014


  1. Type             Initial Cost + DC   2014 Maintenance renewal
     HBase            $1,117k             $80k
     Non-HBase        $257k               $22k
     Elastic Search   $43k                $3k
     RabbitMQ         $4k                 $1k
     NFS (symbols)    $12k                n/a
     Zeus             $10k                n/a
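
As a quick check on the totals implied by the table above, the figures can be summed in a few lines of Python. This is only a sketch: the dictionary layout and variable names are illustrative, and the "n/a" maintenance entries are counted as zero, which is an assumption rather than something the slide states.

```python
# Figures from the cost table, in thousands of dollars:
# (initial cost + DC, 2014 maintenance renewal).
# Assumption: "n/a" maintenance entries are counted as 0.
costs_k = {
    "HBase": (1117, 80),
    "Non-HBase": (257, 22),
    "Elastic Search": (43, 3),
    "RabbitMQ": (4, 1),
    "NFS (symbols)": (12, 0),
    "Zeus": (10, 0),
}

initial_total_k = sum(initial for initial, _ in costs_k.values())
maintenance_total_k = sum(maint for _, maint in costs_k.values())
hbase_share = costs_k["HBase"][0] / initial_total_k

print(f"Initial cost + DC: ${initial_total_k:,}k")
print(f"2014 maintenance renewal: ${maintenance_total_k}k")
print(f"HBase share of initial cost: {hbase_share:.0%}")
```

On these numbers HBase accounts for a bit over three quarters of the initial cost, which is the context for the storage alternatives discussed in the later slides.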
  2. Type: Initial Cost + DC, Add’l systems
    • HBase: $1,117k
    • Non-HBase: $257k, $138k
    • Elastic Search: $43k, $11k
    • RabbitMQ: $4k
    • NFS (symbols): $12k
    • Zeus: $10k, $10k
    • Object Store: $315k
    • Add’l processors: ???
  3. Big changes
    • 17 more systems in stage
    • Postgres
      - +10 systems to store all raw and processed JSON (5 pairs) (SSDs?)
    • Ceph (or something) instead of HBase
    • Likely get rid of the NetApp...
  4. System type      Who
     HBase            BIDW/Annie
     Non-HBase        jakem, cshields
     Elastic Search   adrian/solarce/bugzilla
     RabbitMQ         solarce
     NFS (symbols)    lerxst
     Zeus             jakem
     Add’l systems    lonnen, lars, me, adrian
  5. Next steps
    • Test processing throughput (Lars)
    • Implement Ceph/S3 crashstorage class
    • Test Ceph (Inktank meeting Friday!)
    • Plan for symbols (See Ted later this morning)
  6. Assumptions
    • Durability: No loss of user-submitted data (crashes)
    • Size: Need a distributed storage mechanism for ~60 TB of crash dumps (current footprint: 50 TB unreplicated, ~150 TB replicated x3)
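
The replicated figure follows directly from the replication factor. A minimal sketch of the arithmetic, using the slide's numbers; the function name is made up for illustration, and applying the same x3 factor to the ~60 TB target is an assumption, not something the slide states.

```python
def replicated_footprint_tb(unreplicated_tb: float, replication_factor: int) -> float:
    """Raw capacity needed to hold `unreplicated_tb` at the given replication factor."""
    return unreplicated_tb * replication_factor


# Current footprint from the slide: 50 TB unreplicated, replicated x3.
current_tb = replicated_footprint_tb(50, 3)

# Assumption: the ~60 TB target would also be replicated x3.
target_tb = replicated_footprint_tb(60, 3)

print(f"current: {current_tb} TB, target: {target_tb} TB")
```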
  7. Purpose of data we store
    • raw_dump: reprocessing and putting into a debugger
    • raw_crash: metadata display
    • processed_crash: MapReduce and reporting
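
To make the three kinds of data concrete, here is a rough sketch of what a crashstorage class over an S3-like object store might look like. Everything in it is hypothetical: the class names, method names, and key layout are illustrative and are not Socorro's actual API, and `InMemoryBucket` is a stand-in for a real S3 or Ceph bucket so the example runs on its own.

```python
import json


class InMemoryBucket:
    """Hypothetical stand-in for an S3/Ceph bucket: a dict of key -> bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the object is missing


class ObjectCrashStorage:
    """Illustrative crashstorage sketch, not Socorro's real class."""

    def __init__(self, bucket):
        self.bucket = bucket

    @staticmethod
    def _key(kind: str, crash_id: str) -> str:
        # Assumed key layout: one object per (kind, crash_id).
        return f"{kind}/{crash_id}"

    def save_raw_crash(self, crash_id: str, raw_crash: dict, raw_dump: bytes) -> None:
        # raw_crash is JSON metadata; raw_dump is the opaque binary dump.
        self.bucket.put(self._key("raw_crash", crash_id),
                        json.dumps(raw_crash).encode("utf-8"))
        self.bucket.put(self._key("raw_dump", crash_id), raw_dump)

    def save_processed(self, crash_id: str, processed_crash: dict) -> None:
        self.bucket.put(self._key("processed_crash", crash_id),
                        json.dumps(processed_crash).encode("utf-8"))

    def get_raw_crash(self, crash_id: str) -> dict:
        return json.loads(self.bucket.get(self._key("raw_crash", crash_id)))

    def get_raw_dump(self, crash_id: str) -> bytes:
        return self.bucket.get(self._key("raw_dump", crash_id))

    def get_processed(self, crash_id: str) -> dict:
        return json.loads(self.bucket.get(self._key("processed_crash", crash_id)))


# Usage with made-up data.
store = ObjectCrashStorage(InMemoryBucket())
store.save_raw_crash("abc123", {"ProductName": "Firefox"}, b"\x00MDMP")
store.save_processed("abc123", {"signature": "example_signature"})
print(store.get_raw_crash("abc123"), store.get_processed("abc123"))
```

Swapping `InMemoryBucket` for a thin wrapper around a real S3-compatible client would be the next step; that wrapper is left out here because its details depend on the client library chosen.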
  8. Do we need to store raw crashes/processed JSON in HBase?
    If HBase is to continue as our primary crash storage, yes: we need all three of raw_crash, raw_dump and processed_crash in there. We must save raw_crash and processed_crash there if we are to continue supporting MapReduce jobs on our data.
  9. Assumptions
    Performance: Need to retrieve single, arbitrary crashes in a timely fashion for the web front-end and processors
  10. Assumptions
    Performance: Need to store single crashes in a timely fashion for crashmovers. The only time requirement is that priority jobs must be saved, retrieved and processed within 60 seconds. Since any crash could potentially be a priority job, we must be able to store from the mover within seconds.
  11. Assumptions
    HBase is a CP (consistent, partition-tolerant) system. This wasn’t initially an explicit requirement, but it is now architecturally important for our processors and front-end, which assume consistency.
  12. Theory
    • To replace HDFS/HBase/Hadoop, we'll likely need a combination of a few new systems.
    • If we use an AP or AC system, we'll need another layer to ensure consistency.
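
One common way to build such a consistency layer is quorum reads and writes over versioned values: if every write reaches W replicas and every read consults R replicas with W + R > N, each read overlaps the latest write. The toy sketch below illustrates only that idea. It is not taken from the slides, the class is hypothetical, and it assumes a single writer and no failures during a write.

```python
class QuorumStore:
    """Toy consistency layer over N independent replicas (plain dicts)."""

    def __init__(self, n: int, w: int, r: int):
        if w + r <= n:
            raise ValueError("need W + R > N so read and write sets overlap")
        self.replicas = [{} for _ in range(n)]
        self.w, self.r = w, r
        self._version = 0  # single-writer assumption: one global counter

    def write(self, key, value, targets=None):
        """Store value on at least W replicas, tagged with a new version."""
        targets = list(range(self.w)) if targets is None else list(targets)
        if len(set(targets)) < self.w:
            raise ValueError("write must reach at least W replicas")
        self._version += 1
        for i in targets:
            self.replicas[i][key] = (self._version, value)

    def read(self, key, sources=None):
        """Consult at least R replicas and return the newest value seen."""
        n = len(self.replicas)
        sources = list(range(n - self.r, n)) if sources is None else list(sources)
        if len(set(sources)) < self.r:
            raise ValueError("read must consult at least R replicas")
        found = [self.replicas[i][key] for i in sources if key in self.replicas[i]]
        if not found:
            raise KeyError(key)
        # Highest version wins; stale replicas are simply outvoted.
        return max(found, key=lambda versioned: versioned[0])[1]


# Usage: replica 0 is left stale by the second write, but a quorum read
# that includes replica 1 still sees the newer value.
qs = QuorumStore(n=3, w=2, r=2)
qs.write("crash-1", "first", targets=[0, 1])
qs.write("crash-1", "second", targets=[1, 2])
latest = qs.read("crash-1", sources=[0, 1])
print(latest)
```

A production layer would also need to handle concurrent writers, failed writes, and repair of stale replicas, none of which this sketch attempts.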
  13. Options
    • Distributed filesystems: GlusterFS, AFS
    • Object storage: S3, Ceph, Nimbus, WOS
    • HBase alternative with MR: Riak, Cassandra
    • Alternative architecture: Fast queue + stream processing system
  14. GlusterFS
    • Supported by Red Hat, lacks interface, just looks like a filesystem
    • Probably too bare-bones for our needs
    • We’ve already been down the NFS road...
  15. Ceph
    • CP system
    • Architecture: http://ceph.com/docs/master/architecture/
    • Object gateway docs: http://ceph.com/docs/master/radosgw/
    • Python API example: http://ceph.com/docs/master/radosgw/s3/python/
  16. Ceph Pros
    • Provides an S3-like interface
    • Uses a Paxos algorithm for reliability
    • Good personal relationship with Sage, the main dev
  17. Ceph Cons
    • No prior ops experience, but Moz Ops deployed a test cluster!
    • Need to test performance (but not likely to be a dealbreaker)
    • Need a map-reduce story (maybe)
  18. Riak + RiakCS
    • AP system
    • Cons: cost; reliability layer is not open source; needs a consistency layer despite being very expensive
  19. Cassandra Pros
    • Very simple API
    • Performance on reporting side
    • Designed with operations in mind
  20. Cassandra Cons
    • Potential loss of data on write due to network partition/node loss: http://aphyr.com/posts/294-call-me-maybe-cassandra
    • Not designed for large object storage
    • Best for a starter streaming reporting system
  21. Larger re-architecture
    • Pursue Kafka + a streaming system (like LinkedIn/Twitter)
    • Requires more research, more dev involvement
    • Peter prototyped a Kafka consumer
    • The point is faster mean-time-to-reports, not immediate access to data
  22. Next steps
    • Performance test Ceph
    • Performance test Cassandra, implement reports (TCBS?)
    • Report back, evaluate whether more research into streaming is warranted