Type             Initial Cost + DC   2014 Maintenance renewal
HBase            $1,117k             $80k
Non-HBase        $257k               $22k
Elastic Search   $43k                $3k
RabbitMQ         $4k                 $1k
NFS (symbols)    $12k                n/a
Zeus             $10k                n/a
Slide 6
Estimated costs for 100% processing
Slide 7
Type               Initial Cost + DC   Add’l systems
HBase              $1,117k
Non-HBase          $257k               $138k
Elastic Search     $43k                $11k
RabbitMQ           $4k
NFS (symbols)      $12k
Zeus               $10k                $10k
Object Store                           $315k
Add’l processors                       ???
Slide 8
Estimated 3-year cost:
$900k
(at 100% processing, plus more processors...)
Slide 9
$900k < $1.8 million
Slide 10
Big changes
• 17 more systems in stage
• Postgres: +10 systems to store all raw and processed JSON (5 pairs) (SSDs?)
• Ceph (or something) instead of HBase
• Likely get rid of the NetApp...
Slide 11
Who helped collect this data
Slide 12
System type      Who
HBase            BIDW/Annie
Non-HBase        jakem, cshields
Elastic Search   adrian/solarce/bugzilla
RabbitMQ         solarce
NFS (symbols)    lerxst
Zeus             jakem
Add’l systems    lonnen, lars, me, adrian
Slide 13
Next steps
• Test processing throughput (Lars)
• Implement Ceph/S3 crashstorage class (see the sketch after this list)
• Test Ceph (Inktank meeting Friday!)
• Plan for symbols (see Ted later this morning)
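A minimal sketch of what a Ceph/S3 crashstorage class could look like, assuming an S3-compatible bucket object in the style of boto 2 (new_key/get_key); the class name, method names and key layout here are illustrative, not Socorro's actual API:

import json


class S3LikeCrashStorage(object):
    """Sketch of a crash storage class backed by any S3-compatible
    object store (AWS S3 or Ceph's RADOS Gateway). Illustrative only."""

    def __init__(self, bucket):
        # `bucket` is assumed to behave like a boto 2 Bucket: new_key(),
        # get_key(), and Key objects with set_contents_from_string() /
        # get_contents_as_string().
        self.bucket = bucket

    def _put(self, prefix, crash_id, payload):
        key = self.bucket.new_key('%s/%s' % (prefix, crash_id))
        key.set_contents_from_string(payload)

    def _get(self, prefix, crash_id):
        key = self.bucket.get_key('%s/%s' % (prefix, crash_id))
        if key is None:
            raise KeyError(crash_id)
        return key.get_contents_as_string()

    def save_raw_crash(self, raw_crash, dump, crash_id):
        # raw_crash is the metadata dict, dump the minidump bytes
        self._put('raw_crash', crash_id, json.dumps(raw_crash))
        self._put('raw_dump', crash_id, dump)

    def save_processed(self, processed_crash, crash_id):
        self._put('processed_crash', crash_id, json.dumps(processed_crash))

    def get_raw_crash(self, crash_id):
        return json.loads(self._get('raw_crash', crash_id))

    def get_raw_dump(self, crash_id):
        return self._get('raw_dump', crash_id)

    def get_processed(self, crash_id):
        return json.loads(self._get('processed_crash', crash_id))

Keeping raw_crash, raw_dump and processed_crash under separate key prefixes mirrors the three artifacts described on the "Purpose of data we store" slide.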
Slide 14
Crash Storage Options
Selena Deckelmann
Web Engineering Workweek Q12014
Slide 15
Most of this information was collected in this etherpad:
https://etherpad.mozilla.org/socorro-hbase-alternatives
Slide 16
Assumptions
• Durability: No loss of user-submitted data (crashes)
• Size: Need a distributed storage mechanism for ~60 TB of crash dumps (current footprint: 50 TB unreplicated, ~150 TB replicated x3)
Slide 17
Purpose of data we store
• raw_dump: reprocessing and putting into a debugger
• raw_crash: metadata display
• processed_crash: MapReduce and reporting
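To make the three artifacts concrete, a purely illustrative sketch of what each might hold for one crash; the field names are examples, not the actual schema:

# Illustrative only: field names are examples, not the real schema.

# raw_crash: small JSON metadata submitted with the crash, used for
# metadata display.
raw_crash = {
    "ProductName": "Firefox",
    "Version": "29.0a1",
    "submitted_timestamp": "2014-01-15T10:22:31",
}

# raw_dump: the opaque minidump bytes, kept for reprocessing and for
# loading into a debugger.
raw_dump = b"MDMP..."  # binary blob

# processed_crash: JSON produced by the processors; the input to
# MapReduce jobs and reports.
processed_crash = {
    "signature": "js::GCMarker::processMarkStackTop",
    "crash_time": 1389780151,
    "os_name": "Windows NT",
}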
Slide 18
Do we need to store raw crashes/processed JSON in HBase?
If HBase is to continue as our primary crash storage, yes: we need all three of raw_crash, raw_dump and processed_crash in there. We must save raw_crash and processed_crash there if we are to continue to support MapReduce jobs on our data.
Slide 19
Assumptions
Performance: Need to retrieve single, arbitrary crashes in a timely fashion for the web front-end and processors.
Slide 20
Assumptions
Performance: Need to store single crashes in a timely fashion for crashmovers. The only time requirement is that priority jobs must be saved, retrieved and processed within 60 seconds. Since any crash could potentially be a priority job, we must be able to store from the mover within seconds.
Slide 21
100% processing note!
When we move to 100% processing, priority jobs may no longer be needed.
Slide 22
Assumptions
HBase is a CP (consistent, partition-tolerant) system. This wasn’t initially an explicit requirement, but it is now important architecturally for our processors and front-end, which assume consistency.
Slide 23
Theory
• To replace HDFS/HBase/Hadoop, we'll likely need a combination of a few new systems.
• If we use an AP or AC system, we'll need another layer to ensure consistency.
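A hedged sketch of what "another layer to ensure consistency" could mean in practice: a thin wrapper that writes to an eventually consistent store and then polls the read path until the write is visible. The store interface and names are assumptions for illustration:

import time


class ReadYourWritesStore(object):
    """Illustrative wrapper adding a read-your-writes check on top of an
    eventually consistent object store. `store` is assumed to expose
    put(key, value) and get(key) -> value or None."""

    def __init__(self, store, timeout=60.0, interval=0.5):
        self.store = store
        self.timeout = timeout      # matches the 60-second priority-job budget
        self.interval = interval

    def put(self, key, value):
        self.store.put(key, value)
        # Poll the read path until the write is visible, or give up.
        deadline = time.time() + self.timeout
        while time.time() < deadline:
            if self.store.get(key) == value:
                return
            time.sleep(self.interval)
        raise RuntimeError('write to %r not readable within %ss'
                           % (key, self.timeout))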
Slide 24
Options
• Distributed filesystems: GlusterFS, AFS
• Object storage: S3, Ceph, Nimbus, WOS
• HBase alternative with MR: Riak, Cassandra
• Alternative architecture: fast queue + stream processing system
Slide 25
GlusterFS
• Supported by Red Hat; lacks an interface, it just looks like a filesystem
• Probably too bare-bones for our needs
• We’ve already been down the NFS road...
Slide 26
Ceph
• CP system
• Architecture: http://ceph.com/docs/master/architecture/
• Object gateway docs: http://ceph.com/docs/master/radosgw/
• Python API example: http://ceph.com/docs/master/radosgw/s3/python/
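To give a feel for the S3-like interface, a minimal sketch in the spirit of the Python API example linked above, using boto 2 pointed at a RADOS Gateway endpoint; the host, credentials, bucket and key names are placeholders:

import boto
import boto.s3.connection

# Connect to the Ceph RADOS Gateway through its S3-compatible API.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='objects.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('crash-storage')

# Store a raw dump under its crash_id...
key = bucket.new_key('raw_dump/example-crash-id')
key.set_contents_from_string(b'MDMP...')  # placeholder minidump bytes

# ...and fetch it back later.
data = bucket.get_key('raw_dump/example-crash-id').get_contents_as_string()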
Slide 27
Ceph Pros
• Provides an S3-like interface
• Uses a Paxos algorithm for reliability
• Have a good personal relationship with Sage, the main dev
Slide 28
Ceph Cons
• No prior ops experience (but Moz Ops deployed a test cluster!)
• Need to test performance (but not likely to be a dealbreaker)
• Need a map-reduce story (maybe)
Slide 29
Riak + RiakCS
• AP system
• Cons: cost; the reliability layer is not open source; and despite being very expensive it would still need a consistency layer
Slide 30
Cassandra
• AP system
• https://wiki.apache.org/cassandra/ArchitectureOverview
Slide 31
Cassandra Pros
• Very simple API
• Performance on reporting side
• Designed with operations in mind
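As a rough illustration of the "very simple API" point, a sketch using the DataStax Python driver to keep per-signature crash counts for a TCBS-style report; the keyspace, table and column names are invented for the example:

from cassandra.cluster import Cluster

# Connect to a local test cluster; contact points are placeholders.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS socorro_test
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('socorro_test')

# A counter table keyed by date and crash signature (illustrative schema).
session.execute("""
    CREATE TABLE IF NOT EXISTS tcbs (
        report_date text,
        signature   text,
        crash_count counter,
        PRIMARY KEY (report_date, signature)
    )
""")

# Bump the count as each processed crash comes through.
session.execute(
    "UPDATE tcbs SET crash_count = crash_count + 1 "
    "WHERE report_date = %s AND signature = %s",
    ('2014-01-15', 'js::GCMarker::processMarkStackTop'),
)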
Slide 32
Cassandra Cons
• Potential loss of data on write due to network partition/node loss: http://aphyr.com/posts/294-call-me-maybe-cassandra
• Not designed for large object storage
• Best for a starter streaming reporting system
Slide 33
Larger re-architecture
• Pursue Kafka + a streaming system (like LinkedIn/Twitter)
• Requires more research, more dev involvement
• Peter prototyped a Kafka consumer
• The point is faster mean-time-to-reports, not immediate access to data
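Not Peter's prototype, but a minimal sketch of what a Kafka consumer feeding a streaming report might look like, using the kafka-python client; the topic, broker address and crash fields are assumptions:

import json
from collections import Counter

from kafka import KafkaConsumer

# Consume processed crashes as they are published (topic and broker are
# placeholders) and keep a running count per signature.
consumer = KafkaConsumer(
    'processed-crashes',
    bootstrap_servers='localhost:9092',
    group_id='tcbs-stream',
    value_deserializer=lambda raw: json.loads(raw.decode('utf-8')),
)

counts = Counter()
for message in consumer:
    processed_crash = message.value
    counts[processed_crash.get('signature', 'unknown')] += 1
    # A real streaming job would periodically flush `counts` to a report
    # store instead of keeping it in memory.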
Slide 34
Next steps
• Performance test Ceph
• Performance test Cassandra, implement reports (TCBS?)
• Report back, evaluate whether more research into streaming is warranted