Type             Initial Cost + DC   2014 Maintenance renewal
HBase            $1,117k             $80k
Non-HBase        $257k               $22k
Elastic Search   $43k                $3k
RabbitMQ         $4k                 $1k
NFS (symbols)    $12k                n/a
Zeus             $10k                n/a
Slide 6
Estimated costs for 100% processing
Slide 7
Type               Initial Cost + DC   Add’l systems
HBase              $1,117k
Non-HBase          $257k               $138k
Elastic Search     $43k                $11k
RabbitMQ           $4k
NFS (symbols)      $12k
Zeus               $10k                $10k
Object Store                           $315k
Add’l processors                       ???
Slide 8
Estimated 3-year cost:
$900k
(at 100% processing, plus more processors...)
Slide 9
$900k < $1.8 million
Slide 10
Big changes
• 17 more systems in stage
• Postgres: +10 systems to store all raw and processed JSON (5 pairs) (SSDs?)
• Ceph (or something) instead of HBase
• Likely get rid of the NetApp...
Slide 11
Who helped collect this data
Slide 12
System type      Who
HBase            BIDW/Annie
Non-HBase        jakem, cshields
Elastic Search   adrian/solarce/bugzilla
RabbitMQ         solarce
NFS (symbols)    lerxst
Zeus             jakem
Add’l systems    lonnen, lars, me, adrian
Slide 13
Next steps
• Test processing throughput (Lars)
• Implement Ceph/S3 crashstorage class (see the sketch after this list)
• Test Ceph (Inktank meeting Friday!)
• Plan for symbols (see Ted later this morning)
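A minimal sketch of what a Ceph/S3 crashstorage class could look like, assuming an S3-compatible bucket object in the style of boto 2 (new_key/get_key); the class name, method names and key layout here are illustrative, not Socorro's actual API:

import json


class S3LikeCrashStorage(object):
    """Sketch of a crash storage class backed by any S3-compatible
    object store (AWS S3 or Ceph's RADOS Gateway). Illustrative only."""

    def __init__(self, bucket):
        # `bucket` is assumed to behave like a boto 2 Bucket: new_key(),
        # get_key(), and Key objects with set_contents_from_string() /
        # get_contents_as_string().
        self.bucket = bucket

    def _put(self, prefix, crash_id, payload):
        key = self.bucket.new_key('%s/%s' % (prefix, crash_id))
        key.set_contents_from_string(payload)

    def _get(self, prefix, crash_id):
        key = self.bucket.get_key('%s/%s' % (prefix, crash_id))
        if key is None:
            raise KeyError(crash_id)
        return key.get_contents_as_string()

    def save_raw_crash(self, raw_crash, dump, crash_id):
        # raw_crash is the metadata dict, dump the minidump bytes
        self._put('raw_crash', crash_id, json.dumps(raw_crash))
        self._put('raw_dump', crash_id, dump)

    def save_processed(self, processed_crash, crash_id):
        self._put('processed_crash', crash_id, json.dumps(processed_crash))

    def get_raw_crash(self, crash_id):
        return json.loads(self._get('raw_crash', crash_id))

    def get_raw_dump(self, crash_id):
        return self._get('raw_dump', crash_id)

    def get_processed(self, crash_id):
        return json.loads(self._get('processed_crash', crash_id))

Keeping raw_crash, raw_dump and processed_crash under separate key prefixes mirrors the three artifacts described on the "Purpose of data we store" slide.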
Slide 14
Crash Storage Options
Selena Deckelmann
Web Engineering Workweek Q12014
Slide 15
Most of this information was collected in this etherpad:
https://etherpad.mozilla.org/socorro-hbase-alternatives
Slide 16
Assumptions
• Durability: No loss of user-submitted data (crashes)
• Size: Need a distributed storage mechanism for ~60 TB of crash dumps (current footprint: 50 TB unreplicated, ~150 TB replicated x3)
Slide 17
Purpose of data we store
• raw_dump: reprocessing and putting into a debugger
• raw_crash: metadata display
• processed_crash: MapReduce and reporting
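To make the three artifacts concrete, a purely illustrative sketch of what each might hold for one crash; the field names are examples, not the actual schema:

# Illustrative only: field names are examples, not the real schema.

# raw_crash: small JSON metadata submitted with the crash, used for
# metadata display.
raw_crash = {
    "ProductName": "Firefox",
    "Version": "29.0a1",
    "submitted_timestamp": "2014-01-15T10:22:31",
}

# raw_dump: the opaque minidump bytes, kept for reprocessing and for
# loading into a debugger.
raw_dump = b"MDMP..."  # binary blob

# processed_crash: JSON produced by the processors; the input to
# MapReduce jobs and reports.
processed_crash = {
    "signature": "js::GCMarker::processMarkStackTop",
    "crash_time": 1389780151,
    "os_name": "Windows NT",
}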
Slide 18
Do we need to store raw crashes/processed JSON in HBase?
If HBase is to continue as our primary crash storage, yes: we need all three of raw_crash, raw_dump and processed_crash in there. We must save raw_crash and processed_crash there if we are to continue to support MapReduce jobs on our data.
Slide 19
Assumptions
Performance: Need to retrieve single, arbitrary crashes in a timely fashion for the web front-end and processors.
Slide 20
Assumptions
Performance: Need to store single crashes in a timely fashion for crashmovers. The only time requirement is that priority jobs must be saved, retrieved and processed within 60 seconds. Since any crash could potentially be a priority job, we must be able to store from the mover within seconds.
Slide 21
100% processing note!
When we move to 100% processing, priority jobs may no longer be needed.
Slide 22
Assumptions
HBase is a CP (consistent, partition-tolerant) system. This wasn’t initially an explicit requirement, but it is now important architecturally for our processors and front-end, which assume consistency.
Slide 23
Theory
• To replace HDFS/HBase/Hadoop, we'll likely need a combination of a few new systems.
• If we use an AP or AC system, we'll need another layer to ensure consistency.
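A hedged sketch of what "another layer to ensure consistency" could mean in practice: a thin wrapper that writes to an eventually consistent store and then polls the read path until the write is visible. The store interface and names are assumptions for illustration:

import time


class ReadYourWritesStore(object):
    """Illustrative wrapper adding a read-your-writes check on top of an
    eventually consistent object store. `store` is assumed to expose
    put(key, value) and get(key) -> value or None."""

    def __init__(self, store, timeout=60.0, interval=0.5):
        self.store = store
        self.timeout = timeout      # matches the 60-second priority-job budget
        self.interval = interval

    def put(self, key, value):
        self.store.put(key, value)
        # Poll the read path until the write is visible, or give up.
        deadline = time.time() + self.timeout
        while time.time() < deadline:
            if self.store.get(key) == value:
                return
            time.sleep(self.interval)
        raise RuntimeError('write to %r not readable within %ss'
                           % (key, self.timeout))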
Slide 24
Options
• Distributed filesystems: GlusterFS, AFS
• Object storage: S3, Ceph, Nimbus, WOS
• HBase alternative with MR: Riak, Cassandra
• Alternative architecture: fast queue + stream processing system
Slide 25
GlusterFS
• Supported by Red Hat; lacks an interface, it just looks like a filesystem
• Probably too bare-bones for our needs
• We’ve already been down the NFS road...
Slide 26
Ceph
• CP system
• Architecture: http://ceph.com/docs/master/architecture/
• Object gateway docs: http://ceph.com/docs/master/radosgw/
• Python API example: http://ceph.com/docs/master/radosgw/s3/python/
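To give a feel for the S3-like interface, a minimal sketch in the spirit of the Python API example linked above, using boto 2 pointed at a RADOS Gateway endpoint; the host, credentials, bucket and key names are placeholders:

import boto
import boto.s3.connection

# Connect to the Ceph RADOS Gateway through its S3-compatible API.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='objects.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('crash-storage')

# Store a raw dump under its crash_id...
key = bucket.new_key('raw_dump/example-crash-id')
key.set_contents_from_string(b'MDMP...')  # placeholder minidump bytes

# ...and fetch it back later.
data = bucket.get_key('raw_dump/example-crash-id').get_contents_as_string()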
Slide 27
Ceph Pros
• Provides an S3-like interface
• Uses a Paxos algorithm for reliability
• Have a good personal relationship with Sage, the main dev
Slide 28
Ceph Cons
• No prior ops experience (but Moz Ops deployed a test cluster!)
• Need to test performance (but not likely to be a dealbreaker)
• Need a map-reduce story (maybe)
Slide 29
Riak + RiakCS
• AP system
• Cons: cost; the reliability layer is not open source; and despite being very expensive it would still need a consistency layer
Slide 30
Cassandra
• AP system
• https://wiki.apache.org/cassandra/ArchitectureOverview
Slide 31
Cassandra Pros
• Very simple API
• Performance on reporting side
• Designed with operations in mind
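As a rough illustration of the "very simple API" point, a sketch using the DataStax Python driver to keep per-signature crash counts for a TCBS-style report; the keyspace, table and column names are invented for the example:

from cassandra.cluster import Cluster

# Connect to a local test cluster; contact points are placeholders.
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS socorro_test
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('socorro_test')

# A counter table keyed by date and crash signature (illustrative schema).
session.execute("""
    CREATE TABLE IF NOT EXISTS tcbs (
        report_date text,
        signature   text,
        crash_count counter,
        PRIMARY KEY (report_date, signature)
    )
""")

# Bump the count as each processed crash comes through.
session.execute(
    "UPDATE tcbs SET crash_count = crash_count + 1 "
    "WHERE report_date = %s AND signature = %s",
    ('2014-01-15', 'js::GCMarker::processMarkStackTop'),
)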
Slide 32
Cassandra Cons
• Potential loss of data on write due to network partition/node loss: http://aphyr.com/posts/294-call-me-maybe-cassandra
• Not designed for large object storage
• Best for a starter streaming reporting system
Slide 33
Larger re-architecture
• Pursue Kafka + a streaming system (like LinkedIn/Twitter)
• Requires more research, more dev involvement
• Peter prototyped a Kafka consumer
• The point is faster mean-time-to-reports, not immediate access to data
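Not Peter's prototype, but a minimal sketch of what a Kafka consumer feeding a streaming report might look like, using the kafka-python client; the topic, broker address and crash fields are assumptions:

import json
from collections import Counter

from kafka import KafkaConsumer

# Consume processed crashes as they are published (topic and broker are
# placeholders) and keep a running count per signature.
consumer = KafkaConsumer(
    'processed-crashes',
    bootstrap_servers='localhost:9092',
    group_id='tcbs-stream',
    value_deserializer=lambda raw: json.loads(raw.decode('utf-8')),
)

counts = Counter()
for message in consumer:
    processed_crash = message.value
    counts[processed_crash.get('signature', 'unknown')] += 1
    # A real streaming job would periodically flush `counts` to a report
    # store instead of keeping it in memory.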
Slide 34
Next steps
• Performance test Ceph
• Performance test Cassandra, implement reports (TCBS?)
• Report back, evaluate whether more research into streaming is warranted