Message queues & routers: Apache Kafka (kafka.apache.org)
• High-throughput, distributed, persistent publish-subscribe messaging system
• Originally developed at LinkedIn
• Typically used as a buffer/decoupling layer in online stream processing
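To show why a persistent publish-subscribe log decouples producers from consumers, here is a minimal in-memory sketch. It assumes a Kafka-like model (partitioned append-only log, per-consumer-group offsets); the class `ToyLog` and its methods are hypothetical and are not the Kafka client API. CRC32 is used only as a stand-in key partitioner, and nothing here is persisted or replicated the way a real broker does it.

```python
import zlib
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a Kafka-style topic.

    Each partition is an append-only list and each consumer group tracks
    its own read offset, so a slow consumer never blocks the producer or
    other groups. Real Kafka persists and replicates partitions on brokers.
    """

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = defaultdict(int)  # (group, partition) -> next offset

    def publish(self, key, value):
        # Hash the key so records with the same key land in the same
        # partition and keep their relative order (CRC32 is a stand-in).
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def poll(self, group, partition, max_records=10):
        # Read from this group's offset, then advance only this group's offset.
        start = self.offsets[(group, partition)]
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(batch)
        return batch


log = ToyLog()
p, _ = log.publish("sensor-1", 20.5)
log.publish("sensor-1", 21.0)
fast = log.poll("dashboard", p)                # both records
slow = log.poll("archiver", p, max_records=1)  # only the first record
```

The producer returns immediately after appending, and the "archiver" group can lag behind the "dashboard" group without either affecting the other. That lag tolerance is what makes the log useful as a buffer.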
Time series datastores: OpenTSDB (opentsdb.net)
• Distributed time series database on top of HBase
• Store, index, query & plot metrics
• Extremely scalable
• Suited to low-level monitoring
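The "store, index, query" part rests on OpenTSDB's data model: each data point is a metric name, a timestamp, a value and a set of key/value tags. The sketch below keeps that model in memory so that tag filtering and cross-series aggregation are easy to see. `ToyTSDB`, its methods and the sample metric and tag values are hypothetical. The sketch does not reproduce OpenTSDB's HBase row layout or its HTTP API.

```python
from collections import defaultdict


class ToyTSDB:
    """Hypothetical in-memory sketch of an OpenTSDB-style data model:
    each point is (metric, timestamp, value, tags). Series are indexed
    by metric + tag set; OpenTSDB itself stores them in HBase."""

    def __init__(self):
        # (metric, frozenset of tag items) -> list of (timestamp, value)
        self.series = defaultdict(list)

    def put(self, metric, timestamp, value, **tags):
        self.series[(metric, frozenset(tags.items()))].append((timestamp, value))

    def query(self, metric, start, end, agg=sum, **tag_filter):
        """Aggregate all matching series per timestamp within [start, end)."""
        buckets = defaultdict(list)
        wanted = set(tag_filter.items())
        for (m, tags), points in self.series.items():
            # Skip other metrics and series lacking any requested tag.
            if m != metric or not wanted <= tags:
                continue
            for ts, v in points:
                if start <= ts < end:
                    buckets[ts].append(v)
        return {ts: agg(vs) for ts, vs in sorted(buckets.items())}


db = ToyTSDB()
db.put("sys.cpu.user", 1000, 10, host="web01", dc="lga")
db.put("sys.cpu.user", 1000, 30, host="web02", dc="lga")
db.put("sys.cpu.user", 1060, 20, host="web01", dc="lga")
total = db.query("sys.cpu.user", 1000, 2000)                # summed over hosts
web01 = db.query("sys.cpu.user", 1000, 2000, host="web01")  # one host only
```

Aggregating across every series that shares a metric and matches a tag filter is the core query pattern. Omitting a tag from the filter rolls it up, for example summing CPU across all hosts in a data center.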
Time series datastores: InfluxDB (influxdb.com)
• Dependency-free time series database written in Go
• SQL-like query language (incl. regex, fan-out)
• Single-node or Raft-based distributed mode
Toolbox
• Distributed systems are hard
• Setup and operation of components
• One (static) cluster per component
• Efficient usage of cluster resources (total cost of ownership, TCO)
• Run stateless services (web servers, app servers, etc.) and Big Data services such as Kafka, Spark or Cassandra together on one cluster
• Dynamically partition your cluster according to your business requirements
• Increased utilization (10% → 80%+)
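The utilization figure translates directly into hardware. At the same workload, raising average utilization from 10% to 80% cuts the required machine count by roughly a factor of 8. The small calculation below makes this concrete. The workload size and machine size are hypothetical, and the sketch assumes demand can actually be packed onto machines up to the target utilization.

```python
def machines_needed(demand_cores, cores_per_machine, utilization_pct):
    """Smallest machine count that serves demand_cores when each machine
    runs at utilization_pct percent of its cores on average."""
    usable = cores_per_machine * utilization_pct  # usable capacity, in core-percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return (demand_cores * 100 + usable - 1) // usable


# Hypothetical workload: 256 cores of steady demand on 32-core machines.
before = machines_needed(256, 32, 10)  # 80 machines at 10% utilization
after = machines_needed(256, 32, 80)   # 10 machines at 80% utilization
```

Because the machine count scales with 1/utilization, the 10% → 80% jump is where most of the TCO saving comes from. Real savings depend on how well mixed workloads can share the cluster.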