How the Slack data platform evolved from a batch system to near real-time. I will also touch on how Samza helps us build low-latency data pipelines and an experimentation framework.
About Slack
● Public launch: 2014
● 800+ employees across 7 countries worldwide
● HQ in San Francisco
● Diverse set of industries, including software/technology, retail, media, telecom and professional services
Performance & Experimentation
● The Engineering & CE teams should be able to detect performance bottlenecks proactively.
● Engineers should be able to see how their experiments perform in near real-time.
Why Analytics Kafka
● Keeps the load on DW Kafka predictable.
● Easier to upgrade to and verify newer Kafka versions.
● A smaller Kafka cluster is relatively straightforward to operate.
Performance monitoring
● Approximate percentiles using the Druid histogram extension [http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf]
● Unique user counts based on Druid's HyperLogLog implementation
● Slack bot integration for alerts based on performance metrics
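To make the approximate-percentile idea concrete, here is a minimal Python sketch of a fixed-size streaming histogram in the style of the linked Ben-Haim and Tom-Tov paper. It is not Druid's code: the class name `StreamingHistogram`, the `max_bins` default, and the simplified linear interpolation in `quantile` are all assumptions made for illustration.

```python
import bisect


class StreamingHistogram:
    """Fixed-size histogram sketch (hypothetical helper, not Druid's implementation)."""

    def __init__(self, max_bins=50):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [centroid, count]

    def add(self, value):
        # Insert the value as its own bin, keeping bins sorted by centroid.
        centroids = [b[0] for b in self.bins]
        i = bisect.bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += 1
        else:
            self.bins.insert(i, [value, 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Merge the two adjacent bins whose centroids are closest together,
        # replacing them with one bin at their count-weighted mean.
        gaps = [self.bins[i + 1][0] - self.bins[i][0]
                for i in range(len(self.bins) - 1)]
        i = gaps.index(min(gaps))
        (c1, n1), (c2, n2) = self.bins[i], self.bins[i + 1]
        self.bins[i:i + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]

    def count(self):
        return sum(n for _, n in self.bins)

    def quantile(self, q):
        # Simplified estimate (an assumption, not the paper's exact procedure):
        # place each bin's mass at its centroid and interpolate linearly
        # between neighbouring centroids.
        total = self.count()
        if total == 0:
            raise ValueError("empty histogram")
        target = q * total
        seen = 0.0
        prev_pos = prev_c = None
        for c, n in self.bins:
            pos = seen + n / 2.0
            if target <= pos:
                if prev_pos is None:
                    return c
                frac = (target - prev_pos) / (pos - prev_pos)
                return prev_c + frac * (c - prev_c)
            seen += n
            prev_pos, prev_c = pos, c
        return self.bins[-1][0]
```

Memory stays bounded by `max_bins` no matter how many latency samples arrive, which is the property that makes this kind of sketch practical for near real-time percentile dashboards. Accuracy depends on the bin budget and the data distribution.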
Experimentation Framework
● Both streams are hash-partitioned by team and user.
● A RocksDB store holds the exposure table (team_users_experimentation mapping).
● Metrics events are range-joined with the exposure table.
● A periodic snapshot of RocksDB is quality-checked against the batch system.
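The join described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Samza job itself: `ExposureStore` is an in-memory stand-in for a per-partition RocksDB store, the key layout `(team_id, user_id, experiment_id)`, the event field names, and the rule that a metric counts only if it arrives at or after the exposure time are all hypothetical. "Range join" is read here as a prefix range scan over ordered keys.

```python
import bisect
import zlib


def partition_for(team_id, user_id, num_partitions):
    # Stable hash, so both streams route the same (team, user) to the same
    # partition and the join can be served from local state.
    key = f"{team_id}:{user_id}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


class ExposureStore:
    """In-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []    # sorted list of (team_id, user_id, experiment_id)
        self._values = {}  # key -> (variant, exposed_at)

    def put(self, team_id, user_id, experiment_id, variant, exposed_at):
        key = (team_id, user_id, experiment_id)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = (variant, exposed_at)

    def scan(self, team_id, user_id):
        # Range scan over every key that starts with (team_id, user_id).
        start = bisect.bisect_left(self._keys, (team_id, user_id))
        for key in self._keys[start:]:
            if key[:2] != (team_id, user_id):
                break
            yield key[2], self._values[key]


def join_metric(store, event):
    # Attribute one metric event to every experiment the user was exposed to
    # at or before the event time (assumed attribution rule).
    joined = []
    for experiment_id, (variant, exposed_at) in store.scan(
            event["team_id"], event["user_id"]):
        if exposed_at <= event["ts"]:
            joined.append({
                "experiment_id": experiment_id,
                "variant": variant,
                "metric": event["metric"],
                "value": event["value"],
            })
    return joined
```

Because exposure events and metric events use the same partitioning function, each task only needs the exposures for its own users, so the lookup never leaves the local store.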