Slide 1

Slide 1 text

Reliable Events Pipeline
Ananth Packkildurai, February 27, 2019

Slide 2

Slide 2 text

Events “An event is a single occurrence within an environment, usually involving an attempted state change.”

Slide 3

Slide 3 text

Logs “A log is a collection of event records”

Slide 4

Slide 4 text

Logs @ Slack
● 2M events per second
● 4 Kafka clusters
● 3TB per hour

Slide 5

Slide 5 text

Me ➢ @ananthdurai ➢ Data Infrastructure Engineer @ Slack ➢ Passionate about all things related to ethical data management

Slide 6

Slide 6 text

Team REP: Derek Smith, Jackson Argo

Slide 7

Slide 7 text

About Slack
● Public launch: 2014
● 1000+ employees across 7 countries worldwide
● HQ in San Francisco
● $841M in capital raised
● Key investors include Softbank, Accel, a16z, Social Capital, Index, Thrive, GV, Kleiner Perkins, GGV, Horizons, Spark, IVP and DST
● Diverse set of industries including software/technology, retail, media, telecom and professional services

Slide 8

Slide 8 text

An unprecedented adoption rate

Slide 9

Slide 9 text

Data Decisions

Slide 10

Slide 10 text

Growth Metrics

Slide 11

Slide 11 text

Service Quality Metrics

Slide 12

Slide 12 text

Billing Metrics

Slide 13

Slide 13 text

How did we start?

Slide 14

Slide 14 text

Is it reliable?

Slide 15

Slide 15 text

REP Characteristics: Trust in Logs

Slide 16

Slide 16 text

REP Characteristics: Trust in Logs, High Availability

Slide 17

Slide 17 text

REP Characteristics: Trust in Logs, High Availability, Low Latency

Slide 18

Slide 18 text

REP Characteristics: Trust in Logs, High Availability, Low Latency, Efficient

Slide 19

Slide 19 text

REP Characteristics: Trust in Logs, High Availability, Low Latency, Efficient

Slide 20

Slide 20 text

REP pipeline

Slide 21

Slide 21 text

Murron logging agent
Murron is a sidecar that runs on each instance and collects logs from the host and its containers.
● Guarantees at-least-once message delivery
● Supports retry, back pressure and configurable dynamic routing
● Supports the gRPC, TCP, HTTP and Unix domain socket protocols
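To make the delivery guarantees on this slide concrete, here is a minimal Python sketch of at-least-once delivery with retry and back pressure. It is not Murron's implementation: the `AtLeastOnceSender` class, its `offer` and `drain` methods, and the `send` callback that returns `True` on acknowledgement are all hypothetical names chosen for illustration.

```python
import queue
import time
from typing import Callable


class AtLeastOnceSender:
    """Hypothetical sketch: resend each message until the sink acknowledges it."""

    def __init__(self, send: Callable[[bytes], bool], max_pending: int = 1000,
                 base_delay: float = 0.01, max_delay: float = 1.0):
        self._send = send  # assumed to return True once the sink has acknowledged
        self._pending: "queue.Queue[bytes]" = queue.Queue(maxsize=max_pending)
        self._base_delay = base_delay
        self._max_delay = max_delay

    def offer(self, message: bytes) -> bool:
        """Buffer a message; return False when the buffer is full (back pressure)."""
        try:
            self._pending.put_nowait(message)
            return True
        except queue.Full:
            return False

    def drain(self) -> int:
        """Deliver every buffered message, retrying with exponential backoff."""
        delivered = 0
        while True:
            try:
                message = self._pending.get_nowait()
            except queue.Empty:
                return delivered
            delay = self._base_delay
            # Keep retrying the same message; it is never dropped.
            while not self._send(message):
                time.sleep(delay)
                delay = min(delay * 2, self._max_delay)
            delivered += 1
```

Because a message is resent whenever no acknowledgement arrives, the sink may see the same message more than once, so downstream consumers need to tolerate or remove duplicates. A caller that gets `False` from `offer` is expected to slow down or hold the message itself, which is the back-pressure signal in this sketch.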

Slide 22

Slide 22 text

Murron Protocol

Slide 23

Slide 23 text

UID

Slide 24

Slide 24 text

Message Signature

Slide 25

Slide 25 text

Container

Slide 26

Slide 26 text

Measuring Reliability
● Log correctness: Did we log correctly?
● Log reliability: Are we missing any data?
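One way to answer "are we missing any data?" is to compare the identifiers emitted at the source with those observed at the destination. The Python sketch below assumes every event carries a unique ID; the `check_reliability` function and `ReliabilityReport` type are hypothetical and do not describe how Slack's pipeline measures reliability.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class ReliabilityReport:
    sent: int               # number of distinct IDs emitted at the source
    missing: Set[str]       # IDs that never reached the destination
    duplicated: Set[str]    # IDs seen more than once at the destination
    delivery_ratio: float   # fraction of sent IDs that arrived at least once


def check_reliability(sent_uids: Iterable[str],
                      received_uids: Iterable[str]) -> ReliabilityReport:
    """Compare source and destination IDs (assumes each event has a unique ID)."""
    sent = set(sent_uids)
    counts = Counter(received_uids)
    missing = sent - set(counts)
    duplicated = {uid for uid, n in counts.items() if n > 1}
    ratio = (len(sent) - len(missing)) / len(sent) if sent else 1.0
    return ReliabilityReport(len(sent), missing, duplicated, ratio)


report = check_reliability(["a", "b", "c", "d"], ["a", "b", "b", "d"])
print(report.missing, report.duplicated, report.delivery_ratio)
```

Duplicates are reported separately from losses because an at-least-once pipeline is expected to produce some of them, whereas a missing ID indicates data loss.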

Slide 27

Slide 27 text

Log reliability

Slide 28

Slide 28 text

Log reliability

Slide 29

Slide 29 text

Log Inspector

Slide 30

Slide 30 text

Apache Pinot
Pinot is a realtime distributed OLAP datastore.
● A column-oriented database with various compression schemes such as Run Length and Fixed Bit Length
● Pluggable indexing technologies: Sorted Index, Bitmap Index, Inverted Index
● Near-real-time ingestion from Kafka and batch ingestion from Hadoop
● SQL-like language that supports selection, aggregation, filtering, group by, order by and distinct queries on fact data
● Horizontally scalable and fault tolerant

Slide 31

Slide 31 text

REP extended

Slide 32

Slide 32 text

Log Inspector

Slide 33

Slide 33 text

Thank You!
For more information go to: slack.com