
How ScalingData uses Kafka for Event-Oriented Machine Data

Hakka Labs
November 17, 2014

Transcript

  1. (1) producers -> kafka, (2) kafka -> consumers, (3) consumers -> storage/processing.
    message schema? apis? legacy sources? keys? dupes? partitioning? offset management? durability? dupes? dead letter handling? where does transformation go? data center replication? …well, with filtering and stuff?
  2. these things aren’t kafka’s responsibility. they depend on… lots of stuff. but you need to think about them. nothing is free.
  3. [diagram: a Broker Cluster hosting an “events” topic. Sources shown: Agent, Log4J Appender, Java API, REST API, Syslog, Statsd clients, Files / Directories, Other Stuff. Events flow from the sources into the topic.]
  4. [diagram: the “events” topic on the Broker Cluster is read by an HDFS Consumer (Event -> Record -> HDFS) and a Solr Consumer (Event -> Doc -> Solr). Downstream: Impala/Hive, Spark, the ScalingData Application, plus “Custom Stuff” and “Magic”.]
  5. [diagram: detail of the HDFS consumers. Two HDFS Consumer processes, each running one consumer thread per topic: Topic A -> Thread Consumer -> Event -> Transform Engine (Transform, Transform) -> Avro Object -> Kite Writer (Dataset A), and Topic B -> the same chain -> Kite Writer (Dataset B). In HDFS, Dataset A and Dataset B each hold multiple Data Files.]
  6. i’m probably out of time by now. …but maybe time for some questions. (we’re hiring.)
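Slide 1 lists keys and partitioning among the questions to think about. A minimal sketch of why keys matter, assuming a key-hash partitioner: events with the same key always map to the same partition, which is what keeps their relative order. `partition_for` and the host-name keys are hypothetical, and CRC32 is only a stand-in hash, not the one Kafka's own default partitioner uses.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition: same key -> same partition.

    CRC32 is an illustrative stand-in, not Kafka's actual partitioning hash.
    """
    # Mask to a non-negative 31-bit value before taking the modulus.
    return (zlib.crc32(key) & 0x7FFFFFFF) % num_partitions


# Hypothetical machine-data events keyed by the host that emitted them.
events = [(b"web-01", "GET /"), (b"db-02", "slow query"), (b"web-01", "GET /login")]
for key, payload in events:
    print(key.decode(), partition_for(key, 4), payload)
```

Because the partition depends on both the key and the partition count, adding partitions later remaps existing keys, so per-key ordering across that change is no longer guaranteed.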
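Slide 4 has two independent sinks, HDFS and Solr, reading the same “events” topic. A minimal in-memory sketch of why that works, assuming each sink runs as its own consumer group with its own committed offset (the slides do not say how offsets were managed): reading or committing for one group never moves the other. The list-backed log, `OffsetStore`, and the group names are hypothetical stand-ins, not Kafka's client API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for one partition of the "events" topic: an append-only log.
events_log: List[str] = ["event-0", "event-1", "event-2"]


class OffsetStore:
    """Tracks the next offset to read for each consumer group (illustrative only)."""

    def __init__(self) -> None:
        self.committed: Dict[str, int] = defaultdict(int)

    def poll(self, group: str, max_events: int) -> Tuple[int, List[str]]:
        # Return the starting offset and up to max_events events from there.
        start = self.committed[group]
        return start, events_log[start:start + max_events]

    def commit(self, group: str, next_offset: int) -> None:
        self.committed[group] = next_offset


offsets = OffsetStore()
hdfs_records: List[str] = []
solr_docs: List[str] = []

# The HDFS sink reads everything available; the Solr sink reads only one event.
start, batch = offsets.poll("hdfs-consumer", max_events=10)
hdfs_records.extend(batch)                           # write first...
offsets.commit("hdfs-consumer", start + len(batch))  # ...then commit

start, batch = offsets.poll("solr-consumer", max_events=1)
solr_docs.extend(batch)
offsets.commit("solr-consumer", start + len(batch))

print(dict(offsets.committed))  # {'hdfs-consumer': 3, 'solr-consumer': 1}
```

Committing only after the write means a crash between the two replays that batch on restart. That is at-least-once delivery, one source of the “dupes?” on slide 1.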
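Slide 5 puts transformation inside each consumer thread: an Event passes through a Transform Engine (a chain of Transforms) and comes out as an Avro Object for a Kite Writer. A minimal sketch of that chain, assuming each transform is a function from event to event; plain dicts stand in for Avro objects and a list stands in for the dataset writer. `parse_level`, `tag_dataset`, and `run_engine` are hypothetical, and the dead-letter list is one possible answer to slide 1's “dead letter handling?”, not something slide 5 shows.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Transform = Callable[[Event], Event]


def parse_level(event: Event) -> Event:
    """Hypothetical transform: split a "LEVEL message" raw line into fields."""
    level, _, message = event["raw"].partition(" ")
    if level not in {"INFO", "WARN", "ERROR"}:
        raise ValueError(f"unrecognized level: {level!r}")
    return {**event, "level": level, "message": message}


def tag_dataset(name: str) -> Transform:
    """Hypothetical transform factory: record which dataset the event is bound for."""
    def transform(event: Event) -> Event:
        return {**event, "dataset": name}
    return transform


def run_engine(events: List[Event], transforms: List[Transform],
               dataset: List[Event], dead_letters: List[dict]) -> None:
    """Apply each transform in order; park failures instead of dropping them."""
    for original in events:
        record = original
        try:
            for transform in transforms:
                record = transform(record)
        except ValueError as err:
            # Keep the untransformed event plus the reason, for later inspection or replay.
            dead_letters.append({"event": original, "error": str(err)})
            continue
        dataset.append(record)  # stand-in for handing the record to a dataset writer


dataset_a: List[Event] = []
dead: List[dict] = []
run_engine(
    [{"raw": "INFO user logged in"}, {"raw": "garbage"}],
    [parse_level, tag_dataset("dataset_a")],
    dataset_a,
    dead,
)
print(len(dataset_a), len(dead))  # 1 1
```

Keeping the original event in the dead-letter entry, rather than the half-transformed record, means a fixed transform chain can reprocess it from scratch.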