How It Works - Kafka

A series of talks on data engineering

Yuri Ostapchuk

September 13, 2021

Transcript

  1. USE-CASES

    Messaging: traditional message-broker pattern of data processing.
    Website Activity Tracking: real-time publish-subscribe feeds for page views, searches, and other user interactions.
    Metrics: processing of operational monitoring data.
    Log Aggregation: collecting physical log files and storing them for further processing.
    Stream Processing: multi-stage data processing pipelines.
    Event Sourcing: support for apps built on stored event sequences that can be replayed and reapplied to derive a consistent system state.
    Commit Log: an external commit log for a distributed system, serving as a re-syncing mechanism for failed nodes.
  2. CONSUMER API

    consumer vs consumer group
    offset control & offset storage
    push vs pull and optimal batching
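
The pull model and offset control from the CONSUMER API slide can be illustrated with a toy in-memory log. This is a minimal sketch, not the Kafka client API: `ToyLog`, `ToyConsumer`, `poll`, `commit`, and `seek` are hypothetical names chosen for illustration, and a plain Python list stands in for a topic partition.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Append-only list standing in for one topic partition (assumption)."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset, max_records):
        # Reading never removes data; the consumer decides where to read from.
        return self.records[offset:offset + max_records]


@dataclass
class ToyConsumer:
    log: ToyLog
    position: int = 0   # next offset to fetch
    committed: int = 0  # last committed position

    def poll(self, max_records=3):
        # Pull model: the consumer asks for a batch of at most max_records.
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position

    def seek(self, offset):
        self.position = offset


log = ToyLog()
for i in range(5):
    log.append(f"event-{i}")

consumer = ToyConsumer(log)
first = consumer.poll(max_records=3)   # offsets 0..2
consumer.commit()                      # committed position is now 3
second = consumer.poll(max_records=3)  # offsets 3..4, not committed
consumer.seek(consumer.committed)      # rewind to the committed position
replayed = consumer.poll(max_records=3)
```

Because the consumer owns its position, rewinding to the committed offset returns the same records again, which a broker that pushes and then discards messages could not offer.
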
  3. ADMIN, MONITORING, CONFIGURATION

    UI & monitoring
    CLI
    admin api
    configs: topic, broker, consumer, producer, admin, (connect, streams)
  4. LOG STORAGE

    commit log instead of transient queue
    starting/ending offsets
    acknowledgement & replication
    topic ttl
    sequential disk access vs random memory access
    log compaction
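
Log compaction from the LOG STORAGE slide can be sketched as "keep the latest record per key". The snippet below is a simplified model under stated assumptions: the `compact` function and the `(offset, key, value)` tuple layout are hypothetical, and tombstones (records with a `None` value) are dropped immediately, whereas Kafka retains them for a configurable period before removing them.

```python
def compact(log):
    """Keep only the latest record per key, preserving offset order.

    `log` is a list of (offset, key, value) tuples in offset order.
    A value of None is treated as a tombstone: the key is removed.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later offsets overwrite earlier ones

    survivors = sorted(latest.values(), key=lambda record: record[0])
    return [record for record in survivors if record[2] is not None]


log = [
    (0, "user-1", "a@example.com"),
    (1, "user-2", "b@example.com"),
    (2, "user-1", "c@example.com"),  # newer value for user-1
    (3, "user-2", None),             # tombstone for user-2
    (4, "user-3", "d@example.com"),
]

compacted = compact(log)
```

Surviving records keep their original offsets, so the compacted log has gaps in its offset sequence while still holding the final value for every live key.
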
  5. DELIVERY SEMANTICS

    ordering at producer
    ordering at consumer
    delivery guarantees
    at-most-once vs. at-least-once vs. exactly-once
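
The difference between at-most-once and at-least-once on the DELIVERY SEMANTICS slide comes down to whether the consumer commits its offset before or after processing. The simulation below is a sketch with hypothetical names (`run`, `commit_first`, `crash_at`); it injects a single crash and then restarts from the last committed offset. Exactly-once is not modelled here.

```python
def run(records, commit_first, crash_at):
    """Process `records` with one simulated crash at offset `crash_at`.

    After the crash the consumer restarts from the last committed offset.
    Returns the list of records whose processing side effect happened.
    """
    processed = []
    committed = 0
    crashed = False
    while committed < len(records):
        offset = committed
        try:
            if commit_first:
                committed = offset + 1  # commit, then process
                if offset == crash_at and not crashed:
                    crashed = True
                    raise RuntimeError("crash before processing")
                processed.append(records[offset])
            else:
                processed.append(records[offset])  # process, then commit
                if offset == crash_at and not crashed:
                    crashed = True
                    raise RuntimeError("crash before commit")
                committed = offset + 1
        except RuntimeError:
            continue  # restart: resume from the last committed offset
    return processed


records = ["a", "b", "c"]
at_most_once = run(records, commit_first=True, crash_at=1)
at_least_once = run(records, commit_first=False, crash_at=1)
```

With the crash at offset 1, committing first loses record `b`, while committing last processes `b` twice. Making the processing step idempotent is one common way to tolerate the duplicate.
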
  6. <..> VS. KAFKA

    yes
    flexibility
    scaling, parallelism
    pub/sub or queue
    msg ordering
    encryption
    integration
    streaming
    connect
    protocols
    persistence (commit log vs. transient queue)
    HA, fault-tolerance (tolerate N-1 server failures)
  7. <..> VS. KAFKA (continued)

    no
    msg priority
    transactions (no out of the box)
    consumer ack
    producer-consumer routing
    get msg by key (offset-based consuming)
    batch processing
  8. TIPS & TRICKS

    deployment: usually ec2, packer/ansible
    AWS MSK
    interface: kafkacat, kafka ui
    confluent platform and schema management
    offset storage
    partitioning strategies
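
Two of the partitioning strategies mentioned on the TIPS & TRICKS slide can be sketched in a few lines: hashing the record key, and round-robin for records without a key. This is an illustration only. The function names are hypothetical, and `zlib.crc32` is a stand-in for a stable hash; the Java client's default partitioner hashes key bytes with murmur2, so real partition numbers will differ.

```python
import zlib
from itertools import count


def key_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (crc32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def round_robin(num_partitions: int):
    """Return a function that cycles through partitions for keyless records."""
    counter = count()

    def next_partition() -> int:
        return next(counter) % num_partitions

    return next_partition


# Records with the same key always land in the same partition,
# which is what preserves per-key ordering.
same_key = {key_partition("user-42", 6) for _ in range(10)}

pick = round_robin(3)
spread = [pick() for _ in range(5)]
```

Key-based partitioning trades even load for per-key ordering: a hot key sends all of its traffic to one partition, while round-robin spreads load but gives no ordering across records.
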
  9. COMMON ARCHITECTURES

    s3/sns/hdfs + kafka
    simple producer + kafka
    kafka + spark
    kafka + akka
    kafka connect (s3, http, hdfs, sns, kinesis, jdbc, hbase, elastic, cassandra,…)
    confluent platform: ksql + schema registry + control center + rest proxy (+ connect + streams)
    kafka from presto/hive ..
  10. ACTION?

    getting hands dirty
    download kafka, start and play with topics
    produce text file - row by row
    RTFM
    streams: word-count
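
Before setting up a broker, the word-count exercise from the ACTION? slide can be tried as a plain Python stand-in. This sketch does not use Kafka or Kafka Streams: the `word_count_stream` function is hypothetical, and an in-memory list of rows plays the role of a text file produced row by row. Each yielded pair mimics an update record carrying the running count for a word.

```python
import re
from collections import Counter


def word_count_stream(lines):
    """Yield (word, running_count) updates as each line arrives."""
    counts = Counter()
    for line in lines:
        for word in re.findall(r"\w+", line.lower()):
            counts[word] += 1
            yield word, counts[word]


# Stand-in for a text file sent to a topic one row at a time.
lines = ["all streams lead to kafka", "hello kafka streams"]

updates = list(word_count_stream(lines))
final_counts = dict(updates)  # the last update per word wins
```

Keeping only the last update per word gives the final table of counts, which is the same "latest value per key" idea as log compaction.
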