From Messaging to Logs with Apache Kafka - OUGN17

Slide 1

Slide 1 text

#ougn17 messaging → logs @apachekafka Jorge Quilcate Otoya @jeqo89

Slide 2

Slide 2 text

#ougn17 About me Jorge Quilcate Otoya Back-end/Integration Developer at Sysco Middleware @jeqo89 | github.com/jeqo | jeqo.github.io

Slide 3

Slide 3 text

#ougn17 Context

Slide 4

Slide 4 text

#ougn17 Context: Messaging “Technology that enables asynchronous communication … Channels, also known as queues, are the logical pathways that connect the programs and convey messages … A sender or producer is a program that sends a message by writing the message to a channel. A receiver or consumer is a program that receives a message by reading (and deleting) it from a channel.” Enterprise Integration Patterns - Gregor Hohpe and Bobby Woolf http://www.enterpriseintegrationpatterns.com/patterns/messaging/Introduction.html

Slide 5

Slide 5 text

#ougn17 Message Channels: Point-to-Point, Pub/Sub

Slide 6

Slide 6 text

#ougn17 Context: Logs Records are appended to the end of the log… Each record has a key… Records are ordered… Order defines a notion of “time”… Content is not important at this point; it could be anything… They record what happened and when. The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
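
A minimal sketch of the log abstraction described on this slide, assuming an in-memory list as storage. The `Record` and `Log` names are hypothetical and are not part of any Kafka API; the point is only that appends go to the end, every record gets an increasing offset, and reads never delete anything.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int           # position in the log; doubles as a logical "time"
    key: Optional[str]
    value: Any            # content is opaque to the log


@dataclass
class Log:
    records: List[Record] = field(default_factory=list)

    def append(self, key, value) -> int:
        """Append to the end of the log and return the assigned offset."""
        offset = len(self.records)
        self.records.append(Record(offset, key, value))
        return offset

    def read(self, from_offset: int = 0) -> List[Record]:
        """Return records in order from from_offset; nothing is removed."""
        return self.records[from_offset:]


log = Log()
log.append("user-1", "logged_in")
log.append("user-2", "logged_in")
log.append("user-1", "logged_out")
```

Reading the same range twice returns the same records in the same order, which is what later makes reprocessing possible.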

Slide 7

Slide 7 text

#ougn17 Logs… Logs everywhere How does your database store data on disk reliably? It uses a log. How does one database replica synchronise with another replica? It uses a log. How does activity data get recorded in a system like Apache Kafka? It uses a log. How will the data infrastructure of your application remain robust at scale? Guess what… Using logs to build a solid data infrastructure (or why dual writes are a bad idea) - Martin Kleppmann https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/ https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/

Slide 8

Slide 8 text

#ougn17 Log-Centric Architecture (a.k.a. Kappa) “A system that assumes an external log is present allows the individual systems to relinquish a lot of their own complexity and rely on the shared log.” The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying http://milinda.pathirage.org/kappa-architecture.com/

Slide 9

Slide 9 text

#ougn17 Use Cases

Slide 10

Slide 10 text

#ougn17 Messaging use-case: Job Queues Fire and Forget Store and Forward (a.k.a. Push Model) Broker in charge of the delivery Event sourcing and stream processing at scale - Martin Kleppmann https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html Implementations: JMS/AMQP

Slide 11

Slide 11 text

#ougn17 Messaging Challenges Out-of-order when messages are retried Risk of inconsistencies in different clients (producers and/or consumers)

Slide 12

Slide 12 text

#ougn17 Solving Messaging Challenges with Logs Ordering and Reprocessing

Slide 13

Slide 13 text

#ougn17 Logs use-case: Event Log Pull Model Ordered stream of Events Consumers in control of message consumption Event sourcing and stream processing at scale - Martin Kleppmann https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html Implementations: Apache Kafka, Amazon Kinesis Streams, Apache DistributedLog (incubating - Twitter)
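
The pull model can be sketched as follows. This is a toy illustration under simple assumptions: a plain Python list stands in for the event log, and `PullConsumer` is a hypothetical class, not the Kafka Consumer API. Each consumer keeps its own position, so consumers read at their own pace and can rewind without affecting each other.

```python
class PullConsumer:
    """Toy consumer that owns its position in a shared log."""

    def __init__(self, log):
        self.log = log        # a plain list stands in for the event log
        self.position = 0     # next offset this consumer will read

    def poll(self, max_records=2):
        """Fetch up to max_records from the current position and advance."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move to a chosen offset, for example to reprocess old events."""
        self.position = offset


events = ["created", "paid", "shipped", "delivered"]
fast, slow = PullConsumer(events), PullConsumer(events)

fast.poll(max_records=4)              # the fast consumer reads everything
slow.poll()                           # the slow consumer reads two events
fast.seek(1)                          # rewind to offset 1
replayed = fast.poll(max_records=4)   # and read the rest again
```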

Slide 14

Slide 14 text

#ougn17 Apache Kafka A Distributed Streaming Platform

Slide 15

Slide 15 text

#ougn17 Apache Kafka: Facts ➔ Born from the need to solve the data pipeline problem at LinkedIn. ➔ First use-cases: collecting system metrics and monitoring user activity. 2010: Open-sourced; 2011: Apache project; 2012: Graduated from incubator (October); 2014: Confluent Inc. founded. Kafka: The Definitive Guide - Neha Narkhede, Gwen Shapira & Todd Palino

Slide 16

Slide 16 text

#ougn17 Apache Kafka: Use-cases ➔ Activity Tracking ➔ Messaging ➔ Metrics/Logging ➔ Commit Log ➔ Change Data Capture (CDC) ➔ Stream Processing ➔ Cloud Adoption ➔ …

Slide 17

Slide 17 text

#ougn17 Messaging Batch Database

Slide 18

Slide 18 text

#ougn17 Apache Kafka Tour (v0.10.2.0) Kafka Cluster Log Records Kafka Producer API Kafka Consumer API Kafka Streams API Kafka Connect API Kafka ++

Slide 19

Slide 19 text

#ougn17 Kafka Core

Slide 20

Slide 20 text

#ougn17 Kafka Cluster

Slide 21

Slide 21 text

#ougn17 Kafka Topology: Why Zookeeper? Centralized coordination service: consensus, group management, presence protocols, atomic broadcast Kafka’s internal “source of truth” Used for: ➔ Master election ➔ Replica propagation (ISR) ➔ And more Distributed Consensus Reloaded: Apache Zookeeper and Replication in Kafka - Flavio Junqueira https://www.confluent.io/blog/distributed-consensus-reloaded-apache-zookeeper-and-replication-in-kafka/

Slide 22

Slide 22 text

#ougn17 Balance Availability and Consistency Use case #1: Activity Tracking ➔ Retention: 3 days ➔ More Partitions ➔ Lower Replication Factor ➔ Availability is most important Use case #2: Inventory adjustments ➔ Retention: 6 months ➔ Fewer Partitions ➔ Higher Replication Factor ➔ Consistency is most important Streaming in Practice: Putting Kafka in Production - Roger Hoover https://www.confluent.io/apache-kafka-talk-series/Streaming-in-Practice-Putting-Kafka-in-Production/

Slide 23

Slide 23 text

#ougn17

Slide 24

Slide 24 text

#ougn17 Log Record

Slide 25

Slide 25 text

#ougn17 from Topics to Partitions http://kafka.apache.org/documentation
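
A rough sketch of how a keyed record could be routed to a partition. The partition count and the sample records are made up for illustration, and `zlib.crc32` is only a stand-in hash (Kafka's Java client uses murmur2 for keyed records). What matters is that the same key always lands in the same partition, so per-key order is preserved.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a deterministic hash (stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is its own ordered log
partitions = defaultdict(list)
for key, value in [("user-1", "a"), ("user-2", "b"), ("user-1", "c"),
                   ("user-3", "d"), ("user-1", "e")]:
    partitions[partition_for(key)].append((key, value))
```

Ordering is only guaranteed inside a partition; records with different keys may end up in different partitions and have no defined order relative to each other.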

Slide 26

Slide 26 text

#ougn17 from Partitions to Segments https://www.confluent.io/apache-kafka-talk-series/deep-dive-into-apache-kafka/ https://www.confluent.io/apache-kafka-talk-series/

Slide 27

Slide 27 text

#ougn17 from Segments to Records https://www.confluent.io/apache-kafka-talk-series/deep-dive-into-apache-kafka/ https://www.confluent.io/apache-kafka-talk-series/

Slide 28

Slide 28 text

#ougn17 Log Unit: Record https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol

Slide 29

Slide 29 text

#ougn17 Schema Evolution: Why Avro? Reader’s schema and writer’s schema do not have to be the same Forward/Backward compatibility ➔ Add/remove fields with default values ➔ Explicit `null` type (no optional or required markers) ➔ Change data types ➔ Change names (i.e. alias) Designing Data-Intensive Applications - Martin Kleppmann
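
The idea that reader and writer schemas may differ can be illustrated with a toy resolver. This is not the Avro library and it ignores types and aliases; `resolve`, `MISSING` and the sample records are hypothetical. It only shows why fields added or removed with default values keep old and new code compatible.

```python
MISSING = object()  # marks a reader field that has no default


def resolve(record: dict, reader_fields: dict) -> dict:
    """Project a written record onto the reader's fields.

    reader_fields maps field name -> default value (MISSING = no default).
    Fields unknown to the reader are ignored; absent fields take the default.
    """
    result = {}
    for name, default in reader_fields.items():
        if name in record:
            result[name] = record[name]
        elif default is not MISSING:
            result[name] = default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


reader_v1 = {"id": MISSING, "name": MISSING}
reader_v2 = {"id": MISSING, "name": MISSING, "email": None}  # new field, default null

written_v1 = {"id": 7, "name": "Ada"}
written_v2 = {"id": 8, "name": "Grace", "email": "grace@example.org"}

new_reads_old = resolve(written_v1, reader_v2)   # backward compatibility
old_reads_new = resolve(written_v2, reader_v1)   # forward compatibility
```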

Slide 30

Slide 30 text

#ougn17 Lab: Kafka Cluster Scalability: Cluster and Brokers Topics: Partitions, Replication, ISR Cleaning up: Compaction and Retention
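
The two clean-up policies named in this lab can be sketched on plain tuples. This is a simplification under stated assumptions: Kafka applies both policies to whole log segments and also handles delete markers, neither of which is modelled here, and the function names are hypothetical.

```python
def compact(records):
    """Compaction: keep only the latest record per key, in offset order.

    records is a list of (offset, key, value) tuples.
    """
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)   # later offsets overwrite earlier ones
    return sorted(latest.values())           # offsets are unique, so this sorts by offset


def apply_retention(records, now, retention):
    """Retention: drop records older than the retention period.

    records is a list of (timestamp, value) tuples; times share one unit.
    """
    return [(ts, value) for ts, value in records if now - ts <= retention]


compacted = compact([(0, "a", 1), (1, "b", 1), (2, "a", 2), (3, "c", 1), (4, "b", 3)])
retained = apply_retention([(10, "x"), (50, "y"), (95, "z")], now=100, retention=60)
```

Compaction suits topics that hold the current state per key; time-based retention suits event streams where old records simply expire.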

Slide 31

Slide 31 text

#ougn17 Lab: Log Record Record Structure: Key/Value Serialization/Deserialization Metadata: Offset/Timestamp

Slide 32

Slide 32 text

#ougn17 Kafka Clients API

Slide 33

Slide 33 text

#ougn17 Kafka Clients survey https://www.confluent.io/blog/first-annual-state-apache-kafka-client-use-survey

Slide 34

Slide 34 text

#ougn17 Kafka Producer API

Slide 35

Slide 35 text

#ougn17 Batching and Compression
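
Why batching and compression go together can be seen with a small experiment using only the standard library. The sample records are invented, and gzip is used here simply because it ships with Python; this says nothing about how the Kafka producer lays out its batches internally.

```python
import gzip
import json

# 100 small, similar records, as activity events often are
records = [
    json.dumps({"user": f"user-{i % 5}", "event": "page_view", "page": "/home"}).encode()
    for i in range(100)
]

# Compress every record on its own: the fixed overhead is paid 100 times
individually = sum(len(gzip.compress(r)) for r in records)

# Compress one newline-delimited batch: repetition across records is exploited
batch = b"\n".join(records)
batched = len(gzip.compress(batch))

# The batch round-trips back to the original records
restored = gzip.decompress(gzip.compress(batch)).split(b"\n")
```

Comparing `batched` with `individually` shows the saving; the price is the extra latency of waiting for a batch to fill.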

Slide 36

Slide 36 text

#ougn17 Acknowledgment: Latency vs Durability Ack=0 → No network delay → some data loss

Slide 37

Slide 37 text

#ougn17 Acknowledgment: Latency vs Durability Ack=1 → 1 network round-trip → little data loss

Slide 38

Slide 38 text

#ougn17 Acknowledgment: Latency vs Durability Ack=all (-1) → 2 network round-trips → no data loss (in combination with `min.insync.replicas`)
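
The three acknowledgment levels can be summarised in a toy decision function. `write_outcome` and its return strings are hypothetical; the function only models what the producer learns about a single write and assumes the in-sync replica count includes the leader.

```python
def write_outcome(acks, in_sync_replicas, min_insync_replicas=1):
    """Toy model of the producer's view of one write.

    acks: 0, 1 or "all". in_sync_replicas counts the leader as well.
    """
    if acks == 0:
        return "not confirmed"            # producer does not wait for a response
    if acks == 1:
        return "confirmed by leader"      # followers may not have the record yet
    if acks == "all":
        if in_sync_replicas < min_insync_replicas:
            return "rejected"             # too few in-sync replicas to accept the write
        return "confirmed by all in-sync replicas"
    raise ValueError(f"unsupported acks value: {acks!r}")
```

With `acks="all"` alone, a write is still accepted when the leader is the only in-sync replica; `min.insync.replicas` is what turns that situation into a rejected write instead of a weakly replicated one.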

Slide 39

Slide 39 text

#ougn17 Lab: Kafka Producer Batching and Compression Acknowledgements

Slide 40

Slide 40 text

#ougn17 Kafka Consumer API

Slide 41

Slide 41 text

#ougn17 Multiple Consumers ➔ Consumer Groups as Logical Subscribers ➔ Offset by Consumer instance (group member) ➔ Consumer Groups as base of parallelism, with Partitions ➔ Ordering ensured by partition (+ keyed topics is normally enough)
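
How partitions bound the parallelism of a consumer group can be sketched with a simple round-robin split. `assign` is a hypothetical helper, not one of Kafka's built-in assignment strategies; the member names are invented.

```python
def assign(partitions, members):
    """Spread partitions over group members round-robin (toy strategy).

    Every partition goes to exactly one member of the group.
    """
    assignment = {member: [] for member in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
two_members = assign(partitions, ["c1", "c2"])
five_members = assign(partitions, ["c1", "c2", "c3", "c4", "c5"])
```

With four partitions, a fifth group member receives nothing: adding consumers beyond the partition count does not add parallelism.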

Slide 42

Slide 42 text

#ougn17 At-Most-Once Delivery ➔ Scenario: the consumer process crashes after saving its position but before saving the output of its message processing. ➔ Result: the process that takes over starts at the saved position, even though a few messages prior to that position were never processed.

Slide 43

Slide 43 text

#ougn17 At-Least-Once Delivery ➔ Scenario: the consumer process crashes after processing messages but before saving its position. ➔ Result: when the new process takes over, the first few messages it receives will already have been processed.
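
Both crash scenarios come down to the order of two steps: saving the position and saving the output. The simulation below is a sketch under simple assumptions (one consumer, one crash, restart from the last saved position); `run` and its parameters are hypothetical.

```python
def run(messages, commit_first, crash_at):
    """Process messages, crashing once at index crash_at between the two steps.

    After the crash, processing resumes from the last saved position.
    Returns the list of messages whose output was saved.
    """
    processed, committed = [], 0
    crashed = False
    position = 0
    while position < len(messages):
        msg = messages[position]
        if commit_first:
            committed = position + 1               # save position first...
            if position == crash_at and not crashed:
                crashed = True
                position = committed               # restart after the crash
                continue
            processed.append(msg)                  # ...then save the output
        else:
            processed.append(msg)                  # save the output first...
            if position == crash_at and not crashed:
                crashed = True
                position = committed               # restart from the old position
                continue
            committed = position + 1               # ...then save position
        position += 1
    return processed


at_most_once = run(["a", "b", "c"], commit_first=True, crash_at=1)
at_least_once = run(["a", "b", "c"], commit_first=False, crash_at=1)
```

Committing first loses the message in flight; processing first repeats it, so the processing step has to tolerate duplicates.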

Slide 44

Slide 44 text

#ougn17 Exactly-Once Delivery “Exactly-once delivery requires co-operation with the destination storage system …” Coming soon (KIP-98/KIP-129): ● Idempotent Producer Guarantees ● Transactional Guarantees ● Streams Exactly-Once semantics

Slide 45

Slide 45 text

#ougn17 Lab: Kafka Consumer Consumer Groups: Parallelism Rewind Offsets: Control and reprocessing (https://jeqo.github.io/post/2017-01-31-kafka-rewind-consumers-offset/)

Slide 46

Slide 46 text

#ougn17 Kafka Streams API & Kafka Connect API

Slide 47

Slide 47 text

#ougn17 Kafka Streams API & Kafka Connect API Unifying Stream Processing and Interactive Queries in Apache Kafka - Eno Thereska https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/

Slide 48

Slide 48 text

#ougn17 Kafka Streams https://twitter.com/lcrsilveira/status/829615803133730816 https://twitter.com/jessetanderson/status/830113106277785600

Slide 49

Slide 49 text

#ougn17 Kafka Connect HDFS, JDBC, GoldenGate, Elasticsearch, Couchbase, DataStax, Cassandra, Attunity, Azure IoTHub, SAP Hana, VoltDb, FTP, JMS, JMX, MongoDB, Solr, Splunk, RethinkDB, SQS, S3, MQTT, Redis, InfluxDB, HBase, Hazelcast, Twitter, and more...

Slide 50

Slide 50 text

#ougn17 Lab: Kafka Streams & Kafka Connect Twitter/File Connectors “Simplified Consumer” Stream/Table Duality Stateful processing (Time Window)*
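
Two of the lab topics, stream/table duality and time-windowed state, can be sketched without the Kafka Streams API. The event data, the 10-second window and both function names are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

# A stream of (timestamp_seconds, key, value) change events
stream = [
    (1, "alice", "Oslo"),
    (4, "bob", "Lima"),
    (12, "alice", "Bergen"),
    (15, "carol", "Oslo"),
    (21, "bob", "Cusco"),
]


def to_table(events):
    """Fold a changelog stream into a table: the latest value per key."""
    table = {}
    for _, key, value in events:
        table[key] = value
    return table


def count_per_window(events, window_size):
    """Count events per key in tumbling windows of window_size seconds."""
    counts = defaultdict(Counter)
    for ts, key, _ in events:
        window_start = ts - ts % window_size   # start of the window holding ts
        counts[window_start][key] += 1
    return {start: dict(c) for start, c in counts.items()}


table = to_table(stream)
windows = count_per_window(stream, window_size=10)
```

The table is just the stream replayed up to now, and the stream is the table's change history; the windowed counts are state that a stream processor has to keep between events.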

Slide 51

Slide 51 text

#ougn17 Kafka++

Slide 52

Slide 52 text

#ougn17 Confluent Platform: Apache Kafka Enterprise Edition

Slide 53

Slide 53 text

#ougn17 Integration with Kafka Integration Platforms: ➔ Camel http://camel.apache.org/kafka.html ➔ Akka Streams http://doc.akka.io/docs/akka-stream-kafka/current/home.html ➔ Oracle Service Bus http://www.ateam-oracle.com/osb-transport-for-apache-kafka-part-1/

Slide 54

Slide 54 text

#ougn17 What’s in discussion and/or coming soon? Exactly-once Delivery / Txn Messaging (adopted - wip) https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging Headers support (additional metadata) (vote) https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers ZStandard Compression support (discussion) https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression Reset Offset tool (vote) https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+a+tool+to+Reset+Consumer+Group+Offsets https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

Slide 55

Slide 55 text

#ougn17 How NOT to use Kafka Top 5: ➔ No consideration of data on the inside vs outside ➔ Schema not externally defined ➔ Same config for every client/topic ➔ 128 partitions as default ➔ Running on 8 overloaded nodes Kafka Summit 2016: 101 ways to configure Kafka - Badly https://www.confluent.io/kafka-summit-2016-101-ways-to-configure-kafka-badly https://cwiki.apache.org/confluence/display/KAFKA/Operations

Slide 56

Slide 56 text

#ougn17 Further reading

Slide 57

Slide 57 text

#ougn17 Thanks!!! Twitter: @jeqo89 GitHub: /jeqo Blog: jeqo.github.io Code: github.com/jeqo/talk-kafka-messaging-logs