as queues, are the logical pathways that connect the programs and convey messages … A sender or producer is a program that sends a message by writing the message to a channel. A receiver or consumer is a program that receives a message by reading (and deleting) it from a channel.”
Context: Messaging
Enterprise Integration Patterns - Gregor Hohpe and Bobby Woolf http://www.enterpriseintegrationpatterns.com/patterns/messaging/Introduction.html
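The channel semantics in the quote (the producer writes, the consumer reads *and deletes*) can be sketched in a few lines. This is a minimal in-memory illustration, assuming a `collections.deque` stands in for the channel. `send` and `receive` are hypothetical names, not part of any messaging API.

```python
from collections import deque

channel: deque = deque()  # hypothetical stand-in for a message channel (queue)

def send(message: str) -> None:
    # Producer: writes the message to the channel.
    channel.append(message)

def receive() -> str:
    # Consumer: reads the oldest message AND removes it from the channel.
    return channel.popleft()

send("invoice-1")
send("invoice-2")
print(receive())     # invoice-1
print(len(channel))  # 1 -- a message is gone once it has been received
```

Because reading is destructive here, a second consumer can never see `invoice-1`. A log removes exactly this limitation.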
Log
➔ Each record has a key
➔ Records are ordered
➔ The order defines a notion of “time”
➔ Content is not important at this point; it could be anything
➔ Records capture what happened and when
The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
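The log properties above (keyed records, a fixed order, and an offset that acts as "time") can be illustrated with a minimal in-memory sketch. It assumes a Python list stands in for the on-disk log. The `Log` class and its `append`/`read` methods are hypothetical and are not the Kafka API.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Log:
    """Hypothetical append-only log: a record's offset is its position in the list."""
    records: List[Tuple[str, Any]] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        # Records only ever go at the end, so offsets grow monotonically ("time").
        self.records.append((key, value))
        return len(self.records) - 1

    def read(self, offset: int) -> List[Tuple[int, str, Any]]:
        # Reading is non-destructive: any reader can replay from any offset.
        return [(i, k, v) for i, (k, v) in enumerate(self.records[offset:], start=offset)]

log = Log()
log.append("user-1", {"action": "login"})
log.append("user-2", {"action": "click"})
log.append("user-1", {"action": "logout"})
print(log.read(1))
```

Unlike the queue, reading does not delete anything. Two readers at different offsets therefore see the same history.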
on disk reliably? It uses a log. How does one database replica synchronise with another replica? It uses a log. How does activity data get recorded in a system like Apache Kafka? It uses a log. How will the data infrastructure of your application remain robust at scale? Guess what…
Using logs to build a solid data infrastructure (or why dual writes are a bad idea) - Martin Kleppmann https://www.confluent.io/blog/using-logs-to-build-a-solid-data-infrastructure-or-why-dual-writes-are-a-bad-idea/ https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/
external log is present allows the individual systems to relinquish a lot of their own complexity and rely on the shared log.”
The Log: What every software engineer should know about real-time data's unifying abstraction - Jay Kreps https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://milinda.pathirage.org/kappa-architecture.com/
Forward (a.k.a. Push Model)
➔ Broker in charge of the delivery
Event sourcing and stream processing at scale - Martin Kleppmann https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
Implementations: JMS/AMQP
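In the push model the broker decides when each consumer sees a message. The following is a minimal sketch, assuming consumers register a callback. `PushBroker` is a hypothetical class and is not the JMS or AMQP API.

```python
from typing import Callable, List

class PushBroker:
    """Hypothetical broker that delivers each message to subscribers as soon as it arrives."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, message: str) -> None:
        # The broker drives delivery: consumers are invoked immediately and
        # cannot control the pace or re-read earlier messages.
        for callback in self.subscribers:
            callback(message)

received: List[str] = []
broker = PushBroker()
broker.subscribe(received.append)
broker.publish("order-created")
broker.publish("order-paid")
print(received)  # ['order-created', 'order-paid']
```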
Events
➔ Consumers in control of message consumption
Event sourcing and stream processing at scale - Martin Kleppmann https://martin.kleppmann.com/2016/01/29/event-sourcing-stream-processing-at-ddd-europe.html
Implementations: Apache Kafka, Amazon Kinesis Streams, Apache DistributedLog (incubating - Twitter)
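When consumers are in control, each consumer tracks its own position and pulls records at its own pace. The sketch below illustrates this under the assumption that a shared Python list plays the role of the log. `poll` and `seek` borrow the names of Kafka consumer methods for readability, but `PullConsumer` is a hypothetical stand-in and not the Kafka client.

```python
from typing import List

class PullConsumer:
    """Hypothetical consumer that owns its offset and pulls from a shared log."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.offset = 0  # position of the next record to read, owned by the consumer

    def poll(self, max_records: int) -> List[str]:
        # The consumer decides when and how much to read; the log is not modified.
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Rewinding only moves the offset, which makes replay cheap.
        self.offset = offset

shared_log = ["e0", "e1", "e2", "e3"]
fast = PullConsumer(shared_log)
slow = PullConsumer(shared_log)
print(fast.poll(3))  # ['e0', 'e1', 'e2']
print(slow.poll(1))  # ['e0']
slow.seek(0)
print(slow.poll(2))  # ['e0', 'e1']
```

The two consumers read the same data independently at different speeds, and either one can rewind without affecting the other.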
the data pipeline problem at LinkedIn.
➔ First use cases: collecting system metrics and monitoring user activity.
2010: Open-sourced
2011: Apache project
2012: Graduated from the incubator in October
2014: Confluent Inc. founded
Kafka: The Definitive Guide - Neha Narkhede, Gwen Shapira & Todd Palino
➔ Retention: 3 days
➔ More partitions
➔ Lower replication factor
➔ Availability is most important
Use case #2: Inventory adjustments
➔ Retention: 6 months
➔ Fewer partitions
➔ Higher replication factor
➔ Consistency is most important
Streaming in Practice: Putting Kafka in Production - Roger Hoover https://www.confluent.io/apache-kafka-talk-series/Streaming-in-Practice-Putting-Kafka-in-Production/
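The two use cases differ mainly in topic settings. The sketch below expresses them as plain Python dicts. `retention.ms`, `min.insync.replicas` and `unclean.leader.election.enable` are real Kafka topic-level configs. The partition counts, replication factors, the 182-day approximation of "6 months", and the dict layout itself are hypothetical choices made for illustration.

```python
from datetime import timedelta

def retention_ms(period: timedelta) -> int:
    # Kafka's topic-level retention.ms setting is expressed in milliseconds.
    return int(period.total_seconds() * 1000)

# Hypothetical values chosen only to mirror the trade-offs on the slide.
use_case_1 = {
    "num_partitions": 48,        # more partitions: more consumer parallelism
    "replication_factor": 2,     # lower replication factor
    "config": {"retention.ms": str(retention_ms(timedelta(days=3)))},
}
use_case_2_inventory = {
    "num_partitions": 6,         # fewer partitions
    "replication_factor": 3,     # higher replication factor
    "config": {
        "retention.ms": str(retention_ms(timedelta(days=182))),  # roughly 6 months
        "min.insync.replicas": "2",                 # with acks=all, a write needs 2 in-sync replicas
        "unclean.leader.election.enable": "false",  # never elect an out-of-sync leader
    },
}
print(use_case_1["config"]["retention.ms"])            # 259200000
print(use_case_2_inventory["config"]["retention.ms"])  # 15724800000
```

Disabling unclean leader election and requiring more in-sync replicas makes the inventory topic refuse writes rather than risk losing them. This reflects the "consistency over availability" choice on the slide.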
does not have to be the same
Forward/backward compatibility
➔ Add/remove fields with default values
➔ Explicit `null` type, via a union (no optional or required markers)
➔ Change data types (only where the value can be converted, e.g. int to long)
➔ Change names (via aliases)
Designing Data-Intensive Applications - Martin Kleppmann
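The evolution rules above come from Avro-style resolution between the writer's schema and the reader's schema. Below is a heavily simplified sketch of that resolution. It is not the `avro` library. The schema representation (a list of dicts), the `resolve` function, and the subset of type promotions are hypothetical simplifications, and a nullable field is modelled as the tuple `("null", "string")`.

```python
from typing import Any, Dict, List

# Hypothetical, simplified model of Avro schema resolution (not the avro library).
# A schema is a list of {"name", "type", optional "default", optional "aliases"} dicts.
PROMOTABLE = {("int", "long"), ("int", "float"), ("int", "double"),
              ("long", "float"), ("long", "double"), ("float", "double")}

def resolve(record: Dict[str, Any], writer: List[Dict[str, Any]],
            reader: List[Dict[str, Any]]) -> Dict[str, Any]:
    writer_types = {f["name"]: f["type"] for f in writer}
    result: Dict[str, Any] = {}
    for field in reader:
        # Match by the reader's field name first, then by its aliases (renamed fields).
        source = next((n for n in [field["name"], *field.get("aliases", [])]
                       if n in writer_types), None)
        if source is None:
            # Field added in the reader's schema: it must declare a default value.
            if "default" not in field:
                raise ValueError(f"missing field {field['name']!r} has no default")
            result[field["name"]] = field["default"]
            continue
        value, w_type, r_type = record[source], writer_types[source], field["type"]
        # Type changes are allowed only when the value can be converted.
        if w_type != r_type and (w_type, r_type) not in PROMOTABLE:
            raise TypeError(f"cannot convert {w_type} to {r_type}")
        result[field["name"]] = float(value) if r_type in ("float", "double") else value
    # Writer fields the reader does not declare (removed fields) are ignored.
    return result

writer = [{"name": "id", "type": "int"},
          {"name": "fullName", "type": "string"},
          {"name": "legacyFlag", "type": "boolean"}]
reader = [{"name": "id", "type": "long"},                                 # int promoted to long
          {"name": "name", "type": "string", "aliases": ["fullName"]},    # renamed field
          {"name": "email", "type": ("null", "string"), "default": None}] # added, nullable
print(resolve({"id": 7, "fullName": "Ada", "legacyFlag": True}, writer, reader))
# {'id': 7, 'name': 'Ada', 'email': None}
```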
Consumer instance (group member)
➔ Consumer groups, together with partitions, are the basis of parallelism
➔ Ordering is guaranteed within a partition (keyed topics are normally enough)
Multiple Consumers
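Keyed partitioning and group assignment work together. Records with the same key land in the same partition, which preserves their order, and each partition is owned by a single member of the group. This is a minimal sketch under stated assumptions. Kafka's default partitioner hashes keys with murmur2, whereas `zlib.crc32` is swapped in here only because it is in the standard library. `partition_for`, `assign`, the round-robin strategy, and the partition count are all hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_PARTITIONS = 4  # hypothetical topic size

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same key -> same partition (crc32 stands in for Kafka's murmur2 hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    # Round-robin: each partition has exactly one owner in the group, so at most
    # len(partitions) consumers in one group can do useful work.
    assignment: Dict[str, List[int]] = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return dict(assignment)

events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"), ("user-1", "logout")]
partitions: Dict[int, List[str]] = defaultdict(list)
for key, value in events:
    partitions[partition_for(key)].append(f"{key}:{value}")

# All user-1 events sit in one partition, in the order they were produced.
print(partitions[partition_for("user-1")])
print(assign(list(range(NUM_PARTITIONS)), ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

Ordering is only guaranteed per partition. Events for different keys that hash to different partitions may therefore be consumed in any relative order.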
saving its position but before saving the output of its message processing.
➔ Result: the process that took over processing would start at the saved position even though a few messages prior to that position had not been processed.
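The failure above follows directly from the ordering "save position, then process". The Kafka documentation calls this at-most-once delivery. The sketch below simulates it in memory. `consume_at_most_once`, the crash flag, and the upper-casing "processing" step are hypothetical.

```python
from typing import List

def consume_at_most_once(log: List[str], committed: int, output: List[str],
                         batch_size: int, crash_after_commit: bool) -> int:
    """Hypothetical consumer step: save the position FIRST, then process.
    Returns the committed offset."""
    batch = log[committed:committed + batch_size]
    committed += len(batch)                   # 1. position saved
    if crash_after_commit:
        return committed                      # 2. crash: the batch is never processed
    output.extend(m.upper() for m in batch)   # 3. output saved
    return committed

log = ["m0", "m1", "m2", "m3"]
output: List[str] = []
offset = consume_at_most_once(log, 0, output, batch_size=2, crash_after_commit=True)
# A new process takes over from the saved position.
offset = consume_at_most_once(log, offset, output, batch_size=2, crash_after_commit=False)
print(output)  # ['M2', 'M3'] -- m0 and m1 were never processed
```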
processing messages but before saving its position.
➔ Result: when the new process takes over, the first few messages it receives will already have been processed.
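Reversing the order to "process, then save position" trades lost messages for duplicates. The Kafka documentation calls this at-least-once delivery. The simulation below uses the same hypothetical setup as an in-memory sketch: `consume_at_least_once`, the crash flag, and the upper-casing step are illustrative only.

```python
from typing import List

def consume_at_least_once(log: List[str], committed: int, output: List[str],
                          batch_size: int, crash_before_commit: bool) -> int:
    """Hypothetical consumer step: process FIRST, then save the position.
    Returns the committed offset."""
    batch = log[committed:committed + batch_size]
    output.extend(m.upper() for m in batch)   # 1. output saved
    if crash_before_commit:
        return committed                      # 2. crash: position NOT saved
    return committed + len(batch)             # 3. position saved

log = ["m0", "m1", "m2", "m3"]
output: List[str] = []
offset = consume_at_least_once(log, 0, output, batch_size=2, crash_before_commit=True)
# The new process restarts from the old position and sees m0 and m1 again.
offset = consume_at_least_once(log, offset, output, batch_size=2, crash_before_commit=False)
print(output)  # ['M0', 'M1', 'M0', 'M1']
```

Duplicates are harmless when processing is idempotent. For example, writing a result keyed by the message's primary key overwrites rather than repeats.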
Processing and Interactive Queries in Apache Kafka - Eno Thereska https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
Akka Streams http://doc.akka.io/docs/akka-stream-kafka/current/home.html ➔ Oracle Service Bus http://www.ateam-oracle.com/osb-transport-for-apache-kafka-part-1/
consideration of data on the inside vs outside
➔ Schema not externally defined
➔ Same config for all clients/topics
➔ 128 partitions as default
➔ Running on 8 overloaded nodes
Kafka Summit 2016: 101 ways to configure Kafka - Badly https://www.confluent.io/kafka-summit-2016-101-ways-to-configure-kafka-badly https://cwiki.apache.org/confluence/display/KAFKA/Operations