
Event Sourcing with Kafka Streams

Amitay Horwitz

May 19, 2018

Transcript

  1. ~$ whois • Software engineer @ Wix • Functional programming, distributed systems, TDD, … • @amitayh on Twitter, GitHub, et al.
  2. • 120M+ Users • 2,100+ Employees • 700+ Engineers • 600+ Micro-services in production
  3. AGENDA ✅ • Event sourcing 101 • Eventim • Kafka

    & Kafka Streams • Putting it all together
  4. A SERVICE IS BORN ✒ • Wix Invoices was conceived in mid-2015 • Rich domain model • Auditing is important for monetary products
  5. OBJECT-RELATIONAL IMPEDANCE MISMATCH • invoices (invoice_id, customer_id, issue_date, due_date, sent_date, currency, status) • line_items (line_item_id, invoice_id, description, quantity, price) • customers (customer_id, name, email, address) • payments (transaction_id, invoice_id, payment_type, payment_amount) • taxes (tax_id, line_item_id, name, rate)
  6. MUTABLE STATE • Instead of saving the current state, we

    save the succession of events that brought us to this state
  7. MUTABLE STATE • Instead of saving the current state, we save the succession of events that brought us to this state:

    currentState = events.foldLeft(empty) { (state, event) =>
      state apply event
    }
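The fold on this slide can be made concrete with a small, self-contained Scala sketch; the event and state types below are invented stand-ins for the talk's invoice domain, not the actual library code:

```scala
// Hypothetical event and state types standing in for the talk's invoice domain.
sealed trait InvoiceEvent
case class LineItemAdded(price: BigDecimal, qty: Int) extends InvoiceEvent
case object InvoiceSent extends InvoiceEvent

case class InvoiceState(lineItems: List[(BigDecimal, Int)], status: String) {
  // Applying an event returns a NEW state; nothing is mutated in place.
  def apply(event: InvoiceEvent): InvoiceState = event match {
    case LineItemAdded(price, qty) => copy(lineItems = lineItems :+ (price -> qty))
    case InvoiceSent               => copy(status = "SENT")
  }
}

val empty  = InvoiceState(Nil, "DRAFT")
val events = List(LineItemAdded(BigDecimal("1.99"), 1), InvoiceSent)

// The fold from the slide: replay the event log to recover the current state.
val currentState = events.foldLeft(empty) { (state, event) => state apply event }
```

Because state is only ever derived from the log, replaying the same events always reproduces the same state.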
  8. INVOICE LIFECYCLE • Invoice created → Line item added • { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 1.99, "qty": 1}], "status": "DRAFT" }
  9. INVOICE LIFECYCLE • Invoice created → Line item added → Line item added • { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 1.99, "qty": 1}, {"price": 3.50, "qty": 2}], "status": "DRAFT" }
  10. INVOICE LIFECYCLE • Invoice created → Line item added → Line item added → Line item removed • { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "DRAFT" }
  11. INVOICE LIFECYCLE • Invoice created → Line item added → Line item added → Line item removed → Invoice sent to customer • { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "SENT" }
  12. INVOICE LIFECYCLE • Invoice created → Line item added → Line item added → Line item removed → Invoice sent to customer → Payment received • { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "PAID" }
  13. DESIGN GOALS • Small and simple library • Non-intrusive •

    Maintain data integrity • Easily add custom views
  14. WRITE PATH ✏ User Interface → Command JSON → Decoder → Command DTO → Domain Command (validate command payload)
  15. WRITE PATH ✏ User Interface → Command JSON → Decoder → Command DTO → Domain Command → Command Dispatcher → Handler #1, Handler #2, … Handler #n
  16. WRITE PATH ✏ same flow, with each handler typed as PartialFunction[DomainCommand, Unit]
  17. WRITE PATH ✏ same flow, with the dispatcher typed as Function[DomainCommand, Unit]
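The two function types on this slide can be illustrated in plain Scala: each handler is a PartialFunction[DomainCommand, Unit] covering only its own commands, and the dispatcher chains them with orElse into a total Function[DomainCommand, Unit]. The command types and side effects below are made up for the sketch:

```scala
sealed trait DomainCommand
case class CreateInvoice(customerId: String) extends DomainCommand
case class SendInvoice(invoiceId: String) extends DomainCommand

var log = List.empty[String] // stands in for real side effects

// Each handler is partial: it only covers the commands it knows about.
val createHandler: PartialFunction[DomainCommand, Unit] = {
  case CreateInvoice(customerId) => log :+= s"created invoice for $customerId"
}
val sendHandler: PartialFunction[DomainCommand, Unit] = {
  case SendInvoice(invoiceId) => log :+= s"sent invoice $invoiceId"
}
val fallback: PartialFunction[DomainCommand, Unit] = {
  case cmd => log :+= s"unhandled: $cmd"
}

// Chaining partial handlers with orElse yields one total function.
val dispatcher: Function[DomainCommand, Unit] =
  createHandler orElse sendHandler orElse fallback

dispatcher(CreateInvoice("customer-1"))
dispatcher(SendInvoice("invoice-1"))
```

Adding a new command type then means registering one more partial handler, without touching the dispatcher.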
  18. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository (Load) → Event Store (Get events) → Current aggregate
  19. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository (Load) → Event Store (Get events) → Current aggregate → Command events
  20. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository (Load) → Event Store (Get events) → Current aggregate → Command events → New Aggregate
  21. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository (Load) → Event Store (Get events) → Current aggregate → Command events → New Aggregate → Publish events (optimistic locking)
  22. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository (Load) → Event Store (Get events) → Current aggregate → Command events → New Aggregate → Publish events (optimistic locking) → Event Bus
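The "Publish events (optimistic locking)" step can be sketched with an in-memory event store: the writer remembers the version it loaded, and an append is rejected when another writer has appended in the meantime. All names here are illustrative, not the library's actual API:

```scala
// In-memory sketch of an event store with optimistic concurrency control.
class EventStore[E] {
  private var events = Vector.empty[E]

  // Returns the event log together with its current version (event count).
  def load(): (Vector[E], Int) = (events, events.size)

  // Append succeeds only if nobody appended since `expectedVersion` was read.
  def append(newEvents: Seq[E], expectedVersion: Int): Either[String, Int] =
    if (events.size != expectedVersion)
      Left(s"conflict: expected version $expectedVersion, store is at ${events.size}")
    else {
      events = events ++ newEvents
      Right(events.size)
    }
}

val store   = new EventStore[String]
val (_, v0) = store.load()
val first   = store.append(Seq("InvoiceCreated"), expectedVersion = v0)
val stale   = store.append(Seq("InvoiceSent"), expectedVersion = v0) // lost the race
```

A writer that loses the race reloads the aggregate from the new events and retries, instead of silently overwriting concurrent changes.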
  23. READ PATH • Event Bus → Events → Mailer ✉ (Event Handler), View Projector (Event Handler) → DB, Reporting (Event Handler)
  24. READ PATH • Event Bus → Events → Mailer ✉ (Event Handler), View Projector (Event Handler) → DB, Reporting (Event Handler) • User Interface
  25. READ PATH • Event Bus → Events → Mailer ✉ (Event Handler), View Projector (Event Handler) → DB, Reporting (Event Handler) • User Interface → Queries → DB
  26. PAIN POINTS • Despite the simple design, it became quite a big library • Inherent eventual consistency is not integrated into the product (read-after-write) • Rebuilding views is a complex operation
  27. REBUILDING VIEWS • Event Bus → Events → View Projector #1 (Event Handler) → DB #1 • Events → View Projector #2 (Event Handler) → DB #2
  28. REBUILDING VIEWS • Event Bus → Events → View Projector #1 (Event Handler) → DB #1 • Events → View Projector #2 (Event Handler) → DB #2 • User Interface → Queries
  29. APACHE KAFKA • Distributed append-only log • Replicated, fault-tolerant •

    Often used as pub-sub or queue • Used heavily at LinkedIn, Netflix, Wix and many others
  30. KAFKA TOPICS • Producer appends to partitions: P0 (offsets 1–6), P1 (offsets 1–4), P2 (offsets 1–7)
  31. KAFKA TOPICS • Consumer Group with Node #1 consuming partitions P0, P1, P2
  32. KAFKA TOPICS • Consumer Group with Node #1 and Node #2; partitions P0, P1, P2 are rebalanced across the nodes
  33. STREAMS ✈ • "Data in flight" • Unbounded, continuously updating data set • Ordered, replayable sequence of immutable key-value pairs
  34. TABLES • "Data at rest" • A collection of evolving

    facts • A point-in-time view of aggregated data
  35. STREAM-TABLE DUALITY • table over time: {alice: 1} → {alice: 1, charlie: 1} → {alice: 2, charlie: 1}
  36. STREAM-TABLE DUALITY • changelog so far: ("alice", 1)
  37. STREAM-TABLE DUALITY • changelog so far: ("alice", 1), ("charlie", 1)
  38. STREAM-TABLE DUALITY • changelog so far: ("alice", 1), ("charlie", 1), ("alice", 2)
  39. STREAM-TABLE DUALITY • replaying the changelog: {alice: 1}
  40. STREAM-TABLE DUALITY • replaying the changelog: {alice: 1} → {alice: 1, charlie: 1}
  41. STREAM-TABLE DUALITY • replaying the changelog: {alice: 1} → {alice: 1, charlie: 1} → {alice: 2, charlie: 1}
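The duality these slides animate can be reproduced in plain Scala, with a Map standing in for the table: folding the changelog stream yields the latest table, and scanning the same stream replays the table's history:

```scala
// A stream of pageview-count updates, as on the slides.
val changelog = List("alice" -> 1, "charlie" -> 1, "alice" -> 2)

// Stream -> table: each update overwrites the value for its key.
val table = changelog.foldLeft(Map.empty[String, Int]) {
  case (tbl, (user, count)) => tbl + (user -> count)
}

// Table -> stream: the sequence of snapshots after each update
// is exactly the changelog replayed, one record at a time.
val snapshots = changelog.scanLeft(Map.empty[String, Int]) {
  case (tbl, (user, count)) => tbl + (user -> count)
}
```

This is the same relationship Kafka Streams exploits: a KTable is the fold of its changelog topic, and the changelog topic can rebuild the KTable.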
  42. STREAM PROCESSING APP Streams API Your app • Transforms and

    enriches data • Stateless / stateful processing
  43. STREAM PROCESSING APP Streams API Your app • Transforms and

    enriches data • Stateless / stateful processing • Supports windowing operations
  44. STREAM PROCESSING APP Streams API Your app • Transforms and

    enriches data • Stateless / stateful processing • Supports windowing operations • Embedded in your app
  45. STREAM PROCESSING APP Streams API Your app • Transforms and

    enriches data • Stateless / stateful processing • Supports windowing operations • Embedded in your app • Elastic, scalable, fault-tolerant
  46. PROCESSOR API • The lowest-level API • Interact with state stores, schedulers, etc. • All standard operations (map / filter / …) are implemented this way • Create your own custom processing logic
  47. STREAMS DSL val builder = new StreamsBuilder val textLines: KStream[String,

    String] = builder.stream("TextLinesTopic") • Programmatically describe your topology
  48. STREAMS DSL val builder = new StreamsBuilder val textLines: KStream[String,

    String] = builder.stream("TextLinesTopic") val wordCounts: KTable[String, Long] = textLines • Programmatically describe your topology
  49. STREAMS DSL val builder = new StreamsBuilder val textLines: KStream[String,

    String] = builder.stream("TextLinesTopic") val wordCounts: KTable[String, Long] = textLines .flatMapValues(textLine => textLine.split("\\W+")) • Programmatically describe your topology
  50. STREAMS DSL val builder = new StreamsBuilder val textLines: KStream[String,

    String] = builder.stream("TextLinesTopic") val wordCounts: KTable[String, Long] = textLines .flatMapValues(textLine => textLine.split("\\W+")) .groupBy((_, word) => word) • Programmatically describe your topology
  51. STREAMS DSL val builder = new StreamsBuilder val textLines: KStream[String,

    String] = builder.stream("TextLinesTopic") val wordCounts: KTable[String, Long] = textLines .flatMapValues(textLine => textLine.split("\\W+")) .groupBy((_, word) => word) .count(Materialized.as("counts-store")) • Programmatically describe your topology
  52. STREAMS DSL val builder = new StreamsBuilder val textLines: KStream[String,

    String] = builder.stream("TextLinesTopic") val wordCounts: KTable[String, Long] = textLines .flatMapValues(textLine => textLine.split("\\W+")) .groupBy((_, word) => word) .count(Materialized.as("counts-store")) wordCounts.toStream.to("WordsWithCountsTopic") • Programmatically describe your topology
  53. STREAMS DSL • Programmatically describe your topology:

    val builder = new StreamsBuilder
    val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
    val wordCounts: KTable[String, Long] = textLines
      .flatMapValues(textLine => textLine.split("\\W+"))
      .groupBy((_, word) => word)
      .count(Materialized.as("counts-store"))
    wordCounts.toStream.to("WordsWithCountsTopic")
  54. KSQL • SQL dialect for streaming data CREATE TABLE possible_fraud

    AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3;
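To show what the KSQL statement computes, here is a plain-Scala analogue of a tumbling-window count: each attempt is bucketed into a fixed 5-second window by timestamp, counted per (card, window), and filtered by the HAVING threshold. The card numbers and timestamps are invented:

```scala
// (cardNumber, epochMillis) authorization attempts; data is made up for illustration.
val attempts = List(
  ("4111", 1000L), ("4111", 2000L), ("4111", 3000L), ("4111", 4000L), // 4 in one window
  ("5500", 1000L), ("5500", 9000L)                                    // spread across windows
)

val windowSizeMs = 5000L

// Tumbling window: every event belongs to exactly one fixed-size, non-overlapping bucket.
val possibleFraud = attempts
  .groupBy { case (card, ts) => (card, ts / windowSizeMs) }          // GROUP BY card, window
  .collect { case ((card, window), hits) if hits.size > 3 =>         // HAVING count(*) > 3
    (card, window, hits.size)
  }
```

The streaming version differs in that results update continuously as new attempts arrive, rather than being computed once over a finished data set.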
  55. COMMAND HANDLER • invoice-commands → commands stream → map: results → command-results • commands stream → flatMap: events → invoice-events • events stream → aggregate: snapshots → Snapshots state-store
  56. COMMAND HANDLER • invoice-commands → commands stream → map: results → command-results • commands stream → flatMap: events → invoice-events • events stream → aggregate: snapshots → Snapshots state-store → invoice-snapshots
  57. COMMAND HANDLER • invoice-commands → commands stream → map: results → command-results • commands stream → flatMap: events → invoice-events • events stream → aggregate: snapshots → Snapshots state-store → invoice-snapshots
  58. COMMAND HANDLER

    val builder = new StreamsBuilder
    val snapshots: KTable[UUID, Snapshot[Invoice]] = builder
      .stream("invoice-events")
      .groupByKey()
      .aggregate(
        reducer.initializer,
        reducer.aggregator,
        Materialized.as("snapshots-store"))
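The reducer passed to aggregate on this slide is just a pair: an initializer producing the empty snapshot, and an aggregator applying one event to a snapshot. A pure-Scala sketch of that shape (Snapshot, Reducer, and the toy string events are hypothetical, not the demo project's actual types):

```scala
// Hypothetical snapshot: current aggregate state plus a version (events applied).
case class Snapshot[A](state: A, version: Long)

// The shape aggregate expects: an initializer and an aggregator.
case class Reducer[A, E](initializer: () => Snapshot[A],
                         aggregator: (Snapshot[A], E) => Snapshot[A])

// Toy invoice aggregate whose state is just the number of line items.
val reducer = Reducer[Int, String](
  initializer = () => Snapshot(0, 0L),
  aggregator = (snapshot, event) => event match {
    case "LineItemAdded"   => Snapshot(snapshot.state + 1, snapshot.version + 1)
    case "LineItemRemoved" => Snapshot(snapshot.state - 1, snapshot.version + 1)
    case _                 => snapshot.copy(version = snapshot.version + 1)
  }
)

// Per key, Kafka Streams' aggregate amounts to this fold over the event stream.
val finalSnapshot = List("LineItemAdded", "LineItemAdded", "LineItemRemoved")
  .foldLeft(reducer.initializer())(reducer.aggregator)
```

Because the fold is kept in a changelog-backed state store, the snapshot table survives restarts and rebalances without replaying the full history in application code.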
  59. WINS • Simple and declarative system • Eventual consistency is

    handled gracefully • Easy to add or change views
  60. WINS • Simple and declarative system • Eventual consistency is

    handled gracefully • Easy to add or change views • Benefits of event sourcing + scalability and fault- tolerance properties of Kafka
  61. TAKEAWAYS • Event-driven systems and event sourcing can help create very flexible and scalable systems
  62. TAKEAWAYS • Event-driven systems and event sourcing can help create very flexible and scalable systems • Know your tradeoffs (consistency guarantees, schema evolution, data integrity, …)
  63. TAKEAWAYS • Event-driven systems and event sourcing can help create very flexible and scalable systems • Know your tradeoffs (consistency guarantees, schema evolution, data integrity, …) • Kafka & Kafka Streams are powerful tools that can be employed in many use cases
  64. RESOURCES • Demo code: https://github.com/amitayh/event-sourcing-kafka-streams • Event sourcing by Greg Young: https://youtu.be/8JKjvY4etTY • Kafka Streams: http://wix.to/00C2ADs • Blog post from Confluent: http://wix.to/Z0C2ADs
  65. Q&A