
Event Sourcing with Kafka Streams



Amitay Horwitz

May 19, 2018

Transcript

  1. ~$ whois • Software engineer @ Wix • Functional programming, distributed systems, TDD, … • @amitayh on Twitter, GitHub, et al
  2. • 120M+ users • 2,100+ employees • 700+ engineers • 600+ microservices in production
  3. AGENDA ✅ • Event sourcing 101 • Eventim • Kafka

    & Kafka Streams • Putting it all together
  4. A SERVICE IS BORN ✒ • Wix Invoices was conceived in mid-2015 • Rich domain model • Auditing is important for monetary products
  5. OBJECT-RELATIONAL IMPEDANCE MISMATCH
     invoices (invoice_id, customer_id, issue_date, due_date, sent_date, currency, status)
     line_items (line_item_id, invoice_id, description, quantity, price)
     customers (customer_id, name, email, address)
     payments (transaction_id, invoice_id, payment_type, payment_amount)
     taxes (tax_id, line_item_id, name, rate)
  6. IMMUTABLE STATE • Instead of saving the current state, we save the succession of events that brought us to this state
  7. IMMUTABLE STATE • Instead of saving the current state, we save the succession of events that brought us to this state •
     currentState = events.foldLeft(empty) { (state, event) =>
       state apply event
     }
  8. INVOICE LIFECYCLE (time →) Invoice created → Line item added: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 1.99, "qty": 1}], "status": "DRAFT" }
  9. INVOICE LIFECYCLE (time →) Invoice created → Line item added → Line item added: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 1.99, "qty": 1}, {"price": 3.50, "qty": 2}], "status": "DRAFT" }
  10. INVOICE LIFECYCLE (time →) Invoice created → Line item added → Line item added → Line item removed: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "DRAFT" }
  11. INVOICE LIFECYCLE (time →) Invoice created → Line item added → Line item added → Line item removed → Invoice sent to customer: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "SENT" }
  12. INVOICE LIFECYCLE (time →) Invoice created → Line item added → Line item added → Line item removed → Invoice sent to customer → Payment received: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "PAID" }
  13. DESIGN GOALS • Small and simple library • Non-intrusive •

    Maintain data integrity • Easily add custom views
  14. WRITE PATH ✏ User Interface Command JSON Command DTO Decoder

    Domain Command Validate command payload
  15. WRITE PATH ✏ User Interface Command JSON Command DTO Decoder

    Domain Command Handler #1 … Command Dispatcher Handler #2 Handler #n Validate command payload
  16. WRITE PATH ✏ User Interface Command JSON Command DTO Decoder

    Domain Command Handler #1 … Command Dispatcher Handler #2 Handler #n Validate command payload PartialFunction[DomainCommand, Unit]
  17. WRITE PATH ✏ User Interface Command JSON Command DTO Decoder

    Domain Command Handler #1 … Command Dispatcher Handler #2 Handler #n Validate command payload PartialFunction[DomainCommand, Unit] Function[DomainCommand, Unit]
  18. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository: Load → Event Store: Get events → Current aggregate
  19. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository: Load → Event Store: Get events → Current aggregate → Command events
  20. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository: Load → Event Store: Get events → Current aggregate → Command events → New Aggregate
  21. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository: Load → Event Store: Get events → Current aggregate → Command events → New Aggregate → Publish events (optimistic locking) → Event Store
  22. WRITE PATH ✏ Command → Event Sourced Command Handler → Aggregate Repository: Load → Event Store: Get events → Current aggregate → Command events → New Aggregate → Publish events (optimistic locking) → Event Store → Event Bus
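The write path's optimistic locking step can be sketched in plain Scala with an in-memory event store (class and method names here are illustrative assumptions, not the library's real API): the expected version is the number of events read when the aggregate was loaded, and a publish is rejected if the stream has grown in the meantime.

```scala
import scala.collection.mutable

// Thrown when another writer appended events after we loaded the aggregate
final class ConcurrentModificationError extends RuntimeException

final class EventStore[E] {
  private val streams = mutable.Map.empty[String, Vector[E]]

  def events(aggregateId: String): Vector[E] =
    streams.getOrElse(aggregateId, Vector.empty)

  // Optimistic locking: append only if the stream hasn't grown since we read it
  def publish(aggregateId: String, expectedVersion: Int, newEvents: Seq[E]): Unit = {
    val current = events(aggregateId)
    if (current.size != expectedVersion) throw new ConcurrentModificationError
    streams(aggregateId) = current ++ newEvents
  }
}

// Handling a command: load events, rebuild the aggregate, decide, publish
val store = new EventStore[String]
val loaded = store.events("invoice-1") // empty stream, version 0
store.publish("invoice-1", loaded.size, Seq("InvoiceCreated"))
```

A second writer that loaded version 0 and tries to publish after this append would get a `ConcurrentModificationError` and retry its command against the fresh state.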
  23. READ PATH Event Bus → Events → Event Handlers: Mailer ✉, View Projector → DB, Reporting
  24. READ PATH Event Bus → Events → Event Handlers: Mailer ✉, View Projector → DB, Reporting; User Interface
  25. READ PATH Event Bus → Events → Event Handlers: Mailer ✉, View Projector → DB, Reporting; User Interface → Queries → DB
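A view projector on the read path is just an event handler that folds events from the bus into a queryable read model. A minimal sketch, with illustrative names and an in-memory map standing in for the DB:

```scala
import scala.collection.mutable

// Hypothetical events published on the bus
sealed trait Event { def invoiceId: String }
final case class InvoiceCreated(invoiceId: String) extends Event
final case class InvoiceSentEvt(invoiceId: String) extends Event

// The read model served to the user interface
final case class InvoiceView(status: String)

final class ViewProjector {
  private val db = mutable.Map.empty[String, InvoiceView]

  // Event handler: fold each event into the view
  def handle(event: Event): Unit = event match {
    case InvoiceCreated(id) => db(id) = InvoiceView("DRAFT")
    case InvoiceSentEvt(id) => db(id) = db(id).copy(status = "SENT")
  }

  // Queries from the user interface hit the view, never the event store
  def query(id: String): Option[InvoiceView] = db.get(id)
}

val projector = new ViewProjector
List(InvoiceCreated("inv-1"), InvoiceSentEvt("inv-1")).foreach(projector.handle)
```

Because the view is derived purely from events, rebuilding it means dropping the DB and replaying the stream through the same handler.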
  26. PAIN POINTS • Despite the simple design, it became quite a big library • The inherent eventual consistency is not accounted for in the product (read-after-write) • Rebuilding views is a complex operation
  27. REBUILDING VIEWS Event Bus → Events → View Projector #1 (Event Handler) → DB #1; View Projector #2 (Event Handler) → DB #2
  28. REBUILDING VIEWS Event Bus → Events → View Projector #1 (Event Handler) → DB #1; View Projector #2 (Event Handler) → DB #2; User Interface → Queries
  29. APACHE KAFKA • Distributed append-only log • Replicated, fault-tolerant •

    Often used as pub-sub or queue • Used heavily at LinkedIn, Netflix, Wix and many others
  30. KAFKA TOPICS Producer writes to partitions: P0 [offsets 1..6], P1 [offsets 1..4], P2 [offsets 1..7]
  31. KAFKA TOPICS Consumer Group (Node #1) reads partitions P0 [offsets 1..6], P1 [offsets 1..4], P2 [offsets 1..7]
  32. KAFKA TOPICS Consumer Group (Node #1, Node #2) reads partitions P0 [offsets 1..6], P1 [offsets 1..4], P2 [offsets 1..7]
  33. STREAMS ✈ • "Data in flight" • Unbounded, continuously updating data set • Ordered, replayable sequence of immutable key-value pairs
  34. TABLES • "Data at rest" • A collection of evolving

    facts • A point-in-time view of aggregated data
  35. STREAM-TABLE DUALITY Table (User → Pageviews) over time: {alice: 1} → {alice: 1, charlie: 1} → {alice: 2, charlie: 1}
  36. STREAM-TABLE DUALITY Changelog stream: ("alice", 1)
  37. STREAM-TABLE DUALITY Changelog stream: ("alice", 1), ("charlie", 1)
  38. STREAM-TABLE DUALITY Changelog stream: ("alice", 1), ("charlie", 1), ("alice", 2)
  39. STREAM-TABLE DUALITY Replaying the changelog rebuilds the table: {alice: 1}
  40. STREAM-TABLE DUALITY Replaying the changelog rebuilds the table: {alice: 1} → {alice: 1, charlie: 1}
  41. STREAM-TABLE DUALITY Replaying the changelog rebuilds the table: {alice: 1} → {alice: 1, charlie: 1} → {alice: 2, charlie: 1}
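The duality can be shown in plain Scala (not the Kafka Streams API): a table is the fold of its changelog stream, keeping the latest value per key, so replaying the stream rebuilds the table.

```scala
// The changelog stream from the slides: updates to alice's and charlie's pageviews
val changelog = List(("alice", 1), ("charlie", 1), ("alice", 2))

// Folding the stream into a table: last value per key wins
val table = changelog.foldLeft(Map.empty[String, Int]) {
  case (tbl, (user, pageviews)) => tbl.updated(user, pageviews)
}
// table == Map("alice" -> 2, "charlie" -> 1)
```

Conversely, emitting each `updated` call as a record turns the evolving table back into its changelog stream.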
  42. STREAM PROCESSING APP Streams API Your app • Transforms and

    enriches data • Stateless / stateful processing
  43. STREAM PROCESSING APP Streams API Your app • Transforms and

    enriches data • Stateless / stateful processing • Supports windowing operations
  44. STREAM PROCESSING APP Streams API Your app • Transforms and

    enriches data • Stateless / stateful processing • Supports windowing operations • Embedded in your app
  45. STREAM PROCESSING APP Streams API Your app • Transforms and enriches data • Stateless / stateful processing • Supports windowing operations • Embedded in your app • Elastic, scalable, fault-tolerant
  46. PROCESSOR API • The lowest-level API • Interact with state stores, schedulers, etc. • All standard operations (map / filter / …) are implemented like this • Create your own custom processing logic
  47. STREAMS DSL • Programmatically describe your topology
     val builder = new StreamsBuilder
     val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
  48. STREAMS DSL (build) adds: val wordCounts: KTable[String, Long] = textLines
  49. STREAMS DSL (build) adds: .flatMapValues(textLine => textLine.split("\\W+"))
  50. STREAMS DSL (build) adds: .groupBy((_, word) => word)
  51. STREAMS DSL (build) adds: .count(Materialized.as("counts-store"))
  52. STREAMS DSL (build) adds: wordCounts.toStream.to("WordsWithCountsTopic")
  53. STREAMS DSL • Programmatically describe your topology
     val builder = new StreamsBuilder
     val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
     val wordCounts: KTable[String, Long] = textLines
       .flatMapValues(textLine => textLine.split("\\W+"))
       .groupBy((_, word) => word)
       .count(Materialized.as("counts-store"))
     wordCounts.toStream.to("WordsWithCountsTopic")
  54. KSQL • SQL dialect for streaming data
     CREATE TABLE possible_fraud AS
       SELECT card_number, count(*)
       FROM authorization_attempts
       WINDOW TUMBLING (SIZE 5 SECONDS)
       GROUP BY card_number
       HAVING count(*) > 3;
  55. COMMAND HANDLER invoice-commands → commands stream → flatMap: events (with Snapshots state-store) → events stream → invoice-events; events stream → aggregate: snapshots → Snapshots state-store; commands stream → map: results → command-results
  56. COMMAND HANDLER same topology, with the snapshots also published to invoice-snapshots
  57. COMMAND HANDLER same topology, with the snapshots also published to invoice-snapshots
  58. COMMAND HANDLER
     val builder = new StreamsBuilder
     val snapshots: KTable[UUID, Snapshot[Invoice]] = builder
       .stream("invoice-events")
       .groupByKey()
       .aggregate(
         reducer.initializer,
         reducer.aggregator,
         Materialized.as("snapshots-store"))
  59. WINS • Simple and declarative system • Eventual consistency is

    handled gracefully • Easy to add or change views
  60. WINS • Simple and declarative system • Eventual consistency is handled gracefully • Easy to add or change views • Benefits of event sourcing + scalability and fault-tolerance properties of Kafka
  61. TAKEAWAYS • Event-driven systems and event sourcing can help create very flexible and scalable systems
  62. TAKEAWAYS • Event-driven systems and event sourcing can help create very flexible and scalable systems • Know your tradeoffs (consistency guarantees, schema evolution, data integrity, …)
  63. TAKEAWAYS • Event-driven systems and event sourcing can help create very flexible and scalable systems • Know your tradeoffs (consistency guarantees, schema evolution, data integrity, …) • Kafka & Kafka Streams are powerful tools that can be employed in many use cases
  64. RESOURCES • Demo code: https://github.com/amitayh/event-sourcing-kafka-streams • Event sourcing by Greg Young: https://youtu.be/8JKjvY4etTY • Kafka Streams: http://wix.to/00C2ADs • Blog post from Confluent: http://wix.to/Z0C2ADs
  65. Q&A