Building event sourced systems with Kafka Streams

Amitay Horwitz
November 30, 2018

Slides from my talk "Building event sourced systems with Kafka Streams" at Codemotion Milan 2018

https://milan2018.codemotionworld.com

Transcript

  1. AGENDA ✅
    • Event sourcing 101
    • Eventim
    • Kafka & Kafka Streams
    • Putting it all together
  2. A SERVICE IS BORN ✒
    • Wix Invoices was incepted in mid 2015
    • Rich domain model
    • Auditing is important for monetary products
  3. NAÏVE SOLUTION
    invoices: invoice_id, customer_id, issue_date, due_date, sent_date, currency, status
    line_items: line_item_id, invoice_id, description, quantity, price
    customers: customer_id, name, email, address
    payments: transaction_id, invoice_id, payment_type, payment_amount
    taxes: tax_id, name, rate
    tax_to_line: line_item_id, tax_id
  4. HOW DO YOU HYDRATE YOUR DOMAIN OBJECTS?
    SELECT
      e.employee_id AS "Employee #"
      , e.first_name || ' ' || e.last_name AS "Name"
      , e.email AS "Email"
      , e.phone_number AS "Phone"
      , TO_CHAR(e.hire_date, 'MM/DD/YYYY') AS "Hire Date"
      , TO_CHAR(e.salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') AS "Salary"
      , e.commission_pct AS "Commission %"
      , 'works as ' || j.job_title || ' in ' || d.department_name || ' department (manager: '
        || dm.first_name || ' ' || dm.last_name || ') and immediate supervisor: '
        || m.first_name || ' ' || m.last_name AS "Current Job"
      , TO_CHAR(j.min_salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''')
        || ' - '
        || TO_CHAR(j.max_salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') AS "Current Salary"
      , l.street_address || ', ' || l.postal_code || ', ' || l.city || ', ' || l.state_province || ', '
        || c.country_name || ' (' || r.region_name || ')' AS "Location"
      , jh.job_id AS "History Job ID"
      , 'worked from ' || TO_CHAR(jh.start_date, 'MM/DD/YYYY') || ' to ' || TO_CHAR(jh.end_date, 'MM/DD/YYYY')
        || ' as ' || jj.job_title || ' in ' || dd.department_name || ' department' AS "History Job Title"
    FROM employees e
    -- to get title of current job_id
    JOIN jobs j ON e.job_id = j.job_id
    -- to get name of current manager_id
    LEFT JOIN employees m ON e.manager_id = m.employee_id
    -- to get name of current department_id
    LEFT JOIN departments d ON d.department_id = e.department_id
    -- to get name of manager of current department
    -- (not equal to current manager and can be equal to the employee itself)
    LEFT JOIN employees dm ON d.manager_id = dm.employee_id
    -- to get name of location
  6. MUTABLE STATE
    • Instead of saving the current state, we save the succession of events that brought us to this state
  7. MUTABLE STATE
    • Instead of saving the current state, we save the succession of events that brought us to this state
    • currentState = fold(events, emptyState)
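The `currentState = fold(events, emptyState)` formula can be sketched in plain Scala. This is only an illustration: the event names mirror the invoice lifecycle slides that follow, but the types, the `"NEW"` status of the empty state, and the index-based removal are assumptions, not the talk's actual model.

```scala
object InvoiceFoldSketch {
  // Hypothetical, simplified types; the real domain model is richer.
  final case class LineItem(price: BigDecimal, qty: Int)

  sealed trait InvoiceEvent
  case object InvoiceCreated extends InvoiceEvent
  final case class LineItemAdded(item: LineItem) extends InvoiceEvent
  final case class LineItemRemoved(index: Int) extends InvoiceEvent
  case object InvoiceSent extends InvoiceEvent
  case object PaymentReceived extends InvoiceEvent

  final case class Invoice(lineItems: Vector[LineItem], status: String)

  // Assumed starting point before any event has been applied.
  val emptyState: Invoice = Invoice(Vector.empty, "NEW")

  // Applies a single event to the previous state and returns the next state.
  def applyEvent(state: Invoice, event: InvoiceEvent): Invoice = event match {
    case InvoiceCreated       => state.copy(status = "DRAFT")
    case LineItemAdded(item)  => state.copy(lineItems = state.lineItems :+ item)
    case LineItemRemoved(idx) => state.copy(lineItems = state.lineItems.patch(idx, Nil, 1))
    case InvoiceSent          => state.copy(status = "SENT")
    case PaymentReceived      => state.copy(status = "PAID")
  }

  // currentState = fold(events, emptyState)
  def currentState(events: Seq[InvoiceEvent]): Invoice =
    events.foldLeft(emptyState)(applyEvent)

  // The event sequence shown on the lifecycle slides.
  val lifecycle: Seq[InvoiceEvent] = Seq(
    InvoiceCreated,
    LineItemAdded(LineItem(BigDecimal("1.99"), 1)),
    LineItemAdded(LineItem(BigDecimal("3.50"), 2)),
    LineItemRemoved(0),
    InvoiceSent,
    PaymentReceived
  )
}
```

Folding any prefix of `lifecycle` gives the state at that point in time, which is what makes auditing straightforward.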
  8. INVOICE LIFECYCLE
    Events over time: Invoice created, Line item added
    State: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 1.99, "qty": 1}], "status": "DRAFT" }
  9. INVOICE LIFECYCLE
    Events over time: Invoice created, Line item added, Line item added
    State: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 1.99, "qty": 1}, {"price": 3.50, "qty": 2}], "status": "DRAFT" }
  10. INVOICE LIFECYCLE
    Events over time: Invoice created, Line item added, Line item added, Line item removed
    State: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "DRAFT" }
  11. INVOICE LIFECYCLE
    Events over time: Invoice created, Line item added, Line item added, Line item removed, Invoice sent to customer
    State: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "SENT" }
  12. INVOICE LIFECYCLE
    Events over time: Invoice created, Line item added, Line item added, Line item removed, Invoice sent to customer, Payment received
    State: { "customer": {...}, "issueDate": "2018-01-01", "dueDate": "2018-02-01", "lineItems": [{"price": 3.50, "qty": 2}], "status": "PAID" }
  13. DESIGN GOALS
    • Small and simple library
    • Non-intrusive
    • Maintain data integrity
    • Easily add custom views
  14. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler
    Command "Add Payment": invoice ID: 12345, expected version: 5, amount: $12.34
  15. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository
    Command "Add Payment": invoice ID: 12345, expected version: 5, amount: $12.34
  16. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    Command "Add Payment": invoice ID: 12345, expected version: 5, amount: $12.34
  17. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    Command "Add Payment": invoice ID: 12345, expected version: 5, amount: $12.34
    Aggregate "Invoice #12345": version: 5, customer: {...}, line items: [...], balance: $12.34
  18. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    Command "Add Payment": invoice ID: 12345, expected version: 5, amount: $12.34
    Aggregate "Invoice #12345": version: 5, customer: {...}, line items: [...], balance: $12.34
  20. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
  21. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    New events: Payment Added: $12.34
  22. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    New events: Payment Added: $12.34, Status Changed: Paid
  23. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    New events: Payment Added: $12.34, Status Changed: Paid
    Aggregate "Invoice #12345": version: 7, customer: {...}, line items: [...], balance: $0.00
  24. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    New events: Payment Added: $12.34, Status Changed: Paid
    Publish events (OCC)
    Aggregate "Invoice #12345": version: 7, customer: {...}, line items: [...], balance: $0.00
  25. WRITE PATH ✏
    Diagram: User Interface → (Command) → Event Sourced Command Handler → (Load) → Aggregate Repository → (Get events) → Event Store
    New events: Payment Added: $12.34, Status Changed: Paid
    Publish events (OCC) → Event Bus
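The write path above (load the aggregate, decide on new events, publish them with optimistic concurrency control) might look roughly like the following. This is a minimal in-memory sketch, not the library from the talk: `EventStore`, `handle`, the event names and the rule "a payment that clears the balance also marks the invoice as paid" are all assumptions made for illustration.

```scala
object WritePathSketch {
  // Hypothetical, simplified event model.
  sealed trait Event
  case object InvoiceCreated extends Event
  final case class LineItemAdded(price: BigDecimal) extends Event
  final case class PaymentAdded(amount: BigDecimal) extends Event
  final case class StatusChanged(status: String) extends Event

  final case class Invoice(version: Int, balance: BigDecimal, status: String)
  final case class AddPayment(invoiceId: String, expectedVersion: Int, amount: BigDecimal)

  // Every applied event bumps the aggregate version by one.
  def applyEvent(invoice: Invoice, event: Event): Invoice = {
    val next = event match {
      case InvoiceCreated       => invoice.copy(status = "DRAFT")
      case LineItemAdded(price) => invoice.copy(balance = invoice.balance + price)
      case PaymentAdded(amount) => invoice.copy(balance = invoice.balance - amount)
      case StatusChanged(s)     => invoice.copy(status = s)
    }
    next.copy(version = invoice.version + 1)
  }

  // In-memory stand-in for the event store, keyed by invoice ID.
  final class EventStore(initial: Map[String, Vector[Event]]) {
    private var streams = initial

    def events(id: String): Vector[Event] = streams.getOrElse(id, Vector.empty)

    // OCC: append only if the stream is still at the version the caller saw.
    def append(id: String, expectedVersion: Int, newEvents: Seq[Event]): Either[String, Unit] =
      synchronized {
        val current = events(id)
        if (current.size != expectedVersion)
          Left(s"expected version $expectedVersion but found ${current.size}")
        else {
          streams = streams.updated(id, current ++ newEvents)
          Right(())
        }
      }
  }

  // "Load": rebuild the aggregate by folding its stored events.
  def load(store: EventStore, id: String): Invoice =
    store.events(id).foldLeft(Invoice(0, BigDecimal(0), "NEW"))(applyEvent)

  // Command handler: load, decide on new events, publish them with OCC.
  def handle(store: EventStore, cmd: AddPayment): Either[String, Invoice] = {
    val invoice = load(store, cmd.invoiceId)
    val payment: Seq[Event] = Seq(PaymentAdded(cmd.amount))
    val newEvents =
      if ((invoice.balance - cmd.amount).signum <= 0) payment :+ StatusChanged("PAID")
      else payment
    store
      .append(cmd.invoiceId, cmd.expectedVersion, newEvents)
      .map(_ => newEvents.foldLeft(invoice)(applyEvent))
  }
}
```

With a stream of five stored events and a balance of 12.34, handling `AddPayment("12345", 5, 12.34)` appends two events and yields version 7. Replaying the same command with the stale expected version 5 is rejected instead of double-charging.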
  26. READ PATH
    Diagram: Event Bus → (Events) → View Projector (Event Handler) → DB
    invoice_id  customer  balance  status
    12345       John Doe  $12.34   New
    67890       Jane Doe  $34.56   Sent
  27. READ PATH
    Diagram: Event Bus → (Events) → View Projector (Event Handler) → DB
    Incoming event: Payment Added: $12.34
    invoice_id  customer  balance  status
    12345       John Doe  $12.34   New
    67890       Jane Doe  $34.56   Sent
  28. READ PATH
    Diagram: Event Bus → (Events) → View Projector (Event Handler) → DB
    invoice_id  customer  balance  status
    12345       John Doe  $0.00    New
    67890       Jane Doe  $34.56   Sent
  29. READ PATH
    Diagram: Event Bus → (Events) → View Projector (Event Handler) → DB
    Incoming event: Status Changed: Paid
    invoice_id  customer  balance  status
    12345       John Doe  $0.00    New
    67890       Jane Doe  $34.56   Sent
  30. READ PATH
    Diagram: Event Bus → (Events) → View Projector (Event Handler) → DB
    invoice_id  customer  balance  status
    12345       John Doe  $0.00    Paid
    67890       Jane Doe  $34.56   Sent
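The view projector in these slides can be thought of as a function from the current view and one event to the next view. The sketch below uses an immutable `Map` in place of the DB table; the type names and the choice to ignore events for unknown invoices are assumptions for illustration.

```scala
object ReadPathSketch {
  // Hypothetical event types carrying the invoice they belong to.
  sealed trait Event { def invoiceId: String }
  final case class PaymentAdded(invoiceId: String, amount: BigDecimal) extends Event
  final case class StatusChanged(invoiceId: String, status: String) extends Event

  // One row of the projected view; the Map stands in for the DB table.
  final case class Row(customer: String, balance: BigDecimal, status: String)
  type View = Map[String, Row]

  // Event handler: apply one event to the matching row, if the row exists.
  def project(view: View, event: Event): View =
    view.get(event.invoiceId) match {
      case None => view // assumption: events for unknown invoices are ignored
      case Some(row) =>
        val updated = event match {
          case PaymentAdded(_, amount)  => row.copy(balance = row.balance - amount)
          case StatusChanged(_, status) => row.copy(status = status)
        }
        view.updated(event.invoiceId, updated)
    }

  // The table as shown before any event arrives.
  val initialView: View = Map(
    "12345" -> Row("John Doe", BigDecimal("12.34"), "New"),
    "67890" -> Row("Jane Doe", BigDecimal("34.56"), "Sent")
  )
}
```

Folding the two events from the slides over `initialView` with `project` reproduces the final table: invoice 12345 ends with a zero balance and status Paid, while 67890 is untouched.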
  31. READ PATH
    Diagram: Event Bus → (Events) → event handlers: Mailer ✉, View Projector
    View Projector → DB → (Queries) → User Interface
  32. READ PATH
    Diagram: Event Bus → (Events) → event handlers: Mailer ✉, View Projector, Reporting
    View Projector → DB → (Queries) → User Interface
  33. PAIN POINTS
    • Persisting events and publishing them is not atomic
    • Inherent eventual consistency is not integrated in the product (read after write)
  34. PAIN POINTS
    • Persisting events and publishing them is not atomic
    • Inherent eventual consistency is not integrated in the product (read after write)
    • Rebuilding views is a complex operation
  35. REBUILDING VIEWS
    invoice_id  invoice_version  event_payload
    12345       1                InvoiceCreated
    12345       2                LineItemAdded
    67890       1                InvoiceCreated
    67890       2                InvoiceDeleted
  36. REBUILDING VIEWS
    invoice_id  invoice_version  event_payload
    12345       1                InvoiceCreated
    12345       2                LineItemAdded
    12345       3                InvoiceSent
    67890       1                InvoiceCreated
    67890       2                InvoiceDeleted
  37. REBUILDING VIEWS
    invoice_id  version  payload  timestamp
    12345       1        ...      14:05
    12345       2        ...      14:06
    12345       3        ...      15:50
    67890       1        ...      15:30
    67890       2        ...      15:33
  38. REBUILDING VIEWS
    invoice_id  version  payload  timestamp
    12345       1        ...      14:05
    12345       2        ...      14:06
    67890       1        ...      15:30
    67890       2        ...      15:33
    12345       3        ...      15:50
  39. REBUILDING VIEWS
    invoice_id  version  payload  order
    12345       1        ...      1
    12345       2        ...      2
    67890       1        ...      3
    67890       2        ...      4
    12345       3        ...      5
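One way to read the last table is that a single, monotonically increasing `order` column gives every event a global position, so a view can be rebuilt (or caught up) by replaying events in that order. The sketch below assumes that reading. The payload names are borrowed from the earlier tables, and the "latest event per invoice" view is a made-up example of a projection.

```scala
object RebuildViewsSketch {
  final case class StoredEvent(invoiceId: String, version: Int, payload: String, order: Long)

  // Rows from the slide, deliberately listed out of global order.
  val table: Seq[StoredEvent] = Seq(
    StoredEvent("12345", 3, "InvoiceSent", 5),
    StoredEvent("67890", 1, "InvoiceCreated", 3),
    StoredEvent("12345", 1, "InvoiceCreated", 1),
    StoredEvent("67890", 2, "InvoiceDeleted", 4),
    StoredEvent("12345", 2, "LineItemAdded", 2)
  )

  // Full rebuild: replay every event in global order into an empty view.
  // The hypothetical view here keeps the latest payload seen per invoice.
  def rebuild(events: Seq[StoredEvent]): Map[String, String] =
    events.sortBy(_.order).foldLeft(Map.empty[String, String]) { (view, e) =>
      view.updated(e.invoiceId, e.payload)
    }

  // Incremental catch-up: only the events after the last processed position.
  def eventsAfter(events: Seq[StoredEvent], lastSeenOrder: Long): Seq[StoredEvent] =
    events.filter(_.order > lastSeenOrder).sortBy(_.order)
}
```

A projector that remembers the last `order` it processed can resume with `eventsAfter` instead of scanning per-invoice versions or relying on timestamps.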
  40. APACHE KAFKA
    • Distributed append-only log
    • Replicated, fault-tolerant
    • Often used as pub-sub or queue
    • Used heavily at LinkedIn, Netflix, Wix and many others
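  41. KAFKA TOPICS
    Diagram: Producer writing to a topic with three partitions
    P0: 6 5 4 3 2 1
    P1: 4 3 2 1
    P2: 7 6 5 4 3 2 1
  42. KAFKA TOPICS
    Diagram: Consumer Group (Node #1) reading from the topic
    P0: 6 5 4 3 2 1
    P1: 4 3 2 1
    P2: 7 6 5 4 3 2 1
  43. KAFKA TOPICS
    Diagram: Consumer Group (Node #1, Node #2) reading from the topic
    P0: 6 5 4 3 2 1
    P1: 4 3 2 1
    P2: 7 6 5 4 3 2 1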
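When a second node joins the consumer group, the partitions are divided between the nodes so that each partition is read by exactly one of them. The slides do not say which node gets which partition, and the real split depends on the configured assignment strategy. The sketch below only illustrates the idea with a contiguous, range-style split, and its `assign` function is hypothetical.

```scala
object PartitionAssignmentSketch {
  // Splits partitions into contiguous ranges; earlier nodes get one extra
  // partition when the count does not divide evenly.
  def assign(partitions: Seq[String], nodes: Seq[String]): Map[String, Seq[String]] = {
    require(nodes.nonEmpty, "a consumer group needs at least one node")
    val base = partitions.size / nodes.size
    val extra = partitions.size % nodes.size
    nodes.zipWithIndex.map { case (node, i) =>
      val start = i * base + math.min(i, extra)
      val size = base + (if (i < extra) 1 else 0)
      node -> partitions.slice(start, start + size)
    }.toMap
  }
}
```

With a single node, all of P0, P1 and P2 go to it. With two nodes, this particular split gives P0 and P1 to the first node and P2 to the second.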
  44. STREAMS ✈
    • "Data in flight"
    • Unbounded, continuously updating data set
    • Ordered, replayable, sequence of immutable data key-value pairs
  45. TABLES
    • "Data at rest"
    • A collection of evolving facts
    • A point-in-time view of aggregated data
  46. STREAM-TABLE DUALITY
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
  47. STREAM-TABLE DUALITY
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
    Stream: ("alice", 1)
  48. STREAM-TABLE DUALITY
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
    Stream: ("alice", 1), ("charlie", 1)
  49. STREAM-TABLE DUALITY
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
    Stream: ("alice", 1), ("charlie", 1), ("alice", 2)
  50. STREAM-TABLE DUALITY
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
    Stream: ("alice", 1), ("charlie", 1), ("alice", 2)
    Table (User, Pageviews) over time: [alice 1]
  51. STREAM-TABLE DUALITY
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
    Stream: ("alice", 1), ("charlie", 1), ("alice", 2)
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1]
  52. STREAM-TABLE DUALITY
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
    Stream: ("alice", 1), ("charlie", 1), ("alice", 2)
    Table (User, Pageviews) over time: [alice 1] → [alice 1, charlie 1] → [alice 2, charlie 1]
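The duality shown above can be checked with ordinary Scala collections: a table's successive states can be turned into a stream of updates, and replaying that stream rebuilds the same states. The function names `changelog` and `replay` are made up for this sketch, and the collections stand in for Kafka's actual streams and tables.

```scala
object StreamTableDualitySketch {
  type Table = Map[String, Int]

  // Table → stream: emit one record per key whose value changed between
  // two consecutive table states.
  def changelog(tables: Seq[Table]): Seq[(String, Int)] =
    (Map.empty[String, Int] +: tables).zip(tables).flatMap { case (before, after) =>
      after.filter { case (user, views) => !before.get(user).contains(views) }
    }

  // Stream → table: replay the records as upserts, keeping every
  // intermediate table state.
  def replay(stream: Seq[(String, Int)]): Seq[Table] =
    stream.scanLeft(Map.empty[String, Int]) { case (table, (user, views)) =>
      table.updated(user, views)
    }.tail

  // The three table states from the slides.
  val tables: Seq[Table] = Seq(
    Map("alice" -> 1),
    Map("alice" -> 1, "charlie" -> 1),
    Map("alice" -> 2, "charlie" -> 1)
  )
}
```

Here `changelog(tables)` yields the three records from the slide, and `replay(changelog(tables))` gives back the original three table states.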
  53. STREAM PROCESSING APP
    Diagram: Your app with the Streams API
    • Transforms and enriches data
    • Stateless / stateful processing
  54. STREAM PROCESSING APP
    Diagram: Your app with the Streams API
    • Transforms and enriches data
    • Stateless / stateful processing
    • Supports windowing operations
  55. STREAM PROCESSING APP
    Diagram: Your app with the Streams API
    • Transforms and enriches data
    • Stateless / stateful processing
    • Supports windowing operations
    • Embedded in your app
  56. STREAM PROCESSING APP
    Diagram: Your app with the Streams API
    • Transforms and enriches data
    • Stateless / stateful processing
    • Supports windowing operations
    • Embedded in your app
    • Elastic, scalable, fault-tolerant
  57. PROCESSOR API
    • The most low-level
    • Interact with state-stores, schedulers, etc.
    • All standard operations are implemented like this (map / filter / …)
    • Create your own custom processing logic!
  58. STREAMS DSL
    • Programmatically describe your topology
    val builder = new StreamsBuilder
    val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
  59. STREAMS DSL
    • Programmatically describe your topology
    val builder = new StreamsBuilder
    val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
    val wordCounts: KTable[String, Long] = textLines
  60. STREAMS DSL
    • Programmatically describe your topology
    val builder = new StreamsBuilder
    val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
    val wordCounts: KTable[String, Long] = textLines
      .flatMapValues(textLine => textLine.split("\\W+"))
  61. STREAMS DSL
    • Programmatically describe your topology
    val builder = new StreamsBuilder
    val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
    val wordCounts: KTable[String, Long] = textLines
      .flatMapValues(textLine => textLine.split("\\W+"))
      .groupBy((_, word) => word)
  62. STREAMS DSL
    • Programmatically describe your topology
    val builder = new StreamsBuilder
    val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
    val wordCounts: KTable[String, Long] = textLines
      .flatMapValues(textLine => textLine.split("\\W+"))
      .groupBy((_, word) => word)
      .count(Materialized.as("counts-store"))
  63. STREAMS DSL
    • Programmatically describe your topology
    val builder = new StreamsBuilder
    val textLines: KStream[String, String] = builder.stream("TextLinesTopic")
    val wordCounts: KTable[String, Long] = textLines
      .flatMapValues(textLine => textLine.split("\\W+"))
      .groupBy((_, word) => word)
      .count(Materialized.as("counts-store"))
    wordCounts.toStream.to("WordsWithCountsTopic")
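To see what the topology above computes without running Kafka, the same steps can be applied to a finite collection of lines. This is a batch analogue using only the Scala standard library, not the Kafka Streams API: a real topology keeps updating the counts as new records arrive, while this sketch produces a single final table.

```scala
object WordCountSketch {
  // Batch analogue of flatMapValues + groupBy + count from the slide.
  def wordCounts(textLines: Seq[String]): Map[String, Long] =
    textLines
      .flatMap(_.split("\\W+"))  // split every line into words
      .filter(_.nonEmpty)        // not on the slide: drops empty tokens from leading separators
      .groupBy(identity)         // group equal words together
      .map { case (word, occurrences) => word -> occurrences.size.toLong }
}
```

For example, the lines "hello kafka" and "hello streams" yield the counts hello: 2, kafka: 1, streams: 1.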
  65. KSQL
    • SQL dialect for streaming data
    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
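The query groups authorization attempts into fixed, non-overlapping 5-second windows per card and keeps only the windows with more than three attempts. A rough batch equivalent in plain Scala is sketched below. The `Attempt` type and millisecond timestamps are assumptions, and the sketch assumes non-negative timestamps with windows aligned to multiples of the window size.

```scala
object TumblingWindowSketch {
  final case class Attempt(cardNumber: String, timestampMs: Long)

  val windowSizeMs: Long = 5000L

  // Start of the tumbling window that a timestamp falls into.
  def windowStart(timestampMs: Long): Long =
    timestampMs / windowSizeMs * windowSizeMs

  // Count attempts per (card, window) and keep the ones with more than 3.
  def possibleFraud(attempts: Seq[Attempt]): Map[(String, Long), Int] =
    attempts
      .groupBy(a => (a.cardNumber, windowStart(a.timestampMs)))
      .map { case (key, inWindow) => key -> inWindow.size }
      .filter { case (_, count) => count > 3 }
}
```

Four attempts that straddle a window boundary (two on each side) are not flagged, which is a known property of tumbling windows.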
  66. COMMAND HANDLER
    Diagram: invoice-commands (commands stream) → transform: results, backed by the Snapshots state-store
    Command "Add Payment": invoice ID: 12345, command ID: 67890, amount: $12.34
    Snapshot "Invoice #12345": customer: {...}, line items: [...], balance: $12.34
  68. COMMAND HANDLER
    Diagram: invoice-commands (commands stream) → transform: results, backed by the Snapshots state-store
    Events: Payment Added: $12.34, Status Changed: Paid
    Snapshot "Invoice #12345": customer: {...}, line items: [...], balance: $0.00
  69. COMMAND HANDLER
    Diagram: invoice-commands (commands stream) → transform: results, backed by the Snapshots state-store
    results → command-results
    results → filter: successful → flatMap: events → invoice-events
    results → filter: successful → map: snapshots → invoice-snapshots
  70. COMMAND HANDLER
    Diagram: invoice-commands (commands stream) → transform: results, backed by the Snapshots state-store
    results → command-results
    results → filter: successful → flatMap: events → invoice-events
    results → filter: successful → map: snapshots → invoice-snapshots
    Topic annotations: Long retention (forever), Compacted
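The shape of this topology can be mimicked with plain Scala collections: a stateful `transform` step turns each command into a result using a snapshot store, and the results are then fanned out into three outputs. This is only a sketch of the data flow, not Kafka Streams code. The types, the rejection rules and the `Map` used as the state store are assumptions for illustration.

```scala
object CommandHandlerTopologySketch {
  sealed trait Event
  final case class PaymentAdded(amount: BigDecimal) extends Event
  final case class StatusChanged(status: String) extends Event

  final case class Snapshot(balance: BigDecimal, status: String)
  final case class AddPayment(invoiceId: String, commandId: String, amount: BigDecimal)
  final case class Result(
      invoiceId: String,
      commandId: String,
      outcome: Either[String, (Seq[Event], Snapshot)])

  // Stand-in for the "Snapshots" state-store.
  type Store = Map[String, Snapshot]

  // transform: handle one command against the store, returning the new store and a result.
  def handle(store: Store, cmd: AddPayment): (Store, Result) = {
    def result(outcome: Either[String, (Seq[Event], Snapshot)]): Result =
      Result(cmd.invoiceId, cmd.commandId, outcome)
    store.get(cmd.invoiceId) match {
      case None =>
        (store, result(Left("unknown invoice")))
      case Some(snapshot) if cmd.amount > snapshot.balance =>
        (store, result(Left("payment exceeds balance"))) // hypothetical business rule
      case Some(snapshot) =>
        val balance = snapshot.balance - cmd.amount
        val paid = balance.signum == 0
        val events: Seq[Event] =
          if (paid) Seq(PaymentAdded(cmd.amount), StatusChanged("Paid"))
          else Seq(PaymentAdded(cmd.amount))
        val next = Snapshot(balance, if (paid) "Paid" else snapshot.status)
        (store.updated(cmd.invoiceId, next), result(Right((events, next))))
    }
  }

  // Runs the commands stream through the stateful transform step.
  def run(initial: Store, commands: Seq[AddPayment]): Seq[Result] =
    commands.foldLeft((initial, Vector.empty[Result])) { case ((store, out), cmd) =>
      val (next, res) = handle(store, cmd)
      (next, out :+ res)
    }._2

  // command-results: every result, successful or not.
  def commandResults(results: Seq[Result]): Seq[(String, Boolean)] =
    results.map(r => r.commandId -> r.outcome.isRight)

  // filter: successful, then flatMap: events → invoice-events
  def invoiceEvents(results: Seq[Result]): Seq[(String, Event)] =
    results.collect { case Result(id, _, Right((events, _))) => events.map(id -> _) }.flatten

  // filter: successful, then map: snapshots → invoice-snapshots
  def invoiceSnapshots(results: Seq[Result]): Seq[(String, Snapshot)] =
    results.collect { case Result(id, _, Right((_, snapshot))) => id -> snapshot }
}
```

Because events and snapshots are both derived from the same result record, a failed command produces a `command-results` entry but nothing on the other two outputs.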
  71. WINS
    • Simple and declarative system
    • Atomicity - Kafka used as event-store + notification
    • Eventual consistency is handled gracefully
  72. WINS
    • Simple and declarative system
    • Atomicity - Kafka used as event-store + notification
    • Eventual consistency is handled gracefully
    • Easy to add or change views
  73. TAKEAWAYS
    • Event driven systems and event sourcing can help create very flexible and scalable systems
  74. TAKEAWAYS
    • Event driven systems and event sourcing can help create very flexible and scalable systems
    • Know your tradeoffs (consistency, schema evolution, debugging, error handling, …)
  75. TAKEAWAYS
    • Event driven systems and event sourcing can help create very flexible and scalable systems
    • Know your tradeoffs (consistency, schema evolution, debugging, error handling, …)
    • Kafka & Kafka Streams are powerful tools that can be employed in many use cases
  76. RESOURCES
    • Demo code: https://github.com/amitayh/event-sourcing-kafka-streams
    • Event sourcing by Greg Young - https://youtu.be/8JKjvY4etTY
    • Martin Kleppmann - Is Kafka a Database? https://youtu.be/v2RJQELoM6Y
    • Kafka Streams docs - http://wix.to/00C2ADs
    • Blog post from Confluent - http://wix.to/Z0C2ADs
  77. Q&A