Kafka, the hard parts

This talk tries to summarize a lot of the lessons I've learned building systems on Kafka.

Chris Keathley

January 10, 2019

Transcript

  1. 16.

    Let's talk about… Kafka Terminology, Maintaining Order, Errors, Distributed Systems and the joys of functional programming, Data Validation, Finding Errors, Monitoring, Capacity Planning, #hottakes
  13. 114.

    Delivery: 0 <= 1 <= n. At most once (0 or 1 deliveries), at least once (1 to n deliveries), exactly once (impossible-ish).
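The two achievable guarantees differ in whether the consumer records its progress before or after doing the work. The following is a minimal sketch of that ordering, not a real Kafka client: the partition is a plain list, and `run_consumer`, `commit_first`, and `crash_at` are hypothetical names introduced for illustration.

```python
def run_consumer(log, commit_first, crash_at=None):
    """Process `log`, restart once after a simulated crash, return what was processed.

    commit_first=True  -> record the offset before doing the work (at most once)
    commit_first=False -> do the work before recording the offset (at least once)
    crash_at is the offset at which the first run dies between the two steps.
    """
    processed = []
    committed = 0  # next offset to read after a restart

    def attempt(crash_offset):
        nonlocal committed
        offset = committed
        while offset < len(log):
            if commit_first:
                committed = offset + 1
                if offset == crash_offset:
                    return  # crashed after commit, before work: message is lost
                processed.append(log[offset])
            else:
                processed.append(log[offset])
                if offset == crash_offset:
                    return  # crashed after work, before commit: message repeats
                committed = offset + 1
            offset += 1

    attempt(crash_at)  # first run, may crash
    attempt(None)      # restart from the last committed offset
    return processed


log = ["a", "b", "c"]
at_most_once = run_consumer(log, commit_first=True, crash_at=1)    # "b" is skipped
at_least_once = run_consumer(log, commit_first=False, crash_at=1)  # "b" is seen twice
```

Neither ordering delivers every message exactly once across a crash, which is why the consumer itself has to tolerate repeats.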
  15. 118.

    You

  16. 122.

    Idempotence: …the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.
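For a consumer, the definition translates to "handling the same message twice leaves the same state as handling it once". Here is a small sketch contrasting a handler that is not idempotent with one that is. The event shape and the names `apply_increment` and `apply_once` are assumptions made for this example, and the stores are in-memory dictionaries rather than a database.

```python
def apply_increment(balances, event):
    """Not idempotent: every redelivery adds the amount again."""
    user = event["user_id"]
    balances[user] = balances.get(user, 0) + event["amount"]


def apply_once(balances, applied_ids, event):
    """Idempotent: an event whose id was already applied is a no-op."""
    if event["msg_id"] in applied_ids:
        return
    applied_ids.add(event["msg_id"])
    user = event["user_id"]
    balances[user] = balances.get(user, 0) + event["amount"]


event = {"msg_id": "m-1", "user_id": 42, "amount": 10}

naive = {}
for _ in range(3):  # the same message delivered three times
    apply_increment(naive, event)

safe, seen = {}, set()
for _ in range(3):
    apply_once(safe, seen, event)
```

In a real system the set of applied ids and the state it protects would need to be written together (for example in one database transaction); otherwise a crash between the two writes brings the problem back.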
  17. 156.

    Sending Emails: email id: 1, email id: 2, email id: 3 -> send_email -> smtp. What do we do if this fails?
  19. 168.

    Sending Emails: email id: 1 -> send_email. If we see this message again, move it to an audit topic.
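Sending an email is a side effect that cannot be safely repeated, so one reading of the slide is: record that an attempt was made before sending, and if the same id shows up again, stop and hand it to an audit topic instead of sending twice. The sketch below follows that reading. Everything in it is a stand-in: `handle_email`, `FlakySmtp`, the `attempted` set, and the `audit_topic` list are hypothetical names, and no real Kafka or SMTP client is used.

```python
def handle_email(message, attempted, audit_topic, send):
    """Send each email at most once; route repeats to an audit topic.

    `attempted` is a set of ids already tried, `audit_topic` is a list
    standing in for a Kafka topic, and `send` is whatever talks to SMTP.
    """
    email_id = message["email_id"]
    if email_id in attempted:
        # We cannot tell whether the earlier attempt reached the recipient,
        # so leave the decision to a person or a separate process.
        audit_topic.append(message)
        return "audited"
    attempted.add(email_id)  # record the attempt BEFORE the side effect
    send(message)
    return "sent"


class FlakySmtp:
    """Fake SMTP client that fails on a chosen email id."""

    def __init__(self, fail_on):
        self.fail_on = fail_on
        self.delivered = []

    def __call__(self, message):
        if message["email_id"] == self.fail_on:
            raise ConnectionError("smtp unavailable")
        self.delivered.append(message["email_id"])


smtp = FlakySmtp(fail_on=2)
attempted, audit_topic, outcomes = set(), [], []
# Email 2 fails and is redelivered, as an at-least-once consumer would see it.
for email_id in [1, 2, 2, 3]:
    try:
        outcomes.append(handle_email({"email_id": email_id}, attempted, audit_topic, smtp))
    except ConnectionError:
        outcomes.append("failed")
```

The trade-off in this sketch is deliberate: a failed email is never retried automatically, but no recipient gets the same email twice.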
  23. 178.

    Data payloads

    { msg_id: String, type: String, data: { user_id: Integer, msg: String } }

    None of this tells you anything useful about your data
  24. 179.

    Data payloads

    { msg_id: String, type: String, data: { user_id: Integer, msg: String } }

    What do we do when these things change?
  25. 180.

    Data payloads

    { msg_id: String, type: String, data: { user_id: String, msg: String } }

    What do we do when these things change?
  27. 182.

    Data payloads

    { msg_id: String, type: String, data: { user_id: String, msg: String } }

    Let's just use versions! (spoiler: this isn’t great)
  28. 184.

    Data payloads

    { msg_id: String, type: String, data: { user_id: String, msg: String }, meta: { version: 2 } }
  29. 186.

    Data Versions: v1, v1, v1, v1, v2 -> Consumer. This consumer needs to understand both versions.
  30. 187.

    Data Versions: v1, v1, v1, v1, v2 -> Consumer. This team needs to know to make these changes.
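To see what "understand both versions" costs, here is a rough sketch of the branch every consumer of the topic would have to carry. It assumes, for illustration only, that messages without a `meta` field are version 1; `normalize` is a hypothetical helper, and the payload shapes are the ones from the slides.

```python
def normalize(message):
    """Return (user_id as str, msg) for either payload version.

    Assumption for this sketch: messages without a `meta` field are v1.
    """
    version = message.get("meta", {}).get("version", 1)
    data = message["data"]
    if version == 1:
        return str(data["user_id"]), data["msg"]  # v1: user_id is an Integer
    if version == 2:
        return data["user_id"], data["msg"]       # v2: user_id is a String
    raise ValueError(f"unknown payload version: {version}")


v1 = {"msg_id": "a", "type": "comment.created",
      "data": {"user_id": 7, "msg": "hi"}}
v2 = {"msg_id": "b", "type": "comment.created",
      "data": {"user_id": "7", "msg": "hi"},
      "meta": {"version": 2}}
```

Both messages normalize to the same value, but only because someone on the consuming team knew about version 2 and wrote the second branch. A version 3 message raises until every consumer is updated.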
  31. 194.

    Data payloads

    { msg_id: String, type: String, data: { user_id: Integer, msg: String } }

    What are these?
  34. 199.

    Data payloads

    UUID = string? & re_matches?(/^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i)

    CommentCreated = schema {
      req :msg_id, UUID
      req :type, lit("comment.created")
      req :data, schema {
        req :user_id, integer? | UUID
        req :msg, string?
      }
    }
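The slide's notation composes small predicates into a schema that describes what the data means, not only its type. The same idea can be sketched in plain Python as below. This is not Norm's API: `is_uuid`, `is_integer`, `COMMENT_CREATED`, and `explain` are names invented for this example, and the error wording only loosely imitates the messages shown in the talk.

```python
import re

UUID_V4 = re.compile(
    r"^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$",
    re.IGNORECASE,
)


def is_uuid(value):
    return isinstance(value, str) and UUID_V4.match(value) is not None


def is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


# field -> predicate; a nested dict is a nested schema
COMMENT_CREATED = {
    "msg_id": is_uuid,
    "type": lambda v: v == "comment.created",
    "data": {
        "user_id": lambda v: is_integer(v) or is_uuid(v),
        "msg": lambda v: isinstance(v, str),
    },
}


def explain(schema, value, path=()):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(value, dict):
        return [f"In {'.'.join(path) or '<root>'}, val: {value!r} fails spec: map"]
    problems = []
    for key, spec in schema.items():
        here = path + (key,)
        if key not in value:
            problems.append(f"In {'.'.join(here)}, val: {value!r} fails spec: required")
        elif isinstance(spec, dict):
            problems.extend(explain(spec, value[key], here))
        elif not spec(value[key]):
            problems.append(f"In {'.'.join(here)}, val: {value[key]!r} fails spec")
    return problems


good = {
    "msg_id": "123e4567-e89b-42d3-a456-426614174000",
    "type": "comment.created",
    "data": {"user_id": 7, "msg": "Hello world"},
}
```

Calling `explain(COMMENT_CREATED, good)` returns an empty list, while `explain(COMMENT_CREATED, {})` reports one "required" problem for each of the three top-level fields.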
  35. 200.

    Data payloads

    json = {type: "comment.created", msg: "Hello world"}
    Norm.decode(CommentEvent, json) => {:ok, data}
    Norm.decode(CommentEvent, {}) => {:error, errors}
    Norm.explain(CommentEvent, {}) =>
      "In :msg_id, val: {} fails spec: required
       In :type, val: {} fails spec: required
       In :data, val: {} fails spec: required"
  37. 203.

    Norm is extensible

    CommentEvent = schema {
      req :type, lit("comment.created")
      req :msg, string?
    }

    json = { type: "comment.created", msg: "Hello world", data: { msg: "Hello world" } }
    Norm.decode(CommentEvent, json) => {:ok, data}

    This will still get passed through
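"Passed through" here is read as: the schema checks the keys it names and leaves any other keys in place, neither rejecting nor stripping them. A minimal Python sketch of that open-schema behaviour follows. `conform_open` and `COMMENT_EVENT` are hypothetical names for this example and do not come from Norm.

```python
def conform_open(required, value):
    """Check only the required keys; return the input unchanged if they pass.

    `required` maps a key to a predicate. Keys that are not listed are
    neither validated nor stripped.
    """
    errors = [key for key, ok in required.items()
              if key not in value or not ok(value[key])]
    if errors:
        return ("error", errors)
    return ("ok", value)


COMMENT_EVENT = {
    "type": lambda v: v == "comment.created",
    "msg": lambda v: isinstance(v, str),
}

payload = {
    "type": "comment.created",
    "msg": "Hello world",
    "data": {"msg": "Hello world"},  # not named in the schema
}
status, data = conform_open(COMMENT_EVENT, payload)
```

Because unknown keys survive validation, a producer can add a field without every consumer having to change its schema first.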
  40. 209.

    Property based testing: id: 1, id: 2, id: 3, id: 1 -> Consumer -> Database. Information should end up in the database.
  41. 210.

    Property based testing: id: 1, id: 2, id: 3, id: 1 -> Consumer -> Database. Some combination of these messages causes a failure.
  42. 212.

    Property based testing: id: 1, id: 1 -> Consumer -> Database. Looks like we aren’t handling duplicates correctly.
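The three slides describe the usual property-testing loop: generate random message sequences, check a property of the resulting database, and shrink a failing sequence to the smallest one that still fails. Below is a hand-rolled sketch of that loop using only the standard library. A property-testing library would normally handle generation and shrinking; `buggy_consumer`, `property_holds`, `shrink`, and `find_counterexample` are all names made up for this example.

```python
import random


def buggy_consumer(messages):
    """Writes every message as a new row, so duplicates create duplicate rows."""
    database = []
    for message in messages:
        database.append(message["id"])
    return database


def property_holds(consumer, messages):
    """Property: the database holds each delivered id exactly once."""
    rows = consumer(messages)
    return sorted(rows) == sorted({m["id"] for m in messages})


def shrink(consumer, messages):
    """Greedily drop single messages while the property still fails."""
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(messages)):
            candidate = messages[:i] + messages[i + 1:]
            if not property_holds(consumer, candidate):
                messages, shrunk = candidate, True
                break
    return messages


def find_counterexample(consumer, runs=200, seed=0):
    rng = random.Random(seed)
    for _ in range(runs):
        # Small id range so that duplicates show up often.
        messages = [{"id": rng.randint(1, 3)} for _ in range(rng.randint(0, 6))]
        if not property_holds(consumer, messages):
            return shrink(consumer, messages)
    return None


counterexample = find_counterexample(buggy_consumer)
```

For this consumer the shrunk counterexample is always two messages with the same id, which is the shape of failure the last slide points at.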
  50. 237.

    Calculating partitions

    partitions < 100 x brokers x replication factor

    source: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
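As a worked example of the rule of thumb on the slide, the snippet below plugs in a cluster size. The 3 brokers and replication factor of 3 are assumed numbers chosen for illustration, and `max_partitions` is a hypothetical helper that does nothing beyond the multiplication.

```python
RULE_OF_THUMB = 100  # constant from the slide's formula


def max_partitions(brokers, replication_factor):
    """Upper bound from the slide: partitions < 100 x brokers x replication factor."""
    return RULE_OF_THUMB * brokers * replication_factor


limit = max_partitions(brokers=3, replication_factor=3)  # 100 * 3 * 3 = 900
```

Under these assumptions the formula puts the ceiling at 900 partitions.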