Apache Kafka — The Hard Parts

Vienna Apache Kafka® Meetup by Confluent
SQUER Solutions
June 14, 2022

Transcript

  1. Apache Kafka
    @duffleit
    THE HARD PARTS


  2. David Leitner
    @duffleit
    Coding Architect
    [email protected]
    @duffleit


  3. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''


  4. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Producer


  5. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Consumer Consumer


  6. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Consumer Consumer
    The number of partitions is the
    limiting factor for consumer instances.


  7. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Consumer Consumer
    The number of partitions is the
    limiting factor for consumer instances.
    User: Bob User: Alice User: Tim
    selection of partition:
    = hash(key) % #ofpartitions
    User: Bob
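
    Kafka's default partitioner derives the partition from a hash of the
    record key, so all events for the same key (e.g. "Bob") land in the
    same partition. A minimal Java sketch of the idea; the real default
    partitioner hashes the serialized key with murmur2, but the modulo
    step is the same:

        // Sketch: map a String key onto one of N partitions.
        // Masking the sign bit keeps negative hash codes from
        // producing a negative partition index.
        static int selectPartition(String key, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        // selectPartition("Bob", 3) returns the same partition on every
        // call, as long as the partition count stays at 3.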


  8. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Consumer Consumer
    The number of partitions is the
    limiting factor for consumer instances.
    Partition D
    Partition D'
    Partition D''
    User: Bob User: Alice User: Tim
    selection of partition:
    = hash(key) % #ofpartitions (3 → 4)
    User: Bob
    User: Bob
    User: Bob
    User: Alice
    Topic.v2


  9. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Consumer Consumer
    The number of partitions is the limiting
    factor for consumer instances.
    Select it wisely & over-partition a bit.
    Partition D
    Partition D'
    Partition D''
    User: Bob User: Alice User: Tim
    selection of partition:
    = hash(key) % #ofpartitions (3 → 4)
    User: Bob
    User: Bob
    User: Bob
    User: Alice
    Topic.v2
    Select something that can be divided
    by multiple numbers, e.g. 6, 12, 24, ...
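
    A hedged sketch of creating such a topic with Kafka's Java
    AdminClient; the topic name, bootstrap address, and replication
    factor are illustrative:

        import java.util.List;
        import java.util.Properties;
        import org.apache.kafka.clients.admin.AdminClient;
        import org.apache.kafka.clients.admin.AdminClientConfig;
        import org.apache.kafka.clients.admin.NewTopic;

        public class CreateTopic {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                try (AdminClient admin = AdminClient.create(props)) {
                    // 12 partitions: divisible by 2, 3, 4, and 6, so consumer
                    // groups of those sizes share partitions evenly.
                    NewTopic topic = new NewTopic("user-changes", 12, (short) 3);
                    admin.createTopics(List.of(topic)).all().get();
                }
            }
        }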


  10. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: delete


  11. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: delete
    2 weeks
    or some size
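
    With the delete policy, retention is bounded by age and/or total
    size. A hedged sketch of the matching topic-level configs in Java;
    the concrete values are illustrative:

        import java.util.Map;

        // Segments older than two weeks, or beyond ~10 GiB per partition,
        // become eligible for deletion (whichever limit is hit first).
        Map<String, String> topicConfig = Map.of(
            "cleanup.policy", "delete",
            "retention.ms", String.valueOf(14L * 24 * 60 * 60 * 1000),
            "retention.bytes", String.valueOf(10L * 1024 * 1024 * 1024)
        );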


  12. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: delete
    2 weeks


  13. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: compact
    we only keep the latest record for a specific key.


  14. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: compact
    we only keep the latest record for a specific key.
    How Kafka Stores Data
    userchanges.active.segment
    📄 userchanges.segment.1 userchanges.segment.2
    log.segment.bytes = 1GB
    Active
    Compaction
    ⚙ Compaction


  15. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: compact
    we only keep the latest record for a specific key.
    How Kafka Stores Data
    userchanges.active.segment
    📄 userchanges.segment.1 userchanges.segment.2
    log.segment.bytes = 1GB
    Active
    Compaction
    ⚙ Compaction


  16. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: compact
    we only keep the latest record for a specific key.
    How Kafka Stores Data
    userchanges.active.segment
    📄 userchanges.segment.1 userchanges.segment.2
    log.segment.bytes = 1GB
    Active
    Compaction
    ⚙ Compaction

    log.roll.ms = 1 week
    Especially in GDPR-related
    use cases, think explicitly
    about segment size and
    roll time.
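
    Compaction only ever runs on closed segments, so the active
    segment's size and roll time bound how long an overwritten record
    can survive. A minimal sketch of the topic-level knobs, with
    illustrative values:

        import java.util.Map;

        // Roll segments at 100 MiB or after one day, whichever comes first,
        // so superseded records become eligible for compaction sooner.
        Map<String, String> compactedTopicConfig = Map.of(
            "cleanup.policy", "compact",
            "segment.bytes", String.valueOf(100L * 1024 * 1024),
            "segment.ms", String.valueOf(24L * 60 * 60 * 1000)
        );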


  17. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: compact
    Tombstone: Tim
    delete.retention.ms = 1day
    Slow Consumer
    A consumer that starts fresh and needs more than
    one day to read all events from the topic.
    Tombstone: Tim


  18. The Basics
    Topic: UserChanges
    User: Bob User: Alice User: Tim
    User: Bob User: Bob
    log.cleanup.policy: compact
    Tombstone: Tim
    delete.retention.ms = 1day
    Slow Consumer
    A consumer that starts fresh and needs more than
    one day to read all events from the topic.
    User: Tim
    Keep delete.retention in
    sync with the given
    topic retention.
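
    A tombstone is just a record with the existing key and a null
    value. A minimal producer sketch, assuming a producer with String
    serializers already exists:

        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        // A null value marks the key for deletion; once compaction runs and
        // delete.retention.ms has elapsed, the tombstone itself disappears.
        void deleteUser(KafkaProducer<String, String> producer) {
            producer.send(new ProducerRecord<>("UserChanges", "Tim", null));
        }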


  19. The Basics
    Cluster
    Node A Node B Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''


  20. The Basics
    Cluster
    Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Node B
    Node A
    Node F
    Node E
    Node D
    AZ 1
    AZ 2


  21. The Basics
    Cluster
    Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Node B
    Node A
    Node F
    Node E
    Node D
    AZ 1
    AZ 2
    Partition D Partition E Partition F
    Partition D'
    Partition D''
    Partition E'
    Partition E''
    Partition F'
    Partition F''
    KIP-36: Rack aware
    replica assignment


  22. Cluster
    The Basics
    Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Node B
    Node A
    Node F
    Node E
    Node D
    AZ 1
    AZ 2
    Partition D Partition E Partition F
    Partition D'
    Partition D''
    Partition E'
    Partition E''
    Partition F'
    Partition F''
    KIP-36: Rack aware
    replica assignment


  23. Multi Region


  24. Cluster
    Multi Region?
    Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Node B
    Node A
    Node F
    Node E
    Node D
    Region 1
    Region 2
    Partition D'
    Partition D''
    Partition E'
    Partition E''
    Partition F'
    Partition F''
    Partition A'
    Partition A''
    Partition B'
    Partition B''
    Partition C'
    Partition C''
    Partition D Partition E Partition F
    Usually the latency between multiple regions is
    too big to span a single cluster across them.


  25. Cluster Region West
    Cluster Region East
    Multi Region?
    Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    Partition A Partition B Partition C
    Node B
    Node A
    Node F
    Node E
    Node D
    Region 1
    Region 2
    Partition D'
    Partition D''
    Partition E'
    Partition E''
    Partition F'
    Partition F''


  26. Cluster Region West
    Cluster Region East
    Multi Region?
    Node C
    Producer
    Consumer Consumer
    Consumer Group
    Topic
    A B C
    Node B
    Node A
    Node F
    Node E
    Node D
    Region 1
    Region 2
    A B C


  27. Cluster Region West
    Cluster Region East
    Multi Region?
    Node C
    Producer East
    Consumer Consumer
    Consumer Group
    Topic
    A B C
    Node B
    Node A
    Node F
    Node E
    Node D
    Region 1
    Region 2
    A B C
    Producer West


  28. Cluster Region West — "west"
    Cluster Region East — "east"
    Multi Region?
    Node C
    Producer East
    Consumer Consumer
    Consumer Group
    Topic
    A B C
    Node B
    Node A
    Node F
    Node E
    Node D
    Region 1
    Region 2
    A B C
    Producer West
    Mirror
    Maker 2
    east.C
    west.C
    *.C
    Ordering
    Guarantees?!
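
    MirrorMaker 2 prefixes replicated topics with the source cluster
    alias ("east.C", "west.C"), so a consumer that wants the local topic
    plus all mirrored copies can subscribe by pattern. A hedged sketch,
    assuming the topic layout from the slide:

        import java.util.regex.Pattern;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        // Matches "C" as well as any cluster-prefixed mirror such as
        // "east.C" or "west.C". Note: ordering is only guaranteed per
        // partition of each physical topic, not across the merged streams.
        void subscribeToAllRegions(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Pattern.compile("(\\w+\\.)?C"));
        }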


  29. @duffleit
    Order is Guaranteed
    in a single Partition.
    Are you sure?


  30. @duffleit
    Producer
    Partition
    max.in.flight.requests.per.connection = 5
    Message A
    Message B
    Message A
    retry
    Message B Message A
    retries = MAX_INT


  31. @duffleit
    Producer
    Partition
    max.in.flight.requests.per.connection = 5
    Message A
    Message B
    Message A
    retry
    Message B Message A
    Legacy Solution:
    max.in.flight.requests.per.connection = 1
    State-of-the-Art Solution:
    enable.idempotence = true
    retries = MAX_INT
    max.in.flight.requests.per.connection = 5
    acks = all
    SEQ#: 1
    SEQ#: 2
    OutOfOrderSequenceException
    SEQ#: 2
    If you don't want to set your retries to infinite,
    prefer "delivery.timeout.ms" over "retries".


  32. Node A Node B Node C
    Topic
    Partition A Partition B Partition C
    Partition C'
    Partition B''
    Partition A'
    Partition C''
    Partition B'
    Partition A''
    Producer
    acks = none
    acks = 1
    acks = all
    min.insync.replicas = 3
    @duffleit


  33. Node A Node B Node C
    Topic
    Partition A Partition B Partition C
    Partition C'
    Partition B''
    Partition A'
    Partition C''
    Partition B'
    Partition A''
    Producer
    acks = none
    acks = 1
    acks = all
    min.insync.replicas = 2
    @duffleit


  34. min.insync.replicas = 2
    Node A Node B Node C
    Topic
    Partition A Partition B Partition C
    Partition C'
    Partition B''
    Partition A'
    Partition C''
    Partition B'
    Partition A''
    Producer acks = all
    @duffleit
    CAP Theorem
    Consistency
    Availability Partitioning

    min.insync.replicas++
    min.insync.replicas--


  35. min.insync.replicas = 3
    Node A Node B Node C
    Topic
    Partition A Partition B Partition C
    Partition C'
    Partition B''
    Partition A'
    Partition C''
    Partition B'
    Partition A''
    Producer acks = all
    @duffleit
    CAP Theorem
    Consistency
    Availability Partitioning

    min.insync.replicas++
    min.insync.replicas--


  36. min.insync.replicas = 3
    Node A Node B Node C
    Topic
    Partition A Partition B Partition C
    Partition C'
    Partition B''
    Partition A'
    Partition C''
    Partition B'
    Partition A''
    Producer acks = all
    @duffleit
    CAP Theorem
    Consistency
    Availability Partitioning

    min.insync.replicas++
    min.insync.replicas--


  37. min.insync.replicas = 2
    Node A Node B Node C
    Topic
    Partition A Partition B Partition C
    Partition C'
    Partition B''
    Partition A'
    Partition C''
    Partition B'
    Partition A''
    Producer acks = all
    @duffleit
    CAP Theorem
    Consistency
    Availability Partitioning

    min.insync.replicas++
    min.insync.replicas--


  38. min.insync.replicas = 2
    Node A Node B Node C
    Topic
    Partition A Partition B Partition C
    Partition C'
    Partition B''
    Partition A'
    Partition C''
    Partition B'
    Partition A''
    Producer acks = all
    @duffleit
    CAP Theorem
    Consistency
    Availability Partitioning

    min.insync.replicas++
    min.insync.replicas--
    Possible Data Loss
    There is no "ad-hoc fsync" by default. It can be
    configured via "log.flush.interval.ms".
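
    How the two sides fit together, as a hedged sketch: the topic (or
    broker) sets min.insync.replicas, the producer sets acks=all, and a
    write fails loudly instead of being silently under-replicated:

        import java.util.Map;
        import java.util.Properties;
        import org.apache.kafka.clients.producer.ProducerConfig;

        // Topic side (illustrative): with replication factor 3 and
        // min.insync.replicas=2, one broker may fail and writes still
        // succeed; if two fail, acks=all producers get a
        // NotEnoughReplicasException instead of losing data silently.
        Map<String, String> topicConfig = Map.of("min.insync.replicas", "2");

        // Producer side: wait until all in-sync replicas have the record.
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");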


  39. @duffleit
    Keep in mind that rack
    assignment is ignored for
    in-sync replicas.
    Node C
    Node B
    Node A
    Node D Node E Node F
    AZ 1
    AZ 2
    replicas=6
    min.insync.replicas = 4


  40. @duffleit
    Keep in mind that rack
    assignment is ignored for
    in-sync replicas.
    Node C
    Node B
    Node A
    Node D Node E Node F
    AZ 1
    AZ 2
    replicas=5
    min.insync.replicas = 4
    fail on > 1


  41. @duffleit
    Keep in mind that rack
    assignment is ignored for
    in-sync replicas.
    Node C
    Node B
    Node A
    Node D Node E Node F
    Node G Node H Node I
    AZ 1
    AZ 2
    AZ 3
    replicas=9
    min.insync.replicas = 7


  42. @duffleit
    @duffleit
    Let's talk about
    Lost Messages


  43. @duffleit
    Partition
    Message A
    Consumer
    Message B
    Message C
    enable.auto.commit=true
    auto.commit.interval.ms=5_SEC


  44. @duffleit
    Partition
    Message A
    Consumer
    Message B
    Message C
    enable.auto.commit=true
    auto.commit.interval.ms=5_SEC
    Auto-Commit: A,B,C


  45. @duffleit
    Partition
    Message D
    Consumer
    enable.auto.commit=true
    auto.commit.interval.ms=5_SEC
    Message A
    Message B
    Message C
    enable.auto.commit=false
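
    With auto-commit disabled, offsets are committed only after the
    records were actually processed. A minimal poll-loop sketch; the
    process() handler is hypothetical:

        import java.time.Duration;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        // A crash between poll() and commitSync() leads to re-delivery
        // (at-least-once) instead of silently lost messages.
        void pollLoop(KafkaConsumer<String, String> consumer) {
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical business logic
                }
                consumer.commitSync();
            }
        }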


  46. @duffleit


  47. Producer
    Stream
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Message


  48. Producer
    Stream
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Message Message


  49. Producer
    Stream
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Message Message
    Producer Consumers
    Transaction to achieve Exactly Once Semantics.
    Message


  50. Producer
    Stream
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Message Message
    Producer Consumers
    Transaction to achieve Exactly Once Semantics.
    Message
    Exactly Once.
    Transaction to achieve Atomicity between multiple Topic Operations.
    Transactions
    Balances
    Producer
    Message


  51. Producer
    Stream
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Message Message
    Producer Consumers
    Transaction to achieve Exactly Once Semantics.
    Message
    Exactly Once.
    Transaction to achieve Atomicity between multiple Topic Operations.
    Transactions
    Balances
    Producer
    Message
    Message


  52. @duffleit
    Onboarding
    UserUpdate
    Stream
    💥


  53. @duffleit
    Onboarding
    UserUpdate
    Stream
    User
    UserEvents
    CDC
    Outbox Pattern
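
    The core of the outbox pattern in one sketch: the business change
    and the outgoing event are written in the same database
    transaction, and CDC (e.g. Debezium) later ships the outbox rows to
    Kafka. Table and column names are made up for illustration:

        import java.sql.Connection;
        import java.sql.PreparedStatement;

        // Both writes commit atomically; a CDC connector tails the
        // hypothetical "outbox" table and publishes each row to Kafka.
        void updateUser(Connection db, String userId, String userJson)
                throws Exception {
            db.setAutoCommit(false);
            try (PreparedStatement user = db.prepareStatement(
                     "UPDATE users SET payload = ? WHERE id = ?");
                 PreparedStatement outbox = db.prepareStatement(
                     "INSERT INTO outbox (aggregate_id, event_type, payload) "
                     + "VALUES (?, ?, ?)")) {
                user.setString(1, userJson);
                user.setString(2, userId);
                user.executeUpdate();
                outbox.setString(1, userId);
                outbox.setString(2, "UserUpdated");
                outbox.setString(3, userJson);
                outbox.executeUpdate();
                db.commit(); // both rows or neither: no lost events
            } catch (Exception e) {
                db.rollback();
                throw e;
            }
        }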


  54. @duffleit
    Onboarding
    UserUpdate
    Stream

    User
    Listen to yourself Pattern.


  55. @duffleit
    Onboarding
    UserUpdated (age: 21)
    Stream
    UserUpdated (age: 21)
    Advertisement
    User (age: 21)
    User (age: 21)
    EventSourcing


  56. @duffleit
    Onboarding
    UserUpdated (age: 22)
    Stream
    UserUpdated (age: 21)
    Advertisement
    User (age: 21)
    User (age: 21)
    UserUpdated (age: 22)
    EventSourcing


  57. @duffleit
    Onboarding
    UserUpdated (age: 22)
    Stream
    UserUpdated (age: 21)
    Advertisement
    User (age: 22)
    User (age: 22)
    UserUpdated (age: 22)
    UserUpdated (age: 23)
    EventSourcing


  58. @duffleit
    Onboarding
    UserUpdated (age: 22)
    Stream
    Global EventSourcing
    UserUpdated (age: 21)
    Advertisement
    User (age: 22)
    User (age: 22)
    UserUpdated (age: 22)
    UserUpdated (age: 23)
    👻 it often breaks information hiding & data isolation.


  59. Stream
    @duffleit
    UpdateUser (age: 21)
    Stream
    Local EventSourcing
    UserAgeChanged (age: 21)
    UserUpdated (age: 21)
    Onboarding
    🔒
    👻
    💅


  60. Producer
    Producer
    Stream
    Consumers
    Transaction to achieve Exactly Once Semantics.
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Transaction to achieve Atomicity between multiple Topic Operations.
    Transactions
    Balances
    Producer
    Message
    Message
    Message
    Outbox Pattern Listen to Yourself Local Eventsourcing
    Message


  61. @duffleit
    "Kafka Transactions"
    Producer Consumers
    Producer
    Producer
    Consumers
    Consumers
    Stream
    Processor
    Stream
    Processor
    Stream
    Processor
    enable.idempotence = true
    isolation.level = read_committed
    Deduplication Inbox
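
    What "Kafka Transactions" look like in a consume-process-produce
    loop; a minimal sketch, assuming pre-configured clients and a
    transactional.id set on the producer:

        import java.time.Duration;
        import java.util.HashMap;
        import java.util.Map;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.clients.consumer.OffsetAndMetadata;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;
        import org.apache.kafka.common.TopicPartition;

        // Output records and the consumed offsets commit atomically; a
        // downstream read_committed consumer sees each input processed
        // exactly once. The topic name "output" is a placeholder.
        void processExactlyOnce(KafkaConsumer<String, String> consumer,
                                KafkaProducer<String, String> producer) {
            producer.initTransactions();
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("output", r.key(), r.value()));
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                }
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            }
        }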


  62. Producer
    Stream
    Transaction to achieve Exactly Once Semantics.
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Transaction to achieve Atomicity between multiple Topic Operations.
    Message
    Outbox Pattern Listen to Yourself Local Eventsourcing
    Deduplication Inbox Idempotency
    Producer Consumers
    Transactions
    Balances
    Producer
    Message
    Message
    Message


  63. Transfers
    Payment Service
    Alice -> Bob
    Alice -10€
    Bob +10€
    Transaction_Coordinator


  64. Transfers
    Payment Service
    Alice -> Bob
    Alice -10€
    Bob +10€
    __transaction_state
    Transaction: ID
    __consumer_offset
    payments: 1
    P1
    P1
    P2
    Transfers P3
    P2 P3 C
    C
    C
    C
    isolation.level=read_committed
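
    On the reading side, isolation.level decides whether records from
    open or aborted transactions become visible. A hedged config
    sketch; bootstrap address and group id are placeholders:

        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerConfig;

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments");
        // read_committed: poll() only returns records of committed
        // transactions; the default read_uncommitted also surfaces
        // records from aborted ones.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");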


  65. Transfers
    Payment Service
    Alice -> Bob
    Alice -10€
    __transaction_state
    Transaction: ID
    __consumer_offset
    payments: 1
    P1
    P1
    P2
    Transfers P3
    P2
    isolation.level=read_committed
    Service
    A
    A
    A
    Transaction: ID2


  66. Producer
    Stream
    Transaction to achieve Exactly Once Semantics.
    Transaction to achieve Consistency between Kafka and "Non-Kafka".
    Transaction to achieve Atomicity between multiple Topic Operations.
    Message
    Outbox Pattern Listen to Yourself Local Eventsourcing
    Deduplication Inbox Idempotency
    Producer Consumers
    Transactions
    Balances
    Producer
    Message
    Message
    Message
    Kafka's Exactly-Once Semantics Outbox Pattern


  67. @duffleit


  68. @duffleit
    Producer
    Consumers
    Topic
    📜
    📜
    Producer
    Producer
    Producer
    Producer
    Producer
    Producer
    Producer
    Consumers
    Consumers
    Consumers
    Consumers
    Consumers
    Consumers
    Consumers
    Topic
    Topic
    Topic
    Topic
    Topic
    Topic
    Topic


  69. Consumers
    Consumers
    Consumers
    Consumers
    Consumers
    Consumers
    Consumers
    Producer
    Producer
    Producer
    Producer
    Producer
    Producer
    Producer
    @duffleit
    Producer
    Consumers
    Topic
    📜
    📜
    Topic
    Topic
    Topic
    Topic
    Topic
    Topic
    Topic


  70. @duffleit
    Producer Consumers
    Topic
    Schema
    Registry
    Faulty Message
    Producer
    Producer
    Producer
    Producer
    Producer
    Producer
    Producer
    Broker Side Validation, FTW


  71. @duffleit
    Producer Consumers
    Topic
    Schema
    Registry
    Faulty Message
    Broker Side Validation
    🤚
    Deserialization on Broker
    😱
    MagicByte SubjectId Payload

    Check if
    MagicByte
    Exists.

    Check if
    SubjectId is
    Valid.

    Check if
    Payload
    Matches
    Schema.
    The more to the
    right, the more
    expensive it gets.
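
    The two cheap checks against the Confluent wire format (one magic
    byte, a 4-byte schema id, then the payload) in a minimal sketch;
    the expensive third check would additionally need a schema-registry
    lookup and a full deserialization:

        import java.nio.ByteBuffer;

        // Confluent wire format: [magic byte 0x0][4-byte schema id][payload].
        static boolean looksSchemaFramed(byte[] message) {
            if (message == null || message.length < 5) return false;
            ByteBuffer buf = ByteBuffer.wrap(message);
            if (buf.get() != 0x0) return false;  // check 1: magic byte exists
            int schemaId = buf.getInt();         // check 2: plausible schema id
            return schemaId > 0;                 // a registry lookup would confirm it
        }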


  72. @duffleit
    squer.link/broker-side-valdiation-sidecar


  73. @duffleit
    Cluster
    Node A Node B Node C
    Go Proxy Go Proxy Go Proxy
    ⏳ ⏳ ⏳


  74. @duffleit
    Cluster
    Node A Node B Node C
    Go Proxy Go Proxy Go Proxy
    Go Proxy Go Proxy Go Proxy
    Race Condition
    We can no longer guarantee ordering.


  75. @duffleit
    squer.link/broker-side-valdiation-sidecar


  76. Ok,
    Let's sum up.
    @duffleit


  77. @duffleit
    Multi AZ,
    Multi Region,
    Multi Cloud
    Consistency vs.
    Availability
    Disable
    Autocommit!
    Different Options to
    Achieve Transactional
    Guarantees in Kafka
    Broker Side Schema
    Validation
    Segment Size
    Partition Size:
    "over-partition a bit"
    and 200+ more Configuration Properties.
    What we have seen
    👀


  78. @duffleit
    Multi AZ,
    Multi Region,
    Multi Cloud
    Consistency vs.
    Availability
    Disable
    Autocommit!
    Different Options to
    Achieve Transactional
    Guarantees in Kafka
    Broker Side Schema
    Validation
    Segment Size
    Partition Size:
    "over-partition a bit"
    and 200+ more Configuration Properties.
    What we have seen
    👀
    We were able to handle them, and so can you.
    💪


  79. David Leitner
    @duffleit
    Coding Architect
    [email protected]
    @duffleit
