
Investigating Kafka performance issue caused by lock contention in xfs


Haruki Okada / LINE Corporation

Apache Kafka Meetup Japan #12
https://kafka-apache-jp.connpass.com/event/267091/

LINE Developers

December 16, 2022


Transcript

  1. Investigating Kafka performance issue caused by lock contention in xfs
    Haruki Okada
    2022.12.16


  2. Speaker
    2
    •Haruki Okada (Twitter/GitHub: ocadaruma)
    •Lead of the LINE IMF team
    •The team is responsible for developing/maintaining the company-wide Kafka platform
    •Kafka contributor
    •KIP-764: Configurable backlog size for creating Acceptor
    •Patch for KAFKA-13572


  3. Kafka at LINE
    3
    •One of the most popular pieces of middleware at LINE
    •We provide a multi-tenant shared Kafka cluster
    •Our largest cluster:
    •Daily data in/out: 2.1PB
    •Daily message inflow: 1.2T messages


  4. Performance engineering
    4
    •We put a lot of engineering effort into making our cluster highly performant and reliable
    •e.g.
    •“Reliability Engineering Behind The Most Trusted Kafka Platform” (LINE DEVELOPER DAY 2019)
    •"Investigating Request Delay in a Large-Scale Kafka Cluster Caused by TCP" (LINE DEVELOPER DAY 2021)


  5. Agenda
    5
    •Overview of the phenomenon we encountered
    •Investigation process
    •Solution
    •Conclusion


  6. Phenomenon
    6
    •One of our brokers experienced high produce response time (over 600 ms at the 99th percentile)
    •The usual produce response time is ~50 ms at the 99th percentile
    •This violates our cluster's SLO: a serious performance issue


  7. How requests are processed in Kafka
    7
    •An increase in request-queue time implies the request handler threads were busy and slow to dequeue requests from the request queue
    [Diagram: the Producer sends a Request; Network Threads enqueue it to the Request Queue; Request Handler Threads dequeue and process it, then put the Response on the Response Queue for the Network Threads to send back]


  8. Actually, request handlers were busy
    8
    [Diagram: the same request-processing flow as the previous slide, focusing on the busy Request Handler Threads]


  9. What made the request handlers busy?
    9
    •Suspect 1: Too many requests were sent
    •=> No. We saw no spike in request count
    •Suspect 2: Requests took a long time to process
    •=> Possible. At the time of the incident, we observed high utilization from a specific client, which implies requests from that client took a long time to process


  10. Identifying the hot code path: Async profiler
    10
    •A well-known low-overhead sampling profiler
    •https://github.com/jvm-profiling-tools/async-profiler
    •We managed to capture a profile when the phenomenon recurred
    •We used wall-clock profiling, which samples all threads regardless of whether they consumed CPU time
    •Profile each thread separately
    •profiler.sh -d 30 -e wall -t -f profile.html $KAFKA_PID


  11. 11
    Some request handlers spent significant time waiting to acquire a lock when appending records to the log
    [Diagram: request-handler-0, request-handler-1, request-handler-2, ... contending on the Log's lock while appending]


  12. Identify the cause of the lock contention
    12
    •The profile implies there was a thread executing a time-consuming frame while holding the lock
    •We want to identify that thread and that frame
    •We need a timeline of thread dumps
    [Diagram: timeline of thread dumps, e.g. request-handler-0 waiting in pthread_cond_wait while request-handler-1 executes some_blocking_method]


  13. Async profiler's JFR output
    13
    •Async profiler can output the profile in JFR (Java Flight Recorder) format
    •A series of profile samples, each consisting of:
    •Timestamp
    •Thread
    •Thread state
    •Stack trace
    •=> We can construct a timeline of thread dumps from the JFR output (see the sketch below)
    •profiler.sh -d 30 -e wall -t -f profile.jfr $KAFKA_PID
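
    A minimal sketch (not the jfrv implementation, and assuming async-profiler records its wall-clock samples as jdk.ExecutionSample events with a "sampledThread" field) of how such a timeline can be reconstructed with the JDK's jdk.jfr consumer API:

    import java.nio.file.Path;
    import java.util.*;
    import jdk.jfr.consumer.*;

    public class JfrTimeline {
        public static void main(String[] args) throws Exception {
            // Group all execution samples by the sampled thread
            Map<String, List<RecordedEvent>> samplesByThread = new TreeMap<>();
            for (RecordedEvent event : RecordingFile.readAllEvents(Path.of(args[0]))) {
                if (!event.getEventType().getName().equals("jdk.ExecutionSample")) {
                    continue;
                }
                RecordedThread thread = event.getThread("sampledThread");
                samplesByThread
                        .computeIfAbsent(thread.getJavaName(), k -> new ArrayList<>())
                        .add(event);
            }
            samplesByThread.forEach((threadName, samples) -> {
                // Order each thread's samples by timestamp to get its dump timeline
                samples.sort(Comparator.comparing(RecordedEvent::getStartTime));
                System.out.println(threadName);
                for (RecordedEvent sample : samples) {
                    // Each sample carries a timestamp, a thread state and a stack trace
                    System.out.printf("  %s %s%n", sample.getStartTime(), sample.getValue("state"));
                    for (RecordedFrame frame : sample.getStackTrace().getFrames()) {
                        System.out.printf("    %s.%s%n",
                                frame.getMethod().getType().getName(),
                                frame.getMethod().getName());
                    }
                }
            });
        }
    }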


  14. Visualizing the JFR output
    14
    •jfrv: A tool to visualize Async profiler's JFR output
    •https://ocadaruma.github.io/jfrv/#/execution-sample
    [Screenshot: thread list, sample timeline per thread, and the thread dump for the clicked sample]


  15. Observation: blocking `write` when appending records
    15
    @__write
    sun/nio/ch/FileDispatcherImpl@write0
    sun/nio/ch/FileDispatcherImpl@write
    sun/nio/ch/IOUtil@writeFromNativeBuffer
    sun/nio/ch/IOUtil@write
    sun/nio/ch/FileChannelImpl@write
    org/apache/kafka/common/record/MemoryRecords@writeFullyTo
    org/apache/kafka/common/record/FileRecords@append
    kafka/log/LogSegment@append


  16. Observation: blocking `write`
    16
    •We found that the `write` system call sometimes blocked while other request handlers were waiting to acquire the lock
    •This could be the direct cause of the busy request handlers, i.e.
    •One request handler thread acquired the lock to write produced data to the log segment =>
    •`write` blocked somehow
    •Meanwhile, produce requests kept arriving for the same partition, but all of them got blocked


  17. Does `write` block?
    17
    •In the first place, `write` is generally not supposed to block
    •In Linux, all `write` does is copy data to a page in memory (the page cache)
    •The kernel's writeback task writes the data to the actual disk asynchronously
    [Diagram: the request-handler writes produced data to a page; the writeback task writes it to the HDD asynchronously]


  18. Observation: simultaneous blocking `sendfile`
    18
    Network thread (sendfile):
    @__GI_sendfile64
    sun/nio/ch/FileChannelImpl@transferTo0
    sun/nio/ch/FileChannelImpl@transferToDirectlyInternal
    sun/nio/ch/FileChannelImpl@transferToDirectly
    sun/nio/ch/FileChannelImpl@transferTo
    org/apache/kafka/common/network/PlaintextTransportLayer@transferFrom
    Request handler thread (write):
    @__write
    sun/nio/ch/FileDispatcherImpl@write0
    sun/nio/ch/FileDispatcherImpl@write
    sun/nio/ch/IOUtil@writeFromNativeBuffer
    sun/nio/ch/IOUtil@write
    sun/nio/ch/FileChannelImpl@write
    org/apache/kafka/common/record/MemoryRecords@writeFullyTo
    org/apache/kafka/common/record/FileRecords@append
    kafka/log/LogSegment@append


  19. How does Kafka use `sendfile`?
    19
    •`sendfile` is a system call that copies data from one file descriptor to another
    •Kafka uses `sendfile` to send records from the filesystem directly to the consumer's socket, avoiding a copy into user space (a minimal sketch follows below)
    [Diagram: the Consumer sends a fetch Request; Network Threads build the Response by `sendfile`-ing data from the Filesystem to the Socket]
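
    A minimal, hypothetical sketch of the zero-copy pattern this relies on (not Kafka's actual code): in the JDK, FileChannel.transferTo to a SocketChannel is what shows up as the transferTo0 frame in the profile and ends up as sendfile(2) on Linux. The port, file path and class name below are placeholders.

    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopyServer {
        public static void main(String[] args) throws Exception {
            Path logSegment = Path.of(args[0]);
            try (ServerSocketChannel server = ServerSocketChannel.open()) {
                server.bind(new InetSocketAddress(9999));
                try (SocketChannel client = server.accept();
                     FileChannel file = FileChannel.open(logSegment, StandardOpenOption.READ)) {
                    long position = 0;
                    long size = file.size();
                    while (position < size) {
                        // Copies directly from the page cache to the socket via sendfile(2),
                        // with no copy into user-space buffers. If the pages are cold,
                        // this call blocks on reading them from the device first.
                        position += file.transferTo(position, size - position, client);
                    }
                }
            }
        }
    }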


  20. Does `sendfile` block?
    20
    •Kafka is streaming middleware, so consumers usually read the latest data from the tail of the log
    •That data is "hot", so it likely resides in the page cache (memory) => very fast
    [Diagram: the Producer writes to the tail of the log segment file and the Consumer reads the same recently written pages]


  21. `sendfile` could block
    21
    •However, consumers sometimes read old data to catch up
    •If the data about to be copied by `sendfile` is not in the page cache, it has to be read from the device (HDD), which is significantly slower
    •refs: https://logmi.jp/tech/articles/320330
    "Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages"
    [Diagram: the Producer writes to the tail of the log segment file while the Consumer reads old pages that are no longer in the page cache]


  22. Actually, there was a lagging consumer
    22
    •This is very likely the cause of the blocking `sendfile`


  23. How do the blocking `sendfile` and the blocking `write` relate?
    23
    •Hypothesis: some lock contention inside the kernel?
    [Diagram: a request-handler thread and a network thread both accessing the same log segment file, possibly serialized by a lock]


  24. Experiment
    24
    •We wrote a small program to check whether the hypothesis is plausible (a minimal sketch follows below)
    •https://gist.github.com/ocadaruma/691a3527fe65bf38bc01aeca0dbd3416
    [Diagram: 1. evict the file from the page cache, 2. the Reader calls `sendfile`, 3. the Writer calls `write`]
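
    A minimal sketch of this kind of experiment (the actual code is in the gist above; this is an illustrative reconstruction): one thread reads the whole file via FileChannel.transferTo (sendfile) while another appends via FileChannel.write, and each reports how long its call took. Evicting the file from the page cache (step 1) and injecting the read delay are assumed to be done outside the program, and /dev/null is used as a stand-in sendfile target.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class SendfileWriteContention {
        public static void main(String[] args) throws Exception {
            Path file = Path.of(args[0]);

            Thread reader = new Thread(() -> {
                try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
                     FileChannel sink = FileChannel.open(Path.of("/dev/null"), StandardOpenOption.WRITE)) {
                    long start = System.currentTimeMillis();
                    // transferTo maps to sendfile(2); reading cold pages from a slow
                    // device makes this call take a long time
                    in.transferTo(0, in.size(), sink);
                    System.out.printf("Took %d ms to transfer%n", System.currentTimeMillis() - start);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });

            Thread writer = new Thread(() -> {
                try (FileChannel out = FileChannel.open(file, StandardOpenOption.APPEND)) {
                    long start = System.currentTimeMillis();
                    // Normally this just copies into the page cache and returns quickly;
                    // if it contends with the concurrent sendfile on the inode lock,
                    // it blocks until the sendfile returns
                    out.write(ByteBuffer.wrap(new byte[1024]));
                    System.out.printf("Took %d ms to write%n", System.currentTimeMillis() - start);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });

            reader.start();
            Thread.sleep(100); // let the sendfile start first
            writer.start();
            reader.join();
            writer.join();
        }
    }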


  25. Result: `write` indeed blocked during `sendfile`
    25
    •Injected a read delay into the device (using dm-delay)
    •Writing to the file blocked until the `sendfile` returned
    $ java App3 /path/to/test.bin
    [2022-08-08T15:38:07.702003] Took 500 ms to transfer
    [2022-08-08T15:38:07.702033] Took 450 ms to write
    ...


  26. Reading the Linux kernel source code
    26
    •`write` and `sendfile` can indeed contend
    •Next, we needed to identify the root cause of the contention
    •After reading the Linux kernel source code, we found that both require a lock on the inode
    • xfs_file_splice_read - https://github.com/torvalds/linux/blob/v3.10/fs/xfs/xfs_file.c#L336
    • xfs_file_buffered_aio_write - https://github.com/torvalds/linux/blob/v3.10/fs/xfs/xfs_file.c#L723


  27. inode lock contention in xfs
    27
    •There's a discussion on the linux-fsdevel mailing list
    •https://lore.kernel.org/linux-fsdevel/[email protected]
    "xfs: reduce ilock contention on buffered randrw workload"


  28. Overall flow
    28
    •1. A consumer started to read old data, which caused `sendfile` to block on reading data from the slow device (HDD)
    [Diagram: Network Threads use `sendfile` to copy old pages of the log segment file to the consumer's socket]

  29. Overall flow
    29
    •2. The blocked `sendfile` in turn blocked produce-request handling on `write` (for appending data), due to lock contention in xfs
    [Diagram: a Request Handler Thread, holding the log lock, is blocked in `write` while appending the producer's data to the log segment file]


  30. Overall flow
    30
    •3. Meanwhile, other produce requests were also blocked, due to lock contention inside Kafka (the log lock)
    [Diagram: other Request Handler Threads handling produce requests wait for the log lock held by the thread blocked in `write`]


  31. Overall flow
    31
    •4. The request handlers got busy and were delayed in dequeuing from the request queue
    •=> Resulted in degradation of the 99th-percentile produce response time


  32. Solution
    32
    •In short, the problem was:
    •Some parts of the active (last) log segment had been evicted from the page cache
    (produced data is appended to the active segment)
    •Hence blocking `sendfile` calls (for catch-up consumers) happened while there were `write` calls (for producers)
    •=> Set a smaller log segment size to ensure the active segment stays in the page cache
    •`log.segment.bytes` config (see the sketch below)
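
    A hypothetical sketch of applying the per-topic counterpart of this setting (`segment.bytes`; `log.segment.bytes` is the broker-wide default) via Kafka's AdminClient. The topic name, bootstrap server and the 256 MiB value are illustrative only, not the values used in the actual cluster:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class ShrinkSegmentSize {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            try (Admin admin = Admin.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                // Shrink the segment size so the active segment fits comfortably in the page cache
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("segment.bytes", String.valueOf(256L * 1024 * 1024)),
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
            }
        }
    }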


  33. Solution
    33
    •Side effect:
    •It results in many more log segment files =>
    •Kafka opens a file descriptor for each of them
    •It also potentially creates memory maps for them
    •So we adjusted `nofile` and `vm.max_map_count` accordingly
    •Result:
    •We have rarely observed performance degradation from the same cause since


  34. Conclusion
    34
    •Lock contention in xfs can cause subtle performance issues in Kafka
    •Async profiler's JFR output is quite useful for investigating the interactions between threads
