Introducing Decaton

LINE Developers

December 22, 2020
Transcript

  1. Introducing Decaton
     Haruki Okada, LINE Corporation

  2. Speaker
     • Haruki Okada
     • Software engineer at LINE Corporation
     • Developing company-wide Apache Kafka platform
     • Decaton maintainer
  3. What’s Decaton?
     • https://github.com/line/decaton
     • Asynchronous task processing library built on top of Apache Kafka
     • Battle-tested in various LINE services for several years
     • Open sourced in March 2020
     • Various features that make developing task processing simple
     • Can process a single partition in multiple threads
       • Helps improve consumer throughput, especially for I/O-intensive tasks
  4. Asynchronous task processing
     • One of the typical Kafka use cases at LINE
     • I/O-intensive background tasks
     (Diagram: Web API servers produce tasks, including external I/O, to Kafka; a task processor processes them asynchronously and accesses storage.)
  5. Why not plain KafkaConsumer?
     (Diagram: a partition holding records 1 to 5, read by a single consumer thread running a poll() loop.)
  6. Sequential processing
     • Processing model in major frameworks:
       • Kafka Streams
       • Spring Kafka
     (Diagram: the consumer thread runs doProcess(1), doProcess(2), doProcess(3), ... one after another on the partition's records; each call takes one process latency.)
  7. Consumer throughput is a problem
     • Per-partition throughput = 1 / (process latency per record)
     • If a process includes I/O with 10 ms latency => 100 tasks / second at max
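
The bound on this slide can be written out as a worked equation. The symbols here are notation introduced for illustration, not taken from the slides: T is per-partition throughput, \ell is the process latency per record, and n is the number of threads working on one partition. The second line is an idealized upper bound that assumes tasks are I/O-bound and independent of each other.

```latex
% Sequential processing: one record at a time per partition
T = \frac{1}{\ell}, \qquad
\ell = 10\,\mathrm{ms} \;\Rightarrow\; T = \frac{1}{0.010\,\mathrm{s}} = 100\ \text{tasks/s}

% Idealized bound when n threads process the same partition concurrently
T(n) \le \frac{n}{\ell}
```
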
  8. Why not add more partitions?
     • It’s difficult to estimate the required concurrency from the beginning
     • However,
       • Adding partitions often requires contacting the cluster administrator
       • Adding partitions has side effects
         • Message ordering breaks temporarily
         • More partitions tend to generate smaller producer batches
         • More open file descriptors / memory-mapped files
     • Not preferable in LINE’s circumstances
       • i.e. a single, multi-tenant shared Kafka cluster
  9. Why not just process async?

  10. At least once is broken
      1. Fetch 1 to 5 by poll()
      2. Submit tasks to async executor
      3. Commit latest offset “5”, as poll() is already completed
      4. Process async
      5. Consumer crash!
      6. Other instance takes over from offset “5”
      • => LOST offsets 3, 4 (still in flight when the consumer crashed)
  11. Why not just commit in batch?

  12. Head-of-line blocking is a problem
      (Diagram: records 1 to 5 are the current batch returned by poll(); the consumer is waiting for a single outlier, “4”, to complete, so the following records 6 to 10 in the partition are blocked.)
  13. Ideal commit management
      • Continue processing without waiting for “4”
      • Commit “3”
      (Diagram: the partition's records 1 to 5 and 6 to 10, both returned by poll(); processing moves on to 6 to 10 while “4” is still running.)
  14. Ideal commit management
      • In Decaton, this offset is called the “Watermark”
        • i.e. the highest offset such that all preceding offsets have already been processed
      (Diagram: the same partition, records 1 to 10 returned by poll(), with the watermark marked.)
  15. What we need is:
      • A mechanism to track watermarks
      • Note that any offset could be completed by several threads in arbitrary order
      (Diagram: records 1 to 5 being processed by multiple processor threads.)
  16. How?
      • We can calculate watermarks by:
        • Iterating over completed offsets in ascending order, then checking continuity by mapping them to the fetched offsets
        • i.e. sorting out-of-order completions by offset
        • => Priority queue?
      (Diagram: completed offsets 4, 2, 1 compared against fetched offsets 1 to 5 to find the watermark.)
  17. Initial approach: Priority queue
      1. Register fetched offsets as “pending” (consumer thread)
      2. Submit tasks (5, 3, 1 to one processor thread; 4, 2 to another)
      3. Put completed offsets concurrently (priority queue)
      4. Zip offsets => watermark = 2
      (Diagram: completed offsets 4, 2, 1 against fetched offsets 1 to 5.)
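
The four steps of the priority-queue approach can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Decaton's actual code: the class name `PriorityQueueWatermarkTracker` and its methods are hypothetical, offsets are assumed to be registered in ascending order, and `-1` is used as a stand-in for "nothing committable yet". Every method takes the same lock because the queues are shared between the consumer thread and the processor threads.

```java
import java.util.ArrayDeque;
import java.util.PriorityQueue;

// Hypothetical sketch of priority-queue-based watermark tracking.
// Names are illustrative only; this is not Decaton's API.
class PriorityQueueWatermarkTracker {
    // Offsets fetched by the consumer thread, in ascending order.
    private final ArrayDeque<Long> pending = new ArrayDeque<>();
    // Offsets completed by processor threads, in arbitrary arrival order.
    private final PriorityQueue<Long> completed = new PriorityQueue<>();
    // -1 means no offset is committable yet (assumption of this sketch).
    private long watermark = -1;

    // Step 1: the consumer thread registers a fetched offset as pending.
    synchronized void register(long offset) {
        pending.addLast(offset);
    }

    // Step 3: a processor thread reports a completed offset.
    synchronized void complete(long offset) {
        completed.add(offset);
    }

    // Step 4: "zip" both queues while their heads match.
    synchronized long updateWatermark() {
        while (!pending.isEmpty() && !completed.isEmpty()
                && pending.peekFirst().equals(completed.peek())) {
            watermark = pending.pollFirst();
            completed.poll();
        }
        return watermark;
    }

    public static void main(String[] args) {
        PriorityQueueWatermarkTracker tracker = new PriorityQueueWatermarkTracker();
        for (long offset = 1; offset <= 5; offset++) {
            tracker.register(offset);
        }
        // Completions arrive out of order, as on the slide.
        tracker.complete(4);
        tracker.complete(2);
        tracker.complete(1);
        System.out.println(tracker.updateWatermark()); // prints 2
    }
}
```

Offset 4 stays in the completed queue until 3 finishes, at which point the watermark can jump straight to 4.
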
  18. Deployed to production
      (Diagram: Web API servers => Kafka => Decaton processors => HBase Put, at 1 million tasks / sec.)
  19. Watermark tracking became a bottleneck
      (Figure: profiling result highlighting ObjectMonitor.)
  20. Lock was necessary
      • To protect the priority queue from concurrent mutation
        • Appending to the queue
        • Removing offsets up to the watermark
          • To keep the queue length finite
      (Diagram: two processor threads each call lock() before touching the shared queue of completed offsets 4, 2, 1.)
  21. How can we improve?
      • Problem: multiple processor threads mutate a shared object (the priority queue)
      (Diagram: two processor threads each call lock() on the queue holding 4, 2, 1.)
  22. Get rid of mutating a shared object
      • Each processor thread only mutates the state of the offsets it’s responsible for
      (Diagram: pending offsets 1 to 5 as separate cells; each processor thread calls markComplete() on its own offsets only.)
  23. Revised: Lock-free approach
      1. Initialize states for fetched offsets (1..5) as “not completed” & set the watermark pointer (consumer thread)
      (Diagram: a fixed-length ring buffer with slots 0 to 7, shared by the consumer thread and two processor threads.)
  24. Revised: Lock-free approach
      2. Submit tasks (5, 3, 1 to one processor thread; 4, 2 to another)
      3. Mark offsets as “complete” (processor threads)
      4. Advance the watermark pointer (consumer thread)
      (Diagram: the same ring buffer with slots 0 to 7.)
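
The lock-free steps above can be sketched as a fixed-length ring buffer of per-offset states. This is an illustration under stated assumptions rather than Decaton's implementation: the name `RingBufferWatermarkTracker` is hypothetical, offsets are assumed to be contiguous, only one consumer thread calls `register` and `advanceWatermark`, and `register` returning `false` stands in for whatever backpressure a real consumer would apply when the buffer is full.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch of lock-free watermark tracking over a ring buffer.
// Names are illustrative only; this is not Decaton's API.
class RingBufferWatermarkTracker {
    private static final int PENDING = 0;
    private static final int COMPLETE = 1;

    // One slot per in-flight offset; each element has volatile semantics.
    private final AtomicIntegerArray states;
    private final int capacity;
    // Both fields below are touched by the consumer thread only.
    private long watermark;      // all offsets <= watermark are complete
    private long lastRegistered; // highest offset registered so far

    RingBufferWatermarkTracker(int capacity, long firstOffset) {
        this.capacity = capacity;
        this.states = new AtomicIntegerArray(capacity);
        this.watermark = firstOffset - 1;
        this.lastRegistered = firstOffset - 1;
    }

    // Step 1 (consumer thread): mark a fetched offset as not completed.
    // Returns false when the buffer is full and the caller should wait.
    boolean register(long offset) {
        if (offset != lastRegistered + 1) {
            throw new IllegalArgumentException("sketch assumes contiguous offsets");
        }
        if (offset - watermark > capacity) {
            return false;
        }
        states.set(slot(offset), PENDING);
        lastRegistered = offset;
        return true;
    }

    // Step 3 (any processor thread): touches only this offset's own slot.
    void markComplete(long offset) {
        states.set(slot(offset), COMPLETE);
    }

    // Step 4 (consumer thread): move the pointer over completed slots.
    long advanceWatermark() {
        while (watermark < lastRegistered
                && states.get(slot(watermark + 1)) == COMPLETE) {
            watermark++;
        }
        return watermark;
    }

    private int slot(long offset) {
        return (int) Math.floorMod(offset, (long) capacity);
    }

    public static void main(String[] args) {
        RingBufferWatermarkTracker tracker = new RingBufferWatermarkTracker(8, 1);
        for (long offset = 1; offset <= 5; offset++) {
            tracker.register(offset);
        }
        tracker.markComplete(4);
        tracker.markComplete(2);
        tracker.markComplete(1);
        System.out.println(tracker.advanceWatermark()); // prints 2
    }
}
```

A slot is reused only after the watermark has passed the offset that previously occupied it, which is why no lock is needed: processor threads never write to the same slot at the same time, and the pointer itself has a single writer.
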
  25. No longer a bottleneck

  26. Summary
      • Optimized watermark tracking is the core of Decaton
      • Enables multi-threaded processing of a single partition
        • While minimizing the impact of head-of-line blocking
  27. Ordering semantics
      • In Decaton, the processing order guarantee is relaxed from “per partition” to “per key”
      (Diagram: a partition holds offsets 1 to 5 with keys a, b, a, b, a. Internal queues split them by key: offsets 1, 3, 5 (key a) go to one processor thread and offsets 2, 4 (key b) go to another.)
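
One way to picture per-key ordering is to route each record to an internal queue chosen from its key, so that records sharing a key are always handled by the same thread in offset order. The sketch below is only an illustration: `KeyRouter` and its methods are hypothetical names, and the hash function (`Arrays.hashCode` over the UTF-8 key bytes) is an arbitrary choice for this example, not necessarily what Decaton uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of key-based routing to per-thread queues.
// Names and hash choice are illustrative only; this is not Decaton's API.
class KeyRouter {
    // Pick an internal queue for a key; the same key always maps to the same queue.
    static int queueIndex(String key, int threads) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, threads);
    }

    // Assign consecutive offsets (starting at firstOffset) to queues by key.
    static List<List<Long>> route(String[] keys, long firstOffset, int threads) {
        List<List<Long>> queues = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            queues.add(new ArrayList<>());
        }
        for (int i = 0; i < keys.length; i++) {
            queues.get(queueIndex(keys[i], threads)).add(firstOffset + i);
        }
        return queues;
    }

    public static void main(String[] args) {
        // Offsets 1..5 with keys a, b, a, b, a, as on the slide.
        String[] keys = {"a", "b", "a", "b", "a"};
        System.out.println(route(keys, 1, 2)); // prints [[1, 3, 5], [2, 4]]
    }
}
```

Order is preserved inside each queue, while nothing is guaranteed between queues, which is exactly the relaxation from “per partition” to “per key”.
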
  28. Benchmark comparison
      • https://github.com/ocadaruma/decaton-benchmark-comparison
      • Benchmark conditions:
        • Process latency: 10 ms per task
        • Partition count: 3
      • In Decaton, throughput scales linearly with the number of threads
      (Chart: throughput in msg/sec, axis from 0 to 3000, for Kafka Streams, Spring Kafka, Decaton, and Decaton with 10 threads / partition.)
  29. Getting started with Decaton
      • 1. Define task protocol
      • 2. Implement task producer
      • 3. Implement task processor
  30. 1. Define task protocol
      • Protocol between task producer and processor
      • Can be defined in an arbitrary format
      • Protobuf as an example
  31. 2. Implement task producer

  32. 3. Implement task processor
      • That’s it. Simple enough, isn’t it?
  33. Built-in features
      • Decaton provides various useful features for task processing
      • Derived from requirements of real-world LINE service development
        • Rate limiting
        • Retry queueing
        • And more
  34. Rate limiting
      • Sudden traffic spike
        • Abuse
        • Big campaign
      (Diagram: Web API servers => Kafka => Decaton processor => call to an external web service. The processor throttles its processing rate, and the excess is buffered on Kafka.)
  35. Rate limiting
      • Can be enabled with only a few lines of code
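
To make the idea of throttling the processing rate concrete, here is a minimal token-bucket sketch. It illustrates the general technique only and is not how Decaton's rate limiting is implemented or configured; the class `TokenBucket` and its parameters are hypothetical. Time is passed in explicitly (in nanoseconds) so the behavior is deterministic.

```java
// Hypothetical token-bucket sketch of rate limiting.
// Illustrates the general technique; this is not Decaton's API.
class TokenBucket {
    private final double ratePerSec; // tokens added per second
    private final double capacity;   // maximum burst size
    private double tokens;
    private long lastNanos;

    TokenBucket(double ratePerSec, double capacity, long nowNanos) {
        this.ratePerSec = ratePerSec;
        this.capacity = capacity;
        this.tokens = capacity; // start with a full bucket
        this.lastNanos = nowNanos;
    }

    // Returns true if a task may be processed now, false if it should wait.
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSec = (nowNanos - lastNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSec * ratePerSec);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2.0, 2.0, 0L); // 2 tasks/sec, burst of 2
        System.out.println(bucket.tryAcquire(0L));           // true
        System.out.println(bucket.tryAcquire(0L));           // true
        System.out.println(bucket.tryAcquire(0L));           // false: bucket is empty
        System.out.println(bucket.tryAcquire(500_000_000L)); // true: one token refilled
    }
}
```

When `tryAcquire` returns false, a consumer can simply stop taking work for a while; the unprocessed tasks remain in Kafka, which acts as the buffer.
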
  36. Retry queueing
      • Retry until it succeeds?
        • => Could block subsequent tasks (though the impact is mitigated thanks to Decaton’s commit management model)
      • Just give up the task?
        • => Not preferable
      (Diagram: a Decaton processor writes to storage under high load, which fails intermittently.)
  37. Retry queueing
      (Diagram: a Decaton processor writes to storage under high load, which fails intermittently; on failure the processor produces a retry task to Kafka and retries after a backoff.)
  38. Retry queueing
      • 1. Prepare a retry topic in advance
      • 2. Enable the retry queueing feature in the processor
  39. Conclusion
      • Decaton is a battle-tested Kafka consumer framework
        • Best suited for I/O-intensive workloads
      • Provides various features that suit many situations in task processing
      • Give it a try!
      • Feedback and contributions are welcome
  40. Thank you