
Introducing Decaton



LINE Developers

December 22, 2020

Transcript

  1. Speaker • Haruki Okada • Software engineer at LINE Corporation • Developing the company-wide Apache Kafka platform • Decaton maintainer
  2. What’s Decaton? • https://github.com/line/decaton • Asynchronous task processing library built on top of Apache Kafka • Battle-tested in various LINE services for several years • Open-sourced in March 2020 • Various features that make task processing development simple • Can process a single partition with multiple threads • Improves consuming throughput, especially for I/O-intensive tasks
  3. Asynchronous task processing • One of the typical Kafka usages at LINE • I/O-intensive background tasks • Web APIs produce tasks to Kafka; a task processor consumes them asynchronously, performing external I/O against storage
  4. Why not plain KafkaConsumer? • Partition: … | 5 | 4 | 3 | 2 | 1 • A single consumer thread drains the partition in a poll() loop
  5. Sequential processing • Partition: … | 5 | 4 | 3 | 2 | 1 • The consumer thread runs doProcess(1), doProcess(2), doProcess(3), … one at a time, each taking the full process latency • This is the processing model in major frameworks: • Kafka Streams • Spring Kafka
  6. Consumer throughput is a problem • Per-partition throughput = 1 / (process latency per record) • If processing includes I/O with 10 ms latency => 100 tasks/second at most
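The throughput ceiling above is simple arithmetic; a minimal sketch (class and method names are illustrative, not from the talk):

```java
public class ThroughputEstimate {
    // Per-partition throughput ceiling under sequential processing:
    // one record at a time, so throughput = 1 / (process latency per record).
    static long maxTasksPerSecond(long processLatencyMillis) {
        return 1000 / processLatencyMillis;
    }

    public static void main(String[] args) {
        // 10 ms of I/O per record, as in the slide.
        System.out.println(maxTasksPerSecond(10)); // 100 tasks/second at most
    }
}
```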
  7. Why not add more partitions? • It’s difficult to estimate the required concurrency from the beginning • However: • Adding partitions often requires contacting the cluster administrator • Adding partitions has side effects • Message ordering breaks temporarily • More partitions tend to generate smaller producer batches • More open file descriptors / memory-mapped files • Not preferable in LINE’s circumstances • i.e. a single, multi-tenant shared Kafka cluster
  8. At-least-once is broken • | 5 | 4 | 3 | 2 | 1 • 1. Fetch 1 to 5 by poll() • 2. Submit tasks to an async executor • 3. Commit the latest offset “5”, as poll() is already completed • 4. Process asynchronously • 5. Consumer crashes!!! • 6. Another instance takes over from offset “5” • => Offsets 3 and 4 are LOST
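The failure sequence above can be modeled without Kafka. The toy below (names and the crash point are assumptions for illustration, not KafkaConsumer or Decaton code) commits the latest polled offset while processing is still in flight, then resumes from it:

```java
import java.util.*;

// Toy model of the broken pattern: commit right after poll(), "crash",
// and resume from the committed offset.
public class BrokenAtLeastOnce {
    // Returns the offsets that are lost after the crash.
    static List<Integer> simulate(List<Integer> fetched, Set<Integer> finishedBeforeCrash) {
        int committed = Collections.max(fetched);   // 3. commit "5" right after poll()
        List<Integer> lost = new ArrayList<>();
        for (int offset : fetched) {
            // 6. the new consumer resumes after the committed offset, so any
            // record at or below it that never finished processing is skipped.
            if (offset <= committed && !finishedBeforeCrash.contains(offset)) {
                lost.add(offset);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // 1./2. fetch 1..5 and hand them to async workers;
        // 4./5. the consumer crashes while offsets 3 and 4 are still in flight.
        System.out.println(simulate(List.of(1, 2, 3, 4, 5), Set.of(1, 2, 5))); // [3, 4]
    }
}
```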
  9. Head-of-line blocking is a problem • Records returned by poll() (current batch): | 5 | 4 | 3 | 2 | 1 • Everything waits for a single outlier “4” to complete • The next records in the partition, | 10 | 9 | 8 | 7 | 6 | …, are blocked…
  10. Ideal commit management • Partition: … | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 • Continue processing without waiting for “4” • Commit “3”
  11. Ideal commit management • In Decaton, this offset is called the “watermark” • i.e. the highest offset for which all preceding offsets have already been processed
  12. What we need is: • A mechanism to track watermarks • Note that offsets may be completed by several processor threads in arbitrary order
  13. How? • We can calculate watermarks by: • Iterating over completed offsets in ascending order, then checking continuity by mapping them to fetched offsets • i.e. sorting out-of-order offset completions by offset • => A priority queue? • Completed offsets: 4 | 2 | 1 • Fetched offsets: … | 5 | 4 | 3 | 2 | 1 • Watermark: 2
  14. Initial approach: priority queue • 1. The consumer thread registers fetched offsets (… | 5 | 4 | 3 | 2 | 1) as “pending” • 2. It submits tasks to processor threads (5, 3, 1 and 4, 2) • 3. Processor threads put completed offsets (4 | 2 | 1) into the priority queue concurrently • 4. Zip completed offsets against fetched offsets => watermark = 2
  15. Deployed to production • Web API servers produce tasks to Kafka; Decaton processors consume them and issue Puts to HBase • 1 million tasks/sec
  16. A lock was necessary • To protect the priority queue from concurrent mutation • Appending to the queue • Removing offsets up to a watermark • To keep the queue length finite
  17. How can we improve? • Problem: multiple processor threads mutate a shared object (the priority queue) under a lock
  18. Get rid of the mutated shared object • Each processor thread only mutates the state of the offsets it’s responsible for • Pending offsets | 1 | 2 | 3 | 4 | 5 | each get their own slot; threads call markComplete() on their own offsets
  19. Revised: lock-free approach • A fixed-length ring buffer (slots 0 1 2 3 4 5 6 7) • 1. The consumer thread initializes states for fetched offsets (1..5) as “not completed” and sets the watermark pointer
  20. Revised: lock-free approach • 2. The consumer thread submits tasks (5, 3, 1 and 4, 2) to processor threads • 3. Processor threads mark their offsets as “complete” • 4. The consumer thread advances the watermark pointer
  21. Summary • Optimized watermark tracking is the core of Decaton • It enables multi-threaded processing of a single partition • while minimizing the impact of head-of-line blocking
  22. Ordering semantics • In Decaton, the processing-order guarantee is relaxed from “per partition” to “per key” • Partition records … | 5 | 4 | 3 | 2 | 1 with keys a, b, a, b, a are routed to internal queues by key: | 5 | 3 | 1 | (key a) and | 4 | 2 | (key b), each handled by its own processor thread
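The per-key routing can be sketched as below. Assigning queues by key hash is an assumption for illustration, not Decaton’s exact dispatch code:

```java
import java.util.ArrayList;
import java.util.List;

// Records that share a key land in the same internal queue (served by one
// processor thread), so per-key order is preserved while different keys are
// processed concurrently.
public class KeyedDispatch {
    static List<List<Integer>> dispatch(String[] keys, int[] offsets, int queueCount) {
        List<List<Integer>> queues = new ArrayList<>();
        for (int i = 0; i < queueCount; i++) queues.add(new ArrayList<>());
        for (int i = 0; i < keys.length; i++) {
            // Same key => same queue; floorMod keeps the index non-negative.
            queues.get(Math.floorMod(keys[i].hashCode(), queueCount)).add(offsets[i]);
        }
        return queues;
    }

    public static void main(String[] args) {
        // Partition contents from the slide: offsets 1..5 with keys a, b, a, b, a.
        String[] keys = {"a", "b", "a", "b", "a"};
        int[] offsets = {1, 2, 3, 4, 5};
        System.out.println(dispatch(keys, offsets, 2)); // [[2, 4], [1, 3, 5]]
    }
}
```

Within each queue the offsets stay in partition order; only cross-key interleaving is given up.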
  23. Benchmark comparison • Throughput (msg/sec), roughly 0–3000: Kafka Streams vs. Spring Kafka vs. Decaton vs. Decaton (10 threads/partition) • https://github.com/ocadaruma/decaton-benchmark-comparison • Benchmark conditions: • Process latency: 10 ms per task • Partition count: 3 • In Decaton, throughput can be scaled linearly by increasing threads
  24. Getting started with Decaton • 1. Define the task protocol • 2. Implement a task producer • 3. Implement a task processor
  25. 1. Define the task protocol • The protocol between task producer and processor • Can be defined in an arbitrary format • Protobuf as an example
  26. Built-in features • Decaton provides various useful features for task processing • Born from the requirements of real-world LINE service development • Rate limiting • Retry queueing • And more
  27. Rate limiting • A sudden traffic spike hits the Web APIs • Abuse • A big campaign • Tasks are buffered on Kafka, and the Decaton processor throttles its processing rate when calling the external web service
  28. Retry queueing • A storage backend under high load fails intermittently • Retry until it succeeds? • => Could block subsequent tasks (though the impact is mitigated by Decaton’s commit management model) • Just give up on the task? • => Not preferable
  29. Retry queueing • 1. Prepare a retry topic in advance • 2. Enable the retry queueing feature in the processor
  30. Conclusion • Decaton is a battle-tested Kafka consumer framework • Best suited for I/O-intensive workloads • Provides various features that fit many task processing situations • Give it a try! • Feedback and contributions are welcome