Improve Processing Performance for PayPay Cashback

PayPay Corporation.

October 27, 2021

Transcript

  1. Improve Processing Performance for PayPay Cashback
    How we 4x the throughput using Akka Stream
    XIAO Yang
    Oct, 2021

  2. Self Introduction
    - Name: 肖杨 (XIAO Yang)
    - From: Chengdu, Sichuan, China
    - PayPay:
      - Since 2019.10
      - Tech Lead in CLM team
    - Interest:
      - Functional Programming
      - Distributed systems
      - Akka (concurrency toolkit for the JVM, written in Scala)

  3. Table of Contents
    - What Cashback is in PayPay
    - How Cashback is given in real time
    - The performance issue that occurred
    - How we improved the performance

  4. Cashback in PayPay

  5. General Architecture

  6. Akka Stream/Alpakka
    - Akka Stream
      - Stream processing library
      - Code reads like the flow chart
      - Compositional building blocks
      - Back-pressure support
    - Alpakka Kafka
      - Kafka connector backed by Akka Stream
      - Fine-tuned Kafka Consumer/Producer for high performance
    From: Reactive stream processing using Akka streams
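
Back-pressure, as listed on this slide, means a slow downstream stage limits how fast the upstream stage may emit. The sketch below is only an analogy written with Python's asyncio, not Akka Stream's API: a bounded `asyncio.Queue` (capacity 2, an arbitrary choice) suspends the producer whenever the consumer falls behind. The names `producer`, `consumer`, and `run_pipeline` are hypothetical.

```python
import asyncio

QUEUE_CAPACITY = 2  # arbitrary bound chosen for illustration


async def producer(queue, count, observed_sizes):
    for i in range(count):
        # put() suspends while the queue is full, so the producer
        # cannot run ahead of the consumer by more than the capacity.
        await queue.put(i)
        observed_sizes.append(queue.qsize())
    await queue.put(None)  # sentinel: no more items


async def consumer(queue, seen):
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)  # simulate a slow downstream stage
        seen.append(item)


async def run_pipeline(count=10):
    queue = asyncio.Queue(maxsize=QUEUE_CAPACITY)
    seen, observed_sizes = [], []
    await asyncio.gather(
        producer(queue, count, observed_sizes),
        consumer(queue, seen),
    )
    return seen, observed_sizes


seen, observed_sizes = asyncio.run(run_pipeline())
print(seen)
print(max(observed_sizes))
```

Akka Stream signals demand from downstream to upstream rather than relying on a shared queue, so this only illustrates the effect, not the mechanism.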

  7. Akka Stream/Alpakka

  8. Big Campaign in Oct 2019
    - Users: 15 million
    - Peak traffic: 300 TPS

  9. What happened
    - Cashback could not be shown for some transactions once the traffic rate exceeded 250/s
    - The cashback process could not support higher traffic
    - Time from when a transaction is made until cashback is granted increased
    - Some partitions stopped
    - Consumers for different topics affected each other

  10. Step 0: Identify the Bottleneck
    - Visualization
      - Lag of each stage
      - Throughput of each stage

  11. Performance Dashboard
    - Monitor each Stage
      - Processing lag
      - Throughput
    - External Dependencies
      - Response time
      - Throughput
    - Resource usage
      - CPU/Memory
      - DB

  12. Step 1: Optimize the Process
    Asynchronous operator in Akka Stream
    - mapAsync: accepts a function that returns a Future, plus a parallelism setting
      - Concurrent processing
        - Up to n (parallelism) elements in flight
        - Can use a separate thread pool
        - Does not block the caller thread
      - In-order processing
        - Order is preserved when committing Kafka messages
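
The two properties named on this slide, bounded concurrency and in-order results, can be sketched outside Akka. The example below is a minimal Python asyncio analogy, not the `mapAsync` implementation: a semaphore caps the number of calls in flight and `asyncio.gather` returns results in input order even when later elements finish first. `map_async_ordered` and `slow_double` are hypothetical names.

```python
import asyncio


async def map_async_ordered(items, fn, parallelism):
    """Apply async `fn` to each item with at most `parallelism` calls
    in flight; results are returned in input order."""
    semaphore = asyncio.Semaphore(parallelism)

    async def run(item):
        async with semaphore:
            return await fn(item)

    # gather() orders results by argument position, not completion time.
    return await asyncio.gather(*(run(item) for item in items))


in_flight = 0
max_in_flight = 0


async def slow_double(x):
    """Stand-in for an asynchronous call; larger inputs finish sooner."""
    global in_flight, max_in_flight
    in_flight += 1
    max_in_flight = max(max_in_flight, in_flight)
    await asyncio.sleep(0.01 * (5 - x))
    in_flight -= 1
    return x * 2


results = asyncio.run(map_async_ordered([1, 2, 3, 4], slow_double, parallelism=2))
print(results)
print(max_in_flight)
```

Keeping results in input order is what allows offsets to be committed in sequence, as the last bullet of the slide notes.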

  13. Step 1: Optimize the Process
    Original Configuration: Threads are blocked
    - High parallelism
      - 120 Futures will be created at the same time
    - Default executor
      - Java ForkJoinPool with parallelism 8

    | Stage                        | Type                    | Parallelism |
    |------------------------------|-------------------------|-------------|
    | Save Incoming Event          | DB Write (Blocking)     | 10          |
    | Cashback Evaluation          | API Call (Non-blocking) | 50          |
    | Update Event + Save Cashback | DB Write (Blocking)     | 10          |
    | SQS Enqueue                  | API Call (Non-blocking) | 50          |

  14. Step 1: Optimize the Process
    New Configuration: Less blocking
    - For blocking processes
      - Reduce parallelism
      - Separate fixed-size thread pool
    - For non-blocking processes
      - Keep enough parallelism
      - Use the default executor (no context switch)

    Theoretical Max Throughput = 1000 / Latency (ms) * Parallelism * 30

    | Stage                        | Type                    | Latency     | Parallelism | Theoretical Max Throughput |
    |------------------------------|-------------------------|-------------|-------------|----------------------------|
    | Save Incoming Event          | DB Write (Blocking)     | 20ms~30ms   | 2           | 2000 ~ 3000 TPS            |
    | Cashback Evaluation          | API Call (Non-blocking) | 100ms~300ms | 20          | 2000 ~ 6000 TPS            |
    | Update Event + Save Cashback | DB Write (Blocking)     | 30ms~40ms   | 2           | 1500 ~ 2000 TPS            |
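
The throughput figures in this slide's table follow from its formula, 1000 / Latency * Parallelism * 30, with latency in milliseconds. The deck does not say what the factor 30 counts, so the sketch below treats it as a plain multiplier (it may be the number of parallel stream instances, but that is an assumption). The function name `max_throughput_tps` is hypothetical.

```python
def max_throughput_tps(latency_ms, parallelism, multiplier=30):
    """Slide formula: 1000 / latency * parallelism * multiplier (latency in ms).

    `multiplier` is the unexplained constant 30 from the slide.
    """
    # Multiply before dividing so the slide's round numbers come out exact.
    return 1000 * parallelism * multiplier / latency_ms


# (stage, best-case latency in ms, worst-case latency in ms, parallelism)
stages = [
    ("Save Incoming Event", 20, 30, 2),
    ("Cashback Evaluation", 100, 300, 20),
    ("Update Event + Save Cashback", 30, 40, 2),
]

# Worst-case latency gives the low end of the range, best-case the high end.
throughput_ranges = {
    name: (
        max_throughput_tps(worst, parallelism),
        max_throughput_tps(best, parallelism),
    )
    for name, best, worst, parallelism in stages
}

for name, (low, high) in throughput_ranges.items():
    print(f"{name}: {low:.0f} ~ {high:.0f} TPS")
```

By this estimate the slowest stage, the blocking "Update Event + Save Cashback" write at 1500 ~ 2000 TPS, sets the ceiling for the whole pipeline.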

  15. Mid Result
    - Fully handles 700 TPS in performance test
      - Whole process can finish within 2s
    - Up to 1200 TPS for Forward Stream
      - Shows cashback result only
    - 4x+ vs. 250 TPS

  16. Can we do better?

  17. Step 2: Remove Bottleneck
    - Save and update Event in short interval
    - SQS operation took time
    - Update Cashback twice in short interval
    - Read after write by ID

  18. Step 2: Remove Bottleneck
    - Write once
    - Update once
    - No need to read
    - Retry in-progress events got lost

  19. Result
    - Fully handles 2000+ TPS in performance test
      - Whole process can finish within 2s
    - 2500+ TPS for Forward Stream process
      - Shows cashback result only
    - 8x+ vs. 250 TPS
    - And now
      - Handles daily traffic from a 40 million user base with no lag

  20. Summary
    - Akka Stream/Alpakka Kafka helps
      - Easy to control and tune the flow
      - Takes care of back-pressure, in-order processing, global error handling
    - Be careful about blocking operations
      - The asynchronous operator in Akka Stream again helps a lot
    - Reduce unnecessary operations before detailed tuning
    - As user traffic continues to grow, there will always be new challenges
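
The advice about blocking operations can be illustrated with a small sketch: blocking work runs on its own fixed-size pool so it cannot starve the threads that drive non-blocking work. This is a Python asyncio analogy under stated assumptions, not the deck's Akka code. The pool size of 2 mirrors the parallelism the deck uses for blocking DB writes; `save_event_blocking`, `evaluate_cashback`, and `process` are hypothetical stand-ins.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Dedicated fixed-size pool reserved for blocking calls (size is an assumption).
blocking_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="blocking-db")


def save_event_blocking(event_id):
    """Stand-in for a blocking DB write; returns the name of the thread that ran it."""
    time.sleep(0.01)
    return threading.current_thread().name


async def evaluate_cashback(event_id):
    """Stand-in for a non-blocking API call; stays on the event loop."""
    await asyncio.sleep(0.01)
    return f"cashback-{event_id}"


async def process(event_id):
    loop = asyncio.get_running_loop()
    # Blocking step is handed to the dedicated pool instead of the event loop.
    thread_name = await loop.run_in_executor(blocking_pool, save_event_blocking, event_id)
    cashback = await evaluate_cashback(event_id)
    return thread_name, cashback


async def main():
    return await asyncio.gather(*(process(i) for i in range(6)))


outcomes = asyncio.run(main())
blocking_pool.shutdown()

thread_names = {name for name, _ in outcomes}
cashbacks = [cashback for _, cashback in outcomes]
print(sorted(thread_names))
print(cashbacks)
```

However many events are in flight, at most two threads ever execute the blocking step, which keeps the blocking load predictable.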

  21. Thank You