Slide 1

Slide 1 text

Improve Processing Performance for PayPay Cashback
How we 4x the throughput using Akka Stream
XIAO Yang
Oct, 2021

Slide 2

Slide 2 text

Self Introduction
- Name: 肖杨 (XIAO Yang)
- From: Chengdu, Sichuan, China
- PayPay:
  - Since 2019.10
  - Tech Lead in CLM team
- Interest:
  - Functional Programming
  - Distributed systems
  - Akka (concurrency toolkit for the JVM, written in Scala)

Slide 3

Slide 3 text

ToC
- What is Cashback in PayPay
- How Cashback is given in real-time
- The performance issue that happened
- How we improved the performance

Slide 4

Slide 4 text

Cashback in PayPay

Slide 5

Slide 5 text

General Architecture

Slide 6

Slide 6 text

Akka Stream/Alpakka
- Akka Stream
  - Stream processing library
  - Code reads the same as the flow chart
  - Compositional building blocks
  - Back-pressure support
- Alpakka Kafka
  - Kafka connector backed by Akka Stream
  - Fine-tuned Kafka Consumer/Producer for high performance
From: Reactive stream processing using Akka streams
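
Back-pressure means that a slow downstream stage limits how fast the upstream stage may emit, so the buffer between them stays bounded. The sketch below illustrates only that idea with a bounded `asyncio.Queue` in plain Python. It is not Akka Stream's implementation (Akka Stream signals demand between stages), and `run_pipeline`, `buffer_size`, and the `None` end-of-stream sentinel are hypothetical names chosen for this illustration.

```python
import asyncio

async def run_pipeline(n_items, buffer_size):
    """Fast producer, slow consumer, bounded buffer in between."""
    queue = asyncio.Queue(maxsize=buffer_size)
    consumed = []
    peak_buffered = 0

    async def producer():
        nonlocal peak_buffered
        for i in range(n_items):
            # put() suspends while the buffer is full: this is the back-pressure.
            await queue.put(i)
            peak_buffered = max(peak_buffered, queue.qsize())
        await queue.put(None)  # hypothetical end-of-stream sentinel

    async def consumer():
        while True:
            item = await queue.get()
            if item is None:
                break
            await asyncio.sleep(0.001)  # simulate a slow downstream stage
            consumed.append(item)

    await asyncio.gather(producer(), consumer())
    return consumed, peak_buffered
```

Calling `asyncio.run(run_pipeline(20, 3))` processes all 20 items in order while never holding more than 3 of them in the buffer, no matter how much faster the producer is.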

Slide 7

Slide 7 text

Akka Stream/Alpakka

Slide 8

Slide 8 text

Big Campaign in Oct 2019
- Users: 15 million
- Peak Traffic: 300 TPS

Slide 9

Slide 9 text

What happened
- Cashback could not be shown for some transactions once the traffic rate exceeded 250/s
- The cashback process could not support higher traffic
- The time from when a transaction is made until cashback is granted increased
- Some partitions stopped
- Consumers for different topics affected each other

Slide 10

Slide 10 text

0. Identify the Bottleneck
- Visualization
  - Lag of each stage
  - Throughput of each stage

Slide 11

Slide 11 text

Performance Dashboard
- Monitor each Stage
  - Processing Lag
  - Throughput
- External Dependencies
  - Response time
  - Throughput
- Resource usage
  - CPU/Memory
  - DB

Slide 12

Slide 12 text

1. Optimize the Process
Asynchronous operator in Akka Stream
- mapAsync: accepts a Future-returning function and a parallelism value
- Concurrent processing
  - Up to n (parallelism) elements in flight
  - Can use a separate thread pool
  - Does not block the caller thread
- In-order processing
  - Order is kept when committing Kafka messages
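
The two properties listed for mapAsync (bounded concurrency and in-order results) can be sketched in plain Python `asyncio`. This is an illustration of the semantics only, not Akka Stream's operator: `map_async`, `demo`, and `slow_double` are hypothetical names, and unlike a real stream this sketch collects all results at the end instead of emitting them downstream one by one.

```python
import asyncio

async def map_async(items, fn, parallelism):
    """Apply async fn to items with at most `parallelism` calls in flight.

    Results come back in input order, even if later calls finish first.
    """
    semaphore = asyncio.Semaphore(parallelism)

    async def run_one(item):
        async with semaphore:  # caps the number of concurrent calls
            return await fn(item)

    # gather() returns results in argument order, not completion order.
    return await asyncio.gather(*(run_one(item) for item in items))

def demo():
    stats = {"in_flight": 0, "peak": 0}

    async def slow_double(x):
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        # Later elements sleep less, so they tend to complete earlier.
        await asyncio.sleep(0.01 * (5 - x))
        stats["in_flight"] -= 1
        return x * 2

    results = asyncio.run(map_async(range(5), slow_double, parallelism=2))
    return results, stats["peak"]
```

In `demo()`, the results keep the input order and the peak number of concurrent calls never exceeds the configured parallelism of 2. Keeping results in order is what allows Kafka offsets to be committed sequentially after the asynchronous step.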

Slide 13

Slide 13 text

1. Optimize the Process
Original Configuration: Threads are blocked
- High parallelism
  - 120 Futures will be created at the same time
- Default executor
  - Java ForkJoinPool with 8 parallelism

| Stage | Type | Parallelism |
|---|---|---|
| Save Incoming Event | DB Write (Blocking) | 10 |
| Cashback Evaluation | API Call (Non-blocking) | 50 |
| Update Event + Save Cashback | DB Write (Blocking) | 10 |
| SQS Enqueue | API Call (Non-blocking) | 50 |
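
Why blocking calls on a small shared pool hurt can be shown with a simplified model. The sketch below uses Python's `ThreadPoolExecutor`, which is not a Java ForkJoinPool and behaves differently in detail, and it uses much smaller numbers than the slide (4 blocking tasks on 2 threads instead of 120 Futures on 8). `blocking_write` and `quick_task_delay` are hypothetical names. It measures how long a trivial task waits when blocking work shares its pool versus when blocking work gets a dedicated pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_write(duration_s):
    """Stand-in for a blocking DB write: holds its thread for the whole call."""
    time.sleep(duration_s)

def quick_task_delay(shared_pool, n_blocking=4, workers=2, duration_s=0.1):
    """Return how long a trivial task takes to complete on the main pool."""
    main_pool = ThreadPoolExecutor(max_workers=workers)
    # Either run blocking work on the main pool, or give it a dedicated pool.
    if shared_pool:
        blocking_pool = main_pool
    else:
        blocking_pool = ThreadPoolExecutor(max_workers=workers)
    try:
        for _ in range(n_blocking):
            blocking_pool.submit(blocking_write, duration_s)
        start = time.monotonic()
        # The trivial task queues behind any blocking tasks in the same pool.
        main_pool.submit(lambda: None).result()
        return time.monotonic() - start
    finally:
        main_pool.shutdown(wait=True)
        if blocking_pool is not main_pool:
            blocking_pool.shutdown(wait=True)
```

With the shared pool, the trivial task waits until the queued blocking writes ahead of it have finished. With a dedicated pool for blocking work, it completes almost immediately.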

Slide 14

Slide 14 text

1. Optimize the Process
New Configuration: Less blocking
- For blocking processes
  - Reduce parallelism
  - Separate fixed-size thread pool
- For non-blocking processes
  - Keep enough parallelism
  - Use the default executor (no context switch)

| Stage | Type | Latency | Parallelism | Theoretical Max Throughput (1000 / Latency * Parallelism * 30) |
|---|---|---|---|---|
| Save Incoming Event | DB Write (Blocking) | 20ms~30ms | 2 | 2000 ~ 3000 TPS |
| Cashback Evaluation | API Call (Non-blocking) | 100ms~300ms | 20 | 2000 ~ 6000 TPS |
| Update Event + Save Cashback | DB Write (Blocking) | 30ms~40ms | 2 | 1500 ~ 2000 TPS |
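
The throughput column can be reproduced directly from the slide's formula, 1000 / Latency * Parallelism * 30, with latency in milliseconds. The sketch below does that arithmetic. The slide does not say what the factor 30 stands for, so it is kept as an unexplained `multiplier` parameter; `max_throughput_tps`, `STAGES`, and `throughput_ranges` are hypothetical names.

```python
def max_throughput_tps(latency_ms, parallelism, multiplier=30):
    """Slide formula: 1000 / latency_ms * parallelism * 30.

    The meaning of the factor 30 is not stated in the slides.
    """
    return 1000 / latency_ms * parallelism * multiplier

# (fastest latency ms, slowest latency ms), parallelism: values from the table.
STAGES = {
    "Save Incoming Event": ((20, 30), 2),
    "Cashback Evaluation": ((100, 300), 20),
    "Update Event + Save Cashback": ((30, 40), 2),
}

def throughput_ranges():
    """Return {stage: (low TPS, high TPS)} rounded to whole numbers."""
    ranges = {}
    for stage, ((fast_ms, slow_ms), parallelism) in STAGES.items():
        low = round(max_throughput_tps(slow_ms, parallelism))
        high = round(max_throughput_tps(fast_ms, parallelism))
        ranges[stage] = (low, high)
    return ranges
```

The smallest upper bound belongs to "Update Event + Save Cashback" (1500 to 2000 TPS), which under this model would be the stage that caps the whole pipeline.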

Slide 15

Slide 15 text

Mid Result
- Fully handle 700 TPS in performance test
  - Whole process can finish within 2s
- Up to 1200 TPS for Forward Stream
  - Show cashback result only
  - 4x+ vs. 250 TPS

Slide 16

Slide 16 text

Can we do better?

Slide 17

Slide 17 text

2. Remove Bottleneck
- Save and update Event in short interval
- SQS operation took time
- Update Cashback twice in short interval
- Read after write by ID

Slide 18

Slide 18 text

2. Remove Bottleneck
- Write once
- Update once
- No need to Read
- Retry in-progress events got lost

Slide 19

Slide 19 text

Result
- Fully handle 2000+ TPS in performance test
  - Whole process can finish within 2s
- 2500+ TPS for Forward Stream process
  - Show cashback result only
  - 8x+ vs. 250 TPS
- And now
  - Handle daily traffic from a 40 million user base with no lag

Slide 20

Slide 20 text

Summary
- Akka Stream/Alpakka Kafka helps
  - Easily control and tune the flow
  - Takes care of back-pressure, in-order processing, and global error handling
- Be careful about blocking operations
  - The asynchronous operator in Akka Stream again helps a lot
- Remove unnecessary operations before detailed tuning
- As user traffic continues to grow, there will always be new challenges

Slide 21

Slide 21 text

Thank You