Slide 1

Slide 1 text

PayPay Step Merge Journey
Scaling Out Transaction Aggregation
Ville Misaki
Oct 2021

Slide 2

Slide 2 text

Self Introduction
● Ville Misaki
● From Helsinki, Finland
  ○ Currently living in Fukuoka, Japan – WFA
● Back-end software engineer
  ○ Since January 2021
  ○ Customer Lifecycle Management team

Slide 3

Slide 3 text

Agenda
● What is PayPay Step?
  ○ Original Architecture
● PayPay Step Merge
  ○ Problem Statement
  ○ Scaling Out
  ○ Maintaining Accuracy
  ○ Speed vs. Batch
  ○ Special Edge Cases
● Outcome

Slide 4

Slide 4 text

What is PayPay Step?
● Monthly basic cash-back campaign
● Use more this month; get more cash-back next month
  ○ 50 txn ≥100 yen and/or ≥100,000 yen total value
  ○ 0.5% → 1% → 1.5%
● Original campaign 2020-04-01 ~ 2021-07-31
[Figure: counting period and effective period]
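The thresholds on this slide can be written out as a small function. This is a minimal sketch, not the production rule engine: it assumes that the base rate is 0.5% and that each satisfied condition adds 0.5%, which is one reading of "and/or" together with "0.5% → 1% → 1.5%". It also assumes the total value sums every transaction, while only transactions of at least 100 yen count toward the 50. The function and constant names are hypothetical.

```python
MIN_TXN_AMOUNT = 100       # yen; smaller transactions do not count toward the 50
COUNT_THRESHOLD = 50       # qualifying transactions in the counting month
VALUE_THRESHOLD = 100_000  # yen; total value in the counting month


def cashback_rate_percent(amounts):
    """Return next month's cash-back rate (in percent) for one user's
    transaction amounts (in yen) from the counting month.

    Assumption: 0.5% base, plus 0.5% for each condition that is met.
    """
    qualifying_count = sum(1 for a in amounts if a >= MIN_TXN_AMOUNT)
    total_value = sum(amounts)  # assumption: every transaction counts toward value

    conditions_met = 0
    if qualifying_count >= COUNT_THRESHOLD:
        conditions_met += 1
    if total_value >= VALUE_THRESHOLD:
        conditions_met += 1
    return 0.5 + 0.5 * conditions_met
```

For example, fifty transactions of 2,000 yen each meet both conditions, while sixty transactions of 99 yen each meet neither.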

Slide 5

Slide 5 text

Original Architecture
● User’s current progress queried from transaction history
  ○ Aggregated as needed
    ■ User’s transaction count (≥100 yen) & value for target month
  ○ Always up-to-date – but heavy load on DB
[Figure: architecture diagram, steps 1 and 2]

Slide 6

Slide 6 text

PayPay Step Merge
● Campaign renewal project
  ○ Including other Yahoo! JAPAN services – “merge”
  ○ Campaign status visible from many new places
● New campaign rules from 2021-08-01 ~

Slide 7

Slide 7 text

Problem Statement
● Expecting a lot more traffic
  ○ We want to support 10,000+ RPS for status API
  ○ Need to be able to scale out
● Aggregation of transactions when viewing status
  ○ Heavy DB query
● Still keep near real-time
  ○ Users should see their purchases being reflected
  ○ Just caching results would cause lag

Slide 8

Slide 8 text

Scaling Out
● Move aggregation to be done after transactions
  ○ Each user’s aggregated campaign status stored in DynamoDB
  ○ Very fast fetch of current status for scaling out
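The shift from aggregate-on-read to aggregate-on-write can be illustrated as follows. This is a sketch only: an in-memory dictionary stands in for the DynamoDB table, and the class, method, and field names are hypothetical. The point is that the status read becomes a single key lookup, with the aggregation cost paid once per transaction instead of once per view.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Status:
    txn_count: int = 0    # transactions of at least 100 yen
    total_value: int = 0  # yen


class StatusStore:
    """In-memory stand-in for the per-user campaign status table."""

    def __init__(self):
        # Keyed by (user_id, month); a real table would use a similar key.
        self._items = defaultdict(Status)

    def apply_transaction(self, user_id, month, amount):
        """Write path: update the stored aggregate after each transaction."""
        status = self._items[(user_id, month)]
        if amount >= 100:
            status.txn_count += 1
        status.total_value += amount

    def get_status(self, user_id, month):
        """Read path: one key lookup, no scan of the transaction history."""
        return self._items.get((user_id, month), Status())
```

A caller would invoke `apply_transaction` from the transaction event consumer and `get_status` from the status API.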

Slide 9

Slide 9 text

Maintaining Accuracy
● Aggregation for campaign done in batches
  ○ Extra verification and control over data
  ○ Re-aggregation in case of issues or campaign rule changes

Slide 10

Slide 10 text

Speed vs. Batch
● Lambda architecture
  ○ Speed and batch processing
● Synchronize aggregations
  ○ Aggregation as daily changes
  ○ Boundary set by speed aggregation
● Special edge cases
  ○ Exactly-once Processing
  ○ Re-aggregation
  ○ Campaign rule changes
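One way to picture how the two layers are synchronized is to store each layer's aggregate as per-day changes and combine them at a boundary date. The sketch below is an interpretation, not the talk's implementation: it assumes that days before the boundary are served from the batch layer and that the boundary day onward is served from the speed layer. The function and parameter names are hypothetical.

```python
from datetime import date


def combined_total(batch_daily, speed_daily, boundary):
    """Combine per-day totals from the two layers.

    batch_daily, speed_daily: dicts mapping a date to that day's change.
    boundary: first day served from the speed layer (assumption).
    """
    batch_part = sum(v for day, v in batch_daily.items() if day < boundary)
    speed_part = sum(v for day, v in speed_daily.items() if day >= boundary)
    return batch_part + speed_part


# Hypothetical data: the speed layer missed 100 yen on August 1,
# which the batch layer later corrected.
batch = {date(2021, 8, 1): 1000, date(2021, 8, 2): 500}
speed = {date(2021, 8, 1): 900, date(2021, 8, 2): 500, date(2021, 8, 3): 300}
```

With the boundary at August 3, the corrected batch values cover August 1 and 2 and only the newest day comes from the speed layer. Because each day is counted by exactly one layer, moving the boundary never double-counts a day.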

Slide 11

Slide 11 text

Exactly-once Processing
● At least once
  ○ Provided by platform on each step
  ○ Every transaction event carries the transaction’s full history, so gaps can be filled
    ■ E.g. Created → Completed → Refunded
  ○ Daily monitoring batch
● At most once
  ○ Deduplicate with transaction table
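The two halves of exactly-once processing can be sketched together. Delivery is at least once, so the same event may arrive more than once and an intermediate event may be missed. Each event carries the full state history, so missed states can be applied late, and a record of already-applied states keeps any state from being applied twice. In this sketch a Python set stands in for the transaction table; the class and method names, and the rule that "Completed" adds the amount while "Refunded" subtracts it, are illustrative assumptions.

```python
class SpeedAggregator:
    """Applies each (transaction, state) pair to the running total at most once."""

    def __init__(self):
        self.applied = set()  # stand-in for the transaction table
        self.total = 0        # aggregated value in yen

    def handle(self, txn_id, amount, history):
        """Process one event.

        history: all states the transaction has been through so far,
        e.g. ["Created", "Completed"].
        """
        for state in history:
            key = (txn_id, state)
            if key in self.applied:
                continue  # duplicate delivery, or state already applied
            self.applied.add(key)
            if state == "Completed":
                self.total += amount
            elif state == "Refunded":
                self.total -= amount
```

If the "Completed" event for a transaction is lost, the later "Refunded" event still carries it in its history, so both states are applied and the total stays consistent.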

Slide 12

Slide 12 text

Re-aggregation
● The world is not perfect
  ○ Mis-configurations, system trouble, bugs...
● Aggregation may need corrections
  ○ Speed aggregation – aggregated once
  ○ Batch aggregation – inherent re-aggregation

Slide 13

Slide 13 text

Campaign Rule Changes
● Campaign rules are not written in stone
● Two-month campaigns
  ○ Count and effective periods overlap
● Old and new campaign side by side
  ○ Versioning of results during batch – aggregate for both versions
  ○ Campaign configured with desired version

            July  August  September
Counting    v0    v1      v1
Effective   v0    v0      v1
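The versioning idea can be sketched as follows: the batch computes a result under every rule version, and per-month configuration selects which version applies. The month-to-version mapping is taken from the table on this slide. The rule functions are placeholders: v0 uses the 100 yen minimum from the original campaign, and v1's 200 yen minimum is invented purely to make the two versions differ. All names are hypothetical.

```python
# Month-to-version configuration, as in the table on this slide.
COUNTING_VERSION = {"2021-07": "v0", "2021-08": "v1", "2021-09": "v1"}
EFFECTIVE_VERSION = {"2021-07": "v0", "2021-08": "v0", "2021-09": "v1"}

# Placeholder rules: each returns the number of qualifying transactions.
RULES = {
    "v0": lambda amounts: sum(1 for a in amounts if a >= 100),
    "v1": lambda amounts: sum(1 for a in amounts if a >= 200),  # invented threshold
}


def aggregate_all_versions(amounts):
    """Batch step: compute the result under every rule version."""
    return {version: rule(amounts) for version, rule in RULES.items()}


def result_for_counting_month(month, amounts):
    """Select the result for the version configured for that counting month."""
    return aggregate_all_versions(amounts)[COUNTING_VERSION[month]]
```

Because results for both versions exist side by side, changing the configured version for a month does not require a new aggregation run. In the table, each month's effective version equals the previous month's counting version, which is the overlap the slide describes.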

Slide 14

Slide 14 text

Outcome
● Lambda architecture
  ○ Combination of speed and batch processing
● Reached scalability goals
  ○ 10,000+ RPS for read
  ○ Near real-time aggregation
  ○ Robust, future-proof foundation
● The campaign is running
  ○ Check it out!

Slide 15

Slide 15 text

Thank You