Self Introduction
● Ville Misaki
● From Helsinki, Finland
  ○ Currently living in Fukuoka, Japan – WFA
● Back-end software engineer
  ○ Since January 2021
  ○ Customer Lifecycle Management team
Agenda
● What is PayPay Step?
  ○ Original Architecture
● PayPay Step Merge
  ○ Problem Statement
  ○ Scaling Out
  ○ Maintaining Accuracy
  ○ Speed vs. Batch
  ○ Special Edge Cases
● Outcome
What is PayPay Step?
● Monthly basic cash-back campaign
● Use more this month; get more cash-back next month
  ○ 50 txn ≥100 yen and/or ≥100,000 yen total value
  ○ 0.5% → 1% → 1.5%
● Original campaign 2020-04-01 ~ 2021-07-31
[Figure: counting period and effective period]
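The tier rule on this slide can be read as a small function. This is only a sketch under an assumed interpretation: the base rate is 0.5%, and each of the two conditions met during the counting month adds 0.5 percentage points, which yields the 0.5% → 1% → 1.5% progression. The function name and the use of integer basis points are illustrative choices, not taken from the campaign's implementation.

```python
def cashback_rate_bp(txn_count: int, total_value_yen: int) -> int:
    """Return next month's cash-back rate in basis points (50 = 0.5%).

    txn_count is the number of transactions of at least 100 yen
    in the counting month. Assumption: each condition that is met
    adds 0.5 percentage points on top of the 0.5% base rate.
    """
    rate = 50  # base rate, 0.5%
    if txn_count >= 50:
        rate += 50
    if total_value_yen >= 100_000:
        rate += 50
    return rate
```

Using integers for the rate avoids floating-point rounding when the rate is later applied to yen amounts.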
Original Architecture
● User’s current progress queried from transaction history
  ○ Aggregated as needed
    ■ User’s transaction count (≥100 yen) & value for target month
  ○ Always up-to-date – but heavy load on DB
PayPay Step Merge
● Campaign renewal project
  ○ Including other Yahoo! JAPAN services – “merge”
  ○ Campaign status visible from many new places
● New campaign rules from 2021-08-01 ~
Problem Statement
● Expecting a lot more traffic
  ○ We want to support 10,000+ RPS for status API
  ○ Need to be able to scale out
● Aggregation of transactions when viewing status
  ○ Heavy DB query
● Still keep near real-time
  ○ Users should see their purchases being reflected
  ○ Just caching results would cause lag
Scaling Out
● Move aggregation to be done after transactions
  ○ Each user’s aggregated campaign status stored in DynamoDB
  ○ Very fast fetch of current status for scaling out
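A minimal sketch of aggregating at write time rather than at read time. The in-memory dictionary stands in for the DynamoDB table, and the names (`MonthlyStatus`, `on_transaction_completed`, `get_status`) are hypothetical. It also assumes that the total value includes every transaction while the count includes only those of at least 100 yen.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_COUNTED_YEN = 100  # only transactions of at least 100 yen are counted


@dataclass
class MonthlyStatus:
    txn_count: int = 0
    total_value_yen: int = 0


# Stand-in for the DynamoDB table; the key is (user_id, "YYYY-MM").
status_table: dict[tuple[str, str], MonthlyStatus] = defaultdict(MonthlyStatus)


def on_transaction_completed(user_id: str, month: str, amount_yen: int) -> None:
    """Update the stored aggregate once, when the transaction happens."""
    status = status_table[(user_id, month)]
    status.total_value_yen += amount_yen
    if amount_yen >= MIN_COUNTED_YEN:
        status.txn_count += 1


def get_status(user_id: str, month: str) -> MonthlyStatus:
    """Single key lookup; no scan over the transaction history."""
    return status_table.get((user_id, month), MonthlyStatus())
```

The read path becomes a single key lookup, which is what lets the status API scale out independently of the transaction history size.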
Maintaining Accuracy
● Aggregation for campaign done in batches
  ○ Extra verification and control over data
  ○ Re-aggregation in case of issues or campaign rule changes
Exactly-once Processing
● At least once
  ○ Provided by platform on each step
  ○ Every event of a transaction carries the transaction’s full history, so gaps can be filled
    ■ E.g. Created → Completed → Refunded
  ○ Daily monitoring batch
● At most once
  ○ Deduplicate with transaction table
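The combination of at-least-once delivery and deduplication can be sketched as follows. The event shape, the `applied` set standing in for the transaction table, and the effect of each state (a completion adds the amount, a refund subtracts it) are assumptions made for illustration.

```python
# Signed effect of each state on the user's aggregated value (assumed).
EFFECT = {"Created": 0, "Completed": 1, "Refunded": -1}

# Stand-in for the transaction table: (txn_id, state) pairs already applied.
applied: set[tuple[str, str]] = set()
totals: dict[str, int] = {}


def handle_event(event: dict) -> None:
    """Apply every state in the event's history that has not been applied yet.

    Replaying the full history fills gaps left by missed events
    (at least once); the applied-set check skips duplicates (at most once).
    """
    for state in event["history"]:
        key = (event["txn_id"], state)
        if key in applied:
            continue  # duplicate delivery, or state already handled
        applied.add(key)
        delta = EFFECT[state] * event["amount_yen"]
        user = event["user_id"]
        totals[user] = totals.get(user, 0) + delta
```

For example, a Refunded event that arrives without its Completed event having been seen still nets out correctly, because both states are present in the history it carries.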
Re-aggregation
● The world is not perfect
  ○ Mis-configurations, system trouble, bugs...
● Aggregation may need corrections
  ○ Speed aggregation – aggregated once
  ○ Batch aggregation – inherent re-aggregation
Campaign Rule Changes
● Campaign rules are not written in stone
● Two-month campaigns
  ○ Count and effective periods overlap
● Old and new campaign side by side
  ○ Versioning of results during batch – aggregate for both versions
  ○ Campaign configured with desired version

            July   August   September
Counting    v0     v1       v1
Effective   v0     v0       v1
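One way to picture versioned results is a batch that runs every rule version over the same transactions and stores each result under its version, with the campaign configuration choosing which one to read. The sketch below is hypothetical throughout: the v0 rule counts transactions of at least 100 yen as in the original campaign, while the v1 threshold of 200 yen is a made-up placeholder, since the new rules are not given here. The month keys mirror the "Counting" row of the table above.

```python
def aggregate_v0(amounts: list[int]) -> int:
    """Original rule: count transactions of at least 100 yen."""
    return sum(1 for a in amounts if a >= 100)


def aggregate_v1(amounts: list[int]) -> int:
    """Placeholder for the new rule; the 200 yen threshold is invented."""
    return sum(1 for a in amounts if a >= 200)


AGGREGATORS = {"v0": aggregate_v0, "v1": aggregate_v1}

# Which result version each counting month reads (from the table above).
COUNTING_VERSION = {"2021-07": "v0", "2021-08": "v1", "2021-09": "v1"}


def run_batch(amounts: list[int]) -> dict[str, int]:
    """Aggregate once per rule version so old and new results coexist."""
    return {version: fn(amounts) for version, fn in AGGREGATORS.items()}


def status_for(month: str, results: dict[str, int]) -> int:
    """Pick the result version that the campaign configuration asks for."""
    return results[COUNTING_VERSION[month]]
```

Because both versions are stored, switching a campaign to a different version is a configuration change rather than a re-run of the batch.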
Outcome
● Lambda architecture
  ○ Combination of speed and batch processing
● Reached scalability goals
  ○ 10,000+ RPS for read
  ○ Near real-time aggregation
  ○ Robust, future-proof foundation
● The campaign is running
  ○ Check it out!
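As a generic illustration of the Lambda pattern named above (not a description of the actual PayPay Step implementation), a serving view can combine an authoritative batch result with increments from the speed layer. The class and method names are hypothetical, and the sketch makes the simplifying assumption that a batch result covers every event seen so far when it lands.

```python
from dataclasses import dataclass, field


@dataclass
class ServingView:
    # Authoritative counts from the latest batch run.
    batch_counts: dict[str, int] = field(default_factory=dict)
    # Near real-time increments received since that batch run.
    speed_counts: dict[str, int] = field(default_factory=dict)

    def on_speed_event(self, user_id: str) -> None:
        """Speed layer: reflect a new transaction immediately."""
        self.speed_counts[user_id] = self.speed_counts.get(user_id, 0) + 1

    def on_batch_result(self, recomputed: dict[str, int]) -> None:
        """Batch layer: replace stored counts, correcting any speed-layer errors."""
        self.batch_counts = dict(recomputed)
        self.speed_counts.clear()

    def read(self, user_id: str) -> int:
        """Status read: batch result plus increments since the batch."""
        return self.batch_counts.get(user_id, 0) + self.speed_counts.get(user_id, 0)
```

The speed layer keeps reads near real-time, while each batch run overwrites whatever the speed layer got wrong, which is where the re-aggregation property comes from.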