Sticker Recommendation Using Federated Learning

Who I Am
- Haruka Kikuchi
- Product Manager (ML)
- Past Experience
  - Security Research (Prog. Lang.)
  - Human-Computer Interaction Research
  - Geo-spatial Data Analysis, etc.

Agenda
- What is Federated Learning (FL)?
- 1st Target Application
- ML Model Overview
- LFL - LINE’s FL Platform
- Privacy Preservation
- Summary

Server-side Machine Learning (ML)
Centralized server(s) collect and process data
[Figure: logs from client devices are collected by the server, which runs ML training and inference and returns outputs]

On-Device ML Inferencing
Client devices receive the global ML model and run inference
[Figure: the server trains the global model and distributes it to client devices; each device runs inference locally]

Federated Learning (FL)
Client on-device ML training + server aggregation
- Client devices train the model locally
- The server aggregates the locally trained models (model aggregation)
- The global model is sent to individual devices, which run inference locally
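The FL round described above (local training on each device, aggregation on the server, redistribution of the global model) can be sketched as follows. This is a minimal illustration, not LFL's implementation: the function names, the plain gradient step, and the unweighted mean are all assumptions made for the example.

```python
from typing import List, Sequence


def local_update(global_weights: Sequence[float],
                 local_gradient: Sequence[float],
                 learning_rate: float = 0.1) -> List[float]:
    """One on-device gradient step starting from the received global model."""
    return [w - learning_rate * g for w, g in zip(global_weights, local_gradient)]


def aggregate(local_models: Sequence[Sequence[float]]) -> List[float]:
    """Server-side aggregation: element-wise mean of the uploaded local models."""
    if not local_models:
        raise ValueError("need at least one local model")
    dim = len(local_models[0])
    if any(len(m) != dim for m in local_models):
        raise ValueError("all local models must have the same length")
    return [sum(column) / len(local_models) for column in zip(*local_models)]


def federated_round(global_weights: Sequence[float],
                    device_gradients: Sequence[Sequence[float]],
                    learning_rate: float = 0.1) -> List[float]:
    """Hypothetical single round: every device trains locally, the server averages."""
    local_models = [local_update(global_weights, g, learning_rate)
                    for g in device_gradients]
    return aggregate(local_models)
```

Only model weights cross the network in this sketch; the per-device gradients stand in for training on raw data that never leaves the device.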

Section Summary
- Server-side ML → resourceful: lots of data / computation resources (Recommendation)
- On-Device ML Inferencing → responsive: no network latency (User Interface)
- Federated Learning → privacy preservation: users don’t have to send raw data to server (Sensitive Data Treatment)

Sticker Auto Suggest
- Sticker suggestions based on semantic labels
- Incremental suggestions during text input, using pre-defined keywords associated with each label
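As a rough illustration of keyword-driven incremental suggestion, the sketch below matches the text typed so far against keyword prefixes. The label table, the sticker IDs, and the prefix-matching rule are hypothetical; the slides do not describe the actual matching logic.

```python
from typing import Dict, List

# Hypothetical semantic label -> pre-defined keywords.
LABEL_KEYWORDS: Dict[str, List[str]] = {
    "thanks": ["thank you", "thanks", "thx"],
    "hello": ["hello", "hi"],
}

# Hypothetical semantic label -> sticker IDs carrying that label.
LABEL_STICKERS: Dict[str, List[str]] = {
    "thanks": ["sticker_101", "sticker_102"],
    "hello": ["sticker_201"],
}


def suggest(typed_text: str) -> List[str]:
    """Return stickers whose label has a keyword starting with the typed text."""
    query = typed_text.strip().lower()
    if not query:
        return []
    suggestions: List[str] = []
    for label, keywords in LABEL_KEYWORDS.items():
        if any(keyword.startswith(query) for keyword in keywords):
            for sticker in LABEL_STICKERS.get(label, []):
                if sticker not in suggestions:  # keep order, drop duplicates
                    suggestions.append(sticker)
    return suggestions
```

Calling `suggest` after every keystroke gives the incremental behavior: the suggestion list narrows as the typed prefix grows.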

Sticker Semantic Tags https://creator.line.me

Stickers Premium Subscription Service https://store.line.me/stickers-premium/landing/en

Available Stickers in Auto-Suggest Area
- User-downloaded (purchased, etc.)
- Auto-downloaded

Hybrid Approach
- Server-side ML → resourceful: lots of data / computation resources
- On-Device ML Inferencing → responsive: no network latency
- Federated Learning → privacy preservation: users don’t have to send raw data to server
- Candidate Generation (1st Stage): Server-side ML
- Reranking (2nd Stage): On-Device ML Inferencing + Federated Learning
[Figure labels: Recommendation, User Interface, Communication Info.]

1st Stage: Candidate Generation
- Input
  - Sticker purchase/download log
  - User features (estimated demographics, etc.)
- Output (intermediate)
  - Item embeddings
  - User embeddings
- Final output
  - Item candidates (per user cluster)
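The slides do not say how the per-cluster candidates are derived from the embeddings. One common approach, shown here purely as an assumption, is to score every item embedding against a cluster-level embedding by dot product and keep the top k. All names and the toy vectors are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_candidates(cluster_embedding: Sequence[float],
                     item_embeddings: Dict[str, Sequence[float]],
                     k: int) -> List[str]:
    """Return the k item IDs with the highest dot-product score, best first."""
    return heapq.nlargest(
        k,
        item_embeddings,
        key=lambda item_id: dot(cluster_embedding, item_embeddings[item_id]),
    )
```

In the deck's setting this step would reduce roughly 1,000,000 stickers to about 100 candidates per user cluster on the server.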

2nd Stage: Reranking
- Input
  - User embedding (fully personalized)
  - Item embedding (for each candidate; global, making use of the intermediate output of the 1st stage)
- Output
  - Score for each item
- Client app performs
  - Inference, triggered by text input
  - Training, triggered when the device is idle
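A minimal sketch of the on-device side of reranking follows. It assumes the score is a sigmoid of the user-item dot product and that local training is a logistic-regression step on click/impression signals; the deck names the inputs, outputs, and triggers but not the model, so these choices and all function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def score(user_embedding: Sequence[float], item_embedding: Sequence[float]) -> float:
    """Assumed scoring function: sigmoid of the dot product."""
    z = sum(u * v for u, v in zip(user_embedding, item_embedding))
    return 1.0 / (1.0 + math.exp(-z))


def rerank(user_embedding: Sequence[float],
           candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Inference (e.g. triggered by text input): order candidates by score."""
    return sorted(candidates,
                  key=lambda item_id: score(user_embedding, candidates[item_id]),
                  reverse=True)


def train_step(user_embedding: Sequence[float],
               item_embedding: Sequence[float],
               clicked: bool,
               learning_rate: float = 0.1) -> List[float]:
    """Training (e.g. when the device is idle): one logistic-loss gradient step."""
    error = score(user_embedding, item_embedding) - (1.0 if clicked else 0.0)
    return [u - learning_rate * error * v
            for u, v in zip(user_embedding, item_embedding)]
```

A click on a sticker pulls the user embedding toward that sticker's embedding, so the same sticker scores higher the next time it appears among the candidates.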

Comparison: 1st stage vs. 2nd stage

|                 | Candidate Generation (1st stage)     | Reranking (2nd stage)    |
| Personalization | Sticker candidates (1,000,000 → 100) | Reorder stickers (100)   |
| Signal          | Sticker download (e.g. purchases)    | Sticker click/impression |
| Inference       | Server                               | Client device            |
| Training        | Server                               | Client device (mostly)   |

LINE Federated Learning (LFL): Requirements as a Platform
- Privacy Preservation
  - Model upload without user ID
  - Differential Privacy (DP) mechanism
- Support multiple on-device ML instances
  - Separation of app-specific implementations from common FL functions

System Architecture
Separation of service-specific ML functions from FL platform

System Architecture Quadrants

System Architecture 2-Staged ML

System Architecture On-device ML (2nd Stage)

System Architecture On-device ML Inferencing (2nd Stage)

System Architecture Federated Learning

Other Platform Features
- ONNX for OS-agnostic ML Runtime
- FL Model/Param A/B Test
- Local Model Training in Background
- Candidate Generation for Sticker
- Sticker keyboard
- Model Training Scheduler
- Model/Feature Ver. Management

Please Check Tomorrow’s Session! Nov. 18th (Fri) 15:00-16:00 JST

Privacy Preservation
- Minimization of data collection
  - Federated Learning (collect ML models instead of raw data)
  - Model upload by randomly sampled users without user ID
- Support Differential Privacy (DP) mechanism
  - Noise addition to the model on local devices (Local DP)
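To make "model upload by randomly sampled users without user ID" concrete, here is a hedged sketch. It assumes sampling is decided on the device with a fixed probability and that the payload carries only a model version and weights; the `ModelUpload` type, its fields, and `sample_rate` are hypothetical and not taken from LFL.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ModelUpload:
    """Hypothetical upload payload: deliberately has no user or device ID field."""
    model_version: str
    weights: List[float]


def maybe_build_upload(weights: Sequence[float],
                       model_version: str,
                       sample_rate: float,
                       rng: random.Random) -> Optional[ModelUpload]:
    """Return a payload with probability sample_rate, otherwise None."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0, 1]")
    if rng.random() >= sample_rate:
        return None  # this device is not sampled in this round
    return ModelUpload(model_version=model_version, weights=list(weights))
```

Whether sampling happens on the device or on the server is not stated in the slides; the device-side variant is shown only because it keeps the example self-contained.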

System Architecture FL + Local DP

FL with Local Differential Privacy
Noise injection to local model
[Figure: Local Devices → Noisy Outputs → Server; steps shown: Aggregation, Transformation, Noise Injection. The degree to which the noisy outputs are indistinguishable is represented by ε.]

Current Status
- Implementation of Local DP mechanisms with FL
  - Gaussian mechanism for local gradient (Local DP)
  - Averaging the aggregated local models (FL)
  - Local model upload without user ID
- Seeking a privacy parameter ε
  - As-is: set a weak value to evaluate the feasibility of FL
  - To-be: set a mature value that balances utility of FL and users’ privacy
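The combination of a Gaussian mechanism on the local gradient with server-side averaging can be sketched as below. The clipping step, `max_norm`, and `noise_std` are assumptions for illustration; the slides do not give the clipping rule or how the noise scale is calibrated to ε, so no such calibration is attempted here.

```python
import math
import random
from typing import List, Sequence


def clip_l2(vector: Sequence[float], max_norm: float) -> List[float]:
    """Scale the vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0 or norm <= max_norm:
        return list(vector)
    return [v * max_norm / norm for v in vector]


def privatize(gradient: Sequence[float], max_norm: float,
              noise_std: float, rng: random.Random) -> List[float]:
    """On-device step: clip, then add independent Gaussian noise per coordinate."""
    return [v + rng.gauss(0.0, noise_std) for v in clip_l2(gradient, max_norm)]


def average(updates: Sequence[Sequence[float]]) -> List[float]:
    """Server step: element-wise mean of the noisy uploads."""
    if not updates:
        raise ValueError("need at least one update")
    return [sum(column) / len(updates) for column in zip(*updates)]
```

Because the noise is zero-mean, averaging over many devices cancels much of it. This is one way to see the utility/privacy balance the slide refers to: a larger `noise_std` protects each upload more but needs more participants for the same accuracy.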

A/B Test Result
5.6% uplift
- Personalized sticker suggestions evoke explicit premium sticker package downloads

Collaborations
Multiple Locations w/ 30+ Engineers
- Korea
- Tokyo
- Fukuoka

Future Work
- Make LFL a true privacy preservation platform for LINE
  - Seeking an LDP configuration
  - Shuffling mechanism
- Expand FL-based reranking to all the stickers
  - Currently, sticker premium is the only service
