
LFL Client Platform for Supporting Multiple Federated Learning Instances

Hyukjae Jang (LINE Plus / Messaging Client Engineering / Software Engineer)

https://tech-verse.me/ja/sessions/25
https://tech-verse.me/en/sessions/25
https://tech-verse.me/ko/sessions/25

Tech-Verse2022

November 18, 2022

Transcript

  1. What is federated learning? [Diagram comparing the two approaches: in cloud-based machine learning, clients send collected user data to the server and the server trains the ML model; in federated learning, the server distributes the ML model, each client trains it on its local user data, clients download and upload the ML model, and the server aggregates the uploaded models.]
  2. Federated learning at LINE: premium sticker recommendation. Before: recommendations based on usage history; after: recommendations by federated learning. A/B test result: a 5.56% uplift in premium sticker downloads.
  3. FL structure overview: common functionalities for FL. [Diagram: the client app handles user interaction, user logs, inference, and model training, producing an updated model; the server hosts the ML Model Repository and performs model aggregation.]
  4. On-device training: a resource-intensive functionality for FL. The mobile environment is resource-limited, and simultaneous on-device training may degrade the user experience.
  5. Table of contents
     - What is Federated Learning?
     - Federated Learning at LINE
     - Why do we need a platform supporting multiple Federated Learning instances?
     - LFL client platform supporting multiple Federated Learning instances
     - On-device training of LFL client platform
     - Inside of LFL client platform
     - Lessons learned
  6. LINE Federated Learning architecture. [Diagram: the LINE app contains the feature service client and the LFL Platform (client-side); it downloads the ML model, runs inference and training on user logs, and uploads the trained ML model to the LFL Platform (server-side), which holds the ML Model Repository and performs model aggregation; the updated model is also downloaded to the feature service, whose server holds the Feature Model Repository and serves inference requests driven by user interaction.]
  7. LFL client platform structure. [Diagram: the LFL Client Platform consists of a common module and one LFL application module per LFL application, on top of an ML library. The common module performs model version check, model download, result upload, init with config, and pushing logs to the LINE feature service; each LFL application module keeps its ML model and user logs and performs inference and model training; the common module updates the model and config, starts training, and gets the training result from the application modules.]
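
     To make the division of responsibilities concrete, here is a minimal Kotlin sketch of the common module's surface expressed as an interface. The names and types are illustrative assumptions, not the actual LFL client platform API.

     // Illustrative types standing in for the platform's real data structures.
     data class LFLConfig(val appId: String, val modelEndpoint: String)
     data class TrainResult(val modelBytes: ByteArray, val metrics: Map<String, Double>)
     data class UserLog(val timestamp: Long, val payload: String)

     // Hypothetical interface mirroring the common-module responsibilities listed on the slide.
     interface LFLCommonModule {
         fun initWithConfig(config: LFLConfig)                 // Init with config
         fun checkModelVersion(appId: String): Boolean         // Model version check
         fun downloadModel(appId: String): ByteArray           // Model download
         fun uploadResult(appId: String, result: TrainResult)  // Result upload
         fun pushLogs(appId: String, logs: List<UserLog>)      // Push user logs
     }
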
  8. Dependency inversion in LFL client platform: LFL Common's dependency on the application modules (before inversion). [Diagram: LFL_Common_Module contains an LFLApplicationManager with per-application functions (updateModelA(), updateModelB(), updateConfigA(), startTrainA(), startTrainB(), getTrainResultA(), getTrainResultB()) that directly call updateModel(), updateConfig(), startTrain(), and getTrainResult() on Application_A in LFL_Application_A_Module and on Application_B in LFL_Application_B_Module.]
  10. Dependency inversion in LFL client platform: dependency inversion between the LFL common module and the application modules (after inversion). [Diagram: LFL_Common_Module defines an LFLApplication interface with updateModel(), updateConfig(), startTrain(), and getTrainResult(); LFLApplication_A_Impl and LFLApplication_B_Impl in the application modules implement the interface, and the implementations are supplied to the LFLApplicationManager through dependency injection, so the common module only calls the interface.]
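
     A minimal Kotlin sketch of the dependency-inverted structure, with assumed signatures: the common module owns the LFLApplication interface, each application module implements it, and the implementations are injected into the manager so the common module never references a concrete application.

     // Interface owned by the common module (parameter types are assumptions).
     interface LFLApplication {
         fun updateModel(model: ByteArray)
         fun updateConfig(config: String)
         fun startTrain()
         fun getTrainResult(): ByteArray?
     }

     // Implementation living in an application module.
     class LFLApplicationAImpl : LFLApplication {
         override fun updateModel(model: ByteArray) { /* application A's model update */ }
         override fun updateConfig(config: String) { /* application A's config update */ }
         override fun startTrain() { /* run application A's on-device training */ }
         override fun getTrainResult(): ByteArray? = null // return A's locally trained model when available
     }

     // Manager in the common module, which only knows the interface.
     class LFLApplicationManager(private val applications: Map<String, LFLApplication>) {
         fun startTrain(appId: String) = applications[appId]?.startTrain()
     }

     // Dependency injection: the hosting app wires the concrete implementations into the common module.
     fun buildManager(): LFLApplicationManager =
         LFLApplicationManager(mapOf("applicationA" to LFLApplicationAImpl()))
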
  11. LFL application modules and the interfaces of the LFL client platform. [Diagram: (1) a local trigger fires a training trigger, (2) the LFL application manager in the common module starts training through the LFL application interface, and (3) the call reaches the actual application instance. The dependency-inverted structure from the previous slide (LFLApplication interface, LFLApplication_A_Impl / LFLApplication_B_Impl, dependency injection) connects the common module, the application modules, and the machine learning library.]
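
     Reusing the LFLApplicationManager from the sketch above, the trigger flow on this slide could look like the following; the class and method names are assumptions.

     // (1) a local trigger fires, (2) the manager dispatches through the interface,
     // (3) the injected implementation in the application module runs the actual training.
     class TrainingTrigger(private val manager: LFLApplicationManager) {
         fun onLocalTrigger(appId: String) {
             manager.startTrain(appId)
         }
     }
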
  12. Table of contents
     - What is Federated Learning?
     - Federated Learning at LINE
     - Why do we need a platform supporting multiple Federated Learning instances?
     - LFL client platform supporting multiple Federated Learning instances
     - On-device training of LFL client platform
     - Inside of LFL client platform
     - Lessons learned
  13. Requirements for on-device training. iOS: BGTaskScheduler and BGProcessingTask; Android: WorkManager. Background processing can run for more than 10 minutes, but the OS can interrupt it at any time. On-device training itself also has requirements: sufficient battery, sufficient storage, and an idle device (background processing).
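
     On Android, the conditions above map naturally onto WorkManager constraints. A sketch using real WorkManager APIs, with an illustrative worker and scheduling policy (the platform's actual worker is not shown in the deck):

     import android.content.Context
     import androidx.work.Constraints
     import androidx.work.ExistingWorkPolicy
     import androidx.work.OneTimeWorkRequestBuilder
     import androidx.work.WorkManager
     import androidx.work.Worker
     import androidx.work.WorkerParameters
     import java.util.concurrent.TimeUnit

     // Worker that would run one on-device training session; the OS may still interrupt it.
     class TrainingWorker(context: Context, params: WorkerParameters) : Worker(context, params) {
         override fun doWork(): Result {
             // run one training session here
             return Result.success()
         }
     }

     fun scheduleTraining(context: Context, delayMinutes: Long) {
         val constraints = Constraints.Builder()
             .setRequiresBatteryNotLow(true)  // battery condition
             .setRequiresStorageNotLow(true)  // storage condition
             .setRequiresDeviceIdle(true)     // device idle condition (API 23+)
             .build()
         val request = OneTimeWorkRequestBuilder<TrainingWorker>()
             .setConstraints(constraints)
             .setInitialDelay(delayMinutes, TimeUnit.MINUTES)
             .build()
         WorkManager.getInstance(context)
             .enqueueUniqueWork("lfl-training", ExistingWorkPolicy.REPLACE, request)
     }
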
  14. Excessively skewed FL participation: the 1st to 7th days without rollout. [Diagram: every day, each client trains the initial model locally on its user data and uploads the locally trained model to the cloud for model aggregation.]
  15. Excessively skewed FL participation: the 1st to 7th days with rollout. Rollout condition: rollout = {hash(concat(userKey, saltKey)) % bucketSize} in [rolloutBegin, rolloutEnd]. Example model_config.json: { "training": { …, "uploading limit": 2 }, "rollout": { "salt_key": "ranker", "slots": { "begin": 0, "end": 10 } } }
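
     A sketch of how a client could evaluate the rollout condition from model_config.json. The hash algorithm (SHA-256) and the bucket size of 100 are assumptions; the platform's actual choices are not shown in the deck.

     import java.math.BigInteger
     import java.security.MessageDigest

     // rollout = {hash(concat(userKey, saltKey)) % bucketSize} in [rolloutBegin, rolloutEnd]
     fun isInRollout(userKey: String, saltKey: String, begin: Int, end: Int, bucketSize: Int = 100): Boolean {
         val digest = MessageDigest.getInstance("SHA-256").digest((userKey + saltKey).toByteArray())
         val slot = BigInteger(1, digest).mod(BigInteger.valueOf(bucketSize.toLong())).toInt()
         return slot in begin..end
     }

     // With "salt_key":"ranker" and "slots":{"begin":0,"end":10}, a client participates only if
     // isInRollout(itsUserKey, "ranker", begin = 0, end = 10) is true, which bounds daily participation.
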
  16. Trigger background processing repeatedly: need to wait for configuration changes. [Flow: the OS triggers the background task; the LFL Platform (client-side) in the LINE app checks the config (e.g. rollout) and checks for updates against the LFL Platform (server-side); if it is not ready to train (F), it registers the next background task.]
  17. Trigger background processing repeatedly: need to wait for configuration changes. [Same flow as the previous slide, for the case where rollout is disabled: the client is not ready to train, so it registers the next background task and waits.]
  18. Trigger background processing repeatedly: need to wait for additional user logs. [Flow: the OS triggers the background task; the client checks for updates; if it is ready to train (T), it trains the model, uploads it to the LFL Platform (server-side), updates the model, and deletes the used user logs; otherwise (F) it registers the next background task.]
  19. Trigger background processing repeatedly: need to wait for additional user logs. [Same flow as the previous slide, for the case where there are not enough user logs: the client is not ready to train, so it registers the next background task and waits.]
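
     A compact sketch of the decision flow on slides 16-19, with illustrative names standing in for the LFL Platform (client-side):

     // Hypothetical interface standing in for the client-side LFL platform in the diagrams.
     interface LFLClientPlatform {
         fun checkUpdate()                       // check config (e.g. rollout) and model updates
         fun isReadyToTrain(): Boolean           // false when rollout is disabled or user logs are insufficient
         fun trainModel(): ByteArray
         fun uploadModel(model: ByteArray)
         fun deleteUsedUserLogs()
         fun registerNextBackgroundTask()
     }

     class LFLBackgroundTask(private val platform: LFLClientPlatform) {
         fun run() {
             platform.checkUpdate()
             if (!platform.isReadyToTrain()) {
                 platform.registerNextBackgroundTask() // wait and try again later
                 return
             }
             val trainedModel = platform.trainModel()  // on-device training on accumulated user logs
             platform.uploadModel(trainedModel)        // the server side aggregates uploaded models
             platform.deleteUsedUserLogs()             // logs used for training are deleted on the device
             platform.registerNextBackgroundTask()
         }
     }
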
  20. Trigger background processing repeatedly: background processing for FL is inefficient because the client needs to wait for configuration changes or additional user logs. Solution: introduce a retry interval and a train interval.
  21. Interval-based scheduling: schedule the training of multiple LFL applications. Challenges: diverse retry intervals and train intervals, and only a single training session at a time → training duration estimation. [Timeline: applications A, B, and C alternate between retry intervals (BGTask triggered but not ready) and training sessions followed by train intervals.]
  22. Interval-based scheduling (continued): the same timeline, highlighting the two problems to solve: duration estimation and training scheduling.
  23. Interval-based scheduling: share a single background processing slot for all LFL applications' model training, and register the background processing trigger with an interval-based delay. [Diagram: background processing runs Training A, Training C, and Training B back to back, each scheduled after the minimum delay.]

     fun registerNextTrainingWithDelay() {
         val waitingTime = LFLApplications.minOf { application ->
             maxOf(
                 application.getLatestTrainReadyCheckTime() + application.getRetryInterval(),
                 application.getLatestTrainSuccessTime() + application.getTrainInterval()
             )
         } - System.currentTimeMillis()
         registerNextBackgroundProcessingTriggerWithDelay(waitingTime)
     }
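
     One way the computed waitingTime could be handed to the OS scheduler on Android, assuming the WorkManager-based TrainingWorker sketched after slide 13; the Context parameter and the unique-work name are assumptions added for the sketch.

     import android.content.Context
     import androidx.work.ExistingWorkPolicy
     import androidx.work.OneTimeWorkRequestBuilder
     import androidx.work.WorkManager
     import java.util.concurrent.TimeUnit

     fun registerNextBackgroundProcessingTriggerWithDelay(context: Context, waitingTimeMillis: Long) {
         val request = OneTimeWorkRequestBuilder<TrainingWorker>()
             .setInitialDelay(waitingTimeMillis.coerceAtLeast(0L), TimeUnit.MILLISECONDS)
             .build()
         WorkManager.getInstance(context)
             .enqueueUniqueWork("lfl-training", ExistingWorkPolicy.REPLACE, request)
     }
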
  24. Select application to train: on-device training with interval-based scheduling. [Flow: the OS triggers the background task; applications in the LFL application list (A, B, C) are filtered by the interval conditions and one of them (e.g. Application A) is selected; the model downloader checks for updates (model config, inference model, training model) and updates latest_retry_time; if the application is ready to train, the LFL trainer trains the model and updates latest_train_time; if it is ready to upload, the model uploader uploads the model; finally the next delay time (the minimum waiting time) is calculated and the next background task is registered.]
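
     A sketch of the application-selection step, with illustrative data types; the tie-breaking rule (prefer the application trained least recently) is an assumption, not necessarily the platform's actual policy.

     data class LFLAppState(
         val id: String,
         val latestRetryTime: Long,      // updated after "check update" (latest_retry_time)
         val latestTrainTime: Long,      // updated after a successful training (latest_train_time)
         val retryIntervalMillis: Long,
         val trainIntervalMillis: Long
     )

     // Filter applications whose retry and train intervals have both elapsed, then pick one to train.
     fun selectApplicationToTrain(apps: List<LFLAppState>, now: Long): LFLAppState? =
         apps.filter { app ->
             now >= app.latestRetryTime + app.retryIntervalMillis &&
                 now >= app.latestTrainTime + app.trainIntervalMillis
         }.minByOrNull { it.latestTrainTime }
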
  29. Table of contents
     - What is Federated Learning?
     - Federated Learning at LINE
     - Why do we need a platform supporting multiple Federated Learning instances?
     - LFL client platform supporting multiple Federated Learning instances
     - On-device training of LFL client platform
     - Inside of LFL client platform
     - Lessons learned
  30. Yuki Federated Learning (YFL) SDK: an ML library for training and inference, focused on being lightweight.
     - A federated learning library for cross-platform mobile development (supports iOS and Android)
     - Based on ONNX Runtime ( https://onnx.ai )
     - Model conversion from TensorFlow or PyTorch
     - Currently around 1.2 MB
     - Limited set of operations, and only a CPU backend is supported
     - Local Differential Privacy (LDP) supported, using the Gaussian mechanism for differential privacy
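
     The deck does not show how the YFL SDK parameterizes its noise, so the following is a generic textbook sketch of the Gaussian mechanism for (epsilon, delta)-LDP applied to a model update before upload; the clipping step and the sigma calibration are the standard analytic form, not YFL's actual implementation.

     import kotlin.math.PI
     import kotlin.math.cos
     import kotlin.math.ln
     import kotlin.math.sqrt
     import kotlin.random.Random

     fun privatizeUpdate(update: DoubleArray, clipNorm: Double, epsilon: Double, delta: Double): DoubleArray {
         // Clip the update to L2 norm <= clipNorm so its sensitivity is bounded.
         var squaredNorm = 0.0
         for (v in update) squaredNorm += v * v
         val norm = sqrt(squaredNorm)
         val scale = if (norm > clipNorm) clipNorm / norm else 1.0
         // Classic calibration: sigma = S * sqrt(2 * ln(1.25 / delta)) / epsilon (valid for epsilon <= 1).
         val sigma = clipNorm * sqrt(2.0 * ln(1.25 / delta)) / epsilon
         return DoubleArray(update.size) { i -> update[i] * scale + gaussianNoise(sigma) }
     }

     // Box-Muller sample from N(0, sigma^2); the Kotlin standard library has no Gaussian sampler.
     fun gaussianNoise(sigma: Double): Double {
         val u1 = Random.nextDouble().coerceAtLeast(1e-12)
         val u2 = Random.nextDouble()
         return sigma * sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)
     }
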
  32. LFL application's storage management.
     ML model storage:
     - Training model and inference model
     - Model configurations
     - Integrity check for privacy configurations (e.g. rollout, uploading limit, LDP)
     Version update of the ML model:
     - Different policies for major updates and patch updates
     - Delete user logs or reset the uploading limit
     - Version matching with the feature model
     User log DB:
     - Delete old logs beyond the maximum training input
     - Delete logs used for training
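
     A sketch of the user-log retention rules above, with illustrative types and fields:

     data class StoredUserLog(val id: Long, val createdAt: Long, val usedForTraining: Boolean)

     // Drop logs already used for training, then keep only the newest logs
     // up to the maximum training input size.
     fun pruneUserLogs(logs: List<StoredUserLog>, maxTrainingInput: Int): List<StoredUserLog> =
         logs.filterNot { it.usedForTraining }
             .sortedByDescending { it.createdAt }
             .take(maxTrainingInput)
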
  33. Table of contents
     - What is Federated Learning?
     - Federated Learning at LINE
     - Why do we need a platform supporting multiple Federated Learning instances?
     - LFL client platform supporting multiple Federated Learning instances
     - On-device training of LFL client platform
     - Inside of LFL client platform
     - Lessons learned
  34. Lessons learned: a large-scale project built in collaboration with multiple teams. The larger the project, the higher the complexity, and blind spots and discrepancies can appear in the process of collaboration.
  35. Lessons learned: the importance of testing cannot be overemphasized. A sample app for end-to-end testing, test tools for background processing, and remote logging for critical sections.