[Figure: federated learning overview. The server distributes an ML model to the clients. Each client trains the model on its own user data, then uploads the trained model. The server aggregates the uploaded ML models.]
Learning at LINE
- Why do we need a platform supporting multiple Federated Learning instances?
- LFL client platform supporting multiple Federated Learning instances
- On-device training of LFL client platform
- Inside of LFL client platform
- Lessons learned
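[Figure: LFL platform architecture.
Client side: LFL Platform (Client-side) with a feature service client, model inference, and training of the ML model on the user log. Inputs are user interactions and inference requests.
Server side: LFL Platform (Server-side) with a Feature Model Repository, an ML Model Repository, and model aggregation.
Flows: the updated model is downloaded to the feature service; the trained ML model is uploaded to the server.]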
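[Figure: LFL Client Platform interfaces. The LFL Application uses the LFL Application module and the Common module of the LFL Client Platform.
Calls from the application: init with config, push logs (user logs), inference, model training, update model and config, start training, get training result.
Server interactions: model version check, model download (ML model), result upload.
Also shown: LINE Feature service.]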
Application Modules
[Figure: class diagram. LFLApplicationManager in LFL_Common_Module calls functions directly on the concrete class of each application module.]
- LFL_Application_A_Module, class Application_A: fun updateModel(), fun updateConfig(), fun startTrain(), fun getTrainResult()
- LFL_Application_B_Module, class Application_B: fun updateModel(), fun startTrain(), fun getTrainResult()
- LFL_Common_Module, class LFLApplicationManager: fun updateModelA(), fun updateModelB(), fun updateConfigA(), fun startTrainA(), fun startTrainB(), fun getTrainResultA(), fun getTrainResultB()
Common and Application Module
[Figure: class diagram after introducing an interface. LFLApplicationManager calls functions on the LFLApplication interface. Each application module implements the interface, and the implementations are provided through dependency injection.]
- LFL_Common_Module, class LFLApplicationManager: calls functions on LFLApplication
- LFL_Common_Module, interface LFLApplication: fun updateModel(), fun updateConfig(), fun startTrain(), fun getTrainResult()
- LFL_Application_A_Module, class LFLApplication_A_Impl: fun updateModelImpl(), fun updateConfigImpl(), fun startTrainImpl(), fun getTrainResultImpl()
- LFL_Application_B_Module, class LFLApplication_B_Impl: fun updateModelImpl(), fun updateConfigImpl(), fun startTrainImpl(), fun getTrainResultImpl()
of LFL client platform
[Figure: training call flow inside the LFL Client Platform. A local trigger sends (1) a training trigger to the LFL application manager in the common module, which also contains the machine learning library. The manager calls (2) start train on the LFL application interface, and the call reaches (3) start train on the actual instance in the LFL application module, which is supplied through dependency injection.]
Dependency inversion of LFL Common and Application Module
[Figure: class diagram of LFL_Common_Module (LFLApplicationManager, interface LFLApplication) and the application modules (LFLApplication_A_Impl, LFLApplication_B_Impl), annotated with "Implement interfaces" and "Dependency injection".]
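The interface-plus-injection structure in the diagrams can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the parameter and return types, the string application ids, and the `SampleApplicationImpl` class are assumptions made for the example. Only the names `LFLApplication`, `LFLApplicationManager`, and the four function names come from the slides.

```kotlin
// Common module: the manager depends only on this interface.
// Parameter and return types are assumptions; the slides show names only.
interface LFLApplication {
    fun updateModel(modelBytes: ByteArray)
    fun updateConfig(config: Map<String, String>)
    fun startTrain()
    fun getTrainResult(): String?
}

// Application module: a hypothetical implementation standing in for
// LFLApplication_A_Impl or LFLApplication_B_Impl.
class SampleApplicationImpl(private val name: String) : LFLApplication {
    private var modelSize = 0
    private var result: String? = null

    override fun updateModel(modelBytes: ByteArray) { modelSize = modelBytes.size }
    override fun updateConfig(config: Map<String, String>) { /* unused in this sketch */ }
    override fun startTrain() { result = "$name trained on a ${modelSize}-byte model" }
    override fun getTrainResult(): String? = result
}

// Common module: implementations are injected through the constructor, so the
// manager needs no per-application functions such as startTrainA() or startTrainB().
class LFLApplicationManager(private val applications: Map<String, LFLApplication>) {
    fun updateModel(id: String, modelBytes: ByteArray) = find(id).updateModel(modelBytes)
    fun startTrain(id: String) = find(id).startTrain()
    fun getTrainResult(id: String): String? = find(id).getTrainResult()

    private fun find(id: String): LFLApplication =
        applications[id] ?: throw IllegalArgumentException("Unknown LFL application: $id")
}
```

With this shape, adding a third application means registering one more implementation in the injected map; the manager itself stays unchanged.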
Android: WorkManager
- Background processing can run for more than 10 minutes
- The OS can interrupt the processing at any time
- Requirements for on-device training
  - Battery, storage, device idle (background processing)
[Figure: on-device training flow between the LINE app, the LFL Platform (Client-side), and the LFL Platform (Server-side).
Main path: OS trigger → background task → check update → ready to train? → (T) train model → upload model.
(F) branch: not enough user logs.
Other labels: register, logs, update model, delete user logs.]
Schedule multiple LFL applications' training
- Single training session at a time → training duration estimation
[Figure: timelines for Application A, Application B, and Application C. Each time a BGTask is triggered, an application either trains or is found not ready. A training session is followed by the train interval; a "triggered but not ready" check is followed by the retry interval. Callouts: duration estimation, training scheduling.]
LFL applications' model training
⎯ Register background processing trigger with interval-based delay

    fun registerNextTrainingWithDelay() {
        // Each application may be tried again only after both its retry interval
        // and its train interval have elapsed, so take the later of the two times.
        // The earliest such time across all applications decides the next trigger.
        val waitingTime = LFLApplications.minOf { application ->
            maxOf(
                application.getLatestTrainReadyCheckTime() + application.getRetryInterval(),
                application.getLatestTrainSuccessTime() + application.getTrainInterval()
            )
        } - System.currentTimeMillis()
        registerNextBackgroundProcessingTriggerWithDelay(waitingTime)
    }

[Figure: interval-based scheduling. The background processing timeline shows Training A, Training C, and Training B, each started after the minimum delay.]
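To make the delay formula concrete, here is a self-contained sketch with a worked example. The `TrainingSchedule` data class and `computeWaitingTime` are hypothetical names introduced for illustration, and clamping the result to zero is an addition of this sketch that the slide does not show.

```kotlin
// Hypothetical per-application schedule state; all times are in milliseconds.
data class TrainingSchedule(
    val name: String,
    val latestTrainReadyCheckTime: Long,
    val retryIntervalMillis: Long,
    val latestTrainSuccessTime: Long,
    val trainIntervalMillis: Long
) {
    // Earliest time at which this application may be tried again:
    // both the retry interval and the train interval must have elapsed.
    fun nextEligibleTime(): Long = maxOf(
        latestTrainReadyCheckTime + retryIntervalMillis,
        latestTrainSuccessTime + trainIntervalMillis
    )
}

// Delay until the first application becomes eligible.
fun computeWaitingTime(applications: List<TrainingSchedule>, nowMillis: Long): Long {
    require(applications.isNotEmpty()) { "At least one LFL application is required" }
    val earliest = applications.minOf { it.nextEligibleTime() }
    // Assumption of this sketch: never return a negative delay.
    return (earliest - nowMillis).coerceAtLeast(0L)
}
```

For example, suppose application A was last checked at 1000 with a retry interval of 500 and last trained at 0 with a train interval of 3000, so it is eligible at 3000. Application B was last checked at 2000 with a retry interval of 500 and last trained at 100 with a train interval of 1000, so it is eligible at 2500. At time 2200 the next trigger is registered 300 ms ahead, driven by B.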
[Figure: inside the background task.
- Application List: Application A, Application B, Application C
- OS trigger → background task → filter applications with interval conditions → retrieve the selected application (Application A in the figure)
- Model downloader (delegate): check update, update latest_retry_time; stores the model config, inference model, and training model
- Ready to train? (T/F)
- LFL Trainer (delegate): train model, update latest_train_time
- Ready to upload? (T/F)
- Model uploader (delegate): upload model
- Calculate next delay time → minimum waiting time → register the next background task]
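One run of the background task in the figure could look roughly like the sketch below. It is an illustration under several assumptions: `AppState`, `BackgroundTrainingTask`, and the four function-typed delegates are hypothetical names, the exact eligibility rule is inferred from "filter applications with interval conditions", and the loop handles every eligible application in sequence, whereas the real platform may train only one per trigger.

```kotlin
// Hypothetical per-application state; field names mirror the labels in the figure.
class AppState(
    val name: String,
    val retryIntervalMillis: Long,
    val trainIntervalMillis: Long,
    var latestRetryTime: Long = 0L,
    var latestTrainTime: Long = 0L
)

// The delegates stand in for the "Model downloader", "LFL Trainer", and
// "Model uploader" boxes. Their signatures are assumptions of this sketch.
class BackgroundTrainingTask(
    private val checkUpdate: (AppState) -> Unit,
    private val isReadyToTrain: (AppState) -> Boolean,
    private val train: (AppState) -> Boolean, // true if the result is ready to upload
    private val upload: (AppState) -> Unit
) {
    // Runs one background task and returns the names of the applications trained.
    fun run(apps: List<AppState>, nowMillis: Long): List<String> {
        val trained = mutableListOf<String>()
        // Filter applications with interval conditions.
        val eligible = apps.filter {
            nowMillis >= it.latestRetryTime + it.retryIntervalMillis &&
                nowMillis >= it.latestTrainTime + it.trainIntervalMillis
        }
        for (app in eligible) {
            checkUpdate(app)
            app.latestRetryTime = nowMillis
            if (!isReadyToTrain(app)) continue
            val readyToUpload = train(app)
            app.latestTrainTime = nowMillis
            if (readyToUpload) upload(app)
            trained += app.name
        }
        return trained
    }
}
```

After `run` returns, the caller would compute the next delay from the updated timestamps and register the next trigger, as in the last steps of the figure.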
Yuki Federated Learning (YFL) SDK
⎯ A federated learning library for cross mobile platform (supports iOS and Android)
⎯ Based on ONNX Runtime ( https://onnx.ai )
⎯ Model conversion from TensorFlow or PyTorch
⎯ Currently, around 1.2MB
⎯ Limited number of operations and only CPU backend supported
⎯ Local Differential Privacy (LDP) supported
⎯ Uses the Gaussian mechanism for differential privacy
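As an illustration of what a Gaussian mechanism for local differential privacy typically involves, the sketch below clips a model update to a bounded L2 norm and adds Gaussian noise on the device before upload. This is the generic textbook form, not the YFL SDK's implementation: the function name `privatizeUpdate`, its parameters, and the choice of L2 clipping are assumptions, and calibrating `noiseStdDev` to a privacy budget is left out.

```kotlin
import java.util.Random
import kotlin.math.sqrt

// Clips the update so its L2 norm is at most clipNorm, then adds independent
// Gaussian noise with standard deviation noiseStdDev to every coordinate.
// Hypothetical helper for illustration; not part of any SDK.
fun privatizeUpdate(
    update: DoubleArray,
    clipNorm: Double,
    noiseStdDev: Double,
    random: Random
): DoubleArray {
    require(clipNorm > 0.0) { "clipNorm must be positive" }
    require(noiseStdDev >= 0.0) { "noiseStdDev must be non-negative" }
    val norm = sqrt(update.sumOf { it * it })
    // Scale down only when the update exceeds the clipping bound.
    val scale = if (norm > clipNorm) clipNorm / norm else 1.0
    return DoubleArray(update.size) { i ->
        update[i] * scale + random.nextGaussian() * noiseStdDev
    }
}
```

Clipping bounds how much any single device can influence the aggregate, which is what lets the noise scale be chosen independently of the raw update size.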
ML Model storage
⎯ Training model and inference model
⎯ Model configurations
⎯ Integrity check for privacy configurations (e.g. rollout, uploading limit, LDP)
⎯ Different policy for major update and patch update
⎯ Delete user logs or reset uploading limit

log DB
⎯ Version matching with feature model
⎯ Delete old logs beyond the maximum training input
⎯ Delete logs used for training
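The three log DB rules can be sketched as pure functions over an in-memory list. All names here (`UserLog`, `dropOldLogs`, `matchingLogs`, `deleteUsedLogs`) are hypothetical, and reading "version matching with feature model" as a per-log version check is an assumption; the real platform presumably applies these rules inside a database.

```kotlin
// Hypothetical log record; the real schema is not shown in the slides.
data class UserLog(val id: Long, val timestampMillis: Long, val featureModelVersion: Int)

// Keeps only the newest maxTrainingInput logs, returned newest first.
fun dropOldLogs(logs: List<UserLog>, maxTrainingInput: Int): List<UserLog> {
    require(maxTrainingInput >= 0) { "maxTrainingInput must be non-negative" }
    return logs.sortedByDescending { it.timestampMillis }.take(maxTrainingInput)
}

// Keeps only logs recorded with the given feature model version (assumed rule).
fun matchingLogs(logs: List<UserLog>, featureModelVersion: Int): List<UserLog> =
    logs.filter { it.featureModelVersion == featureModelVersion }

// Removes the logs that were consumed by a training session.
fun deleteUsedLogs(logs: List<UserLog>, used: List<UserLog>): List<UserLog> {
    val usedIds = used.map { it.id }.toSet()
    return logs.filter { it.id !in usedIds }
}
```

A training session would then take `matchingLogs(dropOldLogs(all, max), version)` as its input and call `deleteUsedLogs` once training has finished.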