
How to Apply Large ML Models for AI-Text Filtering Models


Tech-Verse2022

November 17, 2022


Transcript

  1. How to Apply Large ML Models for AI-Text Filtering Models

    Hyung Rak Kim / LINE Plus
  2. • Hyungrak Kim • NLP Engineer • Works on the AI text filter

    model • Likes to learn and use new technology
  3. None
  4. • (Stability AI DreamStudio): https://beta.dreamstudio.ai/dream, prompt: “beautiful forest”

  5. Contents › Introduction › Large ML model training tech ›

    Apply large ML model to AI text filter › Experiment Result › Expected Effectiveness › Conclusion
  6. Introduction

  7. Introduction: what is the AI text filter? User messages flow through the

    LINE monitoring system, where the JP-language AI text filter model outputs a binary decision for each of six categories: Normal, Personal Info, Porn, Harass, Illegal, Advertising. Example message: 「私と付き合いたい場合は連絡してください hr.k@email.com」 ("Please contact me if you want to date me hr.k@email.com"). The system checks 380,000,000 data every month.
  8. Introduction: the AI text filter problem. The JP-language AI

    text filter model is fine-tuned from a public pre-training model chosen among many candidates: JP BERT, JP Char BERT, JP RoBERTa, JP small BERT, JP DistilBERT, and more.
  9. Introduction: the AI text filter problem. › Which model's performance

    is better? › What if the language is different? Each fine-tuning candidate (JP BERT, JP Char BERT, JP RoBERTa, JP small BERT, JP DistilBERT, ...) carries • research cost • development cost • service cost.
  10. Introduction: solution. For the language problem, a multi-language model;

    for the performance problem, large ML model techniques: large ML training tech.
  13. Introduction: solution. Replace the single-language AI text filter model

    (110 million parameters, Japanese only) with a multi-language model (11 billion parameters, about 100x larger) covering Japan, English, Thailand, Taiwan, Indonesia, and more, built with large model training techniques. Impact: cost reduction and service extension.
  14. Introduction: contributions. › Introduction and sharing of large ML model

    training technology › AI text filter advancement using a large multi-language model › With the MLU team of LINE MLOps › Model serving › With the MLU serving team of LINE ML Service
  15. Large ML Model Training Tech

  16. Large ML Model Training Tech: the basics. Lightweight methods: pruning,

    quantization, knowledge distillation. Scaling methods: data parallelism, model parallelism, CPU offload.
  17. Large ML Model Training Tech: data parallelism. Data 1, Data

    2, and Data 3 each go to their own V100 GPU (GPU 1, GPU 2, GPU 3), with a full ML model replica on every GPU.
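The data-parallel scheme on this slide can be sketched in plain Python (worker loops stand in for the three V100 GPUs, and `all_reduce_mean` for the collective operation; all names are illustrative): each replica computes a gradient on its own shard, and averaging the shard gradients recovers the full-batch update.

```python
# Toy data parallelism: every "GPU" holds a full model replica (here just
# one weight w for the model y = w * x), sees only its own data shard,
# and the per-shard gradients are averaged (the all-reduce step) so all
# replicas apply the same update.

def split_batch(batch, num_workers):
    """Shard a batch across workers; equal shard sizes assumed."""
    per = len(batch) // num_workers
    return [batch[i * per:(i + 1) * per] for i in range(num_workers)]

def local_gradient(w, shard):
    """Mean gradient of the squared loss (w*x - y)^2 over one shard."""
    grads = [2 * x * (w * x - y) for x, y in shard]
    return sum(grads) / len(grads)

def all_reduce_mean(values):
    """Stand-in for an all-reduce averaging across workers."""
    return sum(values) / len(values)

def data_parallel_step(w, batch, num_workers, lr=0.01):
    shards = split_batch(batch, num_workers)
    grads = [local_gradient(w, s) for s in shards]
    g = all_reduce_mean(grads)  # every replica ends up with the same g
    return w - lr * g

if __name__ == "__main__":
    batch = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
    # With equal shard sizes, 3 workers match a single full-batch step.
    print(data_parallel_step(1.0, batch, num_workers=3)
          == data_parallel_step(1.0, batch, num_workers=1))
```

Because the shards are equal-sized, the mean of shard means equals the full-batch mean, which is why the replicas stay bit-for-bit in sync.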
  18. Large ML Model Training Tech: model parallelism. Layer 1 and

    Layer 2, between the input and the output, are split across GPU 1 and GPU 2: intra-operator parallelism combined with an all-reduce.
  19. Large ML Model Training Tech: model parallelism + CPU offload.

    The same Layer 1 / Layer 2 split across GPU 1 and GPU 2 (intra-operator parallelism with all-reduce), plus CPU offload: using CPU space lets the model size go up.
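A minimal sketch of the CPU-offload idea, assuming a toy model whose "layers" are scalars and a GPU that fits only one layer at a time (the class and all names are hypothetical, not a real framework API):

```python
# Toy CPU offload: GPU memory only fits one layer at a time, so layers
# are parked in CPU space and paged onto the "GPU" just for their
# forward pass, then offloaded back when another layer needs the slot.

GPU_CAPACITY = 1  # layers that fit on the GPU at once

class OffloadedModel:
    def __init__(self, layers):
        self.cpu = dict(enumerate(layers))  # layer id -> weight, on CPU
        self.gpu = {}                       # currently GPU-resident layers

    def _fetch(self, i):
        """Page layer i onto the GPU, evicting one layer if over capacity."""
        if i not in self.gpu:
            if len(self.gpu) >= GPU_CAPACITY:
                victim = next(iter(self.gpu))
                self.cpu[victim] = self.gpu.pop(victim)  # offload back
            self.gpu[i] = self.cpu.pop(i)
        return self.gpu[i]

    def forward(self, x):
        for i in sorted(set(self.cpu) | set(self.gpu)):
            x = self._fetch(i) * x  # toy "layer": scalar multiply
        return x

model = OffloadedModel([2.0, 3.0, 5.0])
print(model.forward(1.0))  # 30.0, with at most one layer ever on the GPU
```

Real frameworks such as DeepSpeed offload optimizer state and parameters rather than paging whole layers by hand, but the trade is the same: CPU space buys a larger model than the GPU alone could hold.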
  20. Large ML Model Training Tech: large ML model training framework.

    › Why choose DeepSpeed › Open source › CPU offload › Supports the best methods from ML research • (ICML 2022 big model tutorial): https://icml.cc/virtual/2022/tutorial/18440 • (DeepSpeed): https://www.deepspeed.ai
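The DeepSpeed features named here (CPU offload, FP16) are switched on through a JSON configuration. The sketch below uses real DeepSpeed config keys, but the values are illustrative examples, not the talk's actual settings.

```python
import json

# Illustrative DeepSpeed configuration: FP16 training plus ZeRO stage 3
# with optimizer and parameter offload to CPU memory. Values are examples.
ds_config = {
    "train_batch_size": 256,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}

config_json = json.dumps(ds_config, indent=2)  # saved as ds_config.json in practice
# Training is then launched with something like:
#   deepspeed train.py --deepspeed --deepspeed_config ds_config.json
```

Keeping the parallelism and offload choices in this config, rather than in the model code, is a large part of why a framework like DeepSpeed reduces research and development cost.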
  21. Apply large ML model to AI Text filter

  22. Apply large ML model to AI text filter: large model

    training › Cluster construction: a DeepSpeed multi-node setting with three nodes › Per node: GPU: A100 40G › GPU number: 8 › CPU cores: 70 › CPU memory: 1T • (DeepSpeed): https://www.deepspeed.ai/
  23. Apply large ML model to AI text filter: large model

    training › The same three-node DeepSpeed cluster plus the training configuration: fine-tune the 11-billion-parameter multi-language pre-training model into the AI text filter on 730,000 data. • (DeepSpeed): https://www.deepspeed.ai/
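DeepSpeed reads a multi-node topology like this from a hostfile listing every node and its GPU slots; a fragment matching the three nodes with 8 GPUs each described above (hostnames are placeholders):

```
# hostfile: one line per node, slots = GPUs on that node
node-1 slots=8
node-2 slots=8
node-3 slots=8
```

The header node then launches with `deepspeed --hostfile=hostfile train.py --deepspeed --deepspeed_config ds_config.json` (the script and config filenames are illustrative).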
  24. Apply large ML model to AI text filter: three problems.

    › Environment setting › Multi-node sharing › Pre-training model dependency
  27. Apply large ML model to AI text filter: environment setting

    problem. The DeepSpeed environment carries library dependencies on the CPU, the GPU, the OS, and system libraries. • (DeepSpeed): https://hub.docker.com/r/deepspeed/deepspeed/tags?page=1&ordering=last_updated
  28. Apply large ML model to AI text filter: environment setting

    problem. On top of those dependencies, DeepSpeed builds CUDA extensions, which need a working build system: CUDA, Ninja, and G++/C++. • (DeepSpeed): https://hub.docker.com/r/deepspeed/deepspeed/tags?page=1&ordering=last_updated
  29. Apply large ML model to AI text filter: environment setting

    solution. Fix the DeepSpeed environment setting: the OS system libraries, the DeepSpeed library, and the multi-node libraries.
  30. Apply large ML model to AI text filter: environment setting

    solution. The fixed environment pins a stable DeepSpeed version, covers all functions used in MLU, and keeps the MLU environment free of extra training libraries.
  31. Apply large ML model to AI text filter: environment setting

    solution. The environment is distributed as a Docker image on Docker Hub together with an installation document.
  32. Apply large ML model to AI text filter: multi-node training

    file sharing problem 1. In the MLU environment, the first training start makes the GPU server build the CUDA extension for the GPU accelerator. • (DeepSpeed): https://www.deepspeed.ai/tutorials/advanced-install/
  33. Apply large ML model to AI text filter: multi-node training

    file sharing problem 2. The header GPU node (Node 1) starts training on worker GPU nodes 2 and 3 over ssh, but the workers lack the built CUDA extension. • (DeepSpeed): https://www.deepspeed.ai/tutorials/advanced-install/
  34. Apply large ML model to AI text filter: multi-node training

    file sharing solution 1. A multi-node file sharing module: at training start, the header GPU node (Node 1) takes the list of worker node IP addresses and securely transfers the built CUDA extension to worker GPU nodes 2 through N.
  35. Apply large ML model to AI text filter: multi-node training

    file sharing solution 2. At training start in the MLU environment, the header GPU node shares the CUDA extension with worker GPU nodes 2 and 3 over ssh, and multi-node training proceeds. • (DeepSpeed): https://www.deepspeed.ai/tutorials/advanced-install/
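The sharing step on slides 34 and 35 can be sketched as a helper that turns the worker IP list into secure-copy commands for the built extension artifacts. The path, IPs, and helper name are hypothetical, and the commands are returned rather than executed:

```python
# Sketch of a multi-node file sharing module: turn the worker node IP
# list into scp commands for the CUDA-extension build artifacts. A real
# module would execute these over ssh/scp from the header node.

def sync_commands(worker_ips, src="~/.cache/torch_extensions"):
    """One scp command per worker node in the IP list."""
    return ["scp -r {src} {ip}:{src}".format(src=src, ip=ip)
            for ip in worker_ips]

for cmd in sync_commands(["10.0.0.2", "10.0.0.3"]):
    print(cmd)
```

DeepSpeed's advanced-install tutorial (linked on the slide) alternatively allows the CUDA ops to be pre-built at install time, which sidesteps the per-node JIT build this module works around.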
  36. Apply large ML model to AI text filter: pre-training model

    parallelism dependency problem. Model parallelism (Layer 1 / Layer 2 split across GPU 1 and GPU 2, intra-operator parallelism with all-reduce) requires parallelism-aware model code.
  37. Apply large ML model to AI text filter: pre-training model

    parallelism dependency problem. Public pre-training models (JP BERT, JP Char BERT, JP RoBERTa, JP small BERT, JP DistilBERT, ...) ship unparallelized model code, so the parallelism coding falls on us.
  38. Apply large ML model to AI text filter: pre-training model

    parallelism dependency problem. Because the public model code is unparallelized, fine-tuning inherits a parallelism dependency on each pre-training model.
  39. Apply large ML model to AI text filter: pre-training model

    parallelism dependency solution. A parallelism converter, part 1: pre-training model code parallelism.
  40. Apply large ML model to AI text filter: pre-training model

    parallelism dependency solution. A parallelism converter, part 2: model code parallelism plus pre-training model weight partitioning.
  41. Apply large ML model to AI text filter: pre-training model

    parallelism dependency, code parallelism 1. The public pre-training model is a transformer: an encoder and a decoder, each a stack of Layer 1 through Layer N.
  42. Apply large ML model to AI text filter: pre-training model

    parallelism dependency, code parallelism 1. Each layer is a multi-head attention block (Key, Query, Value) plus a feed-forward network: an intermediate H-to-4H FFN followed by a 4H-to-H FFN.
  43. Apply large ML model to AI text filter: pre-training model

    parallelism dependency, code parallelism 2. In the multi-language pre-training model, the multi-head attention layer's Key, Query, and Value computations are split across GPU 1 and GPU 2 and recombined with an all-reduce. • (Megatron-LM): https://arxiv.org/pdf/1909.08053.pdf
  44. Apply large ML model to AI text filter: pre-training model

    parallelism dependency, code parallelism 2. The feed-forward layer is split the same way: the intermediate H-to-4H layer and the 4H-to-H layer each run partly on GPU 1 and GPU 2, and an all-reduce produces the output. • (Megatron-LM): https://arxiv.org/pdf/1909.08053.pdf
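The Megatron-style FFN split on slides 43 and 44 can be checked end to end in toy form: split the H-to-4H weight by columns, the 4H-to-H weight by rows, and recombine with an all-reduce sum. Lists of lists stand in for tensors and ReLU for the activation; nothing here is a real Megatron-LM API.

```python
# Toy Megatron-style tensor parallelism for a transformer FFN across two
# "GPUs": column-parallel H -> 4H matmul, elementwise activation applied
# locally, row-parallel 4H -> H matmul, then one all-reduce sum.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def relu(M):  # stand-in for the activation between the two FFN matmuls
    return [[max(0.0, v) for v in row] for row in M]

def split_cols(W, parts):  # column shards for the H -> 4H weight
    n = len(W[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in W] for i in range(parts)]

def split_rows(W, parts):  # row shards for the 4H -> H weight
    n = len(W) // parts
    return [W[i * n:(i + 1) * n] for i in range(parts)]

def parallel_ffn(X, W1, W2, gpus=2):
    partials = [matmul(relu(matmul(X, w1)), w2)
                for w1, w2 in zip(split_cols(W1, gpus), split_rows(W2, gpus))]
    # all-reduce: elementwise sum of the per-GPU partial outputs
    return [[sum(p[i][j] for p in partials) for j in range(len(partials[0][0]))]
            for i in range(len(partials[0]))]

if __name__ == "__main__":
    X = [[1.0, 2.0]]
    W1 = [[1.0, 0.0, -1.0, 2.0], [0.0, 1.0, 1.0, -1.0]]    # H -> 4H
    W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]  # 4H -> H
    print(parallel_ffn(X, W1, W2) == matmul(relu(matmul(X, W1)), W2))  # True
```

The column-then-row ordering is what makes the single all-reduce sufficient: each GPU holds complete hidden columns, so the elementwise activation needs no communication.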
  45. Apply large ML model to AI text filter: pre-training model

    parallelism dependency, weight partitioning. A model parameter partitioning algorithm: load the model through the parallelized code, partition the pre-training model weights, then fine-tune. • (Megatron-LM): https://github.com/NVIDIA/Megatron-LM
  46. Apply large ML model to AI text filter: pre-training model

    parallelism dependency, weight partitioning. The multi-head attention layer, feed-forward layer, and intermediate feed-forward layer weights are auto-partitioned across GPU 1 and GPU 2. • (Megatron-LM): https://github.com/NVIDIA/Megatron-LM
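The auto-partitioning step can be sketched as slicing each pre-trained weight matrix per rank, matching the column/row split described on the preceding slides (the `partition` helper is illustrative, not a Megatron-LM function):

```python
# Sketch of pre-training model weight partitioning: hand each GPU rank
# its slice of a full 2-D weight. Column slices suit the QKV and
# H-to-4H weights; row slices suit the 4H-to-H output weights.

def partition(weight, rank, world_size, axis):
    """Return rank's shard of a 2-D weight given as a list of rows.
    axis=1: column slice; axis=0: row slice."""
    if axis == 0:
        n = len(weight) // world_size
        return weight[rank * n:(rank + 1) * n]
    n = len(weight[0]) // world_size
    return [row[rank * n:(rank + 1) * n] for row in weight]

# Example: a 2x4 "H-to-4H" weight split by columns across 2 ranks.
W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
shard0 = partition(W, rank=0, world_size=2, axis=1)  # [[1, 2], [5, 6]]
shard1 = partition(W, rank=1, world_size=2, axis=1)  # [[3, 4], [7, 8]]
```

Because the slicing mirrors the parallelized code exactly, the shards recombine to the original checkpoint, so fine-tuning starts from the same pre-trained weights.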
  47. Apply large ML model to AI text filter: pre-training model

    parallelism dependency solution. The parallelism converter (model code parallelism + model weight partitioning) turns a public pre-training model with unparallelized code into a parallelized model group across GPU 1 and GPU 2, ready for fine-tuning under model parallelism.
  48. Apply large ML model to AI text filter: pre-training model

    parallelism dependency solution analysis. Advantages: free of the parallelism dependency, larger model sizes, more research possible. Disadvantages: unstable convergence, some model performance loss.
  49. Apply large ML model to AI text filter: performance tuning

    with label correlation. The six labels (Normal, Advertising, Personal Info, Porn, Illegal, Harass) are correlated, so the algorithm adds a global label-correlation embedding.
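The slide names the technique (a global label-correlation embedding) without details, so this is a loose sketch under assumptions: one common way to obtain a global correlation signal is to count label co-occurrences in the multi-label training data; the resulting matrix can then condition the label embeddings. The data and everything below are illustrative, not the talk's algorithm.

```python
# Hypothetical sketch: build a global label co-occurrence matrix from
# multi-label annotations; a correlation-aware classifier could use it
# to bias label embeddings toward labels that fire together.

LABELS = ["Normal", "Advertising", "Personal Info", "Porn", "Illegal", "Harass"]

def cooccurrence(samples):
    """samples: list of label-name lists, one per training example.
    Returns an NxN count matrix; the diagonal is each label's frequency."""
    idx = {name: i for i, name in enumerate(LABELS)}
    n = len(LABELS)
    C = [[0] * n for _ in range(n)]
    for labels in samples:
        for a in labels:
            for b in labels:
                C[idx[a]][idx[b]] += 1
    return C

samples = [
    ["Advertising", "Personal Info"],  # e.g. an ad containing contact info
    ["Advertising", "Illegal"],        # e.g. an illegal loan advertisement
    ["Normal"],
]
C = cooccurrence(samples)
```

The matrix is symmetric by construction, which is the property a correlation embedding would exploit: Advertising co-occurring with Personal Info raises both labels' scores on messages like the dating example from slide 7.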
  50. Apply large ML model to AI text filter: large model

    serving › Model optimization: FP16 with loss scaling • (DeepSpeed Inference): https://www.deepspeed.ai/tutorials/inference-tutorial/
  51. Apply large ML model to AI text filter: large model

    serving › Model optimization: FP16 with loss scaling › Inference: GPU kernel optimization and inference parallelism • (DeepSpeed Inference): https://www.deepspeed.ai/tutorials/inference-tutorial/
  52. Apply large ML model to AI text filter: large model

    serving › Model optimization: FP16 with loss scaling › Inference: GPU kernel optimization and inference parallelism › Serving: V100 GPUs with auto scaling in MLU Serving • (DeepSpeed Inference): https://www.deepspeed.ai/tutorials/inference-tutorial/
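The FP16 loss-scaling item can be illustrated on its own: gradients too small for FP16 are scaled up before backpropagation and unscaled before the optimizer step, with the scale backed off on overflow. This is a generic dynamic loss-scaling sketch with illustrative constants, not DeepSpeed's implementation.

```python
import math

# Toy dynamic loss scaling for FP16 training: multiply the loss (and so
# the gradients) by a scale to keep tiny gradients representable, then
# unscale before the update; halve the scale and skip the step on overflow.

class LossScaler:
    def __init__(self, scale=2.0 ** 16):
        self.scale = scale

    def scale_loss(self, loss):
        return loss * self.scale

    def step(self, scaled_grads):
        """Return unscaled grads, or None when overflow forces a skipped step."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            self.scale /= 2.0  # back off and retry next iteration
            return None
        return [g / self.scale for g in scaled_grads]

scaler = LossScaler()
ok = scaler.step([1024.0, 2048.0])      # normal step: grads come back unscaled
bad = scaler.step([float("inf"), 0.0])  # overflow: None, scale halved
```

On the inference side, the GPU kernel optimization and inference parallelism items correspond to what DeepSpeed's inference tutorial (linked on the slide) covers.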
  53. Experiment Result

  54. Experiment Result: experiment setting. The AI text filter Japanese single-language

    service model (110 million parameters) is compared against the AI text filter multi-language large model (11 billion parameters), both with tuning and without tuning.
  55. Experiment Result: experiment test data.

    Label   | Count   | Ratio (%)
    Normal  | 99,996  | 86.2
    Info    | 10,278  | 8.8
    Porn    | 2,299   | 1.9
    Harass  | 1,106   | 0.9
    Illegal | 106     | 0.09
    AD      | 2,180   | 1.8
    Total   | 115,965 |
  56. Experiment Result: F1 score result. [Bar chart: F1 score

    (0.4 to 1.0) per label (Normal, Info, Porn, Harass, Illegal, AD) for the Multi-Tuning, Multi, and JP Service models.]
  57. Experiment Result: F1 score result. [Bar charts: per-label F1

    (0.4 to 1.0) and total average F1 (0.68 to 0.86) for the Multi Tuning, Multi, and JP Service models.] Relative to the JP Service model (0%), total average F1 is -1% for Multi Tuning and -9.9% for the untuned Multi model.
  58. Experiment Result: AUC result. [Bar chart: AUC score (0.6

    to 1.0) per label (Normal, Info, Porn, Harass, Illegal, AD) for the Multi Tuning, Multi, and JP Service models.]
  59. Experiment Result: AUC result. [Bar charts: per-label AUC (0.6

    to 1.0) and total average AUC (0.76 to 0.92) for the Multi Tuning, Multi, and JP Service models.] Relative to the JP Service model (0%), total average AUC is -1% for Multi Tuning and -9.1% for the untuned Multi model.
  60. Experiment Result: qualitative evaluation. User message: 「経営難で銀行等からの融資待ちの方、収入がなく

    生活が出来ない……等々、コロナショックで困ってる方🙀 連絡頂ければ即融資可能です😊‼」 Translation: "Those waiting on bank loans due to financial difficulties, those who cannot get by without income, and so on: anyone in trouble from the corona shock 🙀 Contact us and an immediate loan is possible 😊!!"
  61. Experiment Result: qualitative evaluation. For the loan-solicitation message above,

    the JP Service Model scores Illegal at only 12%.
  62. Experiment Result: qualitative evaluation. For the same loan-solicitation message,

    the tuned Multi Language Large Model scores Illegal at 99%, versus 12% for the JP Service Model.
  63. Expected Effectiveness

  64. Expected Effectiveness: the effect of introducing the large

    ML model: 1 performance improvement, 2 service extension, 3 large ML model training tech.
  65. Expected Effectiveness: expectation 1. › A 10% more accurate AI text filter for

    the LMP system › A 0.3-point drop in the monitoring rate versus the current AI text filter service model. Today, out of 380,000,000 messages every month, the JP Service Model flags 1.5%: 5,700,000 monitoring data.
  66. Expected Effectiveness: expectation 1. Out of the same 380,000,000 monthly

    messages, the Multi Language Large Model flags 1.2% instead of 1.5%: 4,560,000 monitoring data instead of 5,700,000.
  67. Expected Effectiveness: expectation 1. Monthly monitoring resource: from 5,700,000 monitoring

    data at 1.5% down to 4,560,000 at 1.2%, a reduction of 1,140,000 per month.
  68. Expected Effectiveness: expectation 2. [Chart: service resource monitored for a

    year, axis 300 to 700 in units of 10^5, Multi Language Large Model vs. JP Service Model.] The large model monitors 13,680,000 fewer items per year.
  69. Expected Effectiveness: expectation 2. [Same chart.] The one-year monitoring

    resource drops by 13,680,000, a 20% reduction versus the JP Service Model.
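The expectation numbers on slides 65 through 69 are internally consistent, which a few lines of exact integer arithmetic confirm:

```python
# Cross-check of the expectation arithmetic from the slides; the 1.5%
# and 1.2% monitoring rates are written as 15/1000 and 12/1000 so every
# intermediate value stays an exact integer.
monthly_total = 380_000_000
current = monthly_total * 15 // 1000   # JP Service Model: 5,700,000/month
improved = monthly_total * 12 // 1000  # Multi Language Large Model: 4,560,000/month
monthly_saving = current - improved    # 1,140,000 fewer per month
yearly_saving = monthly_saving * 12    # 13,680,000 fewer per year
relative = monthly_saving / current    # 0.2, i.e. the 20% reduction
```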
  70. Conclusion

  71. Conclusion: Conclusion & Future Work. › Conclusion › Not easy

    to understand and put into practice › As fun to study as it was difficult › The large model is effective › More collaboration with other teams is needed
  72. Conclusion: Conclusion & Future Work. › Conclusion › Not easy

    to understand and put into practice › As fun to study as it was difficult › The large model is effective › More collaboration with other teams is needed › Future work › Large model hyper-parameter tuning
  73. Next Session Info MLU & MLU Serving

  74. Thank you