
How to Apply Large ML Models for AI-Text Filtering Models

Tech-Verse2022

November 17, 2022

Transcript

  1. • Hyungrak Kim • NLP engineer • Works on the AI text filter model • Likes to learn and use new technology
  2. Contents › Introduction › Large ML model training tech › Apply large ML model to AI text filter › Experiment Result › Expected Effectiveness › Conclusion
  3. Introduction › What is the AI text filter? › A user message enters the LINE Monitoring System, e.g. 私と付き合いたい場合は連絡してください [email protected] (translation: "Please contact me if you want to date me [email protected]") › 380,000,000 items of data are checked every month › The JP-language AI text filter model outputs a 0/1 decision for each label: Normal, Personal Info, Porn, Harass, Illegal, Advertising
  4. Introduction › The AI text filter problem › The JP-language AI text filter model is built by fine-tuning one of many public pre-training models: JP BERT, JP Char BERT, JP RoBERTa, JP small BERT, JP DistilBERT, ...
  5. Introduction › The AI text filter problem › Which pre-training model performs best? › What if the language is different? › Every fine-tuning choice carries research cost, development cost, and service cost
  6. Introduction › Solution › Replace the 110-million-parameter single-language (Japanese) model with an 11-billion-parameter multi-language AI text filter model covering Japanese, English, Thai, Taiwanese, Indonesian, ... › Impact: a roughly ×100 larger model, cost reduction, and service extension › This requires large-model training techniques
  7. Introduction › Contribution › Introduction and sharing of large ML model training technology › AI text filter advancement using a large multi-language model, with the MLU team of LINE MLOps › Model serving, with the MLU serving team of LINE ML service
  8. Large ML Model Training Tech › Two basic directions › Lightweight: pruning, quantization, knowledge distillation › Scaling: data parallelism, model parallelism, CPU offload
  9. Large ML Model Training Tech › Data parallelism › Data 1, Data 2, and Data 3 each go to their own V100 GPU, and every GPU holds a full copy of the ML model
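
A minimal sketch of data parallelism with PyTorch DDP, assuming a `torchrun` launch with one process per GPU; the linear model and batch are placeholders, not the actual filter model:

```python
# Hedged sketch: data parallelism, full model copy on every GPU.
# Launch with e.g. `torchrun --nproc_per_node=3 ddp_sketch.py`.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = torch.nn.Linear(768, 6).cuda(rank)   # placeholder for the filter model
model = DDP(model, device_ids=[rank])        # one full replica per GPU

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
x = torch.randn(32, 768, device=rank)        # this rank's shard of the data
loss = model(x).sum()
loss.backward()                              # gradients averaged across replicas
opt.step()
```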
  10. Large ML Model Training Tech › Model parallelism › A single model (Input → Layer 1 → Layer 2 → Output) is split across GPU 1 and GPU 2 with intra-operator parallelism; the partial results are combined with an all-reduce
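
To make "intra-operator parallelism + all-reduce" concrete, here is a minimal numeric sketch, with NumPy arrays standing in for two GPUs and illustrative shapes:

```python
# One linear layer split across two "GPUs": each holds half the weight,
# computes a partial product, and the all-reduce sums the partials.
import numpy as np

x = np.random.randn(4, 8)            # activations
W = np.random.randn(8, 16)           # full weight of one layer

x1, x2 = np.split(x, 2, axis=1)      # split the inner dimension
W1, W2 = np.split(W, 2, axis=0)

partial_gpu1 = x1 @ W1               # computed on GPU 1
partial_gpu2 = x2 @ W2               # computed on GPU 2

# The all-reduce (here a plain sum) recovers the un-split layer's output.
assert np.allclose(partial_gpu1 + partial_gpu2, x @ W)
```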
  11. Large ML Model Training Tech › Model parallelism + CPU offload › Offloading to CPU memory frees GPU space, so the trainable model size goes up; the layers stay split across GPU 1 and GPU 2 with intra-operator parallelism and all-reduce
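
A minimal sketch of how CPU offload is switched on in a DeepSpeed config; the key names come from the DeepSpeed ZeRO-offload documentation, while the surrounding values are illustrative:

```python
# Hedged sketch: ZeRO stage 3 with CPU offload, so optimizer state and
# parameters spill into CPU memory and free GPU space for a bigger model.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}
```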
  12. Large ML Model Training Tech › Large ML model training framework › Why we chose DeepSpeed: it is open source, supports CPU offload, and supports the best methods from current ML research • (ICML 2022 big model tutorial): https://icml.cc/virtual/2022/tutorial/18440 • (DeepSpeed): https://www.deepspeed.ai
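
A minimal sketch of wiring a model into DeepSpeed, assuming a CUDA environment; the model is a placeholder and `ds_config` is a config dict like the one sketched above:

```python
# Hedged sketch: deepspeed.initialize wraps the model so the config
# (ZeRO stage, offload, fp16) takes effect transparently during training.
import torch
import deepspeed

model = torch.nn.Linear(768, 6)              # placeholder model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,                        # dict from the previous sketch
)

x = torch.randn(4, 768).to(engine.device)
loss = engine(x).sum()
engine.backward(loss)                        # replaces loss.backward()
engine.step()                                # replaces optimizer.step()
```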
  13. Apply large ML model to AI text filter › Large model training › Infrastructure: a DeepSpeed multi-node setup of 3 nodes, each with 8× A100 40G GPUs, 70 CPU cores, and 1 TB of CPU memory • (DeepSpeed): https://www.deepspeed.ai/
  14. Apply large ML model to AI text filter › Large model training › Training configuration: fine-tune the 11-billion-parameter multi-language pre-training model into the AI text filter with 730,000 items of data, on the same 3-node, 8× A100 40G DeepSpeed setup (launch sketched below) • (DeepSpeed): https://www.deepspeed.ai/
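
A minimal sketch of the multi-node launch; the hostfile format (`<host> slots=<gpus>`) and the `deepspeed --hostfile` launcher are from the DeepSpeed docs, while the hostnames and the training script name are hypothetical:

```python
# Hedged sketch: describe the 3 nodes x 8 GPUs to the DeepSpeed launcher.
import subprocess

with open("hostfile", "w") as f:
    for i in (1, 2, 3):                       # hypothetical node hostnames
        f.write(f"gpu-node{i} slots=8\n")     # 8x A100 40G per node

subprocess.run([
    "deepspeed", "--hostfile", "hostfile",
    "finetune_text_filter.py",                # hypothetical entry point
    "--deepspeed_config", "ds_config.json",
], check=True)
```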
  15. Apply large ML model to AI text filter › Environment setting problem › The DeepSpeed environment depends on the CPU, the GPU, the OS, the system libraries, and the library versions • (DeepSpeed): https://hub.docker.com/r/deepspeed/deepspeed/tags?page=1&ordering=last_updated
  16. Apply large ML model to AI text filter › Environment setting problem › On top of those dependencies, DeepSpeed builds CUDA extensions, which requires a working build system: the CUDA extension toolchain, Ninja, and g++/C++ • (DeepSpeed): https://hub.docker.com/r/deepspeed/deepspeed/tags?page=1&ordering=last_updated
  17. Apply large ML model to AI text filter › Environment setting solution › Build a fixed DeepSpeed environment setting: OS system libraries, the DeepSpeed library, and the multi-node libraries
  18. Apply large ML model to AI text filter › Environment setting solution › The environment pins a fixed, stable DeepSpeed version, includes all functions used in MLU, and keeps the MLU environment free of extra training libraries
  19. Apply large ML model to AI text filter › Environment setting solution › The environment is distributed as a Docker image on Docker Hub together with an installation document
  20. Apply large ML model to AI text filter › Multi-node training file sharing problem 1 › When training starts for the first time in the MLU environment, the GPU server builds the CUDA extension for the GPU accelerator • (DeepSpeed): https://www.deepspeed.ai/tutorials/advanced-install/
  21. Apply large ML model to AI text filter › Multi-node training file sharing problem 2 › In multi-node training, the header GPU node (Node 1) starts training on the worker GPU nodes (Nodes 2 and 3) over ssh, but it is unclear whether the workers have the built CUDA extension • (DeepSpeed): https://www.deepspeed.ai/tutorials/advanced-install/
  22. Apply large ML model to AI text filter › Multi-node training file sharing solution 1 › A multi-node file sharing module: the header GPU node (Node 1) takes a worker-node IP address list and securely transfers the built CUDA extension to worker GPU nodes 2, 3, ..., N
  23. Apply large ML model to AI text filter › Multi-node training file sharing solution 2 › After sharing, the header GPU node starts training over ssh and every worker GPU node already has the CUDA extension • (DeepSpeed): https://www.deepspeed.ai/tutorials/advanced-install/
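
A minimal sketch of the sharing step under stated assumptions: DeepSpeed JIT-builds its CUDA extensions into a local cache (commonly `~/.cache/torch_extensions`), and the header node pushes that cache to each worker over scp; the IPs and paths are illustrative:

```python
# Hedged sketch of the multi-node file-sharing module: copy the header node's
# built CUDA extensions to every worker in the IP address list.
import os
import subprocess

worker_ips = ["10.0.0.2", "10.0.0.3"]                 # worker GPU nodes 2 and 3
cache_dir = os.path.expanduser("~/.cache/torch_extensions")

for ip in worker_ips:
    # scp -r relies on the same passwordless ssh that DeepSpeed itself uses.
    subprocess.run(["scp", "-r", cache_dir, f"{ip}:.cache/"], check=True)
```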
  24. Apply large ML model to AI text filter › Pre-training model parallelism dependency problem › Intra-operator model parallelism (layers split across GPU 1 and GPU 2 with all-reduce) has to be written into the model code
  25. Apply large ML model to AI text filter › Pre-training model parallelism dependency problem › Public pre-training models (JP BERT, JP Char BERT, JP RoBERTa, JP small BERT, JP DistilBERT, ...) ship with un-parallelized model code
  26. Apply large ML model to AI text filter › Pre-training model parallelism dependency problem › Fine-tuning therefore depends on whether the pre-training model's code supports parallelism
  27. Apply large ML model to AI text filter › Pre-training model parallelism dependency solution › A parallelism converter: make the pre-training model's code parallel
  28. Apply large ML model to AI text filter › Pre-training model parallelism dependency solution › The parallelism converter does two things: parallelize the model code and partition the pre-training model weights
  29. Apply large ML model to AI text filter › Pre-training model parallelism dependency, code parallelism 1 › A public pre-training model is a Transformer: an encoder and a decoder, each a stack of layers 1 … N
  30. Apply large ML model to AI text filter › Pre-training model parallelism dependency, code parallelism 1 › Each layer is multi-head attention (key, query, value) plus a feed-forward network: an intermediate H→4H FFN followed by a 4H→H FFN
  31. Apply large ML model to AI text filter › Pre-training model parallelism dependency, code parallelism 2 › In the multi-language pre-training model, the multi-head attention layer (key, query, value) is split across GPU 1 and GPU 2 and recombined with an all-reduce before the feed-forward layer • (Megatron-LM): https://arxiv.org/pdf/1909.08053.pdf
  32. Apply large ML model to AI text filter › Pre-training model parallelism dependency, code parallelism 2 › The feed-forward layer is split the same way: the intermediate H→4H part column-wise and the 4H→H part row-wise, with an all-reduce producing the output • (Megatron-LM): https://arxiv.org/pdf/1909.08053.pdf
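
The Megatron-LM FFN split can be checked numerically. A minimal sketch with NumPy standing in for two GPUs: the H→4H weight is split column-wise, the 4H→H weight row-wise, and a single all-reduce (here a `+`) recovers the un-split output.

```python
import numpy as np

def gelu(z):
    # tanh approximation of GELU, applied element-wise
    return 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))

H = 8
x = np.random.randn(4, H)            # token activations
A = np.random.randn(H, 4 * H)        # H -> 4H weight, split column-wise
B = np.random.randn(4 * H, H)        # 4H -> H weight, split row-wise

A1, A2 = np.split(A, 2, axis=1)
B1, B2 = np.split(B, 2, axis=0)

# Each "GPU" computes its half independently; no sync is needed in between,
# because the element-wise GELU commutes with the column split of A.
y_parallel = gelu(x @ A1) @ B1 + gelu(x @ A2) @ B2   # "+" is the all-reduce
y_serial = gelu(x @ A) @ B
assert np.allclose(y_parallel, y_serial)
```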
  33. Apply large ML model to AI text filter › Pre-training model parallelism dependency, code parallelism › Model parameter partitioning algorithm: parallelize the model code, load the pre-training model weights, partition them, and then fine-tune • (Megatron-LM): https://github.com/NVIDIA/Megatron-LM
  34. Apply large ML model to AI text filter › Pre-training model parallelism dependency, code parallelism › The multi-head attention layer, feed-forward layer, and intermediate feed-forward layer weights are auto-partitioned between GPU 1 and GPU 2 before fine-tuning (see the sketch below) • (Megatron-LM): https://github.com/NVIDIA/Megatron-LM
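
A minimal sketch of the weight-partitioning step under the same Megatron-LM scheme; the shapes and the two-way split are illustrative, and a real converter would walk every layer of the checkpoint:

```python
# Hedged sketch: partition one layer's pre-trained weights for 2-way
# tensor parallelism before fine-tuning.
import numpy as np

H = 768
attn_qkv = np.random.randn(H, 3 * H)   # stand-in for loaded Q/K/V weights
ffn_h_4h = np.random.randn(H, 4 * H)   # intermediate H -> 4H FFN weight
ffn_4h_h = np.random.randn(4 * H, H)   # 4H -> H FFN weight

shards = [
    {"attn_qkv": q, "ffn_h_4h": a, "ffn_4h_h": b}
    for q, a, b in zip(
        np.split(attn_qkv, 2, axis=1),  # column split: output halves per GPU
        np.split(ffn_h_4h, 2, axis=1),  # column split
        np.split(ffn_4h_h, 2, axis=0),  # row split: matching input halves
    )
]
# shards[0] goes to GPU 1, shards[1] goes to GPU 2.
```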
  35. Apply large ML model to AI text filter › Pre-training model parallelism dependency solution › The parallelism converter (model code parallelism + model weight partitioning) turns a public pre-training model with un-parallelized code into a parallelized model spread over a group of N GPUs, ready for fine-tuning
  36. Apply large ML model to AI text filter › Pre-training model parallelism dependency solution, analysis › Advantages: free of the parallelism dependency, and the model size can go up › Disadvantages: convergence can be unstable, model performance can drop, and more research is needed
  37. Apply large ML model to AI text filter › Performance tuning with label correlation › A global correlation embedding algorithm models the correlation between the labels (Normal, Advertising, Personal Info, Porn, Illegal, Harass)
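
The slide does not spell the algorithm out; as one plausible reading, here is a minimal sketch of a global label-correlation embedding: estimate the label-label correlation from the training labels and use it to couple the per-label logits. All names and the mixing step are assumptions, not LINE's actual algorithm.

```python
# Hedged sketch: couple per-label predictions through a global correlation
# matrix estimated from the multi-label training targets.
import numpy as np

labels = ["Normal", "Advertising", "Personal Info", "Porn", "Illegal", "Harass"]
y_train = np.random.randint(0, 2, size=(10_000, len(labels)))  # stand-in targets

corr = np.corrcoef(y_train, rowvar=False)    # global label-label correlation

logits = np.random.randn(4, len(labels))     # model outputs for 4 texts
alpha = 0.8                                  # how much to trust the raw logits
adjusted = alpha * logits + (1 - alpha) * logits @ corr
```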
  38. Apply large ML model to AI text filter › Large model serving › Model optimization: FP16 with loss scaling • (DeepSpeed Inference): https://www.deepspeed.ai/tutorials/inference-tutorial/
  39. Apply large ML model to AI text filter › Large model serving › On top of the FP16-optimized model: GPU kernel optimization and inference parallelism • (DeepSpeed Inference): https://www.deepspeed.ai/tutorials/inference-tutorial/
  40. Apply large ML model to AI text filter › Large model serving › The optimized model is served on V100 GPUs with auto-scaling through MLU Serving (see the sketch below) • (DeepSpeed Inference): https://www.deepspeed.ai/tutorials/inference-tutorial/
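
A minimal sketch of the serving side with DeepSpeed-Inference; the model is a placeholder, and while `mp_size`, `dtype`, and `replace_with_kernel_inject` are real `init_inference` arguments, kernel injection only applies to architectures DeepSpeed supports:

```python
# Hedged sketch: FP16 inference with parallelism across 2 GPUs.
import torch
import deepspeed

model = torch.nn.Linear(768, 6)          # placeholder for the fine-tuned model
engine = deepspeed.init_inference(
    model,
    mp_size=2,                           # inference parallelism across 2 GPUs
    dtype=torch.float16,                 # FP16-optimized serving
    replace_with_kernel_inject=True,     # DeepSpeed's optimized GPU kernels
)

x = torch.randn(1, 768).half().cuda()
scores = engine(x)                       # per-label filter scores
```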
  41. Experiment Result › Experiment setting › The AI text filter service model (a 110-million-parameter Japanese single-language model) vs. the 11-billion-parameter multi-language large model with tuning vs. the same large model without tuning
  42. Experiment Result › Experiment test data (label: count, ratio) › Normal: 99,996 (86.2%) › Info: 10,278 (8.8%) › Porn: 2,299 (1.9%) › Harass: 1,106 (0.9%) › Illegal: 106 (0.09%) › AD: 2,180 (1.8%) › Total: 115,965
  43. Experiment Result › F1 score result › Chart: per-label F1 (Normal, Info, Porn, Harass, Illegal, AD) for the Multi-Tuning, Multi, and JP Service models
  44. Experiment Result › F1 score result › Total average F1 relative to the JP Service model (0% baseline): Multi-Tuning -1%, Multi -9.9%
  45. Experiment Result › AUC result › Chart: per-label AUC (Normal, Info, Porn, Harass, Illegal, AD) for the Multi Tuning, Multi, and JP Service models
  46. Experiment Result › AUC result › Total average AUC relative to the JP Service model (0% baseline): Multi Tuning -1%, Multi -9.1%
  47. Experiment Result › Qualitative evaluation › User message: 経営難で銀行等からの融資待ちの方、収入がなく生活が出来ない……等々、コロナショックで困ってる方🙀 連絡頂ければ即融資可能です😊‼ (translation: "Those in trouble because of the corona shock, such as people waiting on loans from banks due to financial difficulties, or people who cannot make a living without income 🙀 contact us and we can finance you immediately 😊‼")
  48. Experiment Result › Qualitative evaluation › On the same message, the JP Service model scores Illegal at only 12%
  49. Experiment Result › Qualitative evaluation › The tuned multi-language large model scores Illegal at 99%, vs. 12% for the JP Service model
  50. Expected Effectiveness › Effect › Introducing the large ML model brings (1) large ML model training technology, (2) performance improvement, and (3) service extension
  51. Expected Effectiveness › Expectation 1 › An AI text filter about 10% more accurate for the LMP system › A 0.3-point drop in the monitoring rate relative to the current AI text filter service model › Today the JP Service model flags 1.5% of the 380,000,000 monthly messages, i.e. 5,700,000 items of monitoring data
  52. Expected Effectiveness › Expectation 1 › The multi-language large model is expected to flag 1.2%, i.e. 4,560,000 items of monitoring data, versus 5,700,000 for the JP Service model
  53. Expected Effectiveness › Expectation 1 › Monthly monitoring resource: 5,700,000 − 4,560,000 = a reduction of 1,140,000 items
  54. Expected Effectiveness › Expectation 2 › Chart: service resource monitored for a year (axis scale ×10⁵), multi-language large model vs. JP Service model, a difference of -13,680,000
  55. Expected Effectiveness › Expectation 2 › Over one year of monitoring that is 13,680,000 fewer items, a 20% reduction in monitoring resource
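
The numbers on slides 51 through 55 check out; a minimal worked version of the arithmetic:

```python
# Expected monitoring-resource reduction, reproduced from the slides.
total_per_month = 380_000_000
before = total_per_month * 15 // 1000   # 1.5% -> 5,700,000 monitored (JP Service)
after = total_per_month * 12 // 1000    # 1.2% -> 4,560,000 monitored (large model)

monthly_saving = before - after         # 1,140,000 fewer items per month
yearly_saving = monthly_saving * 12     # 13,680,000 fewer items per year

assert abs(yearly_saving / (before * 12) - 0.20) < 1e-12   # the -20% per year
```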
  56. Conclusion › Conclusion & Future Work › Conclusion › Large model training was not easy to understand and put into practice › It was as fun to study as it was difficult › The large model proved effective › More collaboration with other teams is needed
  57. Conclusion › Conclusion & Future Work › Future work › Large model hyper-parameter tuning