Opening the Black Box of NLP Models: A Self-Service NLP Platform for Content Moderation @ TECHPULSE 2023

- Speaker: Chester Her
- Event: http://techpulse.line.me/

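This session shares how NLP techniques can reduce the time that various services spend on content moderation. Through our platform, SmartText, internal users can build NLP models for their own services, bringing AI into everyone's daily life. However, because machine learning models offer limited interpretability, we use XAI (explainable AI) and EDA (exploratory data analysis) so that users can not only understand how a model makes its predictions, but also troubleshoot and improve the model, raising its performance and the quality of its predictions.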

LINE Developers Taiwan

February 21, 2023

Transcript

  1. 1

  2. Agenda

    › Content Moderation Using the NLP Service › What Is SmartText › How Users Can Adjust the Model on the NLP Platform › Conclusion
  3. Content Moderation

    › Problem Statement: The task requires a lot of manpower; the number of ads keeps increasing › Case Introduction: Reviewers have to review all ad content, decide whether each ad should be rejected, and provide the rejection reason
  4. NLP Service

    › Natural Language Processing: Process and parse natural language, such as text, to understand its semantic meaning › Machine Learning Method: Obtain representations of text via deep neural networks › A pre-trained language model can generate representation vectors with basic semantic meaning for each token of text (e.g., BERT, ELECTRA) › Fine-tune a model based on the pre-trained model for the target downstream task (e.g., sentiment classification, question answering)
  5. NLP Service: Multi-label Classifier

    › Categorize text content into multiple labels from a customized set of n labels › Model (figure): ELECTRA → Dense Layer → Sigmoid → p1, p2, …, pn
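A minimal sketch of the classification head drawn on slide 5, assuming the ELECTRA encoder has already turned the text into one pooled vector `h`. The weights, the 0.5 decision threshold, and the function names are hypothetical (in a real model the dense layer is learned jointly with the encoder during fine-tuning). What it shows is why the figure ends in a sigmoid rather than a softmax: each label gets its own independent probability, so an ad can receive several rejection reasons, or none.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_head(
    h: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Dense layer followed by an independent sigmoid per label.

    h:       pooled text representation from the encoder (length d)
    weights: one row of d weights per label (n rows)
    bias:    one bias per label (length n)
    Returns [p1, ..., pn]; the probabilities do not need to sum to 1.
    """
    probs = []
    for w_row, b in zip(weights, bias):
        logit = sum(w * x for w, x in zip(w_row, h)) + b
        probs.append(sigmoid(logit))
    return probs


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of every label whose probability reaches the (hypothetical) threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]


# Toy example: a 4-dimensional "encoder output" and 3 labels.
# All numbers are hypothetical; a trained model would learn W and b.
h = [0.2, -1.0, 0.5, 0.3]
W = [
    [1.0, 0.0, 2.0, 0.0],   # label 0
    [0.0, 1.5, 0.0, 0.0],   # label 1
    [0.5, -0.5, 0.5, 1.0],  # label 2
]
b = [0.0, 0.0, -0.2]

probs = multilabel_head(h, W, b)
print([round(p, 3) for p in probs])
print(predict_labels(probs))  # -> [0, 2]
```

An empty result from `predict_labels` would correspond to an ad with no rejection reason, which matches the "0 ~ 3 labels per content" setup on the next slide.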
  6. NLP Service Result

    › Ad Content: 3 labels for the main rejection reasons; each ad can have 0 to 3 labels › Data count: 6429 (Approved: 3000, Rejected: 3429) › Example ad: 【蜂毒牙膏】修復牙齒 美白去口臭去黃 ("[Bee Venom Toothpaste] Repairs teeth, whitens, removes bad breath and yellowing") › Accuracy: 0.9 › F1 Score: 0.89
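Slide 6 reports accuracy and F1 without stating how they were averaged over labels. As an assumption, the sketch below uses two common multi-label choices, subset (exact-match) accuracy and micro-averaged F1, with an empty label set standing for an approved ad. The data is made up for illustration and does not reproduce the reported numbers.

```python
from typing import List, Set


def subset_accuracy(gold: List[Set[int]], pred: List[Set[int]]) -> float:
    """Fraction of samples whose predicted label set matches the gold set exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def micro_f1(gold: List[Set[int]], pred: List[Set[int]]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all labels and samples."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # avoids division by zero when nothing was predicted correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ads: an empty set means "approved" (no rejection reason).
gold = [set(), {0}, {0, 2}, {1}, set()]
pred = [set(), {0}, {0}, {1, 2}, set()]

print(subset_accuracy(gold, pred))     # -> 0.6
print(round(micro_f1(gold, pred), 3))  # -> 0.75
```

Subset accuracy is strict (one missing label makes the whole sample wrong), while micro-F1 gives partial credit, which is one reason the two numbers can differ.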
  7. SmartText: A Self-Service NLP Platform

    › Helps users create their own NLP models and integrate them into applications through APIs, with no code › NLP services (figure): Classifier, Multi-label Classifier, NER, Duplication Detector › Applications (figure): LINE TODAY, LINE VOOM, LINE SHOPPING, LINE FactChecker
  8. Machine Learning Pipeline

    › Scoping: target of the task, application, constraints › Data Preparation: collection, cleaning › Model Development: which model, training, tuning › Model Validation: baseline, score, expectation › Deployment: how to use, service integration
  9. How SmartText Works

    › Model Development: Provide 7 NLP services with the best model and settings (Classifier, Multi-label Classifier, Duplication Detector, Key Phrase Extraction, Related Search, Tokenizer, NER); train on IU k8s clusters with GPUs › Model Validation: Apply metrics for the different models › Deployment: Build the prediction API with BentoML; package the API as a Docker image and push it to Harbor; deploy the API image on a VKS cluster and create a DNS record for the API
  10. How SmartText Works

    › Flow: Model Training & Validation → Build Prediction API → Package Image → Deploy Image & Create DNS › Automation: Set this flow as an Airflow DAG, with 1 DAG for each NLP service
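The deck runs the training-to-deployment flow as one Airflow DAG per NLP service. The sketch below is not Airflow code: it is a plain-Python stand-in, with hypothetical task functions and service identifiers, that shows the same shape, a fixed chain of four steps instantiated once per service, where each step runs only after the previous one has finished.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one Airflow DAG: a linear chain of task functions.
Task = Callable[[Dict[str, str]], Dict[str, str]]

# The four stages from slide 10, in execution order.
STAGES: List[str] = [
    "model_training_and_validation",
    "build_prediction_api",
    "package_image",
    "deploy_image_and_create_dns",
]

# The seven NLP services listed on slide 9 (identifiers are illustrative).
SERVICES: List[str] = [
    "classifier", "multilabel_classifier", "duplication_detector",
    "key_phrase_extraction", "related_search", "tokenizer", "ner",
]


def make_task(service: str, stage: str) -> Task:
    """Hypothetical task that only records that `stage` ran for `service`."""
    def task(context: Dict[str, str]) -> Dict[str, str]:
        # A real task would train, build the BentoML API, push the image, or deploy.
        return {**context, stage: f"{service}:{stage}:done"}
    return task


def build_pipelines() -> Dict[str, List[Task]]:
    """One linear pipeline per NLP service, mirroring '1 DAG for 1 NLP service'."""
    return {svc: [make_task(svc, st) for st in STAGES] for svc in SERVICES}


def run_pipeline(tasks: List[Task]) -> Dict[str, str]:
    """Run the steps in order; an exception in any step stops the remaining ones."""
    context: Dict[str, str] = {}
    for task in tasks:
        context = task(context)
    return context


pipelines = build_pipelines()
result = run_pipeline(pipelines["multilabel_classifier"])
print(list(result))  # the four stage names, in execution order
```

In Airflow itself, the same ordering would be declared between tasks inside each service's DAG with the `>>` dependency operator, and the scheduler, rather than a loop, would run the steps.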
  11. SmartText Portal: A Web System for Users to Use SmartText

    › Build your own NLP model: upload data, train the model, try the result › Service Domain: A space where users can use NLP services for their application; each domain provides 7 NLP services
  12. SmartText Portal

    › Upload Data: Upload CSV files in a specific format; clicking the delete button deletes all the data › Build: Start a new build or deploy a previous build; set a training cronjob › Try Result: Test the prediction API of the active model
  16. Machine Learning Pipeline

    › Scoping: target of the task, application, constraints › Data Preparation: collection, cleaning › Model Development: which model, training, tuning › Model Validation: baseline, score, expectation › Deployment: how to use, service integration
  17. EDA

    › Exploratory Data Analysis: Use visualization and basic statistics to get an overview of the data › Purpose: Understand the information and structure of the data; check for outliers or unusual values; find correlations within the data › EDA Plots for the Multi-label Classifier: text length histogram, category distribution, category count distribution, category correlation
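Each of the four EDA plots on slide 17 is built from a simple statistic. A minimal pure-Python sketch of the numbers behind them (plotting omitted) follows; the ads and the label names are hypothetical, since the deck does not list its labels, and Pearson correlation between 0/1 label indicators is one assumed choice for "category correlation".

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set, Tuple

# Hypothetical labelled ads: (text, set of label names). Label names are illustrative.
data: List[Tuple[str, Set[str]]] = [
    ("Whitening toothpaste, removes stains in 1 day", {"exaggeration"}),
    ("Cures bad breath forever", {"exaggeration", "medical_claim"}),
    ("New phone case, 20% off", set()),
    ("Heals gum disease", {"medical_claim"}),
    ("Lose 10 kg in a week, doctors hate it", {"exaggeration", "medical_claim"}),
    ("Hand-made wooden toys", set()),
]
labels = sorted({lab for _, labs in data for lab in labs})


def length_histogram(texts: List[str], bin_width: int = 10) -> Dict[int, int]:
    """Text length histogram: start of each length bin -> number of texts."""
    return dict(sorted(Counter((len(t) // bin_width) * bin_width for t in texts).items()))


def category_distribution(label_sets: List[Set[str]]) -> Counter:
    """How many samples carry each label."""
    return Counter(lab for labs in label_sets for lab in labs)


def category_count_distribution(label_sets: List[Set[str]]) -> Dict[int, int]:
    """How many samples have 0, 1, 2, ... labels."""
    return dict(sorted(Counter(len(labs) for labs in label_sets).items()))


def label_correlation(label_sets: List[Set[str]], a: str, b: str) -> float:
    """Pearson correlation between the 0/1 indicator columns of labels a and b."""
    xs = [1.0 if a in labs else 0.0 for labs in label_sets]
    ys = [1.0 if b in labs else 0.0 for labs in label_sets]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # undefined for a constant column


texts = [t for t, _ in data]
label_sets = [labs for _, labs in data]
print(length_histogram(texts))
print(category_distribution(label_sets))
print(category_count_distribution(label_sets))  # -> {0: 2, 1: 2, 2: 2}
for a, b in combinations(labels, 2):
    print(a, b, round(label_correlation(label_sets, a, b), 3))
```

A strongly positive correlation between two labels hints that they often co-occur, which is useful to know before reading per-label errors of the trained model.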
  18. XAI

    › Explainable AI: AI whose decisions or predictions can be understood by humans › Benefits: Improve the user experience by helping users trust that the AI is making good decisions; figure out the bias of the AI by observing the explanations of its decisions › SHAP (SHapley Additive exPlanations): A Python package used to explain the output of any machine learning model, based on the classic Shapley values from game theory
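To connect Shapley values to a text model, one can treat each token as a player and the model's score on the text with absent tokens removed as the coalition value. The sketch below computes exact Shapley values by brute force over all token subsets for a toy, hypothetical keyword scorer; it is not the `shap` package's API or algorithm, which approximates this for real models because exact enumeration is exponential in the number of tokens. It shows the additivity that gives SHAP its name: the base value plus all attributions equals the model output.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def toy_reject_score(tokens: FrozenSet[str]) -> float:
    """Hypothetical ad 'rejection score'; the interaction term makes attribution non-trivial."""
    score = 0.1  # score of the empty text (the base value)
    if "cures" in tokens:
        score += 0.4
    if "forever" in tokens:
        score += 0.1
    if "cures" in tokens and "forever" in tokens:
        score += 0.2  # together the two words look even more like a false claim
    return score


def shapley_values(
    players: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values via the subset-weighting formula (exponential cost)."""
    n = len(players)
    phi: Dict[str, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


tokens = ["toothpaste", "cures", "bad", "breath", "forever"]
phi = shapley_values(tokens, toy_reject_score)
base = toy_reject_score(frozenset())
full = toy_reject_score(frozenset(tokens))
for tok, v in phi.items():
    print(f"{tok:>10}: {v:+.3f}")
print("base + sum(phi) =", round(base + sum(phi.values()), 6), "model output =", round(full, 6))
```

The interaction bonus of 0.2 is split evenly between "cures" and "forever", and tokens the scorer ignores receive zero attribution.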
  19. XAI

    › Shapley Value: A solution concept in cooperative game theory, introduced by Lloyd Shapley in 1951; used to fairly distribute gains and costs among several actors working in a coalition; calculated as the average of each actor's marginal contributions over all possible orders in which the coalition can be formed › Example: Brown, Sally, and Cony build a team to beat monsters in LINE GAME. They beat 100 monsters in 1 hour and earn 10000 coins as a reward. Suppose that in 1 hour Brown, Sally, and Cony can beat 35, 15, and 10 monsters respectively on their own; Brown and Sally can beat 70 monsters together, Brown and Cony 60, and Sally and Cony 40. How should the reward be split?
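Written out, the Shapley value that this example uses is the standard average over join orders. Here N is the set of players, v(S) is the number of monsters coalition S beats in 1 hour, Π(N) is the set of all orderings of N, and P_i^π is the set of players who join before i in ordering π; the second form, which sums over coalitions instead of orderings, is equivalent. With 3 players there are 3! = 6 orderings, and the example's numbers give v({B}) = 35, v({S}) = 15, v({C}) = 10, v({B,S}) = 70, v({B,C}) = 60, v({S,C}) = 40, v(N) = 100:

```latex
\phi_i(v) = \frac{1}{|N|!} \sum_{\pi \in \Pi(N)} \left[ v\left(P_i^{\pi} \cup \{i\}\right) - v\left(P_i^{\pi}\right) \right]
          = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]

\phi_{\text{Brown}} = \tfrac{1}{6}\,(35 + 35 + 60 + 55 + 50 + 60) = \tfrac{295}{6} \approx 49.17
```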
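  20. XAI

    › Marginal contribution of each player for every order of joining the team (e.g., in order B S C: Brown 35, Sally 70 - 35 = 35, Cony 100 - 70 = 30); the Shapley value is the average of each column

        Order          Brown      Sally      Cony
        B S C          35         35         30
        B C S          35         40         25
        S C B          60         15         25
        S B C          55         15         30
        C B S          50         40         10
        C S B          60         30         10
        Contribution   49.16667   29.16667   21.66667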
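The table above can be reproduced by brute force: enumerate the 3! = 6 join orders, record each player's marginal contribution, and average. The sketch below uses exact fractions so the averages match the slide; converting to coins assumes the 10000-coin reward is split in proportion to each player's Shapley value (equivalently, 100 coins per monster), which the slide leaves implicit.

```python
from fractions import Fraction
from itertools import permutations
from typing import Dict, FrozenSet, List

# Monsters beaten in 1 hour by each coalition (numbers from the slide).
V: Dict[FrozenSet[str], int] = {
    frozenset(): 0,
    frozenset({"Brown"}): 35,
    frozenset({"Sally"}): 15,
    frozenset({"Cony"}): 10,
    frozenset({"Brown", "Sally"}): 70,
    frozenset({"Brown", "Cony"}): 60,
    frozenset({"Sally", "Cony"}): 40,
    frozenset({"Brown", "Sally", "Cony"}): 100,
}
PLAYERS: List[str] = ["Brown", "Sally", "Cony"]


def shapley_by_orderings(
    players: List[str], v: Dict[FrozenSet[str], int]
) -> Dict[str, Fraction]:
    """Average each player's marginal contribution over every join order."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            # Marginal contribution of p when joining the players already in the team.
            totals[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: totals[p] / len(orders) for p in players}


phi = shapley_by_orderings(PLAYERS, V)
award = 10000
for p in PLAYERS:
    share = phi[p] / V[frozenset(PLAYERS)]  # fraction of the team's 100 monsters
    print(f"{p}: {float(phi[p]):.5f} monsters -> {float(share * award):.2f} coins")
# Brown: 49.16667 monsters -> 4916.67 coins
# Sally: 29.16667 monsters -> 2916.67 coins
# Cony: 21.66667 monsters -> 2166.67 coins
```

The three values add up to exactly 100 monsters, the team's total; this efficiency property is what lets the Shapley value split the whole reward with nothing left over.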
  21. Summary

    › EDA: Get an overview of the data by means of statistics and visualization before training › XAI: Explain why the AI makes its decisions, to convince users or reveal bias › SmartText: A self-service NLP platform that helps users create models and use them through APIs
  22. Future Work

    › Monitoring and Relabeling: Monitor the usage and application of the prediction API; provide new data when using the prediction API › Data Versioning: Increase the flexibility of the data used for model training; account for data drift
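Data drift is only named as future work in the deck. One simple, hypothetical way to surface it for a multi-label model is to compare how often each label occurs in the training data versus recently relabeled prediction traffic. The label names, the 0.1 tolerance, and the function names below are assumptions, and a per-label rate shift is a deliberately crude signal, not the method the deck proposes.

```python
from typing import Dict, List, Sequence, Set


def label_rates(label_sets: Sequence[Set[str]], labels: List[str]) -> Dict[str, float]:
    """Fraction of samples carrying each label (multi-label, so rates need not sum to 1)."""
    n = len(label_sets)
    return {lab: sum(lab in labs for labs in label_sets) / n for lab in labels}


def drifted_labels(
    train: Sequence[Set[str]],
    recent: Sequence[Set[str]],
    labels: List[str],
    tolerance: float = 0.1,  # hypothetical threshold
) -> Dict[str, float]:
    """Labels whose rate changed by more than `tolerance`, with the signed change."""
    before = label_rates(train, labels)
    after = label_rates(recent, labels)
    return {
        lab: round(after[lab] - before[lab], 3)
        for lab in labels
        if abs(after[lab] - before[lab]) > tolerance
    }


LABELS = ["exaggeration", "medical_claim", "other"]  # hypothetical label names
train = [set(), {"exaggeration"}, {"medical_claim"}, set(), {"exaggeration", "medical_claim"}]
recent = [{"medical_claim"}, {"exaggeration", "medical_claim"}, set(),
          {"medical_claim", "other"}, {"exaggeration"}]

result = drifted_labels(train, recent, LABELS)
print(result)  # "exaggeration" is stable; the other two labels shift by about +0.2
```

A flagged label would be a cue to relabel fresh data and retrain, which is where the monitoring and data-versioning items above meet.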