Opening the Black Box of NLP Models: A Self-Service NLP Platform for Content Moderation @ TECHPULSE 2023

Agenda
› Content Moderation using NLP Service
› What is SmartText
› How Users Can Adjust the Model on the NLP Platform
› Conclusion

Content Moderation using NLP Service

Content Moderation
Case Introduction
› Reviewers have to review all ad content, decide whether it needs to be rejected, and provide the reason for rejection
Problem Statement
› The task requires a lot of manpower
› The number of ads is increasing

NLP Service
Natural Language Processing
› Process and parse natural language such as text to understand its semantic meaning
Machine Learning Method
› Obtain a representation of text via a deep neural network
› A pre-trained language model generates representation vectors carrying basic semantic meaning for each token of the text
  › Ex: BERT, ELECTRA
› Fine-tune the pre-trained model for the target downstream task
  › Ex: sentiment classification, question answering

NLP Service
Multi-label Classifier
› Categorize text content into multiple labels from a customized set of n labels
› Architecture: ELECTRA → Dense Layer → Sigmoid → p1, p2, …, pn
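The dense-plus-sigmoid head on this slide can be sketched in plain Python. This is a minimal illustration, not the SmartText implementation: the label names, weights, feature vector, and the 0.5 threshold are all hypothetical, and the `features` list stands in for the vector an encoder such as ELECTRA would produce.

```python
import math

# Hypothetical label names, for illustration only.
LABELS = ["label_1", "label_2", "label_3"]


def sigmoid(x: float) -> float:
    """Squash a logit into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(features, weights, biases):
    """Dense layer: one logit per label, dot(features, row) + bias."""
    return [
        sum(f * w for f, w in zip(features, row)) + b
        for row, b in zip(weights, biases)
    ]


def predict_labels(features, weights, biases, threshold=0.5):
    """Return per-label probabilities and the labels at or above the threshold."""
    probs = [sigmoid(z) for z in dense(features, weights, biases)]
    chosen = [label for label, p in zip(LABELS, probs) if p >= threshold]
    return probs, chosen


# Made-up numbers standing in for an encoder output and trained weights.
features = [1.0, -2.0]
weights = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.0, 0.0]

probs, chosen = predict_labels(features, weights, biases)
print(chosen)  # labels whose probability reached the threshold
```

Because each label gets its own sigmoid rather than sharing a softmax, the probabilities do not have to sum to 1, which is what lets a text receive zero, one, or several labels.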

NLP Service
Ad Content
› Example ad: "[Bee Venom Toothpaste] Repairs teeth, whitens, removes bad breath and yellow stains"
› 3 labels for the main reject reasons
› Each content item can have 0 to 3 labels
› Data count: 6429
  › Approved: 3000
  › Rejected: 3429
Result
› Accuracy: 0.9
› F1 Score: 0.89
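The slide does not say how accuracy and F1 were computed for the multi-label case. The sketch below assumes two common choices, exact-match (subset) accuracy and micro-averaged F1, and runs them on a tiny made-up set of predictions. The numbers are illustrative only and unrelated to the reported 0.9 and 0.89.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of items whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 over all (item, label) decisions pooled together."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical data: 4 ads, 3 labels each (1 = label applies).
y_true = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 1, 1]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 1]]

acc = subset_accuracy(y_true, y_pred)  # 3 of 4 rows match exactly
f1 = micro_f1(y_true, y_pred)          # tp=4, fp=0, fn=1
print(acc, round(f1, 4))
```

Subset accuracy is strict: the third ad counts as wrong even though one of its two labels was found, while micro-F1 gives partial credit for it.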

What is SmartText

SmartText - A Self-Service NLP Platform
› Helps users create their own NLP models and integrate them into applications through APIs, without coding (No Code)
› NLP services: Classifier, Multi-label Classifier, NER, Duplication Detector
› Applications: LINE TODAY, LINE VOOM, LINE SHOPPING, LINE FactChecker

Machine Learning Pipeline
Scoping › Target of task › Application › Constraint
Data Preparation › Collection › Cleaning
Model Development › Which model › Training › Tuning
Model Validation › Baseline › Score › Expectation
Deployment › How to use › Service integration

How SmartText Works
Model Development
› Provide 7 NLP services with the best model and settings
  › Classifier, Multi-label Classifier, Duplication Detector, Key Phrase Extraction, Related Search, Tokenizer, NER
› Train on IU k8s clusters with GPU
Model Validation
› Apply metrics for different models
Deployment
› Build the prediction API with BentoML
› Package the API as a Docker image and push it to Harbor
› Deploy the API image on a VKS cluster and create a DNS record for the API

How SmartText Works
Automation
› Set the following flow as an Airflow DAG: Model Training & Validation → Build Prediction API → Package Image → Deploy Image & Create DNS
› 1 DAG for each NLP service

SmartText Portal
A web system for users to use SmartText
› Build your own NLP model
› Upload data, train the model, try the result
Service Domain
› A space for users to use NLP services for their application
› Each domain provides 7 NLP services

SmartText Portal
Upload Data
› Upload CSV files in the specified format
› Clicking the delete button deletes all the data
Build
› Start a new build or deploy a previous build
› Set a training cronjob
Try Result
› Test the prediction API of the active model

Machine Learning Pipeline
Scoping › Target of task › Application › Constraint
Data Preparation › Collection › Cleaning
Model Development › Which model › Training › Tuning
Model Validation › Baseline › Score › Expectation
Deployment › How to use › Service integration

How Users Can Adjust the Model on the NLP Platform

EDA
Exploratory Data Analysis
› Use visualization and basic statistics to get an overview of the data
› Purpose
  › Understand the information and structure of the data
  › Check for outliers or unusual values
  › Find correlations in the data
Plot
› Multi-label Classifier
  › Text length histogram
  › Category distribution
  › Category count distribution
  › Category correlation
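The four multi-label plots listed above boil down to simple counts. The following sketch computes the numbers behind each plot with the standard library only. The sample rows, the label names, and the bin width are invented for illustration; the portal's real CSV format is not shown in the slides.

```python
import math
from collections import Counter

# Hypothetical rows: (ad text, set of reject labels). Not real data.
rows = [
    ("whitening toothpaste", {"exaggeration"}),
    ("best price ever", {"superlative"}),
    ("cures bad breath fast", {"exaggeration", "efficacy"}),
    ("plain product description", set()),
    ("number one cure", {"efficacy", "superlative"}),
    ("strongest formula, heals gums", {"exaggeration", "efficacy", "superlative"}),
]


def text_length_histogram(rows, bin_width=10):
    """Count texts per length bin, keyed by the bin's lower edge."""
    return Counter((len(text) // bin_width) * bin_width for text, _ in rows)


def category_distribution(rows):
    """How many texts carry each label."""
    return Counter(label for _, labels in rows for label in labels)


def category_count_distribution(rows):
    """How many texts carry 0, 1, 2, ... labels."""
    return Counter(len(labels) for _, labels in rows)


def category_correlation(rows, a, b):
    """Pearson correlation between the 0/1 indicators of two labels."""
    xs = [int(a in labels) for _, labels in rows]
    ys = [int(b in labels) for _, labels in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a label never varies
    return sxy / math.sqrt(sxx * syy)


length_hist = text_length_histogram(rows)
label_counts = category_distribution(rows)
labels_per_text = category_count_distribution(rows)
corr = category_correlation(rows, "exaggeration", "efficacy")
print(length_hist, label_counts, labels_per_text, round(corr, 3))
```

A strongly skewed category distribution or a label that almost always co-occurs with another is worth noticing before training, since both affect how the classifier's scores should be read.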

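EDA
[Figures: Category Distribution; Text Length Histogram. Categories shown: "Involves exaggeration" (涉及誇大), "Involves efficacy claims" (涉及療效), "Superlatives" (最高級)]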

EDA
[Figures: Category Correlation; Category Count Distribution. Categories shown: "Involves exaggeration" (涉及誇大), "Involves efficacy claims" (涉及療效), "Superlatives" (最高級)]

EDA SmartText Portal - Upload Data

XAI
Explainable AI
› AI whose decisions or predictions can be understood by humans
› Benefits
  › Improve the user experience by helping users trust that the AI is making good decisions
  › Uncover bias in the AI by observing the explanations of its decisions
SHAP (SHapley Additive exPlanations)
› A Python package used to explain the output of any machine learning model
› Based on the classic Shapley values from game theory

XAI
Shapley Value
› A solution concept in cooperative game theory, introduced by Lloyd Shapley in 1951
› Used to fairly distribute gains and costs among several actors working in a coalition
› Computed as the average of a player's marginal contributions to all possible coalitions
Example
› Brown, Sally, and Cony form a team to beat monsters in LINE GAME
› They beat 100 monsters in 1 hour and earn 10000 coins as a reward
› Suppose that in 1 hour:
  › Alone, Brown, Sally, and Cony can beat 35, 15, and 10 monsters respectively
  › Brown and Sally can beat 70 monsters together; Brown and Cony can beat 60 together; Sally and Cony can beat 40 together
› How should the reward be split?

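XAI
Marginal contribution of each player (in monsters) for every order in which the team can be assembled:

Order          Brown      Sally      Cony
B, S, C        35         35         30
B, C, S        35         40         25
S, C, B        60         15         25
S, B, C        55         15         30
C, B, S        50         40         10
C, S, B        60         30         10
Contribution   49.16667   29.16667   21.66667

› In the order B, S, C: Sally contributes 70 - 35 = 35 and Cony contributes 100 - 70 = 30
› The Contribution row is the average of each column over the six orders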
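The table can be reproduced by enumerating every join order and averaging each player's marginal contribution. The sketch below uses the coalition values from the example. The final conversion to coins assumes the 10000-coin reward is split in proportion to the Shapley values, which the slides imply but do not state.

```python
from itertools import permutations

PLAYERS = ("Brown", "Sally", "Cony")

# Monsters beaten in 1 hour by each coalition, taken from the example.
MONSTERS = {
    frozenset(): 0,
    frozenset({"Brown"}): 35,
    frozenset({"Sally"}): 15,
    frozenset({"Cony"}): 10,
    frozenset({"Brown", "Sally"}): 70,
    frozenset({"Brown", "Cony"}): 60,
    frozenset({"Sally", "Cony"}): 40,
    frozenset({"Brown", "Sally", "Cony"}): 100,
}


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for player in order:
            joined = coalition | {player}
            # Gain from adding this player to the coalition built so far.
            totals[player] += value[joined] - value[coalition]
            coalition = joined
    return {p: totals[p] / len(orders) for p in players}


shares = shapley_values(PLAYERS, MONSTERS)

# Assumption: coins are split in proportion to the Shapley values.
TOTAL_COINS = 10000
total_monsters = MONSTERS[frozenset(PLAYERS)]
coins = {p: TOTAL_COINS * s / total_monsters for p, s in shares.items()}

for player in PLAYERS:
    print(f"{player}: {shares[player]:.5f} monsters, {coins[player]:.2f} coins")
```

Exact enumeration grows factorially with the number of players, which is fine for three teammates but is why tools such as SHAP rely on approximations when the "players" are the many tokens or features of a model input.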

XAI SHAP on Natural Language Model (Transformers)


XAI SmartText Portal - Try Result

Conclusion

Summary
EDA
› Get an overview of the data through statistics and visualization before training
XAI
› Explain why the AI makes its decisions, to further convince users or reveal bias
SmartText
› A self-service NLP platform that helps users create models and use them through APIs

Future Work
Monitoring and Relabeling
› Monitor the usage and application of the prediction API
› Provide new data when using the prediction API
Data Versioning
› Increase the flexibility of the data used for model training
› Put the concept of data drift into practice

Thank you

LINE Developers Taiwan
