Content Moderation
Case Introduction
› The reviewer has to review all ad content, decide whether it needs to be rejected, and provide the rejection reason
Problem Statement
› The task requires a lot of manpower
› The number of ads is increasing
NLP Service
Natural Language Processing
› Process and parse natural language such as text to understand its semantic meaning
Machine Learning Method
› Obtain a representation of text via a deep neural network
› A pre-trained language model can generate representation vectors with basic semantic meaning for each token of text
  › Ex: BERT, ELECTRA
› Fine-tune a model based on the pre-trained model for the target downstream task
  › Ex: sentiment classification, question answering
NLP Service
Multi-label Classifier
› Categorize text content into multiple labels from a customized set of n labels
› Architecture: ELECTRA → Dense Layer → Sigmoid → p1, p2, …, pn
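The Dense Layer → Sigmoid head above can be illustrated with a minimal pure-Python sketch. It assumes the ELECTRA encoder has already produced a fixed-size text representation; the 3-dimensional feature vector, the weights, the label names, and the 0.5 threshold are all hypothetical values chosen for illustration, not SmartText settings.

```python
import math


def sigmoid(z):
    """Squash a logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_head(features, weights, biases):
    """Dense layer followed by an element-wise sigmoid.

    Each label gets its own independent probability, so the
    outputs p1..pn do not have to sum to 1 (unlike softmax).
    """
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, features)) + b
        probs.append(sigmoid(logit))
    return probs


def predict_labels(probs, label_names, threshold=0.5):
    """Keep every label whose probability reaches the threshold."""
    return [name for name, p in zip(label_names, probs) if p >= threshold]


# Hypothetical text representation standing in for the encoder output.
features = [0.5, -1.0, 2.0]
# Hypothetical dense-layer parameters: one weight row per label.
weights = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [-1.0, 0.0, 0.5]]
biases = [0.0, 0.0, 0.0]
labels = ["spam", "adult", "misleading"]  # hypothetical label set

probs = multilabel_head(features, weights, biases)
print(predict_labels(probs, labels))
```

Because each label is scored independently, a single text can receive zero, one, or several labels, which is what distinguishes this head from a single-label softmax classifier.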
SmartText - A Self-Service NLP Platform
› Help users create their own NLP models and integrate the models into applications through APIs without coding
[Diagram labels: Classifier, Multilabel Classifier, NER, Duplication Detector; LINE TODAY, LINE VOOM, LINE SHOPPING, LINE FactChecker; No Code]
Machine Learning Pipeline
Scoping
› Target of task
› Application
› Constraint
Data Preparation
› Collection
› Cleaning
Model Development
› Which model
› Training
› Tuning
Model Validation
› Baseline
› Score
› Expectation
Deployment
› How to use
› Service integration
How SmartText Works
Model Development
› Provide 7 NLP services with the best model and settings
  › Classifier, Multi-label Classifier, Duplication Detector, Key Phrase Extraction, Related Search, Tokenizer, NER
› Train on IU k8s clusters with GPU
Model Validation
› Apply metrics for different models
Deployment
› Build the prediction API with BentoML
› Package the API as a Docker image and push it to Harbor
› Deploy the API image on the VKS cluster and create DNS for the API
How SmartText Works
Automation
› Set the following flow as an Airflow DAG
› 1 DAG for 1 NLP service
Flow: Model Training & Validation → Build Prediction API → Package Image → Deploy Image & Create DNS
SmartText Portal
A Web System for Users to Use SmartText
› Build your own NLP model
› Upload data, train model, try result
Service Domain
› A space for users to use NLP services based on their application
› Each domain provides 7 NLP services
SmartText Portal
Upload Data
› Upload CSV files with a specific format
› Clicking the delete button deletes all the data
Build
› Start a new build or deploy a previous build
› Set a training cronjob
Try Result
› Test the prediction API of the active model
Machine Learning Pipeline
Scoping
› Target of task
› Application
› Constraint
Data Preparation
› Collection
› Cleaning
Model Development
› Which model
› Training
› Tuning
Model Validation
› Baseline
› Score
› Expectation
Deployment
› How to use
› Service integration
EDA
Exploratory Data Analysis
› Use visualization and basic statistics to get an overview of the data
› Purpose
  › Know the information and the structure of the data
  › Check for outliers or unusual values
  › Find correlations in the data
Plot
› Multi-label Classifier
  › Text length histogram
  › Category distribution
  › Category count distribution
  › Category correlation
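The four multi-label statistics listed above can be computed before any plotting. The following is a minimal sketch using only the Python standard library; the sample records, label names, whitespace tokenization, and bin size are hypothetical, and pairwise co-occurrence counts are used here as a simple stand-in for category correlation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labeled data: (text, list of categories).
samples = [
    ("cheap watches buy now", ["spam"]),
    ("win a free phone today click here", ["spam", "misleading"]),
    ("new cafe opening downtown", []),
    ("lose weight in one day guaranteed", ["misleading"]),
]


def text_length_histogram(samples, bin_size=2):
    """Count texts per length bin (length = whitespace tokens)."""
    hist = Counter()
    for text, _ in samples:
        n_tokens = len(text.split())
        hist[(n_tokens // bin_size) * bin_size] += 1
    return dict(sorted(hist.items()))


def category_distribution(samples):
    """How many texts carry each category."""
    return Counter(label for _, labels in samples for label in labels)


def category_count_distribution(samples):
    """How many texts have 0, 1, 2, ... categories."""
    return Counter(len(labels) for _, labels in samples)


def category_cooccurrence(samples):
    """How often each pair of categories appears on the same text."""
    pairs = Counter()
    for _, labels in samples:
        for a, b in combinations(sorted(set(labels)), 2):
            pairs[(a, b)] += 1
    return pairs


print(text_length_histogram(samples))
print(category_distribution(samples))
print(category_count_distribution(samples))
print(category_cooccurrence(samples))
```

Each function returns plain counts, so the results can be passed to any plotting library to draw the histogram and distribution charts.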
XAI
Explainable AI
› AI whose decisions or predictions can be understood by humans
› Benefit
  › Improve user experience by helping users trust that the AI is making good decisions
  › Figure out the bias of the AI by observing the explanation of the AI’s decisions
SHAP (SHapley Additive exPlanations)
› A Python package used to explain the output of any machine learning model
› Based on the classic Shapley values from game theory
XAI
Shapley Value
› A solution concept in cooperative game theory, introduced by Lloyd Shapley in 1951
› Used to fairly distribute gains and costs to several actors working in coalition
› Calculate the average of all marginal contributions to all possible coalitions
Example
› Brown, Sally, and Cony build a team to beat monsters in LINE GAME
› They beat 100 monsters in 1 hour and earn 10000 coins as an award
› Suppose that in 1 hour:
  › Brown, Sally, and Cony can beat 35, 15, and 10 monsters respectively
  › Brown and Sally can beat 70 monsters together; Brown and Cony can beat 60 monsters together; Sally and Cony can beat 40 monsters together
› How to split the award?
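XAI
Marginal contribution of each member for every joining order (B = Brown, S = Sally, C = Cony)
Order          Brown      Sally      Cony
B S C          35         35         30
B C S          35         40         25
S C B          60         15         25
S B C          55         15         30
C B S          50         40         10
C S B          60         30         10
Contribution   49.16667   29.16667   21.66667
› Order B S C: Sally adds 70 - 35 = 35, Cony adds 100 - 70 = 30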
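The Brown, Sally, and Cony example can be checked with a short script. This is a minimal sketch of the permutation form of the Shapley value (average each player's marginal contribution over all joining orders); the function and variable names are illustrative, and the coalition values are the ones given in the example.

```python
from fractions import Fraction
from itertools import permutations

players = ("Brown", "Sally", "Cony")

# Monsters beaten in 1 hour by each coalition (values from the example).
value = {
    frozenset(): 0,
    frozenset({"Brown"}): 35,
    frozenset({"Sally"}): 15,
    frozenset({"Cony"}): 10,
    frozenset({"Brown", "Sally"}): 70,
    frozenset({"Brown", "Cony"}): 60,
    frozenset({"Sally", "Cony"}): 40,
    frozenset({"Brown", "Sally", "Cony"}): 100,
}


def shapley_values(players, value):
    """Average marginal contribution of each player over all joining orders."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal contribution: what p adds to the coalition so far.
            totals[p] += value[joined] - value[coalition]
            coalition = joined
    return {p: totals[p] / len(orders) for p in players}


shares = shapley_values(players, value)
award = 10000
total = value[frozenset(players)]

for p in players:
    coins = shares[p] / total * award
    print(f"{p}: {float(shares[p]):.5f} monsters, {float(coins):.2f} coins")
```

The shares sum to the 100 monsters beaten by the full team, so splitting the 10000 coins in the same proportion gives roughly 4916.67, 2916.67, and 2166.67 coins to Brown, Sally, and Cony.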
Summary
SmartText
› A self-service NLP platform that helps users create models and use them through APIs
EDA
› Get an overview of the data by means of statistics and visualization before training
XAI
› Explain why the AI makes its decisions, to further convince users or reveal bias
Future Work
Monitoring and Relabeling
› Monitor the usage and application of the prediction API
› Provide new data when using the prediction API
Data Versioning
› Increase flexibility of the data for model training
› Realize the concept of data drift