Case Introduction
Problem Statement
› The number of ads is increasing
› The reviewer has to review all ad content, decide whether it needs to be rejected, and provide the reject reason
Machine Learning Method
Natural Language Processing
› Process and parse natural language, such as text, to understand its semantic meaning
› … network
› A pre-trained language model generates a representation vector carrying basic semantic meaning for each token of the text
› Ex: BERT, ELECTRA
› Fine-tune a model based on the pre-trained model for the target downstream task
› Ex: sentiment classification, question answering
Ad Content
› 3 labels for the main reject reasons
› Each ad can have 0 to 3 labels
› Data count: 6429
› Approved: 3000
› Rejected: 3429
› Example ad: 【Bee venom toothpaste】Repairs teeth, whitens, removes bad breath and yellow stains
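A minimal sketch of how such multi-label data could be encoded for a classifier. The label names (`reason_a`, `reason_b`, `reason_c`) are hypothetical placeholders, since the three reject reasons are not named here, and treating "approved" as "no labels" is an assumption rather than something the slide states.

```python
# Hypothetical label names; the real reject reasons are not listed in the slides.
LABELS = ["reason_a", "reason_b", "reason_c"]


def to_multi_hot(reasons):
    """Encode a collection of reject reasons as a 0/1 vector over LABELS."""
    unknown = set(reasons) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in reasons else 0 for label in LABELS]


def is_rejected(vector):
    """Assumption: an ad counts as rejected when it carries at least one label."""
    return any(vector)


approved = to_multi_hot([])                        # 0 labels
rejected = to_multi_hot(["reason_a", "reason_c"])  # 2 of the 3 labels
print(approved, is_rejected(approved))
print(rejected, is_rejected(rejected))
```

Each ad becomes one fixed-length vector, so an ad with 0, 1, 2, or 3 reasons fits the same target shape.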
No Code
› … own NLP models and integrate the models into applications through APIs without coding
› Classifier, Multilabel Classifier, NER, Duplication Detector
› LINE TODAY, LINE VOOM, LINE SHOPPING, LINE FactChecker
Scoping › Target of task › Application › Constraint
Data Preparation › Collection › Cleaning
Model Development › Which model › Training › Tuning
Model Validation › Baseline › Score › Expectation
Deployment › How to use › Service integration
› Package the API as a Docker image and push it to Harbor
› Deploy the API image on the VKS cluster and create a DNS record for the API
Model Development
› Provide 7 NLP services with the best model and settings
› Classifier, Multi-label Classifier, Duplication Detector, Key Phrase Extraction, Related Search, Tokenizer, NER
› Train on IU k8s clusters with GPU
Model Validation
› Apply metrics for different models
Text
› Build your own NLP model
› Upload data, train the model, try the result
Service Domain
› A space for users to use NLP services for their own application
› Each domain provides 7 NLP services
Upload Data
› Clicking the delete button deletes all the data
Build
› Start a new build or deploy a previous build
› Set a training cronjob
Try Result
› Test the prediction API of the active model
Scoping › Constraint
Data Preparation › Collection › Cleaning
Model Development › Which model › Training › Tuning
Model Validation › Baseline › Score › Expectation
Deployment › How to use › Service integration
Exploratory Data Analysis
› Use visualization and basic statistics to get an overview of the data
› Purpose: understand the content and structure of the data, check for outliers or unusual values, and find correlations in the data
› Category distribution
› Category count distribution
› Category correlation
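The three analyses named above can be sketched with the standard library alone. This is an illustrative sketch on made-up toy rows with hypothetical label names, not the platform's implementation. Using Pearson correlation on 0/1 label columns (equivalent to the phi coefficient) is one reasonable choice for "category correlation"; the slides do not say which measure is used.

```python
from collections import Counter
from math import sqrt

# Toy multi-hot rows over three hypothetical labels (illustrative, not real data).
LABELS = ["reason_a", "reason_b", "reason_c"]
rows = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]


def category_distribution(rows):
    """Number of items carrying each label."""
    return {label: sum(r[i] for r in rows) for i, label in enumerate(LABELS)}


def category_count_distribution(rows):
    """Number of items carrying 0, 1, 2, ... labels."""
    return dict(sorted(Counter(sum(r) for r in rows).items()))


def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def category_correlation(rows):
    """Pairwise correlation between label columns."""
    cols = list(zip(*rows))
    return {
        (LABELS[i], LABELS[j]): pearson(cols[i], cols[j])
        for i in range(len(LABELS))
        for j in range(i + 1, len(LABELS))
    }


print(category_distribution(rows))
print(category_count_distribution(rows))
print(category_correlation(rows))
```

In practice these tables would usually be plotted (bar charts for the two distributions, a heatmap for the correlations) so that imbalance and co-occurring labels are visible before training.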
Explainable AI
› AI whose decisions or predictions can be understood by humans
› Benefit: improves user experience by helping users trust that the AI is making good decisions
› Benefit: reveals bias in the AI through the explanations of its decisions
SHAP (SHapley Additive exPlanations)
› … of any machine learning model
› Based on the classic Shapley values from game theory
Shapley Value
› A solution concept in cooperative game theory, introduced by Lloyd Shapley in 1951
› Used to fairly distribute gains and costs among several actors working in a coalition
› Computed as the average of a player's marginal contributions over all possible coalitions
Example
› … monsters in LINE GAME
› Together they beat 100 monsters in 1 hour and earn 10000 coins as a reward
› Suppose that in 1 hour:
› Brown, Sally, and Cony can beat 35, 15, and 10 monsters respectively
› Brown and Sally can beat 70 monsters together; Brown and Cony can beat 60 monsters together; Sally and Cony can beat 40 monsters together
› How should the reward be split?
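The example can be worked through directly. The sketch below averages each player's marginal contribution over all 3! = 6 orders in which the players could join, which is one standard way to compute Shapley values for a small game. The coalition values come from the slide; splitting the coins in proportion to each player's share of the 100 monsters is an assumption about how the reward maps to the game.

```python
from itertools import permutations
from math import factorial

# Characteristic function from the example: monsters beaten in 1 hour per coalition.
v = {
    frozenset(): 0,
    frozenset({"Brown"}): 35,
    frozenset({"Sally"}): 15,
    frozenset({"Cony"}): 10,
    frozenset({"Brown", "Sally"}): 70,
    frozenset({"Brown", "Cony"}): 60,
    frozenset({"Sally", "Cony"}): 40,
    frozenset({"Brown", "Sally", "Cony"}): 100,
}
players = ["Brown", "Sally", "Cony"]


def shapley_values(players, v):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += v[joined] - v[coalition]
            coalition = joined
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


shares = shapley_values(players, v)

# Assumption: coins are split in proportion to each player's Shapley share.
grand_total = v[frozenset(players)]
coins = {p: 10000 * s / grand_total for p, s in shares.items()}

for p in players:
    print(f"{p}: {shares[p]:.2f} monsters, {coins[p]:.2f} coins")
```

Under these assumptions Brown, Sally, and Cony are credited with about 49.17, 29.17, and 21.67 monsters, so the 10000 coins split into roughly 4917, 2917, and 2167. The shares sum to exactly 100, which is the efficiency property of the Shapley value.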
SmartText
› A self-service NLP platform that helps users create models and use them through an API
EDA
› … statistics and visualization before training
XAI
› Explain why the AI makes its decisions, to further convince users or reveal bias
Monitoring and Relabeling
› … API
› Provide new data when using the prediction API
Data Versioning
› Increase flexibility of the data for model training
› Realize the concept of data drift