Trie
HMM with states
BMES tags: Begin, Middle, End, Single
Viterbi Algorithm
Chooses the tag path with the maximum probability
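The slides only name the Viterbi step, so here is a minimal sketch, assuming BMES tags and hand-made log-probabilities; the numbers are toy values for illustration, not parameters from a real segmenter.

```python
# Minimal Viterbi sketch for BMES word segmentation (toy log-probabilities only).
STATES = ["B", "M", "E", "S"]

def viterbi(chars, start_p, trans_p, emit_p):
    """Return the most probable BMES tag sequence for `chars` (log-prob inputs)."""
    # best[i][s] = best log-prob of any path ending in state s at position i
    best = [{s: start_p[s] + emit_p[s].get(chars[0], -10.0) for s in STATES}]
    back = [{}]
    for i in range(1, len(chars)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[i - 1][p] + trans_p[p].get(s, -10.0)) for p in STATES),
                key=lambda x: x[1],
            )
            best[i][s] = score + emit_p[s].get(chars[i], -10.0)
            back[i][s] = prev
    # Trace back the path with maximum probability.
    last = max(STATES, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(chars) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy (made-up) log-probabilities for the two characters of "中文":
start = {"B": -0.7, "M": -9.0, "E": -9.0, "S": -1.0}
trans = {"B": {"M": -1.0, "E": -0.5}, "M": {"M": -1.0, "E": -0.5},
         "E": {"B": -0.7, "S": -1.0}, "S": {"B": -0.7, "S": -1.0}}
emit = {s: {} for s in STATES}  # empty: fall back to the smoothing value
print(viterbi("中文", start, trans, emit))  # ['B', 'E'] -> one two-character word
```

Word boundaries then fall after every E or S tag in the decoded sequence.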
Problems of HMM
• Quality depends on word coverage, appearance frequency, preprocessing design, and collected dictionaries
• Hard to capture complex/non-linear relationships
• Computing cost increases with large datasets and complex state spaces
BiLSTM
Architecture of BiLSTM
• A pair of LSTMs (Long Short-Term Memory): one reads the sequence forward, the other backward
• Cross-BiLSTM-CNN variant
• Flow: corpora → embedding → NLP task
Peng-Hsuan Li, Tsu-Jui Fu, Wei-Yun Ma. 2019. Remedying BiLSTM-CNN Deficiency in Modeling Cross-Context for NER.
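As a rough illustration of the bidirectional architecture (not the Cross-BiLSTM-CNN model from the cited paper), here is a minimal PyTorch sketch of a BiLSTM tagger; the vocabulary size, dimensions, and tag count are placeholder assumptions.

```python
# Minimal BiLSTM sequence-tagger sketch (illustrative only).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256, num_tags=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True runs one LSTM forward and one backward over the sequence
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Concatenated forward+backward states feed a per-token tag classifier
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        out, _ = self.bilstm(x)         # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(out)     # (batch, seq_len, num_tags)

# Example: tag a batch of two sequences of length 10 with random token ids.
model = BiLSTMTagger()
logits = model(torch.randint(0, 5000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 4])
```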
Problems of BiLSTM
• No pre-training on large corpora, so more task-specific data is needed
• OOV words: a fixed vocabulary constrains segmentation of new/rare words
• Limited parallelism: the sequential nature of LSTMs makes computation hard to parallelize
BERT
Architecture of BERT
• BERT Base: 12 layers (110M parameters)
• BERT Large: 24 layers (340M parameters)
Encoder only
Attention
• Multi-Head Attention
• Self-Attention
[Figure: sentence 1 and sentence 2 tokens pass through the encoder's self-attention / multi-head attention layers, producing a tag for each token]
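To ground the attention bullets, here is a minimal numpy sketch of scaled dot-product self-attention for a single head; the dimensions are arbitrary assumptions, and real BERT adds per-layer projections, multiple heads, residual connections, and layer normalization.

```python
# Scaled dot-product self-attention sketch (single head).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # every token attends to every token
    return softmax(scores) @ v                # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                   # 5 tokens, d_model = 16
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (5, 8)
```

Multi-head attention runs several such heads in parallel and concatenates their outputs.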
BERT
Transfer Learning
• Pre-Training
• Fine-Tuning
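A minimal sketch of the fine-tuning side of transfer learning, assuming the Hugging Face transformers library and a BMES token-classification head; the checkpoint name and label count are assumptions, not something specified on the slide.

```python
# Hedged sketch: load a pre-trained BERT and attach a token-classification head for BMES tagging.
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")   # assumed checkpoint
model = BertForTokenClassification.from_pretrained(
    "bert-base-chinese",
    num_labels=4,  # B, M, E, S
)

# The encoder weights come from pre-training; fine-tuning on labeled segmentation
# data adapts the new classification layer (and, optionally, the encoder) to the task.
inputs = tokenizer("今天天氣很好", return_tensors="pt")
logits = model(**inputs).logits   # (1, seq_len, 4) per-token tag scores
```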
Problem of BERT: it still needs a significant amount of task-specific data
Possible remedies:
• Data augmentation
• Semi-supervised learning
• Active learning
• Knowledge distillation
• External knowledge
SECTION 03
Advances
Discriminative vs. Generative AI
• Discriminative AI: the model learns the relationship between input data and input tags, mapping inputs to output labels
• Generative AI: the model learns from unstructured input data and produces new data
NLP Evolution
Traditional Approach (Trie) → Neural Network (BiLSTM) → Pre-Trained Model (BERT) → Prompt Engineering / GPT Hybrid
BERT + GPT
• Model type: BERT is encoder-only; GPT is decoder-only
• Pre-training: BERT uses masked language modeling (MLM); GPT uses autoregressive (AR) language modeling
• Direction: BERT is bidirectional; GPT is unidirectional
• Fine-tuning: BERT adds a task-specific layer on top of the pre-trained model; GPT uses task-specific prompting with one-shot/few-shot adaptation
• Use case: BERT for word segmentation, classification, and NER; GPT for text generation and summarization
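To make the MLM vs. AR distinction concrete, here is a small sketch using the Hugging Face pipeline API; the model names (a Chinese BERT checkpoint and English GPT-2) are assumptions for illustration, not models named on the slides.

```python
# Hedged sketch: contrast BERT-style masked prediction with GPT-style autoregressive generation.
from transformers import pipeline

# MLM: BERT sees the whole sentence and fills in the masked token using context on both sides.
fill = pipeline("fill-mask", model="bert-base-chinese")        # assumed checkpoint
print(fill("今天天氣很[MASK]。")[0]["token_str"])

# AR: GPT conditions only on the left context and generates the continuation token by token.
generate = pipeline("text-generation", model="gpt2")           # assumed checkpoint
print(generate("Natural language processing is", max_new_tokens=10)[0]["generated_text"])
```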
Generate Dataset by GPT
• Initial data: raw data plus a small amount of existing labeled data
• Design the data format and prompting
• GPT produces synthetic data, leveraging strong contextual understanding, zero-shot/few-shot learning, and external knowledge
• Manual review: filter out low-quality data
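A minimal sketch of the prompting step, assuming the OpenAI Python client; the model name, prompt wording, and output format are all illustrative assumptions rather than details from the slides.

```python
# Hedged sketch: ask an LLM to produce synthetic word-segmentation examples from raw sentences.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are a Chinese word segmentation annotator.\n"
    "Segment each sentence by inserting a single space between words.\n"
    "Return one segmented sentence per line, nothing else.\n\n"
    "{sentences}"
)

def generate_synthetic_data(raw_sentences, model="gpt-4o-mini"):  # model name is an assumption
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT.format(sentences="\n".join(raw_sentences))}],
    )
    # Each returned line becomes one synthetic labeled example; low-quality lines
    # are still filtered out in the manual-review step.
    return response.choices[0].message.content.splitlines()

print(generate_synthetic_data(["今天天氣很好", "我想吃拉麵"]))
```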
Fine-Tune BERT
• Synthetic data with auto labeling becomes the fine-tuning data
• Split the dataset (with shuffling) into training and evaluation sets
• Fine-tune BERT on the training split
• Evaluate the fine-tuned model
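A sketch of the split/shuffle/fine-tune steps, assuming the Hugging Face datasets and transformers libraries; the toy example rows, split ratio, and hyperparameters are assumptions.

```python
# Hedged sketch: shuffle and split labeled synthetic data, then fine-tune BERT with the Trainer API.
from datasets import Dataset
from transformers import BertForTokenClassification, Trainer, TrainingArguments

# Toy stand-in for the auto-labeled synthetic data: token ids plus aligned BMES label ids
# (-100 marks positions such as [CLS]/[SEP] that the loss should ignore).
example = {"input_ids": [101, 791, 1921, 102],
           "attention_mask": [1, 1, 1, 1],
           "labels": [-100, 0, 3, -100]}
dataset = Dataset.from_list([example] * 20).shuffle(seed=42)
splits = dataset.train_test_split(test_size=0.1)   # 90% train / 10% evaluation

model = BertForTokenClassification.from_pretrained("bert-base-chinese", num_labels=4)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-seg", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
print(trainer.evaluate())
```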
LLM-Driven Fine-Tuning of BERT
• Pipeline: initial data (raw data) → GPT → synthetic data → fine-tuning data → BERT
• Feedback loop: optimize training parameters, generate more data, refine the prompt
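A hedged sketch of the feedback loop described above; the helper functions are hypothetical stubs standing in for the GPT generation, BERT fine-tuning, and evaluation steps sketched earlier, and the target score and round cap are arbitrary assumptions.

```python
# Hedged sketch of an LLM-driven fine-tuning loop with hypothetical stub helpers.
def generate_synthetic_data(prompt):      # stub: would call the LLM with the current prompt
    return ["今天 天氣 很 好"]

def fine_tune_bert(labeled_examples):     # stub: would run the Trainer step
    return "fine-tuned-bert"

def evaluate(model):                      # stub: would compute F1 on a held-out split
    return 0.90

def refine_prompt(prompt, score):         # stub: would adjust instructions/examples
    return prompt + "\nAdd more rare-word examples."

labeled, prompt = [], "Segment each sentence into words."
for _ in range(5):                         # arbitrary cap on feedback rounds
    labeled += generate_synthetic_data(prompt)
    model = fine_tune_bert(labeled)
    score = evaluate(model)
    if score >= 0.95:                      # arbitrary target score
        break
    prompt = refine_prompt(prompt, score)  # otherwise refine and iterate
```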
SECTION 05
Takeaway
Takeaway
• Every NLP model is designed for a particular purpose. As with most mathematical problems, there is more than one solution.
• Once we grasp the core concepts behind NLP models and their common applications, we can craft a solution that fits our goals and resources.