OOV words Fixed vocabulary is constraint on segmentation with new/rare word Parallelize limitation Sequential attribute of LSTM cause hard to parallelize computation
MLM AR Direction Bidirectional Unidirectional Fine-tuning Task specific layer added on pre-trained model Task specific prompting with one-shot/few-shot adaption Use case Word segmentation Classification NER Text generation Summarization
exist data Design data format and prompting GPT Synthetic data Strong contextual understand Zero-shot/Few-shot learning External knowledge Manual review Filter low-quality data
like most of mathematical questions - not only one solution. • We’re able to cook appropriate solution with our goal and resource if we have gotten core concept about NLP models and frequent application. Takeaway