Natural Language Processing • Information Retrieval • LINE CLOVA • Japanese NLU system • HyperCLOVA • Japanese Corpus / Evaluation • OSS: Main Contributor of NEologd project • mecab-ipadic-NEologd
are interested in natural language processing. There is also a part for NLP professionals (NLP means Natural Language Processing). Detailed information related to the following is omitted from this session:
- Building language models
- Tuning methods for language models
See the paper below* for more information.
This is a 40-minute session; please enjoy listening to it over a cup of coffee or something!
* What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers, Boseop Kim et al., EMNLP 2021, https://arxiv.org/abs/2109.04650
[Figure: a voice dialog system built around a large-scale language model. A Client App exchanges voice/sounds with a Dialog App via speech-to-text and text-to-speech; the Dialog App sends query text to HyperCLOVA (the large-scale language model) and to a Knowledge Base Search, and receives result text that becomes the response text. Example exchange: "Hello." / "Long time, no see."]
unsupervised autoregressive language model
- Autoregressive language model …
  - is capable of calculating probability distributions
  - provides maximum likelihood estimation of parameters for a sample
  - can generate future text based on the text up to the present
- A model that gives the probability of a certain sequence of words
  - E.g. P(It's sunny today) > P(Sunny is today)
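The inequality above can be made concrete with any public autoregressive LM. Below is a minimal sketch assuming the Hugging Face transformers library and the public "gpt2" checkpoint as a stand-in (HyperCLOVA itself is not publicly downloadable); it scores whole sequences by summing token log-probabilities.

```python
# Minimal sketch: comparing sequence probabilities with an autoregressive LM.
# "gpt2" is only an illustrative public checkpoint, not HyperCLOVA.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def total_log_prob(text: str) -> float:
    """Sum of log P(token_i | tokens_<i) over the whole sequence."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids the model returns the mean cross-entropy of the
        # shifted next-token predictions; un-average it to get a total.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

print(total_log_prob("It's sunny today"))  # expected: higher (less negative)
print(total_log_prob("Sunny is today"))    # expected: lower
```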
any data of our conversation service:
- All messages on LINE
- All posts on OpenChat
We maintain this corpus with the utmost consideration for the rights held by various customers.
- Add versatility to this corpus
- Make a subset of this corpus available for use outside of LINE
LINE Open-chat is used to build our LMs
- Developed based on a corpus built for training the BERT models after 2019
- Used crawled data for LINE search
- Eliminated data that can be easily extracted as "non-public personal information"
- Covered important sites for learning Japanese expressions
- Purchased and used external content after resolving rights issues!
for HyperCLOVA etc.
- Few-shot: give "a description of a task" and "some demonstration cases"
- One-shot: give "a description of a task" and "a demonstration case"
- Zero-shot: give "a description of a task" only
- Pros: possibility of solving a task from brief instructions or short examples
- Cons: possibility of not reaching the performance of a SOTA model achieved by fine-tuning
Method for BERT etc.
- Fine-tuning: supervised learning on a dataset of a target task based on a general-purpose pre-trained model
- Pros: excellent performance in benchmarks
- Cons: need to train for each target task / possible loss of generalization ability
Prompt structure in the text field: a task outline (a title / description of the task, plus other information), followed by samples ("shots"), and finally the query; the output is generated as a continuation. In some cases, an output is given as a suffix after inference, and a previous output can be appended to the prompt as the suffix for the next inference. A minimal sketch of this assembly follows below.
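Below is a minimal sketch, in Python, of how such a text-field prompt could be assembled. The "Input:"/"Output:" labels and the example task are illustrative assumptions; the talk does not show HyperCLOVA Studio's exact prompt format.

```python
# Minimal sketch of assembling a text-field prompt: task outline, shots, query.
# The "Input:"/"Output:" labels are an assumed layout, not HyperCLOVA's format.
def build_prompt(task_outline: str, shots: list, query: str) -> str:
    lines = [task_outline, ""]
    for sample_input, sample_output in shots:      # few-shot demonstrations
        lines += [f"Input: {sample_input}", f"Output: {sample_output}", ""]
    lines += [f"Input: {query}", "Output:"]        # the model continues here
    return "\n".join(lines)

prompt = build_prompt(
    "Generate an attractive advertising description from a product summary.",
    [("Crab-flavored rice crackers, 60 g", "Crispy crackers with a gentle tang...")],
    "Bitter dark chocolate, 72% cacao, 100 g",
)
print(prompt)
```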
[Example slide in Japanese (text garbled in extraction): two generated haiku together with generated commentary on their imagery; the note observes that with a little editing the output becomes a haiku.]
Response to individual tasks with a task outline and few-shot examples: support for writing haiku with 2 shots
Document generation using Play-ground, e.g. from a product summary to a description
In many cases, an advertising description is written by a salesperson.
- In this demo, the parameters of HyperCLOVA Studio (Play-ground) are adjusted to generate an attractive description
- From a food product title and a summary, HyperCLOVA Studio generates an attractive description to advertise it
[Example slide in Japanese (text garbled in extraction): a one-shot prompt with a Base example (a 60 g crab-flavored rice cracker and its description), a Prefix giving the name, ingredients, and volume of a chocolate product as the query, and the generated Output description.]
One-shot => increased the Temperature (randomness) and lowered the Repetition penalty (for control of repetition) to make the text contain the appeal
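The two sampling knobs mentioned here can be sketched with the Hugging Face generate() API as a stand-in for the Play-ground settings; the model, prompt, and parameter values below are illustrative assumptions, not the HyperCLOVA API.

```python
# Minimal sketch of the two sampling knobs mentioned above, using the Hugging
# Face generate() API as a stand-in for HyperCLOVA Studio's Play-ground.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative model only
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Product: bitter dark chocolate, 72% cacao\nDescription:"
ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(
    ids,
    max_new_tokens=80,
    do_sample=True,
    temperature=1.1,          # higher temperature -> more varied wording
    top_p=0.9,
    repetition_penalty=1.05,  # closer to 1.0 -> weaker penalty on repetition
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```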
JP model evaluated on 4 tasks
- Annotations were made for all task/model combinations
- Subjective evaluation by the same 5 annotators
- Each session is N round-trip conversational pairs
- The user receives a list of N topics for evaluation; each session consumes one vocabulary item from the list
- Conducted in Play-ground
Tasks:
1. Understanding of basic vocabulary
2. Tracking different multiple topics
3. Reacting to user sentiment on a topic
4. Free chatting
the goal
Common evaluation criteria for all tasks!
- Natural response. Q: Was it a natural reaction? Are there any breakdowns or inconsistencies in the history of the conversation?
- Following a topic. Q: Did it stay on topic? Did it lose track of the topic (in this case, did it lose track of what it was being asked about)? Was it able to switch topics (in this case, was it able to pull back to the previous question)?
- Providing a topic or asking a question. Q: Did it provide a topic? Was it able to get the speaker to talk during the answer (most likely not)?
- Achievement of goals. Q: Did it achieve your objective?
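The tables that follow report one number per criterion in [0, 1]. One plausible reading, which the talk does not spell out, is that each number is the fraction of (turn, annotator) judgments where the criterion was satisfied; the short sketch below assumes exactly that.

```python
# Assumption: each reported score is the fraction of (turn, annotator)
# judgments where the criterion was satisfied. The talk does not spell out
# the exact scoring scheme; this is only one plausible reading.
from statistics import mean

def criterion_score(judgments: list) -> float:
    """judgments[turn][annotator] is True if the criterion was satisfied."""
    return mean(float(ok) for turn in judgments for ok in turn)

# Toy example: 3 turns judged by 5 annotators for "Following a topic".
example = [
    [True, True, True, True, False],
    [True, True, True, True, True],
    [True, False, True, True, True],
]
print(round(criterion_score(example), 3))  # 0.867
```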
word meanings (Level 1) and emotions (Level 2)?

                                          6.7B    13B     39B
Natural response                          0.55    0.66    0.98
Following a topic                         0.63    0.84    0.98
Providing a topic or asking a question    0.00    0.01    0.00
Achievement of goals                      0.55    0.68    0.84
2. Tracking different multiple topics
B during a conversation?

                                          6.7B    13B     39B
Natural response                          0.66    0.53    0.91
Following a topic                         0.71    0.61    0.95
Providing a topic or asking a question    0.04    0.01    0.02
Achievement of goals                      0.66    0.55    0.91
[Table in Japanese (text garbled in extraction): a list of conversation topics, each paired with a contrasting sentiment A and sentiment B, used as prompts for the sentiment-reaction task.]
ToDo: have 15 back-and-forth exchanges about the topic. Speak with the feeling of sentiment A at first, then sentiment B.
3. Reacting to user sentiment on a topic
when he or she was feeling sentiment A about the topic?

                                          6.7B    13B     39B
Natural response                          0.69    0.45    0.90
Following a topic                         0.74    0.52    0.95
Providing a topic or asking a question    0.04    0.02    0.03
Achievement of goals                      0.68    0.46    0.90
3. Reacting to user sentiment on a topic
topic, could the system disagree?

                                          6.7B    13B     39B
Natural response                          0.61    0.40    0.87
Following a topic                         0.67    0.45    0.93
Providing a topic or asking a question    0.09    0.02    0.03
Achievement of goals                      0.46    0.36    0.50
Columns: (1) Understanding of basic vocabulary, (2) Tracking different multiple topics, (3+) Reacting to positive sentiment on a topic, (3-) Reacting to negative sentiment on a topic, (4) Free chatting

                                          (1)     (2)     (3+)    (3-)    (4)
Natural response                          0.978   0.908   0.908   0.872   0.925
Following a topic                         0.984   0.952   0.951   0.930   0.935
Providing a topic or asking a question    0.003   0.023   0.033   0.035   0.086
Achievement of goals                      0.835   0.907   0.899   0.505   -
Japanese speakers use
- Hiragana
- Katakana
- Kanji
- Romaji
- etc.
to write a single document.
Large amount of essential vocabulary: required for daily conversation (over 8,000 words). Need to know many
- Homonyms
- Honorifics
- Dialects
Omission of words: Japanese speakers may omit the following words in a document
- Subject
- Object
Omitted words may not be uniquely inferred.
research institutions and companies in the future
- Providing HyperCLOVA's APIs to universities, research institutes, and companies
- Collaborating to dramatically improve system performance and to detect and eliminate bias in language models with
  - Osaka University Graduate School
  - Tokyo Metropolitan University
  - Waseda University
The following technologies need to be developed
- Improving the content bias of a corpus and its notation
- Ensuring the truthfulness and security of an output text
Implementation of AI Ethics: various ethical considerations need to be taken into account for input and output texts
- Toxicity
- Sexual
- Offensive
- Profane
- Intimidating
- Attack on identity
Automation of intrinsic evaluation: need metrics that can be applied to dynamic text generation results
- Accuracy of topical content
- Consistency of generated text
- Determination of achievement of objectives
Few-shot examples were created by randomly extracting a context from the RCQA possible-only dev set for each inference
- Each shot consists of a context, a question text, and an answer
- If the correct answer is contained in, and easily extracted from, the inference result, we judged it correct
TASK: RCQA* possible-only
- Removed unanswerable questions from the dataset of the normal RCQA task
* Reading comprehension dataset with answerability: http://www.cl.ecei.tohoku.ac.jp/rcqa/
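A minimal sketch of this evaluation loop is given below, assuming records with "context", "question", and "answer" fields and a generic query_lm() callable; the actual RCQA schema and the HyperCLOVA API calls are not shown in the talk.

```python
# Minimal sketch of the evaluation loop described above: 2 shots drawn at
# random from the answerable-only dev set, then a containment check on the
# model output. The record fields and query_lm() are assumptions.
import random

def format_example(ex: dict, with_answer: bool = True) -> str:
    text = f"Context: {ex['context']}\nQuestion: {ex['question']}\nAnswer:"
    if with_answer:
        text += f" {ex['answer']}"
    return text

def accuracy(test_set: list, dev_set: list, query_lm) -> float:
    correct = 0
    for ex in test_set:
        shots = random.sample(dev_set, 2)          # fresh shots for each inference
        prompt = "\n\n".join(format_example(s) for s in shots)
        prompt += "\n\n" + format_example(ex, with_answer=False)
        prediction = query_lm(prompt, temperature=0.4, top_p=0.5)
        if ex["answer"] in prediction:             # "contained and easily extracted"
            correct += 1
    return correct / len(test_set)
```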
questions from the normal RCQA task)
- It is possible that BERT can achieve higher results with fine-tuning on specific tasks
- HyperCLOVA can achieve the same level of performance with prompting and a rough parameter search

                 test acc   test F1   memo
HyperCLOVA       85.03      89.95     JP 39B, 2-shots, temperature=0.4, top_p=0.5
BERT-jp-large    86.68      90.49     Using a subset of the LINE LM corpus
conversation and topic tracking
- The truth of what it says should be verified before it responds
- Conversation is smooth and the meaning of what is said is understood
- Some ambiguous responses (e.g. the temperature of hot water during washing)
- Effect of data bias (e.g. the persona was unsettled but drifted toward being female)
- The consistency of its persona is a bit suspect; it started with NO character set
If you're on a budget …
- The history of NLP is strongly linked to the development of AI-related technologies
- LINE wants to move in the direction of building our own models and having customers use them
[Figure: progression of approaches, from rule-only, traditional-ML-only, and DNN systems to small LMs only and large-scale general-purpose LMs]
those models to make inferences - the biggest challenge of all
• Fine-tuning and other parameter-efficient transfer learning methods, as well as compact models
• Responding to new topics/events that have arisen since a model was built
• Implementing AI Ethics
  • Filtering according to the application and specifying the reason
• Building a Web corpus
  • Removing duplicate data (see the sketch after this list)
  • Realization of accountability for each entry used
  • Responding to deletion requests on a URL/ID basis
  • Detection and anonymization of personal information
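As referenced in the list above, here is a minimal sketch of the duplicate-removal step, assuming exact duplicates are dropped by hashing normalized text; LINE's actual pipeline is not described in the talk.

```python
# Minimal sketch of one corpus-building step listed above: exact-duplicate
# removal by hashing normalized text. LINE's actual pipeline is not described
# in the talk; near-duplicate detection (e.g. MinHash) would need more work.
import hashlib
import unicodedata

def normalize(text: str) -> str:
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())

def deduplicate(documents: list) -> list:
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha1(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

print(deduplicate(["Hello  world", "hello world", "Long time, no see."]))
# -> ['Hello  world', 'Long time, no see.']
```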
challenge together at LINE
LINE's various services need essential improvements using NLP technology!
• Large-scale general-purpose LMs
• "High Road" NLP
• Information Retrieval
• String processing
• Data creation
• Evaluation tasks, etc.
models “other than” HyperCLOVA
- Train using a subset of the corpus for HyperCLOVA (the LINE LM Corpus)!
- Performance target: LINE's LMs for OSS > other OSS LMs
- Would like to update a few times a year, if possible!!
- Reported on large-scale general-purpose LMs and prompting, using several topics as examples
  - There are cases where surprisingly high quality can be achieved
  - There are issues that cannot be solved ad hoc
- At LINE, we can work on all layers of NLP R&D, not only HyperCLOVA
- Please stay tuned for the next NLP updates from LINE