Principal Software Engineer / Manager - Japanese Corpus / Evaluation / Application - mecab-ipadic-NEologd HyperCLOVA OSS: Main Contributor of NEologd ࢁ࡚ ఱ Takato Yamazaki Takato Yamazaki (@y_takaten) - Dialog Systems - HyperCLOVA Applications Software Engineer - Aizawa Lab. / University of Tokyo - “Phrase-Level Action Reinforcement Learning for Neural Dialog Response Generation”, Findings of ACL 2021 - Young Researcher Award for Excellent Research at 11th Dialogue System Symposium MS Degree in Computer Science
predict the next word from that input Pros - Highly economical in terms of time - High accuracy can be achieved without model tuning if the appropriate information and example (few-shot) can be described Cons - Requires a considerably larger number of parameters than when using a tunable mode - Requires a linguistic background and extensive social experience Two Major Ways of Controlling Output with Foundation Model Prompting Fine-tuning Meta-learning to treat target task information as text input and predict the next word from that input Pros - Highly economical in terms of money - High accuracy can be achieved if sufficient time is available for model training and parameter exploration Cons - Need to have the right amount of properly labeled teacher data for each task - Need to create a model for each task
".ZCFTUTDPSFJT*VTVBMMZHFUBSPVOE #5IBU`TBXFTPNF*`MMOFWFSCFBCMFUPHFUUIBUTDPSF ")J XIBUEPZPVEPJOZPVSGSFFUJNF #*MJLFUPXBUDINPWJFT " 1SPNQU HyperCLOVA is also used with Prompt 1SPNQUUPHFOFSBUFBSFTQPOTFPGUIFEJBMPH
#/JDF8IBU`TZPVSCFTUTDPSF ".ZCFTUTDPSFJT*VTVBMMZHFUBSPVOE #5IBU`TBXFTPNF*`MMOFWFSCFBCMFUPHFUUIBUTDPSF %JBMPH&YBNQMFº/ ")J XIBUEPZPVEPJOZPVSGSFFUJNF #*MJLFUPXBUDINPWJFT " $VSSFOU%JBMPH HyperCLOVA is also used with Prompt *OUIJTDBTF JU`TDBMMFETIPU 1SPNQUUPHFOFSBUFBSFTQPOTFPGUIFEJBMPH
#/JDF8IBU`TZPVSCFTUTDPSF ".ZCFTUTDPSFJT*VTVBMMZHFUBSPVOE #5IBU`TBXFTPNF*`MMOFWFSCFBCMFUPHFUUIBUTDPSF %JBMPH&YBNQMFº/ ")J XIBUEPZPVEPJOZPVSGSFFUJNF #*MJLFUPXBUDINPWJFT " $VSSFOU%JBMPH HyperCLOVA is also used with Prompt 1SPNQUUPHFOFSBUFBSFTQPOTFPGUIFEJBMPH
#/JDF8IBU`TZPVSCFTUTDPSF ".ZCFTUTDPSFJT*VTVBMMZHFUBSPVOE #5IBU`TBXFTPNF*`MMOFWFSCFBCMFUPHFUUIBUTDPSF %JBMPH&YBNQMFº/ ")J XIBUEPZPVEPJOZPVSGSFFUJNF #*MJLFUPXBUDINPWJFT " $VSSFOU%JBMPH HyperCLOVA is also used with Prompt Input the prompt and inference 1SPNQUUPHFOFSBUFBSFTQPOTFPGUIFEJBMPH
")NN *MJLFUPEPCPXMJOH #/JDF8IBU`TZPVSCFTUTDPSF ".ZCFTUTDPSFJT*VTVBMMZHFUBSPVOE #5IBU`TBXFTPNF*`MMOFWFSCFBCMFUPHFUUIBUTDPSF %JBMPH&YBNQMFº/ ")J XIBUEPZPVEPJOZPVSGSFFUJNF #*MJLFUPXBUDINPWJFT " $VSSFOU%JBMPH Generate Input the prompt and inference HyperCLOVA is also used with Prompt 1SPNQUUPHFOFSBUFBSFTQPOTFPGUIFEJBMPH
")NN *MJLFUPEPCPXMJOH #/JDF8IBU`TZPVSCFTUTDPSF ".ZCFTUTDPSFJT*VTVBMMZHFUBSPVOE #5IBU`TBXFTPNF*`MMOFWFSCFBCMFUPHFUUIBUTDPSF %JBMPH&YBNQMFº/ ")J XIBUEPZPVEPJOZPVSGSFFUJNF #*MJLFUPXBUDINPWJFT " $VSSFOU%JBMPH 8IBULJOEPGNPWJFTEPZPVXBUDI (FOFSBUFEXJUI )ZQFS$-07" HyperCLOVA is also used with Prompt Input the prompt and inference Generate 1SPNQUUPHFOFSBUFBSFTQPOTFPGUIFEJBMPH
ZPVEPGPSZPVSMJWJOH $VTUPNFS*BNBNBOBHFSJOBO*5DPNQBOZ 4IPLP RVFTUJPO BIJHIMZEFUBJMFEFQJDDJOFNBUJDDPODFQUBSU$(SFOEFS EJHJUBMQBJOUJOHBSUXPSLEJFTFMQVOLTUFBNJOHIBMGNBOIBMG SPCPU#Z)PLVTBJ,BUTVTIJLB )JSPTIJHF6UBHBXB &JTFO ,FJTBJ ,VOJZPTIJ6UBHBXB ,VOJTBEB6UBHBXB 4IVOTIPV ,BUTVLBXB 4IJHFOPCV:BOBHBXB USFOEJOHPO"SU4UBUJPO TVCUMFNVUFEDJOFNBUJDDPMPST NBEFJO.BZB #MFOEFSBOE 1IPUPTIPQ PDUBOFSFOEFS FYDFMMFOUDPNQPTJUJPO DJOFNBUJD BUNPTQIFSF EZOBNJDESBNBUJDDJOFNBUJDMJHIUJOH QSFDJTF DPSSFDUBOBUPNZ BFTUIFUJD WFSZJOTQJSBUJPOBM BSUIPVTF Output Format Text Output Example Prompt Example Number of people who can agree on an output itself Number of people who receive an output as intended Evaluation Viewpoint Prompt is written differently depending on output modality
ZPVEPGPSZPVSMJWJOH $VTUPNFS*BNBNBOBHFSJOBO*5DPNQBOZ 4IPLP RVFTUJPO BIJHIMZEFUBJMFEFQJDDJOFNBUJDDPODFQUBSU$(SFOEFS EJHJUBMQBJOUJOHBSUXPSLEJFTFMQVOLTUFBNJOHIBMGNBOIBMG SPCPU#Z)PLVTBJ,BUTVTIJLB )JSPTIJHF6UBHBXB &JTFO ,FJTBJ ,VOJZPTIJ6UBHBXB ,VOJTBEB6UBHBXB 4IVOTIPV ,BUTVLBXB 4IJHFOPCV:BOBHBXB USFOEJOHPO"SU4UBUJPO TVCUMFNVUFEDJOFNBUJDDPMPST NBEFJO.BZB #MFOEFSBOE 1IPUPTIPQ PDUBOFSFOEFS FYDFMMFOUDPNQPTJUJPO DJOFNBUJD BUNPTQIFSF EZOBNJDESBNBUJDDJOFNBUJDMJHIUJOH QSFDJTF DPSSFDUBOBUPNZ BFTUIFUJD WFSZJOTQJSBUJPOBM BSUIPVTF Output Format Text Output Example Prompt Example Number of people who can agree on an output itself Number of people who receive an output as intended Evaluation Viewpoint Prompt is written differently depending on output modality
ZPVEPGPSZPVSMJWJOH $VTUPNFS*BNBNBOBHFSJOBO*5DPNQBOZ 4IPLP RVFTUJPO BIJHIMZEFUBJMFEFQJDDJOFNBUJDDPODFQUBSU$(SFOEFS EJHJUBMQBJOUJOHBSUXPSLEJFTFMQVOLTUFBNJOHIBMGNBOIBMG SPCPU#Z)PLVTBJ,BUTVTIJLB )JSPTIJHF6UBHBXB &JTFO ,FJTBJ ,VOJZPTIJ6UBHBXB ,VOJTBEB6UBHBXB 4IVOTIPV ,BUTVLBXB 4IJHFOPCV:BOBHBXB USFOEJOHPO"SU4UBUJPO TVCUMFNVUFEDJOFNBUJDDPMPST NBEFJO.BZB #MFOEFSBOE 1IPUPTIPQ PDUBOFSFOEFS FYDFMMFOUDPNQPTJUJPO DJOFNBUJD BUNPTQIFSF EZOBNJDESBNBUJDDJOFNBUJDMJHIUJOH QSFDJTF DPSSFDUBOBUPNZ BFTUIFUJD WFSZJOTQJSBUJPOBM BSUIPVTF Output Format Text Output Example Prompt Example Number of people who can agree on an output itself Number of people who receive an output as intended Evaluation Viewpoint Prompt is written differently depending on output modality
XIPJTBMXBZTDIFFSGVM BTLTRVFTUJPOTBCPVUUIFDMJFOUT PDDVQBUJPO> "DMJFOUJTBNBOJOIJTMBUFT 4IPLP RVFTUJPO /JDFUPNFFUZPV*BNBUPVSJTUHVJEF8IBUEP ZPVEPGPSZPVSMJWJOH $VTUPNFS*BNBNBOBHFSJOBO*5DPNQBOZ 4IPLP RVFTUJPO BIJHIMZEFUBJMFEFQJDDJOFNBUJDDPODFQUBSU$(SFOEFS EJHJUBMQBJOUJOHBSUXPSLEJFTFMQVOLTUFBNJOHIBMGNBOIBMG SPCPU#Z)PLVTBJ,BUTVTIJLB )JSPTIJHF6UBHBXB &JTFO ,FJTBJ ,VOJZPTIJ6UBHBXB ,VOJTBEB6UBHBXB 4IVOTIPV ,BUTVLBXB 4IJHFOPCV:BOBHBXB USFOEJOHPO"SU4UBUJPO TVCUMFNVUFEDJOFNBUJDDPMPST NBEFJO.BZB #MFOEFSBOE 1IPUPTIPQ PDUBOFSFOEFS FYDFMMFOUDPNQPTJUJPO DJOFNBUJD BUNPTQIFSF EZOBNJDESBNBUJDDJOFNBUJDMJHIUJOH QSFDJTF DPSSFDUBOBUPNZ BFTUIFUJD WFSZJOTQJSBUJPOBM BSUIPVTF Output Format Text Output Example Prompt Example Number of people who can agree on an output itself Number of people who receive an output as intended Evaluation Viewpoint That's wonderful! What do you think is the most important for your management work?
"BOE#BSFDIJUDIBUUJOH #)FZ XIBU`TVQ ")PX`TJUHPJOH "OZUIJOHOFXXJUIZPVMBUFMZ #*IFBSEUIBUUIFOFXUFOUIPVTBOEZFOCJMMXJMMIBWF 4IJCVTBXB&JJDIJPOJU CVUEPZPVLOPXBCPVUIJN " (FOFSBUFEXJUI)ZQFS$-07" 3FTQPOTFGSPNVTFS • The user's response contains a named entity “Shibusawa Eiichi” • And we can detect that “Shibusawa Eiichi” includes in the entries of Wikipedia • For simplicity, we would like to make use of the WIkipedia text after extracting “Shibusawa Eiichi” from the user's response ? "*WFCFFOSFBEJOHBMPUMBUFMZ #5IBUTHSFBU8IBUBSFZPVSFBEJOH "*WFCFFOSFBEJOHCPPLTCZ4PTFLJ/BUTVNF #)FZ *LOPXIJN5IFQFSTPOJOUIF ZFOCJMM SJHIU
What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: ɹɹɹɹ ? Can the training data be accurately retrieved from a model?
What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: I don’t really know him, who’s that? Can the training data be accurately retrieved from a model? We should assume that an accurate knowledge in the foundation model cannot be extracted. No !! Foundation Model is not a search engine !!
was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. Dialog: A: I've been reading a lot lately. B: That's great! What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === Dialog: B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: "VUPHFOFSBUFE,OPXMFEHF A: I've been reading a lot lately. B: That's great! What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: )BOENBEF,OPXMFEHF I don’t really know him, who’s that? (Something better with knowledge…) Copy Copy
CVUEPZPVLOPXBCPVUIJN /BNFE&OUJUZ3FDPHOJUJPO 4IJCVTBXB&JJDIJ %#4FBSDI FH8JLJQFEJB Shibusawa Eiichi, 1st Viscount Shibusawa (March 16, 1840 – November 11, 1931) was a Japanese industrialist use of double-entry accounting, joint- stock corporations and modern note-issuing banks. He founded the first modern bank based on joint stock ownership in Japan. The bank was aptly named The First National Bank (Dai Ichi Kokuritsu Ginkō, now merged into Mizuho Bank) and had the power to issue its own notes. Through this bank, he founded hundreds of other joint stock corporations in Japan. Many of these companies still survive … Search Result
CVUEPZPVLOPXBCPVUIJN Original: Natsume Sōseki (9 February 1867 – 9 December 1916), born Natsume Kin'nosuke, was a Japanese novelist. He is best known around the world for his novels Kokoro, Botchan, I Am a Cat, Kusamakura and his unfinished work Light and Darkness. He was also a scholar of British literature and writer of haiku, kanshi, and fairy tales. From 1984 until 2004, his portrait appeared on the front of the Japanese 1,000 yen note. Summary: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. === Original: Shibusawa Eiichi, 1st Viscount Shibusawa (March 16, 1840 – November 11, 1931) was a Japanese industrialist use of double- entry accounting, joint-stock corporations and modern note-issuing banks… Summary: 4FBSDI3FTVMUPO 4IJCVTBXB&JJDIJ 4VNNBSJ[BUJPO /BNFE&OUJUZ3FDPHOJUJPO 4IJCVTBXB&JJDIJ %#4FBSDI FH8JLJQFEJB
CVUEPZPVLOPXBCPVUIJN Original: Natsume Sōseki (9 February 1867 – 9 December 1916), born Natsume Kin'nosuke, was a Japanese novelist. He is best known around the world for his novels Kokoro, Botchan, I Am a Cat, Kusamakura and his unfinished work Light and Darkness. He was also a scholar of British literature and writer of haiku, kanshi, and fairy tales. From 1984 until 2004, his portrait appeared on the front of the Japanese 1,000 yen note. Summary: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. === Original: Shibusawa Eiichi, 1st Viscount Shibusawa (March 16, 1840 – November 11, 1931) was a Japanese industrialist use of double- entry accounting, joint-stock corporations and modern note-issuing banks… Summary: 4VNNBSJ[BUJPO /BNFE&OUJUZ3FDPHOJUJPO 4IJCVTBXB&JJDIJ Eiichi Shibusawa was born in 1840 in Kuroaraijima, Fukaya City. During his lifetime, he was involved in the establishment of approximately 500 companies and is known as the "father of Japanese capitalism”. Summary Result 4FBSDI3FTVMUPO 4IJCVTBXB&JJDIJ %#4FBSDI FH8JLJQFEJB
CVUEPZPVLOPXBCPVUIJN Original: Natsume Sōseki (9 February 1867 – 9 December 1916), born Natsume Kin'nosuke, was a Japanese novelist. He is best known around the world for his novels Kokoro, Botchan, I Am a Cat, Kusamakura and his unfinished work Light and Darkness. He was also a scholar of British literature and writer of haiku, kanshi, and fairy tales. From 1984 until 2004, his portrait appeared on the front of the Japanese 1,000 yen note. Summary: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. === Original: Shibusawa Eiichi, 1st Viscount Shibusawa (March 16, 1840 – November 11, 1931) was a Japanese industrialist use of double- entry accounting, joint-stock corporations and modern note-issuing banks… Summary: Knowledge: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. Dialog: A: I've been reading a lot lately. B: That's great! What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === Knowledge: Eiichi Shibusawa was born in 1840 in Kuroaraijima, Fukaya City. During his lifetime, he was involved in the establishment of approximately 500 companies and is known as the "father of Japanese capitalism”. Dialog: B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: 4VNNBSZ3FTVMU GSPN)ZQFS$-07" 4VNNBSJ[BUJPO 3FTQPOTF(FOFSBUJPO /BNFE&OUJUZ3FDPHOJUJPO 4IJCVTBXB&JJDIJ 4FBSDI3FTVMUPO 4IJCVTBXB&JJDIJ %#4FBSDI FH8JLJQFEJB
CVUEPZPVLOPXBCPVUIJN Original: Natsume Sōseki (9 February 1867 – 9 December 1916), born Natsume Kin'nosuke, was a Japanese novelist. He is best known around the world for his novels Kokoro, Botchan, I Am a Cat, Kusamakura and his unfinished work Light and Darkness. He was also a scholar of British literature and writer of haiku, kanshi, and fairy tales. From 1984 until 2004, his portrait appeared on the front of the Japanese 1,000 yen note. Summary: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. === Original: Shibusawa Eiichi, 1st Viscount Shibusawa (March 16, 1840 – November 11, 1931) was a Japanese industrialist use of double- entry accounting, joint-stock corporations and modern note-issuing banks… Summary: Knowledge: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. Dialog: A: I've been reading a lot lately. B: That's great! What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === Knowledge: Eiichi Shibusawa was born in 1840 in Kuroaraijima, Fukaya City. During his lifetime, he was involved in the establishment of approximately 500 companies and is known as the "father of Japanese capitalism”. Dialog: B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: 4VNNBSZ3FTVMU GSPN)ZQFS$-07" 4VNNBSJ[BUJPO 3FTQPOTF(FOFSBUJPO /BNFE&OUJUZ3FDPHOJUJPO 4IJCVTBXB&JJDIJ 4FBSDI3FTVMUPO 4IJCVTBXB&JJDIJ )BOENBEF4VNNBSZ %#4FBSDI FH8JLJQFEJB
CVUEPZPVLOPXBCPVUIJN Original: Natsume Sōseki (9 February 1867 – 9 December 1916), born Natsume Kin'nosuke, was a Japanese novelist. He is best known around the world for his novels Kokoro, Botchan, I Am a Cat, Kusamakura and his unfinished work Light and Darkness. He was also a scholar of British literature and writer of haiku, kanshi, and fairy tales. From 1984 until 2004, his portrait appeared on the front of the Japanese 1,000 yen note. Summary: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. === Original: Shibusawa Eiichi, 1st Viscount Shibusawa (March 16, 1840 – November 11, 1931) was a Japanese industrialist use of double- entry accounting, joint-stock corporations and modern note-issuing banks… Summary: Knowledge: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. Dialog: A: I've been reading a lot lately. B: That's great! What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === Knowledge: Eiichi Shibusawa was born in 1840 in Kuroaraijima, Fukaya City. During his lifetime, he was involved in the establishment of approximately 500 companies and is known as the "father of Japanese capitalism”. Dialog: B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: 4VNNBSJ[BUJPO 3FTQPOTF(FOFSBUJPO /BNFE&OUJUZ3FDPHOJUJPO 4IJCVTBXB&JJDIJ 4VNNBSZ3FTVMU GSPN)ZQFS$-07" 4FBSDI3FTVMUPO 4IJCVTBXB&JJDIJ )BOENBEF4VNNBSZ %#4FBSDI FH8JLJQFEJB
February 1867 – 9 r 1916), born Natsume Kin'nosuke, was se novelist. He is best known around the his novels Kokoro, Botchan, I Am a makura and his unfinished work Light ness. He was also a scholar of British and writer of haiku, kanshi, and fairy m 1984 until 2004, his portrait appeared nt of the Japanese 1,000 yen note. y: Soseki Natsume was a Japanese His real name is Natsume Kinnosuke. it was used on the 1,000 yen bill. Shibusawa Eiichi, 1st Viscount a (March 16, 1840 – November 11, s a Japanese industrialist use of double- ounting, joint-stock corporations and ote-issuing banks… y: Knowledge: Soseki Natsume was a Japanese novelist. His real name is Natsume Kinnosuke. His portrait was used on the 1,000 yen bill. Dialog: A: I've been reading a lot lately. B: That's great! What are you reading? A: I've been reading books by Soseki Natsume. B: Hey, I know him! The person in the 1,000 yen bill, right? === Knowledge: Eiichi Shibusawa was born in 1840 in Kuroaraijima, Fukaya City. During his lifetime, he was involved in the establishment of approximately 500 companies and is known as the "father of Japanese capitalism”. Dialog: B: Hey, what’s up! A: How’s it going? Anything new with you lately? B: I heard that the new ten-thousand yen bill will have Shibusawa Eiichi on it, but do you know about him? A: 4VNNBSJ[BUJPO 3FTQPOTF(FOFSBUJPO Yeah, I know him! He’s the one who laid the foundation for Japanese capitalism. He definitely deserves to be on the bill! 4VNNBSZ3FTVMU GSPN)ZQFS$-07" 4FBSDI3FTVMUPO 4IJCVTBXB&JJDIJ )BOENBEF4VNNBSZ
documents • Data size: 1.8TB • Token Size: 530B We do NOT use any data of our LINE Messenger Service • All messages on LINE • All posts on OpenChat Policy for Data Collection LINE LM Corpus Tokenizer Byte-level BPE tokenizer Library Megatron-LM Infrastructure NVIDIA Superpod 128 clustered DGX servers 1,024 A100 GPUs = Architecture Transformer Encoder-Decoder
HyperCLOVA w/ prompting VS BERT-large - Questionɿதࠃ།Ұͷঁఇɺଇఱ͕ݐͯͨԦேͷ໊લԿͰ͠ΐ͏? - Contextɿ࣌ɺԦேଇఱʢଇఱʣʹΑΔცୣͰपԦேʹΘ ͍ͬͯͨ͜ͱΛຊଆ͕Ѳ͍ͯ͠ͳ͔ͬͨͨΊɺ҄ాਅਓΒݱͰ एׯͷࠞཚΛੜͨ͡ɻ·ͨɺ൴Β͕ɾ҆Ͱݟ࣮ͨࡍͷྩ੍ ͷӡ༻࣮ଶɺຊࠃͰͷ૾ͱࣅͯඇͳΔͷͰ͋ͬͨɻͨͱ͑ ౻ݪژͰେۃ఼ΛؚΉٶʢ౻ݪٶʣΛͷதԝʹஔ͍ͯͨ͠ ͕ɺ҆Λ͡Ίͱ͢ΔதࠃͷͰଠۃٶΛؚΉߖɺͷ தԝʹ͋Δͷ͕௨ྫͰ͋ͬͨɻྩͷӡ༻ܗଶຊͱҟͳΓɺ ྩͷෆඋΛߦ͏֨ࣜͳͲ੍ఆ͞Ε͍ͯͨɻେ͖ͳিܸΛड͚ͯؼࠃ ͨ҄͠ాਅਓΒɺ͜ΕΒͷதͷྩ੍ͷࠩҟΛใࠂ͠ɺͷͪ ͷվֵʹੜ͔͞Ε͍ͯ͘ɻ Example of an entry of RCQA possible only tasks TASK: RCQA* (answerable ones only) • Removed unanswerable questions from dataset of the normal RCQA task
HyperCLOVA w/ prompting VS BERT-large - Questionɿதࠃ།Ұͷঁఇɺଇఱ͕ݐͯͨԦேͷ໊લԿͰ͠ΐ͏? - Contextɿ࣌ɺԦேଇఱʢଇఱʣʹΑΔცୣͰपԦேʹΘ ͍ͬͯͨ͜ͱΛຊଆ͕Ѳ͍ͯ͠ͳ͔ͬͨͨΊɺ҄ాਅਓΒݱͰ एׯͷࠞཚΛੜͨ͡ɻ·ͨɺ൴Β͕ɾ҆Ͱݟ࣮ͨࡍͷྩ੍ ͷӡ༻࣮ଶɺຊࠃͰͷ૾ͱࣅͯඇͳΔͷͰ͋ͬͨɻͨͱ͑ ౻ݪژͰେۃ఼ΛؚΉٶʢ౻ݪٶʣΛͷதԝʹஔ͍ͯͨ͠ ͕ɺ҆Λ͡Ίͱ͢ΔதࠃͷͰଠۃٶΛؚΉߖɺͷ தԝʹ͋Δͷ͕௨ྫͰ͋ͬͨɻྩͷӡ༻ܗଶຊͱҟͳΓɺ ྩͷෆඋΛߦ͏֨ࣜͳͲ੍ఆ͞Ε͍ͯͨɻେ͖ͳিܸΛड͚ͯؼࠃ ͨ҄͠ాਅਓΒɺ͜ΕΒͷதͷྩ੍ͷࠩҟΛใࠂ͠ɺͷͪ ͷվֵʹੜ͔͞Ε͍ͯ͘ɻ - Answerɿप Example of an entry of RCQA possible only tasks TASK: RCQA* (answerable ones only) • Removed unanswerable questions from dataset of the normal RCQA task
HyperCLOVA w/ prompting VS BERT-large - Questionɿதࠃ།Ұͷঁఇɺଇఱ͕ݐͯͨԦேͷ໊લԿͰ͠ΐ͏? - Contextɿ࣌ɺԦேଇఱʢଇఱʣʹΑΔცୣͰपԦேʹΘ ͍ͬͯͨ͜ͱΛຊଆ͕Ѳ͍ͯ͠ͳ͔ͬͨͨΊɺ҄ాਅਓΒݱͰ एׯͷࠞཚΛੜͨ͡ɻ·ͨɺ൴Β͕ɾ҆Ͱݟ࣮ͨࡍͷྩ੍ ͷӡ༻࣮ଶɺຊࠃͰͷ૾ͱࣅͯඇͳΔͷͰ͋ͬͨɻͨͱ͑ ౻ݪژͰେۃ఼ΛؚΉٶʢ౻ݪٶʣΛͷதԝʹஔ͍ͯͨ͠ ͕ɺ҆Λ͡Ίͱ͢ΔதࠃͷͰଠۃٶΛؚΉߖɺͷ தԝʹ͋Δͷ͕௨ྫͰ͋ͬͨɻྩͷӡ༻ܗଶຊͱҟͳΓɺ ྩͷෆඋΛߦ͏֨ࣜͳͲ੍ఆ͞Ε͍ͯͨɻେ͖ͳিܸΛड͚ͯؼࠃ ͨ҄͠ాਅਓΒɺ͜ΕΒͷதͷྩ੍ͷࠩҟΛใࠂ͠ɺͷͪ ͷվֵʹੜ͔͞Ε͍ͯ͘ɻ - Answerɿप Example of an entry of RCQA possible only tasks Create Few-shot with a context, a question text and an answer • If the correct answer is contained and easily extracted from the inference result, we judged it is correct TASK: RCQA* (answerable ones only) • Removed unanswerable questions from dataset of the normal RCQA task
HyperCLOVA w/ prompting VS BERT-large - Questionɿதࠃ།Ұͷঁఇɺଇఱ͕ݐͯͨԦேͷ໊લԿͰ͠ΐ͏? - Contextɿ࣌ɺԦேଇఱʢଇఱʣʹΑΔცୣͰपԦேʹΘ ͍ͬͯͨ͜ͱΛຊଆ͕Ѳ͍ͯ͠ͳ͔ͬͨͨΊɺ҄ాਅਓΒݱͰ एׯͷࠞཚΛੜͨ͡ɻ·ͨɺ൴Β͕ɾ҆Ͱݟ࣮ͨࡍͷྩ੍ ͷӡ༻࣮ଶɺຊࠃͰͷ૾ͱࣅͯඇͳΔͷͰ͋ͬͨɻͨͱ͑ ౻ݪژͰେۃ఼ΛؚΉٶʢ౻ݪٶʣΛͷதԝʹஔ͍ͯͨ͠ ͕ɺ҆Λ͡Ίͱ͢ΔதࠃͷͰଠۃٶΛؚΉߖɺͷ தԝʹ͋Δͷ͕௨ྫͰ͋ͬͨɻྩͷӡ༻ܗଶຊͱҟͳΓɺ ྩͷෆඋΛߦ͏֨ࣜͳͲ੍ఆ͞Ε͍ͯͨɻେ͖ͳিܸΛड͚ͯؼࠃ ͨ҄͠ాਅਓΒɺ͜ΕΒͷதͷྩ੍ͷࠩҟΛใࠂ͠ɺͷͪ ͷվֵʹੜ͔͞Ε͍ͯ͘ɻ - Answerɿप Example of an entry of RCQA possible only tasks Few-shots were created randomly by extracting a context from the RCQA dev-set for each inference Create Few-shot with a context, a question text and an answer • If the correct answer is contained and easily extracted from the inference result, we judged it is correct TASK: RCQA* (answerable ones only) • Removed unanswerable questions from dataset of the normal RCQA task
# # # # )ZQFS$-07"+1 #&35MBSHF . HyperCLOVA w/ prompting VS BERT-large Acc. Parameters of HyperCLOVA Pre-training BERT-large by using subset of LINE LM corpus HyperCLOVA 6.9B w/ prompting is far below Fine-tuned BERT
# # # # )ZQFS$-07"+1 #&35MBSHF . HyperCLOVA w/ prompting VS BERT-large Acc. Parameters of HyperCLOVA HyperCLOVA 39B JP is near BERT-large with fine tuning and parameter tuning by prompting
# # # # )ZQFS$-07"+1 #&35MBSHF . HyperCLOVA w/ prompting VS BERT-large Acc. Parameters of HyperCLOVA HyperCLOVA JP 82B with Prompting over BERT-large with full Fine-tuning and Parameter-tuning
to determine next response • Pros: Fully controllable • Cons: Lacks flexibility Rule-Based Dialog System )J IPX`TJUHPJOH"SFZPVSFBEZGPS5FDI7FSTF )PX`TJUHPJOH 5FDI7FSTFJTBNB[JOH *`NHSFBU Hey, do you like Brown? USER
to determine next response • Pros: Fully controllable • Cons: Lacks flexibility Rule-Based Dialog System )J IPX`TJUHPJOH"SFZPVSFBEZGPS5FDI7FSTF )PX`TJUHPJOH 5FDI7FSTFJTBNB[JOH *`NHSFBU Sorry, I didn’t understand the question. BOT Hey, do you like Brown? USER
to determine next response • Pros: Fully controllable • Cons: Lacks flexibility Rule-Based Dialog System )J IPX`TJUHPJOH"SFZPVSFBEZGPS5FDI7FSTF )PX`TJUHPJOH 5FDI7FSTFJTBNB[JOH *`NHSFBU Sorry, I didn’t understand the question. BOT Hey, do you like Brown? USER Guess what, I just got a LINE GIFT from my friend! USER
to determine next response • Pros: Fully controllable • Cons: Lacks flexibility Rule-Based Dialog System )J IPX`TJUHPJOH"SFZPVSFBEZGPS5FDI7FSTF )PX`TJUHPJOH 5FDI7FSTFJTBNB[JOH *`NHSFBU Sorry, I didn’t understand the question. BOT Hey, do you like Brown? USER Guess what, I just got a LINE GIFT from my friend! USER According to Wikipedia, LINE is a freeware app for instant communications... BOT
to determine next response • Pros: Fully controllable • Cons: Lacks flexibility Rule-Based Dialog System )J IPX`TJUHPJOH"SFZPVSFBEZGPS5FDI7FSTF )PX`TJUHPJOH 5FDI7FSTFJTBNB[JOH *`NHSFBU Sampl Sampl Sampl Sampl )J IPX`TJUHPJOH"SFZPVSFBEZGPS5FDI7FSTF *`NHSFBU :FT *`NTPSFBEZGPSUIFBXFTPNFQSFTFOUBUJPOT • Trending in dialog research field • Uses deep learning technique to generate response • Pros: very flexible and responses are interesting • Cons: Lacks controllability Generation-Based Dialog System
1SPNQU1SPHSBNNJOHBOE 1BSBNFUFS$POUSPMMJOH 4VCUBTLTPMWJOHXJUI )ZQFS$-07"PS/-15FDI *OTUSVDUUIF EJSFDUJPOPGPVUQVU w 4FBSDIJOHLOPXMFEHFT w %FDJEJOHEJBMPHBDU w FUD
1SPNQU1SPHSBNNJOHBOE 1BSBNFUFS$POUSPMMJOH 4VCUBTLTPMWJOHXJUI )ZQFS$-07"PS/-15FDI *OTUSVDUUIF EJSFDUJPOPGPVUQVU w 4FBSDIJOHLOPXMFEHFT w %FDJEJOHEJBMPHBDU w FUD (FOFSBUF $PSF0VUQVU
8IBUDBOXFEPIFSF 1SPNQU1SPHSBNNJOHBOE 1BSBNFUFS$POUSPMMJOH 4VCUBTLTPMWJOHXJUI )ZQFS$-07"PS/-15FDI *OTUSVDUUIF EJSFDUJPOPGPVUQVU w 4FBSDIJOHLOPXMFEHFT w %FDJEJOHEJBMPHBDU w FUD 1PTUQSPDFTT 8IBUDBOXFEPIFSF 'JMUFSJOHX/-15FDI "EEJOHBOE&EJUJOHX )ZQFS$-07"PS/-15FDI w $IFDLJOH)BMMVDJOBUJPO w $IFDLJOH&UIJDT w FUD %FUFDUPSDPSSFDU JNQSPQFSPVUQVUT (FOFSBUF $PSF0VUQVU
4 (2021) Dialogue system researchers gather and do live-evaluations of submitted chatbots Dialog Robot Competition 2022 A competition of controlling humanoid robot that can communicate and recommend a tourist sight inviting boss to a party Situation Track Situated Chat TU1MBDF X)ZQFS$-07"# Open Track Open-Domain Chitchat w/ random topics TU1MBDF X)ZQFS$-07"# recommending a spot Main Competition Tourist- Guide Robot TU1MBDF X)ZQFS$-07"#
and Ethics of Generated Text Accuracy and Reliability of Generated Text Dealing with the Cost per Inference by a Model There are still many things to think about…
will hold the "Tech- Verse" conference online (live streaming format) on Thursday, November 17 and Friday, November 18, 2022. Up until now, LINE and Yahoo Japan have held annual technology conferences as "LINE DEVELOPER DAY" and "Yahoo! JAPAN Tech Conference" respectively, but this year, for the first time, LINE and Yahoo Japan will hold a joint event, "Tech-Verse JAPAN Tech Conference, but this year, for the first time, LINE and Yahoo! ... Original Sentence Summarizer The overview of the sessions and speakers are available on the official website. LINE and Yahoo will jointly hold a technology conference. The theme of the conference will be "Pioneering the Future Society with Technology.” The official website, which went live today, provides an overview of the 80 sessions and speakers, with more details on the panel discussions to follow. …
will hold the "Tech- Verse" conference online (live streaming format) on Thursday, November 17 and Friday, November 18, 2022. Up until now, LINE and Yahoo Japan have held annual technology conferences as "LINE DEVELOPER DAY" and "Yahoo! JAPAN Tech Conference" respectively, but this year, for the first time, LINE and Yahoo Japan will hold a joint event, "Tech-Verse JAPAN Tech Conference, but this year, for the first time, LINE and Yahoo! ... Original Sentence Summarizer The overview of the sessions and speakers are available on the official website. LINE and Yahoo will jointly hold a technology conference. The theme of the conference will be "Pioneering the Future Society with Technology.” The official website, which went live today, provides an overview of the 80 sessions and speakers, with more details on the panel discussions to follow. …
will hold the "Tech- Verse" conference online (live streaming format) on Thursday, November 17 and Friday, November 18, 2022. Up until now, LINE and Yahoo Japan have held annual technology conferences as "LINE DEVELOPER DAY" and "Yahoo! JAPAN Tech Conference" respectively, but this year, for the first time, LINE and Yahoo Japan will hold a joint event, "Tech-Verse JAPAN Tech Conference, but this year, for the first time, LINE and Yahoo! ... Original Sentence Summarizer The overview of the sessions and speakers are available on the official website. LINE and Yahoo will jointly hold a technology conference. The theme of the conference will be "Pioneering the Future Society with Technology.” The official website, which went live today, provides an overview of the 80 sessions and speakers, with more details on the panel discussions to follow. …
will hold the "Tech- Verse" conference online (live streaming format) on Thursday, November 17 and Friday, November 18, 2022. Up until now, LINE and Yahoo Japan have held annual technology conferences as "LINE DEVELOPER DAY" and "Yahoo! JAPAN Tech Conference" respectively, but this year, for the first time, LINE and Yahoo Japan will hold a joint event, "Tech-Verse JAPAN Tech Conference, but this year, for the first time, LINE and Yahoo! ... Original Sentence Summarizer The overview of the sessions and speakers are available on the official website. LINE and Yahoo will jointly hold a technology conference. The theme of the conference will be "Pioneering the Future Society with Technology.” The official website, which went live today, provides an overview of the 80 sessions and speakers, with more details on the panel discussions to follow. … ?
Understanding Evaluation - https://github.com/yahoojapan/JGLUE HyperCLOVA w/ LoRA tuning VS Waseda RoBERTa large {"q_id": 3016, "question": "ձࣾͷ࠷ߴऀΛԿͱ͍͏͔ʁ (What do you call the chief executive officer of a company?)", "choice0": "ࣾ (president)", "choice1": "ڭࢣ (teacher)", "choice2": "෦ (manager)", "choice3": "όΠτ (part-time worker)", "choice4": "෦Լ (subordinate)", "label": 0} Example of an entry of JCommonsenseQA tasks Verification of LoRA performance of HyperCLOVA Japanese 6.7B model • Unlike Prompting, a solution is possible in 0-shot due to tuning by supervised data TASK: JCommonsenseQA • Japanese version of CommonsenseQA dataset • Five-choice QA questions to assess common sense reasoning ability This experiment was the result of a summer internship 2022 at LINE Corp. • More details will be provided at NLP2023
System (e.g. For Dialog system competition 4 w/ HyperCLOVA JP 39B) Speech Recognition (LINE’s Speech-to-Text system) Speech Synthesis (LINE’s Text-to-Speech system) Text Text Voice Voice Chatting Smooth response w/ beautiful voice A dialogue system beyond the “Uncanny Valley” is imminent Wow
Recognition Text Speech Synthesis Text Computer Vision Features Features Control Control Robot / Avatar In the future, NLP engineers need to acquire features from many modalities for NLP that other fields do not need.
Recognition Text Speech Synthesis Text Computer Vision Features Features Control Control Robot / Avatar We also need to send signals to control everything that other fields are not willing to do. In the future, NLP engineers need to acquire features from many modalities for NLP that other fields do not need.
NLP is strongly linked to the development of AI-related technologies - The widespread use of the Foundation Model has made it easier to leverage the output of speech and image recognition - LINE wants to move in the direction of R&D of Multimodal NLP with our own foundation models Foundation Model DNN Traditional only ML Rule only Small LM only Text only Multi-Modal
state to produce results that are useful enough for humans, and we are working hard to make it available to you! - However, it is necessary to prepare many subsystems to compensate for the missing functions in the foundation model - LINE is R&D these systems with the aim of providing them to you as well - LINE has successfully built a Japanese model of HyperCLOVA's 82B. We shared evaluation results and know-how for application