
All We Need Is Prompting on a Pre-trained Japanese Large Language Model

LINE DEVDAY 2021

November 10, 2021

Transcript

  1. None
  2. Toshinori Sato (@overlast) • Senior Software Engineer / Manager •

    Natural Language Processing • Information Retrieval • LINE CLOVA • Japanese NLU system • HyperCLOVA • Japanese Corpus / Evaluation • OSS: main contributor of the NEologd project • mecab-ipadic-NEologd
  3. LINE NLP team and contributors Toshinori Sato Takashi Uemura Wataru

    Sakata Akifumi Nakamachi Kenta Shinzato Takuto Asakura Tatsuya Uchiyama Masahiko Higashiyama Tung Nguyen Shengzhe Li Koga Kobayashi Takato Yamazaki Seiichi Inoue Yoshifumi Kondo Jumon Nozaki et al.
  4. Attention, please! The target audience is mainly engineers who

    are interested in natural language processing (NLP), and there is also a part for NLP professionals. Detailed information on the following is omitted from this session: building language models, and tuning methods for language models. See the paper below for more.* This is a 40-minute session; please enjoy listening to it over a cup of coffee or something! * What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers, Boseop Kim et al., EMNLP 2021 (https://arxiv.org/abs/2109.04650)
  5. None
  6. Application example of HyperCLOVA A dialogue system with role-playing functions

  7. Large-scale general-purpose language models + α: automatic evaluation

    with the 39B JP Model on a QA task
  8. None
  9. Application example of HyperCLOVA A dialogue system with role-playing functions

  10. Application example of HyperCLOVA A dialogue system with role-playing functions

    Purely auto-generated text
  11. E.g. A spoken dialogue system applying HyperCLOVA. A user says "Hello."

  12. E.g. A spoken dialogue system applying HyperCLOVA. The user's voice ("Hello.") reaches the Client App.

  13. E.g. A spoken dialogue system applying HyperCLOVA. The Client App passes the voice ("Hello.") to Speech To Text, which feeds the Dialog App.

  14. E.g. A spoken dialogue system applying HyperCLOVA. The Dialog App sends query text to HyperCLOVA.

  15. E.g. A spoken dialogue system applying HyperCLOVA. HyperCLOVA, which combines the Large-scale Language Model with components such as Knowledge Base Search, returns result text to the Dialog App.

  16. E.g. A spoken dialogue system applying HyperCLOVA. The Dialog App turns the result text into response text.

  17. E.g. A spoken dialogue system applying HyperCLOVA. Text To Speech converts the response text into sounds: "Long time, no see."

  18. E.g. A spoken dialogue system applying HyperCLOVA. The full pipeline: voice ("Hello.") → Client App → Speech To Text → Dialog App → query text → HyperCLOVA (Large-scale Language Model, Knowledge Base Search, ...) → result text → response text → Text To Speech → sounds ("Long time, no see.") back to the user.
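To make the data flow of slides 11-18 concrete, here is a minimal sketch of one conversational turn in Python. Every function below (speech_to_text, query_hyperclova, text_to_speech) is a hypothetical placeholder for the corresponding component in the diagram, not an actual LINE API.

```python
# Minimal sketch of one turn of the spoken dialogue pipeline in
# slides 11-18. All component functions are hypothetical placeholders.

def speech_to_text(voice: bytes) -> str:
    """Speech To Text: transcribe the user's voice into query text."""
    raise NotImplementedError  # stand-in for an STT service

def query_hyperclova(query_text: str) -> str:
    """HyperCLOVA: return result text for the query text. Internally the
    Large-scale Language Model is combined with components such as
    Knowledge Base Search."""
    raise NotImplementedError  # stand-in for the HyperCLOVA API

def text_to_speech(response_text: str) -> bytes:
    """Text To Speech: synthesize the sounds the client app will play."""
    raise NotImplementedError  # stand-in for a TTS service

def handle_turn(voice: bytes) -> bytes:
    """Dialog App: voice in ("Hello."), sounds out ("Long time, no see.")."""
    query_text = speech_to_text(voice)
    result_text = query_hyperclova(query_text)
    response_text = result_text  # the Dialog App may post-process here
    return text_to_speech(response_text)
```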
  19. Agenda - What’s HyperCLOVA - Inside of HyperCLOVA - Application

    development by Prompting - Evaluation of HyperCLOVA’s JP LMs - Application to Dialogue Systems - The future of LINE and NLP
  20. Agenda - What’s HyperCLOVA - Inside of HyperCLOVA - Application

    development by Prompting - Evaluation of HyperCLOVA’s JP LMs - Application to Dialogue Systems - The future of LINE and NLP
  21. Agenda - What’s HyperCLOVA - Inside of HyperCLOVA - Application

    development by Prompting - Evaluation of HyperCLOVA’s JP LMs - Application to Dialogue Systems - The future of LINE and NLP
  22. What is a Language Model (LM)? - HyperCLOVA includes an

    unsupervised autoregressive language model - An autoregressive language model … - is capable of calculating probability distributions over text - provides maximum-likelihood estimation of parameters for a sample - can generate future text based on the text seen so far - In short, a model that gives the probability of a certain sequence of words - E.g. P(It’s sunny today) > P(Sunny is today)
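The last two bullets can be made concrete with the chain rule: an autoregressive LM scores a word sequence as the product of per-token conditional probabilities, P(w1…wn) = Πi P(wi | w1…wi-1). A minimal sketch with invented numbers (the probabilities below are purely illustrative, not from any real model):

```python
import math

# Toy illustration of autoregressive scoring: the probability of a
# sequence is the product of per-token conditional probabilities.
# All probability values below are invented for illustration.

def sequence_log_prob(tokens, cond_prob):
    log_p = 0.0
    for i, tok in enumerate(tokens):
        log_p += math.log(cond_prob(tuple(tokens[:i]), tok))
    return log_p

probs = {
    ((), "It's"): 0.20, (("It's",), "sunny"): 0.10,
    (("It's", "sunny"), "today"): 0.30,
    ((), "Sunny"): 0.05, (("Sunny",), "is"): 0.02,
    (("Sunny", "is"), "today"): 0.01,
}
cond = lambda ctx, tok: probs[(ctx, tok)]

# The fluent ordering scores higher, mirroring the slide's example:
assert sequence_log_prob(["It's", "sunny", "today"], cond) > \
       sequence_log_prob(["Sunny", "is", "today"], cond)
```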
  23. Our policy for corpus data collection: we do not use

    any data from our conversation services - no messages on LINE - no posts on OpenChat. We maintain this corpus with the utmost consideration for the rights held by various customers. - Add versatility to this corpus - Make a subset of this corpus available for use outside of LINE
  24. LINE LM Corpus (for HyperCLOVA’s LMs): NO DATA from LINE or

    LINE OpenChat is used to build our LMs - Developed based on a corpus built for training BERT models since 2019 - Used data crawled for LINE search - Eliminated data that can easily be extracted as "non-public personal information" - Covered sites important for learning Japanese expressions - Purchased and used external content after resolving rights issues!
  25. Current status of LINE LM Corpus for the 82B JP Model:

    Samples: 10B | Tokens: 500B | Bytes: 1.8T
  26. None
  27. Extensive use of pre-trained large-scale LMs

  28. Modeling status of HyperCLOVA - JP Model: 1.3B → 6.7B → 13B →

    39B - Multi-lingual Model: 13B → 39B - Large model (JP / Multi-lingual): 82B - Hyper-scale JP Model: 204B~ (in 2022) - Work in progress
  29. Architecture of HyperCLOVA: Eco System / Infra / Model / Data

  30. Agenda - What’s HyperCLOVA - Inside of HyperCLOVA - Application

    development by Prompting - Evaluation of HyperCLOVA’s JP LMs - Application to Dialogue Systems - The future of LINE and NLP
  31. Methods for applying LMs to a target task - Method

    for HyperCLOVA etc. - Few-shot: give “a description of the task” and “some demonstration cases” - One-shot: give “a description of the task” and “one demonstration case” - Zero-shot: give “a description of the task” only - Pros: a task can possibly be solved from brief instructions or short examples - Cons: may not reach the performance a SOTA model achieves through fine-tuning - Method for BERT etc. - Fine-tuning: supervised learning on a dataset of the target task, starting from a general-purpose pre-trained model - Pros: excellent performance in benchmarks - Cons: needs training for each target task / possible loss of generalization ability
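The three prompting regimes differ only in how many demonstration cases precede the query; the model weights never change. A minimal sketch with a made-up sentiment task (the task, labels, and formatting are illustrative assumptions, not taken from the talk):

```python
# Sketch of zero-/one-/few-shot prompts for a made-up sentiment task.
# Only the prompt string changes between regimes; the LM stays fixed.

description = "Classify the sentiment of the review as positive or negative."
demonstrations = [
    ("The food was great.", "positive"),
    ("The service was slow.", "negative"),
]

def build_prompt(query, n_shots):
    lines = [description]
    for text, label in demonstrations[:n_shots]:  # 0 = zero-shot, 1 = one-shot, ...
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")  # the model completes the label
    return "\n".join(lines)

print(build_prompt("I loved the dessert.", n_shots=2))  # few-shot (2 shots)
```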
  32. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Output / Title (description of the task) / Other information / Samples / Query (in some cases, the output is given as a suffix after inference) / Shot / Shot / ….
  33. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio)

  34. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio)

  35. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField
  36. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt
  37. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline
  38. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task)
  39. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information
  40. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples / Shot / Shot / ….
  41. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples / Shot / Shot / ….
  42. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples
  43. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples / Shot
  44. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples / Shot / Shot / ….
  45. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples / Shot / Shot / ….
  46. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples / Query (in some cases, the output is given as a suffix after inference) / Shot / Shot / ….
  47. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Output / Title (description of the task) / Other information / Samples / Query (in some cases, the output is given as a suffix after inference) / Shot / Shot / ….
  48. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Output / Title (description of the task) / Other information / Samples / Query (in some cases, the output is given as a suffix after inference) / Shot / Shot / ….
  49. Task Outlines and Few Shots for individual tasks: Playground (HyperCLOVA Studio).

    TextField / Prompt / Task outline / Title (description of the task) / Other information / Samples / Query (in some cases, the next output is given as a suffix after inference) / Shot / Previous output / …. / Output
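The anatomy built up in slides 32-49 (a task outline consisting of a title and other information, sample shots, a query, and an output that can be fed back as a suffix) can be captured in two small helpers. This is a sketch under the assumption that the parts are plain strings joined by newlines; the slides do not specify HyperCLOVA Studio's exact separators.

```python
# Sketch of the prompt anatomy from slides 32-49. The newline separators
# are assumptions; the slides do not specify them.

def assemble_prompt(title, other_info, shots, query):
    task_outline = f"{title}\n{other_info}"       # description of the task
    samples = "\n".join(shots)                    # few-shot demonstrations
    return f"{task_outline}\n{samples}\n{query}"  # the model completes the output

def extend_prompt(prompt, previous_output, next_query):
    # Slide 49: the previous output is appended as a suffix after
    # inference, and the next query continues the same prompt.
    return f"{prompt}{previous_output}\n{next_query}"
```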
  50. Response to individual tasks with a task outline and few-shot examples: supporting haiku writing with 2 shots.

    The (Japanese) prompt says "Generate a haiku from its explanation." Shot 1: IN: an explanation of Basho's verse about the sound of a frog jumping into an old pond, the stillness of its surroundings, and its season word; OUT: 古池や蛙飛こむ水のおと ("an old pond / a frog jumps in / the sound of water"). Shot 2: IN: an explanation of the verse about cicadas singing at Risshakuji temple in Yamagata Prefecture, whose exact interpretation is unsettled and whose seemingly contradictory pairing of "stillness" and "cicadas' voices" invites reflection; OUT: 閑さや岩にしみ入る蝉の声 ("stillness / seeping into the rocks / the cicadas' voices"). Query: IN: an explanation of a salaryman engineer heading into an afternoon meeting on a hot end-of-month day to discuss an ambitious system-development project, hoping that realizing this technology will open up the future of AI despite various concerns; OUT: 月の熱い砂に埋めたるわが魂 ← With a little editing, it becomes a haiku, e.g. 熱き月砂に埋めたるわが魂.
  51. Extensive use of pre-trained large-scale LMs

  52. Document generation using the Playground, e.g. from a product summary to a description.

    - Product packages always contain a product summary, but in many cases an advertising description is written up by a salesperson - In this demo, the parameters of HyperCLOVA Studio (Playground) are adjusted to generate an attractive description - From a food product's title and summary, HyperCLOVA Studio generates an attractive description to advertise it
  53. Demo Movie 60sec

  54. (Demo input and output, in Japanese.) Product name: すっぱ かにせん.

    Summary: ○ snack food ○ ingredients: wheat flour (made in Japan), vegetable oil, starch, crab meat, powdered vinegar (contains wheat), sugar, salt, crab powder, kombu powder / leavening agent, sweetener ○ net weight: 60 g. Generated description: "A crab-flavored cracker with a gentle sour tang. The way it crumbles in your mouth is irresistible. Enjoy it as a snack or with drinks." Prompt structure: Base + Prefix + Output, i.e. "generate a description that makes people want to buy the product from its summary", with a one-shot demonstration for a bitter chocolate product (overwhelmingly strong bitterness and a sharp aroma with a faint sweetness behind them, in a large bag that also suits year-end-party punishment games). One-shot => increased the Temperature (randomness) and lowered the Repetition penalty (for control of repetition) to make the text contain the appeal.
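The tuning described in the last line (raise the Temperature for more varied, appealing wording; lower the Repetition penalty so catchy phrasing may recur) can be pictured as a request payload. The parameter names and values below follow common LM-API conventions and are assumptions, not a confirmed HyperCLOVA Studio schema.

```python
# Illustrative request payload for the demo's settings. Parameter names
# and values follow common LM-API conventions, not a confirmed
# HyperCLOVA Studio schema.

one_shot_prompt = (
    "Generate a description that makes people want to buy the product.\n"
    "<one demonstration: product name, summary, description>\n"
    "<new product name and summary>\n"
    "Description:"
)
request = {
    "prompt": one_shot_prompt,
    "temperature": 0.8,         # raised: more varied, "appealing" wording
    "repetition_penalty": 1.0,  # lowered: permit catchy repetition
    "max_tokens": 120,
}
```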
  55. Agenda - What’s HyperCLOVA - Inside of HyperCLOVA - Application

    development by Prompting - Evaluation of HyperCLOVA’s JP LMs - Application to Dialogue Systems - The future of LINE and NLP
  56. Application development using the Inference API: Eco System / Infra / Model / Data

  57. Subjective evaluation of a dialogue system using HyperCLOVA with the 6.7B/13B/39B

    JP Models on 4 tasks: 1. Understanding of basic vocabulary; 2. Tracking different multiple topics; 3. Reacting to user sentiment on a topic; 4. Free chatting. Annotations were made for all task/model combinations, by the same 5 annotators. Each session consists of N round-trip conversational pairs; the user receives a list of N topics for evaluation, and each session consumes one vocabulary item from the list. Conducted in the Playground.
  58. Common evaluation criteria for all tasks! (Exception: the free-chat task did not evaluate

    achievement of the goal.) Natural response - Q: Was it a natural reaction? Are there any breakdowns or inconsistencies in the conversation history? Following a topic - Q: Did it stay on topic? Did it lose track of the topic (in this case, of what it was being asked about)? Was it able to switch topics (in this case, to pull back to the previous question)? Providing a topic or asking a question - Q: Did it provide a topic? Was it able to get the speaker to talk during the answer (most likely not)? Achievement of goals - Q: Did it achieve the objective?
  59. 1. Understanding of basic vocabulary. Vocabulary items at the elementary and

    secondary levels include: 小学校, 金づち, 中学校, 鉛筆, 大人, チューリップ, 先生, ヒマワリ, ライオン, 机, キリン, 椅子, 電車, 靴, 車, サンダル, セーター, りんご, スカート, みかん, キャベツ, サンマ, きゅうり, マグロ, スズメ, ハーモニカ, インコ, ピアノ, トンボ, アリ (source: https://repository.ninjal.ac.jp/ …). To do: ask HyperCLOVA questions in Level 1 and Level 2 form for each vocabulary item.
  60. 1. Understanding of basic vocabulary

  61. 1. Understanding of basic vocabulary. Does the system accurately answer

    word meanings (Level 1) and emotions (Level 2)?
    Metric                                 | 6.7B | 13B  | 39B
    Natural response                       | 0.55 | 0.66 | 0.98
    Following a topic                      | 0.63 | 0.84 | 0.98
    Providing a topic or asking a question | 0.00 | 0.01 | 0.00
    Achievement of goals                   | 0.55 | 0.68 | 0.84
  62. 2. Tracking different multiple topics. Topic pairs (A / B):

    COVID-19 / inbound tourism; Ichiro / Shohei Ohtani; state-of-emergency declaration / COVID-19 vaccine; AR (augmented reality) / autonomous-driving technology; YouTuber / VTuber; Leonardo da Vinci / Claude Monet; Heisei / Reiwa; the Internet / 5G; deflationary economy / super-aging society; overseas travel / domestic travel; electric vehicles / the Linear Chuo Shinkansen. To do: start a conversation about topic A and switch to topic B within 10 round trips.
  63. 2. Tracking different multiple topics. Evaluation: can the system move from topic A to topic

    B during a conversation?
    Metric                                 | 6.7B | 13B  | 39B
    Natural response                       | 0.66 | 0.53 | 0.91
    Following a topic                      | 0.71 | 0.61 | 0.95
    Providing a topic or asking a question | 0.04 | 0.01 | 0.02
    Achievement of goals                   | 0.66 | 0.55 | 0.91
  64. 3. Reacting to user sentiment on a topic. Topic | sentiment A | sentiment B:

    COVID-19 | "let's hang in there" | "I'm anxious"; inbound tourism | "it will come back" | "it won't come back"; COVID-19 vaccine | "let's wait" | "when will it ever arrive"; YouTuber | "I'd like to do it" | "I wouldn't want to do it"; Shohei Ohtani | "I want him to do well" | "I want him to strike out"; AR (augmented reality) | "it's interesting" | "I'm tired of it"; super-aging society | "it will be fine" | "I'm worried"; overseas travel | "I want to go" | "I don't want to go"; electric vehicles | "I want to ride one" | "I don't want to ride one"; the Linear Chuo Shinkansen | "I want to ride it" | "I don't want to ride it". To do: have a conversation of 15 back-and-forth turns about the topic, speaking with sentiment A at first and then with sentiment B.
  65. 3. Reacting to user sentiment on a topic. Evaluation: was the system able to agree with the user

    when he or she felt sentiment A about the topic?
    Metric                                 | 6.7B | 13B  | 39B
    Natural response                       | 0.69 | 0.45 | 0.90
    Following a topic                      | 0.74 | 0.52 | 0.95
    Providing a topic or asking a question | 0.04 | 0.02 | 0.03
    Achievement of goals                   | 0.68 | 0.46 | 0.90
  66. 3. Reacting to user sentiment on a topic. Evaluation: when the user felt sentiment B about a

    topic, could the system disagree?
    Metric                                 | 6.7B | 13B  | 39B
    Natural response                       | 0.61 | 0.40 | 0.87
    Following a topic                      | 0.67 | 0.45 | 0.93
    Providing a topic or asking a question | 0.09 | 0.02 | 0.03
    Achievement of goals                   | 0.46 | 0.36 | 0.50
  67. 4. Free chatting

  68. 4. Free chatting. Evaluation: facilitate a free dialogue with the system.

    Metric                                 | 6.7B | 13B  | 39B
    Natural response                       | 0.65 | 0.40 | 0.92
    Following a topic                      | 0.76 | 0.40 | 0.94
    Providing a topic or asking a question | 0.12 | 0.04 | 0.09
    Achievement of goals                   | -    | -    | -
  69. Summary: subjective evaluation of the 39B JP Model.

    Metric                                 | 1. Basic vocabulary | 2. Topic tracking | 3. Positive sentiment | 3. Negative sentiment | 4. Free chatting
    Natural response                       | 0.978 | 0.908 | 0.908 | 0.872 | 0.925
    Following a topic                      | 0.984 | 0.952 | 0.951 | 0.930 | 0.935
    Providing a topic or asking a question | 0.003 | 0.023 | 0.033 | 0.035 | 0.086
    Achievement of goals                   | 0.835 | 0.907 | 0.899 | 0.505 | -
  70. Summary: subjective evaluation of the 39B JP Model.

    Metric                                 | 1. Basic vocabulary | 2. Topic tracking | 3. Positive sentiment | 3. Negative sentiment | 4. Free chatting
    Natural response                       | 0.978 | 0.908 | 0.908 | 0.872 | 0.925
    Following a topic                      | 0.984 | 0.952 | 0.951 | 0.930 | 0.935
    Providing a topic or asking a question | 0.003 | 0.023 | 0.033 | 0.035 | 0.086
    Achievement of goals                   | 0.835 | 0.907 | 0.899 | 0.505 | -
  71. The goal of the LINE NLP team is to achieve high-quality

    and safe output
  72. Difficulties with the Japanese language. Hard-to-learn writing: Japanese speakers

    use hiragana, katakana, kanji, romaji, etc. to write a single document. Large amount of essential vocabulary: daily conversation requires over 8,000 words, and one needs to know many homonyms, honorifics, and dialects. Omission of words: Japanese speakers may omit the subject or object of a sentence, and the omitted words may not be uniquely inferable.
  73. Conducting joint research using HyperCLOVA. We provide HyperCLOVA's APIs to

    universities, research institutes, and companies, collaborating to dramatically improve system performance and to detect and eliminate bias in language models with - Osaka University Graduate School - Tokyo Metropolitan University - Waseda University. We hope to collaborate with more research institutions and companies in the future.
  74. Difficulties in text generation. Potential risks of generated text: the

    following technologies need to be developed - improving the content bias of a corpus and its notation - ensuring the truthfulness and safety of an output text. Implementation of AI ethics: various ethical considerations need to be taken into account for input and output texts - toxic - sexual - offensive - profane - intimidating - attacking identity. Automation of intrinsic evaluation: metrics are needed that can be applied to dynamic text-generation results - accuracy of topical content - consistency of generated text - determination of achievement of objectives.
  75. Automatic evaluation for the 39B JP model: Eco System / Infra / Model / Data
  76. Automatic evaluation with the 39B JP Model on a QA task.

    TASK: RCQA* possible-only - unanswerable questions were removed from the dataset of the normal RCQA task. For each inference, few-shots were created by randomly extracting contexts from the RCQA possible-only dev set; each shot consists of a context, a question, and an answer. If the correct answer was contained in, and easily extracted from, the inference result, we judged it correct. * 解答可能性付き読解データセット (a reading-comprehension dataset with answerability annotations): http://www.cl.ecei.tohoku.ac.jp/rcqa/
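The procedure on this slide amounts to a simple loop: rebuild the few-shot prompt from randomly drawn dev-set items for every inference, append the test context and question, and count the inference as correct when the gold answer appears in the output. A sketch, where the prompt labels and the generate() call are assumptions (the slide does not show the exact prompt format):

```python
import random

def evaluate_rcqa(dev_set, test_set, generate, n_shots=4):
    """Sketch of the slide's automatic evaluation: fresh few-shots per
    inference, with a simplified containment check standing in for
    'contained and easily extracted'."""
    correct = 0
    for item in test_set:
        shots = random.sample(dev_set, n_shots)  # random dev-set contexts
        prompt = "".join(
            # Prompt labels are assumptions; the slide does not show them.
            f"context: {s['context']}\nquestion: {s['question']}\n"
            f"answer: {s['answer']}\n\n"
            for s in shots
        )
        prompt += f"context: {item['context']}\nquestion: {item['question']}\nanswer:"
        output = generate(prompt)  # hypothetical call to the 39B JP model
        if item["answer"] in output:
            correct += 1
    return correct / len(test_set)  # the "answer match" rate on slide 77
```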
  77. Result of automatic evaluation with the 39B JP Model on the RCQA

    possible-only task:
    model / few-shot     | shots | temperature | top_p | answer match
    6.7B / contextual    | 0     | 0.5         | 0.8   | -
                         | 4     | 0.1         | 0.9   | 66.52
    13B / contextual     | 0     | 0.5         | 0.8   | -
                         | 4     | 0.4         | 0.1   | 70.28
    39B / contextual     | 0     | 0.4         | 0.5   | 80.51
                         | 1     | 0.4         | 0.5   | 89.18
                         | 2     | 0.4         | 0.5   | 89.31
                         | 3     | 0.4         | 0.5   | 89.09
                         | 4     | 0.4         | 0.5   | 89.83
    39B / non-contextual | 0     | 0.4         | 0.5   | 69.50
                         | 1     | 0.4         | 0.5   | 76.97
                         | 2     | 0.4         | 0.5   | 79.08
                         | 3     | 0.4         | 0.5   | 79.38
                         | 4     | 0.4         | 0.5   | 80.51
  78. HyperCLOVA’s LM vs BERT-large. TASK: RCQA possible-only (unanswerable

    questions removed from the normal RCQA task) - BERT may achieve higher results with fine-tuning on specific tasks - HyperCLOVA can reach the same level of performance with prompting and a rough parameter search.
    Model         | test acc | test F1 | memo
    HyperCLOVA    | 85.03    | 89.95   | JP 39B, 2-shots, temperature=0.4, top_p=0.5
    BERT-jp-large | 86.68    | 90.49   | using a subset of the LINE LM corpus
  79. Agenda - What’s HyperCLOVA - Inside of HyperCLOVA - Application

    development by Prompting - Evaluation of HyperCLOVA’s JP LMs - Application to Dialogue Systems - The future of LINE and NLP
  80. Application example: HyperCLOVA Friends Talk with any adjustable character using

    HyperCLOVA
  81. Demo Movie 60sec

  82. Application example: HyperCLOVA Friends Talk with any adjustable character using

    HyperCLOVA HyperCLOVA allows for some role-playing
  83. HyperCLOVA allows for generic role-playing, starting with NO character set.

    Conversation is smooth and the meaning of what is said is understood. Challenge: features other than smooth conversation and topic tracking. The truth of what it says should be verified before it responds. Some responses are ambiguous (e.g. the temperature of the hot water during washing). Data bias has an effect (e.g. the persona's gender was unsettled but drifted to female). The consistency of its persona is a bit suspect.
  84. Is HyperCLOVA really necessary for NLP? - YES!! - If you're on a budget … -

    The history of NLP is strongly linked to the development of AI-related technologies, progressing from rule-only systems through traditional-ML-only and small-LM-only approaches and DNNs to large-scale general-purpose LMs - LINE wants to move in the direction of building our own models and having customers use them
  85. Agenda - What’s HyperCLOVA - Inside of HyperCLOVA - Application

    development by Prompting - Evaluation of HyperCLOVA’s JP LMs - Application to Dialogue Systems - The future of LINE and NLP
  86. The future of LINE and NLP

  87. LINE was released to the public 10 years ago. FAQ from

    NLPers: isn’t there a challenge left to tackle?
  88. FAQ from NLPers: isn’t there a challenge left to tackle?

    No, not yet!! It's not over yet!!
  89. Various issues related to HyperCLOVA • Building models and using

    those models to make inferences - the biggest challenge of all • Fine-tuning and other parameter-efficient transfer-learning methods, as well as compact models • Responding to new topics/events that have arisen since a model was built • Implementing AI ethics • Filtering according to the application, and stating the reason • Building a Web corpus • Removing duplicate data • Realizing accountability for each entry used • Responding to deletion requests on a URL/ID basis • Detecting and anonymizing personal information
  90. None
  91. None
  92. None
  93. LINE has more than 50 service brands

  94. LINE’s NLP journey is still in its early stages. Let's

    take on the challenge together at LINE. LINE's various services need essential improvements using NLP technology! • Large-scale general-purpose LMs • “High Road” NLP • Information Retrieval • String processing • Data creation • Evaluation tasks, etc.
  95. None
  96. HyperCLOVA Hands-on

  97. HyperCLOVA Hands-on, to be held during 2021: hands-on with HyperCLOVA

    Studio and its APIs for engineers. Please wait for information from LINE Developers (@LINE_DEV). A Python SDK for using the HyperCLOVA API will be provided.
  98. None
  99. Open & Share

  100. LINE’s LMs for OSS start in FY2021. Of course, for

    models “other than” HyperCLOVA. Performance target: LINE's LMs for OSS > other OSS LMs. We would like to update them a few times a year, if possible!! They are trained using a subset of the corpus for HyperCLOVA (the LINE LM Corpus)!
  101. Summary - Updated the current status of HyperCLOVA at LINE

    - Reported on large-scale general-purpose LMs and prompting, using several topics as examples - There are cases where surprisingly high quality can be achieved - There are issues that cannot be solved ad hoc - At LINE, we can work on all layers of NLP R&D, not only for HyperCLOVA - Please stay tuned for the next NLP news from LINE