
All We Need Is Prompting on a Pre-trained Japanese Large Language Model


LINE DEVDAY 2021

November 10, 2021

  1. Toshinori Sato (@overlast)
    ● Senior Software Engineer / Manager
    ● Natural Language Processing
    ● Information Retrieval
    ● LINE CLOVA
    ● Japanese NLU system
    ● HyperCLOVA
    ● Japanese Corpus / Evaluation
    ● OSS: Main Contributor of NEologd project
    ● mecab-ipadic-NEologd


  2. LINE NLP team and contributors
    Toshinori Sato
    Takashi Uemura
    Wataru Sakata
    Akifumi Nakamachi
    Kenta Shinzato
    Takuto Asakura
    Tatsuya Uchiyama
    Masahiko Higashiyama
    Tung Nguyen
    Shengzhe Li
    Koga Kobayashi
    Takato Yamazaki
    Seiichi Inoue
    Yoshifumi Kondo
    Jumon Nozaki
    et al.


  3. Attention, please !
    The target audience is mainly engineers who are interested in
    natural language processing (NLP), with some parts aimed at
    NLP professionals.
    Omitted from this session is detailed information on the following:
    - Building language models
    - Tuning methods for language models
    See below for more information.
    - https://arxiv.org/abs/2109.04650 *
    This is a 40-minute session. Please enjoy listening to it over
    a cup of coffee or something!
    * What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers, Boseop Kim et al., EMNLP 2021

  4. Application example of HyperCLOVA
    A dialogue system with role-playing functions


  5. Large-scale general-purpose language models + α
    Automatic evaluation with 39B JP Model for a QA task


  7. Application example of HyperCLOVA
    A dialogue system with role-playing functions
    Purely auto-generated text


  8. E.g. A spoken dialogue system applying HyperCLOVA
    Hello.


  15. E.g. A spoken dialogue system applying HyperCLOVA
    [Pipeline diagram]
    Voice ("Hello.") → Client App → Speech To Text → Dialog App
    → query text → HyperCLOVA (Large-scale Language Model /
    Knowledge Base Search / …) → result text → Dialog App
    → response text → Text To Speech → sounds ("Long time, no see.")
    → Client App
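
The pipeline above can be sketched as a chain of stub components. Every function here is an illustrative stand-in, not LINE's actual API; only the data flow (voice → query text → result text → response text → sounds) comes from the slide.

```python
# Minimal sketch of the spoken-dialogue pipeline. Each component is a stub
# standing in for a real service (STT, HyperCLOVA, TTS); the names are
# assumptions for illustration, not LINE's actual APIs.

def speech_to_text(voice: bytes) -> str:
    return "Hello."                      # stub: recognized user utterance

def hyperclova(query: str) -> str:
    # stub: large-scale LM + knowledge base search producing the result text
    return "Long time, no see."

def text_to_speech(text: str) -> bytes:
    return text.encode()                 # stub: synthesized audio

def dialog_app(voice: bytes) -> bytes:
    query = speech_to_text(voice)        # voice -> query text
    result = hyperclova(query)           # query text -> result text
    response = result                    # result text -> response text
    return text_to_speech(response)      # response text -> sounds

assert dialog_app(b"...") == b"Long time, no see."
```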

  16. Agenda
    - What’s HyperCLOVA
    - Inside of HyperCLOVA
    - Application development by Prompting
    - Evaluation of HyperCLOVA’s JP LMs
    - Application to Dialogue Systems
    - The future of LINE and NLP


  19. What is a Language Model (LM)?
    - HyperCLOVA includes an unsupervised autoregressive language model
    - An autoregressive language model …
    - is capable of calculating probability distributions over token sequences
    - provides maximum likelihood estimation of parameters for a sample
    - can generate future text based on the text up to the present
    - A model that gives the probability of a certain sequence of words
    - E.g. P(It’s sunny today) > P(Sunny is today)
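
The "probability of a word sequence" idea can be shown with a toy autoregressive model. HyperCLOVA is a Transformer, but the factorization P(w1..wn) = Π P(wi | w1..wi-1) is the same; this bigram sketch (corpus and smoothing values are made up for illustration) reproduces the P(It's sunny today) > P(Sunny is today) comparison.

```python
from collections import defaultdict

# Toy bigram LM trained on a three-sentence corpus (illustrative data).
corpus = [
    "it's sunny today".split(),
    "it's rainy today".split(),
    "today it's sunny".split(),
]

counts = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    for prev, cur in zip(["<s>"] + sent, sent + ["</s>"]):
        counts[prev][cur] += 1

def prob(sentence, alpha=0.1, vocab_size=10):
    """P(sentence) = product of P(word | previous word), add-alpha smoothed."""
    p = 1.0
    words = sentence.split()
    for prev, cur in zip(["<s>"] + words, words + ["</s>"]):
        total = sum(counts[prev].values())
        p *= (counts[prev][cur] + alpha) / (total + alpha * vocab_size)
    return p

# The grammatical order gets the higher probability:
assert prob("it's sunny today") > prob("sunny is today")
```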

  20. Our policy for corpus data collection
    We do not use any data from our conversation services:
    - no messages on LINE
    - no posts on OpenChat
    We maintain this corpus with the utmost consideration
    for the rights held by our various customers.
    - Add versatility to this corpus
    - Make a subset of this corpus available for use outside of LINE

  21. LINE LM Corpus (for HyperCLOVA’s LMs)
    NO DATA from LINE or LINE OpenChat is used to build our LMs
    - Developed based on a corpus built for training the BERT models since 2019
    - Used data crawled for LINE search
    - Eliminated data that can be easily extracted as "non-public personal information"
    - Covered sites important for learning Japanese expressions
    - Purchased and used external content after resolving rights issues

  22. Current status of LINE LM Corpus
    For the 82B JP Model:
    - Samples: 10B
    - Tokens: 500B
    - Bytes: 1.8T

  23. Extensive use of pre-trained large-scale language models

  24. Modeling status of HyperCLOVA
    - JP Model: 1.3B → 6.7B → 13B → 39B
    - Multi-lingual Model: 13B → 39B
    - Large model (JP / Multi-lingual): 82B
    - Hyper-scale JP Model: 204B~ (in 2022; work in progress)

  25. Architecture of HyperCLOVA
    Eco System
    Infra
    Model
    Data


  26. Agenda
    - What’s HyperCLOVA
    - Inside of HyperCLOVA
    - Application development by Prompting
    - Evaluation of HyperCLOVA’s JP LMs
    - Application to Dialogue Systems
    - The future of LINE and NLP


  27. Methods for applying LMs to a target task
    - Methods for HyperCLOVA etc.
    - Few-shot: give “a description of a task” and “some demonstration cases”
    - One-shot: give “a description of a task” and “a demonstration case”
    - Zero-shot: give “a description of a task” only
    - Pros: possibility to solve a task from brief instructions or short examples
    - Cons: possibility of not reaching the performance of a SOTA model achieved by fine-tuning
    - Methods for BERT etc.
    - Fine-tuning: supervised learning on a dataset of a target task, based on a
    general-purpose pre-trained model
    - Pros: excellent performance in benchmarks
    - Cons: need to train for each target task / possible loss of generalization ability
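
The zero-/one-/few-shot distinction above is just a matter of how many demonstration cases go into the prompt. A minimal sketch (the IN/OUT format and all strings are illustrative assumptions, not HyperCLOVA's actual prompt syntax):

```python
def build_prompt(description, demonstrations, query):
    """Build a zero-/one-/few-shot prompt: a task description plus 0, 1, or
    more demonstration cases, followed by the actual query.
    0 demonstrations = zero-shot, 1 = one-shot, more = few-shot."""
    lines = [description]
    for inp, out in demonstrations:
        lines.append(f"IN: {inp}\nOUT: {out}")
    lines.append(f"IN: {query}\nOUT:")   # the model continues after "OUT:"
    return "\n".join(lines)

# One-shot example (contents are made up for illustration):
prompt = build_prompt(
    "Answer the question from the context.",
    [("Q: capital of Japan?", "Tokyo")],
    "Q: capital of France?",
)
```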

  28. Task Outlines and Few Shots for individual tasks
    Playground (HyperCLOVA Studio): structure of the TextField
    - Prompt
      - Task outline: Title (description of the task), other information
      - Samples: Shot, Shot, …
      - Query (in some cases, output is given as a suffix after inference)
    - Output

  45. Task Outlines and Few Shots for individual tasks
    Playground (HyperCLOVA Studio): continuing a session in the TextField
    - Prompt
      - Task outline: Title (description of the task), other information
      - Samples: Shot, previous output, …
      - Query (in some cases, the next output is given as a suffix after inference)
    - Output
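
The session pattern above, where the previous output is folded back into the prompt so the next inference is conditioned on it, can be sketched as follows. The prompt layout and all strings are assumptions for illustration, not the actual Studio format.

```python
def build_session_prompt(task_outline, shots, history, query):
    """Concatenate task outline, demonstration shots, accumulated previous
    outputs, and the new query into one prompt string."""
    return "\n".join([task_outline, *shots, *history, query])

history = []
shots = ["User: Hi\nBot: Hello!"]

prompt1 = build_session_prompt("Chat as a friendly bot.", shots, history,
                               "User: How are you?\nBot:")
output1 = " I'm fine."                                # stands in for a model inference
history.append("User: How are you?\nBot:" + output1)  # previous output joins the prompt

prompt2 = build_session_prompt("Chat as a friendly bot.", shots, history,
                               "User: Glad to hear it.\nBot:")
```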

  46. Response to Individual Tasks with Task Outline and Few-shot
    Example: Support for writing Haiku with 2-shots
    [Japanese prompt; the text is garbled in this transcript. Roughly: two
    IN/OUT demonstration pairs, each giving a prose commentary on a famous
    Bashō haiku as IN and the haiku itself as OUT, followed by an IN
    describing an engineer's hopes for AI technology at a meeting on a hot
    July afternoon; with a little editing, the generated OUT becomes a haiku.]

  47. Extensive use of pre-trained large-scale language models

  48. Document generation using the Playground
    E.g. from a product summary to a description
    - Product packages always contain a product summary, but in many cases an
    advertising description is written up by a salesperson
    - In this demo, the parameters of HyperCLOVA Studio (Playground) are
    adjusted to generate an attractive description
    - From a food product title and a summary, HyperCLOVA Studio generates an
    attractive description to advertise it

  49. Demo Movie 60sec


  50. Base / Prefix / Output
    [Japanese prompt; the text is garbled in this transcript. The base prompt
    gives the task title "Generate a description that makes you want to buy
    the product from its summary" plus one demonstration (a chocolate
    product: name, summary, and an appealing description). The query is a
    crab-flavored rice cracker ("すっぱ かにせん": name, summary of
    ingredients, net content 60 g), and the Output is a generated sales
    description for it.]
    One-shot
    => Increased the temperature (randomness) and lowered the repetition
    penalty (which controls repetition) to make the generated text more appealing
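
The two knobs mentioned above can be illustrated with a self-contained sampler. This is a generic sketch of temperature scaling plus a CTRL-style repetition penalty over a token→score dict, not HyperCLOVA Studio's actual implementation; the penalty form (divide positive scores of already-generated tokens) is an assumption.

```python
import math
import random

def sample_next(logits, temperature=0.8, repetition_penalty=1.1, generated=()):
    """Sample one token from {token: score}. Higher temperature flattens the
    distribution (more randomness); repetition_penalty > 1 lowers the scores
    of tokens already in `generated` to discourage repeats."""
    scores = dict(logits)
    for tok in generated:                       # penalize repeated tokens
        if tok in scores:
            s = scores[tok]
            scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    zs = {t: s / temperature for t, s in scores.items()}
    m = max(zs.values())                        # subtract max for stability
    exps = {t: math.exp(z - m) for t, z in zs.items()}
    total = sum(exps.values())
    r = random.random()
    acc = 0.0
    for t, e in exps.items():                   # inverse-CDF sampling
        acc += e / total
        if r <= acc:
            return t
    return t                                    # guard against rounding
```

With an overwhelming score gap the sampler is effectively deterministic, which makes the effect of the penalty easy to see: a strong penalty on an already-generated token flips the choice to the alternative.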

  51. Agenda
    - What’s HyperCLOVA
    - Inside of HyperCLOVA
    - Application development by Prompting
    - Evaluation of HyperCLOVA’s JP LMs
    - Application to Dialogue Systems
    - The future of LINE and NLP


  52. Application development using Inference API
    Eco System
    Infra
    Model
    Data


  53. Subjective evaluation of a dialogue system using HyperCLOVA
    with the 6.7B / 13B / 39B JP Models on 4 tasks:
    1. Understanding of basic vocabulary
    2. Tracking different multiple topics
    3. Reacting to user sentiment on a topic
    4. Free chatting
    Conducted in the Playground.
    Annotations were made for all task/model combinations,
    with subjective evaluation by the same 5 annotators.
    Each session is N round-trip conversational pairs;
    the user receives a list of N topics for evaluation,
    and each session consumes one vocabulary item from the list.

  54. Common evaluation criteria for all tasks
    Natural response
    Q: Was it a natural reaction? Are there any breakdowns or
    inconsistencies in the history of the conversation?
    Following a topic
    Q: Did it stay on topic? Did it lose track of the topic (in this case,
    of what it was being asked about)? Was it able to switch topics (in
    this case, to pull back to the previous question)?
    Providing a topic or asking a question
    Q: Did it provide a topic? Was it able to get the speaker to talk
    during the answer (most likely not)?
    Achievement of goals
    Q: Did it achieve your objective?
    Exception: the free-chat task was not evaluated on achievement of the goal.
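
The score tables that follow can be read as averages of per-session judgments. The exact aggregation is not stated in the talk; the sketch below assumes each annotator gives a 0/1 judgment per criterion per session and a table cell is the mean over all annotators and sessions.

```python
from statistics import mean

def criterion_score(judgments):
    """judgments[session][annotator] in {0, 1} for one criterion
    -> mean score over all sessions and annotators (assumed aggregation)."""
    return mean(j for session in judgments for j in session)

# Two sessions, five annotators each (made-up judgments):
score = criterion_score([[1, 1, 1, 0, 1], [1, 1, 1, 1, 1]])
```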

  55. 1. Understanding of basic vocabulary
    Elementary vocabulary: 小学校 (elementary school), 金づち (hammer),
    大人 (adult), チューリップ (tulip), ライオン (lion), 机 (desk),
    電車 (train), 靴 (shoes), セーター (sweater), りんご (apple),
    キャベツ (cabbage), サンマ (saury), スズメ (sparrow),
    ハーモニカ (harmonica), トンボ (dragonfly)
    Secondary vocabulary: 中学校 (junior high school), 鉛筆 (pencil),
    先生 (teacher), ヒマワリ (sunflower), キリン (giraffe), 椅子 (chair),
    車 (car), サンダル (sandals), スカート (skirt), みかん (mandarin orange),
    きゅうり (cucumber), マグロ (tuna), インコ (parakeet), ピアノ (piano),
    アリ (ant)
    Source: NINJAL repository, https://repository.ninjal.ac.jp/ (full URL garbled in this transcript)
    ToDo: Ask HyperCLOVA questions in the form of Level 1 and Level 2 for each vocabulary item

  56. 1. Understanding of basic vocabulary


  57. 1. Understanding of basic vocabulary
    Does the system accurately answer word meanings (Level 1) and emotions (Level 2)?
                                              6.7B    13B    39B
    Natural response                          0.55    0.66   0.98
    Following a topic                         0.63    0.84   0.98
    Providing a topic or asking a question    0.00    0.01   0.00
    Achievement of goals                      0.55    0.68   0.84

  58. 2. Tracking different multiple topics
    Topic A / Topic B pairs (Japanese, translated):
    - COVID-19 / inbound tourism
    - state of emergency / COVID-19 vaccines
    - YouTuber / VTuber
    - Heisei / Reiwa
    - deflationary economy / super-aging society
    - electric vehicles / Linear Chuo Shinkansen
    - Ichiro / Shohei Ohtani
    - AR (augmented reality) / autonomous-driving technology
    - Leonardo da Vinci / Claude Monet
    - the Internet / 5G
    - overseas travel / domestic travel
    ToDo: Start a conversation about topic A and switch to topic B within 10 round trips

  59. 2. Tracking different multiple topics
    Evaluation: Can the system move from topic A to topic B during a conversation?
                                              6.7B    13B    39B
    Natural response                          0.66    0.53   0.91
    Following a topic                         0.71    0.61   0.95
    Providing a topic or asking a question    0.04    0.01   0.02
    Achievement of goals                      0.66    0.55   0.91

  60. 3. Reacting to user sentiment on a topic
    Topic → sentiment A / sentiment B (Japanese, translated):
    - (topic garbled) → I want you to keep going / I want you to quit
    - COVID-19 → let’s hang in there / I’m anxious
    - inbound tourism → it will come back / it won’t come back
    - COVID-19 vaccines → let’s wait / when will it be our turn
    - YouTuber → I want to try it / I don’t want to
    - Shohei Ohtani → I want him to shine / I want him to strike out
    - AR (augmented reality) → it’s fun / I’m tired of it
    - super-aging society → it will be fine / I’m worried
    - overseas travel → I want to go / I don’t want to go
    - electric vehicles → I want to ride one / I don’t want to
    - Linear Chuo Shinkansen → I want to ride it / I don’t want to
    ToDo: Have a 15-round-trip conversation about the topic; speak with the
    feeling of sentiment A at first, then sentiment B

  61. 3. Reacting to user sentiment on a topic
    Evaluation: Was the system able to agree with the user when he or she felt sentiment A about the topic?
                                              6.7B    13B    39B
    Natural response                          0.69    0.45   0.90
    Following a topic                         0.74    0.52   0.95
    Providing a topic or asking a question    0.04    0.02   0.03
    Achievement of goals                      0.68    0.46   0.90

  62. 3. Reacting to user sentiment on a topic
    Evaluation: When the user felt sentiment B about the topic, could the system disagree?
                                              6.7B    13B    39B
    Natural response                          0.61    0.40   0.87
    Following a topic                         0.67    0.45   0.93
    Providing a topic or asking a question    0.09    0.02   0.03
    Achievement of goals                      0.46    0.36   0.50

  63. 4. Free chatting


  64. 4. Free chatting
    Evaluation: Facilitate a free dialogue with the system
                                              6.7B    13B    39B
    Natural response                          0.65    0.40   0.92
    Following a topic                         0.76    0.40   0.94
    Providing a topic or asking a question    0.12    0.04   0.09
    Achievement of goals                      -       -      -

  65. Summary: Subjective Evaluation of 39B JP Model
                               1. Basic     2. Multiple  3. Positive  3. Negative  4. Free
                               vocabulary   topics       sentiment    sentiment    chatting
    Natural response           0.978        0.908        0.908        0.872        0.925
    Following a topic          0.984        0.952        0.951        0.930        0.935
    Providing a topic or
    asking a question          0.003        0.023        0.033        0.035        0.086
    Achievement of goals       0.835        0.907        0.899        0.505        -

  67. The goal of the LINE NLP team is
    to achieve high-quality and safe output

  68. Difficulties with the Japanese language
    Difficult to learn
    - Japanese speakers use Hiragana, Katakana, Kanji, Romaji, etc.
    to write a single document
    Large amount of essential vocabulary
    - Over 8,000 words are required for daily conversation
    - Need to know many homonyms, honorifics, and dialects
    Omission of words
    - Japanese speakers may omit the subject or object of a sentence,
    and the omitted words may not be uniquely inferable

  69. Conducting joint research using HyperCLOVA
    Providing HyperCLOVA’s APIs to universities, research institutes,
    and companies.
    Collaborating to dramatically improve system performance and to
    detect and eliminate bias in language models with:
    - Osaka University Graduate School
    - Tokyo Metropolitan University
    - Waseda University
    We hope to collaborate with more research institutions and
    companies in the future.

  70. Difficulties in text generation
    Potential risks of generated text
    The following technologies need to be developed:
    - improving the content bias of a corpus and its notation
    - ensuring the truthfulness and safety of an output text
    Implementation of AI Ethics
    Various ethical considerations need to be taken into account
    for input and output texts:
    - toxicity, sexual content, offensive or profane language,
    intimidation, attacks on identity
    Automation of intrinsic evaluation
    Need metrics that can be applied to dynamic text-generation results:
    - accuracy of topical content
    - consistency of generated text
    - determination of achievement of objectives

  71. Eco System
    Infra
    Model
    Data
    Automatic evaluation for 39B JP model


  72. Automatic evaluation with the 39B JP Model for a QA task
    TASK: RCQA* possible only
    - Removed unanswerable questions from the dataset of the normal RCQA task
    - Few-shots were created for each inference by randomly extracting
    contexts from the RCQA possible-only dev set
    - Each few-shot consists of a context, a question text, and an answer
    - If the correct answer was contained in, and easily extracted from,
    the inference result, we judged it correct
    * Reading-comprehension dataset with answerability (解答可能性付き読解データセット): http://www.cl.ecei.tohoku.ac.jp/rcqa/
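
The "answer match" judgment described above can be sketched as a containment check. The exact criterion used in the talk is not specified; this assumes a normalized substring match between the gold answer and the generated text.

```python
import unicodedata

def answer_match(generated, gold):
    """Assumed 'answer match' check: the inference result counts as correct
    when the gold answer string appears in the normalized generated text.
    NFKC normalization folds width variants (useful for Japanese text)."""
    norm = lambda s: unicodedata.normalize("NFKC", s).strip().lower()
    return norm(gold) in norm(generated)
```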

  73. Result of automatic evaluation with the 39B JP Model
    for the RCQA possible-only task
    model / few-shot      shots  temperature  top_p  answer match
    6.7B / contextual     0      0.5          0.8    -
                          4      0.1          0.9    66.52
    13B / contextual      0      0.5          0.8    -
                          4      0.4          0.1    70.28
    39B / contextual      0      0.4          0.5    80.51
                          1      0.4          0.5    89.18
                          2      0.4          0.5    89.31
                          3      0.4          0.5    89.09
                          4      0.4          0.5    89.83
    39B / non-contextual  0      0.4          0.5    69.50
                          1      0.4          0.5    76.97
                          2      0.4          0.5    79.08
                          3      0.4          0.5    79.38
                          4      0.4          0.5    80.51
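
The table above is the result of a rough parameter search. A generic sketch of such a sweep (the `run_inference` callback is a hypothetical stand-in for the HyperCLOVA API; the scoring reuses the answer-match idea):

```python
from itertools import product

def grid_search(run_inference, dev_set, shot_counts, temperatures, top_ps):
    """Sweep shots x temperature x top_p; score each setting by the percentage
    of dev examples whose gold answer appears in the generated text."""
    results = {}
    for shots, temp, top_p in product(shot_counts, temperatures, top_ps):
        correct = sum(
            gold in run_inference(ctx, q, shots=shots,
                                  temperature=temp, top_p=top_p)
            for ctx, q, gold in dev_set
        )
        results[(shots, temp, top_p)] = 100.0 * correct / len(dev_set)
    best = max(results, key=results.get)
    return best, results
```

Usage with a fake inference function (for illustration only): a model that only answers correctly when given at least one shot makes the sweep prefer the one-shot setting.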

  74. HyperCLOVA’s LM vs BERT-large
    TASK: RCQA possible only (unanswerable questions removed from the normal RCQA task)
                     test acc  test F1  memo
    HyperCLOVA       85.03     89.95    JP 39B, 2-shot, temperature=0.4, top_p=0.5
    BERT-jp-large    86.68     90.49    using a subset of the LINE LM corpus
    - It is possible that BERT can achieve higher results with fine-tuning on specific tasks
    - HyperCLOVA can achieve the same level of performance with prompting and a rough parameter search

  75. Agenda
    - What’s HyperCLOVA
    - Inside of HyperCLOVA
    - Application development by Prompting
    - Evaluation of HyperCLOVA’s JP LMs
    - Application to Dialogue Systems
    - The future of LINE and NLP


  76. Application example: HyperCLOVA Friends
    Talk with any adjustable character using HyperCLOVA


  77. Demo Movie 60sec


  78. Application example: HyperCLOVA Friends
    Talk with any adjustable character using HyperCLOVA
    HyperCLOVA allows for some role-playing


  79. HyperCLOVA allows for generic role-playing
    Challenge: features other than smooth conversation and topic tracking
    Starting with NO character set:
    - Conversation is smooth, and the meaning of what is said is understood
    - The truth of what it says should be verified before it responds
    - Some ambiguous responses (e.g. the temperature of hot water during washing)
    - Effect of data bias (e.g. the persona’s gender was left unset but drifted to female)
    - The consistency of its persona is a bit suspect

  80. Is HyperCLOVA really necessary for NLP?
    - YES !!
    - If you're on a budget …
    - The history of NLP is strongly linked to the development of
    AI-related technologies:
    rule only → traditional ML → small LMs only → DNN →
    large-scale general-purpose LMs
    - LINE wants to move in the direction of building our own models
    and having customers use them

  81. Agenda
    - What’s HyperCLOVA
    - Inside of HyperCLOVA
    - Application development by Prompting
    - Evaluation of HyperCLOVA’s JP LMs
    - Application to Dialogue Systems
    - The future of LINE and NLP


  82. The future of LINE and NLP


  83. LINE was released to the public 10 years ago
    FAQ from NLPers: Are there any challenges left to tackle?

  84. FAQ from NLPers: Are there any challenges left to tackle?
    Yes, plenty !! It's not over yet !!

  85. Various issues related to HyperCLOVA
    ● Building models and using those models to make inferences - the biggest challenge of all
    ● Fine-tuning and other parameter-efficient transfer learning methods, as well as compact models
    ● Responding to new topics/events that have arisen since a model was built
    ● Implementing AI Ethics
    ● Filtering according to the application and specifying the reason
    ● Building a Web corpus
    ● Removing duplicate data
    ● Realization of accountability for each entry used
    ● Responding to deletion requests on a URL/ID basis
    ● Detection and anonymization of personal information


  86. LINE has more than 50 service brands


  87. LINE’s NLP journey is still in its early stages
    Let’s take on the challenge together at LINE
    LINE’s various services need essential
    improvements using NLP technology !
    • Large-scale general-purpose LMs
    • “High Road” NLP
    • Information Retrieval
    • String processing
    • Data creation
    • Evaluation tasks, etc.

  88. HyperCLOVA Hands-on


  89. HyperCLOVA Hands-on to be held during 2021
    Hands-on with HyperCLOVA Studio and APIs for engineers.
    Please wait for information from LINE Developers (@LINE_DEV).
    A Python SDK for using the HyperCLOVA API will be provided.

  90. Open & Share


  91. LINE’s LMs for OSS start in FY2021
    Of course, these are models “other than” HyperCLOVA
    - Trained using a subset of the corpus for HyperCLOVA (LINE LM Corpus)
    - Performance target: LINE’s LMs for OSS > other OSS LMs
    - We would like to update them a few times a year, if possible !!

  92. Summary
    - Updated the current status of HyperCLOVA in LINE
    - Reported on large-scale general-purpose LMs and
    Prompting, using several topics as examples
    - There are cases where surprisingly high quality can be achieved
    - There are issues that cannot be solved ad hoc
    - At LINE, we can work on all layers of NLP R&D, not
    only HyperCLOVA
    - Please stay tuned for the next NLP news from LINE