LLM Evaluation - Kenji Kondo (LLMの評価-近藤憲児)

2023/05/11 [1st] ChatGPT Utilization LT (lightning talk) Meetup by LLM Fukuoka (LLM福岡)

Kenji KONDO

May 13, 2023

Transcript

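  1. Skydisc, Inc. (株式会社スカイディスク)

    Kenji KONDO (近藤憲児)
    • Manager of the AI engine development team at Skydisc.
    • We build 最適ワークス (Saiteki Works), a service that drafts production plans with the power of AI.
    • I publish courses on Udemy.
    • I'm on Twitter 👉 @kondokenjibai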
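  2. Overview of 最適QAくん (Saiteki QA-kun)

    • A Slack Bot that answers questions about the 最適ワークス manual.
    • Built some months ago and made available to everyone in the company.
    Purpose of the PoC
    • Eventually we want to raise it to a state where customers can use it too. This is the verification step toward that.
    • Also, to help every employee picture where LLMs could be put to use.
    Diagram: Question → 最適QAくん (a Slack Bot that knows the 最適ワークス manual inside out) → Answer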
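  3. Why I want to be particular about evaluation

    • Suppose you work to prevent hallucination, for example by switching from vector search to hybrid search.
    • How do you explain to someone else that this "really improved" things?
    • How can the quality of the output be measured objectively?
    • And if you wanted to do that automatically, how would you go about it?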
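The questions on this slide can be made concrete with a fixed evaluation set that is run against both versions of the bot. The following is a minimal sketch, not the author's setup: the two `answer_with_*` functions are hypothetical stand-ins for the real pipelines, the questions and reference answers are made up, and exact match after normalization is only the simplest possible scoring rule.

```python
from typing import Callable, Dict

# Made-up evaluation set: question -> reference answer (not from the real manual).
EVAL_SET: Dict[str, str] = {
    "How do I export a plan?": "Use the Export button.",
    "Can I undo a change?": "Yes, press Undo.",
}


def answer_with_vector_search(question: str) -> str:
    """Hypothetical stand-in for the bot before the change."""
    canned = {
        "How do I export a plan?": "Use the Export button.",
        "Can I undo a change?": "No, changes are permanent.",
    }
    return canned.get(question, "I don't know.")


def answer_with_hybrid_search(question: str) -> str:
    """Hypothetical stand-in for the bot after the change."""
    canned = {
        "How do I export a plan?": "Use the Export button.",
        "Can I undo a change?": "Yes, press Undo.",
    }
    return canned.get(question, "I don't know.")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def accuracy(answer_fn: Callable[[str], str], eval_set: Dict[str, str]) -> float:
    """Share of questions whose answer matches the reference after normalization."""
    hits = sum(
        normalize(answer_fn(question)) == normalize(reference)
        for question, reference in eval_set.items()
    )
    return hits / len(eval_set)


if __name__ == "__main__":
    print("vector search:", accuracy(answer_with_vector_search, EVAL_SET))
    print("hybrid search:", accuracy(answer_with_hybrid_search, EVAL_SET))
```

Holding the question set fixed is what makes the two numbers comparable; exact match is too strict for free-form answers, which is where a more forgiving grader becomes necessary.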
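  4. For example, from a book I read recently

    Book: 数字のセンスを磨く〜データの読み方・活かし方 (光文社新書 1241), published 2023/2/15, by Junya Tsutsui (筒井 淳也).
    What comparison would show, in numbers, that "fewer couples choose to live with their parents after marrying"?
    → Easy, surely. Just compare the trend in the share of multi-generation households (stem families).
    → No: more people never marry in the first place, so the decline may be due to that. Unless that is accounted for, the comparison is not fair.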
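  5. For example, from a book I read recently

    → Then look at the share of "people who actually moved in with their parents, among those who married and were in a position to choose whether to do so".
    → That seems good. But when there are many siblings, usually only one of them lives with the parents, so ignoring that is not fair.
    → What's more, the choice is to live with the parents of one spouse or the other, so this rate can never exceed […].
    → What's more, fairness changes depending on how you count people who "live apart for a while after marrying, then move in with their parents in old age".
    → What's more, you have to decide how to treat divorced people. Does "someone who divorced and moved back to the family home with their children" fall under the original question of whether "fewer couples choose to live with their parents after marrying"?
    → I've had my fill!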
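The effect of the denominator in this argument can be seen with a small calculation. The counts below are made up purely for illustration and are not taken from the book or from any survey; the point is only that the same numerator gives very different shares depending on which population it is divided by.

```python
# Made-up counts for illustration only (not from the book or any survey).
counts = {
    "all_households": 1000,
    "married_couples": 600,
    "couples_with_a_living_parent": 400,
    "couples_living_with_a_parent": 100,
}


def share(numerator_key: str, denominator_key: str) -> float:
    """Return numerator / denominator for two named counts."""
    return counts[numerator_key] / counts[denominator_key]


if __name__ == "__main__":
    for denominator in (
        "all_households",
        "married_couples",
        "couples_with_a_living_parent",
    ):
        value = share("couples_living_with_a_parent", denominator)
        print(f"{denominator}: {value:.3f}")
```

Each refinement in the slide corresponds to narrowing the denominator further, and there is no obvious point at which the comparison becomes perfectly fair.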
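  6. Going further: suppose the answer to "How does minimum wage law affect unemployment?" is not

    (A) "Even a high minimum wage does not reduce employment."
    but
    (B) "Minimum wage law does not appear to reduce employment substantially, but many people dispute this evidence."
    • (A) is overconfident, so it is no good.
    • (B) candidly admits its lack of confidence, so it seems good.
    → It makes you want to rate answer (B) as the good one.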
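  7. So, granting that, what if the LLM's output were

    "I (an AI) believe the Earth is flat, but most people around the world, including scientists, disagree."
    Is this really a good answer? Can we accept answers from an AI that believes the Earth is flat, however humble it is about it?
    And is flagging a lack of confidence at every turn really a good answer? Is it really a good thing to produce nothing but output hedged on all sides? Is an AI that answers like a long-winded terms-of-service document the AI we want?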
  8. QAChain

    • An evaluation method using LangChain
    • The idea: have an LLM do the evaluating
    • → Evaluation can be automated
  9. The prompt that actually runs behind the scenes when the preceding code is executed

    You are a teacher grading a quiz.
    You are given a question, the student's answer, and the true answer, and are asked to score the student answer as either CORRECT or INCORRECT.
    (…)
    Grade the student answers based ONLY on their factual accuracy. Ignore differences in punctuation and phrasing between the student answer and true answer. It is OK if the student answer contains more information than the true answer, as long as it does not contain any conflicting statements. Begin!
    QUESTION: 1+1
    STUDENT ANSWER: 3
    TRUE ANSWER: 2
    GRADE:
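The grading step this prompt implies can be sketched without any particular library: fill in the template, send it to a model, and map the reply to a boolean. This is a rough sketch under stated assumptions, not LangChain's implementation. `GRADER_TEMPLATE` is an abridged copy of the prompt quoted on the slide, and `build_grading_prompt`, `parse_grade`, `grade`, and `fake_llm` are hypothetical names; `fake_llm` merely compares the two answers literally so that the example runs offline.

```python
from typing import Callable

# Abridged copy of the grading prompt quoted on the slide.
GRADER_TEMPLATE = (
    "You are a teacher grading a quiz.\n"
    "Grade the student answers based ONLY on their factual accuracy.\n"
    "QUESTION: {question}\n"
    "STUDENT ANSWER: {student_answer}\n"
    "TRUE ANSWER: {true_answer}\n"
    "GRADE:"
)


def build_grading_prompt(question: str, student_answer: str, true_answer: str) -> str:
    """Fill the template with one question/answer pair."""
    return GRADER_TEMPLATE.format(
        question=question,
        student_answer=student_answer,
        true_answer=true_answer,
    )


def parse_grade(completion: str) -> bool:
    """Map the model's reply to True (CORRECT) or False (INCORRECT)."""
    verdict = completion.strip().upper()
    # Check INCORRECT first: a naive substring test for CORRECT would match it too.
    if verdict.startswith("INCORRECT"):
        return False
    if verdict.startswith("CORRECT"):
        return True
    raise ValueError(f"Unexpected grade: {completion!r}")


def grade(llm: Callable[[str], str], question: str, student_answer: str, true_answer: str) -> bool:
    """Ask the given model to grade one answer."""
    return parse_grade(llm(build_grading_prompt(question, student_answer, true_answer)))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; compares the answers literally."""
    fields = dict(line.split(": ", 1) for line in prompt.splitlines() if ": " in line)
    same = fields["STUDENT ANSWER"] == fields["TRUE ANSWER"]
    return "CORRECT" if same else "INCORRECT"


if __name__ == "__main__":
    print(grade(fake_llm, "1+1", "3", "2"))
```

Replacing `fake_llm` with a function that calls a real model is the only change needed to turn this into an automated grader; the parsing step is strict on purpose, so an unexpected reply fails loudly instead of being counted as a grade.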
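  10. How is it actually done out in the world?

    • I did wonder, "Is evaluating an LLM with an LLM really OK?", but apparently GPT-4's hallucinations were in fact evaluated using GPT-4 itself.
    • This is why I explained at such length earlier that we should "not demand a perfect evaluation".
    Source: GPT-4 System Card, OpenAI, March 23, 2023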
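One way to live with an imperfect grader, which the slide does not describe and is offered here only as an assumption, is to spot-check it against a small hand-labeled sample and report how often the two agree. The labels below are made up, and `agreement_rate` is a hypothetical helper.

```python
from typing import List, Tuple


def agreement_rate(pairs: List[Tuple[bool, bool]]) -> float:
    """Fraction of items where the LLM grade and the human label agree."""
    if not pairs:
        raise ValueError("need at least one labeled item")
    return sum(llm_grade == human_label for llm_grade, human_label in pairs) / len(pairs)


# Made-up sample: (LLM grade, human label) for four answers.
sample = [(True, True), (False, False), (True, False), (True, True)]

if __name__ == "__main__":
    print(f"agreement: {agreement_rate(sample):.2f}")
```

A rate well below 1.0 does not make the grader useless; it indicates how much weight a small difference between two system versions can bear.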
  11. References

    • LangChain Evaluation: https://python.langchain.com/en/latest/use_cases/evaluation.html
    • Truthful AI: https://arxiv.org/abs/…
      If you want to think properly about how to define hallucination and lies, this is the one to read first.
    • GPT-4 System Card: https://cdn.openai.com/papers/gpt-4-system-card.pdf
    • openai/evals: https://github.com/openai/evals
    • LangChainDatasets: https://huggingface.co/LangChainDatasets