
FunctionChat-Bench: Evaluating Language Models' Generation Capabilities in Tool-Use Dialogs

kakao
November 01, 2024


#Kanana #Function call #Tool-use dialog #Evaluation method

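Function calling is a technique that connects a language model to external tools so that it can use real-time information or carry out real-world actions.
Developing a technology and improving its performance always requires a good way to measure it.
This talk introduces FunctionChat-Bench, a benchmark developed to evaluate function-calling performance from multiple angles in Korean dialog settings that use external tools.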

Speaker: niki.y
I am Niki, a linguist who works with language data for training and evaluating language models.


Transcript

  1. Expansion of Function Call support

    https://openai.com/index/function-calling-and-other-api-updates | https://ai.google.dev/gemini-api/docs/function-calling | https://github.com/MeetKai/functionary
  2. Accuracy of function call message generation

    User: "Can you tell me about the movie 도둑들 (The Thieves), released in 2012?"
    Tool definition: { "name": "get_movie_details", "description": "Provides detailed information about a movie, such as director, genre, and rating.", "parameters": { "type": "object", "properties": { "title": { "type": "string", "description": "Movie title" }, "year": { "type": "integer", "description": "Movie release year" } }, "required": [ "title" ] } }
    Tool call: { "name": "get_movie_details", "arguments": "{\"title\": \"도둑들\", \"year\": 2012}" }
  3. Accuracy of function call message generation

    Tool call: { "name": "get_movie_details", "arguments": "{\"title\": \"도둑들\", \"year\": 2012}" }
    Tool definition: get_movie_details, with an optional integer "year" and a required string "title" (same definition as on slide 2).
    [Video description text goes here]
    Was the appropriate function selected?
    Was the function name output accurately?
    Was the information corresponding to each parameter extracted appropriately?
    Was each parameter value generated in the defined format?
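
    The checks on this slide (right function, exact function name, correctly extracted parameters, values in the defined format) can be partly automated. The following is a minimal sketch, not the benchmark's own evaluator: `check_tool_call` and `JSON_TYPES` are hypothetical names, the tool definition is the one from the slide with descriptions omitted, and only the name and format checks are covered. Whether the extracted values are the right ones still requires a reference answer or a judge.

```python
import json

# Tool definition from the slide (description strings omitted for brevity).
TOOL = {
    "name": "get_movie_details",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "year": {"type": "integer"},
        },
        "required": ["title"],
    },
}

# JSON Schema type names mapped to Python types (only those used above).
JSON_TYPES = {"string": str, "integer": int}


def check_tool_call(call: dict, tool: dict) -> list[str]:
    """Return the problems found in `call`; an empty list means it passes."""
    if call.get("name") != tool["name"]:
        return [f"unknown function name: {call.get('name')!r}"]
    try:
        args = json.loads(call["arguments"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return ["arguments is not a valid JSON string"]
    if not isinstance(args, dict):
        return ["arguments must decode to a JSON object"]

    problems = []
    schema = tool["parameters"]
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required parameter: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected parameter: {key}")
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems


call = {"name": "get_movie_details",
        "arguments": "{\"title\": \"도둑들\", \"year\": 2012}"}
print(check_tool_call(call, TOOL))
```

    A call that passes these checks can still be wrong, for example by carrying a well-formed but incorrect title, which is why the slide lists extraction accuracy as a separate question.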
  4. The problem we set out to solve

    User: "Kakao developer conference 2024.10.22-24. Save this to the notepad."
    Tool definition: { "type": "function", "function": { "name": "addMemo", "description": "Add a new memo to the notepad", "parameters": { "type": "object", "properties": { "content": { "type": "string", "description": "Memo content" } }, "required": [ "content" ] } } }
    Tool call: { "name": "addMemo", "arguments": "{\"content\": \"Kakao developer conference 2024.10.22-24\"}" }
  6. The problem we set out to solve

    Tool definition: addMemo, with a required string parameter "content" (the memo content).
    User: "Kakao developer conference 2024.10.22-24. Save this to the notepad."
    AI: "There is no memo-saving feature."
  7. The problem we set out to solve

    Tool definition: addMemo, with a required string parameter "content" (the memo content).
    User: "Kakao developer conference 2024.10.22-24. Save this to the notepad."
  8. The problem we set out to solve

    Tool definition: addMemo, with a required string parameter "content" (the memo content).
    User: "Save a memo to the notepad."
  9. The problem we set out to solve

    Tool definition: addMemo, with a required string parameter "content" (the memo content).
    User: "Save a memo to the notepad."
    AI: "What should I save? Please tell me the memo content."
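
    The contrast between the two addMemo requests (one that already contains the memo text, one that does not) comes down to whether every required parameter can be filled. The sketch below illustrates that decision with a plain rule; it is an assumption-laden illustration, not how a language model or the benchmark decides. `next_action` is a hypothetical helper, and `known_args` stands for whatever values have already been extracted from the dialog.

```python
import json

# addMemo definition from the slide, with description strings translated.
ADD_MEMO = {
    "type": "function",
    "function": {
        "name": "addMemo",
        "description": "Add a new memo to the notepad",
        "parameters": {
            "type": "object",
            "properties": {
                "content": {"type": "string", "description": "Memo content"}
            },
            "required": ["content"],
        },
    },
}


def next_action(tool: dict, known_args: dict) -> dict:
    """Hypothetical helper: pick a slot question or a tool call."""
    fn = tool["function"]
    required = fn["parameters"].get("required", [])
    missing = [name for name in required if name not in known_args]
    if missing:
        # A required value is absent, so the user must be asked for it.
        return {"type": "slot_question", "missing": missing}
    return {
        "type": "tool_call",
        "name": fn["name"],
        # Function-call arguments are carried as a JSON-encoded string.
        "arguments": json.dumps(known_args, ensure_ascii=False),
    }


# "Save a memo to the notepad." carries no memo content yet.
print(next_action(ADD_MEMO, {}))
# The conference request already contains the text to store.
print(next_action(ADD_MEMO, {"content": "Kakao developer conference 2024.10.22-24"}))
```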
  10. https://github.com/kakao/FunctionChat-Bench

    USER: I walked from Hangeul Museum today. How long did I walk?
    AI (Slot Question): Please let me know your destination to calculate walking distance and walking time.
    USER: I walked to the Hannam police precinct.
    AI (Tool Call): {"function": {"name": "getWalkInfo", "arguments": "{\"origin\": \"Hangeul Museum\", \"destination\": \"Hannam police precinct\"}"}}
    TOOL: {"origin": "Hangeul Museum", "destination": "Hannam police precinct", "distance": 3.3, "distance_unit": "km", "time": {"hours": null, "minutes": 52}}
    AI (Answer Completion): The walking distance from Hangeul Museum to Hannam police precinct is approximately 3.3km and takes approximately 52 minutes.
    USER: I walked quite a bit. The weather was nice and I felt really good.
    AI (Relevance Detection): That sounds like a refreshing walk! It's great to hear that.
    USER: What's the weather like tomorrow? I'll try walking tomorrow too. Please tell me the weather forecast.
    AI (Relevance Detection): I'm sorry, I can't provide weather forecasts.
  11. Function call message generation (https://github.com/kakao/FunctionChat-Bench)

    The slide shows the Hangeul Museum walking dialog from slide 10 again. The function call message in it is the Tool Call turn: {"function": {"name": "getWalkInfo", "arguments": "{\"origin\": \"Hangeul Museum\", \"destination\": \"Hannam police precinct\"}"}}
  12. Dialog message generation (https://github.com/kakao/FunctionChat-Bench)

    The slide shows the Hangeul Museum walking dialog from slide 10 again. The dialog messages in it are the Slot Question, Answer Completion, and Relevance Detection turns.
  13. Dialog message generation (https://github.com/kakao/FunctionChat-Bench)

    The slide shows the Hangeul Museum walking dialog from slide 10 again.
    Slot Question: when information that is strictly required for a function call is missing, a message that asks the user a question to obtain the missing information.
  14. Dialog message generation (https://github.com/kakao/FunctionChat-Bench)

    The slide shows the Hangeul Museum walking dialog from slide 10 again.
    Answer Completion: a message that conveys the result of a specific function call to the user.
  15. Dialog message generation (https://github.com/kakao/FunctionChat-Bench)

    The slide shows the Hangeul Museum walking dialog from slide 10 again.
    Relevance Detection: a response message to input that is unrelated to tool use, or a refusal message for a request that relates to a function that cannot be called.
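  16. FunctionChat-Bench data

    [Table comparing the Singlecall and Dialog datasets; the numeric values on this slide were not preserved.]
    Number of evaluation items: given per dataset.
    Input type: Singlecall uses single-turn utterances; Dialog uses multi-turn dialogs.
    Output type: Singlecall is Tool Call only; Dialog covers Tool Call, Answer Completion, Slot Question, and Relevance Detection.
    Function list: Singlecall groups the lists by count and similarity (exact, random, close, random, close); Dialog uses a count within a range, with no similarity rule.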
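  17. Evaluation example

    [Score tables; the scores and model version numbers were not preserved. Singlecall: columns exact, random, close, random, close, AVG, with rows for three GPT models (one "o" model and two "turbo" models). Dialog: columns Tool Call, Answer Completion, Slot Question, Relevance Detection, AVG, with rows for the same three GPT models.]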
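  18. Evaluation example

    [Singlecall score tables; the scores and model version numbers were not preserved. Columns exact, random, close, random, close, AVG. One table has rows for three GPT models (one "o" model and two "turbo" models); the other has rows for three Gemini models (a pro preview, a flash preview, and a pro model).]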
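  19. Evaluation example

    [Dialog score tables; the scores and model version numbers were not preserved. Columns Tool Call, Answer Completion, Slot Question, Relevance Detection, AVG. One table has rows for three GPT models (one "o" model and two "turbo" models); the other has rows for three Gemini models (a pro preview, a flash preview, and a pro model).]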
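  20. Errors in Tool Call

    Input: user: "What is 7번방의 선물 about?"
    Ground Truth: {"name": "get_movie_details", "arguments": "{\"title\": \"7번방의 선물\"}"}
    Model Output: {"name": "gift_search_product", "arguments": "{\"query\": \"7번방의 선물\"}"}

    Input: user: "Five of us are going to split a gift that costs 37,000 won. Tell me how much each person should pay."
    Ground Truth: {"name": "calculate_bill_split", "arguments": "{\"bill_total\": …, \"num_people\": …}"}
    Model Output: {"name": "calculate_bill_split", "arguments": "{\"bill_total\": …, \"num_people\": …}"}
    (The numeric argument values on this slide were not preserved.)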
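  21. Errors in Answer Completion

    Input: user: "Add a grocery list memo."
    assistant: "Okay. Please tell me the memo content."
    user: "Dish soap, moisturizer, yogurt, green grapes"
    assistant: {"name": "addMemo", "arguments": "{\"title\": \"Grocery list\", \"content\": \"Dish soap, moisturizer, yogurt, green grapes\"}"}
    tool: {"name": "addMemo", "content": "{\"status\": \"success\"}"}
    Ground Truth: "The memo has been added."
    Model Output: "Okay. Please tell me the title and content of the appointment to add to the calendar."

    Input: user: "How's the weather in Pangyo right now?"
    assistant: {"name": "informWeather", "arguments": "{\"location\": \"Pangyo\"}"}
    tool: {"name": "informWeather", "content": "{\"temperature\": …, \"weatherDescription\": \"Mostly cloudy\"}"}
    Ground Truth: "The current temperature in Pangyo is [N] degrees and it is mostly cloudy."
    Model Output: "Pangyo is [N] degrees right now and the weather is clear."
    (The temperature values on this slide were not preserved.)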
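  22. Errors in Slot Question

    Input: user: "What is basal metabolic rate? Explain it briefly."
    assistant: "Basal metabolic rate (BMR) is the minimum amount of energy that our body uses, over a set number of hours, to sustain life and basic bodily functions when we are doing no activity at all, that is, in a state of complete rest."
    user: "I'm curious about my basal metabolic rate."
    Ground Truth: "If you tell me your weight, height, age, and sex, I can calculate your basal metabolic rate."
    Model Output: {"name": "calculateBMR", "arguments": "{\"weight\": …, \"height\": …, \"age\": …, \"gender\": \"male\"}"}

    Input: user: "Set a D-day for [date]."
    assistant: "Okay. Please tell me the name of the D-day and whether to count the D-day date itself as day 1."
    user: "Wedding"
    Ground Truth: "Should the D-day date be counted as day 1?"
    Model Output: "The wedding D-day has been set. [N] days remain until [date]."
    (The dates and numeric values on this slide were not preserved.)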
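  23. Errors in Relevance Detection

    Input: user: "Can you order me a pizza?"
    Ground Truth: "I can't order pizza."
    Model Output: "Sure. What kind of pizza should I order?"

    Input: user: "Can you order me a pizza?"
    Ground Truth: "I can't order pizza."
    Model Output: {"name": "order_pizza", "arguments": "{\"size\": \"medium\", \"crust\": \"thin\", \"toppings\": [\"pepperoni\", \"mushrooms\", \"onions\"]}"}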
  24. Q&A