

Yano
October 31, 2023

[Reading-group slides] ReAct: Synergizing Reasoning and Acting in Language Models / Tree of Thoughts: Deliberate Problem Solving with Large Language Models

These are the slides I presented at our lab's reading group. They introduce the following two papers:
- ReAct: Synergizing Reasoning and Acting in Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models


Transcript

  1. October 31, M2 student Yano
    ReAct: Synergizing Reasoning and Acting in Language Models
    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao (ICLR 2023)
    Tree of Thoughts: Deliberate Problem Solving with Large Language Models
    Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan (NeurIPS 2023)
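  2. Overview
    • ReAct
      • Generates reasoning and actions in an interleaved manner
      • Using external tools through actions helps prevent hallucination
    • Tree of Thoughts
      • By backtracking and looking ahead (deliberating), it can carry out complex reasoning that was difficult for methods such as CoT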
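  3. ReAct: Synergizing Reasoning and Acting in Language Models
    • Proposes "ReAct", a prompting method that has the LLM generate reasoning and actions in an interleaved manner
      • Reasoning plans actions and processes their results
      • Acting accesses a knowledge base or environment to obtain information
    • Using correct information from external sources suppresses hallucination
      • Hallucination: an LLM producing a plausible-sounding falsehood
    [Figure: example ReAct trace. Question: "Is Halloween a festival that has continued for more than … years?" → Reasoning: "First, let's look up the origin of Halloween." → Action: search "Halloween origin" → Observation: "Halloween originated more than … years ago. …" → Reasoning: "…, so the answer is 'yes'."]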
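The interleaving on this slide can be sketched as a simple Thought → Action → Observation loop. This is a minimal sketch, not the paper's implementation. `scripted_policy` is a hypothetical stand-in for the LLM that replays a fixed trace. `TOY_KB` is a hypothetical two-entry dictionary with illustrative snippets, standing in for the Wikipedia API. The `Search[...]`/`Finish[...]` action names are assumed for illustration. The question is the HotpotQA example from slide 4.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy knowledge base (illustrative snippets) standing in for the Wikipedia API.
TOY_KB: Dict[str, str] = {
    "Arthur's Magazine": "Arthur's Magazine was an American literary periodical first published in 1844.",
    "First for Women": "First for Women is a woman's magazine started in 1989.",
}

def search(entity: str) -> str:
    """Acting: look the entity up in the external source."""
    return TOY_KB.get(entity, f"Could not find [{entity}].")

# A policy maps the trajectory so far to (thought, action name, action argument).
Policy = Callable[[List[str]], Tuple[str, str, str]]

def scripted_policy(trajectory: List[str]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for the LLM: replays a fixed reasoning trace."""
    n_obs = sum(line.startswith("Observation") for line in trajectory)
    if n_obs == 0:
        return ("I need to find when Arthur's Magazine was started.", "Search", "Arthur's Magazine")
    if n_obs == 1:
        return ("Arthur's Magazine started in 1844. Now I need First for Women.", "Search", "First for Women")
    return ("1844 is earlier than 1989, so Arthur's Magazine was started first.", "Finish", "Arthur's Magazine")

def react(question: str, policy: Policy, tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Interleave Thought -> Action -> Observation until Finish or the step budget runs out."""
    trajectory = [f"Question: {question}"]
    for step in range(1, max_steps + 1):
        thought, action, arg = policy(trajectory)        # reasoning: plan the next action
        trajectory.append(f"Thought {step}: {thought}")
        trajectory.append(f"Action {step}: {action}[{arg}]")
        if action == "Finish":
            return arg, trajectory
        observation = tools[action](arg)                 # acting: fetch external information
        trajectory.append(f"Observation {step}: {observation}")
    return None, trajectory                              # no answer within max_steps

answer, trace = react(
    "Which magazine was started first, Arthur's Magazine or First for Women?",
    scripted_policy, {"Search": search},
)
print("\n".join(trace))
```

Each observation is appended to the trajectory that the next reasoning step sees. This is how externally retrieved facts, rather than the model's memory alone, end up driving the answer.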
  4. Knowledge-intensive reasoning tasks
    HotpotQA
    • Multi-hop reasoning task
    • Requires reasoning over two or more Wikipedia articles
    • Example: "Which magazine was started first, Arthur's Magazine or First for Women?"
    FEVER
    • Fact-verification task
    • For each claim, consult Wikipedia articles and assign one of the labels [Supported, Refuted, NotEnoughInfo]
    • Example: "Nikolaj Coster-Waldau worked with the Fox Broadcasting Company." → Supported
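  5. Baselines for comparison
    Chain of Thought (CoT) [1]
    • Also outputs intermediate reasoning steps, which enables complex reasoning
    • Uses no actions and only reasons (relies solely on the model's internal knowledge)
    Chain of Thought self-consistency (CoT-SC) [2]
    • Takes a majority vote over multiple CoT runs
    • Runs CoT 21 times with the temperature set to 0.7 and decides the answer by majority vote
    [1] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
    [2] Self-Consistency Improves Chain of Thought Reasoning in Language Models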
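The CoT-SC procedure above comes down to sampling several reasoning chains and voting on their final answers. This is a minimal sketch. It assumes a hypothetical `sample_answer(temperature)` callable that runs one CoT prompt and returns only the final answer string. The LLM call itself is not shown; the usage line substitutes a fake, deterministic sampler.

```python
import itertools
from collections import Counter
from typing import Callable

def self_consistency(
    sample_answer: Callable[[float], str],
    n_samples: int = 21,
    temperature: float = 0.7,
) -> str:
    """CoT-SC: sample n reasoning chains at a nonzero temperature and majority-vote the answers."""
    answers = [sample_answer(temperature) for _ in range(n_samples)]
    # most_common orders equal counts by first occurrence, so ties go to the earliest answer
    answer, _votes = Counter(answers).most_common(1)[0]
    return answer

# Fake sampler (assumption): cycles through canned answers instead of calling an LLM.
fake_outputs = itertools.cycle(["Arthur's Magazine", "First for Women", "Arthur's Magazine"])
print(self_consistency(lambda t: next(fake_outputs)))  # Arthur's Magazine (14 of 21 votes)
```

The nonzero temperature matters because identical greedy samples would make the vote pointless.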
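  6. ReAct vs. CoT on HotpotQA
    • CoT's hallucination is more serious than ReAct's (the False positive and Hallucination categories)
    • ReAct's reasoning steps are less flexible (Reasoning error)
      • It often gets stuck in loops that repeat the same thought and action
    • When a ReAct search fails, its reasoning derails (Search result error)
    [Table: successes and failures randomly sampled from HotpotQA, with their causes analyzed by hand]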
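  7. Results on knowledge-intensive reasoning tasks
    • ReAct+CoT-SC > CoT-SC, ReAct
    • CoT-SC (sample=21) < ReAct+CoT-SC (sample=5)
    • Combining internal and external knowledge is useful
    [Figure: number of CoT-SC samples (the size of the majority vote) vs. performance]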
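The slide does not spell out how ReAct and CoT-SC are combined. The sketch below assumes the two back-off heuristics described in the ReAct paper. The first falls back from ReAct to CoT-SC when ReAct returns no answer within its step budget. The second falls back from CoT-SC to ReAct when the majority answer gets fewer than half of the votes. `run_react` and `run_cot_sc` are hypothetical placeholders for the two methods.

```python
from collections import Counter
from typing import Callable, List, Optional

def react_then_cot_sc(run_react: Callable[[], Optional[str]], run_cot_sc: Callable[[], str]) -> str:
    """ReAct -> CoT-SC: try external knowledge first, fall back to internal knowledge."""
    answer = run_react()          # None means no Finish[...] within the step budget
    return answer if answer is not None else run_cot_sc()

def cot_sc_then_react(cot_answers: List[str], run_react: Callable[[], Optional[str]]) -> Optional[str]:
    """CoT-SC -> ReAct: trust the vote only if the majority answer has at least half the samples."""
    answer, votes = Counter(cot_answers).most_common(1)[0]
    if votes < len(cot_answers) / 2:
        return run_react()        # internal knowledge is not confident: consult external sources
    return answer

# With 5 CoT samples, as in the sample=5 setting on the slide:
print(cot_sc_then_react(["A", "A", "A", "B", "C"], lambda: "from ReAct"))  # A (3 of 5 votes)
print(cot_sc_then_react(["A", "A", "B", "C", "D"], lambda: "from ReAct"))  # from ReAct (2 of 5 votes)
```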
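  8. Decision-making tasks
    ALFWorld
    • An agent explores a household through text commands and completes six types of complex tasks
    • There are more than 50 possible actions, so brute-force exploration is difficult
    • Example: "You are in the middle of a room. Looking quickly around you, you see a drawer 2, a shelf 5, …, and a drawer 4. Your task is to: put some vase in safe."
      • Neither the vase nor the safe appears in the input, so the agent must explore using common sense (a vase is probably on a shelf)
    WebShop
    [2] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents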
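  9. Decision-making tasks: WebShop
    • An online-shopping environment built from real products, in which the agent buys an item based on a user's instruction
    • The agent can use buttons such as search and "back"
    • Evaluated by Score (how well the answer meets the requirements) and Success rate (the fraction of answers that are a product meeting all the requirements)
    • Example: "get me a sixteen pack of apple cinnamon freeze dried banana chips, and price lower than 50.00 dollars"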
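The two metrics described above can be made concrete with a deliberately simplified sketch. It assumes each episode is summarized only by how many of the instruction's requirements the purchased product satisfies. WebShop's actual reward formula is more detailed and is not reproduced here. The `Episode` class and the split of the example instruction into four requirements are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Episode:
    satisfied: int  # requirements met by the purchased product
    total: int      # requirements stated in the instruction

def webshop_style_metrics(episodes: List[Episode]) -> Tuple[float, float]:
    """Return (score, success rate), both in percent, under the simplified definitions above."""
    fractions = [e.satisfied / e.total for e in episodes]
    score = 100 * sum(fractions) / len(fractions)                              # partial credit
    success_rate = 100 * sum(e.satisfied == e.total for e in episodes) / len(episodes)  # all-or-nothing
    return score, success_rate

# Hypothetical split of the banana-chips instruction into 4 requirements:
# sixteen pack, apple cinnamon, freeze dried, price < 50.00 dollars.
episodes = [Episode(4, 4), Episode(3, 4), Episode(2, 4)]
score, success_rate = webshop_style_metrics(episodes)
print(score, round(success_rate, 1))  # 75.0 33.3
```

Score rewards near misses, while Success rate counts only perfect purchases. This is why the two numbers can rank methods differently.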
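  10. Baselines for comparison
    ALFWorld
    • BUTLER [1]: a model trained by imitation learning on 100,000 demonstrations per task type
    • ReAct-IM: ReAct rewritten in the Inner Monologue style [2]
      • The agent generates the observed environment state and the subgoals it still needs to achieve as an inner monologue
    WebShop
    • IL: a model trained by imitation learning on 1,012 demonstrations
    • IL+RL: a model trained by imitation learning plus reinforcement learning on 10,587 demonstrations
    [1] ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
    [2] Inner Monologue: Embodied Reasoning through Planning with Language Models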
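  11. Results on decision-making tasks
    • ReAct > Act
      • Act cannot decompose the goal or keep track of the environment
    • On average, ReAct > BUTLER
      • Beats a method that relies on a large amount of training data
    [Table: success rate per task type on ALFWorld]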
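  12. Results on decision-making tasks
    • ReAct > ReAct-IM
      • The Inner Monologue style generates only observations of the environment and the subgoals to achieve
      • It struggles to apply common sense well and build up reasoning steps
      • Example: task: put a clean knife in countertop.
        think: To solve the task, I need to find and take a clean knife, then put it in countertop.
        → The agent finds the knife but, believing it is already clean, keeps trying to put it on the countertop
    [Table: success rate per task type on ALFWorld]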
  13. Results on decision-making tasks
    • ReAct > Act > IL, IL+RL
      • ReAct generates sensible next actions from noisy observations
      • Example: "For 'space-saving ottoman bench for living room', the item has options '39x18x18inch' and 'blue' and seems good to buy."
    [Table: scores on WebShop]
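  14. ReAct: Synergizing Reasoning and Acting in Language Models
    • Proposes a prompting method that has the LLM generate reasoning and actions in an interleaved manner
      • Reasoning plans the next action and processes the results of actions
      • Acting accesses a knowledge base or environment to obtain additional information
    • On question answering and fact verification, interacting with a Wikipedia API overcomes hallucination
    • On two interactive decision-making benchmarks, 1-shot/2-shot prompting substantially outperforms imitation- and reinforcement-learning baselines trained on large amounts of data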
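  15. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
    • Language-model inference is restricted to left-to-right generation
      • It fails on tasks that require lookahead or in which the initial decisions are pivotal
    ➡ Proposes "Tree of Thoughts", a framework that makes the intermediate steps of problem solving ("thoughts") searchable
      • Considers multiple different reasoning paths and evaluates them itself to decide on the next action (deliberation)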
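  16. Tree of Thoughts reasoning example
    [Figure: prompt "Combine the following numbers with the four arithmetic operations to make …". From the Input, the tree branches into candidate thoughts marked ⭕ (promising) or ✖ (dead end), while the model asks itself "Which line of reasoning looks promising…?"]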
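  17. Tree of Thoughts
    • Decomposes a complex reasoning problem into subproblems and searches while evaluating which reasoning paths are promising
    • Four options that can be changed per task:
      A. Thought granularity: how large a step each thought is
      B. Thought generation: whether or not previous thoughts are included in the input
      C. Node evaluation: score each state independently / compare the candidates against each other
         • Uses lookahead and common sense
      D. Tree search algorithm: BFS / DFS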
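The four options can be read as the parameters of a generic search loop. This is a minimal sketch of the BFS variant, assuming `propose` (options A and B) and `value` (option C, independent scoring) are supplied by the caller. In ToT both would be LLM prompts. The toy `propose_digits`/`closeness_to_7` problem is a hypothetical stand-in.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")

def tot_bfs(
    root: S,
    propose: Callable[[S], Sequence[S]],   # options A + B: how the next thoughts are generated
    value: Callable[[S], float],           # option C: evaluate each state independently
    depth: int,
    breadth: int,                           # option D: BFS keeps the best `breadth` states per level
) -> List[S]:
    frontier: List[S] = [root]
    for _ in range(depth):
        children = [child for state in frontier for child in propose(state)]
        if not children:
            break
        # sorted() is stable (also with reverse=True), so ties keep proposal order
        frontier = sorted(children, key=value, reverse=True)[:breadth]
    return frontier

# Hypothetical toy task: build a 3-digit sequence (digits 1-3) whose sum is 7.
def propose_digits(state: tuple) -> List[tuple]:
    return [state + (d,) for d in (1, 2, 3)]

def closeness_to_7(state: tuple) -> float:
    return -abs(7 - sum(state))

print(tot_bfs((), propose_digits, closeness_to_7, depth=3, breadth=2))  # [(3, 3, 1), (3, 2, 2)]
```

Swapping the list-and-truncate step for a stack with backtracking gives the DFS variant of option D.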
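  18. Baselines on Game of 24
    IO
    • Standard prompt that takes the problem as input and outputs the answer (5-shot)
    CoT
    • Each example also includes suitable intermediate arithmetic equations that make 24 from the four numbers (5-shot)
      • 13 - 9 = 4 (left: 4 4 10); 10 - 4 = 6 (left: 4 6); 4 * 6 = 24 (left: 24)
    CoT-SC
    • Runs CoT 100 times and takes a majority vote
    IO + refine
    • Iterates the reasoning for 10 rounds
    • When the answer is wrong, the model is given the history and an instruction to correct it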
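The CoT format above (one equation per step, followed by the numbers left) is easy to verify mechanically. This minimal checker is an illustration only, not part of the paper's evaluation code. It assumes every step uses non-negative integers and is formatted exactly like the example. It uses exact `Fraction` arithmetic so that divisions are checked without rounding.

```python
import operator
import re
from collections import Counter
from fractions import Fraction
from typing import Sequence

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
STEP = re.compile(r"\s*(\d+)\s*([-+*/])\s*(\d+)\s*=\s*(\d+)\s*\(left:\s*([\d\s]+)\)\s*")

def check_trace(numbers: Sequence[int], trace: str) -> bool:
    """True if every step is correct arithmetic on available numbers and the trace ends at 24."""
    remaining = Counter(numbers)
    for step in trace.split(";"):
        m = STEP.fullmatch(step)
        if m is None:
            return False                                  # malformed step
        a, op, b, result = int(m[1]), m[2], int(m[3]), int(m[4])
        if op == "/" and b == 0:
            return False
        if OPS[op](Fraction(a), Fraction(b)) != result:
            return False                                  # arithmetic error
        used = Counter([a, b])
        if any(remaining[n] < k for n, k in used.items()):
            return False                                  # used a number that is not available
        remaining -= used                                 # keeps only positive counts
        remaining[result] += 1
        if remaining != Counter(int(x) for x in m[5].split()):
            return False                                  # the "left:" list does not match
    return remaining == Counter([24])

example = "13 - 9 = 4 (left: 4 4 10); 10 - 4 = 6 (left: 4 6); 4 * 6 = 24 (left: 24)"
print(check_trace([4, 9, 10, 13], example))  # True
```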
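  19. ToT setup for Game of 24
    • The four options from slide 17 are set as follows:
      A. Thought granularity: a single arithmetic operation
      B. Thought generation: previous thoughts are not fed in; the model only generates a simple equation
      C. Node evaluation: "Can 24 still be reached?" [sure/maybe/impossible]
         • Uses common sense to prune candidates that are too large or too small relative to 24
      D. Tree search algorithm: breadth-first search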
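The setup above can be sketched end to end with rule-based stand-ins for the LLM calls. `propose` enumerates every one-operation thought, which corresponds to options A/B. `evaluate` is a hypothetical replacement for the LLM's sure/maybe/impossible judgment (option C). It does an exhaustive lookahead on small states, which an LLM only approximates with common sense. The `VALUE` weights and `breadth=5` are assumptions.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

State = Tuple[Fraction, ...]                                 # numbers still left, kept sorted
VALUE = {"sure": 20.0, "maybe": 1.0, "impossible": 0.001}    # arbitrary weights (assumption)

def propose(state: State) -> List[Tuple[str, State]]:
    """Options A/B: each thought is one arithmetic operation on two of the remaining numbers."""
    thoughts = []
    for i, j in combinations(range(len(state)), 2):
        a, b = state[i], state[j]
        rest = [x for k, x in enumerate(state) if k not in (i, j)]
        results = [(f"{a} + {b}", a + b), (f"{a} * {b}", a * b),
                   (f"{a} - {b}", a - b), (f"{b} - {a}", b - a)]
        if b != 0:
            results.append((f"{a} / {b}", a / b))
        if a != 0:
            results.append((f"{b} / {a}", b / a))
        for expr, value in results:
            new_state = tuple(sorted(rest + [value]))
            left = " ".join(str(x) for x in new_state)
            thoughts.append((f"{expr} = {value} (left: {left})", new_state))
    return thoughts

def reachable(state: State) -> bool:
    """Exhaustive lookahead: can the remaining numbers still make 24?"""
    if len(state) == 1:
        return state[0] == 24
    return any(reachable(child) for _, child in propose(state))

def evaluate(state: State) -> str:
    """Option C: hypothetical rule-based stand-in for the LLM's sure/maybe/impossible judgment."""
    if len(state) <= 3:                          # small enough to check exactly
        return "sure" if reachable(state) else "impossible"
    return "maybe"

def tot_game24(numbers: List[int], breadth: int = 5) -> Optional[List[str]]:
    """Option D: breadth-first search keeping the `breadth` best states at each level."""
    frontier: List[Tuple[List[str], State]] = [([], tuple(sorted(Fraction(n) for n in numbers)))]
    for _ in range(len(numbers) - 1):
        candidates = [(history + [thought], child)
                      for history, state in frontier
                      for thought, child in propose(state)]
        candidates.sort(key=lambda c: VALUE[evaluate(c[1])], reverse=True)
        frontier = candidates[:breadth]
    for history, state in frontier:
        if state == (24,):
            return history
    return None

print(tot_game24([4, 9, 10, 13]))
```

Because `propose` sees only the remaining numbers, previous thoughts are not fed back in, matching option B on the slide.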
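  20. Results on Game of 24
    • ToT achieves a higher success rate for the same number of visited nodes (it searches along promising paths)
      • For IO and CoT, each complete run is counted as a node
    • With CoT, 60% of runs already go wrong at the first equation
      • Methods that cannot backtrack or look ahead have inherent limits
    [Figures: visited nodes vs. success rate; failure rate at each step]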
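  21. ToT setup for Creative Writing
    • The four options from slide 17 are set as follows:
      A. Thought granularity: a plan for what kind of passage to write / the written passage
      B. Thought generation: the passage is written using the previous thought (the plan)
      C. Node evaluation: five candidates are generated for each thought at once, and the best one is selected
      D. Tree search algorithm: depth = 2, breadth = 1
         • Depth is 2 because there are two stages (planning and writing); breadth is 1 because only the best thought is kept after each evaluation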
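With depth 2 and breadth 1, the search reduces to "generate five, keep one" applied twice. In this minimal sketch, the selection is modeled as a majority over several judge votes. The number of votes (`n_votes=5`) and the `stub_generate`/`stub_vote` functions are hypothetical stand-ins for LLM calls.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Generate = Callable[[str, Optional[str], int], str]   # (task, plan or None, sample index) -> text
Vote = Callable[[List[str], int], int]                # (candidates, vote index) -> chosen index

def pick_by_vote(candidates: List[str], vote: Vote, n_votes: int) -> str:
    """Option C: cast several votes over the candidates and keep the most-voted one."""
    tally = Counter(vote(candidates, k) for k in range(n_votes))
    best_index, _ = tally.most_common(1)[0]
    return candidates[best_index]

def tot_creative_writing(task: str, generate: Generate, vote: Vote,
                         n_candidates: int = 5, n_votes: int = 5) -> Tuple[str, str]:
    """Depth 2 (plan, then passage), breadth 1 (keep only the best thought at each level)."""
    plans = [generate(task, None, i) for i in range(n_candidates)]
    best_plan = pick_by_vote(plans, vote, n_votes)
    # Option B: each passage is written conditioned on the chosen plan
    passages = [generate(task, best_plan, i) for i in range(n_candidates)]
    best_passage = pick_by_vote(passages, vote, n_votes)
    return best_plan, best_passage

# Hypothetical stubs standing in for LLM calls:
def stub_generate(task: str, plan: Optional[str], i: int) -> str:
    return f"plan-{i}" if plan is None else f"{plan} -> passage-{i}"

def stub_vote(candidates: List[str], k: int) -> int:
    return 3 if k % 2 == 0 else 1   # votes: 3, 1, 3, 1, 3

print(tot_creative_writing("write a coherent passage", stub_generate, stub_vote))
# ('plan-3', 'plan-3 -> passage-3')
```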
  22. Baselines on Mini Crosswords
    IO
    • Takes the clues as input and outputs the filled crossword (5-shot)
      [Example prompt: "Solve 5x5 mini crosswords. Given an input of 5 horizontal clues and 5 vertical clues, generate an output of 5 rows, where each row is 5 letter separated by space."
       Input: h1. A lunar valley / h2. A fatty oil / h3. To entice / h4. To lower; to reduce / h5. A solitary person / v1. According to the roster / v2. Another name for Port-Francqui / v3. An illicit lover; a European lake / v4. To lisp / v5. To come in
       Output: RILLE / OLEIN / TEMPT / ABASE / LONER]
    CoT
    • Shows the word corresponding to each clue (5-shot)
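  23. ToT setup for Mini Crosswords
    • The four options from slide 17 are set as follows:
      A. Thought granularity: guessing a word from a clue
      B. Thought generation: guess a word using the previous thoughts
      C. Node evaluation: "Does this word block the words for the other clues?" [sure/maybe/impossible]
         • If another word can no longer be filled in (impossible), return to the previous thought
      D. Tree search algorithm: depth-first search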
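The DFS-with-pruning setup above can be sketched on the puzzle from slide 22. In ToT the candidate words and the sure/maybe/impossible judgment come from the LLM. Here the candidate lists are hypothetical, and "TEASE" and "ABATE" are invented distractors. `feasible` is a rule-based stand-in for the "impossible" check: it fails when some unfilled slot no longer admits any candidate.

```python
from typing import Dict, List, Optional, Tuple

Slot = Tuple[str, int]                 # ("h", row) or ("v", column)
Grid = List[List[Optional[str]]]

def place(grid: Grid, slot: Slot, word: str) -> Optional[Grid]:
    """Write `word` into `slot`; return None if it clashes with letters already placed."""
    kind, idx = slot
    new = [row[:] for row in grid]
    for k, ch in enumerate(word):
        r, c = (idx, k) if kind == "h" else (k, idx)
        if new[r][c] not in (None, ch):
            return None
        new[r][c] = ch
    return new

def feasible(grid: Grid, slots: List[Slot], candidates: Dict[Slot, List[str]]) -> bool:
    """Stand-in for the 'impossible' judgment: every unfilled slot must still admit some candidate."""
    return all(any(place(grid, s, w) is not None for w in candidates[s]) for s in slots)

def dfs(grid: Grid, slots: List[Slot], candidates: Dict[Slot, List[str]]) -> Optional[Grid]:
    if not slots:
        return grid
    slot, rest = slots[0], slots[1:]
    for word in candidates[slot]:
        new = place(grid, slot, word)
        if new is None or not feasible(new, rest, candidates):
            continue                   # impossible: prune and try the next word
        solved = dfs(new, rest, candidates)
        if solved is not None:
            return solved
    return None                        # every word failed: backtrack to the previous thought

# Hypothetical candidate words (a real ToT run would get these from the LLM).
candidates: Dict[Slot, List[str]] = {
    ("h", 0): ["RILLE"], ("h", 1): ["OLEIN"], ("h", 2): ["TEASE", "TEMPT"],
    ("h", 3): ["ABATE", "ABASE"], ("h", 4): ["LONER"],
    ("v", 0): ["ROTAL"], ("v", 1): ["ILEBO"], ("v", 2): ["LEMAN"],
    ("v", 3): ["LIPSE"], ("v", 4): ["ENTER"],
}
empty: Grid = [[None] * 5 for _ in range(5)]
solution = dfs(empty, list(candidates), candidates)
if solution is not None:
    print("\n".join("".join(row) for row in solution))
```

"TEASE" is pruned because it puts an A where "LEMAN" needs an M. "ABATE" is pruned because it puts a T where "LIPSE" needs an S.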
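  24. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
    • Proposes "Tree of Thoughts", a framework that makes the intermediate steps of problem solving ("thoughts") searchable
    • Considers multiple different reasoning paths and evaluates them itself to decide on the next action (deliberation)
    • Substantially improves performance on the three proposed tasks that require lookahead or search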