Upgrade to Pro — share decks privately, control downloads, hide ads and more …

認知科学からの視点:満足化によるエミュレーションと,判定問題としての強化学習 - Shibuy...

認知科学からの視点:満足化によるエミュレーションと,判定問題としての強化学習 - Shibuya Synapse 3 - 2018 06Jun 23 Sat - CompCogSci and RL

Tatz Takahashi

June 23, 2018
Tweet

More Decks by Tatz Takahashi

Other Decks in Science

Transcript

  1. ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) Shibuya Synapse #3

    - ݱࡏͷڧԽֶशʹԿ͕଍Γͳ͍ͷ͔ʁ 2018-06-23 Sat ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 1 / 23
  2. Outline 1 ͸͡Ίʹ 2 ࠷ۙͷٞ࿦ 3 ڧԽֶशʹ͓͚Δຬ଍Խ 4 ଟ࿹όϯσΟ οτ໰୊

    5 ٞ࿦ 6 ݶఆ߹ཧੑ 7 ·ͱΊ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 2 / 23
  3. ͸͡Ίʹ ೝ஌Պֶ (Cognitive Science) ৺Λ৘ใॲཧػցͱݟͳ͠ɺͦͷ৘ใදݱ (representation; data structure) ͱਪ࿦ (inferenece;

    algorithm) Λղ໌͢Δɻ ࢀߟ: ࢲͷϒοΫϚʔΫ ܭࢉ࿦తೝ஌Պֶ ਓ޻஌ೳ Vol.31.No.2(2016/3) ೝ஌Պֶ ্هͷߟ͑Λத৺ʹɺओʹҎԼ 6 ͭͷ෼໺͕ڠಇ 1 ఩ֶ 2 ݴޠֶ 3 ਓྨֶ 4 ਆܦՊֶ 5 ਓ޻஌ೳ 6 ৺ཧֶ Wikipedia ΑΓ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 3 / 23
  4. ͸͡Ίʹ ͳͥਓ޻஌ೳʹೝ஌Պֶɾ৺ཧֶ͕ؔ܎ʁ Ԥถͷ AI ݚڀऀͷҙ֎ͱଟ͕͘৺ཧֶՊग़਎ ྫɿ Geoffrey E. Hinton 1970

    ೥έϯϒϦοδେ࣮ݧ৺ཧֶՊଔ (PhD ͸ AI) Michael I. Jordan 1978 ೥ϧΠδΞφभཱେ৺ཧֶՊଔ (PhD: UCSD ೝ஌Պֶ) ೝ஌Պֶɾ৺ཧֶͷ୅දతͳࡶࢽ Psychological Review: ৺ཧֶ෼໺ͷτοϓδϟʔφϧɺ௕͍ཧ࿦త࿦จ͕ࡌΔ Cognitive Science: ೝ஌Պֶ෼໺ͷτοϓδϟʔφϧ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 4 / 23
  5. ͸͡Ίʹ ύʔηϓτϩϯ 1958 χϡʔ ϥ ϧ ωοτ ͷ ݪ ܕ

    By Source (WP:NFCC#4), Fair use, https://en.wikipedia.org/w/index.php?curid=47541432 Psychological Review ࢽ 1958 Psychological Review Vol. 65, No. 6, 19S8 THE PERCEPTRON: A PROBABILISTIC MODEL FOR INFORMATION STORAGE AND ORGANIZATION IN THE BRAIN1 F. ROSENBLATT Cornell Aeronautical Laboratory If we are eventually to understand and the stored pattern. According to ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 5 / 23
  6. ͸͡Ίʹ ࠶ؼܕχϡʔϥϧωοτϫʔΫ 1990 RNN, LSTM, . . . Cognitive Science

    ࢽ 1990 COGNITIVE SCIENCE 14, 179-211 (1990) Finding Structure in Time JEFFREYL.ELMAN University of California, San Diego Time underlies many interesting human behaviors. Thus, the question of how to represent time in connectionist models is very important. One approach Is to rep- resent time implicitly by its effects on processing rather than explicitly (as in a spatial representation). The current report develops a proposal along these lines ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 6 / 23
  7. ͸͡Ίʹ ϘϧπϚϯϚγϯ 1985 RBM, . . . Cognitive Science ࢽ

    1985 COGNITIVE SCIENCE 9, 147-169 (1985) A Learning Algorithm for Boltzmann Machines* DAVID H. ACKLEY GEOFFREY E. HINTON Computer Science Department Carnegie-Mellon University TERRENCE J. SEJNOWSKI ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 7 / 23
  8. ͸͡Ίʹ ڧԽֶश 1981 Q-learning, Actor-Critic, DQN, AlphaGo, . . .

    Psychological Review ࢽ 1981 Psychological Review 1981, Vol. 88, No. 2, 135-170 Copyright 1981 by the American Psychological Association, Inc. 0033-295X/8I/8802-OI35$00.75 Toward a Modern Theory of Adaptive Networks: Expectation and Prediction Richard S. Sutton and Andrew G. Barto Computer and Information Science Department University of Massachusetts—Amherst Many adaptive neural network theories are based on neuronlike adaptive elements that can behave as single unit analogs of associative conditioning. In this article we develop a similar adaptive element, but one which is more closely in accord with the facts of animal learning theory than elements commonly studied in adaptive network research. We suggest that an essential feature of classical ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 8 / 23
  9. ͸͡Ίʹ ʮ஌ೳ৘ใσβΠϯίʔεʯ ൃදऀͷॴଐɺ౦ژిػେֶ ཧ޻ֶ෦ ৘ใγεςϜσβΠϯֶܥ ಺ ஌ೳ৘ใσβΠϯίʔεͰ͸ɺਓؒͷ೴ͱ৺ͷಇ͖ͱͦͷಛੑʹ ͍ͭͯཧղ͠ɺ౷ܭֶɾσʔλ෼ੳʹجͮ͘໰୊ൃݟɾղܾͷͨ Ίͷ৘ใ෼ੳೳྗͱɺਓؒͷ஌ೳΛ୅ସ͠͏ΔೳྗΛ࣋ͬͨγε ςϜͷઃܭɺධՁΛߦ͏ͨΊͷ஌ࣝͱೳྗΛཆ͍·͢ɻਓؒͷ೴

    ͱ৺ͷಇ͖ͱͦͷಛੑΛཧղ͢Δʹ͸ʮίϛϡχέʔγϣϯɾ৺ ཧʯ෼໺ͷՊ໨Λֶͼ·͢ɻ౷ܭֶɾ σʔλ෼ੳʹجͮ͘໰୊ൃ ݟɾղܾͷͨΊͷ৘ใ෼ੳೳྗʹ͍ͭͯ͸ʮ৘ใՊֶʯ෼໺ͷ౷ ܭ΍ଟมྔղੳʹؔ͢ΔՊ໨Λֶͼ·͢ɻਓ޻஌ೳγεςϜͷઃ ܭɺධՁΛߦ͏ͨΊʹ͸ɺ ʮ৘ใγεςϜʯ ʮ৘ใϝσΟΞʯ෼໺ ͷίϯϐϡʔλγεςϜͷݪཧͱγεςϜߏஙʹؔ͢ΔՊ໨ɺਓ ޻஌ೳϓϩάϥϛϯά IɾII Λத৺ͱͨ͠ʮϓϩάϥϛϯάʯ෼໺ ͷՊ໨ΛֶΜͰ͍͖·͢ɻ ʢϤʔϩούͷڞಉݚڀऀ΍๺ถͷֶੜ͔ΒධՁʣ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 9 / 23
  10. ࠷ۙͷٞ࿦ Lake ࿦จɿ ʮਓؒͷΑ͏ʹֶश͠ߟ͑ΔϚγϯ Λ࡞Δʹ͸ʯ Building Machines That Learn and

    Think Like People Behavioral and Brain Sciences, 2017 Lake, Brenden M: Bayesian Program Learning (Science, 2015) Ullman, Tomer D: MIT PD ൃୡϞσϦϯά Tenenbaum, Joshua B: MIT ͷܭࢉ࿦తೝ஌ՊֶͷϦʔμʔ Gershman, Samuel J: ϋʔόʔυ ܭࢉ࿦తਆܦՊֶͱೝ஌Պֶ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 10 / 23
  11. ࠷ۙͷٞ࿦ Behavioral and Brain Sciences ࢽ ߦಈɺ೴ɺೝ஌ܥͷ୅දతϨϏϡʔࢽ, Cambridge Univ. Press

    Α͘ಡ·Ε͓ͯΓ (IF 14 Ҏ্) ௕͍ϨϏϡʔɾҙݟ࿦จͱɺ ଟ਺ͷίϝϯλϦʔʢ൓࿦΍ίϝϯτʣͱɺ ίϝϯλϦʔʹର͢ΔஶऀΒͷճ౴Λಉ࣌ܝࡌ Φʔϓϯͳٞ࿦Λ௨ͯ͡෼໺ͷڞ௨ݟղΛ࡞Δ໾ׂ ͜͜Ͱ঺հ͢Δ࿦จ΋௕͍ʢೋஈ૊Ͱɺຊจ 25 ϖʔδɺ27 ݅ͷί ϝϯλϦʔ͕ 26 ϖʔδɺஶऀΒͷճ౴͕ 10 ϖʔδɺจݙ͕ 13 ϖʔδʣͷͰɺ೔ຊޠ༁Λ࡞੒ͨ͠ʢͲ͔͜Ͱग़൛͠·͢ʣ ɻ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 11 / 23
  12. ࠷ۙͷٞ࿦ ΠϯτϩμΫγϣϯ 1. P•p“jZbŽ• • ȮŠʼnʥ3ɂ¼˞˧ (CNN)—ɗ˜˞˧ (RNN) -íƇ (DQN)

    ,ɝˋ>Ɵõ • 3ȋ0ȋǽ¬̪3E0ʼnʥ—ƐʧH -4™¼.-LƙĜHLʧŕ • ˞ɣɰʼn—ɗ̒ƌɊʼn—AIF3¬̪F3 ıȳLȄˎ¬̪3E0ʼn7ʧH…b•3 %A3ǫ˯ə/ˇʑLǀǼ • ɉɄ3AI3àʧŕ-­ƅDH;- – ȮŠʼnʥ3˞ɣ^N-3ʙĕ – xi˜•˞˧ vs. ‰o‘ȅʈ 7 ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 12 / 23
  13. ࠷ۙͷٞ࿦ DeepMind ͔Βͷ൓࿦ “Building machines that learn and think for

    themselves ” Botvinick, . . . , Legg, and Hassabis (19 ໊) جຊతʹ Lake Βʹಉҙ͢Δ͕ɺ ࣗ཯ੑ autonomy ͕࠷΋ॏཁ όΠΞεͷ࡞ΓࠐΈ human hand engineering ͸ྑ͘ͳ͍ ͨ·ͨ· Lake Βͷॏࢹͨ͠ਓؒͷόΠΞεͱͯ͠ͷ௚ײ෺ཧ ֶ΍௚ײ৺ཧֶʹ͍ͭͯ͸େྔͷՊֶతσʔλͱϞσϧ͕͋ Δ͕ɺଞʹ͍ͭͯ͸͋·Γͳ͍ͷͰɺ࡞ΓࠐΈՄೳͳྖҬ΋ গͳ͍ AI ͕ΑΓෳࡶͳݱ࣮ੈքʹཱͪ޲͔͏΄Ͳʹɺࣗ཯తֶश͕ ॏཁʹͳΔ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 13 / 23
  14. ࠷ۙͷٞ࿦ Lake Βͷ࠶൓࿦ ࣗ཯ੑʹΑΔ૑ൃ emergence ʹ೚ͤΔͷ͸ݱ࣮తͰͳ͍ χϡʔϥϧωοτͷޯ഑ֶशͰɺਓ͕ؒ࣋ͭΑ͏ͳʮཧ࿦ʯ Λ֫ಘͰ͖Δͱ͸ࢥ͑ͳ͍ ࠷ۙͷೝ஌Պֶͷٞ࿦͸ɺҎԼͷΑ͏ͳରཱ࣠Λ௒͑ͭͭ ͋Δ

    ʢਓؒͷॾೳྗ͸ʣੜ·Ε͔ҭ͔ͪ ʢਓ͕ؒ࣋ͭͷ͸ʣཧ࿦͔அย͔ ʢਓؒͷਪ࿦͸ʣه߸త͔४ه߸త͔ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 14 / 23
  15. ࠷ۙͷٞ࿦ ·ͱΊ Tenenbaum Β͸ਓؒͷΑ͏ͳϚγϯΛ࡞Ζ͏ͱ͍ͯ͠Δ Hassabis Β͸ϘτϜΞοϓͰɺਆͷࢠͷΑ͏ͳϚγϯΛ࡞Ζ ͏ͱ͍ͯ͠Δ (Solve intelligence; AGI)

    ͲͪΒʹ΋ൈ͚͍ͯΔ؍఺͸ɺਓؒͷ஌ੑͷࣾձੑͰ͋Δ ࢠڙ͸ɺجຊతͳ਎ମͷՄೳੑʹ͍ͭͯ͸ࢼߦࡨޡʢڧԽֶ शతʣͰֶͼ ͦͷޙ͸ಉ๔΍େਓͷ ໛฿ʹΑֶͬͯͿ ࣾձֶशɺಛʹ໛฿Ͱଟ͘ͷجຊతͳߦಈ୯ҐΛ֫ಘ͠ɺͦ ͷௐ੔ͰڧԽֶशͳͲΛ༻͍Δ ໛฿ʹ͸େผͯ͠ imitation (how ͷ໛฿) ͱ emulation (what ͷ໛฿) ͕͋Δ͕ɺޙऀͷϞσϧ͸΄ͱΜͲͳ͍ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 15 / 23
  16. ڧԽֶशʹ͓͚Δຬ଍Խ ೝ஌ɾࣾձతݱ৅ɿ ʮΤϛϡϨʔγϣϯʯ ʮΤϛϡϨʔγϣϯʯͱͯ͠౷Ұతʹߟ͍͑ͨݱ৅ (emulation: ڝ૪, ுΓ߹͍, ର߅) Կ͔ʹ੒ޭͨ͠ͱ͍͏৘ใ͚ͩͰɺޙଓͷ੒ޭ͕ଓ͘ ΞϝϦΧͰͷݪര։ൃͷ৘ใ͚ͩͰι࿈ͷ։ൃΛ͔ͳΓՃ଎ʁ

    બख A ͕ 100m ૸Ͱ 10 ඵΛ੾Δ΍൱΍ɺଞͷબख΋ 10 ඵΛ੾ Γ࢝ΊΔɻ A ͷ૸๏ͳͲΛࢀߟʹ͠ͳ͍ͱͯ͠΋ 10 ඵΛ੾Γ΍͘͢ Ͱ͖Δͱ৴͡ΔͱɺͰ͖Δ͜ͱ΋͋ΔɻͰ͖ͳ͍ͱ৴ͨ͡Βɺ ·ͣͰ͖ͳ͍ɻ ݚڀͰࢦಋڭһ͕ʮͰ͖Δʯͱଠޑ൑Λԡ͞ͳ͍ͱɺֶੜ͸ͳ ͔ͳ͔Ͱ͖ͳ͍ɻ͔͠͠ࢦಋڭһ͕΍ΓํΛ஌͍ͬͯΔΘ͚Ͱ ͸ͳ͍ʢ஌͍ͬͯΔͳΒ΋͏ݚڀͰ͸ͳ͍ʣ ɻ ੈք͸਺ֶͰهड़Ͱ͖Δͱݴ͏ ෆ߹ཧͳ߹ཧੑͷલఏ ۙ୅Պֶ΁ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 16 / 23
  17. ڧԽֶशʹ͓͚Δຬ଍Խ ΤϛϡϨʔγϣϯΛ׆༻ͨ͠ڧԽֶशͷϞσϧ Ҏ্ͷࣾձతݱ৅͸ɺ͋Δछͷࣾձֶशͱͯ͠ཧղՄೳ ʮݶఆ߹ཧੑʯ΍ʮຬ଍ԽʯʹΠϯεύΠΞ͞ΕͨΞϧΰϦζϜ ͕ͦͷϞσϧͱͳΔ ͋Δछͷʮه࿥ʯ΍ʮୡ੒ਫ४ʯΛ༩͑ΒΕΔͱɺͦΕΛ௒͑Δ ߦಈܥྻΛ୳ࡧɾߏங͢Δ ڧԽֶशͰɺୡ੒ج४Λ༩͑ΒΕΔͱɺຬ଍ͳߦಈܥྻ͕΋ ͠ଘࡏ͢Ε͹ͦΕΛޮ཰తʹൃݟͰ͖ΔΞϧΰϦζϜʢRS ߴ

    ڮ, ߕ໺ & Ӝ্ 2016; ͦͷܗࣜతੑ࣭ ۄ଄ & ߴڮ, JSAI 2018, in prep.; RS-GRC, ߕ໺ et al. JSAI 2018 ࠤௗ et al., ଖా et al.ʣ τοϓμ΢ϯͳୡ੒ج४ʹΑΓɺ७ਮͳϘτϜΞοϓΑΓ΋ ୳ࡧۭ͕ؒѹॖɾߏ଄Խ ͞ΒʹʮΤϛϡϨʔγϣϯΛ׆༻ͨ͠ڧԽֶशͷϞσϧʯʹ ͸ܭࢉ࿦తͳଆ໘͋Γɿ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 17 / 23
  18. ڧԽֶशʹ͓͚Δຬ଍Խ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश ൑ఆ໰୊ (decision problem) ɿ ʢ͋Δ੍໿ू߹ C ͷԼʣx ͕͋

    Δੑ࣭Λຬ͔ͨ͢Λ yes/no Ͱ౴͑Α ࠷దԽ໰୊ɿ͋Δ੍໿ू߹ C ͷԼɺx = argmaxx′ f (x′) Λݟ ͚ͭΑ ࠷దԽ໰୊͸ܾఆ໰୊ʹม׵Ͱ͖Δɻ100m Λ X ඵͰ૸ΕΔ͔ɺ ͱ͍͏ܾఆ໰୊Ͱɺ X Λ {9.0, 9.1, ..., 10.0} ͱͯ͠ 11 ௨Γ΍ͬͯ ΈΕ͹ɺ X ͷ࠷খ஋ͷൣғ͕෼͔Δʢ9.1 Ͱ noɺ9.2 Ͱ yes ͳΒ ͹ɺݶք͸ (9.1, 9.2] ʹ͋Δʣ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 18 / 23
  19. ଟ࿹όϯσΟ οτ໰୊ ೝ஌తຬ଍ԽՁ஋ؔ਺ RS (risk-sensitive satisficing) ߦಈ ai ʹରͯ͠ɺ ͦΕΛࢼͨ͠ճ਺ʢ

    ʮࢼߦྔʯ ʣΛ n(ai ) ɺ ܦݧظ଴஋ʢใुฏۉʣΛ V (ai ) ɺ ૯ࢼߦ਺ʢʹεςοϓ਺ʣΛ N = Σn(ai ) ͱͯ͠ɺຬ଍ԽՁ஋ؔ਺ RS ͸ߦಈ ai ͷՁ஋Λ࣍ͰධՁ͠ greedy ʹબ୒ RS(ai ) = n(ai ) N ( V (ai ) − R ) (1) ͜ͷ RS ஋Λ greedy ʹӡ༻ R ͸ຬ଍Խͷج४ ( V (ai ) − R ) > 0(< 0) ͳΒ ai ͸ຬ଍Ͱ͖Δ (Ͱ͖ͳ͍) બ୒ࢶ ৄ͘͠͸ۄ଄ɾߴڮ (JSAI 2018, 1N1-04) ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 19 / 23
  20. ଟ࿹όϯσΟ οτ໰୊ RS ʹ͍ͭͯ όϯσΟ οτ໰୊Ͱͷੑ࣭ ඞͣຬ଍Խ͢Δอূ͋Γ ຬ଍ԽͷҙຯͰͷ regret ͸༗ݶʹཹ·Δʢී௨͸࠷దͰ΋

    log Ͱ੒௕ʣ ຬ଍Խج४͕ʮ࠷దʯͳΒ࠷దԽ JSAI 2018 ۄ଄ɾߴڮ; ౤ߘ४උத. όϯσΟ οτ໰୊Ͱɺ R ͸νʔτͳ͠ʹࣗ෼ͰܾΊΒΕɺ regret ͸࠷ద (log Φʔμʔɺ UCB ܥΑΓྑ͍) JSAI 2018 ߕ໺ɾߴڮ ڧԽֶशͰɺຬ଍ͳߦಈܥྻΛޮ཰Α͘ൃݟՄೳ JSAI 2017 ڇాɾߕ໺ɾߴڮ JSAI 2018 ࠤௗ et al., ଖా et al. ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 20 / 23
  21. ٞ࿦ ຬ଍ԽͱڧԽֶश ैདྷڧԽֶश͸ಈతܭը๏ɺ࠷ద੍ޚʹجͮ͘࠷దԽ໰୊ ϏσΦήʔϜ (DQN; Atari) ΍ϘʔυήʔϜ (AlphaGo; ғޟ) ͳΒͱ΋͔͘ɺਓ͕ؒ΍͍ͬͯΔΑ͏ͳߦಈֶश͸࠷దԽͱ

    ͯ͠͸೉͍͠ͷͰ͸ͳ͍͔ ೝ஌తຬ଍ԽͷϞσϧʹΑΓɺڧԽֶशλεΫΛ൑ఆ໰୊ͱ ͯ͠ଊ͑௚͢ ਓؒ΍ಈ෺΋ɺཚ਺Λ࢖ͬͨ໢ཏతͳ୳ࡧ͔Βͷ࠷దԽΛ໨ ࢦ͢ͱ͍͏ΑΓɺλεΫʹ͋Δ࿮૊ΈΛ՝ͯ͠ʢྫ͑͹ຬ଍ Խج४ɺͦͯ͠ҼՌϞσϧʣ ɺશͯͷՄೳੑΛߟྀͤͣʹ͏· ͘΍͍ͬͯΔ͸ͣ ਓؒͷ৔߹ʢൃୡʣ ɺࣾձతʹجຊతͳߦಈΛ ໛฿Ͱ֫ಘ ͠ɺ ڧԽֶशͰௐ੔ ͱ͍͏ೋஈ֊Λ౿Ή (ߦಈֶशͷଟஈ֊ཧ࿦ɺ ࣗવͳ֊૚Խ΁) ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 21 / 23
  22. ݶఆ߹ཧੑ ຬ଍Խ (satisficing) ਓؒͷ஌֮ɺਪ࿦ɺߦಈʹ͸ਫ਼౓ɺܭࢉྔɺޮՌʹݶք ݶఆ߹ཧੑ ͦΜͳதͰɺ࠷దԽ͸ଟ͘ͷ৔߹ʹෆՄೳ ࠷దԽɿঢ়گԼͰ࠷ྑͷબ୒ࢶ΍ߦಈܥྻͷબ୒ɾܗ੒ ͦ͏͍ͬͨ৔߹ʹ͸ ຬ଍Խ satisficing

    ͕༗ޮ satisfice = satisfy + suffice ݹయతຬ଍Խɿ ୳ࡧˠຬ଍ ୳ࡧ ͋Δج४Λຬͨ͢Α͏ͳબ୒ࢶ͕ݟ͔͍ͭͬͯͳ ͚Ε͹ɺ৭ʑͳબ୒ࢶΛϥϯμϜʹબΜͰ୳͢ ຬ଍ ҰͭͰ΋ݟ͔ͭΕ͹΋͏ͦΕͰྑ͍ͱͯͦ͠ΕΛ બͼଓ͚Δ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 22 / 23
  23. ·ͱΊ ·ͱΊ AI ͷॏཁݚڀͷ͍͔ͭ͘͸࣮͸৺ཧֶ༝དྷ ৺ͷ৘ใॲཧϝΧχζϜΛߟ͑Δ͜ͱ͸஌తγεςϜͷ։ൃ ʹॏཁ ൚༻ੑͷߴ͍஌తγεςϜͱͯ͠།Ұͷ࣮ྫ ͦͷ௚ײతཧղͰͳ͘ɺՊֶతཧղ͕ॏཁ ೝ஌Պֶ͸ܭࢉ࿦తϨϕϧͷٞ࿦͕ಘҙ ࠷ۙͷॏཁͳٞ࿦ΛऔΓ্͛ͨ

    ܭࢉ࿦తೝ஌Պֶ (MIT த৺) ͱਂ૚ֶशɾਆܦՊֶΛఐࢠͱ ͨ͠ AGI ΁ (DeepMind த৺) ͱ͍͏࣠ ਓؒͷೝ஌ʹֶͼɺࣾձֶशɾ໛฿ֶश΍ҼՌϞσϧߏஙͷ ৽͍͠ΞϧΰϦζϜΛఏҊ ܭࢉ࿦తೝ஌ՊֶͷݚڀάϧʔϓΛ೔ຊͰ্ཱͪ͛ΔͷͰɺ ͥͻ͝ࢀՃΛ ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3) ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश 2018-06-23 Sat 23 / 23