Upgrade to Pro — share decks privately, control downloads, hide ads and more …

認知科学からの視点:満足化によるエミュレーションと,判定問題としての強化学習 - Shibuya Synapse 3 - 2018 06Jun 23 Sat - CompCogSci and RL

認知科学からの視点:満足化によるエミュレーションと,判定問題としての強化学習 - Shibuya Synapse 3 - 2018 06Jun 23 Sat - CompCogSci and RL

Tatz Takahashi

June 23, 2018
Tweet

More Decks by Tatz Takahashi

Other Decks in Science

Transcript

  1. ೝ஌Պֶ͔Βͷࢹ఺ɿ
    ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ
    ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ)
    Shibuya Synapse #3 - ݱࡏͷڧԽֶशʹԿ͕଍Γͳ͍ͷ͔ʁ
    2018-06-23 Sat
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 1 / 23

    View full-size slide

  2. Outline
    1 ͸͡Ίʹ
    2 ࠷ۙͷٞ࿦
    3 ڧԽֶशʹ͓͚Δຬ଍Խ
    4 ଟ࿹όϯσΟ
    οτ໰୊
    5 ٞ࿦
    6 ݶఆ߹ཧੑ
    7 ·ͱΊ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 2 / 23

    View full-size slide

  3. ͸͡Ίʹ
    ೝ஌Պֶ (Cognitive Science)
    ৺Λ৘ใॲཧػցͱݟͳ͠ɺͦͷ৘ใදݱ (representation; data
    structure) ͱਪ࿦ (inferenece; algorithm) Λղ໌͢Δɻ
    ࢀߟ: ࢲͷϒοΫϚʔΫ ܭࢉ࿦తೝ஌Պֶ ਓ޻஌ೳ Vol.31.No.2(2016/3)
    ೝ஌Պֶ
    ্هͷߟ͑Λத৺ʹɺओʹҎԼ
    6 ͭͷ෼໺͕ڠಇ
    1 ఩ֶ
    2 ݴޠֶ
    3 ਓྨֶ
    4 ਆܦՊֶ
    5 ਓ޻஌ೳ
    6 ৺ཧֶ
    Wikipedia ΑΓ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 3 / 23

    View full-size slide

  4. ͸͡Ίʹ
    ͳͥਓ޻஌ೳʹೝ஌Պֶɾ৺ཧֶ͕ؔ܎ʁ
    Ԥถͷ AI ݚڀऀͷҙ֎ͱଟ͕͘৺ཧֶՊग़਎ ྫɿ
    Geoffrey E. Hinton
    1970 ೥έϯϒϦοδେ࣮ݧ৺ཧֶՊଔ (PhD ͸ AI)
    Michael I. Jordan
    1978 ೥ϧΠδΞφभཱେ৺ཧֶՊଔ (PhD: UCSD ೝ஌Պֶ)
    ೝ஌Պֶɾ৺ཧֶͷ୅දతͳࡶࢽ
    Psychological Review:
    ৺ཧֶ෼໺ͷτοϓδϟʔφϧɺ௕͍ཧ࿦త࿦จ͕ࡌΔ
    Cognitive Science:
    ೝ஌Պֶ෼໺ͷτοϓδϟʔφϧ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 4 / 23

    View full-size slide

  5. ͸͡Ίʹ
    ύʔηϓτϩϯ 1958
    χϡʔ ϥ ϧ ωοτ ͷ ݪ ܕ
    By Source (WP:NFCC#4), Fair use,
    https://en.wikipedia.org/w/index.php?curid=47541432
    Psychological Review ࢽ 1958
    Psychological Review
    Vol. 65, No. 6, 19S8
    THE PERCEPTRON: A PROBABILISTIC MODEL FOR
    INFORMATION STORAGE AND ORGANIZATION
    IN THE BRAIN1
    F. ROSENBLATT
    Cornell Aeronautical Laboratory
    If we are eventually to understand and the stored pattern. According to
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 5 / 23

    View full-size slide

  6. ͸͡Ίʹ
    ࠶ؼܕχϡʔϥϧωοτϫʔΫ 1990
    RNN, LSTM, . . .
    Cognitive Science ࢽ 1990
    COGNITIVE SCIENCE 14, 179-211 (1990)
    Finding Structure in Time
    JEFFREYL.ELMAN
    University of California, San Diego
    Time underlies many interesting human behaviors. Thus, the question of how to
    represent time in connectionist models is very important. One approach Is to rep-
    resent time implicitly by its effects on processing rather than explicitly (as in a
    spatial representation). The current report develops a proposal along these lines
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 6 / 23

    View full-size slide

  7. ͸͡Ίʹ
    ϘϧπϚϯϚγϯ 1985
    RBM, . . .
    Cognitive Science ࢽ 1985
    COGNITIVE SCIENCE 9, 147-169 (1985)
    A Learning Algorithm for
    Boltzmann Machines*
    DAVID H. ACKLEY
    GEOFFREY E. HINTON
    Computer Science Department
    Carnegie-Mellon University
    TERRENCE J. SEJNOWSKI
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 7 / 23

    View full-size slide

  8. ͸͡Ίʹ
    ڧԽֶश 1981
    Q-learning, Actor-Critic, DQN, AlphaGo, . . .
    Psychological Review ࢽ 1981
    Psychological Review
    1981, Vol. 88, No. 2, 135-170
    Copyright 1981 by the American Psychological Association, Inc.
    0033-295X/8I/8802-OI35$00.75
    Toward a Modern Theory of Adaptive Networks:
    Expectation and Prediction
    Richard S. Sutton and Andrew G. Barto
    Computer and Information Science Department
    University of Massachusetts—Amherst
    Many adaptive neural network theories are based on neuronlike adaptive elements
    that can behave as single unit analogs of associative conditioning. In this article
    we develop a similar adaptive element, but one which is more closely in accord
    with the facts of animal learning theory than elements commonly studied in
    adaptive network research. We suggest that an essential feature of classical
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 8 / 23

    View full-size slide

  9. ͸͡Ίʹ
    ʮ஌ೳ৘ใσβΠϯίʔεʯ
    ൃදऀͷॴଐɺ౦ژిػେֶ ཧ޻ֶ෦ ৘ใγεςϜσβΠϯֶܥ ಺
    ஌ೳ৘ใσβΠϯίʔεͰ͸ɺਓؒͷ೴ͱ৺ͷಇ͖ͱͦͷಛੑʹ
    ͍ͭͯཧղ͠ɺ౷ܭֶɾσʔλ෼ੳʹجͮ͘໰୊ൃݟɾղܾͷͨ
    Ίͷ৘ใ෼ੳೳྗͱɺਓؒͷ஌ೳΛ୅ସ͠͏ΔೳྗΛ࣋ͬͨγε
    ςϜͷઃܭɺධՁΛߦ͏ͨΊͷ஌ࣝͱೳྗΛཆ͍·͢ɻਓؒͷ೴
    ͱ৺ͷಇ͖ͱͦͷಛੑΛཧղ͢Δʹ͸ʮίϛϡχέʔγϣϯɾ৺
    ཧʯ෼໺ͷՊ໨Λֶͼ·͢ɻ౷ܭֶɾ σʔλ෼ੳʹجͮ͘໰୊ൃ
    ݟɾղܾͷͨΊͷ৘ใ෼ੳೳྗʹ͍ͭͯ͸ʮ৘ใՊֶʯ෼໺ͷ౷
    ܭ΍ଟมྔղੳʹؔ͢ΔՊ໨Λֶͼ·͢ɻਓ޻஌ೳγεςϜͷઃ
    ܭɺධՁΛߦ͏ͨΊʹ͸ɺ
    ʮ৘ใγεςϜʯ
    ʮ৘ใϝσΟΞʯ෼໺
    ͷίϯϐϡʔλγεςϜͷݪཧͱγεςϜߏஙʹؔ͢ΔՊ໨ɺਓ
    ޻஌ೳϓϩάϥϛϯά IɾII Λத৺ͱͨ͠ʮϓϩάϥϛϯάʯ෼໺
    ͷՊ໨ΛֶΜͰ͍͖·͢ɻ
    ʢϤʔϩούͷڞಉݚڀऀ΍๺ถͷֶੜ͔ΒධՁʣ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 9 / 23

    View full-size slide

  10. ࠷ۙͷٞ࿦
    Lake ࿦จɿ
    ʮਓؒͷΑ͏ʹֶश͠ߟ͑ΔϚγϯ
    Λ࡞Δʹ͸ʯ
    Building Machines That Learn and Think Like People
    Behavioral and Brain Sciences, 2017
    Lake, Brenden M: Bayesian Program Learning (Science, 2015)
    Ullman, Tomer D: MIT PD ൃୡϞσϦϯά
    Tenenbaum, Joshua B: MIT ͷܭࢉ࿦తೝ஌ՊֶͷϦʔμʔ
    Gershman, Samuel J: ϋʔόʔυ ܭࢉ࿦తਆܦՊֶͱೝ஌Պֶ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 10 / 23

    View full-size slide

  11. ࠷ۙͷٞ࿦
    Behavioral and Brain Sciences ࢽ
    ߦಈɺ೴ɺೝ஌ܥͷ୅දతϨϏϡʔࢽ, Cambridge Univ. Press
    Α͘ಡ·Ε͓ͯΓ (IF 14 Ҏ্)
    ௕͍ϨϏϡʔɾҙݟ࿦จͱɺ
    ଟ਺ͷίϝϯλϦʔʢ൓࿦΍ίϝϯτʣͱɺ
    ίϝϯλϦʔʹର͢ΔஶऀΒͷճ౴Λಉ࣌ܝࡌ
    Φʔϓϯͳٞ࿦Λ௨ͯ͡෼໺ͷڞ௨ݟղΛ࡞Δ໾ׂ
    ͜͜Ͱ঺հ͢Δ࿦จ΋௕͍ʢೋஈ૊Ͱɺຊจ 25 ϖʔδɺ27 ݅ͷί
    ϝϯλϦʔ͕ 26 ϖʔδɺஶऀΒͷճ౴͕ 10 ϖʔδɺจݙ͕ 13
    ϖʔδʣͷͰɺ೔ຊޠ༁Λ࡞੒ͨ͠ʢͲ͔͜Ͱग़൛͠·͢ʣ
    ɻ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 11 / 23

    View full-size slide

  12. ࠷ۙͷٞ࿦
    ΠϯτϩμΫγϣϯ
    1. P•p“jZbŽ•
    • ȮŠʼnʥ3ɂ¼˞˧ (CNN)—ɗ˜˞˧ (RNN)
    -íƇ (DQN) ,ɝˋ>Ɵõ
    • 3ȋ0ȋǽ¬̪3E0ʼnʥ—ƐʧH
    -4™¼.-LƙĜHLʧŕ
    • ˞ɣɰʼn—ɗ̒ƌɊʼn—AIF3¬̪F3
    ıȳLȄˎ¬̪3E0ʼn7ʧH…b•3
    %A3ǫ˯ə/ˇʑLǀǼ
    • ɉɄ3AI3àʧŕ-­ƅDH;-
    – ȮŠʼnʥ3˞ɣ^N-3ʙĕ
    – xi˜•˞˧ vs. ‰o‘ȅʈ
    7
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 12 / 23

    View full-size slide

  13. ࠷ۙͷٞ࿦
    DeepMind ͔Βͷ൓࿦
    “Building machines that learn and think for themselves ”
    Botvinick, . . . , Legg, and Hassabis (19 ໊)
    جຊతʹ Lake Βʹಉҙ͢Δ͕ɺ
    ࣗ཯ੑ autonomy ͕࠷΋ॏཁ
    όΠΞεͷ࡞ΓࠐΈ human hand engineering ͸ྑ͘ͳ͍
    ͨ·ͨ· Lake Βͷॏࢹͨ͠ਓؒͷόΠΞεͱͯ͠ͷ௚ײ෺ཧ
    ֶ΍௚ײ৺ཧֶʹ͍ͭͯ͸େྔͷՊֶతσʔλͱϞσϧ͕͋
    Δ͕ɺଞʹ͍ͭͯ͸͋·Γͳ͍ͷͰɺ࡞ΓࠐΈՄೳͳྖҬ΋
    গͳ͍
    AI ͕ΑΓෳࡶͳݱ࣮ੈքʹཱͪ޲͔͏΄Ͳʹɺࣗ཯తֶश͕
    ॏཁʹͳΔ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 13 / 23

    View full-size slide

  14. ࠷ۙͷٞ࿦
    Lake Βͷ࠶൓࿦
    ࣗ཯ੑʹΑΔ૑ൃ emergence ʹ೚ͤΔͷ͸ݱ࣮తͰͳ͍
    χϡʔϥϧωοτͷޯ഑ֶशͰɺਓ͕ؒ࣋ͭΑ͏ͳʮཧ࿦ʯ
    Λ֫ಘͰ͖Δͱ͸ࢥ͑ͳ͍
    ࠷ۙͷೝ஌Պֶͷٞ࿦͸ɺҎԼͷΑ͏ͳରཱ࣠Λ௒͑ͭͭ
    ͋Δ
    ʢਓؒͷॾೳྗ͸ʣੜ·Ε͔ҭ͔ͪ
    ʢਓ͕ؒ࣋ͭͷ͸ʣཧ࿦͔அย͔
    ʢਓؒͷਪ࿦͸ʣه߸త͔४ه߸త͔
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 14 / 23

    View full-size slide

  15. ࠷ۙͷٞ࿦
    ·ͱΊ
    Tenenbaum Β͸ਓؒͷΑ͏ͳϚγϯΛ࡞Ζ͏ͱ͍ͯ͠Δ
    Hassabis Β͸ϘτϜΞοϓͰɺਆͷࢠͷΑ͏ͳϚγϯΛ࡞Ζ
    ͏ͱ͍ͯ͠Δ (Solve intelligence; AGI)
    ͲͪΒʹ΋ൈ͚͍ͯΔ؍఺͸ɺਓؒͷ஌ੑͷࣾձੑͰ͋Δ
    ࢠڙ͸ɺجຊతͳ਎ମͷՄೳੑʹ͍ͭͯ͸ࢼߦࡨޡʢڧԽֶ
    शతʣͰֶͼ
    ͦͷޙ͸ಉ๔΍େਓͷ ໛฿ʹΑֶͬͯͿ
    ࣾձֶशɺಛʹ໛฿Ͱଟ͘ͷجຊతͳߦಈ୯ҐΛ֫ಘ͠ɺͦ
    ͷௐ੔ͰڧԽֶशͳͲΛ༻͍Δ
    ໛฿ʹ͸େผͯ͠ imitation (how ͷ໛฿) ͱ emulation (what
    ͷ໛฿) ͕͋Δ͕ɺޙऀͷϞσϧ͸΄ͱΜͲͳ͍
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 15 / 23

    View full-size slide

  16. ڧԽֶशʹ͓͚Δຬ଍Խ
    ೝ஌ɾࣾձతݱ৅ɿ
    ʮΤϛϡϨʔγϣϯʯ
    ʮΤϛϡϨʔγϣϯʯͱͯ͠౷Ұతʹߟ͍͑ͨݱ৅
    (emulation: ڝ૪, ுΓ߹͍, ର߅)
    Կ͔ʹ੒ޭͨ͠ͱ͍͏৘ใ͚ͩͰɺޙଓͷ੒ޭ͕ଓ͘
    ΞϝϦΧͰͷݪര։ൃͷ৘ใ͚ͩͰι࿈ͷ։ൃΛ͔ͳΓՃ଎ʁ
    બख A ͕ 100m ૸Ͱ 10 ඵΛ੾Δ΍൱΍ɺଞͷબख΋ 10 ඵΛ੾
    Γ࢝ΊΔɻ
    A ͷ૸๏ͳͲΛࢀߟʹ͠ͳ͍ͱͯ͠΋ 10 ඵΛ੾Γ΍͘͢
    Ͱ͖Δͱ৴͡ΔͱɺͰ͖Δ͜ͱ΋͋ΔɻͰ͖ͳ͍ͱ৴ͨ͡Βɺ
    ·ͣͰ͖ͳ͍ɻ
    ݚڀͰࢦಋڭһ͕ʮͰ͖Δʯͱଠޑ൑Λԡ͞ͳ͍ͱɺֶੜ͸ͳ
    ͔ͳ͔Ͱ͖ͳ͍ɻ͔͠͠ࢦಋڭһ͕΍ΓํΛ஌͍ͬͯΔΘ͚Ͱ
    ͸ͳ͍ʢ஌͍ͬͯΔͳΒ΋͏ݚڀͰ͸ͳ͍ʣ
    ɻ
    ੈք͸਺ֶͰهड़Ͱ͖Δͱݴ͏ ෆ߹ཧͳ߹ཧੑͷલఏ
    ۙ୅Պֶ΁
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 16 / 23

    View full-size slide

  17. ڧԽֶशʹ͓͚Δຬ଍Խ
    ΤϛϡϨʔγϣϯΛ׆༻ͨ͠ڧԽֶशͷϞσϧ
    Ҏ্ͷࣾձతݱ৅͸ɺ͋Δछͷࣾձֶशͱͯ͠ཧղՄೳ
    ʮݶఆ߹ཧੑʯ΍ʮຬ଍ԽʯʹΠϯεύΠΞ͞ΕͨΞϧΰϦζϜ
    ͕ͦͷϞσϧͱͳΔ
    ͋Δछͷʮه࿥ʯ΍ʮୡ੒ਫ४ʯΛ༩͑ΒΕΔͱɺͦΕΛ௒͑Δ
    ߦಈܥྻΛ୳ࡧɾߏங͢Δ
    ڧԽֶशͰɺୡ੒ج४Λ༩͑ΒΕΔͱɺຬ଍ͳߦಈܥྻ͕΋
    ͠ଘࡏ͢Ε͹ͦΕΛޮ཰తʹൃݟͰ͖ΔΞϧΰϦζϜʢRS ߴ
    ڮ, ߕ໺ & Ӝ্ 2016; ͦͷܗࣜతੑ࣭ ۄ଄ & ߴڮ, JSAI
    2018, in prep.; RS-GRC, ߕ໺ et al. JSAI 2018 ࠤௗ et al., ଖా
    et al.ʣ
    τοϓμ΢ϯͳୡ੒ج४ʹΑΓɺ७ਮͳϘτϜΞοϓΑΓ΋
    ୳ࡧۭ͕ؒѹॖɾߏ଄Խ
    ͞ΒʹʮΤϛϡϨʔγϣϯΛ׆༻ͨ͠ڧԽֶशͷϞσϧʯʹ
    ͸ܭࢉ࿦తͳଆ໘͋Γɿ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 17 / 23

    View full-size slide

  18. ڧԽֶशʹ͓͚Δຬ଍Խ
    ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    ൑ఆ໰୊ (decision problem) ɿ
    ʢ͋Δ੍໿ू߹ C ͷԼʣx ͕͋
    Δੑ࣭Λຬ͔ͨ͢Λ yes/no Ͱ౴͑Α
    ࠷దԽ໰୊ɿ͋Δ੍໿ू߹ C ͷԼɺx = argmaxx′
    f (x′) Λݟ
    ͚ͭΑ
    ࠷దԽ໰୊͸ܾఆ໰୊ʹม׵Ͱ͖Δɻ100m Λ X ඵͰ૸ΕΔ͔ɺ
    ͱ͍͏ܾఆ໰୊Ͱɺ X Λ {9.0, 9.1, ..., 10.0} ͱͯ͠ 11 ௨Γ΍ͬͯ
    ΈΕ͹ɺ X ͷ࠷খ஋ͷൣғ͕෼͔Δʢ9.1 Ͱ noɺ9.2 Ͱ yes ͳΒ
    ͹ɺݶք͸ (9.1, 9.2] ʹ͋Δʣ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 18 / 23

    View full-size slide

  19. ଟ࿹όϯσΟ
    οτ໰୊
    ೝ஌తຬ଍ԽՁ஋ؔ਺
    RS (risk-sensitive satisficing)
    ߦಈ ai
    ʹରͯ͠ɺ
    ͦΕΛࢼͨ͠ճ਺ʢ
    ʮࢼߦྔʯ
    ʣΛ n(ai
    ) ɺ
    ܦݧظ଴஋ʢใुฏۉʣΛ V (ai
    ) ɺ
    ૯ࢼߦ਺ʢʹεςοϓ਺ʣΛ N = Σn(ai
    )
    ͱͯ͠ɺຬ଍ԽՁ஋ؔ਺ RS ͸ߦಈ ai
    ͷՁ஋Λ࣍ͰධՁ͠ greedy
    ʹબ୒
    RS(ai
    ) =
    n(ai
    )
    N
    (
    V (ai
    ) − R
    )
    (1)
    ͜ͷ RS ஋Λ greedy ʹӡ༻
    R ͸ຬ଍Խͷج४
    (
    V (ai
    ) − R
    )
    > 0(< 0) ͳΒ ai
    ͸ຬ଍Ͱ͖Δ (Ͱ͖ͳ͍) બ୒ࢶ
    ৄ͘͠͸ۄ଄ɾߴڮ (JSAI 2018, 1N1-04)
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 19 / 23

    View full-size slide

  20. ଟ࿹όϯσΟ
    οτ໰୊
    RS ʹ͍ͭͯ
    όϯσΟ
    οτ໰୊Ͱͷੑ࣭
    ඞͣຬ଍Խ͢Δอূ͋Γ
    ຬ଍ԽͷҙຯͰͷ regret ͸༗ݶʹཹ·Δʢී௨͸࠷దͰ΋ log
    Ͱ੒௕ʣ
    ຬ଍Խج४͕ʮ࠷దʯͳΒ࠷దԽ
    JSAI 2018 ۄ଄ɾߴڮ; ౤ߘ४උத.
    όϯσΟ
    οτ໰୊Ͱɺ R ͸νʔτͳ͠ʹࣗ෼ͰܾΊΒΕɺ
    regret ͸࠷ద (log Φʔμʔɺ UCB ܥΑΓྑ͍)
    JSAI 2018 ߕ໺ɾߴڮ
    ڧԽֶशͰɺຬ଍ͳߦಈܥྻΛޮ཰Α͘ൃݟՄೳ
    JSAI 2017 ڇాɾߕ໺ɾߴڮ
    JSAI 2018 ࠤௗ et al., ଖా et al.
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 20 / 23

    View full-size slide

  21. ٞ࿦
    ຬ଍ԽͱڧԽֶश
    ैདྷڧԽֶश͸ಈతܭը๏ɺ࠷ద੍ޚʹجͮ͘࠷దԽ໰୊
    ϏσΦήʔϜ (DQN; Atari) ΍ϘʔυήʔϜ (AlphaGo; ғޟ)
    ͳΒͱ΋͔͘ɺਓ͕ؒ΍͍ͬͯΔΑ͏ͳߦಈֶश͸࠷దԽͱ
    ͯ͠͸೉͍͠ͷͰ͸ͳ͍͔
    ೝ஌తຬ଍ԽͷϞσϧʹΑΓɺڧԽֶशλεΫΛ൑ఆ໰୊ͱ
    ͯ͠ଊ͑௚͢
    ਓؒ΍ಈ෺΋ɺཚ਺Λ࢖ͬͨ໢ཏతͳ୳ࡧ͔Βͷ࠷దԽΛ໨
    ࢦ͢ͱ͍͏ΑΓɺλεΫʹ͋Δ࿮૊ΈΛ՝ͯ͠ʢྫ͑͹ຬ଍
    Խج४ɺͦͯ͠ҼՌϞσϧʣ
    ɺશͯͷՄೳੑΛߟྀͤͣʹ͏·
    ͘΍͍ͬͯΔ͸ͣ
    ਓؒͷ৔߹ʢൃୡʣ
    ɺࣾձతʹجຊతͳߦಈΛ ໛฿Ͱ֫ಘ ͠ɺ
    ڧԽֶशͰௐ੔ ͱ͍͏ೋஈ֊Λ౿Ή (ߦಈֶशͷଟஈ֊ཧ࿦ɺ
    ࣗવͳ֊૚Խ΁)
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 21 / 23

    View full-size slide

  22. ݶఆ߹ཧੑ
    ຬ଍Խ (satisficing)
    ਓؒͷ஌֮ɺਪ࿦ɺߦಈʹ͸ਫ਼౓ɺܭࢉྔɺޮՌʹݶք
    ݶఆ߹ཧੑ
    ͦΜͳதͰɺ࠷దԽ͸ଟ͘ͷ৔߹ʹෆՄೳ
    ࠷దԽɿঢ়گԼͰ࠷ྑͷબ୒ࢶ΍ߦಈܥྻͷબ୒ɾܗ੒
    ͦ͏͍ͬͨ৔߹ʹ͸ ຬ଍Խ satisficing ͕༗ޮ
    satisfice = satisfy + suffice
    ݹయతຬ଍Խɿ ୳ࡧˠຬ଍
    ୳ࡧ ͋Δج४Λຬͨ͢Α͏ͳબ୒ࢶ͕ݟ͔͍ͭͬͯͳ
    ͚Ε͹ɺ৭ʑͳબ୒ࢶΛϥϯμϜʹબΜͰ୳͢
    ຬ଍ ҰͭͰ΋ݟ͔ͭΕ͹΋͏ͦΕͰྑ͍ͱͯͦ͠ΕΛ
    બͼଓ͚Δ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 22 / 23

    View full-size slide

  23. ·ͱΊ
    ·ͱΊ
    AI ͷॏཁݚڀͷ͍͔ͭ͘͸࣮͸৺ཧֶ༝དྷ
    ৺ͷ৘ใॲཧϝΧχζϜΛߟ͑Δ͜ͱ͸஌తγεςϜͷ։ൃ
    ʹॏཁ
    ൚༻ੑͷߴ͍஌తγεςϜͱͯ͠།Ұͷ࣮ྫ
    ͦͷ௚ײతཧղͰͳ͘ɺՊֶతཧղ͕ॏཁ
    ೝ஌Պֶ͸ܭࢉ࿦తϨϕϧͷٞ࿦͕ಘҙ
    ࠷ۙͷॏཁͳٞ࿦ΛऔΓ্͛ͨ
    ܭࢉ࿦తೝ஌Պֶ (MIT த৺) ͱਂ૚ֶशɾਆܦՊֶΛఐࢠͱ
    ͨ͠ AGI ΁ (DeepMind த৺) ͱ͍͏࣠
    ਓؒͷೝ஌ʹֶͼɺࣾձֶशɾ໛฿ֶश΍ҼՌϞσϧߏஙͷ
    ৽͍͠ΞϧΰϦζϜΛఏҊ
    ܭࢉ࿦తೝ஌ՊֶͷݚڀάϧʔϓΛ೔ຊͰ্ཱͪ͛ΔͷͰɺ
    ͥͻ͝ࢀՃΛ
    ߴڮ ୡೋ (౦ژిػେֶ, υϫϯΰਓ޻஌ೳݚڀॴ) (SS3)
    ೝ஌Պֶ͔Βͷࢹ఺ɿ ຬ଍ԽʹΑΔΤϛϡϨʔγϣϯͱɼ ൑ఆ໰୊ͱͯ͠ͷڧԽֶश
    2018-06-23 Sat 23 / 23

    View full-size slide