iOS and (Deep) Reinforcement Learning

yuky_az
August 31, 2018

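A talk given at iOSDC 2018 on the theme
"Can deep reinforcement learning be implemented on iOS?"

Speaker: Yukinaga Azuma (我妻幸長, @yuky_az)

Sec. 1: What is reinforcement learning?
Sec. 2: Reinforcement learning in Swift
Sec. 3: Neural networks with the Accelerate Framework
Sec. 4: Deep reinforcement learning in Swift

The videos used in the talk are linked below.
[Cart Pole with Q-learning]
https://youtu.be/lugIwpsSmBk

[Cart Pole with Deep Q-Network]
https://youtu.be/hgDTCEZKxb8

[Shaky-move strategy]
https://youtu.be/NsSEGsJokdg

[Switch strategy]
https://youtu.be/kqECbWKkq98

[Counterattack]
https://youtu.be/zIR0pw1AlKk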

Transcript

  1. iOS and (Deep) Reinforcement Learning
    Yukinaga Azuma
    @yuky_az
    iOSDC Japan 2018

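  2. About me
    Yukinaga Azuma (我妻幸長)
    Representative Director, SAI-Lab Inc.
    @yuky_az
    Mission: "coexistence of humans and AI"
    Runs several AI-related courses on Udemy
    Nearly …0,000 students
    Something I hear a lot lately:
    "The way you talk is starting to sound like an AI."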

  3. What came of speaking at iOSDC…
    Now on sale from SB Creative!

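  4. Outline of the talk
    Sec. 1: What is reinforcement learning?
    Sec. 2: Reinforcement learning in Swift
    Sec. 3: Neural networks with the Accelerate Framework
    Sec. 4: Deep reinforcement learning in Swift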

  5. Theme of the talk
    Can deep reinforcement learning be implemented on iOS?

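  6. Sec. 1 What is reinforcement learning?
    • Overview of reinforcement learning
    • Concepts needed for reinforcement learning
    • Q-learning
    • Deep reinforcement learning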

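  7. Artificial intelligence (AI), machine learning, reinforcement learning
    [Figure: nested diagram]
    Artificial intelligence (AI)
    Machine learning
    Deep learning
    Reinforcement learning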

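  8. What is reinforcement learning?
    A kind of machine learning
    Through trial and error, the "agent" learns the "actions most likely to
    earn reward in the environment"
    [Figure: the agent acts on the environment, receives a reward, and learns]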

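  9. Applications of reinforcement learning
    Playing games
    → AlphaGo, Breakout, solving a Rubik's Cube, etc.
    Robot control
    → bipedal walking robots, industrial robots, etc.
    Reducing data-center power consumption
    → https://blog.google/outreach-initiatives/environment/deepmind-ai-reduces-energy-used-for/
    Earthquake countermeasures for buildings
    → https://inforium.nttdata.com/foresight/ai-vibration-control.html
    etc.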

  10. Concepts needed for reinforcement learning
    Action
    State
    Reward
    [Figure: maze with Start and Goal; the agent and the environment]

  11. 1. Action
    What the agent does to the environment
    In the maze example, the agent moving through the maze
    The agent selects one of several possible actions
    [Figure: maze with Start and Goal]

  12. 2. State
    The situation the agent is in within the environment
    In the maze example, the agent's position (S1 to S9) is the state
    The state changes as a result of actions
    S1 S2 S3
    S4 S5 S6
    S7 S8 S9

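  13. 3. Reward
    The reward the agent receives
    In the maze example:
    reaching the goal gives a reward of [value missing]
    reaching a trap gives a reward of [value missing]
    The agent learns the best actions based on the rewards
    [Figure: maze with Start, Goal, and a trap]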
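The three concepts above (action, state, reward) can be sketched in Swift as a tiny grid maze. This is a hypothetical illustration, not code from the talk: the `Maze` and `Action` names, the 3×3 layout, the goal and trap positions, and the +1 / -1 reward values are all assumptions.

```swift
// Hypothetical 3x3 maze. States are cell indices 0...8 in row-major order.
enum Action: CaseIterable {
    case up, down, left, right
}

struct Maze {
    let size = 3
    let goal = 8      // assumed: bottom-right cell
    let trap = 4      // assumed: center cell
    var state = 0     // the agent starts in the top-left cell

    // Apply an action, update the state, and return the reward.
    mutating func step(_ action: Action) -> Double {
        var row = state / size
        var col = state % size
        switch action {
        case .up:    row = max(row - 1, 0)
        case .down:  row = min(row + 1, size - 1)
        case .left:  col = max(col - 1, 0)
        case .right: col = min(col + 1, size - 1)
        }
        state = row * size + col
        if state == goal { return 1.0 }    // assumed goal reward
        if state == trap { return -1.0 }   // assumed trap penalty
        return 0.0
    }
}
```

Calling `step` repeatedly produces the state and reward sequence that a learning algorithm would consume.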

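  14. Q-learning
    Q-learning is a kind of reinforcement learning that assigns a Q-value to
    each combination of state and action
    The agent selects the action with the highest Q-value
    Learning proceeds by optimizing each value in the Q-Table
    [Figure: Q-Table with one row per state (S1, S2, S3, …) and one column per
    action (↑, ↓, ←, →), next to the 3×3 maze grid]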
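As a minimal sketch of "select the action with the highest Q-value", the function below scans one row of a Q-Table. The table values and the `greedyAction` name are made up for illustration.

```swift
// Q-Table: one row per state, one column per action (up, down, left, right).
// These values are invented for the example.
let qTable: [[Double]] = [
    [0.0, 0.5, -0.2, 0.1],   // S1
    [0.3, 0.0,  0.0, 0.7],   // S2
]

// Return the index of the action with the highest Q-value for a state.
// Ties are resolved in favor of the lowest action index.
func greedyAction(in qTable: [[Double]], state: Int) -> Int {
    let row = qTable[state]
    var best = 0
    for (action, q) in row.enumerated() where q > row[best] {
        best = action
    }
    return best
}
```

In practice this greedy choice is often mixed with some random exploration (for example ε-greedy), which the slide does not cover.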

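  15. Updating the Q-value
    After an action, take the reward obtained plus the (discounted) maximum
    Q-value in the next state, and subtract the current Q-value
    Multiply this by the learning rate (for example, [value missing]) to get
    the amount by which the Q-value is updated

    Q-value update = learning rate × (reward + discount rate × max Q-value in the next state − current Q-value)

    [Figure: maze showing a Q-value being updated by an action]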
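The update rule can also be written compactly. The symbols are chosen here to match the `eta` and `gamma` names used in the Swift code later in the deck; the subscript notation is an assumption of this sketch.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \eta \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
```

As a worked example with assumed values $\eta = 0.1$, $\gamma = 0.9$, $r_{t+1} = 1$, $\max_a Q(s_{t+1}, a) = 0.5$ and $Q(s_t, a_t) = 0.2$: the update amount is $0.1 \times (1 + 0.45 - 0.2) = 0.125$, so the new Q-value is $0.325$.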

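  16. The problem with Q-learning
    When there are many states, the Q-Table becomes huge and learning stops
    making good progress
    [Figure: Q-Table with actions ↑ ↓ ← → as columns and a very large number of states as rows]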

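  17. Deep reinforcement learning
    Deep reinforcement learning is reinforcement learning combined with deep learning
    Deep Q-Network (DQN) is a kind of deep reinforcement learning that uses a
    neural network in place of the Q-Table
    [Figure: Q-Table with actions ↑ ↓ ← → as columns and states as rows]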

  18. Deep Q-Network (DQN)
    A neural network computes the Q-value of each action from the state
    Input: state s_t
      player position, player velocity, enemy position, enemy velocity, …
    Output: Q(s_t, action) for each action

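  19. Training the Deep Q-Network
    Input: state s_t
      player position, player velocity, enemy position, enemy velocity, …
    Output: Q(s_t, action) for each action
    The error is backpropagated through the network
    Error = reward + discount rate × max Q-value in the next state − current Q-value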
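The error on this slide can be sketched in Swift as follows. This is an illustration under assumptions, not the talk's code: the function names are hypothetical, and the network outputs are represented as plain `[Double]` arrays holding one Q-value per action.

```swift
// Build the training target for one transition.
// qCurrent: network outputs for the current state, one Q-value per action.
// qNext:    network outputs for the next state.
func dqnTarget(qCurrent: [Double], qNext: [Double],
               action: Int, reward: Double, gamma: Double) -> [Double] {
    var target = qCurrent                        // actions not taken get zero error
    let maxQNext = qNext.max() ?? 0.0
    target[action] = reward + gamma * maxQNext   // reward + discounted best next Q-value
    return target
}

// Error for the chosen action:
// reward + gamma * (max Q-value in the next state) - (current Q-value).
func tdError(qCurrent: [Double], qNext: [Double],
             action: Int, reward: Double, gamma: Double) -> Double {
    let target = dqnTarget(qCurrent: qCurrent, qNext: qNext,
                           action: action, reward: reward, gamma: gamma)
    return target[action] - qCurrent[action]
}
```

Only the output for the action actually taken receives a non-zero error, so backpropagation adjusts the network toward that single target.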

  20. Sec. 2 Reinforcement learning in Swift
    • The Cart Pole problem
    • Implementing Q-learning
    • Cart Pole demo

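  21. The Cart Pole problem
    Cart Pole is a classic reinforcement learning problem
    Move the Cart left and right so that the Pole standing on it does not fall over
    State
      Cart position
      Cart velocity
      Pole angle
      Pole angular velocity
    Action
      Move the Cart left
      Move the Cart right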

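  22. Building the environment
    Create a 2D Cart and Pole with SpriteKit
    Connect the Cart and the Pole with a joint, and apply gravity only to the Pole
    // Cart: a sprite whose physics body ignores gravity
    cart = SKSpriteNode(imageNamed: "robot_normal.png")
    cart.size = cartSize
    cart.physicsBody = SKPhysicsBody(rectangleOf: cartSize)
    cart.physicsBody?.affectedByGravity = false
    // Pole: a rectangle whose physics body is affected by gravity (the default)
    pole = SKShapeNode(rectOf: poleSize)
    pole.physicsBody = SKPhysicsBody(rectangleOf: poleSize)
    // Pin joint connecting the Cart and the Pole (arguments elided on the slide)
    let joint = SKPhysicsJointPin.joint(…
    self.addChild(pole)
    self.addChild(cart)
    self.physicsWorld.add(joint)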

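  23. Implementing the Q-Table
    Q-learning is used
    To keep things simple, the state is determined only from the Pole's angle
    and angular velocity
    The Pole's angle and angular velocity are each split into [value missing]
    bins and discretized
    Number of states: [value missing] × [value missing]
    [Figure: Q-Table with actions ← → as columns and states as rows]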
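The discretization step can be sketched as below. The bin count and the value ranges did not survive in this transcript, so the numbers here (6 bins, ±0.5 rad, ±2.0 rad/s) and the `PoleStateEncoder` name are purely hypothetical.

```swift
// Hypothetical encoder: the bin count and value ranges are assumptions.
struct PoleStateEncoder {
    var binCount = 6
    var angleRange: ClosedRange<Double> = -0.5...0.5             // radians
    var angularVelocityRange: ClosedRange<Double> = -2.0...2.0   // radians per second

    // Total number of discrete states (rows in the Q-Table).
    var stateCount: Int { return binCount * binCount }

    // Map a continuous value to a bin index in 0..<binCount,
    // clamping values that fall outside the range.
    func bin(_ value: Double, in range: ClosedRange<Double>) -> Int {
        let ratio = (value - range.lowerBound) / (range.upperBound - range.lowerBound)
        let index = Int((ratio * Double(binCount)).rounded(.down))
        return min(max(index, 0), binCount - 1)
    }

    // Combine both bin indices into a single Q-Table row index.
    func stateIndex(angle: Double, angularVelocity: Double) -> Int {
        let a = bin(angle, in: angleRange)
        let v = bin(angularVelocity, in: angularVelocityRange)
        return a * binCount + v
    }
}
```

With these assumed settings the Q-Table would have 36 rows and 2 columns (move left, move right).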

  24. Implementing the Q-Table
    Q-Table
    var qTable = [[CGFloat]]()
    Updating the Q-Table
    qTable[state][action] += eta * (reward + gamma*maxQNext - qTable[state][action])
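For context, the update line above might sit inside a small type like the following. This is a sketch under assumptions: the `QLearner` name and initializer are hypothetical, `Double` is used instead of `CGFloat` to keep the example free of framework imports, and `maxQNext` is taken to be the largest Q-value in the next state's row.

```swift
struct QLearner {
    var qTable: [[Double]]   // qTable[state][action]
    let eta: Double          // learning rate
    let gamma: Double        // discount rate

    init(stateCount: Int, actionCount: Int, eta: Double, gamma: Double) {
        // Start with every Q-value at zero.
        qTable = [[Double]](repeating: [Double](repeating: 0.0, count: actionCount),
                            count: stateCount)
        self.eta = eta
        self.gamma = gamma
    }

    // One Q-learning update, using the same expression as on the slide.
    mutating func update(state: Int, action: Int, reward: Double, nextState: Int) {
        let maxQNext = qTable[nextState].max() ?? 0.0
        qTable[state][action] += eta * (reward + gamma * maxQNext - qTable[state][action])
    }
}
```

Each call moves one table entry a fraction `eta` of the way toward `reward + gamma * maxQNext`.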

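  25. About the reward
    [Figure: two Cart and Pole diagrams, one upright and one tilted]
    Keeping the Pole up for [value missing] frames: reward [value missing]
    Tilting by [value missing]°: reward [value missing]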
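The reward rule could look roughly like the sketch below. None of the numbers come from the talk: the frame count (200), the tilt limit (45°), and the reward values (+1 / -1) are placeholders chosen for illustration, and `RewardRule` is a hypothetical name.

```swift
// Hypothetical reward rule: every threshold and reward value here is an assumption.
struct RewardRule {
    var framesToKeep = 200                           // assumed
    var maxTiltRadians = 45.0 * Double.pi / 180.0    // assumed: 45 degrees

    // Returns the reward for the current frame and whether the episode ends.
    func evaluate(frame: Int, poleAngle: Double) -> (reward: Double, done: Bool) {
        if abs(poleAngle) >= maxTiltRadians {
            return (-1.0, true)    // the Pole tilted too far
        }
        if frame >= framesToKeep {
            return (1.0, true)     // the Pole stayed up long enough
        }
        return (0.0, false)        // keep going
    }
}
```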

  26. Demo: Cart Pole problem (Q-learning)

  27. Sec. 3 Neural networks with the Accelerate Framework
    • Choosing how to implement the neural network
    • Accelerate Framework
    • BLAS (Basic Linear Algebra Subprograms)
    • Matrix operations

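  28. To implement deep reinforcement learning…
    We need a neural network that can be trained, not just one that makes predictions
    Prediction by forward propagation (input → output)
    Training by backpropagation (output → input)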

  29. Structure of the iOS machine learning frameworks
    [Figure: framework stack, from the user side down to the hardware side]
    Your app
    Vision, Natural Language Processing, GameplayKit
    Core ML
    Accelerate and BNNS, Metal Performance Shaders
    https://developer.apple.com/documentation/coreml

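  30. Which frameworks support training?
    Core ML
    → brings trained models into an app
    → specialized for machine learning prediction
    MPS (Metal Performance Shaders)
    → brings Metal's high-performance GPU compute into an app
    → specialized for machine learning prediction
    BNNS (Basic Neural Network Subroutines)
    → draws out the full performance of the CPU for computation
    → specialized for prediction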

  31. Building our own neural network
    Matrix operations that matter when implementing a neural network:
    Matrix product
    Transpose

  32. Matrix product
    [Figure: worked example of a matrix product]

  33. Transpose
    Swap the rows and columns of a matrix

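  34. How should the matrix product and transpose be implemented?
    Pure Swift implementation
    → matrices and vectors have to be defined by hand
    → everything has to be implemented from scratch, which takes time
    Accelerate Framework implementation
    → includes the linear algebra library BLAS
    → provides functions for the matrix product and transpose
    Metal implementation
    → for some reason it interferes with SpriteKit and the screen flickers
    → performance drops when the GPU is accessed frequently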
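To make the first option concrete, here is a minimal sketch of what a pure Swift matrix product and transpose might look like. The `Matrix` alias and both function names are illustrative, not taken from the talk, and nested arrays are used for simplicity rather than speed.

```swift
typealias Matrix = [[Double]]

// Matrix product: (n x k) * (k x m) -> (n x m).
func matmul(_ a: Matrix, _ b: Matrix) -> Matrix {
    precondition(!a.isEmpty && !b.isEmpty && a[0].count == b.count,
                 "inner dimensions must match")
    let n = a.count, k = b.count, m = b[0].count
    var result = Matrix(repeating: [Double](repeating: 0.0, count: m), count: n)
    for i in 0..<n {
        for j in 0..<m {
            for p in 0..<k {
                result[i][j] += a[i][p] * b[p][j]
            }
        }
    }
    return result
}

// Transpose: swap rows and columns.
func transpose(_ a: Matrix) -> Matrix {
    guard let first = a.first else { return [] }
    return first.indices.map { j in a.map { row in row[j] } }
}
```

Even this small version shows the bookkeeping (shape checks, triple loop) that the Accelerate functions take care of.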

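  35. What is the Accelerate Framework?
    → used for large-scale mathematical computation and image processing
    → draws out the full performance of the CPU for computation
    → high performance, low power consumption
    Libraries it includes
    → vImage for image processing
    → BNNS for neural networks
    → BLAS for linear algebra
    → etc.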

  36. BLAS (Basic Linear Algebra Subprograms)
    Type that represents matrices and vectors: la_object_t
    Creating a matrix from an array
    let mat = la_matrix_from_double_buffer(array, rows, cols, cols,
    la_hint_t(LA_NO_HINT), la_attribute_t(LA_DEFAULT_ATTRIBUTES))
    Matrix product
    la_matrix_product(leftMat, rightMat)
    Transpose
    la_transpose(mat)

  37. Sec. 4 Deep reinforcement learning in Swift
    • Deep Q-Network (DQN)
    • Building the neural network
    • Deep Q-Network demo

  38. Deep Q-Network (DQN)
    Input: state s_t
      player position, player velocity, enemy position, enemy velocity, …
    Output: Q(s_t, action) for each action
    Training by backpropagation

  39. Building the neural network
    Define an operator for the matrix product
    public func *(left: la_object_t, right: la_object_t) -> la_object_t {
        return la_matrix_product(left, right)
    }
    Implement each layer as a class
    class MiddleLayer: BaseLayer{

    }
    Implementing the transpose
    extension la_object_t {
        var trans : la_object_t {
            return la_transpose(self)
        }
    }

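  40. Building the neural network
    Forward-propagation method
    // + and - on la_object_t are assumed to be defined like * above
    // (for example with la_sum and la_difference)
    func forward(x: la_object_t) -> la_object_t {
        self.x = x                          // assumed: keep the input for backward
        let u = x * self.w + self.b         // weighted sum plus bias
        self.y = sigmoid(u: u)              // assumed: keep the output for backward
        return self.y
    }
    Backpropagation method
    func backward(t: la_object_t) -> la_object_t {
        let delta = self.y - t              // output error
        self.dW = self.x.trans * delta      // gradient of the weights
        self.db = delta                     // gradient of the bias
        let dx = delta * self.w.trans       // error passed to the previous layer
        return dx
    }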
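The same forward and backward steps can be tried without Accelerate. The sketch below swaps `la_object_t` for plain nested arrays so that it is self-contained; the `DenseLayer` class, the helper functions, and the row-vector layout (input 1 × inputs, weights inputs × outputs) are assumptions of this example, not the talk's implementation.

```swift
import Foundation

typealias Matrix = [[Double]]

// (n x k) * (k x m) -> (n x m)
func matmul(_ a: Matrix, _ b: Matrix) -> Matrix {
    var out = Matrix(repeating: [Double](repeating: 0.0, count: b[0].count), count: a.count)
    for i in a.indices {
        for j in b[0].indices {
            for p in b.indices { out[i][j] += a[i][p] * b[p][j] }
        }
    }
    return out
}

func transpose(_ a: Matrix) -> Matrix {
    guard let first = a.first else { return [] }
    return first.indices.map { j in a.map { row in row[j] } }
}

// Apply a binary operation element by element to two same-shaped matrices.
func elementwise(_ a: Matrix, _ b: Matrix, _ op: (Double, Double) -> Double) -> Matrix {
    var out = a
    for i in a.indices {
        for j in a[i].indices { out[i][j] = op(a[i][j], b[i][j]) }
    }
    return out
}

final class DenseLayer {
    var w: Matrix          // weights, inputs x outputs
    var b: Matrix          // bias, 1 x outputs
    var x: Matrix = []     // cached input, 1 x inputs
    var y: Matrix = []     // cached output, 1 x outputs
    var dW: Matrix = []    // gradient of the weights
    var db: Matrix = []    // gradient of the bias

    init(w: Matrix, b: Matrix) {
        self.w = w
        self.b = b
    }

    // Forward pass: y = sigmoid(x * w + b).
    func forward(x: Matrix) -> Matrix {
        self.x = x
        let u = elementwise(matmul(x, w), b, +)
        y = u.map { row in row.map { value in 1.0 / (1.0 + exp(-value)) } }
        return y
    }

    // Backward pass, using the same expressions as the slide (delta = y - t).
    func backward(t: Matrix) -> Matrix {
        let delta = elementwise(y, t, -)
        dW = matmul(transpose(x), delta)
        db = delta
        return matmul(delta, transpose(w))
    }
}
```

A training loop would then subtract a small multiple of `dW` and `db` from `w` and `b` after each backward pass; that step is not shown on the slide and is left out here.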

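  41. Structure of the neural network
    Input layer → hidden layer (n = [value missing]) → hidden layer (n = [value missing]) → output layer
    Inputs: Pole angle, Pole angular velocity
    Outputs: Q-value (move left), Q-value (move right)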

  42. Demo: Cart Pole problem (Deep Q-Network)

  43. The agent's strategies
    Shaky-move strategy

  44. The agent's strategies
    Switch strategy

  45. The agent's strategies
    Counterattack

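  46. Summary
    Trainable neural network
    → relatively easy with BLAS from the Accelerate Framework
    Cart Pole problem
    → implemented Q-learning and Deep Q-Network
    → the AI shows unexpected creativity (?)
    At least for something relatively simple like the Cart Pole problem,
    deep reinforcement learning can be implemented on iOS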

  47. Thank you for listening!
    Code used today
    https://github.com/yukinaga/CartPoleSwift
    https://github.com/yukinaga/CartPoleDeepSwift