
rinkou01_Cooperation Emergence under Resource-Constrained Peer Punishment

tom--bo
February 01, 2017


Reading-group (rinkou) presentation on the paper "Cooperation Emergence under Resource-Constrained Peer Punishment"


  1. Cooperation Emergence under
    Resource-Constrained Peer Punishment
    Samhar Mahmoud, Simon Miles, Michael Luck
    King’s College London, London, UK


  2. ABOUT THIS PAPER
    • Published in
    – AAMAS ’16: Proceedings of the 2016 International Conference on Autonomous
    Agents & Multiagent Systems, pp. 900–908
    • Keywords
    – Metanorm, Emergence, Limited Enforcement Cost


  3. ABSTRACT
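    • In multi-agent systems, peer punishment is a means of establishing social norms.
    • Models of this include Axelrod’s metanorms game model [1] and its extensions by Mahmoud et al. [22,23].
    • However, existing metanorms game models do not account for the cost of punishing; this paper proposes a method that can establish norms even when the cost of punishment is taken into account.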


  4. INTRODUCTION
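    • In computer systems, one individual’s behaviour can harm others, and centralized control of that behaviour is difficult.
    • In systems without centralized control, social norms are a means of regulating self-interested agents.
    • Punishment has been studied as a way to regulate agents and establish social norms.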


  5. INTRODUCTION
    • Example: P2P file-sharing software
    (figure: peers in the network, each holding Data)


  6. INTRODUCTION
    • Example: P2P file-sharing software
    (figure: one peer, a free rider, says “I only download! I never share!”)


  7. INTRODUCTION
    • Example: P2P file-sharing software
    (figure: the other peers punish the free rider, who only downloads and never shares)


  8. INTRODUCTION
    • Example: P2P file-sharing software
    (figure: after being punished, the free rider says “I’d better share!”)


  9. INTRODUCTION
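    • However, no study has accounted for the cost of the act of “punishing” itself.
    – In resource-scarce environments such as sensor networks, even initiating an interaction can itself be costly.
    • Assuming such environments, the paper shows that the following two models fail to establish norms once the cost of punishment is taken into account, and proposes a method that solves this problem:
    1. The original study by Axelrod [1]
    2. The study by Mahmoud [22,23], which extended [1] so that punishment can vary dynamically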


  10. RELATED WORK
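    • In research related to the metanorms game, Fehr et al. showed that introducing punishment reduces free riding [10,11].
    • On the other hand, Nikiforakis showed that this effect weakens when free riders can punish back (counter-punishment) [28].
    • To counter this, Helbing et al. succeeded in promoting cooperation by introducing a mechanism that rewards agents who cooperate [17].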


  11. RELATED WORK
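    • These studies fix the amount of punishment and reward; Mahmoud et al., in contrast, achieved norm emergence by varying the strength of punishment dynamically [21].
    • Meanwhile, Miller et al. [26,27] and Jurca et al. [18] build suitable payment mechanisms for providing ratings, so that agents are induced to give honest ratings.
    • These studies target malicious users, but none of them considers that the act of punishing is itself costly.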


  12. PEER PUNISHMENT & LIMITED
    RESOURCES
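    • In a P2P file-sharing system, we can define
    sharing a file = cooperation
    not sharing = defection
    • Against a defecting agent, punishment can take the form of, for example, refusing to share files with that particular agent.
    • However, an agent cannot afford to refuse to share with every defector, so it must take the agents’ behaviour into account, choose suitable targets, and refuse to share files only with them.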


  13. Metanorm Model (Interaction Model)
    • Each agent chooses cooperation (C) or defection (D).
    (figure: every agent in the group marked “C or D”)


  14. Metanorm Model (Interaction Model)
    • Cooperation yields no payoff.
    (figure: a cooperating agent C; it and every other agent receive +0)


  15. Metanorm Model (Interaction Model)
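    • Defection (also called temptation) gives the defector a gain and every other agent a loss (hurt value).
    (figure: the defecting agent D receives +3; each of the other agents receives −1)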
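A minimal Python sketch of the payoffs on the two slides above, assuming Axelrod’s original values from [1] (temptation +3 for the defector, hurt −1 for each other agent) and a fully connected group in which every agent is affected by every defection. The function name round_payoffs is hypothetical.

```python
from typing import List

# Values assumed from Axelrod's norms game [1].
TEMPTATION = 3  # payoff to the defector
HURT = -1       # payoff to every other agent when someone defects


def round_payoffs(choices: List[str]) -> List[int]:
    """Hypothetical helper: each agent's payoff for one round of C/D choices.

    Cooperation ("C") adds +0 for everyone; each defection ("D") adds
    TEMPTATION to the defector and HURT to every other agent.
    """
    n = len(choices)
    payoffs = [0] * n
    for i, choice in enumerate(choices):
        if choice == "D":
            payoffs[i] += TEMPTATION
            for j in range(n):
                if j != i:
                    payoffs[j] += HURT
    return payoffs


print(round_payoffs(["C", "D", "C", "C", "C"]))  # [-1, 3, -1, -1, -1]
```

With several defectors the effects add up: each defector collects its own temptation but is also hurt by the others.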


  16. Metanorm Model (Agent Model)
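    • Agents learn their boldness and vengefulness with Q-learning, based on the payoffs of their actions.
    • Together, these two values determine an agent’s policy:
    Boldness: the probability of defecting (temptation)
    Vengefulness: the probability of punishing a defecting agent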
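The slide does not say how states, actions, or rewards are encoded for Q-learning. The sketch below is one hypothetical reading: boldness and vengefulness are each picked from 8 discrete levels (echoing the 3-bit genes in Axelrod’s model [1]) by an epsilon-greedy, single-state Q-learner; with a discount factor of 0 the Q-learning update reduces to Q(a) ← Q(a) + α(r − Q(a)). The class names, levels, α and ε are assumptions, not the paper’s settings.

```python
import random
from typing import List

# Hypothetical discretisation: 8 levels from 0/7 to 7/7.
LEVELS: List[float] = [k / 7 for k in range(8)]


class StatelessQLearner:
    """Hypothetical epsilon-greedy, single-state Q-learner over LEVELS.

    With one state and discount factor 0, the Q-learning update is
    Q(a) <- Q(a) + alpha * (reward - Q(a)).
    """

    def __init__(self, alpha: float = 0.1, epsilon: float = 0.1, seed: int = 0):
        self.alpha = alpha
        self.epsilon = epsilon
        self.q: List[float] = [0.0] * len(LEVELS)
        self.rng = random.Random(seed)
        self.action = 0  # index into LEVELS chosen for the current round

    def choose(self) -> float:
        if self.rng.random() < self.epsilon:
            self.action = self.rng.randrange(len(LEVELS))  # explore
        else:
            self.action = self.q.index(max(self.q))  # exploit (first best)
        return LEVELS[self.action]

    def update(self, reward: float) -> None:
        self.q[self.action] += self.alpha * (reward - self.q[self.action])


class Agent:
    """Hypothetical agent whose policy is (boldness, vengefulness)."""

    def __init__(self, seed: int = 0):
        self.boldness_learner = StatelessQLearner(seed=seed)
        self.vengefulness_learner = StatelessQLearner(seed=seed + 1)
        self.rng = random.Random(seed + 2)
        self.boldness = self.boldness_learner.choose()
        self.vengefulness = self.vengefulness_learner.choose()

    def defects(self) -> bool:
        # Boldness read as the probability of defecting.
        return self.rng.random() < self.boldness

    def punishes(self) -> bool:
        # Vengefulness read as the probability of punishing a defector.
        return self.rng.random() < self.vengefulness

    def learn(self, round_reward: float) -> None:
        # Assumption: both learners receive the same round payoff.
        self.boldness_learner.update(round_reward)
        self.vengefulness_learner.update(round_reward)
        self.boldness = self.boldness_learner.choose()
        self.vengefulness = self.vengefulness_learner.choose()
```

Feeding each round’s payoff to learn() makes levels that earned more likely to be chosen again, which is the sense in which the policy is learned from the payoffs of actions.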


  17. Metanorm Model (Punishment Mechanism)
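    • There are two ways of punishing:
    1. Axelrod’s original model (static model)
    – The amount of punishment is fixed
    2. The model extended by Mahmoud et al. (adaptive model)
    – The amount of punishment is determined dynamically from the agent’s past behaviour
    – Each agent keeps an image of the behaviour of each of its neighbouring agents
    – The amount of punishment is decided based on that image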
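To contrast the two mechanisms, here is a hedged sketch. The static values (punishment 9, enforcement cost 2) are Axelrod’s [1]; the adaptive rule, which scales punishment linearly with the defector’s image (read here as a defection proportion) and keeps the cost proportional, is a hypothetical stand-in, since the slide does not give Mahmoud et al.’s formula.

```python
from dataclasses import dataclass

# Axelrod's norms-game values [1].
PUNISHMENT = 9.0        # subtracted from a punished defector
ENFORCEMENT_COST = 2.0  # subtracted from the punisher


@dataclass
class PunishmentOutcome:
    punishment: float  # amount taken from the defector
    cost: float        # amount paid by the punisher


def static_punishment() -> PunishmentOutcome:
    """Static model: every act of punishment has the same magnitude."""
    return PunishmentOutcome(PUNISHMENT, ENFORCEMENT_COST)


def adaptive_punishment(image: float) -> PunishmentOutcome:
    """Hypothetical adaptive rule: punishment grows with the defector's
    image (a defection proportion in [0, 1]); the cost keeps the static
    model's cost-to-punishment ratio."""
    if not 0.0 <= image <= 1.0:
        raise ValueError("image must be a proportion in [0, 1]")
    punishment = PUNISHMENT * image
    cost = punishment * ENFORCEMENT_COST / PUNISHMENT
    return PunishmentOutcome(punishment, cost)
```

Under this reading, a habitual defector (image 1.0) is punished exactly as in the static model, while an occasional defector is punished, and costs the punisher, proportionally less.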


  18. EXPERIMENTAL EVALUATION
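    • With limited resources, agents decide and carry out their various actions according to their policy.
    • Resources are reset every round (in one round, every agent performs its interactions).
    • The experimental evaluation uses the two models described above:
    – Static model
    – Adaptive model
    • Networks: regular lattice networks, scale-free networks, and others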
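A sketch of a single round under a resource budget that resets each round, as the slide describes. ROUND_BUDGET, ENFORCEMENT_COST and run_round are hypothetical; the point is only that a punisher stops punishing once it can no longer pay.

```python
from typing import Dict, List, Tuple

# Hypothetical numbers: the slide only says resources are limited and
# are reset at the start of every round.
ROUND_BUDGET = 4.0      # resources each agent receives per round
ENFORCEMENT_COST = 2.0  # resources spent per act of punishment


def run_round(agents: List[str],
              punish_requests: List[Tuple[str, str]]) -> Dict[str, int]:
    """Hypothetical round: reset every agent's resources, then apply
    (punisher, defector) requests in order. A punishment goes ahead only
    if the punisher can still pay ENFORCEMENT_COST this round.
    Returns how many punishments each agent actually carried out."""
    resources = {a: ROUND_BUDGET for a in agents}  # reset at round start
    carried_out = {a: 0 for a in agents}
    for punisher, _defector in punish_requests:
        if resources[punisher] >= ENFORCEMENT_COST:
            resources[punisher] -= ENFORCEMENT_COST
            carried_out[punisher] += 1
    return carried_out
```

Because the budget is rebuilt inside each call, unspent resources do not carry over between rounds.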


  19. PARAMETERS SETUP


  20. STATIC PUNISHMENT EXPERIMENTAL
    RESULTS
    Fig. 1: Impact of limited resources with punishment on final B and V


  21. ADAPTIVE PUNISHMENT EXPERIMENTAL
    RESULTS
    Fig. 2: Impact of limited resources with punishment on final B and V


  22. RESOURCE-AWARE PUNISHMENT MODEL
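    • In the methods so far, agents acted without considering their own resources.
    • The model is therefore extended so that an agent decides how much to punish from its own resources, the number of its neighbouring agents, and its neighbours’ past behaviour (image).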


  23. RESOURCE-AWARE PUNISHMENT MODEL
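    • Each agent keeps a memory (history) of length n.
    • An agent ag_i computes, for each neighbour ag_j, the defection proportion dp_ij.
    • This dp_ij is called LocalDefImage.
    • The average of dp_ij over all of ag_i’s neighbours is called AvgDefImage.
    – If LocalDefImage exceeds AvgDefImage, ag_i should punish ag_j more strongly; otherwise, more weakly.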
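The images on this slide can be computed from a bounded history per neighbour. This sketch assumes each remembered action is stored as a boolean (True for defection), that n is the memory length, and that an empty history counts as 0; DefectionImages and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class DefectionImages:
    """Hypothetical helper for one agent ag_i: keeps the last n observed
    actions of each neighbour ag_j (True = defected)."""

    def __init__(self, neighbours: Iterable[str], n: int):
        # deque(maxlen=n) silently drops the oldest entry once full.
        self.history: Dict[str, Deque[bool]] = {
            j: deque(maxlen=n) for j in neighbours
        }

    def observe(self, j: str, defected: bool) -> None:
        self.history[j].append(defected)

    def local_def_image(self, j: str) -> float:
        """dp_ij: fraction of ag_j's remembered actions that were defections."""
        h = self.history[j]
        return sum(h) / len(h) if h else 0.0

    def avg_def_image(self) -> float:
        """Mean of dp_ij over all of ag_i's neighbours."""
        values = [self.local_def_image(j) for j in self.history]
        return sum(values) / len(values) if values else 0.0

    def punish_harder(self, j: str) -> bool:
        """True if ag_j defects more often than the neighbourhood average."""
        return self.local_def_image(j) > self.avg_def_image()
```

For example, a neighbour that defected in 2 of its last 4 actions, next to one that never defected, has dp = 0.5 against an average of 0.25, so it falls on the "punish more strongly" side.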


  24. RESOURCE-AWARE PUNISHMENT MODEL
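    • Determine the punishment ratio (Deviation).
    • Divide ag_i’s total resources equally among its neighbouring agents; call this UniformRes(ag_i).
    • Because UniformRes is an amount of cost, convert it into the equivalent amount of punishment (EquivPunish) using the enforcement cost percentage (ECP), which relates the cost of punishing to the punishment it buys.
    • The final amount of punishment is therefore
    EquivPunish × Deviation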
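Putting this slide together: it gives the final amount as EquivPunish × Deviation but does not spell out the other quantities as formulas. The sketch below assumes UniformRes is the agent’s resources divided by its number of neighbours, that ECP is the enforcement cost as a fraction of the punishment (so EquivPunish = UniformRes / ECP), and that Deviation is LocalDefImage / AvgDefImage; all three, and the function names, are hypothetical.

```python
def uniform_res(resources: float, num_neighbours: int) -> float:
    """Hypothetical UniformRes(ag_i): resources split equally among neighbours."""
    if num_neighbours <= 0:
        raise ValueError("agent must have at least one neighbour")
    return resources / num_neighbours


def equiv_punish(uniform: float, ecp: float) -> float:
    """Hypothetical EquivPunish: the punishment whose cost equals `uniform`,
    assuming cost = ecp * punishment (ECP as a fraction)."""
    if ecp <= 0:
        raise ValueError("ECP must be positive")
    return uniform / ecp


def deviation(local_def_image: float, avg_def_image: float) -> float:
    """Hypothetical Deviation: ag_j's defection proportion relative to the
    neighbourhood average (> 1 punish harder, < 1 punish more lightly).
    The average is 0 only when no neighbour defected, so return 0 then."""
    if avg_def_image == 0:
        return 0.0
    return local_def_image / avg_def_image


def punishment_amount(resources: float, num_neighbours: int, ecp: float,
                      local_def_image: float, avg_def_image: float) -> float:
    """Final amount from the slide: EquivPunish x Deviation."""
    eq = equiv_punish(uniform_res(resources, num_neighbours), ecp)
    return eq * deviation(local_def_image, avg_def_image)
```

With 8 units of resource, 4 neighbours, ECP = 0.5, LocalDefImage = 0.5 and AvgDefImage = 0.25, this gives UniformRes = 2, EquivPunish = 4, Deviation = 2 and a punishment of 8. Dividing resources evenly keeps any single neighbour from exhausting the agent’s budget, while Deviation shifts punishment toward neighbours that defect more than average.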


  25. Evaluation
    Fig. 3: Impact of limited resources with punishment on final B and V


  26. CONCLUSION
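    • Previous studies based on the metanorm model did not take the cost of punishment into account.
    • This study showed that when the cost of punishment is added both to the original model with fixed punishment and to the model in which punishment varies dynamically, norms fail to emerge.
    • It also proposed a model in which agents punish with their own resources in mind, and showed that this lets norms of cooperation emerge more readily.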


  27. FUTURE WORK
    • Future work: examining whether this method also applies to other settings (e.g., other games).


  28. REFERENCE
    [1] R. Axelrod. An evolutionary approach to norms. American Political Science Review,
    80(4):1095–1111, 1986.
    [10] E. Fehr and S. Gächter. Altruistic punishment in humans. Nature, 415(6868):137–140, Jan.
    2002.
    [11] E. Fehr and S. Gächter. Cooperation and punishment in public goods experiments. The
    American Economic Review, 90(4):980–994, 2000.
    [17] D. Helbing, A. Szolnoki, M. Perc, and G. Szabó. Punish, but not too hard: how costly
    punishment spreads in the spatial public goods game. New Journal of Physics, 12(8):083005,
    2010.
    [18] R. Jurca and B. Faltings. An incentive compatible reputation mechanism. In Proceedings of
    the Second International Joint Conference on Autonomous Agents and Multiagent Systems,
    AAMAS ’03, pages 1026–1027. ACM, 2003.
    [21] S. Mahmoud, J. Keppens, N. Griffiths, and M. Luck. Efficient norm emergence through
    experiential dynamic punishment. In Proceedings of the 20th European Conference on Artificial
    Intelligence, pages 576–581. IOS Press, 2012.
    [22] S. Mahmoud, J. Keppens, M. Luck, and N. Griffiths. Norm establishment via metanorms in
    network topologies. In Proceedings of the 2011
    [23] S. Mahmoud, J. Keppens, M. Luck, and N. Griffiths. Overcoming omniscience in Axelrod’s
    model. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web
    Intelligence and Intelligent Agent Technology - Volume 03, WI-IAT ’11, pages 29–32. IEEE
    Computer Society, 2011.
    [26] N. Miller, P. Resnick, and R. Zeckhauser. Eliciting honest feedback in electronic markets.
    KSG Working Paper Series RWP02-039, 2002.
    [27] N. Miller, P. Resnick, and R. Zeckhauser. Eliciting informative feedback: The peer-prediction
    method. Management Science, 51(9):1359–1373, 2005.
    [28] N. Nikiforakis. Punishment and counter-punishment in public good games: Can we really
    govern ourselves? Journal of Public Economics, 92:91–112, 2008.
