So This Is How AlphaGo Works: Multi-Agent Reinforcement Learning Explained in One Article (Part 10)

[5] Hu J, Wellman M P. Nash Q-learning for general-sum stochastic games[J]. Journal of Machine Learning Research, 2003, 4(Nov): 1039-1069.
[6] Claus C, Boutilier C. The dynamics of reinforcement learning in cooperative multiagent systems[C]. Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998: 746-752.
[7] Kapetanakis S, Kudenko D. Reinforcement learning of coordination in cooperative multi-agent systems[C]. American Association for Artificial Intelligence, 2002: 326-331.
[8] Yang Y, Luo R, Li M, et al. Mean Field Multi-Agent Reinforcement Learning[C]. International Conference on Machine Learning, 2018: 5567-5576.
[9] Lowe R, Wu Y, Tamar A, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments[C]. Neural Information Processing Systems, 2017: 6379-6390.
[10] Foerster J, Farquhar G, Afouras T, et al. Counterfactual Multi-Agent Policy Gradients[J]. arXiv: Artificial Intelligence, 2017.
[11] Sunehag P, Lever G, Gruslys A, et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning[J]. arXiv: Artificial Intelligence, 2017.
[12] Rashid T, Samvelyan M, De Witt C S, et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning[J]. arXiv: Learning, 2018.
[13] OpenAI. OpenAI Five[EB/OL]. https://blog.openai.com/openai-five/, 2018.
[14] Vinyals O, Babuschkin I, Czarnecki W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575: 350-354.
[15] Long P, Fan T, Liao X, Liu W, Zhang H, Pan J. Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning[C]. 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, 2018: 6252-6259. doi: 10.1109/ICRA.2018.8461113.
[16] Chen Y F, Everett M, Liu M, How J P. Socially aware motion planning with deep reinforcement learning[C]. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, 2017: 1343-1350. doi: 10.1109/IROS.2017.8202312.
[17] Hernandez-Leal P, Kartal B, Taylor M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents and Multi-Agent Systems, 2019, 33(6): 750-797.
