機械学習/cookpad summer internship 2017 ml

Slide 1

Slide 1 text

17day TECH INTERNSHIP 機械学習クックパッド株式会社研究開発部菊田遥平 2017/08/15

Slide 13

Slide 13 text

事例：機械翻訳 Ref: http://arxiv.org/pdf/1707.01310v1 Ref: https://github.com/yoheikikuta/arxiv_summary_translation In typical reinforcement learning (RL), the environment is assumed given and the goal of the learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering the environment is not given, but controllable and learnable through its interaction with the agent at the same time. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to that w.r.t. the agent, and solving the dual MDP-policy pair yields a policy gradient solution to optimizing the parametrized environment. Furthermore, environments with non-differentiable parameters are addressed by a proposed general generative framework. Experiments on a Maze generation task show the effectiveness of generating diverse and challenging Mazes against agents with various settings. 典型的な強化学習（ RL）では、環境が与えられていると仮定され、学習の目標は、エージェントが環境との相互作用を通じて行動をとるための最適なポリシーを特定することである。本論文では、環境を与えないでエージェントとの対話を通じて同時に制御可能で学習可能であることを考慮して、この設定を拡張する。理論的には、エージェントに対する環境ではマルコフ決定プロセス（ MDP）が2つあり、デュアルMDPポリシーペアを解決すると、パラメータ化された環境を最適化するためのポリシー勾配ソリューションが得られます。さらに、非微分可能なパラメータを有する環境は、提案された一般的な生成フレームワークによって対処される。迷路生成タスクの実験では、さまざまな設定のエージェントに対して多様で挑戦的な迷路を生成する効果が示されます。代表的な強化学習（ RL）で、環境が与えられたと学習の目標は、環境との相互作用を介して行動を取るエージェントに最適なポリシーを識別することであると仮定されます。本稿では、環境を考慮して、この設定を拡張するには与えられていないが、同時に薬剤との相互作用を介して制御可能と学習可能。理論的には、我々は二重のマルコフ決定過程（ MDP）はそのWRTエージェントに環境を WRT見つけ、およびデュアル MDP-ポリシーのペアを解くと、パラメータ化環境を最適化するための方策勾配ソリューションを生み出します。また、非微分パラメータを持つ環境では、提案された一般的な生成的枠組みによって対処されます。迷路生成タスクの実験は、様々な設定を持つエージェントに対して多様かつ挑戦的な迷路を生成するの有効性を示します。元の文章従来手法での翻訳ニューラル機械翻訳での翻訳与えられた自然言語を他の自然言語に翻訳することも、対となる文章を大量に学習することで、人間に近いレベルの翻訳ができるようになってきている 13 / 133

Slide 1

Slide 1 text

Slide 2

Slide 2 text

Slide 3

Slide 3 text

Slide 4

Slide 4 text

Slide 5

Slide 5 text

Slide 6

Slide 6 text

Slide 7

Slide 7 text

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Slide 24

Slide 24 text

Slide 25

Slide 25 text

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Slide 29

Slide 29 text

Slide 30

Slide 30 text

Slide 31

Slide 31 text

Slide 32

Slide 32 text

Slide 33

Slide 33 text

Slide 34

Slide 34 text

Slide 35

Slide 35 text

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Slide 38

Slide 38 text

Slide 39

Slide 39 text

Slide 40

Slide 40 text