Reading Circle (Reinforcement Learning for Image Processing)

pyman
July 10, 2020


Explanation of Reinforcement Learning for Image Processing



Transcript

  1. Reading Circle Reinforcement Learning for Image Processing

  2. Today’s Topic
     Reinforcement Learning Algorithms
     − SARSA
     − Q-Learning
     − DQN
     − REINFORCE
     − Actor-Critic
     Reinforcement Learning for Image Processing
     − [Cao+, CVPR17]
     − A2-RL [Li+, CVPR18]
     − PixelRL [Furuta+, AAAI19]
     − Adversarial RL [Ganin+, ICML18]
  3. What’s Reinforcement Learning
     An area of machine learning concerned with how software agents ought to take
     actions in an environment in order to maximize the notion of cumulative reward.
     [Diagram: the agent-environment loop; the Agent takes an Action, the Environment
     returns the next State and a Reward]
     Maximize the return R_t = Σ_{k=0}^{∞} γ^k r_{t+k+1}
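
To make the return concrete, here is a minimal Python sketch (not from the slides) that computes the discounted sum of rewards for a finite episode:

```python
# Minimal sketch: the discounted return R_t = sum_{k>=0} gamma^k * r_{t+k+1},
# truncated to a finite episode.
def discounted_return(rewards, gamma=0.99):
    """rewards: [r_{t+1}, r_{t+2}, ..., r_T] observed after time t."""
    ret = 0.0
    for k, r in enumerate(rewards):
        ret += (gamma ** k) * r
    return ret

print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))  # = 0.9**2 * 1.0 ≈ 0.81
```
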
  4. Policy and Value
     Policy π
     − the model of the agent's action selection
     State Value Function V^π(s)
     − the expected return starting from state s under π
     − "how good" it is to be in the given state
     Action Value Function Q^π(s, a)
     − the action value of the pair (s, a) under π
  5. TD error
     In value-based RL, the Q-function is usually used.
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + αδ
     δ = T − Q^π(s_t, a_t)
     α: learning rate, T: TD target
     The TD error δ is the difference between the evaluation value before and
     after the agent acted.
  6. TD error
     In value-based RL, the Q-function is usually used.
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + αδ
     δ = T − Q^π(s_t, a_t)
     α: learning rate, T: TD target
     There are algorithms for different TD targets:
     − SARSA
     − Q-Learning
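
As a rough illustration (an assumption-level sketch, not the slide's code), the tabular update above can be written so that the specific algorithm only supplies the TD target T:

```python
# Generic tabular TD update: Q(s,a) <- Q(s,a) + alpha * (T - Q(s,a)).
# Q is any mapping from a state to a list of action values, e.g. a dict of lists.
def td_update(Q, s, a, td_target, alpha=0.1):
    td_error = td_target - Q[s][a]   # delta = T - Q(s, a)
    Q[s][a] += alpha * td_error
    return td_error
```
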
  7. Policy of Value-based RL
     greedy policy
     − always choose the action with the highest Q-value
     ε-greedy policy
     − select a random action with probability ε and the action with the highest
       Q-value with probability 1 − ε
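
A minimal sketch of ε-greedy action selection over a list of Q-values (names are illustrative):

```python
import random

# With probability epsilon pick a random action (exploration),
# otherwise pick the action with the highest Q-value (exploitation).
def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: list of Q(s, a) for every action a in the current state."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```
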
  8. SARSA (On-Policy)
     TD target: T = r_{t+1} + γ Q^π(s_{t+1}, a_{t+1})
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α(T − Q^π(s_t, a_t))
     Update the Q-value based on the next action a_{t+1} actually selected by the
     current ε-greedy policy.
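
A hedged sketch of one SARSA step under the definitions above; the point is that the TD target bootstraps from the next action the ε-greedy policy actually selected:

```python
# On-policy: the bootstrap term Q[s_next][a_next] uses the action that the
# current (epsilon-greedy) policy really chose in s_next.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])
```
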
  9. Q-Learning (Off-Policy)
     TD target: T = r_{t+1} + γ max_{a′} Q^π(s_{t+1}, a′)
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α(T − Q^π(s_t, a_t))
     Update the Q-value assuming the agent chose the action a′ that maximizes the
     Q-value, while the behavior policy stays ε-greedy.
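
The corresponding Q-Learning step (again a sketch, not reference code); only the TD target changes, taking the max over next actions regardless of what the behavior policy will actually do:

```python
# Off-policy: the bootstrap term is max_a' Q[s_next][a'], independent of the
# epsilon-greedy action that will actually be executed next.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
```
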
  10. Problem of Q-Learning
      The Q-value is maintained in a Q-table
      − states must be manually discretized
      − learning is slow because relationships between values are ignored
      − the curse of dimensionality
      → approximate the Q-function with a neural network
  11. DQN
      TD target: T = r_{t+1} + γ max_{a′} Q^π(s_{t+1}, a′)
      Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α(T − Q^π(s_t, a_t))
      Approximate the Q-function by a CNN, combined with:
      − target network: fix the parameters of the Q-function used in
        max_{a′} Q^π(s_{t+1}, a′) for a certain period
      − experience replay
      − reward clipping
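
A framework-free sketch of the target-network idea; the `q_function` callable and the way parameters are stored here are assumptions for illustration, not the DQN paper's implementation:

```python
import copy

class TargetNetwork:
    """Keep a frozen copy of the Q-function parameters for computing TD targets,
    and synchronize it with the online parameters only every `sync_every` steps."""

    def __init__(self, online_params, sync_every=1000):
        self.online_params = online_params
        self.target_params = copy.deepcopy(online_params)  # frozen copy
        self.sync_every = sync_every
        self.steps = 0

    def td_target(self, q_function, r, s_next, gamma=0.99):
        # max_a' Q(s_{t+1}, a') is evaluated with the frozen parameters
        return r + gamma * max(q_function(self.target_params, s_next))

    def step(self):
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_params = copy.deepcopy(self.online_params)
```
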
  12. DQN
      TD target: T = r_{t+1} + γ max_{a′} Q^π(s_{t+1}, a′)
      Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α(T − Q^π(s_t, a_t))
      Approximate the Q-function by a CNN, combined with:
      − target network
      − experience replay: store past state transitions and their rewards in a
        replay buffer and apply mini-batch learning by sampling from it
      − reward clipping
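
A minimal sketch of an experience replay buffer (illustrative, not the original implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store past transitions and sample uniform mini-batches for learning."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```
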
  13. Policy-Gradient Method
      Approximate the policy itself by a NN and learn the parameters θ to maximize
      the objective function J(π_θ):
      ∇_θ J(π_θ) = E[∇_θ log π_θ(a_t|s_t) · Ψ^{π_θ}(s_t, a_t)]
      for the parameterized policy π_θ.
  14. Policy-Gradient Method
      ∇_θ J(π_θ) = E[∇_θ log π_θ(a_t|s_t) · Ψ^{π_θ}(s_t, a_t)]
      for the parameterized policy π_θ.
      There are algorithms for different choices of the weighting term Ψ^{π_θ}
      ("how good the policy is"):
      − REINFORCE: Ψ = the return R_t
      − Actor-Critic: Ψ = the advantage A^π(s, a) = Q^π(s, a) − V^π(s)
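
As a concrete, assumption-level sketch, here is a REINFORCE update for a linear-softmax policy; the linear parameterization and all names are illustrative choices, not taken from the slides:

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def grad_log_pi(theta, s, a):
    """grad_theta log pi_theta(a|s) for logits = theta @ s (theta: actions x features)."""
    probs = softmax(theta @ s)
    grad = -np.outer(probs, s)   # -E_pi[features] part of the score function
    grad[a] += s                 # + features of the chosen action
    return grad

def reinforce_step(theta, trajectory, lr=0.01, gamma=0.99):
    """trajectory: list of (s, a, r); Psi_t is the discounted return from step t."""
    rewards = [r for (_, _, r) in trajectory]
    for t, (s, a, _) in enumerate(trajectory):
        psi = sum(gamma ** k * r for k, r in enumerate(rewards[t:]))
        theta += lr * grad_log_pi(theta, s, a) * psi   # ascend grad log pi * Psi
    return theta
```

Swapping `psi` for an estimated advantage Q − V gives the Actor-Critic weighting mentioned above.
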
  15. After all…

  16. Today’s Topic
      Reinforcement Learning Algorithms
      − SARSA
      − Q-Learning
      − DQN
      − REINFORCE
      − Actor-Critic
      Reinforcement Learning for Image Processing
      − [Cao+, CVPR17]
      − A2-RL [Li+, CVPR18]
      − PixelRL [Furuta+, AAAI19]
      − Adversarial RL [Ganin+, ICML18]
  17. [Cao+, CVPR17] Attention-Aware Face Hallucination

  18. [Cao+, CVPR17] Attention-Aware Face Hallucination
      − algorithm : REINFORCE
      − state : facial image
      − action : select a location to enhance
      − reward : MSE loss
  19. [Cao+, CVPR17] Qualitative Results

  20. A2-RL [Li+, CVPR18] Aesthetics Aware Image Cropping

  21. A2-RL [Li+, CVPR18] Aesthetics Aware Image Cropping
      − algorithm : A3C
      − state : current image & original image
      − action : select a cropping window
      − reward : aesthetics score
  22. A2-RL [Li+, CVPR18] Qualitative Results

  23. PixelRL [Furuta+, AAAI19] Pixel-wise multi-agent reinforcement learning
      − employ a fully convolutional network (FCN)
      − pixel-wise agents share the parameters
      − pixel-wise action : change the pixel value
      − pixel-wise reward : the difference of the pixel value
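
One plausible reading of the pixel-wise reward (an assumption for illustration; the exact definition is given in the PixelRL paper): each pixel's agent is rewarded by how much its action reduced that pixel's error against the target image.

```python
import numpy as np

# Reward map: squared error before the action minus squared error after it,
# computed independently for every pixel (all arrays share the same HxW shape).
def pixelwise_reward(target, prev_img, curr_img):
    return (target - prev_img) ** 2 - (target - curr_img) ** 2
```
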
  24. PixelRL [Furuta+, AAAI19] Image Denoising

  25. PixelRL [Furuta+, AAAI19] Image Restoration

  26. PixelRL [Furuta+, AAAI19] Image Restoration

  27. Adversarial RL [Ganin+, ICML18] Synthesizing Programs
      The renderer is non-differentiable

  28. Adversarial RL [Ganin+, ICML18] Generate vector images from raster images

  29. Adversarial RL [Ganin+, ICML18] Generate vector images from raster images

  30. Adversarial RL [Ganin+, ICML18] MNIST Reconstruction

  31. Conclusion
      Reinforcement Learning Algorithms
      − SARSA
      − Q-Learning
      − DQN
      − REINFORCE
      − Actor-Critic
      Reinforcement Learning for Image Processing
      − [Cao+, CVPR17]
      − A2-RL [Li+, CVPR18]
      − PixelRL [Furuta+, AAAI19]
      − Adversarial RL [Ganin+, ICML18]
      Very recently, deep RL has been used for image processing.
      There are many more RL algorithms…