
Reading Circle (Reinforcement Learning for Image Processing)

pyman
July 10, 2020

Explanation of Reinforcement Learning for Image Processing

Transcript

  1. Today’s Topic
     Reinforcement Learning Algorithms
     − SARSA
     − Q-Learning
     − DQN
     − REINFORCE
     − Actor-Critic
     Reinforcement Learning for Image Processing
     − [Cao+, CVPR17]
     − A2-RL [Li+, CVPR18]
     − PixelRL [Furuta+, AAAI19]
     − Adversarial RL [Ganin+, ICML18]

  2. What’s Reinforcement Learning
     An area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward.
     [Diagram: the agent observes a state and a reward from the environment and responds with an action.]
     Maximize the return R_t = Σ_{k=0}^{∞} γ^k r_{t+k+1}

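To make the return concrete, here is a minimal Python sketch that computes R_t over a finite episode; the reward list and γ are made-up values:

```python
# Discounted return R_t = sum_{k=0}^{inf} gamma^k * r_{t+k+1},
# truncated to a finite episode (illustrative rewards).
def discounted_return(rewards, gamma=0.99):
    ret = 0.0
    for k, r in enumerate(rewards):
        ret += (gamma ** k) * r
    return ret

print(discounted_return([0.0, 0.0, 1.0]))  # 0.99**2 * 1.0 = 0.9801
```
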
  3. Policy and Value
     Policy π
     − the model of the agent's action selection
     State Value Function V^π(s)
     − the expected return starting from state s under π
     − "how good" it is to be in the given state
     Action Value Function Q^π(s, a)
     − the value of taking action a in state s under π

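Written out in standard notation, both value functions are expectations of the return R_t defined on the previous slide:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s \right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s,\; a_t = a \right]
```
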
  4. TD error
     In value-based RL, the Q-function is usually used and updated as
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α δ,   δ = (TD target) − Q^π(s_t, a_t)
     where α is the learning rate. The TD error δ is the difference between the evaluation value before and after the agent acted.

  5. TD error
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α δ,   δ = (TD target) − Q^π(s_t, a_t)
     There are several algorithms that use different TD targets (a minimal sketch of the generic update follows this list):
     − SARSA
     − Q-Learning

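A minimal sketch of the generic TD update, using a plain dictionary as the Q-table; the state and action names and all values are illustrative:

```python
# Generic TD update: Q(s, a) <- Q(s, a) + alpha * (target - Q(s, a)).
# SARSA and Q-Learning differ only in how `target` is computed.
def td_update(Q, s, a, target, alpha=0.1):
    delta = target - Q[(s, a)]   # TD error
    Q[(s, a)] += alpha * delta
    return Q[(s, a)]

Q = {("s0", "a0"): 0.5}
print(td_update(Q, "s0", "a0", target=1.0))  # 0.5 + 0.1 * 0.5 = 0.55
```
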
  6. Policy of Value-based RL
     greedy policy
     − always choose the action with the highest Q-value
     ε-greedy policy
     − select a random action with probability ε and the action with the highest Q-value with probability 1−ε (see the sketch below)

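A minimal sketch of ε-greedy selection, assuming the same dictionary-style Q-table as in the TD sketch above:

```python
import random

# Epsilon-greedy: a random action with probability eps,
# the action with the highest Q-value otherwise.
def epsilon_greedy(Q, s, actions, eps=0.1):
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
print(epsilon_greedy(Q, "s0", ["left", "right"]))  # usually "right"
```
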
  7. SARSA (On-Policy)
     target = r_{t+1} + γ Q^π(s_{t+1}, a_{t+1})
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α (target − Q^π(s_t, a_t))
     Update the Q-value based on the current policy: both a_t and a_{t+1} are selected ε-greedily, so SARSA is on-policy.

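The same update as a tabular sketch; note that the target uses the next action a_next that the ε-greedy policy actually selected:

```python
# One SARSA step: the TD target uses the action a_next chosen by the
# current (epsilon-greedy) policy in s_next -> on-policy.
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = {("s0", "a0"): 0.0, ("s1", "a1"): 1.0}
sarsa_update(Q, "s0", "a0", r=0.0, s_next="s1", a_next="a1")
print(Q[("s0", "a0")])  # 0.1 * (0.99 * 1.0 - 0.0) = 0.099
```
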
  8. Q-Learning (Off-Policy)
     target = r_{t+1} + γ max_{a′} Q^π(s_{t+1}, a′)
     Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α (target − Q^π(s_t, a_t))
     Update the Q-value assuming the agent chose the action that maximizes the Q-value: a_t is still selected ε-greedily, but the target takes the max over a′, so Q-Learning is off-policy.

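The corresponding tabular sketch; the only change from SARSA is that the target maximizes over the next actions:

```python
# One Q-Learning step: the TD target takes the max over next actions,
# independent of what the behavior policy does -> off-policy.
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = {("s0", "a0"): 0.0, ("s1", "a0"): 0.3, ("s1", "a1"): 1.0}
q_learning_update(Q, "s0", "a0", r=0.0, s_next="s1", actions=["a0", "a1"])
print(Q[("s0", "a0")])  # 0.1 * (0.99 * 1.0 - 0.0) = 0.099
```
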
  9. Problem of Q-Learning
     The Q-value is maintained in a Q-table, which causes several problems:
     − states must be manually discretized
     − learning is slow because relationships between values are ignored
     − the curse of dimensionality
     → approximate the Q-function by a neural network

  10. DQN
      target = r_{t+1} + γ max_{a′} Q^π(s_{t+1}, a′)
      Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α (target − Q^π(s_t, a_t))
      Approximate the Q-function by a CNN, combined with three techniques:
      − target network: fix the parameters of the Q-function used for max_{a′} Q^π(s_{t+1}, a′) for a certain period (see the sketch below)
      − experience replay
      − reward clipping

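A minimal sketch of the target-network schedule; a plain dict of parameters stands in for the CNN, and the sync interval is an arbitrary choice:

```python
import copy

# TD targets are computed with a frozen copy of the Q-function
# (target_params), which is synced to the online one periodically.
online_params = {"w": 0.0}
target_params = copy.deepcopy(online_params)
SYNC_EVERY = 100

for step in range(1000):
    online_params["w"] += 0.01  # stand-in for one gradient step
    # TD targets would be computed from target_params, not online_params
    if step % SYNC_EVERY == 0:
        target_params = copy.deepcopy(online_params)
```
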
  11. DQN
      target = r_{t+1} + γ max_{a′} Q^π(s_{t+1}, a′)
      Q^π(s_t, a_t) ← Q^π(s_t, a_t) + α (target − Q^π(s_t, a_t))
      Approximate the Q-function by a CNN, combined with three techniques:
      − target network
      − experience replay: store past state transitions and their rewards in a replay buffer and apply mini-batch learning by sampling from it (see the sketch below)
      − reward clipping

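A minimal replay-buffer sketch; the capacity, batch size, and dummy transitions are arbitrary:

```python
import random
from collections import deque

# Store (s, a, r, s_next, done) transitions and sample uniform
# mini-batches, which decorrelates consecutive experiences.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(64):
    buf.push((t, 0, 0.0, t + 1, False))  # dummy transitions
batch = buf.sample(32)
```
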
  12. Policy-Gradient Method
      Approximate the policy itself by a NN and learn its parameters θ to maximize the objective function J(πθ):
      ∇θ J(πθ) = E[∇θ log πθ(a_t|s_t) · f^{πθ}(s_t, a_t)]
      for a parameterized policy πθ.

  13. Policy-Gradient Method
      ∇θ J(πθ) = E[∇θ log πθ(a_t|s_t) · f^{πθ}(s_t, a_t)]
      There are several algorithms that use different weighting functions f^{πθ}, which measure how good the policy is (a sketch of REINFORCE follows):
      − REINFORCE: the return R_t
      − Actor-Critic: the advantage A^π(s, a) = Q^π(s, a) − V^π(s)

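A minimal NumPy sketch of the REINFORCE gradient for a tabular softmax policy; the table sizes, return value, and step size are made up. Actor-Critic would replace R_t with the advantage A^π(s, a):

```python
import numpy as np

# grad_theta log pi(a|s) for a softmax policy, weighted by the return.
def reinforce_grad(theta, s, a, R_t):
    probs = np.exp(theta[s]) / np.exp(theta[s]).sum()
    grad_log_pi = -probs          # derivative of log softmax, part 1
    grad_log_pi[a] += 1.0         # part 2: one-hot at the taken action
    return R_t * grad_log_pi

theta = np.zeros((2, 3))          # 2 states, 3 actions
theta[0] += 0.01 * reinforce_grad(theta, s=0, a=1, R_t=1.0)  # ascent step
print(theta[0])                   # action 1 becomes slightly more likely
```
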
  14. Today’s Topic
      Reinforcement Learning Algorithms
      − SARSA
      − Q-Learning
      − DQN
      − REINFORCE
      − Actor-Critic
      Reinforcement Learning for Image Processing
      − [Cao+, CVPR17]
      − A2-RL [Li+, CVPR18]
      − PixelRL [Furuta+, AAAI19]
      − Adversarial RL [Ganin+, ICML18]

  15. [Cao+, CVPR17] Attention-Aware Face Hallucination
      algorithm: REINFORCE
      state: facial image
      action: select a location to enhance
      reward: MSE loss

  16. A2-RL [Li+, CVPR18] Aesthetics Aware Image Cropping
      algorithm: A3C
      state: current image & original image
      action: select a cropping window
      reward: aesthetics score

  17. PixelRL [Furuta+, AAAI19] Pixel-wise Multi-agent Reinforcement Learning
      − employs a fully convolutional network (FCN)
      − pixel-wise agents share the parameters
      − pixel-wise action: change the pixel value
      − pixel-wise reward: the difference of the pixel value

  18. Conclusion
      Reinforcement Learning Algorithms
      − SARSA
      − Q-Learning
      − DQN
      − REINFORCE
      − Actor-Critic
      Reinforcement Learning for Image Processing
      − [Cao+, CVPR17]
      − A2-RL [Li+, CVPR18]
      − PixelRL [Furuta+, AAAI19]
      − Adversarial RL [Ganin+, ICML18]
      Very recently, deep RL has been used for image processing. There are many more RL algorithms…