Dec19 Meetup: Introduction to Reinforcement Learning with TensorFlow Agents

In this talk you will discover how machines can learn complex behaviors and anticipatory actions.
Using this approach, autonomous helicopters fly acrobatic maneuvers and machines could even learn to beat the Go world champion.

Unlike other ML approaches, it needs neither a training dataset containing the “right” answers nor “hard-coded” knowledge.
The approach is called “reinforcement learning” and is almost magical.
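
To make the contrast concrete: instead of labelled answers, the learner only receives a numeric reward after each step, and a strategy is judged by the sum of those rewards. The following is a minimal sketch, assuming the terrain rewards and the two example routes from the slides in the transcript below. The route contents and the names `REWARDS` and `total_reward` are illustrative assumptions, not taken from the talk's notebook.

```python
# Reward per step, as listed on the talk's slides.
REWARDS = {"meadow": -50, "forest": -100, "mountain": -200, "water": -300, "honey": 1000}


def total_reward(route):
    """Sum the rewards collected along a route (a list of terrain names)."""
    return sum(REWARDS[step] for step in route)


# Hypothetical routes, chosen to match the step rewards shown on the slides.
clever = ["forest", "forest", "forest", "forest", "honey"]
not_so_clever = ["forest", "mountain", "water", "honey"]

print(total_reward(clever))         # 600
print(total_reward(not_so_clever))  # 400
```

No route is marked as "correct" anywhere; the better strategy is simply the one with the higher total.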

Using TF-Agents on top of TensorFlow 2.0, we will see how a real-life problem can be turned into a reinforcement learning task.
In an accompanying Python notebook, we implement all required solution elements step by step and highlight the design of Google’s newest reinforcement learning library.
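
As a rough illustration of what turning a problem into a reinforcement learning task involves, the sketch below models a toy version of the task as an environment with the two operations an agent needs, `reset` and `step`. It is plain Python rather than TF-Agents or OpenAI Gym code. The class `PathWorld`, its one-dimensional layout, and the action encoding are assumptions made for illustration, not the notebook's implementation.

```python
class PathWorld:
    """Toy environment: walk along a row of terrain cells towards a honey pot."""

    REWARDS = {"meadow": -50, "forest": -100, "mountain": -200, "water": -300, "honey": 1000}

    def __init__(self, cells):
        self.cells = list(cells)  # terrain name per position
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation (the position)."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right).

        Returns (observation, reward, done), loosely following the Gym convention.
        """
        if action not in (-1, 1):
            raise ValueError("action must be -1 or +1")
        # Clamp the position so the agent cannot leave the world.
        self.position = max(0, min(len(self.cells) - 1, self.position + action))
        terrain = self.cells[self.position]
        reward = self.REWARDS[terrain]
        done = terrain == "honey"  # the episode ends once the honey is reached
        return self.position, reward, done


env = PathWorld(["meadow", "forest", "forest", "honey"])
observation = env.reset()
total, done = 0, False
while not done:
    observation, reward, done = env.step(+1)  # fixed strategy: always move right
    total += reward
print(total)  # 800
```

The fixed "always move right" strategy stands in for a policy; a learning algorithm would replace it with one that is adjusted from the observed rewards.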

In the last part of the talk we look at the difficulties of using TF-Agents from a practitioner’s view and the approach we took at Geberit to address them.

Speaker: Christian Hidber
Christian is a consultant at bSquare with a focus on .NET Development, Machine Learning and Azure and an international conference speaker. He has a PhD in computer algebra from ETH Zurich and did a postdoc at UC Berkeley where he researched online data mining algorithms. Currently he applies machine learning to industrial hydraulics simulations in the context of a product with 7,000 installations in 42 countries all around the world.

You can find him at:
https://www.linkedin.com/in/christian-hidber/

Azure Zurich User Group

December 17, 2019

Transcript

  1. 1.

    Introduction to Reinforcement Learning with TF-Agents & TensorFlow 2.0. Oliver Zeigermann (embarc), Christian Hidber (bSquare)
  2. 3.

    The task. Rewards: water -300, honey +1000, meadow -50, forest -100, mountain -200
  3. 4.

    -100. Rewards: meadow -50, forest -100, mountain -200, water -300, honey +1000. Clever strategy: +1000 -100 -100 -100 -100, total 600
  4. 5.

    -100. Rewards: meadow -50, forest -100, mountain -200, water -300, honey +1000. Not so clever strategy: +1000 -100 -200 -300, total 400
  5. 6.

    Rewards: meadow -50, forest -100, mountain -200, water -300, honey +1000. Goal: find a good strategy for Orso
  6. 7.

    Choosing an action: observation (game state), policy (gaming strategy), action (?)
  7. 8.

    Choosing an action: observation (game state), policy (gaming strategy), action
  8. 9.

    Choosing an action: observation (game state), policy (gaming strategy), action
  9. 10.

    Choosing an action: observation (game state), policy (gaming strategy), action
  10. 11.

    Reinforcement learning approach: environment (game engine), policy (gaming strategy), agent (algorithm)
  11. 12.

    Reinforcement learning approach: environment (game engine), policy (gaming strategy), agent (algorithm)
  12. 13.

    Reinforcement learning approach: environment (game engine), policy (gaming strategy), agent (algorithm). Play, measure, update
  13. 15.

    TF-Agents & TF 2.0. Environment (OpenAI Gym): step, reset. RL algorithm (TF-Agents): Reinforce, DQN, PPO, SAC, … Policy (TF 2.0 / Keras). Reward, observation
  14. 16.

    Demo: Orso on TF-Agents
  15. 17.

    Orso’s environment (game engine). Environment: graph, world logic, honeypot placement. Rewards: steps, honeypot. Observation: Orso’s position, next step rewards, position rewards. Actions: directions
  16. 18.

    Policy: from observation to action. Input layer (x1, x2, ..., xn), hidden layer, output layer, softmax (P1, P2, P3, P4). Policy, collector, buffer, algorithm, environment
  17. 19.

    TF-Agents & TF 2.0. Policy (TF 2.0 / Keras), collector, buffer, algorithm, backprop, environment. Learn, play
  18. 21.

    Practitioner’s view: PPO (outline), DQN (outline)
  19. 22.

    Practitioner’s view: PPO (outline), DQN (outline)
  20. 24.
  21. 25.

    Implementing reinforcement learning: environment, RL algorithm, policy, reward, observation
  22. 26.

    Disclaimer: EasyAgents is developed by the speakers.
  23. 27.

    Demo: Orso on EasyAgents. Disclaimer: EasyAgents is developed by the speakers.
  24. 28.

    Reinforcement learning as a service
  25. 29.

    Your speakers. Christian Hidber: Christian.Hidber@bsquare.ch, +41 44 260 54 00, https://www.linkedin.com/in/christian-hidber/, https://github.com/christianhidber. Oliver Zeigermann: OliverZeigermann@gmail.com, @DJCordhose, https://www.linkedin.com/in/oliver-zeigermann-34989773, https://github.com/DJCordhose
  26. 30.
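
The play, measure, update cycle and the softmax policy named on the slides can be sketched in a few lines of plain Python. This is not TF-Agents code and not necessarily how the talk's agents are built. It assumes a drastically simplified task in which the policy only chooses between two fixed routes with hypothetical returns, and it uses the REINFORCE update with a running-mean baseline. All names (`softmax`, `train`, `RETURNS`) are illustrative.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into action probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical episode returns for two fixed routes, scaled by 1/1000.
RETURNS = [0.6, 0.4]


def train(episodes=2000, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # policy parameters: one score per route
    baseline = 0.0       # running mean of the returns seen so far
    for episode in range(1, episodes + 1):
        # Play: sample a route from the current policy.
        probs = softmax(logits)
        action = rng.choices([0, 1], weights=probs)[0]
        # Measure: observe the return and compare it with the running mean.
        episode_return = RETURNS[action]
        baseline += (episode_return - baseline) / episode
        advantage = episode_return - baseline
        # Update: REINFORCE step; the gradient of log softmax is (one_hot - probs).
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += learning_rate * advantage * (indicator - probs[i])
    return softmax(logits)


final_probs = train()
print(final_probs)  # the first (better) route should end up strongly preferred
```

The policy starts out indifferent between the two routes and shifts towards the better one using only the observed returns. Libraries such as TF-Agents replace the two-entry score list with a neural network and the hand-written update with backpropagation.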