
ricsue-deepracer-ws-istanbul.pdf

An introduction to reinforcement learning. Get started on your machine learning adventures by turning the key and revving up the AWS DeepRacer console. See how fast you can make your car go around the various tracks and claim those all-important bragging rights.

Ricardo Sueiras

September 16, 2019


Transcript

  1. Putting the “machine” in Machine Learning
     Ricardo Sueiras | Principal Evangelist, Amazon Web Services
     SUMMIT, Istanbul Loft
     © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. AWS DeepRacer car specifications
     Car: 1/18-scale 4WD with monster truck chassis
     CPU: Intel Atom processor
     Memory: 4 GB RAM
     Storage: 32 GB (expandable)
     Wi-Fi: 802.11ac
     Camera: 4 MP camera with MJPEG
     Drive battery: 1000 mAh lithium polymer
     Compute battery: 13600 mAh USB-C
     Sensors: Integrated accelerometer and gyroscope
     Ports: 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI
     Software: Ubuntu OS 16.04.3 LTS, Intel OpenVINO toolkit, ROS Kinetic
  3. AWS DeepRacer League: Race for prizes and glory. The world’s first global, autonomous racing league. www.deepracerleague.com
  4. Submit your model now to race in the Virtual Circuit!
  5. Reinforcement learning in the broader AI context: Reinforcement learning, Supervised learning, Unsupervised learning
  6. RL vs. other approaches for robotic racing
     Method: Supervised learning
     How it works: Expert driver controls a real-world car that has a camera. Save the images from the camera as inputs and corresponding driving actions (speed and steering angle) as outputs. Train a model.
     Result: Provide state (image) into model and receive driving action.
     Method: Reinforcement learning
     How it works: Virtual agent repeatedly interacts with a simulated environment and logs experience (image, action, new state, reward). Experience is used to train a model, and new model is used to get more experience.
     Result: Provide state (image) into model and receive driving action.
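The reinforcement learning loop described on this slide (interact, log experience, repeat) can be sketched as below. This is a minimal illustration, not the DeepRacer implementation: `ToyTrack`, `Experience`, and `collect_episode` are hypothetical names, and a one-dimensional integer state stands in for the camera image.

```python
from typing import Callable, List, NamedTuple, Tuple


class Experience(NamedTuple):
    """One logged interaction: (state, action, new state, reward)."""
    state: int
    action: int
    new_state: int
    reward: float


class ToyTrack:
    """Hypothetical 1-D environment: positions 0..length, goal at `length`."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves one cell forward, action 0 stays in place.
        self.position = min(self.position + action, self.length)
        done = self.position == self.length
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def collect_episode(env: ToyTrack,
                    policy: Callable[[int], int],
                    max_steps: int = 50) -> List[Experience]:
    """Run the policy in the environment and log every step as experience."""
    log: List[Experience] = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        new_state, reward, done = env.step(action)
        log.append(Experience(state, action, new_state, reward))
        state = new_state
        if done:
            break
    return log
```

In a full training setup the logged experience would be used to update the model, and the updated model would then act as the policy for the next round of data collection.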
  7. Machine learning overview: Supervised, Unsupervised, Reinforcement
  8. Reinforcement learning in the real world: Reward positive behavior. Don’t reward negative behavior. The result!
  9. Snakes on the (control) plane (@frankmunz)
  10. Reinforcement learning terms: Agent, Environment, State, Action, Episode, Reward
  11. The reward function: The reward function incentivizes particular behaviors and is at the core of reinforcement learning
  12. Training an RL model (diagram labels: GOAL, AGENT)
  13. The reward function in a grid race (diagram labels: GOAL, AGENT)
  14. Scores that incentivize central line driving (diagram labels: GOAL, AGENT)
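The slide's score values are not in the transcript, so the following is only a sketch of the idea: give each row of a grid track a score that is highest on the center row and falls off toward the edges. The function name `centerline_scores` and the `high`/`low` values are assumptions made for illustration.

```python
from typing import List


def centerline_scores(rows: int, high: float = 2.0, low: float = 0.1) -> List[float]:
    """Return one score per grid row, peaking on the middle row.

    Scores fall off linearly from `high` at the center to `low` at the edges.
    """
    center = (rows - 1) / 2
    scores = []
    for row in range(rows):
        # 0.0 on the center row, 1.0 on the outermost rows.
        offset = abs(row - center) / center if center else 0.0
        scores.append(round(high - (high - low) * offset, 3))
    return scores
```

An agent that maximizes the sum of these scores along its path is pushed toward the center row, which is the behavior the slide title describes.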
  15. An episode (diagram labels: GOAL, AGENT)
  16. Iteration (diagram labels: GOAL, AGENT)
  17. Exploration
  18. Exploitation and convergence
  19. Exploration vs. exploitation (diagram labels: EXPLORATION, EXPLOITATION)
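One common way to illustrate the exploration vs. exploitation trade-off is epsilon-greedy action selection. The deck does not say which exploration strategy DeepRacer uses, so this is a generic sketch rather than the service's method; `epsilon_greedy` and its parameters are illustrative names.

```python
import random
from typing import Sequence


def epsilon_greedy(action_values: Sequence[float],
                   epsilon: float,
                   rng: random.Random) -> int:
    """Pick an action index from estimated action values.

    With probability `epsilon` a random action is tried (exploration);
    otherwise the currently best-valued action is taken (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])
```

A high `epsilon` early in training favors exploration; lowering it over time shifts the agent toward exploiting what it has learned.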
  20. How does learning happen? Value function, Policy function
  21. RL algorithms: Vanilla policy gradient
     • Data is only used once
     • High variance of rewards
     • Magnitude of update could be too large
     [Figure: landscape of the objective J(θ) with successive weight updates labeled "New weights". Image source: landscape image is CC0 1.0 public domain.]
  22. Training terminology
     STEP: {State, Action, Reward, New State}
     EPISODE: Complete Track or Crash – sequence of STEPS or EXPERIENCE
     EXPERIENCE BUFFER: Sequence of STEPS over fixed number of EPISODES
     BATCH: Ordered list of experiences
     TRAINING: Random selection of BATCHES
     (other diagram labels: ITERATION, POLICY NETWORK, Episode 1, Episode 2, Episode x, Episode y)
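The relationship between steps, episodes, the experience buffer, and batches can be sketched as follows. This is an assumption-laden illustration of the terminology only: the function names and the integer stand-ins for states and actions are hypothetical, and no actual policy network update is shown.

```python
import random
from typing import List, Tuple

# One step of experience: (state, action, reward, new_state).
Step = Tuple[int, int, float, int]


def build_buffer(episodes: List[List[Step]]) -> List[Step]:
    """Experience buffer: the steps of a fixed number of episodes, in order."""
    return [step for episode in episodes for step in episode]


def make_batches(buffer: List[Step], batch_size: int) -> List[List[Step]]:
    """Batch: an ordered slice of the experience buffer."""
    return [buffer[i:i + batch_size] for i in range(0, len(buffer), batch_size)]


def training_order(batches: List[List[Step]], rng: random.Random) -> List[List[Step]]:
    """Training visits the batches in random order."""
    order = list(batches)
    rng.shuffle(order)
    return order
```

One iteration would then consist of collecting the episodes, building the buffer, and updating the policy network on the shuffled batches.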
  23. AWS DeepRacer neural network architecture: Input – state (image), Output – action
  25. AWS DeepRacer simulator architecture (diagram labels: AWS Cloud, AWS DeepRacer, NAT gateway, VPC, AWS DeepRacer, Models, Simulation video, Metrics)
  26. AWS DeepRacer console diagram
  27. Lab 0 – AWS DeepRacer service resource creation. Objective: Set up your account resources to get you to the races! https://tinyurl.com/y59s4r4c
  28. Programming your own reward function: Code editor (Python 3 syntax). Three example reward functions. Code validation via AWS Lambda.
  29. Deep Reinforcement Learning Models: Tips & Tricks for Writing Reward Functions
  30. Track components: Track center; Track wall; Track surface, aka on-track; Field, aka off-track; Track boundaries
  31. Coordinate system and track waypoints (diagram labels: Outer boundary waypoints, Track center waypoints, Inner boundary waypoints, X, Y, Track width, Car direction)
  32. Reward function parameters
     {
       "all_wheels_on_track": Boolean,     # flag to indicate if the vehicle is on the track
       "x": float,                         # vehicle's x-coordinate in meters
       "y": float,                         # vehicle's y-coordinate in meters
       "distance_from_center": float,      # distance in meters from the track center
       "is_left_of_center": Boolean,       # flag to indicate if the vehicle is on the left side of the track center
       "heading": float,                   # vehicle's yaw in degrees
       "progress": float,                  # percentage of track completed
       "steps": int,                       # number of steps completed
       "speed": float,                     # vehicle's speed in meters per second (m/s)
       "steering_angle": float,            # vehicle's steering angle in degrees
       "track_width": float,               # width of the track
       "waypoints": [[float, float], … ],  # list of [x,y] as milestones along the track center
       "closest_waypoints": [int, int]     # indices of the two nearest waypoints
     }
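A reward function receives these parameters as a dictionary and returns a number. The sketch below rewards staying near the center line using only `all_wheels_on_track`, `distance_from_center`, and `track_width` from the list above. The band widths (10%, 25%, 50% of track width) and the reward values are illustrative choices, not values taken from the deck.

```python
def reward_function(params):
    """Reward the car for staying close to the center line."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Leaving the track earns almost nothing.
    if not params["all_wheels_on_track"]:
        return 1e-3

    # Three bands around the center line, as fractions of track width
    # (illustrative values).
    marker_1 = 0.10 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.50 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely close to or past the track edge

    return float(reward)
```

Because the reward only depends on position, this kind of function tends to produce steady, centered driving rather than fast laps; speed or progress terms would need to be added to change that.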
  33. Example parameter: heading
  34. Example parameter: waypoint
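The `heading`, `waypoints`, and `closest_waypoints` parameters can be combined to check whether the car points along the track. The sketch below assumes `closest_waypoints` holds the indices of the previous and next waypoint in that order, and uses a 10-degree threshold and a 0.5 penalty factor chosen purely for illustration.

```python
import math


def reward_function(params):
    """Penalize the car when its heading diverges from the track direction."""
    waypoints = params["waypoints"]
    closest_waypoints = params["closest_waypoints"]
    heading = params["heading"]  # degrees

    reward = 1.0

    # Assumption: index 0 is the waypoint behind the car, index 1 the one ahead.
    prev_point = waypoints[closest_waypoints[0]]
    next_point = waypoints[closest_waypoints[1]]

    # Direction of the track segment in degrees, in the range (-180, 180].
    track_direction = math.degrees(
        math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    )

    # Smallest absolute angle between track direction and heading.
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    # Illustrative threshold: halve the reward when off by more than 10 degrees.
    if direction_diff > 10.0:
        reward *= 0.5

    return float(reward)
```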
  35. Example parameter: all_wheels_on_track
  36. Example parameter: distance_from_center
  37. Action space
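The slide's content is not in the transcript. As a sketch, assuming the action space is a discrete set of steering-angle and speed combinations, it could be enumerated like this; `build_action_space` and its parameter names are hypothetical.

```python
from typing import Dict, List


def build_action_space(max_steering: float, steering_levels: int,
                       max_speed: float, speed_levels: int) -> List[Dict[str, float]]:
    """Enumerate every (steering angle, speed) pair on an evenly spaced grid."""
    if steering_levels == 1:
        angles = [0.0]
    else:
        # Evenly spaced angles from -max_steering to +max_steering (degrees).
        step = 2 * max_steering / (steering_levels - 1)
        angles = [-max_steering + i * step for i in range(steering_levels)]

    # Evenly spaced speeds from max_speed / speed_levels up to max_speed (m/s).
    speeds = [max_speed * (j + 1) / speed_levels for j in range(speed_levels)]

    return [{"steering_angle": a, "speed": s} for a in angles for s in speeds]
```

With 5 steering levels and 2 speed levels this yields 10 actions. A larger action space gives the agent finer control but generally takes longer to train.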
  38. Hyperparameters control the training algorithm
  40. Lab 1 – AWS DeepRacer service. Objective: Build your first AWS DeepRacer RL model. https://tinyurl.com/y6pyejqm
  41. Simulation-to-real domain transfer
     Sim-to-real challenge: The model is trained using simulated images, but the race car must drive using the images that it experiences in the real world.
     Strategies: Environment control; Domain randomization; Modularity and abstraction
  42. AWS DeepRacer software architecture (diagram labels: ROS msg node, Stored file, ROS nodes, Web server publisher, Model optimizer, Video M-JPEG, Web server video, Inference results, Autonomous drive, Control node, Optimized model, Media engine, Camera, Model, Inference engine, Manual drive, Navigation node, Servo and motor)