Slide 1

Slide 1 text

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SUMMIT
Putting the “machine” in Machine Learning
Ricardo Sueiras | Principal Evangelist, Amazon Web Services
Istanbul Loft

Slide 2

Slide 2 text

AWS DeepRacer car specifications
Car: 1/18-scale 4WD with monster truck chassis
CPU: Intel Atom processor
Memory: 4 GB RAM
Storage: 32 GB (expandable)
Wi-Fi: 802.11ac
Camera: 4 MP camera with MJPEG
Drive battery: 1000 mAh lithium polymer
Compute battery: 13600 mAh USB-C
Sensors: Integrated accelerometer and gyroscope
Ports: 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI
Software: Ubuntu OS 16.04.3 LTS, Intel OpenVINO toolkit, ROS Kinetic

Slide 3

Slide 3 text

AWS DeepRacer League: Race for prizes and glory
The world’s first global, autonomous racing league
www.deepracerleague.com

Slide 4

Slide 4 text

Submit your model now to race in the Virtual Circuit!

Slide 5

Slide 5 text

Reinforcement learning in the broader AI context
Figure labels: Reinforcement learning, Supervised learning, Unsupervised learning

Slide 6

Slide 6 text

RL vs. other approaches for robotic racing

Method: Supervised learning
How it works: Expert driver controls a real-world car that has a camera. Save the images from the camera as inputs and corresponding driving actions (speed and steering angle) as outputs. Train a model.
Result: Provide state (image) into model and receive driving action.

Method: Reinforcement learning
How it works: Virtual agent repeatedly interacts with a simulated environment and logs experience (image, action, new state, reward). Experience is used to train a model, and new model is used to get more experience.
Result: Provide state (image) into model and receive driving action.

Slide 7

Slide 7 text

Machine learning overview
Supervised
Unsupervised
Reinforcement

Slide 8

Slide 8 text

Reinforcement learning in the real world
Reward positive behavior
Don’t reward negative behavior
The result!

Slide 9

Slide 9 text

Snakes on the (control) plane
(@frankmunz)

Slide 10

Slide 10 text

Reinforcement learning terms
Agent
Environment
State
Action
Episode
Reward
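The terms above can be tied together in a toy loop. This is a minimal sketch, not DeepRacer code: the `LineTrack` environment, its rewards, and the random agent are all hypothetical, chosen only to show where agent, environment, state, action, reward, and episode appear.

```python
import random


class LineTrack:
    """Toy environment (hypothetical): the agent moves along a 1-D track."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial state

    def step(self, action):
        # action is +1 (forward) or -1 (backward); position never drops below 0
        self.position = max(0, self.position + action)
        done = self.position >= self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, rng, max_steps=100):
    """Run one episode with a random agent and return (total_reward, steps)."""
    state = env.reset()
    total_reward = 0.0
    for step_count in range(1, max_steps + 1):
        action = rng.choice([-1, 1])            # the agent picks an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward, step_count


total, steps = run_episode(LineTrack(), random.Random(0))
print(f"episode finished after {steps} steps, total reward {total:.1f}")
```

An episode here ends when the agent reaches the last cell or runs out of steps.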

Slide 11

Slide 11 text

The reward function
The reward function incentivizes particular behaviors and is at the core of reinforcement learning.

Slide 12

Slide 12 text

Training an RL model
Figure labels: AGENT, GOAL

Slide 13

Slide 13 text

The reward function in a grid race
Figure labels: AGENT, GOAL

Slide 14

Slide 14 text

Scores that incentivize central line driving
Figure labels: AGENT, GOAL
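One way to picture such scores is a grid whose rows score highest on the center line and lowest at the edges. The sketch below is illustrative only: the grid size and the score values are assumptions, not the numbers shown on the slide.

```python
def build_reward_grid(rows, cols, center_reward=2.0, edge_reward=0.1):
    """Return a rows x cols grid whose scores fall off linearly from the center row."""
    center = (rows - 1) / 2
    grid = []
    for r in range(rows):
        # offset is 0.0 on the center row and 1.0 on the outermost rows
        offset = abs(r - center) / center if center else 0.0
        score = center_reward - (center_reward - edge_reward) * offset
        grid.append([round(score, 2)] * cols)
    return grid


for row in build_reward_grid(rows=5, cols=6):
    print(row)
```

An agent that maximizes the sum of these scores is pulled toward the middle row on its way to the goal.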

Slide 15

Slide 15 text

An episode
Figure labels: AGENT, GOAL

Slide 16

Slide 16 text

Iteration
Figure labels: AGENT, GOAL

Slide 17

Slide 17 text

Exploration

Slide 18

Slide 18 text

Exploitation and convergence

Slide 19

Slide 19 text

Exploration vs. exploitation
Figure labels: EXPLORATION, EXPLOITATION
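A common textbook illustration of this trade-off is epsilon-greedy action selection. It is shown here only as a sketch of the idea; the deck does not say that AWS DeepRacer uses it, and the value estimates below are hypothetical.

```python
import random


def choose_action(action_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    # exploit: index of the highest estimated value
    return max(range(len(action_values)), key=lambda a: action_values[a])


rng = random.Random(0)
values = [0.2, 0.8, 0.5]  # hypothetical value estimates for three actions
picks = [choose_action(values, epsilon=0.1, rng=rng) for _ in range(1000)]
print("share of greedy picks:", picks.count(1) / len(picks))
```

A larger `epsilon` means more exploration; lowering it over time shifts the agent toward exploitation.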

Slide 20

Slide 20 text

How does learning happen?
Value function
Policy function

Slide 21

Slide 21 text

RL algorithms: Vanilla policy gradient
• Data is only used once
• High variance of rewards
• Magnitude of update could be too large
Figure labels: J(θ), New weights, New weights, 0.4 ±, 0.3 ±
* Image source: Landscape image is CC0 1.0 public domain
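A minimal sketch of a vanilla policy gradient update, assuming a two-action bandit with fixed rewards (a hypothetical setup far simpler than a race track). Each sampled action is used for exactly one weight update and then discarded, and `learning_rate` sets the magnitude of each update.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences (the weights) into probabilities."""
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(rewards, steps=500, learning_rate=0.1, seed=0):
    """Vanilla policy gradient on a bandit: one sample, one update, then discard."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rewards[action]
        # The gradient of log pi(action) with respect to each preference is
        # (1 if chosen else 0) - probability; scale it by the reward.
        for a in range(len(prefs)):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += learning_rate * reward * (indicator - probs[a])
    return softmax(prefs)


final_probs = train(rewards=[0.0, 1.0])
print("action probabilities after training:", final_probs)
```

With noisy rewards the same update becomes high-variance, and a large `learning_rate` can overshoot.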

Slide 22

Slide 22 text

STEP: {State, Action, Reward, New State}
EPISODE: Complete Track or Crash – sequence of STEPS or EXPERIENCE
EXPERIENCE BUFFER: Sequence of STEPS over fixed number of EPISODES
BATCH: Ordered list of experiences
TRAINING: Random selection of BATCHES
Figure labels: Episode 1, Episode 2, Episode x, Episode y, ITERATION, POLICY NETWORK
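The relationship between steps, episodes, the experience buffer, and batches can be sketched as follows. The rollout is faked with random numbers, and `collect_episode`, `make_batches`, and the sizes used are hypothetical names and values, not part of the AWS DeepRacer service.

```python
import random
from dataclasses import dataclass


@dataclass
class Step:
    state: int
    action: int
    reward: float
    new_state: int


def collect_episode(rng, max_steps=6):
    """Fake rollout: returns the list of steps that make up one episode."""
    steps, state = [], 0
    for _ in range(rng.randint(2, max_steps)):
        action = rng.choice([-1, 1])
        new_state = state + action
        steps.append(Step(state, action, reward=float(action), new_state=new_state))
        state = new_state
    return steps


def make_batches(buffer, batch_size):
    """Split the buffer into ordered batches; a short final batch is kept."""
    return [buffer[i:i + batch_size] for i in range(0, len(buffer), batch_size)]


rng = random.Random(0)
# Experience buffer: the steps from a fixed number of episodes, concatenated.
experience_buffer = [s for _ in range(4) for s in collect_episode(rng)]
batches = make_batches(experience_buffer, batch_size=4)
rng.shuffle(batches)  # training visits the batches in random order
for batch in batches:
    pass  # a policy-network update on `batch` would go here
print(len(experience_buffer), "steps in", len(batches), "batches")
```

One pass of collect, batch, and train corresponds to one iteration; the updated policy network then gathers the next buffer.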

Slide 23

Slide 23 text

AWS DeepRacer neural network architecture
Input – state (image)
Output – action

Slide 24

Slide 24 text


Slide 25

Slide 25 text

AWS DeepRacer simulator architecture
Figure labels: AWS Cloud, AWS DeepRacer, NAT gateway, VPC, AWS DeepRacer, Models, Simulation video, Metrics

Slide 26

Slide 26 text

AWS DeepRacer console diagram

Slide 27

Slide 27 text

Lab 0 – AWS DeepRacer service resource creation
Objective: Set up your account resources to get you to the races!
https://tinyurl.com/y59s4r4c

Slide 28

Slide 28 text

Programming your own reward function
Code editor: Python 3 syntax
Three example reward functions
Code validation via AWS Lambda

Slide 29

Slide 29 text

Deep Reinforcement Learning Models: Tips & Tricks for Writing Reward Functions

Slide 30

Slide 30 text

Track components
Track center
Track wall
Track surface, aka on-track
Field, aka off-track
Track boundaries

Slide 31

Slide 31 text

Coordinate system and track waypoints
Figure labels: Outer boundary waypoints, Track center waypoints, Inner boundary waypoints, X, Y, Track width, Car direction

Slide 32

Slide 32 text

Reward function parameters
{
    "all_wheels_on_track": Boolean,     # flag to indicate if the vehicle is on the track
    "x": float,                         # vehicle's x-coordinate in meters
    "y": float,                         # vehicle's y-coordinate in meters
    "distance_from_center": float,      # distance in meters from the track center
    "is_left_of_center": Boolean,       # flag to indicate if the vehicle is left of the track center
    "heading": float,                   # vehicle's yaw in degrees
    "progress": float,                  # percentage of track completed
    "steps": int,                       # number of steps completed
    "speed": float,                     # vehicle's speed in meters per second (m/s)
    "steering_angle": float,            # vehicle's steering angle in degrees
    "track_width": float,               # width of the track
    "waypoints": [[float, float], … ],  # list of [x,y] as milestones along the track center
    "closest_waypoints": [int, int]     # indices of the two nearest waypoints
}
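A minimal sketch of how a reward function might read these parameters. The reward logic (progress per step) and all the values in `sample_params` are hypothetical; only the key names come from the slide.

```python
def reward_function(params):
    """Minimal sketch: reward staying on track, scaled by progress per step."""
    if not params["all_wheels_on_track"]:
        return 1e-3  # near-zero reward when off track

    # Progress is a percentage (0-100); dividing by steps favors fast laps.
    steps = max(params["steps"], 1)
    return float(1.0 + params["progress"] / steps)


sample_params = {  # hypothetical values, for illustration only
    "all_wheels_on_track": True,
    "x": 2.5,
    "y": 0.7,
    "distance_from_center": 0.05,
    "is_left_of_center": False,
    "heading": 12.0,
    "progress": 40.0,
    "steps": 80,
    "speed": 1.5,
    "steering_angle": -5.0,
    "track_width": 0.6,
    "waypoints": [[0.0, 0.0], [1.0, 0.0], [2.0, 0.5]],
    "closest_waypoints": [1, 2],
}
print(reward_function(sample_params))
```

Building a sample dictionary like this is a convenient way to check a reward function locally before training.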

Slide 33

Slide 33 text

Example parameter: heading

Slide 34

Slide 34 text

Example parameter: waypoint
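Waypoints and heading can be combined to check whether the car points along the track. The sketch below assumes that `closest_waypoints` holds the index of the waypoint behind the car followed by the one ahead, and that `heading` is measured in degrees from the x-axis; the 10-degree threshold and the penalty factor are arbitrary choices.

```python
import math


def reward_function(params):
    """Sketch: penalize the car when its heading deviates from the track direction."""
    waypoints = params["waypoints"]
    prev_index, next_index = params["closest_waypoints"]  # assumed order: behind, ahead
    heading = params["heading"]

    prev_point = waypoints[prev_index]
    next_point = waypoints[next_index]

    # Direction of the track segment in degrees, in the range (-180, 180].
    track_direction = math.degrees(
        math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    )

    # Smallest angle between the two directions.
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    reward = 1.0
    if direction_diff > 10.0:  # arbitrary threshold in degrees
        reward *= 0.5
    return float(reward)
```

For example, a car heading at 0 degrees on a segment that runs along the x-axis keeps the full reward, while a car heading at 90 degrees on the same segment has it halved.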

Slide 35

Slide 35 text

Example parameter: all_wheels_on_track
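A sketch of a reward function built around `all_wheels_on_track`, offered as one possible design rather than the function shown in the session. The 0.05 m safety margin is an assumed value.

```python
def reward_function(params):
    """Sketch: reward the car for keeping all wheels on the track with a margin."""
    all_wheels_on_track = params["all_wheels_on_track"]
    distance_from_center = params["distance_from_center"]
    track_width = params["track_width"]

    reward = 1e-3  # default: near zero, for example when a wheel leaves the track

    # Distance in meters between the car's center and the nearest track border.
    margin = 0.5 * track_width - distance_from_center
    if all_wheels_on_track and margin >= 0.05:  # 0.05 m is an assumed margin
        reward = 1.0
    return float(reward)
```

Because the reward is flat anywhere inside the margin, this design leaves the agent free to find its own line within the borders.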

Slide 36

Slide 36 text

Example parameter: distance_from_center
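A sketch of a reward function that uses `distance_from_center` to favor driving near the center line. The three bands (10%, 25%, and 50% of the track width) and their reward values are illustrative assumptions.

```python
def reward_function(params):
    """Sketch: higher reward the closer the car is to the track center."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Three bands at increasing distance from the center line (assumed fractions).
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track
    return float(reward)
```

Narrower bands push the car harder toward the center line, at the cost of discouraging wider lines through corners.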

Slide 37

Slide 37 text

Action space
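Assuming the action space is a discrete set of (steering angle, speed) pairs, it could be generated from a few settings as sketched below. The function name, its parameters, and the even spacing are hypothetical and not taken from the AWS DeepRacer console.

```python
from itertools import product


def build_action_space(max_steering_angle, steering_levels, max_speed, speed_levels):
    """Return a list of {"steering_angle", "speed"} dicts covering a regular grid."""
    if steering_levels < 2:
        steering_angles = [0.0]
    else:
        # Evenly spaced angles from -max to +max, in degrees.
        step = 2 * max_steering_angle / (steering_levels - 1)
        steering_angles = [-max_steering_angle + i * step for i in range(steering_levels)]

    # Evenly spaced speeds up to max_speed, in m/s, excluding zero.
    speeds = [max_speed * (i + 1) / speed_levels for i in range(speed_levels)]

    return [
        {"steering_angle": angle, "speed": speed}
        for angle, speed in product(steering_angles, speeds)
    ]


actions = build_action_space(
    max_steering_angle=30.0, steering_levels=5, max_speed=3.0, speed_levels=2
)
print(len(actions), "actions, first:", actions[0])
```

A larger action space gives finer control but gives the model more options to learn over.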

Slide 38

Slide 38 text

Hyperparameters control the training algorithm

Slide 39

Slide 39 text


Slide 40

Slide 40 text

Lab 1 – AWS DeepRacer service
Objective: Build your first AWS DeepRacer RL model
https://tinyurl.com/y6pyejqm

Slide 41

Slide 41 text

Simulation-to-real domain transfer
Sim-to-real challenge: Train the model using simulated images, but race the car using the images that the car experiences in the real world.
Strategies:
Environment control
Domain randomization
Modularity and abstraction
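Domain randomization can be illustrated by perturbing simulated camera frames so the model sees varied lighting during training. This is a toy sketch on a tiny grayscale image stored as nested lists; the brightness range and noise level are assumed values, and the deck does not describe how the simulator actually randomizes its images.

```python
import random


def randomize_image(image, rng, brightness_range=(0.7, 1.3), noise_level=10):
    """Return a copy of a grayscale image with random brightness and pixel noise."""
    gain = rng.uniform(*brightness_range)  # one brightness factor per image
    randomized = []
    for row in image:
        new_row = []
        for pixel in row:
            value = pixel * gain + rng.uniform(-noise_level, noise_level)
            new_row.append(min(255, max(0, round(value))))  # clamp to 0-255
        randomized.append(new_row)
    return randomized


rng = random.Random(0)
frame = [[100, 150], [200, 250]]  # hypothetical 2x2 grayscale frame
variants = [randomize_image(frame, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

Training on many such variants is intended to make the model less sensitive to the lighting differences between simulation and a real track.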

Slide 42

Slide 42 text

AWS DeepRacer software architecture
Figure labels: ROS msg node, Stored file, ROS nodes, Web server publisher, Model optimizer, Video M-JPEG, Web server video, Inference results, Autonomous drive, Control node, Optimized model, Media engine, Camera, Model, Inference engine, Manual drive, Navigation node, Servo and motor