Slide 1

Slide 1 text

A Competitive Time-Trial AI for Need for Speed: Most Wanted Using Deep Reinforcement Learning
Munich Datageeks Meetup
by Sebastian Schwarz from E.ON (Data.ON)
LinkedIn
2023-04-25
use ← ↑ ↓ → to navigate

Slide 2

Slide 2 text

Introduction

Slide 3

Slide 3 text

Introduction: eSports Athlete In Another Life, 15 Years Ago

Slide 4

Slide 4 text

Introduction to Need for Speed: Most Wanted (2005)

Introduction
- Arcade racing game, best-selling game of the franchise with more than 16m copies sold
- Popular eSports title with major tournaments at the Electronic Sports League (ESL) and World Cyber Games (WCG)

Game Mode: Circuit (Basics)
- Played in races on circuits in 1:1 mode
- Usually 5-6 laps with a standing start; first to complete all laps wins
- All tuning (except Junkman parts, by WCG rules) and cars allowed (best: Lotus Elise and Porsche Carrera GT)
- NOS (reason: cheating) and collision (reason: lags) disabled by ESL and WCG rules

Slide 5

Slide 5 text

Introduction: Unofficial World Record at Heritage Heights*

Heritage Heights 1.06.85, Lotus Elise, no NOS, HD

[*] By ESL rules and to the best of my knowledge at the time. There are no official leaderboards.

Slide 6

Slide 6 text

Introduction: Sony Published Gran Turismo Sophy in 2022

Sony[1] F. Fuchs, Y. Song, E. Kaufmann, D. Scaramuzza and P. Dürr: Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning, IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4257-4264, July 2021. https://doi.org/10.1109/LRA.2021.3064284

Sony[2] Wurman, P.R., Barrett, S., Kawamoto, K. et al.: Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223-228 (2022). https://doi.org/10.1038/s41586-021-04357-7

Slide 7

Slide 7 text

Introduction: And I Thought...

Slide 8

Slide 8 text

A Competitive Time-Trial AI for NFS:MW Using Deep RL

Slide 9

Slide 9 text

A Competitive Time-Trial AI for NFS:MW Using Deep RL
YES, but Why...?

Because why not?!
- Hardware to run the game at 100fps+ and train a deep reinforcement learning model in real-time has become a commodity
- Software for deep reinforcement learning is available in Python with stable-baselines3
- Sony AI proved it is possible with what later became Gran Turismo Sophy

Use cases:
- Fine-tuning: car setups, usually trial and error
- Pushing the boundary: what's the theoretical best?
- Better AI: competitive in-game AI

Slide 10

Slide 10 text

Implementing The Algorithm: Getting Started

Slide 11

Slide 11 text

Implementing The Algorithm: Getting Started
Neither a Game API Nor Any Code Is Publicly Available: A Start from Scratch

1. Custom gym Environment: Implement a custom (real-time) training environment (action, observation, reward, done) using OpenAI gym
2. Hack Game for API: Create a (real-time) game API in Python with all necessary functions by "hacking" the game's memory and accessing it using pymem
3. Virtual Gamepad: Control the game in real-time with a virtual gamepad via vgamepad
4. Agent Training: Train the agent with the SAC algorithm (Soft Actor-Critic) from stable_baselines3 with a PyTorch backend

Slide 12

Slide 12 text

Implementing The Algorithm: Custom gym Environment #1

Slide 13

Slide 13 text

Implementing The Algorithm: Custom gym Environment
Basic Methods Need to be Implemented in NfsAiHotLap

class NfsAiHotLap(gym.Env):
    """Custom Environment: NfsAiHotLap that follows the gym interface"""

    def __init__(self):
        """method to initialize especially the action and observation spaces"""
        self.action_space
        self.observation_space

    def step(self, action):
        """method to perform one action and return the reward and the next observation"""
        return observation, reward, done, info

    def reset(self):
        """method to reset the game and return the initial observation"""
        return observation

    def render(self, mode="human"):
        """method that outputs a render of the env, e.g. screenshot"""
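As a hedged illustration (not the project's actual code), the two spaces in this skeleton could be declared with gym.spaces.Box, assuming the sizes described later in this deck (a 2-dimensional action in [-1, +1] and a 593-dimensional observation):

import numpy as np
import gym
from gym import spaces

class NfsAiHotLapSketch(gym.Env):
    """Sketch only: space definitions assuming the action/observation sizes from later slides."""

    def __init__(self):
        super().__init__()
        # steering and acceleration, both continuous in [-1, +1]
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        # telemetry + lidar + track preview, stacked into one float vector
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(593,), dtype=np.float32)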


Slide 18

Slide 18 text

Implementing The Algorithm: Custom gym Environment

action: Combined Steering and Acceleration Input
- Give the agent a virtual Xbox 360 Controller to steer and accelerate in-game

reward: Immediate Reward for the Action Taken (Target Variable)
- Proxy: reward the agent by how quickly it gets around the track, using the delta in lap completion

observation: The Feature Vector for the Algorithm
- Give the agent telemetry (speed, acceleration, ...), lidar, and a GPS navigation system

done: Indicator that an Episode Has Ended
- Tell the agent when it's "Game Over" (one of lap completion, time limit, or reverse)

Slide 19

Slide 19 text

Implementing The Algorithm: Hack Game for API

Slide 20

Slide 20 text

Implementing The Algorithm: Hack Game for API
There is No Game API
- Real-Time Data Required: Information from the game needs to be available in real-time, e.g. to calculate the reward or the end of an episode (done), even if the input to the game were a screen capture
- Extra Options: NFSMW Extra Options for: windowed mode, more than 5 laps, teleport car at given speed and direction (for resetting after an episode), debug print

Just Build One... With This Strategy
- Access speed.exe: The game's variables are stored in the computer's memory (RAM), for NFS:MW about 300 MB
- Search RAM: Find the right addresses (pointer) or patterns (dynamic) where the data you want is stored, using a hex editor and scanning memory at various in-game situations for changes
- Read RAM: Use pymem to read variables from the game's memory in nanoseconds
- Real-Time Calculation: Calculate everything else in Python in real-time

Slide 21

Slide 21 text

Implementing The Algorithm: Hack Game for API
Example: Tracking x, y, z and speed

from pymem import Pymem

class TrackVehicle:
    def __init__(self):
        # NFS:MW process is called "speed.exe"
        self.pm = Pymem("speed.exe")

    def track(self):
        x = self.pm.read_float(0x00914560)      # x-coordinate
        y = self.pm.read_float(0x00914564)      # y-coordinate
        z = self.pm.read_float(0x00914568)      # z-coordinate
        speed = self.pm.read_float(0x009142C8)  # speed
        return (x, y, z, speed)
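For illustration, a hedged usage sketch of the class above: poll the car state at roughly 60 Hz and print it. Note that the hard-coded addresses are specific to one version/build of the game and are not guaranteed to be valid elsewhere.

import time

tracker = TrackVehicle()
for _ in range(600):                        # roughly 10 seconds of telemetry
    x, y, z, speed = tracker.track()
    print(f"pos=({x:.1f}, {y:.1f}, {z:.1f})  speed={speed:.1f} m/s")
    time.sleep(1 / 60)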

Slide 22

Slide 22 text

Implementing The Algorithm: Hack Game for API
Currently Identified Variables

Variable             | Type  | Example         | Detail
---------------------|-------|-----------------|----------------------------------------------------------------
x, y, z              | float | -365, 1000, 156 | coordinates of the car in the game in m (like GPS: lat, long, elevation)
speed                | float | 88.76           | speed of the car in m/s
surface_l, surface_r | int   | 0               | surface the car is driving on with the left/right wheels respectively, e.g. asphalt, grass, ...
angle                | int   | 0xFB12          | direction of the car, must be a mapping of the Euler angle from [-pi, +pi]
lap                  | int   | 3               | the current lap of the race
steering, throttle   | float | -0.6, 0.3       | steering and throttle (currently only for Logitech RumblePad 2)
gamestate            | int   | 6               | dummy variable for the game state, e.g. menu or within race

Based on these variables everything else is calculated in the NfsMw API
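The angle row only states that the raw integer must map to an Euler angle in [-pi, +pi]. As a hedged sketch (an assumption, not verified against the game): if the raw value is an unsigned 16-bit number where 0..0xFFFF spans one full turn, the conversion could look like this:

import numpy as np

def raw_angle_to_rad(raw: int) -> float:
    """Map a 16-bit in-game direction value to an angle in [-pi, +pi)."""
    frac = (raw & 0xFFFF) / 0x10000                       # fraction of a full turn, in [0, 1)
    angle = frac * 2 * np.pi                              # radians in [0, 2*pi)
    return float((angle + np.pi) % (2 * np.pi) - np.pi)   # wrap to [-pi, +pi)

print(raw_angle_to_rad(0xFB12))                           # example value from the table above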

Slide 23

Slide 23 text

Demo Time: Telemetry

Slide 24

Slide 24 text

Real-Time Telemetry is All We Need!

Slide 25

Slide 25 text

Implementing The Algorithm: Hack Game for API
Example: Racing Line and Track Boundaries (x, y, speed) for One Lap (01:09.810)

Slide 26

Slide 26 text

Implementing The Algorithm: Hack Game for API
Example: Steering and Speed (steering, speed) for One Lap (01:09.810)

Slide 27

Slide 27 text

Implementing The Algorithm: Hack Game for API
Some Available Methods in My NfsMw API: Vehicle Related

class NfsMw():
    def vehicle_telemetry(self):
        """returns telemetry data: x, y, z, speed, sfc_l, sfc_r, direction"""

    def vehicle_lidar(self, resolution_degree=1):
        """returns distances to the next border for 180 degrees ahead"""

    def vehicle_collision(self):
        """returns if there is a collision"""

    def vehicle_airtime(self):
        """returns if the vehicle is airborne"""

    def vehicle_reverse(self, rev_angle_threshold=0.6*np.pi):
        """returns if the vehicle is reversing"""
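As a hedged sketch of how a check like vehicle_reverse() could work (illustrative, not the actual implementation): compare the car's direction with the local track direction and flag a reverse when the difference exceeds the threshold (0.6*pi is about 108 degrees, roughly the "more than 110 degree" rule mentioned later for done).

import numpy as np

def is_reversing(car_angle: float, track_angle: float,
                 rev_angle_threshold: float = 0.6 * np.pi) -> bool:
    """True if the car points further away from the track direction than the threshold."""
    # wrap the signed difference of two angles (in radians) to [-pi, +pi)
    diff = (car_angle - track_angle + np.pi) % (2 * np.pi) - np.pi
    return abs(diff) > rev_angle_threshold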

Slide 28

Slide 28 text

Implementing The Algorithm: Hack Game for API
Some Available Methods in My NfsMw API: Lap Related

class NfsMw():
    def lap(self):
        """returns the current lap"""

    def laptime(self):
        """returns the lap time clock"""

    def lap_completion(self):
        """returns the lap completion"""

    def lap_angle_ahead(self, n_ahead=150):
        """returns the direction of the track n points ahead"""

    def lap_radii_ahead(self, n_ahead=150, inverse=True):
        """returns the inverse radii n points ahead"""

Slide 29

Slide 29 text

Implementing The Algorithm: Hack Game for API
Some Available Methods in My NfsMw API: Game Related

class NfsMw():
    def state(self):
        """returns the gamestate"""

    def reset_vehicle(self):
        """reset car to saved start location with hotkey from NFS:MW ExtraOpts"""

    def restart_race(self):
        """reset the whole race and start at lap 1 again"""

    def screenshot(self):
        """returns screenshot as numpy array"""

Slide 30

Slide 30 text

Implementing The Algorithm: Virtual Gamepad

Slide 31

Slide 31 text

Implementing The Algorithm: Virtual Gamepad
Just Give it a Virtual Xbox 360 Controller...
- In order to steer the car, inputs need to be sent to the game
- It's known that controlling the game via keyboard is too choppy and therefore slow
- I use the vgamepad library in Python, a wrapper for ViGEm (Virtual Gamepad Emulation Framework), to create a virtual Xbox 360 Controller
- Steering is mapped to the left analogue stick x-axis, acceleration is mapped to the right analogue stick y-axis (default in the game)
- The signal is continuous between [-1, +1] (see the sketch below)
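A minimal sketch of the mapping described above, using vgamepad's Xbox 360 emulation (illustrative only; the project wraps this in its own VNfsGamePad class):

import vgamepad as vg

pad = vg.VX360Gamepad()              # virtual Xbox 360 controller via ViGEm

def send(steering: float, acceleration: float) -> None:
    """Push one [-1, +1] steering/acceleration pair to the virtual pad."""
    pad.left_joystick_float(x_value_float=steering, y_value_float=0.0)
    pad.right_joystick_float(x_value_float=0.0, y_value_float=acceleration)
    pad.update()                     # send the new state to the game

send(-0.3, 1.0)                      # 30% left steering, full throttle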

Slide 32

Slide 32 text

Implementing The Algorithm: Custom gym Environment #2

Slide 33

Slide 33 text

Implementing The Algorithm: Custom gym Environment
action: Combined Steering and Acceleration Input
Give the agent a virtual Xbox 360 Controller to steer and accelerate in-game

Type: float32
Size: 2
Components:
- steering: left analogue stick x-axis, full left to full right encoded within [-1, +1]
- acceleration: right analogue stick y-axis, full brake to full throttle encoded within [-1, +1]
Example: action=[-0.3, +1.0] would be 30% steering left and full throttle
Notes:
- For steering every value makes sense
- For acceleration only [-0.4, 0.7, 1.0] make sense (brake, but not reverse; lift; full throttle)

Slide 34

Slide 34 text

Implementing The Algorithm: Custom gym Environment
reward: Immediate Reward for the Action Taken (Target Variable)
Proxy: reward the agent by how quickly it gets around the track, using the delta in lap completion

Type: float32
Size: 1
Components:
- Delta in lap completion between two steps
Example: reward=0.00035 would mean 0.035% of additional lap completion achieved
Notes:
- Lap completion is within [0, 1] and measured to the in-game millimeter
- A high enough discount factor [0.98, 0.99] ensures the agent finds the racing line (unlike rewarding raw speed)
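A minimal sketch of this reward, assuming a lap_completion() call like the one in the NfsMw API (illustrative, not the exact code):

def compute_reward(nfs, prev_completion: float):
    """Return the delta-lap-completion reward and the new completion value."""
    completion = nfs.lap_completion()            # in [0, 1]
    reward = completion - prev_completion        # e.g. 0.00035 == +0.035% of the lap
    return reward, completion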

Slide 35

Slide 35 text

Implementing The Algorithm: Custom gym Environment
observation: The Feature Vector for the Algorithm
Give the agent telemetry (speed, acceleration, ...), lidar, and a GPS navigation system

Type: float32
Size: 593
Components:
- vehicle_telemetry [4]: speed, acceleration, surface, direction
- vehicle_lidar [181]: 180 degree distances to the nearest boundary in 1 degree steps
- vehicle_collision [1]: collision indicator
- vehicle_reverse [1]: reverse indicator
- lap_radii_ahead [200]: inverse curve radii for 200 points (ca. 300m) ahead
- lap_angle_ahead [200]: curve angle for 200 points (ca. 300m) ahead
- steering_t5 [5]: last 5 steering inputs
Notes:
- This is the feature vector for the algorithm, and the only information it gets
- The agent is not trained on images as this would be slow
- All features need to be calculated in real-time from the game's variables
- Feature calculation takes 10ms on my machine
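As a hedged sketch (not the exact implementation), the observation could be assembled by stacking the listed feature groups into one float32 vector:

import numpy as np

def build_observation(telemetry, lidar, collision, reverse,
                      radii_ahead, angles_ahead, steering_t5):
    """Stack the feature groups listed above into a single float32 vector."""
    return np.concatenate([
        telemetry,               # [4]   speed, acceleration, surface, direction
        lidar,                   # [181] distances to the track boundary
        [float(collision)],      # [1]   collision indicator
        [float(reverse)],        # [1]   reverse indicator
        radii_ahead,             # [200] inverse curve radii ahead
        angles_ahead,            # [200] curve angles ahead
        steering_t5,             # [5]   last 5 steering inputs
    ]).astype(np.float32)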

Slide 36

Slide 36 text

Implementing The Algorithm: Custom gym Environment
Example: observation for main features lidar and GPS navigation

Slide 37

Slide 37 text

Implementing The Algorithm: Custom gym Environment
Example: observation for main features lidar and GPS navigation

Slide 38

Slide 38 text

Implementing The Algorithm: Custom gym Environment
Example: observation for main features lidar and GPS navigation

Slide 39

Slide 39 text

Implementing The Algorithm: Custom gym Environment
Example: observation for main features lidar and GPS navigation

Slide 40

Slide 40 text

Implementing The Algorithm: Custom gym Environment
done: Indicator that an Episode Has Ended
Tell the agent when it's "Game Over" (one of lap completion, time limit, or reverse)

Type: bool
Size: 1
Components:
- Indicator whether the episode has ended
Example: done=False would mean the episode has not ended
Notes: There are 3 reasons for done=True
- lap completion (best case)
- time limit (here: 180 seconds)
- vehicle reverse (more than 110 degree turn relative to track direction)
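A hedged sketch of the three termination conditions (illustrative logic, not the project's exact code):

def compute_done(lap_completed: bool, laptime_s: float, reversing: bool,
                 time_limit_s: float = 180.0) -> bool:
    """True if the episode should end: lap done, time limit hit, or car reversing."""
    return lap_completed or laptime_s > time_limit_s or reversing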

Slide 41

Slide 41 text

Implementing The Algorithm: Agent Training

Slide 42

Slide 42 text

Implementing The Algorithm: Agent Training
That's Actually the Easy Part With stable_baselines3 (Basic Code)

from stable_baselines3 import SAC
from control.vnfsgamepad import VNfsGamePad
from env import NfsAiHotLap

# init virtual game pad
pad = VNfsGamePad()

# create environment
nfsmwai = NfsAiHotLap(pad)

# create model
model = SAC("MlpPolicy", nfsmwai, gamma=0.985)

# learn
model.learn(total_timesteps=1e7)

The main code has some more features like a logging and saving callback
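The "logging and saving callback" mentioned above could, for example, be stable_baselines3's built-in CheckpointCallback; a hedged sketch (frequency and paths are made up for illustration):

from stable_baselines3.common.callbacks import CheckpointCallback

checkpoint_cb = CheckpointCallback(
    save_freq=50_000,                # save every 50k environment steps
    save_path="./checkpoints/",      # hypothetical output directory
    name_prefix="nfsmwai_sac",
)
model.learn(total_timesteps=int(1e7), callback=checkpoint_cb)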

Slide 43

Slide 43 text

Implementing The Algorithm: Agent Training
That's Actually the Easy Part With stable_baselines3 (Basic Code)

from stable_baselines3 import SAC
from control.vnfsgamepad import VNfsGamePad
from env import NfsAiHotLap

# init virtual game pad
pad = VNfsGamePad()

# create environment
nfsmwai = NfsAiHotLap(pad)

# create model
model = SAC("MlpPolicy", nfsmwai, gamma=0.985)

# learn
model.learn(total_timesteps=1e7)

Instantiates the virtual Xbox 360 Gamepad. Needs to run before the game starts.
There is also a VNfsKeyboard class, but it's slower (only discrete -1 or +1 input)

Slide 44

Slide 44 text

Implementing The Algorithm: Agent Training
That's Actually the Easy Part With stable_baselines3 (Basic Code)

from stable_baselines3 import SAC
from control.vnfsgamepad import VNfsGamePad
from env import NfsAiHotLap

# init virtual game pad
pad = VNfsGamePad()

# create environment
nfsmwai = NfsAiHotLap(pad)

# create model
model = SAC("MlpPolicy", nfsmwai, gamma=0.985)

# learn
model.learn(total_timesteps=1e7)

Instantiates the game environment, i.e. the custom gym env.
Uses the NfsMw API to interact with the game and to calculate observation, reward, done, action

Slide 45

Slide 45 text

Implementing The Algorithm: Agent Training
That's Actually the Easy Part With stable_baselines3 (Basic Code)

from stable_baselines3 import SAC
from control.vnfsgamepad import VNfsGamePad
from env import NfsAiHotLap

# init virtual game pad
pad = VNfsGamePad()

# create environment
nfsmwai = NfsAiHotLap(pad)

# create model
model = SAC("MlpPolicy", nfsmwai, gamma=0.985)

# learn
model.learn(total_timesteps=1e7)

Defines the model as Soft Actor-Critic (SAC), using a multilayer perceptron (MLP) policy (fully connected feed-forward DNN).
Discount factor for future reward set to 0.985 (default 0.99)

Slide 46

Slide 46 text

Implementing The Algorithm: Agent Training
That's Actually the Easy Part With stable_baselines3 (Basic Code)

from stable_baselines3 import SAC
from control.vnfsgamepad import VNfsGamePad
from env import NfsAiHotLap

# init virtual game pad
pad = VNfsGamePad()

# create environment
nfsmwai = NfsAiHotLap(pad)

# create model
model = SAC("MlpPolicy", nfsmwai, gamma=0.985)

# learn
model.learn(total_timesteps=1e7)

Starts the training process.
Training in real-time at 30-60 Hz takes about 20h on my 7-year-old gaming PC (Intel Xeon E3-1231 v3, Nvidia GeForce GTX 1070)

Slide 47

Slide 47 text

Demo Time: Learning Progress

Slide 48

Slide 48 text

Learning Progress: After 0.25h (20k Steps) "Ahhh, it's a Car?"

Slide 49

Slide 49 text

Learning Progress: After 1h (80k Steps) "I'm driving!?"

Slide 50

Slide 50 text

Learning Progress: After 3-5h (500k Steps) "Let's Wall-Ride"

Slide 51

Slide 51 text

Learning Progress: After 10-20h (2m Steps) "I'm a Racer!"

Slide 52

Slide 52 text

Learning Progress: "I Can Race Other Cars, too" 52 / 61

Slide 53

Slide 53 text

Implementing The Algorithm: Agent Training
Lap Time Progress From Scratch (Best Ever AI Lap: 1:10.47)

Slide 54

Slide 54 text

Conclusion

Slide 55

Slide 55 text

Conclusion
Challenges with the approach: great cornering, bad on straights
- Reward: The instant reward is tiny and not very sensitive to jitter (a higher discount factor (gamma) does not work). Penalties on steering did not work so far.
- Real-Time: Learning/input steps and the game are not synced; possible fix: Real-Time Gym (rtgym)
- Game FPS: Unstable frame times of the game
- Exploration Trade-Off: Potentially too high exploration coefficient (ent_coef) (manually lowering it did not work)
- Filter: Filtering at test time does not work: inexperienced with high speed at corners after a long straight
- Observation: Noisy and maybe incomplete input features; the agent does not see many things, e.g. airtime, surface ahead, obstacles, ...

Slide 56

Slide 56 text

Conclusion
A Time-Trial AI for Need for Speed: Most Wanted
- Works Well: Using SAC from stable_baselines3 and some basic features, a good speed (1:10.47) and a very stable model can be achieved quickly
- Accessible: Training in real-time at 30-60 Hz is possible on my 7-year-old gaming PC (Intel Xeon E3-1231 v3, Nvidia GeForce GTX 1070)
- Comparably Fast: Training takes about 24-48h (single instance of the game, single car) in real-time (Sony[1]: 56-73h, but 80 cars in parallel on 4 PS4; Sony[2]: up to 1000 PS4 in parallel)
- Generalization: The trained model generalizes very well to other cars without re-training
- Track Specific: However, it generalizes poorly to other tracks; it makes it around the track, but with many mistakes (same for humans, btw.)

Slide 57

Slide 57 text

Thank You!

Slide 58

Slide 58 text

Thank You
Special Thanks
- Inko Elgezua Fernández, robotics and reinforcement learning expert, and colleague at E.ON, for many helpful comments, fruitful discussions, and motivation
- Anıl Gezergen (@nlgxzef) from NFSMW Extra Options, who does a sick job reverse engineering in-game functions and helped with some comments to get me started
- Florian Fuchs from Sony AI for pointing me at a pre-print version with more appendices for Sony[1] and some comments
- Yann Bouteiller, author and creator of vgamepad, running a similar project tmrl for TrackMania, for a helpful comment

Slide 59

Slide 59 text

A Competitive Time-Trial AI for NFS:MW Using Deep RL
Thank You
Now Make a Pull-Request! Let's get the WR!
- Code and Models: are open source, properly structured, and ok-ish documented from today onwards: gitlab.com/schw4rz/nfsmwai

Slide 60

Slide 60 text

Backup

Slide 61

Slide 61 text

A Competitive Time-Trial AI for NFS:MW Using Deep RL
Need for Speed: Most Wanted is a Good Fit
- In-depth knowledge: about the game from my time as an eSports athlete; benchmark (WR) and car setups available
- Stability of conditions: no damage, (tire) degradation, or weather effects on car or track
- Game properties: clear track boundaries, no weird driving techniques like wall-rides or double-steering required
- Active Community: active time-trial and modding community, e.g. NFSMW Extra Options