
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

" Probabilistic roadmaps (PRMs) have a long and productive history in robotic motion planning. First conceived in 1996, they operate by sampling a set of points in configuration space and connecting these points using a simple line-of-sight algorithm. While PRM-based methods can construct efficient map representations, they share similar limitations with other sampling-based planners: PRMs do not consider external constraints such as the path feasibility and can suffer from unmodeled dynamics, sensor noise and non-stationary environments.
Correspondingly, RL algorithms such as DDPG and CAFVI offer promising alternatives for learning policies over long time horizons by decomposing the learning task into a set of goals and subgoals. These algorithms are robust to sensor noise and motion stochasticity and are resilient to (moderate) changes in the environment, but they require efficient state representations and can often get stuck in poor local minima. By combining PRMs and RL techniques, the authors present a compelling case for learning robot dynamics separately from the environment, a technique shown to scale to environments up to 63 million times larger than the simulated training environment.
Fig. 4. PRM-RL: a prosperous handshake between RL and classical robotics.
Specifically, the authors decouple the dynamics and noise estimation from the environment itself. First, they learn the dynamics in a small training environment, then use that model to inform the local graph connectivity within the target environment. Instead of adding edges along all collision-free paths, they only add edges that the learned dynamics model can navigate successfully in a high percentage of simulations. This process generates a roadmap that is more robust to noise and motion error, and simultaneously less prone to the poor local minima exhibited by naive HRL planners, ensuring continuous progress towards the goal state.
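As a rough illustration of this edge-selection rule, the sketch below keeps a candidate edge only when Monte Carlo rollouts of the learned point-to-point policy succeed often enough; policy_reaches_goal, num_trials, and success_threshold are hypothetical placeholders, not names or values from the paper.

    import math

    def reliable_edges(nodes, candidate_pairs, policy_reaches_goal,
                       num_trials=20, success_threshold=0.9):
        """Keep edge (i, j) only if the learned policy navigates i -> j in most noisy rollouts."""
        edges = []
        for i, j in candidate_pairs:
            start, goal = nodes[i], nodes[j]
            successes = sum(policy_reaches_goal(start, goal) for _ in range(num_trials))
            if successes / num_trials >= success_threshold:
                edges.append((i, j, math.dist(start, goal)))  # dynamically feasible, robust edge
        return edges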
In this talk we will explore how to construct a dynamically feasible roadmap using RL, how to train a dynamics model using policy gradients and value function approximation, and finally how to query the PRM to produce practical reference trajectories. No prior knowledge of motion planning, HRL, or robotics is assumed or required. "
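To give a flavour of the query step, the sketch below (using networkx, and assuming the (i, j, distance) edge triples from the earlier sketches) extracts a waypoint sequence that an RL policy could then track as a series of subgoals at execution time.

    import networkx as nx

    def query_roadmap(nodes, edges, start_idx, goal_idx):
        """Return a reference trajectory as the waypoints of the shortest roadmap path."""
        graph = nx.Graph()
        graph.add_nodes_from(range(len(nodes)))
        graph.add_weighted_edges_from(edges)  # (i, j, distance) triples
        waypoint_ids = nx.shortest_path(graph, start_idx, goal_idx, weight="weight")
        return [nodes[i] for i in waypoint_ids]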

When: Monday, 05/11/18 at 2:00 pm
Where: PAA 3195

Breandan Considine

July 18, 2023


Transcript

  1. PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning. Aleksandra Faust, Oscar Ramirez, Marek Fiser, Kenneth Oslund, Anthony Francis, James Davidson, and Lydia Tapia. Presented by Breandan Considine.
  2. Probabilistic roadmaps
     Advantages: Very efficient planning representation; provably probabilistically complete (guaranteed to find a path if one exists, given enough samples).
     Disadvantages: Executing reference trajectories; typically unaware of task constraints; can suffer from noise in perception and motor control; does not perform well under uncertainty.
  3. RL Planners
     Advantages: Robust to noise and errors; can obey robot dynamics and other task constraints; handles moderate changes to the environment; not as computationally complex to execute as other approaches (e.g. MPC, action filtering, hierarchical policy approximation).
     Disadvantages: Long-range navigation on complex maps has a sparse reward structure; can be difficult to train; prone to converging on poor local minima; requires careful selection of the control and action space.