PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

" Probabilistic roadmaps (PRMs) have a long and productive history in robotic motion planning. First conceived in 1996, they operate by sampling a set of points in configuration space and connecting these points using a simple line-of-sight algorithm. While PRM-based methods can construct efficient map representations, they share similar limitations with other sampling-based planners: PRMs do not consider external constraints such as the path feasibility and can suffer from unmodeled dynamics, sensor noise and non-stationary environments.
Correspondingly, RL algorithms such as DDPG and CAFVI offer promising alternatives for learning policies over long time horizons by decomposing the learning task into a set of goals and subgoals. These algorithms can be robust to sensor noise and motion stochasticity and are resilient to (moderate) changes in the environment, but they require efficient state representations and often suffer from poor local minima. By combining PRMs and RL techniques, the authors present a compelling case for learning robot dynamics separately from the environment, an approach that is shown to scale to environments up to 63 million times larger than the training environment.
Fig. 4. PRM-RL: a prosperous handshake between RL and classical robotics.
Specifically, the authors decouple the dynamics and noise estimation from the environment itself. First, they learn the dynamics in a small training environment, and then use that model to inform the local graph connectivity within the target environment. Instead of adding edges along all collision-free paths, they only add edges that can be successfully navigated by the dynamics model in a high percentage of simulations. This process generates a roadmap that is more robust to noise and motion error, and simultaneously less prone to the poor local minima exhibited by naive HRL planners, ensuring continuous progress towards the goal state.
In this talk we will explore how to construct a dynamically feasible roadmap using RL, how to train a dynamics model using policy gradients and value function approximation, and finally how to query the PRM to produce practical reference trajectories. No prior understanding of motion planning, HRL, or robotics is assumed or required."
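
The edge-connection rule described in the abstract above lends itself to a short sketch. The following is a minimal illustration, not the authors' implementation: a candidate roadmap edge is accepted only if a learned point-to-point policy reaches the target node in a sufficiently high fraction of noisy simulated rollouts. The `policy` and `dynamics` callables and all tuning constants are hypothetical placeholders.

```python
import numpy as np

# Hypothetical tuning values -- not taken from the paper.
SUCCESS_RADIUS = 0.3      # goal tolerance
SUCCESS_THRESHOLD = 0.9   # accept an edge only if >= 90% of rollouts reach the goal
N_ROLLOUTS = 20           # Monte Carlo trials per candidate edge
MAX_STEPS = 200           # rollout horizon

def rollout_succeeds(policy, dynamics, start, goal, noise_std=0.05):
    """Simulate one noisy rollout of the learned point-to-point policy."""
    state = np.asarray(start, dtype=float).copy()
    goal = np.asarray(goal, dtype=float)
    for _ in range(MAX_STEPS):
        action = policy(state, goal)          # learned local controller
        state = dynamics(state, action)       # robot dynamics model
        state = state + np.random.normal(0.0, noise_std, size=state.shape)  # motion/sensor noise
        if np.linalg.norm(state - goal) < SUCCESS_RADIUS:
            return True
    return False

def should_connect(policy, dynamics, u, v):
    """Connect roadmap nodes u and v only if the policy reliably navigates
    between them under noise, instead of using a pure line-of-sight test."""
    successes = sum(rollout_succeeds(policy, dynamics, u, v) for _ in range(N_ROLLOUTS))
    return successes / N_ROLLOUTS >= SUCCESS_THRESHOLD
```

The design point, as the abstract notes, is that edge existence is decided by the policy's empirical success rate rather than by geometric line of sight alone, so the resulting roadmap only contains transitions the robot can actually execute.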

When: Monday, 05/11/18 at 2:00pm
Where: PAA 3195

Breandan Considine

July 18, 2023

Transcript

  1. PRM-RL: Long-range Robotic
    Navigation Tasks by Combining
    Reinforcement Learning and
    Sampling-based Planning
    Aleksandra Faust, Oscar Ramirez, Marek Fiser,
    Kenneth Oslund, Anthony Francis, James
    Davidson, and Lydia Tapia
    Presented by Breandan Considine

  2. Workspace
    Configuration space

  3. Probabilistic roadmaps
    Advantages:
    - Very efficient planning representation
    - Provably probabilistically complete: guaranteed to find a path if one exists, given enough samples
    Disadvantages:
    - Executing reference trajectories
    - Typically unaware of task constraints
    - Can suffer from noise in perception and motor control
    - Does not perform well under uncertainty

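As a companion to the summary on slide 3, here is a minimal sketch of the classical PRM construction loop described in the abstract (uniform sampling plus straight-line connection checks). The `is_collision_free` callable, the sampling bounds, and the connection radius are hypothetical placeholders, and real implementations use k-nearest-neighbour queries rather than the quadratic pairing shown here.

```python
import itertools
import numpy as np

def build_prm(n_samples, is_collision_free, lower, upper, connect_radius, rng=None):
    """Classical PRM: sample collision-free configurations, then connect
    nearby pairs whose straight-line ("line of sight") segment is also free."""
    rng = rng or np.random.default_rng(0)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)

    nodes = []
    while len(nodes) < n_samples:
        q = rng.uniform(lower, upper)          # sample a configuration
        if is_collision_free(q):
            nodes.append(q)

    edges = []
    for i, j in itertools.combinations(range(len(nodes)), 2):
        a, b = nodes[i], nodes[j]
        if np.linalg.norm(a - b) > connect_radius:
            continue
        # Discretize the segment and keep the edge only if every sample is free.
        if all(is_collision_free(a + t * (b - a)) for t in np.linspace(0.0, 1.0, 20)):
            edges.append((i, j))
    return nodes, edges
```

A query then connects the start and goal configurations to the roadmap and runs a graph search such as Dijkstra or A*; "probabilistically complete" means the probability of failing to find an existing path goes to zero as the number of samples grows.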

  4. RL Planners
    Advantages:
    - Robust to noise and errors
    - Can obey robot dynamics and other task constraints
    - Handles moderate changes to the environment
    - Not as computationally complex to execute as other approaches (e.g. MPC, action filtering, hierarchical policy approximation)
    Disadvantages:
    - Long-range navigation on complex maps has a sparse reward structure
    - Can be difficult to train
    - Prone to converge on poor local minima
    - Need to carefully select the control and action space

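To make the "sparse reward structure" point on slide 4 concrete, the sketch below contrasts a sparse goal-reaching reward with a simple distance-based shaping term. The signatures and constants are illustrative assumptions, not the reward used in the paper.

```python
import numpy as np

def sparse_reward(state, goal, success_radius=0.3):
    """Reward only on reaching the goal: almost no learning signal over
    long-range navigation, which makes direct end-to-end RL hard to train."""
    return 1.0 if np.linalg.norm(np.asarray(state) - np.asarray(goal)) < success_radius else 0.0

def shaped_reward(state, next_state, goal):
    """Dense alternative: reward per-step progress toward the goal. Easier to
    learn from, but on complex maps greedy progress terms are exactly where
    poor local minima (e.g. getting stuck behind obstacles) come from."""
    goal = np.asarray(goal, dtype=float)
    return float(np.linalg.norm(np.asarray(state, dtype=float) - goal)
                 - np.linalg.norm(np.asarray(next_state, dtype=float) - goal))
```

PRM-RL sidesteps this trade-off by keeping the learned policy's task short-range (node to node) and leaving long-range reasoning to the roadmap.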

  5. How can we synthesize
    these two approaches?

  6. How do we train
    the dynamics
    model?

  7. https://www.youtube.com/embed/_XiaL5W-5Lg?enablejsapi=1
