PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning

" Probabilistic roadmaps (PRMs) have a long and productive history in robotic motion planning. First conceived in 1996, they operate by sampling a set of points in configuration space and connecting these points using a simple line-of-sight algorithm. While PRM-based methods can construct efficient map representations, they share similar limitations with other sampling-based planners: PRMs do not consider external constraints such as the path feasibility and can suffer from unmodeled dynamics, sensor noise and non-stationary environments.
Correspondingly, RL algorithms such as DDPG and CAFVI offer promising alternatives for learning policies over long time horizons by decomposing the learning task into a set of goals and subgoals. These algorithms can be robust to sensor noise and motion stochasticity and are resilient to (moderate) changes in the environment, but they require efficient state representations and often suffer from poor local minima. By combining PRMs and RL techniques, the authors present a compelling case for learning robot dynamics separately from the environment, an approach that is shown to scale to environments up to 63 million times larger than the training environment.
Fig. 4. PRM-RL: a prosperous handshake between RL and classical robotics.
Specifically, the authors decouple the dynamics and noise estimation from the environment itself. First, they learn the dynamics in a small training environment, and then use that model to inform the local graph connectivity within the target environment. Instead of adding edges along all collision-free paths, they only add edges that can be successfully navigated by the dynamics model in a high percentage of simulations. This process generates a roadmap that is more robust to noise and motion error, and simultaneously less prone to the poor local minima exhibited by naive HRL planners, ensuring continuous progress towards the goal state.
In this talk we will explore how to construct a dynamically feasible roadmap using RL, how to train a dynamics model using policy gradients and value function approximation, and finally how to query the PRM to produce practical reference trajectories. No prior understanding of motion planning, HRL, or robotics is assumed or required."
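
The edge-connection rule described in the abstract above lends itself to a short sketch. The following is a minimal illustration, not the authors' implementation: a candidate roadmap edge is accepted only if a learned point-to-point policy reaches the target node in a sufficiently high fraction of noisy simulated rollouts. The `policy` and `dynamics` callables and all tuning constants are hypothetical placeholders.

```python
import numpy as np

# Hypothetical tuning values -- not taken from the paper.
SUCCESS_RADIUS = 0.3      # goal tolerance
SUCCESS_THRESHOLD = 0.9   # accept an edge only if >= 90% of rollouts reach the goal
N_ROLLOUTS = 20           # Monte Carlo trials per candidate edge
MAX_STEPS = 200           # rollout horizon

def rollout_succeeds(policy, dynamics, start, goal, noise_std=0.05):
    """Simulate one noisy rollout of the learned point-to-point policy."""
    state = np.asarray(start, dtype=float).copy()
    goal = np.asarray(goal, dtype=float)
    for _ in range(MAX_STEPS):
        action = policy(state, goal)          # learned local controller
        state = dynamics(state, action)       # robot dynamics model
        state = state + np.random.normal(0.0, noise_std, size=state.shape)  # motion/sensor noise
        if np.linalg.norm(state - goal) < SUCCESS_RADIUS:
            return True
    return False

def should_connect(policy, dynamics, u, v):
    """Connect roadmap nodes u and v only if the policy reliably navigates
    between them under noise, instead of using a pure line-of-sight test."""
    successes = sum(rollout_succeeds(policy, dynamics, u, v) for _ in range(N_ROLLOUTS))
    return successes / N_ROLLOUTS >= SUCCESS_THRESHOLD
```

The design point, as the abstract notes, is that edge existence is decided by the policy's empirical success rate rather than by geometric line of sight alone, so the resulting roadmap only contains transitions the robot can actually execute.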

When: Monday, 05/11/18 at 2:00pm
Where: PAA 3195

Breandan Considine

July 18, 2023

Transcript

  1. PRM-RL: Long-range Robotic
    Navigation Tasks by Combining
    Reinforcement Learning and
    Sampling-based Planning
    Aleksandra Faust, Oscar Ramirez, Marek Fiser,
    Kenneth Oslund, Anthony Francis, James
    Davidson, and Lydia Tapia
    Presented by Breandan Considine

  2. Workspace
    Configuration space

  3. Probabilistic roadmaps
    Advantages:
    - Very efficient planning representation
    - Provably probabilistically complete: guaranteed to find a path if one exists, given enough samples
    Disadvantages:
    - Executing reference trajectories
    - Typically unaware of task constraints
    - Can suffer from noise in perception and motor control
    - Does not perform well under uncertainty

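As a companion to the summary on slide 3, here is a minimal sketch of the classical PRM construction loop described in the abstract (uniform sampling plus straight-line connection checks). The `is_collision_free` callable, the sampling bounds, and the connection radius are hypothetical placeholders, and real implementations use k-nearest-neighbour queries rather than the quadratic pairing shown here.

```python
import itertools
import numpy as np

def build_prm(n_samples, is_collision_free, lower, upper, connect_radius, rng=None):
    """Classical PRM: sample collision-free configurations, then connect
    nearby pairs whose straight-line ("line of sight") segment is also free."""
    rng = rng or np.random.default_rng(0)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)

    nodes = []
    while len(nodes) < n_samples:
        q = rng.uniform(lower, upper)          # sample a configuration
        if is_collision_free(q):
            nodes.append(q)

    edges = []
    for i, j in itertools.combinations(range(len(nodes)), 2):
        a, b = nodes[i], nodes[j]
        if np.linalg.norm(a - b) > connect_radius:
            continue
        # Discretize the segment and keep the edge only if every sample is free.
        if all(is_collision_free(a + t * (b - a)) for t in np.linspace(0.0, 1.0, 20)):
            edges.append((i, j))
    return nodes, edges
```

A query then connects the start and goal configurations to the roadmap and runs a graph search such as Dijkstra or A*; "probabilistically complete" means the probability of failing to find an existing path goes to zero as the number of samples grows.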

  4. RL Planners
    Advantages:
    - Robust to noise and errors
    - Can obey robot dynamics and other task constraints
    - Handles moderate changes to the environment
    - Not as computationally complex to execute as other approaches (e.g. MPC, action filtering, hierarchical policy approximation)
    Disadvantages:
    - Long-range navigation on complex maps has a sparse reward structure
    - Can be difficult to train
    - Prone to converge on poor local minima
    - Need to carefully select the control and action space

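To make the "sparse reward structure" point on slide 4 concrete, the sketch below contrasts a sparse goal-reaching reward with a simple distance-based shaping term. The signatures and constants are illustrative assumptions, not the reward used in the paper.

```python
import numpy as np

def sparse_reward(state, goal, success_radius=0.3):
    """Reward only on reaching the goal: almost no learning signal over
    long-range navigation, which makes direct end-to-end RL hard to train."""
    return 1.0 if np.linalg.norm(np.asarray(state) - np.asarray(goal)) < success_radius else 0.0

def shaped_reward(state, next_state, goal):
    """Dense alternative: reward per-step progress toward the goal. Easier to
    learn from, but on complex maps greedy progress terms are exactly where
    poor local minima (e.g. getting stuck behind obstacles) come from."""
    goal = np.asarray(goal, dtype=float)
    return float(np.linalg.norm(np.asarray(state, dtype=float) - goal)
                 - np.linalg.norm(np.asarray(next_state, dtype=float) - goal))
```

PRM-RL sidesteps this trade-off by keeping the learned policy's task short-range (node to node) and leaving long-range reasoning to the roadmap.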

  5. How can we synthesize
    these two approaches?

  6. How do we train
    the dynamics
    model?

  7. https://www.youtube.com/embed/_XiaL5W-5Lg?enablejsapi=1
