TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA'21)

© 2021 OMRON SINIC X Corporation. All Rights Reserved.
Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri (OMRON SINIC X)
International Conference on Robotics and Automation (ICRA 2021), June 1, 2021


Outline
• Motivation
• Method
• Experiment

Quick adaptation to a new setup
A new workpiece with different physical characteristics
0. One peg-hole
1. Another hole
2. Small peg/hole
[Figure label: Adaptation]

Learning for quick adaptation
[Figure labels: Source env., Target env., Learning, Adaptation; env. = environment]

State transition dynamics
[Figure labels: Action, State, Transition of the state of the robot]

Problem / Example / Related works
1. Different dynamics in the source and target envs.
  • Hole orientations largely affect the dynamics
2. Unknown dynamics of the envs.
  • Link mass, joint damping, friction, inertia
3. No communication with the source envs.
  • Distributed factory lines
Example: peg-in-hole tasks with variable hole orientations
Related works require frequent access to source environments
  • Inter-dynamics transfer [Chen+ NeurIPS2018]
  • Meta-learning [Vanschoren 2018]
[Figure labels: Source env. ≠ Target env.; Source env., Target env. ?; Policy, Env 1 ... Env K]

Using policies learned at source environments [Barekatain+ IJCAI2020]
• No access to the source envs.
• Model-free RL
  • Many interactions at the target envs.
  • Long delays until deployment
  • Damage to the robots and workpieces
• Model-based RL (ours, TRANS-AM)
[Figure labels: Policy, Env 1 ... Env K]

Model-based RL (MBRL)
A model $g_\theta \approx T$ of the state-transition dynamics $T$ (θ: parameters)
• Learned using collected samples
• Used to
  • predict the states from the actions
  • select the optimal actions
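As a minimal sketch of the two uses listed above, assume a hypothetical 1-D system whose true dynamics are s' = s + 0.5a and a model g_θ(s, a) = s + θa with a single parameter θ. Both are illustrative assumptions, not the models used in TRANS-AM. θ is fitted by least squares on collected samples, and the fitted model is then used to predict a state sequence and to pick an action.

```python
import random

def true_dynamics(s, a):
    """Hypothetical environment, unknown to the learner: s' = s + 0.5 * a."""
    return s + 0.5 * a

# Collect samples (s, a, s') by interacting with the environment.
rng = random.Random(0)
samples = []
s = 0.0
for _ in range(50):
    a = rng.uniform(-1.0, 1.0)
    s_next = true_dynamics(s, a)
    samples.append((s, a, s_next))
    s = s_next

# Learn g_theta(s, a) = s + theta * a: closed-form least squares on the state change.
theta = (sum(a * (s1 - s0) for s0, a, s1 in samples)
         / sum(a * a for _, a, _ in samples))

def g(s, a):
    """Learned model g_theta."""
    return s + theta * a

def predict(s0, actions):
    """Use 1: predict the state sequence produced by an action sequence."""
    states = [s0]
    for a in actions:
        states.append(g(states[-1], a))
    return states

def select_action(s, goal, candidates):
    """Use 2: pick the candidate whose predicted next state is closest to the goal."""
    return min(candidates, key=lambda a: abs(g(s, a) - goal))
```

Here the model class contains the true dynamics, so θ recovers 0.5; with a real robot, g_θ would typically be a neural network trained on noisy samples.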

Transfer MBRL
Target dynamics model
• Unknown
• Learned quickly using the source dynamics models
Source dynamics models
• Not necessarily parameterized or trainable
• Learned neural networks
• Hand-engineered simulators

TRANS-AM learns a model of the target dynamics
1. Aggregates the outputs of the source dynamics models
2. Uses an auxiliary model to compensate for the residual
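The two components above can be illustrated with a small sketch. It assumes two hypothetical black-box source models, one scalar aggregation weight per source, and a linear auxiliary model c0 + c1·s + c2·a, all fitted jointly by plain gradient descent on the squared prediction error. These choices are illustrative and are not claimed to be the exact architecture or training procedure of TRANS-AM; the names `source_1`, `source_2`, `features`, `fit`, and `target` are made up for the example.

```python
import random

# Hypothetical black-box source models: only their outputs are used, so each
# could equally be a learned neural network or a hand-engineered simulator.
def source_1(s, a):
    return s + 0.2 * a

def source_2(s, a):
    return s + 1.0 * a

SOURCES = [source_1, source_2]

def features(s, a):
    """Source outputs, followed by (1, s, a) as inputs to the auxiliary residual model."""
    return [f(s, a) for f in SOURCES] + [1.0, s, a]

def predict(params, s, a):
    """Target prediction = sum_k w_k * f_k(s, a) + (c0 + c1*s + c2*a)."""
    return sum(p * x for p, x in zip(params, features(s, a)))

def fit(samples, lr=0.05, epochs=1000):
    """Jointly fit weights and residual by full-batch gradient descent on mean squared error."""
    k = len(SOURCES)
    params = [1.0 / k] * k + [0.0, 0.0, 0.0]  # uniform weights, zero residual
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(params)
        for s, a, s_next in samples:
            x = features(s, a)
            err = sum(p * xi for p, xi in zip(params, x)) - s_next
            for j, xi in enumerate(x):
                grad[j] += 2.0 * err * xi / n
        params = [p - lr * gj for p, gj in zip(params, grad)]
    return params

# Usage: a hypothetical target env. whose dynamics match neither source exactly.
def target(s, a):
    return s + 0.8 * a + 0.05

rng = random.Random(1)
data = []
for _ in range(50):
    s, a = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    data.append((s, a, target(s, a)))
params = fit(data)
weights, residual = params[:len(SOURCES)], params[len(SOURCES):]
```

In this toy the source outputs and the residual inputs overlap, so the individual weights are not unique; only the combined prediction is meaningful. The constant offset 0.05 is something no weighted combination of the two sources can produce, which is the kind of gap the auxiliary model is meant to absorb.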

Peg-in-hole simulations with different hole angles
• State: the pose of the wrist and the peg
• Action: the velocity of the wrist
• Return: the distance to the goal position
[Figure labels: Wrist, Peg, Spring, Hole orientation, Gripper]
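A toy sketch of these definitions, assuming a hypothetical point-mass peg that simply follows the commanded wrist velocity (no spring or contact) and a per-step reward equal to the negative Euclidean distance to the goal. The control period `DT` and the sign convention are assumptions; the actual simulator and reward shaping are not given in the slides.

```python
import math

DT = 0.1  # hypothetical control period [s]

def step(peg_pos, wrist_vel):
    """Toy kinematics: the peg follows the commanded wrist velocity (no spring, no contact)."""
    return tuple(p + DT * v for p, v in zip(peg_pos, wrist_vel))

def reward(peg_pos, goal_pos):
    """Assumed per-step reward: negative Euclidean distance from peg to goal."""
    return -math.dist(peg_pos, goal_pos)

def episode_return(start, goal, actions):
    """Sum of per-step rewards over one episode of wrist-velocity actions."""
    pos, total = start, 0.0
    for vel in actions:
        pos = step(pos, vel)
        total += reward(pos, goal)
    return total
```

With this sign convention a higher return means the peg stayed closer to the goal, which matches how the learning curves later report "higher returns" as better.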

Evaluation of learning
1. Returns in earlier episodes
  • Learning curve
2. 1st success episode
  • Average ± standard deviation
3. Success ratio
  • Sessions with a success before episode {5, 10, 15, 20} / all sessions
• The number of source dynamics models (K) is varied
• Baseline: learning from scratch
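Metrics 2 and 3 can both be computed from the first-success episode of each session. This sketch assumes that "before episode N" is inclusive, that sessions without any success are excluded from the average but still counted in the success-ratio denominator, and that the sample standard deviation is used. The session data are hypothetical.

```python
import statistics

def first_success_stats(first_success):
    """Average and sample standard deviation of the first-success episode.

    Sessions that never succeeded (None) are excluded (an assumption).
    """
    hits = [e for e in first_success if e is not None]
    return statistics.mean(hits), statistics.stdev(hits)

def success_ratio(first_success, n):
    """Sessions whose first success is at or before episode n, divided by all sessions."""
    ok = sum(1 for e in first_success if e is not None and e <= n)
    return ok / len(first_success)

# Hypothetical first-success episode of each session (None = no success).
sessions = [3, 7, 12, None, 4]
mean, std = first_success_stats(sessions)
ratios = {n: success_ratio(sessions, n) for n in (5, 10, 15, 20)}
```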

Task performance using TRANS-AM

Learning curves
• Higher returns in earlier episodes

1st success / success ratio
Average ± standard deviation (e: episode, K: number of source models)
• Faster first success
• High success rates

Setup with a robot arm
• State: pose, force
• Action: gripper velocity
• Reward:
Hardware
• UR5 robot arm (Universal Robots)
• Compliant wrist
• Gripper (2F-85, ROBOTIQ)
• Force-torque sensor (FT300, ROBOTIQ)
• Motion capture cameras ×6 (FLEX13, OptiTrack)

Task performance using TRANS-AM

Learning curves / Success ratios
• Higher returns in earlier episodes

Success ratios
Average ± standard deviation (e: episode, K: number of source models)
• High success rates
• 30% faster

Conclusion: TRANS-AM
• Proposed a new method that leverages known models when learning new tasks
• Efficient model-based reinforcement learning confirmed in simulation and real-robot experiments
• 30% faster learning than learning from scratch

Weight
[Figure: three panels, each labeled with the hole orientations of Source 1, Source 2, and the Target]

Success ratios
Average ± standard deviation (e: episode, K: number of source models)
• Faster

The number of source environments
[Figure label: Hole orientation]

Cross-entropy method [Didier & Olivier 2012]
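A minimal sketch of the generic cross-entropy method, assuming a diagonal Gaussian sampling distribution that is refitted to the best ("elite") samples at each iteration. The function name `cem`, the population size, the elite count, and the iteration count are hypothetical choices, not values from the paper.

```python
import random
import statistics

def cem(objective, dim, iters=30, pop=64, n_elite=8, init_std=1.0, seed=0):
    """Maximize `objective` over R^dim with the cross-entropy method.

    The sampling distribution is a diagonal Gaussian refitted to the
    elite samples at every iteration.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(iters):
        # Sample a population from the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        # Keep the n_elite highest-scoring samples.
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        # Refit mean and std to the elites (a small floor avoids a zero std).
        cols = list(zip(*elites))
        mean = [statistics.mean(c) for c in cols]
        std = [statistics.pstdev(c) + 1e-6 for c in cols]
    return mean

# Usage: maximize a concave quadratic whose optimum is at (1, -2).
best = cem(lambda x: -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2), dim=2, init_std=3.0)
```

In an MBRL planner the vector being optimized would be a flattened action sequence and the objective its predicted return under the learned model.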

Model predictive control
1. Input an action sequence
2. Predict the state sequence
3. Calculate the rewards over the sequence
4. Update the action sequence and go back to step 2
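The loop above can be sketched with a hypothetical 1-D model and reward. To keep the example self-contained, step 4 uses a simple perturb-and-keep random search as a stand-in for the cross-entropy method; as is usual in MPC, only the first action of the optimized sequence is executed before replanning. The model, reward, horizon, and noise scale are all assumptions.

```python
import random

def model(s, a):
    """Hypothetical learned 1-D dynamics model."""
    return s + 0.5 * a

def reward(s, goal=1.0):
    """Hypothetical reward: negative distance to the goal state."""
    return -abs(s - goal)

def sequence_return(s0, actions):
    # Steps 2-3: predict the state sequence with the model and sum the rewards.
    s, total = s0, 0.0
    for a in actions:
        s = model(s, a)
        total += reward(s)
    return total

def plan(s0, horizon=5, iters=200, noise=0.3, seed=0):
    rng = random.Random(seed)
    # Step 1: input an initial action sequence (all zeros here).
    best = [0.0] * horizon
    best_ret = sequence_return(s0, best)
    for _ in range(iters):
        # Step 4: perturb the sequence, clip to [-1, 1], keep it if the predicted
        # return improves, then go back to steps 2-3.
        cand = [min(1.0, max(-1.0, a + rng.gauss(0.0, noise))) for a in best]
        ret = sequence_return(s0, cand)
        if ret > best_ret:
            best, best_ret = cand, ret
    return best

def mpc_action(s):
    """Receding horizon: optimize the whole sequence, execute only its first action."""
    return plan(s)[0]
```

Replanning at every step lets the controller correct for model errors, since each new plan starts from the actually observed state rather than the predicted one.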

Limitations of TRANS-AM
• Does not consider the characteristics of the environments
• How many source envs. are needed?
• How should the envs. be selected?
• Cost functions
• Interactions with the env.

TRANS-AM uses black-box source models
1. Aggregates the outputs of the source dynamics models
2. Uses an auxiliary model to compensate for the residual
Source dynamics models
• Not necessarily parameterized or trainable
• Learned neural networks
• Hand-engineered simulators

TRANS-AM learns characteristics of a new part
Geometric
• Size
• Shape
• Pose (position, orientation)
Dynamic
• Mass
• Inertia
• Friction
• Deformation
• Elasticity
