Slide 1

Slide 1 text

© 2021 OMRON SINIC X Corporation. All Rights Reserved. TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA’21) Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri (OMRON SINIC X) International Conference on Robotics and Automation (ICRA 2021) June 1, 2021

Slide 2

Slide 2 text

TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri OMRON SINIC X Corporation

Slide 3

Slide 3 text

Outline
• Motivation
• Method
• Experiment

Slide 4

Slide 4 text

Outline
• Motivation
• Method
• Experiment

Slide 5

Slide 5 text

Quick adaptation to a new setup
A new workpiece with different physical characteristics
Adaptation
0. One peg-hole
1. Another hole
2. Small peg/hole

Slide 6

Slide 6 text

Learning for quick adaptation
A new workpiece with different physical characteristics
0. One peg-hole
1. Another hole
2. Small peg/hole
Figure labels: Source env., Target env. (env. = environment), Learning, Adaptation

Slide 7

Slide 7 text

State transition dynamics
A new workpiece with different physical characteristics
0. One peg-hole
1. Another hole
2. Small peg/hole
Figure labels: Source env., Target env. (env. = environment), Learning, Adaptation, Action, State
Transition of the state of the robot

Slide 8

Slide 8 text

Problem / Example / Related works
1. Different dynamics in source and target env.
• Hole orientation strongly affects the dynamics
2. Unknown dynamics of the env.
• Link mass, joint damping, friction, inertia
3. No communication with the source env.
• Distributed factory lines
Diagrams (one per item): Source env. ≠ Target env.; Source env. ? Target env.; Source env., Target env.

Slide 9

Slide 9 text

Problem / Example / Related works
1. Different dynamics in source and target env.
• Hole orientation strongly affects the dynamics
2. Unknown dynamics of the envs.
• Link mass, joint damping, friction, inertia
3. No communication with the source env.
• Distributed factory lines
Peg-in-hole tasks with variable hole orientations
Diagrams (one per item): Source env. ≠ Target env.; Source env. ? Target env.; Source env., Target env.

Slide 10

Slide 10 text

Problem / Example / Related works
1. Different dynamics in source and target env.
• Hole orientation strongly affects the dynamics
2. Unknown dynamics of the envs.
• Link mass, joint damping, friction, inertia
3. No communication with the source env.
• Distributed factory lines
Require frequent access to source environments
• Inter-dynamics transfer [Chen+ NeurIPS2018]
• Meta-learning [Vanschoren 2018]
Diagram labels: Policy, Env 1 ... Env K
Diagrams (one per item): Source env. ≠ Target env.; Source env. ? Target env.; Source env., Target env.

Slide 11

Slide 11 text

Using policies learned at source environments [Barekatain+ IJCAI2020]
• No access to source envs.
• Model-free RL
• Many interactions at target envs.
• Long delays until deployment
• Damage to the robots and workpieces
• Model-based RL (ours, TRANS-AM)
Diagram labels: Policy, Env 1 ... Env K

Slide 12

Slide 12 text

Outline
• Motivation
• Method
• Experiment

Slide 13

Slide 13 text

Model-based RL (MBRL)
A model
• g_θ ≈ T (θ: parameter; T: state-transition dynamics)
• Learned using collected samples
• Used to
  • predict the states from the actions
  • select the optimal actions
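As a minimal sketch of "learned using collected samples" and "predict the states from the actions", the snippet below fits a one-parameter model to (state, action, next state) samples. The 1-D system, the model form s' = s + θ·a, and the names `fit_theta` and `predict` are illustrative assumptions, not the models used in the talk.

```python
from typing import List, Tuple

# Each sample is (state, action, next_state) for a hypothetical 1-D system.
Sample = Tuple[float, float, float]


def fit_theta(samples: List[Sample]) -> float:
    """Least-squares fit of theta in the assumed model s' = s + theta * a."""
    num = sum(a * (s_next - s) for s, a, s_next in samples)
    den = sum(a * a for _, a, _ in samples)
    if den == 0.0:
        raise ValueError("need at least one sample with a non-zero action")
    return num / den


def predict(theta: float, state: float, action: float) -> float:
    """Predict the next state from the current state and action."""
    return state + theta * action


# Toy samples generated by a system whose true gain is 0.5 (made-up data).
samples = [(0.0, 1.0, 0.5), (0.5, 2.0, 1.5), (1.5, -1.0, 1.0)]
theta = fit_theta(samples)
print(theta, predict(theta, 1.0, 2.0))
```

Once fitted, the same `predict` function can be used by a planner to score candidate actions before they are executed on the robot.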

Slide 14

Slide 14 text

Transfer MBRL
Target dynamics model
• Unknown
• Learned quickly using source dynamics models
Source dynamics models
• Not necessarily parameterized or trainable
• Learned neural networks
• Hand-engineered simulators
• ...

Slide 15

Slide 15 text

TRANS-AM learns a model of target dynamics
1. Aggregates the outputs from source dynamics models
2. Auxiliary model to compensate for the residual
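The two components above can be sketched as a weighted average of source-model predictions plus a residual correction. This is one plausible form under stated assumptions, not necessarily the exact formulation in the paper: the per-dimension weights, the division by K, and the names `aggregate_prediction`, `weights`, and `residual_model` are all hypothetical.

```python
from typing import Callable, List, Sequence

# A dynamics model maps (state, action) to a predicted next state.
Model = Callable[[Sequence[float], Sequence[float]], List[float]]


def aggregate_prediction(
    source_models: Sequence[Model],
    weights: Sequence[Sequence[float]],
    residual_model: Model,
    state: Sequence[float],
    action: Sequence[float],
) -> List[float]:
    """Weighted average of black-box source predictions plus a residual term."""
    if len(source_models) != len(weights):
        raise ValueError("one weight vector is required per source model")
    k = len(source_models)
    next_state = [0.0] * len(state)
    for model, w in zip(source_models, weights):
        pred = model(state, action)  # source models are only queried, never trained
        for i, value in enumerate(pred):
            next_state[i] += w[i] * value / k
    correction = residual_model(state, action)  # auxiliary model output
    return [n + c for n, c in zip(next_state, correction)]


# Two toy source models with different action gains (made-up dynamics).
def source_1(s, a):
    return [si + ai for si, ai in zip(s, a)]


def source_2(s, a):
    return [si + 2.0 * ai for si, ai in zip(s, a)]


def zero_residual(s, a):
    return [0.0 for _ in s]


pred = aggregate_prediction(
    [source_1, source_2], [[1.0, 1.0], [1.0, 1.0]], zero_residual, [0.0, 1.0], [1.0, 1.0]
)
print(pred)
```

In this sketch only `weights` and `residual_model` would be fitted on target-environment samples, which is what lets the source models stay black boxes.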

Slide 16

Slide 16 text

Outline
• Motivation
• Method
• Experiment

Slide 17

Slide 17 text

Peg-in-hole simulations with different hole angles
• State: the pose of the wrist and the peg
• Action: the velocity of the wrist
• Return: the distance to the goal position
Figure labels: Gripper, Wrist, Spring, Peg, Hole orientation

Slide 18

Slide 18 text

Evaluation of learning
1. Returns in earlier episodes
• Learning curve
2. 1st success episode
• Average ± standard deviation
3. Success ratio
• Sessions with a success before episode {5, 10, 15, 20} / all sessions
• The number of source dynamics models (K) is varied
• Baseline: learning from scratch
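The second and third metrics above can be computed from a per-session record of the first successful episode. The sketch below assumes that a session without any success is recorded as `None`, that "before episode e" includes episode e itself, and that the standard deviation is the sample standard deviation; the function names and the example data are hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def first_success_stats(first_success: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Average and sample standard deviation of the first success episode."""
    succeeded = [e for e in first_success if e is not None]
    return mean(succeeded), stdev(succeeded)


def success_ratio(first_success: Sequence[Optional[int]], episode_limit: int) -> float:
    """Fraction of all sessions that succeeded at or before `episode_limit`."""
    hits = sum(1 for e in first_success if e is not None and e <= episode_limit)
    return hits / len(first_success)


# Made-up example: first success episode for five sessions (None = never succeeded).
sessions = [3, 7, None, 12, 18]
avg, sd = first_success_stats(sessions)
ratios = {e: success_ratio(sessions, e) for e in (5, 10, 15, 20)}
print(avg, sd, ratios)
```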

Slide 19

Slide 19 text

Task performance using TRANS-AM

Slide 20

Slide 20 text

Learning curves
Higher returns in earlier episodes

Slide 21

Slide 21 text

1st success / success ratio
Average ± standard deviation
Faster
High success rates
e: episode; K: number of source models

Slide 22

Slide 22 text

Setup with a robot arm
• State: pose, force
• Action: gripper velocity
• Reward: [formula missing from the extracted text]
Hardware:
• Motion capture cameras × 6 (FLEX13, OptiTrack)
• Force-torque sensor (FT300, ROBOTIQ)
• Gripper (2F-85, ROBOTIQ)
• Compliant wrist
• UR5 robot arm (Universal Robots)

Slide 23

Slide 23 text

Task performance using TRANS-AM

Slide 24

Slide 24 text

Learning curves / Success ratios
Higher returns in earlier episodes

Slide 25

Slide 25 text

Success ratios
Average ± standard deviation
30% faster
High success rates
e: episode; K: number of source models

Slide 26

Slide 26 text

Conclusion: TRANS-AM
• Proposed a new method that leverages known models when learning new tasks
• Confirmed efficient model-based reinforcement learning in simulation and real-robot experiments
• 30% faster learning than learning from scratch

Slide 27

Slide 27 text


Slide 28

Slide 28 text

Appendix

Slide 29

Slide 29 text

Weight
[Figure: three panels, each showing Source 1, Source 2, and Target; axis label: Hole orientation]

Slide 30

Slide 30 text

Success ratios
Average ± standard deviation
Faster
e: episode; K: number of source models

Slide 31

Slide 31 text

The number of source environments
[Figure; axis label: Hole orientation]

Slide 32

Slide 32 text

Cross entropy method [Didier & Olivier 2012]

Slide 33

Slide 33 text

Model predictive control
1. Input action sequence
2. Predict state sequence
3. Calculate rewards in the sequence
4. Update action sequence and go back to step 2.
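The four steps above can be sketched as a planning loop. Here the action-sequence update is done with a cross-entropy-style refit of a Gaussian over the best candidates, which is an assumption about how the two appendix slides fit together. The 1-D toy dynamics, the reward, and all names and hyperparameters are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def rollout_return(model: Callable[[float, float], float],
                   reward: Callable[[float], float],
                   state: float, actions: Sequence[float]) -> float:
    """Steps 2-3: predict the state sequence and sum the rewards along it."""
    total = 0.0
    for a in actions:
        state = model(state, a)
        total += reward(state)
    return total


def plan(model, reward, state: float, horizon: int = 5, samples: int = 100,
         elites: int = 10, iterations: int = 10, seed: int = 0) -> List[float]:
    """Return an action sequence optimized against the learned model."""
    rng = random.Random(seed)
    mu = [0.0] * horizon
    sigma = [1.0] * horizon
    for _ in range(iterations):
        # Step 1: draw candidate action sequences around the current mean.
        candidates = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
                      for _ in range(samples)]
        # Steps 2-3: score each candidate with the model and the reward.
        candidates.sort(key=lambda seq: rollout_return(model, reward, state, seq),
                        reverse=True)
        best = candidates[:elites]
        # Step 4: refit the sampling distribution to the best candidates, then repeat.
        mu = [mean([seq[t] for seq in best]) for t in range(horizon)]
        sigma = [pstdev([seq[t] for seq in best]) + 1e-3 for t in range(horizon)]
    return mu


# Toy problem: move a 1-D state from 0.0 to a goal at 1.0.
def toy_model(s: float, a: float) -> float:
    return s + a


def toy_reward(s: float) -> float:
    return -abs(s - 1.0)


actions = plan(toy_model, toy_reward, 0.0)
print(actions[0], rollout_return(toy_model, toy_reward, 0.0, actions))
```

In a receding-horizon controller only the first action of the returned sequence would be executed before planning again from the newly observed state.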

Slide 34

Slide 34 text

Limitations of TRANS-AM
• Does not consider the characteristics of environments
• How many source envs. are needed?
• How to select envs.?
• Cost functions
• Interactions with the env.

Slide 35

Slide 35 text

TRANS-AM uses black-box source models
1. Aggregates the outputs from source dynamics models
2. Auxiliary model to compensate for the residual
Source dynamics models
• Not necessarily parameterized or trainable
• Learned neural networks
• Hand-engineered simulators

Slide 36

Slide 36 text

TRANS-AM learns characteristics of a new part
Geometric
• Size
• Shape
• Pose (position, orientation)
Dynamic
• Mass
• Inertia
• Friction
• Deformation
• Elasticity