TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA'21)

Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri, “TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly”, ICRA 2021

blog:
https://medium.com/sinicx/trans-am-transfer-learning-by-aggregating-dynamics-models-for-soft-robotic-assembly-53ef3451d066

OMRON SINIC X

June 02, 2021

Transcript

  1. TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA’21)

    Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri (OMRON SINIC X). International Conference on Robotics and Automation (ICRA 2021), June 1, 2021. © 2021 OMRON SINIC X Corporation. All Rights Reserved.
  2. TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly

    Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri (OMRON SINIC X Corporation)
  3. Outline

    • Motivation • Method • Experiment
  4. Outline

    • Motivation • Method • Experiment
  5. Quick adaptation to a new setup

    A new workpiece with different physical characteristics. Adaptation: 0. One peg-hole, 1. Another hole, 2. Small peg/hole.
  6. Learning for quick adaptation

    A new workpiece with different physical characteristics. 0. One peg-hole, 1. Another hole, 2. Small peg/hole. Learning in the source env., adaptation in the target env. (env. = environment).
  7. State transition dynamics

    A new workpiece with different physical characteristics. 0. One peg-hole, 1. Another hole, 2. Small peg/hole. Learning in the source env., adaptation in the target env. (env. = environment). Figure labels: Action, State, Transition of the state of the robot.
  8. Problem / Example / Related works

    1. Different dynamics in source and target env.
       • Hole orientations largely affect the dynamics
    2. Unknown dynamics of the env.
       • Link mass, joint damping, friction, inertia
    3. No communication with the source env.
       • Distributed factory lines
    [Figure: source env. / target env. pairs, one per problem]
  9. Problem / Example / Related works

    1. Different dynamics in source and target env.
       • Hole orientations largely affect the dynamics
    2. Unknown dynamics of the envs.
       • Link mass, joint damping, friction, inertia
    3. No communication with the source env.
       • Distributed factory lines
    Example: peg-in-hole tasks with variable hole orientations
    [Figure: source env. / target env. pairs, one per problem]
  10. Problem / Example / Related works

    1. Different dynamics in source and target env.
       • Hole orientations largely affect the dynamics
    2. Unknown dynamics of the envs.
       • Link mass, joint damping, friction, inertia
    3. No communication with the source env.
       • Distributed factory lines
    Related works that require frequent access to source environments:
       • Inter-dynamics transfer [Chen+ NeurIPS2018]
       • Meta-learning [Vanschoren 2018]
    [Figure: a policy interacting with Env 1 ... Env K; source env. / target env. pairs, one per problem]
  11. Using policies learned at source environments

    • No access to source envs.
    • Model-free RL
      • Many interactions at target envs.
      • Long delays until deployment
      • Damage to the robots and workpieces
    • Model-based RL (ours, TRANS-AM)
    [Barekatain+ IJCAI2020]
    [Figure: Policy, Env 1 ... Env K]
  12. Outline

    • Motivation • Method • Experiment
  13. Model-based RL (MBRL)

    A model g_θ ≈ T (θ: parameter, T: state-transition dynamics)
    • Learned using collected samples
    • Used to
      • predict the states from the actions
      • select the optimal actions
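The two uses of a learned model listed on this slide can be illustrated with a minimal sketch. It assumes a one-dimensional linear model fitted by least squares; the function names, the toy dynamics, and the candidate action set are hypothetical and are not taken from the talk.

```python
import random

def true_dynamics(s, u):
    # Hypothetical ground-truth transition T, unknown to the learner.
    return 0.9 * s + 0.5 * u

def fit_linear_model(samples):
    """Least-squares fit of g_theta(s, u) = theta[0] * s + theta[1] * u."""
    sss = sum(s * s for s, u, nxt in samples)
    ssu = sum(s * u for s, u, nxt in samples)
    suu = sum(u * u for s, u, nxt in samples)
    ssn = sum(s * nxt for s, u, nxt in samples)
    sun = sum(u * nxt for s, u, nxt in samples)
    det = sss * suu - ssu * ssu  # determinant of the 2x2 normal equations
    return ((ssn * suu - sun * ssu) / det, (sun * sss - ssn * ssu) / det)

def predict(theta, s, u):
    # Use 1: predict the next state from the current state and action.
    return theta[0] * s + theta[1] * u

def select_action(theta, s, goal, candidates):
    # Use 2: pick the candidate action whose predicted state is closest to the goal.
    return min(candidates, key=lambda u: abs(predict(theta, s, u) - goal))

rng = random.Random(0)
samples = []
for _ in range(50):
    s, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
    samples.append((s, u, true_dynamics(s, u)))  # collected (state, action, next state)

theta = fit_linear_model(samples)
best = select_action(theta, s=1.0, goal=0.0, candidates=[-2.0, -1.0, 0.0, 1.0, 2.0])
```

In practice the model would usually be a neural network and the action search would run over sequences of actions rather than a single step.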
  14. Transfer MBRL

    Target dynamics model
    • Unknown
    • Learned quickly using source dynamics models
    Source dynamics models
    • May or may not be parameterized and trainable
    • Learned neural networks
    • Hand-engineered simulators
  15. TRANS-AM learns a model of target dynamics

    1. Aggregates the outputs from source dynamics models
    2. Auxiliary model to compensate the residual
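The slide does not give the aggregation formula, so the following is only a sketch of the general idea under strong simplifying assumptions: the target prediction is a weighted sum of the source model outputs, the weights are fitted by least squares on target samples, and the auxiliary model is reduced to a constant residual term. All names and toy dynamics are hypothetical.

```python
import random

# Black-box source dynamics models (hypothetical stand-ins); only their outputs are used.
def source_1(s, u):
    return 0.9 * s + 0.5 * u

def source_2(s, u):
    return 0.8 * s + 0.3 * u

SOURCES = [source_1, source_2]

def target_dynamics(s, u):
    # Hypothetical target environment, unknown to the learner.
    return 0.85 * s + 0.4 * u + 0.1

def fit_aggregate(samples):
    """Step 1: least-squares weights over source outputs. Step 2: constant residual."""
    feats = [(source_1(s, u), source_2(s, u), nxt) for s, u, nxt in samples]
    a11 = sum(x1 * x1 for x1, x2, y in feats)
    a12 = sum(x1 * x2 for x1, x2, y in feats)
    a22 = sum(x2 * x2 for x1, x2, y in feats)
    b1 = sum(x1 * y for x1, x2, y in feats)
    b2 = sum(x2 * y for x1, x2, y in feats)
    det = a11 * a22 - a12 * a12
    w = ((b1 * a22 - b2 * a12) / det, (b2 * a11 - b1 * a12) / det)
    # Assumed auxiliary model: the mean error left after aggregation.
    residual = sum(y - w[0] * x1 - w[1] * x2 for x1, x2, y in feats) / len(feats)
    return w, residual

def predict(w, residual, s, u):
    return w[0] * source_1(s, u) + w[1] * source_2(s, u) + residual

def mse(model, samples):
    return sum((model(s, u) - nxt) ** 2 for s, u, nxt in samples) / len(samples)

rng = random.Random(0)
samples = []
for _ in range(100):
    s, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
    samples.append((s, u, target_dynamics(s, u)))

w, residual = fit_aggregate(samples)
mse_transfer = mse(lambda s, u: predict(w, residual, s, u), samples)
mse_sources = [mse(g, samples) for g in SOURCES]
```

On these toy samples the aggregated model with the residual term fits the target better than either source model alone, which is the intuition behind reusing source models instead of learning from scratch.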
  16. Outline

    • Motivation • Method • Experiment
  17. Peg-in-hole simulations with different hole angles

    • State: the pose of the wrist and the peg
    • Action: the velocity of the wrist
    • Return: the distance to goal position
    [Figure labels: Wrist, Peg, Spring, Hole orientation, Gripper]
  18. Evaluation of learning

    1. Returns in earlier episodes
       • Learning curve
    2. 1st success episode
       • Average ± standard deviation
    3. Success ratio
       • Sessions with a success before episode {5, 10, 15, 20} / all sessions
    • The number of source dynamics models (K) is varied
    • Baseline: learning from scratch
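The second and third metrics can be computed as in the sketch below. It assumes each session is summarized by the episode index of its first success (`None` if the session never succeeded) and reads "before episode e" as "at or before episode e"; both the data and that reading are assumptions for illustration.

```python
import statistics

def first_success_stats(first_success):
    """Mean and sample standard deviation over sessions that succeeded at least once."""
    done = [e for e in first_success if e is not None]
    return statistics.mean(done), statistics.stdev(done)

def success_ratios(first_success, thresholds=(5, 10, 15, 20)):
    """Fraction of all sessions whose first success came at or before each threshold."""
    n = len(first_success)
    return {
        e: sum(1 for f in first_success if f is not None and f <= e) / n
        for e in thresholds
    }

# Hypothetical first-success episodes for five sessions.
first_success = [3, 7, 12, None, 5]
mean_ep, std_ep = first_success_stats(first_success)
ratios = success_ratios(first_success)
```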
  19. Task performance using TRANS-AM
  20. Learning curves

    Higher returns in earlier episodes
  21. 1st success / success ratio

    Average ± standard deviation. Faster first success and high success rates. (e: episode, K: number of source models)
  22. Setup with a robot arm

    • State: pose, force
    • Action: gripper velocity
    • Reward:
    Hardware: motion capture cameras x 6 (FLEX13, OptiTrack), force-torque sensor (FT300, ROBOTIQ), gripper (2F-85, ROBOTIQ), compliant wrist, UR5 robot arm (Universal Robots)
  23. Task performance using TRANS-AM
  24. Learning curves / Success ratios

    Higher returns in earlier episodes
  25. Success ratios

    Average ± standard deviation. High success rates, 30% faster. (e: episode, K: number of source models)
  26. Conclusion: TRANS-AM

    • Proposed a new method to leverage known models when learning new tasks
    • Efficient model-based reinforcement learning confirmed in simulation and real-robot experiments
    • 30% faster learning than from scratch
  28. Appendix
  29. Weight

    [Figure: three panels, each with labels Source 1, Source 2, and Target; axis: Hole orientation]
  30. Success ratios

    Average ± standard deviation. Faster. (e: episode, K: number of source models)
  31. The number of source environments

    [Figure; axis: Hole orientation]
  32. Cross-entropy method

    [Didier & Olivier 2012]
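The slide names the method without details, so here is a generic, minimal sketch of the cross-entropy method on a one-dimensional toy cost. The Gaussian sampling distribution, the sample and elite counts, and the cost function are all illustrative assumptions rather than the settings used in the talk.

```python
import random
import statistics

def cross_entropy_method(cost, mean=0.0, std=5.0, n_samples=50, n_elite=10,
                         n_iters=20, seed=0):
    """Minimize `cost` by repeatedly refitting a Gaussian to the best samples."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        elites = sorted(samples, key=cost)[:n_elite]    # lowest-cost samples
        mean = statistics.mean(elites)                  # refit the distribution
        std = statistics.pstdev(elites) + 1e-6          # small floor avoids std = 0
    return mean

# Hypothetical cost with its minimum at x = 3.
best = cross_entropy_method(lambda x: (x - 3.0) ** 2)
```

For planning, the sampled variable would be an action sequence and the cost would be the negative predicted return under the dynamics model.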
  33. Model predictive control

    1. Input action sequence
    2. Predict state sequence
    3. Calculate rewards in the sequence
    4. Update action sequence and go back to 2.
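The four steps can be sketched as a receding-horizon loop. This example makes several assumptions that are not in the slides: a toy one-dimensional model, a reward equal to the negative distance to the goal, and a simple random-perturbation update in step 4 used as a stand-in for a sampling-based optimizer.

```python
import random

def model(s, u):
    # Hypothetical learned dynamics model: next state from state and action.
    return s + 0.1 * u

def sequence_return(s, actions, goal):
    # Steps 2-3: predict the state sequence and sum the rewards along it.
    total = 0.0
    for u in actions:
        s = model(s, u)
        total -= abs(s - goal)
    return total

def plan(s, goal, rng, horizon=5, n_updates=100):
    actions = [0.0] * horizon                      # Step 1: initial action sequence
    best = sequence_return(s, actions, goal)
    for _ in range(n_updates):                     # Step 4: update, then back to step 2
        candidate = [max(-1.0, min(1.0, u + rng.gauss(0.0, 0.3))) for u in actions]
        value = sequence_return(s, candidate, goal)
        if value > best:                           # keep only improving sequences
            actions, best = candidate, value
    return actions

def run_mpc(s0, goal, n_steps=30, seed=0):
    rng = random.Random(seed)
    s = s0
    for _ in range(n_steps):
        u = plan(s, goal, rng)[0]                  # execute only the first planned action
        s = model(s, u)                            # toy case: environment equals the model
    return s

final_state = run_mpc(s0=0.0, goal=1.0)
```

Replanning at every step lets the controller correct for model error, since only the first action of each plan is ever executed.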
  34. Limitations of TRANS-AM

    • Does not consider characteristics of environments
    • How many source envs.?
    • How to select envs.?
    • Cost functions
    • Interactions with the env.
  35. TRANS-AM uses black-box source models

    1. Aggregates the outputs from source dynamics models
    2. Auxiliary model to compensate the residual
    Source dynamics models
    • May or may not be parameterized and trainable
    • Learned neural networks
    • Hand-engineered simulators
  36. TRANS-AM learns characteristics of a new part

    Geometric
    • Size
    • Shape
    • Pose (position, orientation)
    Dynamic
    • Mass
    • Inertia
    • Friction
    • Deformation
    • Elasticity