TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA'21)

Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri, “TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly”, ICRA 2021

blog:
https://medium.com/sinicx/trans-am-transfer-learning-by-aggregating-dynamics-models-for-soft-robotic-assembly-53ef3451d066

OMRON SINIC X

June 02, 2021

Transcript

  1. TRANS-AM: Transfer Learning by Aggregating Dynamics Models for Soft Robotic Assembly (ICRA’21)
    Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee, Felix von Drigalski, and Yoshihisa Ijiri (OMRON SINIC X)
    International Conference on Robotics and Automation (ICRA 2021)
    June 1, 2021
    © 2021 OMRON SINIC X Corporation. All Rights Reserved.

  2. TRANS-AM: Transfer Learning by Aggregating
    Dynamics Models for Soft Robotic Assembly
    Kazutoshi Tanaka, Ryo Yonetani, Masashi Hamaya, Robert Lee,
    Felix von Drigalski, and Yoshihisa Ijiri
    OMRON SINIC X Corporation

  3. Outline
    • Motivation
    • Method
    • Experiment

  4. Outline
    • Motivation
    • Method
    • Experiment

  5. Quick adaptation to a new setup
    A new workpiece with different physical characteristics
    Adaptation
    0. One peg-hole 1. Another hole 2. Small peg/hole

  6. Learning for quick adaptation
    A new workpiece with different physical characteristics
    0. One peg-hole 1. Another hole 2. Small peg/hole
    (Figure labels: Source env., Target env., Adaptation, Learning; env. = environment)

  7. State transition dynamics
    A new workpiece with different physical characteristics
    0. One peg-hole 1. Another hole 2. Small peg/hole
    (Figure labels: Source env., Target env., Adaptation, Learning, Action, State; env. = environment)
    Transition of the state of the robot

  8. Problem / Example / Related works
    1. Different dynamics in source and target env.
    • Hole orientations largely affect the dynamics
    2. Unknown dynamics of the env.
    • Link mass, joint damping, friction, inertia
    3. No communication with the source env.
    • Distributed factory lines
    (Figure: a source env. / target env. pair for each item; the pair for item 2 is marked "?")

  9. Problem / Example / Related works
    1. Different dynamics in source and target env.
    • Hole orientations largely affect the dynamics
    2. Unknown dynamics of the envs.
    • Link mass, joint damping, friction, inertia
    3. No communication with the source env.
    • Distributed factory lines
    Peg-in-hole tasks with variable hole orientations
    (Figure: a source env. / target env. pair for each item; the pair for item 2 is marked "?")

  10. Problem / Example / Related works
    1. Different dynamics in source and target env.
    • Hole orientations largely affect the dynamics
    2. Unknown dynamics of the envs.
    • Link mass, joint damping, friction, inertia
    3. No communication with the source env.
    • Distributed factory lines
    Related works require frequent access to source environments
    • Inter-dynamics transfer [Chen+ NeurIPS2018]
    • Meta-learning [Vanschoren 2018]
    (Figure labels: Policy, Env 1 ... Env K; a source env. / target env. pair for each item, the pair for item 2 marked "?")

  11. Using policies learned at source environments
    • No access to source envs.
    • Model-free RL
    • Many interactions at target envs.
    • Long delays until deployment
    • Damage to the robots and workpieces
    • Model-based RL (ours, TRANS-AM)
    [Barekatain+ IJCAI2020]
    (Figure labels: Policy, Env 1 ... Env K)

  12. Outline
    • Motivation
    • Method
    • Experiment

  13. Model-based RL (MBRL)
    A model
    • $g_\theta \approx T$ ($T$: state-transition dynamics, $\theta$: parameter)
    • Learned using collected samples
    • Used to
      • predict the states from the actions
      • select the optimal actions
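
A minimal sketch of the idea on this slide, assuming a hypothetical one-dimensional linear system in place of the unknown dynamics T. A model is fitted to collected (state, action, next state) samples, then used to predict the next state and to pick the action whose predicted next state is closest to a goal. All function names and numeric values here are illustrative assumptions, not part of TRANS-AM.

```python
import random


def true_transition(state, action):
    # Hypothetical ground-truth dynamics T, unknown to the learner.
    return 0.9 * state + 0.5 * action


def collect_samples(n, rng):
    # Gather (state, action, next_state) tuples using random actions.
    samples = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        samples.append((s, a, true_transition(s, a)))
    return samples


def fit_linear_model(samples):
    # Least-squares fit of next_state ~ theta[0]*state + theta[1]*action,
    # solved through the 2x2 normal equations.
    ss = sum(s * s for s, a, y in samples)
    sa = sum(s * a for s, a, y in samples)
    aa = sum(a * a for s, a, y in samples)
    sy = sum(s * y for s, a, y in samples)
    ay = sum(a * y for s, a, y in samples)
    det = ss * aa - sa * sa
    return ((aa * sy - sa * ay) / det, (ss * ay - sa * sy) / det)


def predict(theta, state, action):
    # The learned model g_theta: predicted next state.
    return theta[0] * state + theta[1] * action


def select_action(theta, state, goal, candidates):
    # Pick the candidate action whose predicted next state is nearest the goal.
    return min(candidates, key=lambda a: abs(predict(theta, state, a) - goal))


rng = random.Random(0)
theta = fit_linear_model(collect_samples(50, rng))
best_action = select_action(theta, 0.0, 0.25, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

Because the toy data are noise-free and linear, the fitted parameters recover the hypothetical coefficients; a real robot would need a richer model class and noisy samples.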

  14. Transfer MBRL
    Target dynamics model
    • Unknown
    • Learned quickly using source dynamics models
    Source dynamics models
    • May or may not be parameterized or trainable
    • Learned neural networks
    • Hand-engineered simulators

  15. TRANS-AM learns a model of target dynamics
    1. Aggregates the outputs from source dynamics models
    2. Uses an auxiliary model to compensate for the residual
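
The two components on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the slides do not give the exact form of the aggregation, so the sketch assumes constant scalar weights on the source-model outputs and a linear auxiliary model, both trained by gradient descent on target samples. The source models are called only as black boxes. The two source models, the target dynamics, and all numeric values are hypothetical.

```python
import random


def source_model_1(s, a):
    # Hypothetical black-box source dynamics model.
    return 1.0 * s + 0.2 * a


def source_model_2(s, a):
    # Hypothetical black-box source dynamics model.
    return 0.6 * s + 0.8 * a


SOURCES = [source_model_1, source_model_2]


def target_transition(s, a):
    # Hypothetical target dynamics, unknown to the learner.
    return 0.7 * s + 0.6 * a + 0.1


def predict(weights, residual, s, a):
    # 1. Aggregate the source-model outputs with learned weights.
    aggregated = sum(w * f(s, a) for w, f in zip(weights, SOURCES))
    # 2. Add an auxiliary (here linear) model that compensates the residual.
    auxiliary = residual[0] * s + residual[1] * a + residual[2]
    return aggregated + auxiliary


def train(samples, steps=500, lr=0.1):
    # Full-batch gradient descent on the mean squared prediction error.
    weights = [1.0 / len(SOURCES)] * len(SOURCES)
    residual = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        grad_w = [0.0] * len(SOURCES)
        grad_r = [0.0, 0.0, 0.0]
        for s, a, s_next in samples:
            err = predict(weights, residual, s, a) - s_next
            for k, f in enumerate(SOURCES):
                grad_w[k] += 2.0 * err * f(s, a) / n
            for j, feat in enumerate((s, a, 1.0)):
                grad_r[j] += 2.0 * err * feat / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        residual = [r - lr * g for r, g in zip(residual, grad_r)]
    return weights, residual


def mean_squared_error(weights, residual, samples):
    return sum((predict(weights, residual, s, a) - y) ** 2
               for s, a, y in samples) / len(samples)


rng = random.Random(0)
samples = []
for _ in range(100):
    s, a = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    samples.append((s, a, target_transition(s, a)))

weights, residual = train(samples)
mse = mean_squared_error(weights, residual, samples)
```

Only the weights and the auxiliary parameters are updated, so the source models never need to be differentiable or retrained in this sketch.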

  16. Outline
    • Motivation
    • Method
    • Experiment

  17. Peg-in-hole simulations with different hole angles
    • State: the pose of the wrist and the peg
    • Action: the velocity of the wrist
    • Return: the distance to the goal position
    (Figure labels: Wrist, Peg, Spring, Hole orientation, Gripper)

  18. Evaluation of learning
    1. Returns in earlier episodes
    • Learning curve
    2. 1st success episode
    • Average ± standard deviation
    3. Success ratio
    • Sessions with a success before episode {5, 10, 15, 20} / all sessions
    • The number of source dynamics models (K) is varied
    • Baseline: learning from scratch
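
The second and third metrics above can be computed as in the sketch below. The session data are made up for illustration, and "before episode e" is read here as "within the first e episodes", which is an assumption about the slide's wording.

```python
import statistics


def first_success_episode(successes):
    # 1-based index of the first successful episode, or None if none succeeded.
    for index, ok in enumerate(successes, start=1):
        if ok:
            return index
    return None


def success_ratio(sessions, e):
    # Fraction of sessions with at least one success within the first e episodes.
    hits = sum(1 for session in sessions if any(session[:e]))
    return hits / len(sessions)


# Made-up per-episode success flags for four learning sessions.
sessions = [
    [False, False, True, True],
    [False, False, False, False, False, True],
    [True],
    [False] * 20,
]

firsts = [first_success_episode(s) for s in sessions]
reached = [f for f in firsts if f is not None]
mean_first = statistics.mean(reached)
std_first = statistics.stdev(reached)
ratios = {e: success_ratio(sessions, e) for e in (5, 10, 15, 20)}
```

Sessions that never succeed are excluded from the average here; how the original evaluation treats them is not stated on the slide.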

  19. Task performance using TRANS-AM

  20. Learning curves
    Higher returns in earlier episodes

  21. 1st success / success ratio
    Average ± standard deviation
    Faster 1st success / high success rates
    e: episode
    K: number of source models

  22. Setup with a robot arm
    • State: pose, force
    • Action: gripper velocity
    • Reward:
    Motion capture cameras x 6 (FLEX13, OptiTrack)
    Force-torque sensor (FT300, ROBOTIQ)
    Gripper (2F-85, ROBOTIQ)
    Compliant wrist
    UR5 robot arm (Universal Robots)

  23. Task performance using TRANS-AM

  24. Learning curves / Success ratios
    Higher returns in earlier episodes

  25. Success ratios
    High success rates
    Average ± standard deviation
    30% faster
    e: episode
    K: number of source models

  26. Conclusion: TRANS-AM
    • Proposed a new method that leverages known models when learning new tasks
    • Efficient model-based reinforcement learning, confirmed in simulation and real-robot experiments
    • 30% faster learning than learning from scratch

  28. Appendix

  29. Weight
    (Figure: three plots over hole orientation, each labeled Source 1, Source 2, and Target)

  30. Success ratios
    Average ± standard deviation
    Faster
    e: episode
    K: number of source models

  31. The number of source environments
    (Figure axis: hole orientation)

  32. Cross entropy method
    [Didier & Olivier 2012]

  33. Model predictive control
    1. Input action sequence
    2. Predict state sequence
    3. Calculate rewards in the sequence
    4. Update action sequence and go back to step 2.
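
The four steps above can be sketched as a planning loop. This is an illustrative sketch, assuming the cross-entropy method named on the previous slide as the update rule in step 4 and a hypothetical one-dimensional model with reward equal to the negative distance to the goal. The same toy model also stands in for the real environment, and all names and hyperparameters are assumptions.

```python
import random
import statistics


def model(state, action):
    # Hypothetical learned dynamics model (also used as the environment here).
    return state + 0.1 * action


def rollout_return(state, actions, goal):
    # Steps 2 and 3: predict the state sequence and sum the rewards.
    total = 0.0
    for a in actions:
        state = model(state, a)
        total += -abs(goal - state)
    return total


def plan_cem(state, goal, rng, horizon=5, population=64, n_elites=8, iterations=5):
    # Step 1: start from a Gaussian over action sequences.
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iterations):
        candidates = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        candidates.sort(key=lambda seq: rollout_return(state, seq, goal), reverse=True)
        elites = candidates[:n_elites]
        # Step 4: refit the distribution to the best sequences, then repeat.
        mean = [statistics.fmean([seq[t] for seq in elites]) for t in range(horizon)]
        std = [statistics.pstdev([seq[t] for seq in elites]) + 1e-3 for t in range(horizon)]
    return mean


def run_mpc(state, goal, steps, rng):
    # Receding horizon: replan at every step and apply only the first action.
    for _ in range(steps):
        plan = plan_cem(state, goal, rng)
        state = model(state, plan[0])
    return state


final_state = run_mpc(0.0, 1.0, 30, random.Random(0))
```

Applying only the first planned action and replanning at each step lets the controller correct for model error as new states are observed.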

  34. Limitations of TRANS-AM
    • Does not consider the characteristics of environments
    • How many source envs.?
    • How to select envs.?
    • Cost functions
    • Interactions with the env.

  35. TRANS-AM uses black-box source models
    1. Aggregates the outputs from source dynamics models
    2. Uses an auxiliary model to compensate for the residual
    Source dynamics models
    • May or may not be parameterized or trainable
    • Learned neural networks
    • Hand-engineered simulators

  36. TRANS-AM learns characteristics of a new part
    Geometric
    • Size
    • Shape
    • Pose (position, orientation)
    Dynamic
    • Mass
    • Inertia
    • Friction
    • Deformation
    • Elasticity