Semantic Machine Intelligence Lab., Keio Univ.
April 09, 2026
[Journal club] Flow as the Cross-Domain Manipulation Interface
Transcript
Slide 1: Flow as the Cross-Domain Manipulation Interface
Mengda Xu (1,2,3), Zhenjia Xu (1,2), Yinghao Xu (1), Cheng Chi (1,2), Gordon Wetzstein (1), Manuela Veloso (3,4), Shuran Song (1,2)
(1) Stanford University, (2) Columbia University, (3) J.P. Morgan AI Research, (4) Carnegie Mellon University
Komei Sugiura Laboratory, 2026
Xu, Mengda; Xu, Zhenjia; Xu, Yinghao; Chi, Cheng; Wetzstein, Gordon; Veloso, Manuela; Song, Shuran. "Flow as the Cross-Domain Manipulation Interface". In 8th Conference on Robot Learning (CoRL), 2024.
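
Slide 2: Overview
- Background
  - Collecting data on a real robot is costly. We want to use data that is easy to collect for robot learning.
  - Candidates: human videos and simulation data
- Proposed method: Im2Flow2Act
  - A trajectory generation framework that uses object flow as the intermediary
  - A motion representation that does not depend on embodiment or environment
- Results
  - Object manipulation is possible without using any real-robot data
  - Outperforms the baselines in both simulation and real-world experiments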
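
Slide 3: Background: the need for a motion representation independent of embodiment and environment
- Drawback: collecting real-robot data is costly
- Drawback: building a simulator environment that matches the real-robot environment is costly
- We therefore want to use data with a low collection cost:
  - Human videos. Drawback: the human-robot embodiment gap
  - Trajectory data from a single simulator environment. Drawback: the sim-real domain gap (background, object texture, etc.)
- Conclusion: a motion representation that does not depend on embodiment or environment is needed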

Slide 4: Related work: robot learning from cross-domain data
Method: characteristic
- VRB [Bahl+, CVPR23]: learns object grasp points and trajectories from human videos
- PEAC [Ying+, NeurIPS24]: learns latent actions from cross-embodiment data
- ATM [Wen+, RSS24]: learns hand-centric flow from human videos
Drawback: deploying these methods on a real robot requires collecting data with the target embodiment.
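
Slide 5: Proposed method: Im2Flow2Act
A trajectory generation framework that exploits cross-embodiment data through object flow.
Object flow: the trajectory of the object over the whole task
- Advantage: can represent object pose and deformation
- Advantage: embodiment-agnostic
- Advantage: robust to background and texture
1. Im2Flow: initial image, language instruction -> object flow. Trained on human videos collected for each task.
2. Flow2Act: object flow, current image -> trajectory. Trained on trajectory data collected in a simulator.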

Slide 6: Prerequisites: LDM (Latent Diffusion Model), AnimateDiff
LDM [Rombach+, CVPR22]
Generating in a low-dimensional latent space makes it possible to produce high-quality images quickly.
- Cross-attention. Advantage: flexible conditioning inputs (text, bbox, etc.)
- Reduced computation. Advantages: efficient training, fast inference
- Used in Stable Diffusion
AnimateDiff [Guo+, ICLR24]
Generates videos by introducing a motion module into a pretrained T2I diffusion model (SD).
- Temporal Transformer. Advantage: improved temporal consistency
- The T2I model is frozen during training. Advantage: low-cost training
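
Slide 7: Im2Flow: Flow Generation Network
Input: initial image + language instruction. Output: object flow.

(a) Initial key points
1. Obtain a bounding box with Grounding DINO.
2. Sample points uniformly inside the bounding box.
$\mathcal{F}_0 \in \mathbb{R}^{3 \times H \times W}$ with channels $(u, v, \mathrm{visibility})$; the generated flow is $\mathcal{F}_{1:T} \in \mathbb{R}^{3 \times T \times H \times W}$.
- $u, v$: coordinates in the image
- $\mathrm{visibility}$: visibility of the object

(b) Flow generation with AnimateDiff (SD-based)
1. Encode the flow into the SD latent space:
$$x^{0}_{0:T} = \{ E(\mathcal{F}_i) \mid i \in [0, T] \}$$
2. Train the motion module. Diffusion (forward) process:
$$x^{t}_{1:T} = \sqrt{\bar{\alpha}_t}\, x^{0}_{1:T} + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_{1:T}$$
Training loss:
$$\mathcal{L} = \mathbb{E}_{E(\mathcal{F}_{1:T}),\, x^{0}_{0},\, I,\, y,\, \epsilon_{1:T} \sim \mathcal{N}(0, I),\, t} \left[ \left\| \epsilon_{1:T} - \epsilon_\theta\!\left(x^{t}_{1:T},\, t,\, x^{0}_{0},\, f_{\mathrm{img}}(I),\, f_{\mathrm{txt}}(y)\right) \right\|_2^2 \right]$$
3. Fine-tune $D$ to output the flow.

Notation
- $\mathcal{F}_{0:T}$: ground-truth flow
- $I$: initial image
- $y$: language instruction
- $t$: diffusion time step
- $E$: SD encoder
- $D$: SD decoder
- $\bar{\alpha}_t$: noise schedule (pre-defined)
- $f_{\mathrm{img}}$ / $f_{\mathrm{txt}}$: CLIP encoders (image / language)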
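
The diffusion process and noise-prediction loss on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the linear beta schedule, its endpoint values, and the use of a short flat list as a stand-in for the flow latent are all assumptions made for illustration, and no network is trained here.

```python
import math
import random


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(x0, eps, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def noise_prediction_loss(eps, eps_pred):
    """Squared L2 norm between the true noise and the predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred))


# Hypothetical usage: a flat list stands in for the flow latent x^0_{1:T}.
rng = random.Random(0)
alpha_bar = make_alpha_bar(1000)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]
xt = add_noise(x0, eps, alpha_bar[500])
# A trained eps_theta would predict eps from xt and the conditions;
# predicting all zeros is used here only to show the loss call.
loss = noise_prediction_loss(eps, [0.0] * len(eps))
```

In the slide's setting the predictor is additionally conditioned on the initial-frame latent and the CLIP embeddings of the image and the instruction; those inputs are omitted from this sketch.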
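
Slide 8: Flow2Act: Flow-Conditioned Policy
Input: object flow, current image. Output: trajectory.

(b) Online point tracking
Key points are tracked with TAPIR [Doersch+, ICCV23].

(c) Policy
1. State encoder $\phi$: $s_t = \phi(f_t, x_0)$
   - Produces a state representation (of the target's position and pose).
   - Encodes the coordinates of each point and summarizes them with a CLS token.
2. Temporal alignment $\psi$: $z_t = \psi(\mathcal{F}_{0:T}, s_t, \rho_t)$
   - Predicts a latent representation of the flow from time $t$ onward.
   - L2 loss: $\| \hat{z}_t - z_t \|_2^2$, where $\hat{z}_t = \xi(f_{t:T})$.
3. Diffusion action head: $p(\mathbf{a}_t \mid z_t, s_t, \rho_t)$
   - Generates the trajectory sequence with a diffusion model.

Notation
- $f_t$: image coordinates of the $N$ key points at time $t$, $\{(u_t^k, v_t^k)\}_{k=1}^{N}$
- $x_0$: 3D coordinates of the $N$ key points in the initial frame
- $\mathcal{F}_{0:T}$: object flow for the whole task
- $\rho_t$: robot proprioception
- $\xi$: encodes the flow into a latent representation
- $\mathbf{a}_t$: trajectory sequence starting at time $t$, $\{a_t, \dots, a_{t+L}\}$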
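
The temporal alignment objective on this slide compares a predicted latent with an encoding of the remaining flow. The sketch below only illustrates the shape of that loss. The function `encode_remaining_flow` is a hypothetical stand-in for the flow encoder (it just averages the remaining key-point coordinates), and the flow is assumed to be a list of frames, each a list of (u, v) tuples; the real encoder and predictor are learned networks.

```python
def encode_remaining_flow(flow, t):
    """Hypothetical stand-in for the flow encoder: mean (u, v) over frames t..T."""
    frames = flow[t:]
    count = sum(len(frame) for frame in frames)
    mean_u = sum(point[0] for frame in frames for point in frame) / count
    mean_v = sum(point[1] for frame in frames for point in frame) / count
    return [mean_u, mean_v]


def alignment_loss(z_pred, z_target):
    """Squared L2 distance between the predicted and the target latent."""
    return sum((p - q) ** 2 for p, q in zip(z_pred, z_target))


# Two key points moving to the right over three frames (illustrative data only).
flow = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 0.0), (2.0, 0.0)],
    [(2.0, 0.0), (3.0, 0.0)],
]
z_target = encode_remaining_flow(flow, 1)
loss = alignment_loss([1.0, 1.0], z_target)
```

The target depends only on the flow from step t onward, which is what lets the loss tie the policy's latent to how much of the task flow remains.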
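
Slide 9: Experimental settings
- Evaluated on four tasks
  - Pick-and-place
  - Pouring
  - Open drawer
  - Folding cloth
- Training settings
  - Object flow: $H = W = 32$, $T = 32$, with $\mathcal{F}_{1:T} \in \mathbb{R}^{3 \times T \times H \times W}$
  - Training time: not reported
  - Robot: UR5e
- Training data
  - Human videos: human demonstrations of each task; number of samples: not reported
  - Simulator: MuJoCo; robot: UR5e; number of samples: 4800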
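
To make the flow dimensions concrete, the following sketch builds an initial key-point grid of shape 3 x H x W with H = W = 32 by sampling uniformly inside a bounding box. Several details are assumptions for illustration: the bounding box format `(x_min, y_min, x_max, y_max)` in pixels, sampling at cell centers, and setting every visibility value to 1. Obtaining the box from a detector is not shown.

```python
def sample_initial_flow(bbox, height=32, width=32):
    """Return [u, v, visibility] channels, each a height x width grid.

    Points are placed at the centers of a uniform grid inside the bounding
    box (an assumed convention; the slide only says sampling is uniform).
    """
    x_min, y_min, x_max, y_max = bbox
    u = [[x_min + (x_max - x_min) * (j + 0.5) / width for j in range(width)]
         for _ in range(height)]
    v = [[y_min + (y_max - y_min) * (i + 0.5) / height for _ in range(width)]
         for i in range(height)]
    visibility = [[1.0] * width for _ in range(height)]
    return [u, v, visibility]


# Hypothetical bounding box in pixel coordinates.
f0 = sample_initial_flow((10.0, 20.0, 74.0, 84.0))
shape = (len(f0), len(f0[0]), len(f0[0][0]))
```

Stacking T = 32 such frames, one per tracked time step, would give the 3 x T x H x W flow tensor listed in the settings.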
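
Slide 10: Quantitative results @ Simulation
- Advantage: outperforms the other methods on all tasks
- Demonstration-conditioned: the object flow is extracted from a single human demonstration
- Language-conditioned: evaluates the whole system, with the initial frame and a language instruction as input
- Ablation
  1. Heuristic: trajectory generation by pose estimation
  2. Grid Flow: points are sampled uniformly over the entire image
  3. No alignment: Temporal Alignment is not used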
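
Slide 11: Quantitative results @ Real-World
- Advantage: outperforms the other methods on all tasks
- Demonstration-conditioned: the object flow is extracted from a single human demonstration
- Language-conditioned: evaluates the whole system, with the initial frame and a language instruction as input
- Ablation
  1. Heuristic: trajectory generation by pose estimation
  2. Grid Flow: points are sampled uniformly over the entire image
  3. No alignment: Temporal Alignment is not used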

Slide 12: Qualitative results
- Advantage: the object flow appropriately represents the object's trajectory
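
Slide 13: Qualitative results: failure cases in the ablation study
Observed behavior:
- The drawer is pushed back
- The trajectory is correct, but the cup is rotated
- The flow and the actual trajectory are misaligned in time
Failure modes:
- Drawback: inaccurate motion
- Drawback: motion in the wrong direction
- Drawback: jitter occurs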
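
Slide 14: Summary
- Background
  - Collecting data on a real robot is costly. We want to use data that is easy to collect for robot learning.
  - Candidates: human videos and simulation data
- Proposed method: Im2Flow2Act
  - A trajectory generation framework that uses object flow as the intermediary
  - A motion representation that does not depend on embodiment or environment
- Results
  - Object manipulation is possible without using any real-robot data
  - Outperforms the baselines in both simulation and real-world experiments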