[Journal club] Flow as the Cross-Domain Manipulation Interface
Semantic Machine Intelligence Lab., Keio Univ.
April 09, 2026
Transcript
1. Flow as the Cross-Domain Manipulation Interface
Mengda Xu (1,2,3), Zhenjia Xu (1,2), Yinghao Xu (1), Cheng Chi (1,2), Gordon Wetzstein (1), Manuela Veloso (3,4), Shuran Song (1,2)
(1) Stanford University, (2) Columbia University, (3) J.P. Morgan AI Research, (4) Carnegie Mellon University
Journal club, 2026. Komei Sugiura Laboratory [presenter name illegible]
Reference: Xu, M., Xu, Z., Xu, Y., Chi, C., Wetzstein, G., Veloso, M., Song, S. "Flow as the Cross-Domain Manipulation Interface". In 8th Conference on Robot Learning (CoRL), 2024.
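2. Overview
▪ Background
  ► Collecting data on real robots is costly
  ► Goal: use data that is easy to collect for robot learning (human videos, simulation data)
▪ Proposed method: Im2Flow2Act
  ► A trajectory-generation framework that uses object flow as the intermediate interface
  ► A motion representation that does not depend on embodiment or environment
▪ Results
  ► Manipulates objects without using any real-robot data
  ► Outperforms the baselines in both simulation and real-robot experiments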
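3. Background: the need for a motion representation independent of embodiment and environment
▪ Collecting real-robot data is costly
▪ Building a simulator environment that matches the real setup is costly
▪ Goal: use data that is cheap to collect
  ▼ Human videos
    ► Limitation: human-robot embodiment gap
  ▼ Trajectory data from a single simulator environment
    ► Limitation: sim-real domain gap (background, object texture, etc.)
► A motion representation independent of embodiment and environment is needed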
4. Related work: robot learning from cross-domain data
Method                     Characteristics
VRB [Bahl+, CVPR23]        Learns object grasp points and trajectories from human videos
PEAC [Ying+, NeurIPS24]    Learns latent actions from cross-embodiment data
ATM [Wen+, RSS24]          Learns hand-centric flow from human videos
► Limitation: applying these methods to a real robot requires collecting data on the target embodiment
[Figures: VRB [Bahl+, CVPR23], ATM [Wen+, RSS24]]
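5. Proposed method: Im2Flow2Act
A trajectory-generation framework that uses object flow to exploit cross-embodiment data.
▪ Object flow: the object's trajectory over the whole task
  ► Can represent object pose and deformation
  ► Embodiment-agnostic
  ► Robust to background and texture
▪ ① Im2Flow: initial image, language instruction → object flow
  ► Trained on human videos collected for each task
▪ ② Flow2Act: object flow, current image → trajectory
  ► Trained on trajectory data collected in a simulator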
6. Preliminaries: LDM (Latent Diffusion Model) and AnimateDiff
▪ LDM [Rombach+, CVPR22]: generates high-quality images quickly by running generation in a low-dimensional latent space
  ▼ Cross-attention ► flexible conditioning inputs (text, bbox, etc.)
  ▼ Reduced computation ► efficient training, fast inference
  ▼ Used in Stable Diffusion
▪ AnimateDiff [Guo+, ICLR24]: generates video by inserting a motion module into a pretrained T2I diffusion model (SD)
  ▼ Temporal Transformer ► improves temporal consistency
  ▼ The T2I model is frozen during training ► low-cost training
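7. Im2Flow: Flow Generation Network
Initial image + language instruction → object flow
(a) Initial keypoints
  ① Obtain a bounding box with Grounding DINO
  ② Sample keypoints uniformly inside the bbox: $\mathcal{F}_0 \in \mathbb{R}^{3 \times H \times W}$, with channels $(u, v, \mathrm{visibility})$
     $u, v$: coordinates in the image
     $\mathrm{visibility}$: whether the point on the object is visible
  Flow over the task: $\mathcal{F}_{1:T} \in \mathbb{R}^{3 \times T \times H \times W}$
(b) Flow generation with AnimateDiff (SD-based)
  ① Encode the flow into the SD latent space:
     $x_0^{0:T} = \{ E(\mathcal{F}_i) \mid i \in [0, T] \}$
  ② Train the motion module
     Diffusion process: $x_t^{1:T} = \sqrt{\bar{\alpha}_t}\, x_0^{1:T} + \sqrt{1 - \bar{\alpha}_t}\, \epsilon^{1:T}$
     Loss: $\mathcal{L} = \mathbb{E}_{E(\mathcal{F}_{1:T}),\, x_0^0,\, c,\, y,\, \epsilon^{1:T} \sim \mathcal{N}(0, I),\, t} \left[ \left\| \epsilon^{1:T} - \epsilon_\theta\left( x_t^{1:T}, t, x_0^0, \mathrm{CLIP}_{\mathrm{img}}(c), \mathrm{CLIP}_{\mathrm{text}}(y) \right) \right\|_2^2 \right]$
  ③ Fine-tune $D$ to output the flow
Notation
  $\mathcal{F}_{0:T}$: ground-truth flow
  $c$: initial image
  $y$: language instruction
  $t$: diffusion timestep
  $E$: SD encoder
  $D$: SD decoder
  $\bar{\alpha}_t$: noise schedule (pre-defined)
  $\mathrm{CLIP}_{\mathrm{img}}$ / $\mathrm{CLIP}_{\mathrm{text}}$: CLIP encoders (image / text)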
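The forward diffusion step and the noise-prediction objective used to train the motion module can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the linear beta schedule, the latent shape `(4, 4, 4)`, and the names `make_alpha_bar`, `q_sample`, `noise_prediction_loss`, and `zero_model` are all hypothetical, and the CLIP image and text conditioning is left out for brevity.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed here)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, eps):
    """Forward diffusion: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def noise_prediction_loss(eps_model, x0_frames, first_frame, t, alpha_bar, rng):
    """Mean squared error between the sampled noise and the model's prediction.

    The mean differs from the squared L2 norm only by a constant factor.
    """
    eps = rng.standard_normal(x0_frames.shape)
    x_t = q_sample(x0_frames, t, alpha_bar, eps)
    eps_hat = eps_model(x_t, t, first_frame)
    return float(np.mean((eps - eps_hat) ** 2))

# Toy usage: T = 32 flow latents of hypothetical shape (4, 4, 4).
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
latents = rng.standard_normal((33, 4, 4, 4))      # stands in for x_0^{0:T}
first_frame, x0_frames = latents[0], latents[1:]  # x_0^0 and x_0^{1:T}

def zero_model(x_t, t, first_frame):
    """Placeholder denoiser; a real one would be a trained network."""
    return np.zeros_like(x_t)

loss = noise_prediction_loss(zero_model, x0_frames, first_frame, 500, alpha_bar, rng)
print(f"loss with an untrained placeholder: {loss:.3f}")
```

Only frames 1 to T are noised; the first-frame latent is passed to the model unchanged as conditioning, which mirrors how the loss above treats $x_0^0$.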
8. Flow2Act: Flow-Conditioned Policy
Object flow, current image → trajectory
(b) Online point tracking
  ・Keypoints are tracked with TAPIR [Doersch+, ICCV23]
(c) Policy
  1) State encoder $\phi$: $s_t = \phi(f_t, x_0)$
     ・Generates a state representation (of the target's position and pose)
     ・Encodes the coordinates of each point and summarizes them with a CLS token
  2) Temporal alignment $\psi$: $z_t = \psi(\mathcal{F}_{0:T}, s_t, \rho_t)$
     ・Predicts a latent representation of the flow from time $t$ onward
     ・$L_2$ loss: $\| \hat{z}_t - z_t \|^2$, where $\hat{z}_t = \xi(f_{t:T})$
  3) Diffusion action head: $p(a_t \mid z_t, s_t, \rho_t)$
     ・Generates the trajectory sequence with a diffusion model
Notation
  $f_t$: image coordinates of the $N$ keypoints at time $t$, $\{(u_t^i, v_t^i)\}_{i=1}^{N}$
  $x_0$: 3D coordinates of the $N$ keypoints in the initial frame
  $\mathcal{F}_{0:T}$: object flow over the whole task
  $\rho_t$: robot proprioception
  $\xi$: encodes the flow into a latent representation
  $a_t$: trajectory sequence starting at time $t$, $\{a_t, \dots, a_{t+L}\}$
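To make the state encoder step concrete (encode each keypoint, then summarize with a CLS token), here is a minimal NumPy sketch. It assumes a single self-attention layer with random weights and no positional encoding; the paper's encoder is presumably a trained, deeper Transformer, and the names `init_params` and `encode_state` as well as the model width of 16 are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_model, rng):
    """Random weights standing in for trained parameters."""
    scale = 1.0 / np.sqrt(d_model)
    return {
        "W_in": rng.standard_normal((5, d_model)) * scale,  # (u, v, x, y, z) -> d_model
        "cls": rng.standard_normal(d_model) * scale,        # learnable CLS token
        "W_q": rng.standard_normal((d_model, d_model)) * scale,
        "W_k": rng.standard_normal((d_model, d_model)) * scale,
        "W_v": rng.standard_normal((d_model, d_model)) * scale,
    }

def encode_state(f_t, x_0, params):
    """Summarize N keypoints into one state vector via a CLS token.

    f_t: (N, 2) image coordinates at time t; x_0: (N, 3) initial 3D coordinates.
    """
    tokens = np.concatenate([f_t, x_0], axis=1) @ params["W_in"]    # (N, d)
    seq = np.concatenate([params["cls"][None, :], tokens], axis=0)  # (N + 1, d)
    q, k, v = seq @ params["W_q"], seq @ params["W_k"], seq @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))                  # (N + 1, N + 1)
    return (attn @ v)[0]                                            # CLS row plays the role of s_t

rng = np.random.default_rng(0)
params = init_params(16, rng)
N = 32 * 32                                   # one keypoint per cell of a 32 x 32 grid
f_t = rng.uniform(0.0, 1.0, size=(N, 2))      # toy normalized image coordinates
x_0 = rng.uniform(-1.0, 1.0, size=(N, 3))     # toy initial 3D coordinates
s_t = encode_state(f_t, x_0, params)
print(s_t.shape)  # (16,)
```

Because the sketch has no positional encoding, the CLS output does not depend on the order in which keypoints are listed; whether the original encoder shares that property is not stated in the slides.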
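9. Experimental setup
▼ Evaluated on 4 tasks
  ▼ Pick-and-place
  ▼ Pouring
  ▼ Open drawer
  ▼ Folding cloth
▼ Training settings
  ▼ Object flow: $\mathcal{F}_{1:T} \in \mathbb{R}^{3 \times T \times H \times W}$
    ▼ H = W = 32
    ▼ T = 32
  ▼ Training time: not reported
  ▼ Robot: UR5e
▼ Training data
  ▼ Human videos
    ▼ Human demonstrations of each task
    ▼ Number of samples: not reported
  ▼ Simulator: MuJoCo
    ▼ Robot: UR5e
    ▼ Number of samples: 4800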
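The flow dimensions listed above (H = W = 32, T = 32, a 3 × T × H × W tensor) can be made concrete with a small sketch that lays keypoints on a grid inside a bounding box and stacks them over time. Several things here are assumptions: "uniform sampling" is read as a regular grid, every keypoint is marked visible at the start, and the bounding box values and the function names `initial_flow_from_bbox` and `static_flow` are hypothetical. In the actual pipeline the box comes from Grounding DINO and the later frames come from the generation network.

```python
import numpy as np

def initial_flow_from_bbox(bbox, H=32, W=32):
    """Build F_0 with channels (u, v, visibility) on an H x W grid inside bbox.

    bbox is (u_min, v_min, u_max, v_max) in pixels.
    """
    u_min, v_min, u_max, v_max = bbox
    u, v = np.meshgrid(np.linspace(u_min, u_max, W), np.linspace(v_min, v_max, H))
    visibility = np.ones((H, W))  # assumption: every keypoint starts visible
    return np.stack([u, v, visibility], axis=0)  # shape (3, H, W)

def static_flow(F0, T=32):
    """Placeholder for F_{1:T}: keypoints that never move, shape (3, T, H, W)."""
    return np.repeat(F0[:, None, :, :], T, axis=1)

F0 = initial_flow_from_bbox((100.0, 50.0, 163.0, 113.0))
F = static_flow(F0)
print(F0.shape, F.shape)  # (3, 32, 32) (3, 32, 32, 32)
```

Storing the flow as an image-like tensor with three channels is what lets an image and video diffusion backbone treat it like ordinary frames.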
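10. Quantitative results (simulation)
► Outperforms the other methods on all tasks
▪ Demonstration-conditioned: object flow is extracted from a single human demonstration
▪ Language-conditioned: evaluates the full system, taking the initial frame and a language instruction as input
▪ Ablation
  1) Heuristic: the trajectory is generated via pose estimation
  2) Grid Flow: keypoints are sampled uniformly over the whole image
  3) No alignment: Temporal Alignment is not used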
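11. Quantitative results (real world)
► Outperforms the other methods on all tasks
▪ Demonstration-conditioned: object flow is extracted from a single human demonstration
▪ Language-conditioned: evaluates the full system, taking the initial frame and a language instruction as input
▪ Ablation
  1) Heuristic: the trajectory is generated via pose estimation
  2) Grid Flow: keypoints are sampled uniformly over the whole image
  3) No alignment: Temporal Alignment is not used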
12. Qualitative results
► Object flow represents the object's trajectory appropriately
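13. Qualitative results: failure cases in the ablation study
▪ Observed failures
  ► The robot first pushes the drawer back in
  ► The trajectory is correct, but the cup is rotated
  ► The flow and the executed trajectory are misaligned in time
▪ Consequences
  ► Inaccurate motion
  ► Motion in the wrong direction
  ► Jitter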
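14. Summary
▪ Background
  ► Collecting data on real robots is costly
  ► Goal: use data that is easy to collect for robot learning (human videos, simulation data)
▪ Proposed method: Im2Flow2Act
  ► A trajectory-generation framework that uses object flow as the intermediate interface
  ► A motion representation that does not depend on embodiment or environment
▪ Results
  ► Manipulates objects without using any real-robot data
  ► Outperforms the baselines in both simulation and real-robot experiments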