

Kei Moriyama
January 08, 2024

[Reading group] Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer


Transcript

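  1. Proposed method 1: Clustering in the CTM Block. Density peaks clustering is used: each token $x_i$ gets a local density $\rho_i$ and a distance indicator $\delta_i$.

    $$\rho_i = \exp\left(-\frac{1}{k} \sum_{x_j \in \mathrm{KNN}(x_i)} \lVert x_i - x_j \rVert_2^2\right)$$

    $$\delta_i = \begin{cases} \min_{j:\,\rho_j > \rho_i} \lVert x_i - x_j \rVert_2 & \text{if } \exists j \text{ s.t. } \rho_j > \rho_i \\ \max_j \lVert x_i - x_j \rVert_2 & \text{otherwise} \end{cases}$$

    $\rho_i$ is large when the $k$ nearest neighbors of $x_i$ are close to it. $\delta_i$ is the distance from $x_i$ to the nearest token $x_j$ whose density $\rho_j$ is larger than $\rho_i$.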
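  2. Proposed method 1: Clustering in the CTM Block. Tokens with a large score $\rho_i \times \delta_i$ are more likely to be cluster centers.

    $$\rho_i = \exp\left(-\frac{1}{k} \sum_{x_j \in \mathrm{KNN}(x_i)} \lVert x_i - x_j \rVert_2^2\right)$$

    $$\delta_i = \begin{cases} \min_{j:\,\rho_j > \rho_i} \lVert x_i - x_j \rVert_2 & \text{if } \exists j \text{ s.t. } \rho_j > \rho_i \\ \max_j \lVert x_i - x_j \rVert_2 & \text{otherwise} \end{cases}$$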
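  3. Proposed method 1: Feature merging in the CTM Block.

    $$y_i = \frac{\sum_{j \in C_i} e^{p_j} x_j}{\sum_{j \in C_i} e^{p_j}}$$

    Here $p_j$ is the importance score of token $x_j$, $C_i$ is the set of tokens in the $i$-th cluster, and $y_i$ is the merged token feature. The merged tokens are used as the queries of an attention layer.

    Reference: Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, and Cho-Jui Hsieh. DynamicViT: Efficient vision transformers with dynamic token sparsification. Adv. Neural Inform. Process. Syst., 2021.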
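
The three slides above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cluster_tokens` and `merge_tokens` are hypothetical, a token is excluded from its own KNN set, non-center tokens are assigned to the nearest center (the slides do not state the assignment rule), and the importance scores $p_j$ are passed in as plain numbers rather than predicted by a network.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def cluster_tokens(tokens, num_clusters, k):
    """Pick cluster centers by the score rho_i * delta_i, then assign tokens."""
    n = len(tokens)
    d2 = [[sq_dist(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]

    # Local density rho_i from the k nearest neighbours.
    # Assumption: the token itself is not counted among its neighbours.
    rho = []
    for i in range(n):
        nearest = sorted(d2[i][j] for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(nearest) / k))

    # Distance indicator delta_i: distance to the nearest token with higher
    # density, or the largest distance if no such token exists.
    delta = []
    for i in range(n):
        higher = [d2[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(math.sqrt(min(higher) if higher else max(d2[i])))

    # Tokens with the largest rho_i * delta_i become cluster centers.
    scores = [r * d for r, d in zip(rho, delta)]
    centers = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_clusters]

    # Assumption: every other token joins the cluster of its nearest center.
    assignment = [
        min(range(len(centers)), key=lambda c: d2[i][centers[c]]) for i in range(n)
    ]
    for c, idx in enumerate(centers):
        assignment[idx] = c  # a center always belongs to its own cluster
    return centers, assignment


def merge_tokens(tokens, importance, assignment, num_clusters):
    """Merge each cluster: y_i = sum_j exp(p_j) x_j / sum_j exp(p_j)."""
    dim = len(tokens[0])
    merged = []
    for c in range(num_clusters):
        members = [j for j, a in enumerate(assignment) if a == c]
        weights = [math.exp(importance[j]) for j in members]
        total = sum(weights)
        merged.append(
            [sum(w * tokens[j][d] for w, j in zip(weights, members)) / total
             for d in range(dim)]
        )
    return merged


# Toy input: two well-separated groups of 2-D "tokens".
tokens = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
importance = [0.0] * len(tokens)  # hypothetical scores; equal weights here
centers, assignment = cluster_tokens(tokens, num_clusters=2, k=2)
merged = merge_tokens(tokens, importance, assignment, num_clusters=2)
print(centers, assignment, merged)
```

With equal importance scores the merge reduces to a per-cluster mean. Raising one token's $p_j$ pulls the merged feature toward that token.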