et al., 2016; Calixto et al., 2016; Libovický and Helcl, 2017; Helcl et al., 2018]
• Cross-modal interactions with spatially-unaware global features [Calixto and Liu, 2017; Ma et al., 2017; Caglayan et al., 2017a; Madhyastha et al., 2017]
• The integration of regional features from object detection networks [Huang et al., 2016; Grönroos et al., 2018]
5/16/19