Slide 22
Concept
Multi-modal learning trains a model on multiple data sources (text, image, voice, etc.).
It is expected to achieve higher accuracy than a model that learns from a single source.
Text + Voice + Image -> Multi-modal learning
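One common way to combine sources can be sketched as follows. This is a minimal illustration of early fusion (concatenating per-modality feature vectors before a single scoring step), not the method used in the use cases on this slide. The names `fuse_features`, `linear_score`, and `MODALITY_ORDER`, and all feature values and weights, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical fixed ordering so the fused vector always has the same layout.
MODALITY_ORDER = ("text", "image", "voice")


def fuse_features(
    features_by_modality: Dict[str, Sequence[float]],
    order: Sequence[str] = MODALITY_ORDER,
) -> List[float]:
    """Early fusion: concatenate per-modality feature vectors in a fixed order."""
    fused: List[float] = []
    for name in order:
        fused.extend(features_by_modality[name])
    return fused


def linear_score(
    features: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Score the fused vector with a simple linear model (weighted sum plus bias)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy feature vectors (made up); in practice each would come from a
# modality-specific encoder such as a text or image embedding model.
sample = {
    "text": [0.2, 0.7],
    "image": [0.9, 0.1, 0.4],
    "voice": [0.5],
}

fused = fuse_features(sample)
weights = [1.0] * len(fused)  # placeholder weights, not learned
score = linear_score(fused, weights)
print(len(fused), score)
```

A single-source model would score only one of these vectors; the fused vector lets one model use evidence from all sources at once.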
Use Cases
EC: Increase the accuracy of fraudulent item detection by using a multimodal model that combines image, product name, description, and price.
Robotics: Develop AVSR (Audio-Visual Speech Recognition), which is highly robust to noise because it combines sound and video signals.
*Waseda University, Tetsuya Ogata (https://pdf.gakkai-web.net/gakkai/ieice/icd/html/2017/view/I_01_02.pdf)
*Mercari, Engineering Blog (https://tech.mercari.com/entry/2018/04/24/164919)
[Figure: single source vs. multi-modal source. Honda Research Institute: Voice + Image (Video). Mercari: Image + Text (Product name, Description, etc.)]