• Object detector: Faster R-CNN pre-trained on Visual Genome
• Number of objects: m = 36
• N_{R, L, X}: {5, 9, 5}
• Optimizer: Adam (learning rate: 1e-4)
• Batch size: 256
• Epochs: 10 (QA), 20 (otherwise)
• Fine-tuning
  ◦ Learning rate: 1e-5, 5e-5
  ◦ Batch size: 32
  ◦ Epochs: 4
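
As a rough sketch, the settings listed above could be gathered into one configuration object. The class and field names here (`PretrainConfig`, `FineTuneConfig`, `n_r`, `n_l`, `n_x`, and so on) are hypothetical and not taken from any released code; only the values come from the list.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass(frozen=True)
class FineTuneConfig:
    # Hypothetical container for the fine-tuning settings.
    # Two learning rates are listed; which one applies where is not stated.
    learning_rates: Tuple[float, ...] = (1e-5, 5e-5)
    batch_size: int = 32
    epochs: int = 4


@dataclass(frozen=True)
class PretrainConfig:
    # Hypothetical container for the pre-training settings.
    object_detector: str = "Faster R-CNN (pre-trained on Visual Genome)"
    num_objects: int = 36        # m
    n_r: int = 5                 # N_R
    n_l: int = 9                 # N_L
    n_x: int = 5                 # N_X
    optimizer: str = "Adam"
    learning_rate: float = 1e-4
    batch_size: int = 256
    epochs_qa: int = 10          # epochs for the QA setting
    epochs_other: int = 20       # epochs otherwise
    fine_tune: FineTuneConfig = field(default_factory=FineTuneConfig)


cfg = PretrainConfig()
print(cfg)
```

Freezing the dataclasses keeps the values read-only once constructed, so a run cannot silently change them.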