Our pipeline consists of three components
• Synthesizer Network: generates a composite image
• Target Network: classifies/detects the foreground object in the composite image
• Discriminator: identifies whether the composite image is real or not
is decoupled from training the target classifier
→ synthetic data has little value in improving the performance of the target network
• Our approach
• Synthesizer and target networks are trained in an adversarial manner
→ the synthesizer produces meaningful training samples
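Output: transformation function A(·)
• A is restricted to the set of 2D affine transformations in this paper
• Composite synthetic image: composite = background ⊕ A(foreground)
• ⊕: alpha blending
• A spatial transformer network creates the composite from the transformation, the foreground, and the background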
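The compositing step above (warp the foreground with a 2D affine transformation, then alpha-blend it onto the background) can be sketched in a few lines. This is a minimal, dependency-free illustration, not the spatial transformer used in the slides: the function names, the nearest-neighbour sampling, and the convention that `theta` maps output pixel coordinates to source coordinates are all assumptions made for the example.

```python
def affine_warp(image, theta, fill=0.0):
    """Warp a 2D image (list of rows) with a 2x3 affine matrix `theta`.

    Assumed convention: theta maps OUTPUT pixel (x, y) to SOURCE coordinates,
        src_x = a*x + b*y + tx,   src_y = c*x + d*y + ty
    Sampling is nearest-neighbour; pixels that map outside the source get `fill`.
    """
    h, w = len(image), len(image[0])
    (a, b, tx), (c, d, ty) = theta
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = round(a * x + b * y + tx)
            sy = round(c * x + d * y + ty)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def composite(background, foreground, alpha, theta):
    """Alpha-blend the affinely warped foreground onto the background.

    `alpha` is the foreground mask in [0, 1]; it is warped with the same
    transform so that it stays aligned with the foreground pixels.
    """
    fg = affine_warp(foreground, theta)
    a = affine_warp(alpha, theta)
    return [
        [a[y][x] * fg[y][x] + (1.0 - a[y][x]) * background[y][x]
         for x in range(len(background[0]))]
        for y in range(len(background))
    ]
```

With `theta = ((1, 0, -1), (0, 1, 0))` the foreground is shifted one pixel to the right before blending; wherever the warped mask is 0 the background pixel passes through unchanged.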
extraction on the foreground and background
• Foreground/Background branch
• Identical mid-level feature extraction on the foreground and background
• FC Regression Network:
• Concatenates the mid-level features of the foreground and background
• Outputs the affine transformation parameters
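The regression head described above (concatenate the two mid-level feature vectors, then regress the affine parameters) can be illustrated as a single linear layer. This is only a sketch under assumptions: the class name `AffineRegressor`, the single-layer design, the six-parameter 2x3 output layout, and the zero-weight, identity-bias initialisation (a common choice for spatial transformers) are not taken from the slides.

```python
IDENTITY = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # flattened 2x3 identity transform


class AffineRegressor:
    """Hypothetical one-layer FC head: concatenated features -> 6 affine params."""

    def __init__(self, fg_dim, bg_dim, weights=None):
        in_dim = fg_dim + bg_dim
        # Assumption: start from zero weights so the initial output is the
        # identity transform, regardless of the input features.
        self.weights = weights or [[0.0] * in_dim for _ in range(6)]
        self.bias = list(IDENTITY)

    def __call__(self, fg_feat, bg_feat):
        x = list(fg_feat) + list(bg_feat)  # concatenate mid-level features
        if len(x) != len(self.weights[0]):
            raise ValueError("feature sizes do not match the layer input size")
        params = [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        # Return as a 2x3 matrix: ((a, b, tx), (c, d, ty))
        return (tuple(params[:3]), tuple(params[3:]))
```

Starting at the identity means the first composites place the foreground unchanged; training then moves the parameters away from that baseline.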
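network to learn more efficiently
• The synthesizer needs to produce realistic composite images
• Binary classification
• Input: (composite images, real images)
• Loss function: $\mathbb{E}[\log D(\text{real image})] + \mathbb{E}[\log(1 - D(\text{composite image}))]$, where $D(\cdot)$ is the discriminator's estimated probability that its input is real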
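The discriminator objective above has the standard binary real-versus-composite form, and it can be estimated from a batch of discriminator outputs. The sketch below assumes `d_real` and `d_fake` are lists of probabilities produced by the discriminator on real and composite images respectively; the function name and the `eps` clamp (to avoid `log(0)`) are illustrative choices, not part of the slides.

```python
import math


def discriminator_objective(d_real, d_fake, eps=1e-12):
    """Batch estimate of E[log D(real)] + E[log(1 - D(composite))].

    d_real: discriminator outputs in [0, 1] on real images
    d_fake: discriminator outputs in [0, 1] on composite images
    The discriminator tries to make this value as large as possible.
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term
```

A discriminator that cannot tell the two apart (all outputs 0.5) scores `2 * log(0.5)`; one that separates them well scores closer to 0.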
parameters of S while keeping the parameters of T, D fixed
• Update the parameters of T, D while keeping the parameters of S fixed
S: Synthesizer network, T: Target network, D: Discriminator
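The alternating schedule above can be sketched as two gradient steps per round, each moving only one group of parameters. This is a toy illustration: each network is reduced to a single scalar, `S`, `T` and `D` are used here as labels for the synthesizer, target network and discriminator, and the two gradient functions are hypothetical callables that return a descent direction for the phase's own objective.

```python
def sgd_step(params, grads, lr, trainable):
    """Return new params in which only the groups named in `trainable` move."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }


def alternating_round(params, grad_fn_s, grad_fn_td, lr=0.1):
    """One round of alternating updates.

    grad_fn_s / grad_fn_td: hypothetical callables mapping the current params
    to a dict of gradients for the synthesizer phase and the target /
    discriminator phase respectively.
    """
    # Phase 1: update S while T and D stay fixed.
    params = sgd_step(params, grad_fn_s(params), lr, trainable={"S"})
    # Phase 2: update T and D while S stays fixed.
    params = sgd_step(params, grad_fn_td(params), lr, trainable={"T", "D"})
    return params
```

Freezing is done by skipping the update rather than by zeroing gradients, so a frozen group is returned bit-for-bit unchanged.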
affine transform
• Red line: model trained on MNIST, then fine-tuned on AffNIST data
• Green line: model trained on MNIST, then fine-tuned on synthetic data
• Synthetic data
• Foreground: MNIST digits
• Background: black background