Fig. 6. Fully convolutional networks improve performance on PASCAL. The left column shows the output of our most accurate net, FCN-8s.

TABLE 8
The role of foreground, background, and shape cues. All scores are the mean intersection over union metric excluding background. The architecture and optimization are fixed to those of FCN-32s (Reference) and only input masking differs.

                        train           test
                      FG     BG      FG     BG     mean IU
Reference            keep   keep    keep   keep     84.8
Reference-FG         keep   keep    keep   mask     81.0
Reference-BG         keep   keep    mask   keep     19.8
FG-only              keep   mask    keep   mask     76.1
BG-only              mask   keep    mask   keep     37.8
Shape                mask   mask    mask   mask     29.1

Masking the foreground at inference time is catastrophic. However, masking the foreground during learning yields a network capable of recognizing object segments without observing a single pixel of the labeled class. Masking the background has little effect overall, but does lead to class confusion in certain cases. When the background is masked during both learning and inference, the network unsurprisingly achieves nearly perfect background accuracy; however, certain classes become more confused. All in all, this suggests that FCNs do incorporate context even though decisions are driven by foreground pixels. To separate the contribution of shape, we learn a net restricted to the simple input of foreground/background masks (a code sketch of this masking protocol follows these excerpts). The accuracy in this shape-only condition is lower, at 29.1 mean IU versus the 84.8 of the unmasked reference (TABLE 8).

Shelhamer et al. 2016

Fig. 4. Result on the ISBI cell tracking challenge. (a) Part of an input image of the "PhC-U373" data set. (b) Segmentation result (cyan mask) with manual ground truth (yellow border). (c) Input image of the "DIC-HeLa" data set. (d) Segmentation result (random colored masks) with manual ground truth (yellow border).

Table 2. Segmentation results (IOU) on the ISBI cell tracking challenge 2015.

Name              PhC-U373   DIC-HeLa
IMCB-SG (2014)     0.2669     0.2935
KTH-SE (2014)      0.7953     0.4607
HOUS-US (2014)     0.5323     -

Ronneberger et al. 2015

TABLE VI: Our best architecture compared with state-of-the-art methods; (*) single-class algorithms trained and evaluated for different organs separately; "-" means the score was not reported. D = Dice coefficient, J = Jaccard index.

Body Part               Lungs          Clavicles       Heart
Evaluation Metric      D      J       D      J       D      J
Human Observer [5]     -      0.946   -      0.896   -      0.878
ASM Tuned [5] (*)      -      0.927   -      0.734   -      0.814
Hybrid Voting [5] (*)  -      0.949   -      0.736   -      0.860
Ibragimov et al. [9]   -      0.953   -      -       -      -
Seghers et al. [11]    -      0.951   -      -       -      -
InvertedNet with ELU   0.974  0.950   0.929  0.868   0.937  0.882

Fig. 7: Segmentation results and corresponding Jaccard scores on some images for U-Net (top row) and the proposed InvertedNet with ELUs (bottom row). The contour of the ground truth is shown in green, the segmentation result of the algorithm in red, and the overlap of the two contours in yellow.

Novikov et al. 2018
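
To make the input-masking protocol of TABLE 8 concrete, here is a minimal NumPy sketch. The mask_input helper and its argument names are hypothetical illustrations, not code from Shelhamer et al.; it simply zeroes whichever cue (foreground or background) a given row of the table hides.

```python
import numpy as np

def mask_input(image, labels, keep_fg=True, keep_bg=True, fill=0.0):
    """Zero out foreground and/or background pixels of an image.

    image  -- H x W x C float array
    labels -- H x W integer label map; 0 = background, >0 = object class
    """
    out = image.copy()
    fg = labels > 0              # boolean foreground mask
    if not keep_fg:
        out[fg] = fill           # hide every labeled-object pixel
    if not keep_bg:
        out[~fg] = fill          # hide every background pixel
    return out

# The rows of TABLE 8 differ only in which variant is fed to the net at
# train vs. test time; e.g. the "FG-only" row trains and tests on
#   mask_input(img, lbl, keep_fg=True, keep_bg=False)
# The "Shape" row masks both cues: the net instead receives only the
# bare binary silhouette, (labels > 0).astype(np.float32).
```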
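
All three excerpts score segmentations by intersection over union (IOU, equivalently the Jaccard index J in TABLE VI). For reference, a short sketch of per-class IoU and the background-excluding mean used in TABLE 8; the function name and signature are my own, not from any of the cited papers.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_background=True):
    """Mean per-class intersection over union (Jaccard index).

    pred, gt -- integer label maps of equal shape; class 0 = background.
    Classes absent from both prediction and ground truth are skipped.
    """
    start = 1 if ignore_background else 0
    ious = []
    for c in range(start, num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")
```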