• Image segmentation with artificial intelligence
• Estimate which object each pixel belongs to
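The per-pixel estimation described above can be illustrated with a minimal sketch. It assumes a segmentation network has already produced one score map per class; the function name `label_map_from_scores`, the `(num_classes, H, W)` layout, and the toy scores are hypothetical choices for illustration, not taken from any of the cited papers.

```python
import numpy as np


def label_map_from_scores(scores: np.ndarray) -> np.ndarray:
    """Turn per-class score maps into a per-pixel label map.

    scores: array of shape (num_classes, H, W), one score map per class
            (assumed layout).
    Returns an (H, W) integer array holding the winning class per pixel.
    """
    if scores.ndim != 3:
        raise ValueError("expected scores with shape (num_classes, H, W)")
    # Each pixel is assigned the class with the highest score.
    return scores.argmax(axis=0)


# Toy 2x2 image with three classes (class 0 plays the role of background).
scores = np.array([
    [[0.90, 0.20], [0.10, 0.30]],  # class 0
    [[0.05, 0.70], [0.20, 0.30]],  # class 1
    [[0.05, 0.10], [0.70, 0.40]],  # class 2
])
labels = label_map_from_scores(scores)
print(labels)
```

The output has the same height and width as the input image, which is what distinguishes segmentation from whole-image classification.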
Fig. 6. Fully convolutional networks improve performance on PASCAL.
The left column shows the output of our most accurate net, FCN-8s.
Columns, left to right: FCN-8s, SDS [14], Ground Truth, Image.
The role of foreground, background, and shape cues. All scores are the
mean intersection over union metric excluding background. The
architecture and optimization are fixed to those of FCN-32s (Reference)
and only input masking differs.
                 train          test
               FG     BG      FG     BG      mean IU
Reference      keep   keep    keep   keep    84.8
Reference-FG   keep   keep    keep   mask    81.0
Reference-BG   keep   keep    mask   keep    19.8
FG-only        keep   mask    keep   mask    76.1
BG-only        mask   keep    mask   keep    37.8
Shape          mask   mask    mask   mask    29.1
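The scores in the table are mean intersection over union excluding background. A minimal sketch of that metric for a single prediction/ground-truth pair follows. The function name `mean_iu`, the choice of label 0 as background, and the rule of skipping classes absent from both maps are assumptions made for illustration; benchmark evaluations usually accumulate counts over the whole dataset rather than per image.

```python
import numpy as np


def mean_iu(pred, target, num_classes, background=0):
    """Mean intersection over union over non-background classes.

    pred, target: integer label maps of identical shape.
    Classes that appear in neither map are skipped (assumed convention).
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        if c == background:
            continue  # the metric excludes the background class
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


target = np.array([[0, 1, 1],
                   [0, 2, 2]])
pred = np.array([[0, 1, 0],
                 [0, 2, 2]])
score = mean_iu(pred, target, num_classes=3)
print(score)
```

In this toy case class 1 scores 1/2 and class 2 scores 2/2, so the mean is 0.75.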
Masking the foreground at inference time is catastrophic.
However, masking the foreground during learning yields
a network capable of recognizing object segments without
observing a single pixel of the labeled class. Masking the
background has little effect overall but does lead to class
confusion in certain cases. When the background is masked
during both learning and inference, the network unsurprisingly
achieves nearly perfect background accuracy; however,
certain classes are more confused. All in all, this suggests
that FCNs do incorporate context even though decisions are
driven by foreground pixels.
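The keep/mask conditions discussed above can be sketched as a small input transformation. The excerpt does not say how masked pixels are filled, so the constant `fill` value, the function name `mask_region`, and the use of label 0 for background are all assumptions for illustration only.

```python
import numpy as np


def mask_region(image, labels, region, background=0, fill=0.0):
    """Hide either the foreground or the background of an image.

    image:  (H, W, C) array.
    labels: (H, W) ground-truth label map; `background` marks background.
    region: "fg" hides every labeled object pixel, "bg" hides the rest.
    fill:   constant written into hidden pixels (assumed; not specified
            in the excerpt).
    """
    labels = np.asarray(labels)
    if region == "fg":
        hide = labels != background
    elif region == "bg":
        hide = labels == background
    else:
        raise ValueError("region must be 'fg' or 'bg'")
    out = np.asarray(image).astype(np.float32, copy=True)
    out[hide] = fill  # the boolean (H, W) mask applies across all channels
    return out


image = np.ones((2, 2, 3))
labels = np.array([[0, 1],
                   [0, 0]])
fg_masked = mask_region(image, labels, "fg")  # object pixel hidden
bg_masked = mask_region(image, labels, "bg")  # only the object pixel kept
```

Under these assumptions, the FG-only row of the table would correspond to applying `mask_region(..., "bg")` to both training and test inputs, and BG-only to applying `mask_region(..., "fg")` to both.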
To separate the contribution of shape, we learn a net
restricted to the simple input of foreground/background
masks. The accuracy in this shape-only condition is lower
Shelhamer et al. 2016
Fig. 4. Result on the ISBI cell tracking challenge. (a) Part of an input image of the
“PhC-U373” data set. (b) Segmentation result (cyan mask) with manual ground truth
(yellow border). (c) Input image of the “DIC-HeLa” data set. (d) Segmentation result
(random colored masks) with manual ground truth (yellow border).
Table 2. Segmentation results (IOU) on the ISBI cell tracking challenge 2015.
Name              PhC-U373   DIC-HeLa
IMCB-SG (2014)    0.2669     0.2935
KTH-SE (2014)     0.7953     0.4607
HOUS-US (2014)    0.5323     -
Ronneberger et al. 2015
Body Part                  Lungs            Clavicles        Heart
Evaluation Metric          D       J        D       J        D       J
Human Observer [5]         -       0.946    -       0.896    -       0.878
ASM Tuned [5] (*)          -       0.927    -       0.734    -       0.814
Hybrid Voting [5] (*)      -       0.949    -       0.736    -       0.860
Ibragimov et al. [9]       -       0.953    -       -        -       -
Seghers et al. [11]        -       0.951    -       -        -       -
InvertedNet with ELU       0.974   0.950    0.929   0.868    0.937   0.882
TABLE VI: Our best architecture compared with state-of-the-art methods. (*) Single-class algorithms trained and evaluated for
different organs separately; "-" means the score was not reported.
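The table reports two overlap scores per organ. The sketch below assumes that D denotes the Dice coefficient and J the Jaccard index (the excerpt only names Jaccard explicitly, in the Fig. 7 caption). The function names and the convention of scoring two empty masks as 1.0 are illustrative assumptions.

```python
import numpy as np


def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0, 1.0  # both masks empty: treated as a perfect match (assumed)
    return float(2 * inter / total), float(inter / union)


def dice_from_jaccard(j):
    """For a single pair of masks, Dice and Jaccard satisfy D = 2J / (1 + J)."""
    return 2 * j / (1 + j)


pred = np.array([[1, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 1, 1],
                   [0, 0, 0]])
d, j = dice_and_jaccard(pred, target)
print(d, j, dice_from_jaccard(j))
```

The identity holds exactly for one pair of masks, while scores averaged over many images need not satisfy it. Even so, applying it to the InvertedNet lung Jaccard of 0.950 gives about 0.974, which matches the reported D value to three decimals.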
Fig. 7: Segmentation results and corresponding Jaccard scores on some images for U-Net (top row) and proposed InvertedNet
with ELUs (bottom row). The contour of the ground-truth is shown in green, segmentation result of the algorithm in red and
overlap of two contours in yellow.
Novikov et al. 2018