MVTec Anomaly Detection Dataset (MVTec AD)

- MVTec is an image-processing software company based in Munich, Germany (https://www.mvtec.com/company/research/datasets).
- It has released a large image dataset for anomaly detection, containing normal and anomalous data for objects and textures of different kinds (per Figure 2, five textures and ten object categories).
- The accompanying paper was accepted at CVPR, a top computer-vision conference.

[Figure 2: Example images for all five textures and ten object categories of the MVTec AD dataset. For each category, the top row shows an anomaly-free image.]

2.1.2 Segmentation of Anomalous Regions

For the evaluation of methods that segment anomalies in images, only very few public datasets are currently available. All of them focus on the inspection of textured surfaces and, to the best of our knowledge, there does not yet exist a comprehensive dataset that allows for the segmentation of anomalous regions in natural images.

Carrera et al. [6] provide NanoTWICE, a dataset of 45 gray-scale images that show a nanofibrous material acquired by a scanning electron microscope. Five defect-free images can be used for training. The remaining 40 images contain anomalous regions in the form of specks of dust or flattened areas. Since the dataset only provides a single kind of texture, it is unclear how well algorithms that are evaluated on this dataset generalize to other textures of different domains.

A dataset that is specifically designed for optical inspection of textured surfaces was proposed during a 2007 DAGM workshop by Wieler and Hahn [28]. They provide ten classes of artificially generated gray-scale textures with defects weakly annotated in the form of ellipses. Each class comprises 1000 defect-free texture patches for training and 150 defective patches for testing. However, their annotations are quite coarse, and since the textures were generated by very similar texture models, the variance in appearance between the different textures is quite low. Furthermore, artificially generated datasets can only be seen as an approximation to real-world data.
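To make the dataset structure concrete, here is a minimal sketch of walking an MVTec-AD-style directory tree. It assumes the commonly documented layout (`<root>/<category>/train/good` for defect-free training images, `<root>/<category>/test/<defect_type>` for test images); the root path and file extension are assumptions for illustration.

```python
from pathlib import Path

def list_mvtec_images(root, category):
    """Collect train/test image paths for one MVTec AD category.

    Assumes the released layout: <root>/<category>/train/good/*.png holds
    defect-free training images, and <root>/<category>/test/<type>/ holds
    test images, where <type> is 'good' or a defect name.
    """
    base = Path(root) / category
    train = sorted((base / "train" / "good").glob("*.png"))
    test = {}
    for defect_dir in sorted((base / "test").iterdir()):
        if defect_dir.is_dir():
            test[defect_dir.name] = sorted(defect_dir.glob("*.png"))
    return train, test

# toy usage
train_paths, test_paths = list_mvtec_images("mvtec_ad", "carpet")
print(len(train_paths), {name: len(v) for name, v in test_paths.items()})
```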
MVTec Anomaly Detection Dataset (MVTec AD): results

Table 2: Results of the evaluated methods when applied to the classification of anomalous images. For each dataset category, each cell reports the ratio of correctly classified anomaly-free images / the ratio of correctly classified anomalous images ("-": method not applied to this category).

| Category | AE (SSIM) | AE (L2) | AnoGAN | CNN Feature Dictionary | Texture Inspection | Variation Model |
|---|---|---|---|---|---|---|
| Textures | | | | | | |
| Carpet | 0.43 / 0.90 | 0.57 / 0.42 | 0.82 / 0.16 | 0.89 / 0.36 | 0.57 / 0.61 | - |
| Grid | 0.38 / 1.00 | 0.57 / 0.98 | 0.90 / 0.12 | 0.57 / 0.33 | 1.00 / 0.05 | - |
| Leather | 0.00 / 0.92 | 0.06 / 0.82 | 0.91 / 0.12 | 0.63 / 0.71 | 0.00 / 0.99 | - |
| Tile | 1.00 / 0.04 | 1.00 / 0.54 | 0.97 / 0.05 | 0.97 / 0.44 | 1.00 / 0.43 | - |
| Wood | 0.84 / 0.82 | 1.00 / 0.47 | 0.89 / 0.47 | 0.79 / 0.88 | 0.42 / 1.00 | - |
| Objects | | | | | | |
| Bottle | 0.85 / 0.90 | 0.70 / 0.89 | 0.95 / 0.43 | 1.00 / 0.06 | - | 1.00 / 0.13 |
| Cable | 0.74 / 0.48 | 0.93 / 0.18 | 0.98 / 0.07 | 0.97 / 0.24 | - | - |
| Capsule | 0.78 / 0.43 | 1.00 / 0.24 | 0.96 / 0.20 | 0.78 / 0.03 | - | 1.00 / 0.03 |
| Hazelnut | 1.00 / 0.07 | 0.93 / 0.84 | 0.83 / 0.16 | 0.90 / 0.07 | - | - |
| Metal nut | 1.00 / 0.08 | 0.68 / 0.77 | 0.86 / 0.13 | 0.55 / 0.74 | - | 0.32 / 0.83 |
| Pill | 0.92 / 0.28 | 1.00 / 0.23 | 1.00 / 0.24 | 0.85 / 0.06 | - | 1.00 / 0.13 |
| Screw | 0.95 / 0.06 | 0.98 / 0.39 | 0.41 / 0.28 | 0.73 / 0.13 | - | 1.00 / 0.10 |
| Toothbrush | 0.75 / 0.73 | 1.00 / 0.97 | 1.00 / 0.13 | 1.00 / 0.03 | - | 1.00 / 0.60 |
| Transistor | 1.00 / 0.03 | 0.97 / 0.45 | 0.98 / 0.35 | 1.00 / 0.15 | - | - |
| Zipper | 1.00 / 0.60 | 0.97 / 0.63 | 0.78 / 0.40 | 0.78 / 0.29 | - | - |

Evaluation details from the paper: since alignment is not possible for every object category, the evaluation of the variation model is restricted (Table 2); 30 randomly selected images of each object category in their original orientation are used to estimate mean and variance parameters at each pixel, and images are converted to gray-scale beforehand. Anomaly maps are obtained by comparing each test pixel's gray value to the prediction, relative to its predicted standard deviation. The GMM-based texture inspection uses the implementation of the HALCON machine-vision library.

4.2. Data Augmentation: since the evaluated learning-based methods are typically trained on large datasets, data augmentation is performed for them. For the texture images, rectangular patches of fixed size are randomly cropped; for each object category, a random rotation is applied, with additional mirroring where applicable. Each category is augmented to create a larger training set.

4.3. Evaluation Metric: each of the evaluated methods outputs a spatial map in which large values indicate anomalous regions.

Overall, the autoencoder-based models are the strongest performers.
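The two numbers in each cell of Table 2 can be reproduced from per-image anomaly scores and a decision threshold. A minimal NumPy sketch; the score values, label convention, and threshold below are assumptions for illustration, not the paper's data.

```python
import numpy as np

def classification_ratios(scores, labels, threshold):
    """Ratio of correctly classified normal / anomalous images.

    scores: per-image anomaly scores (higher = more anomalous),
    labels: 0 for anomaly-free images, 1 for anomalous images,
    threshold: images with score > threshold are predicted anomalous.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    pred = (scores > threshold).astype(int)
    normal_ok = np.mean(pred[labels == 0] == 0)  # first value in each cell
    anom_ok = np.mean(pred[labels == 1] == 1)    # second value in each cell
    return normal_ok, anom_ok

# toy usage
print(classification_ratios([0.1, 0.2, 0.9, 0.4], [0, 0, 1, 1], 0.3))
```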
"& - ͱ"& 44*. ଛࣦؔඞͣࣗޡࠩͰͳ͍͚ͯ͘ͳ͍Θ͚Ͱͳ͍ -ޡࠩ ࣗޡࠩ 44*. 4USVDUVBM4*.JMBSJUZ JOEFY<> Dosovitskiy and Brox (2016). It increases the quality of the produced reconstructions by extracting features from both the input image x and its reconstruction ˆ x and enforc- ing them to be equal. Consider F : Rk⇥h⇥w ! Rf to be a feature extractor that obtains an f-dimensional feature vector from an input image. Then, a regularizer can be added to the loss function of the autoencoder, yielding the feature matching autoencoder (FM-AE) loss LFM(x, ˆ x) = L2(x, ˆ x) + kF(x) F(ˆ x)k2 2 , (3) where > 0 denotes the weighting factor between the two loss terms. F can be parameterized using the first layers of a CNN pretrained on an image classification task. During evaluation, a residual map RFM is obtained by comparing the per-pixel `2-distance of x and ˆ x. The hope is that sharper, more realistic reconstructions will lead to better residual maps compared to a standard `2-autoencoder. 3.1.4. SSIM Autoencoder. We show that employing more elaborate architectures such as VAEs or FM-AEs does not yield satisfactory improvements of the residial maps over deterministic `2-autoencoders in the unsupervised defect segmentation task. They are all based on per-pixel evaluation metrics that assume an unrealistic indepen- dence between neighboring pixels. Therefore, they fail to detect structural differences between the inputs and their l(p, q) = 2µpµq + c1 µ2 p + µ2 q + c1 (5) c(p, q) = 2 p q + c2 2 p + 2 q + c2 (6) s(p, q) = 2 pq + c2 2 p q + c2 . (7) The constants c1 and c2 ensure numerical stability and are typically set to c1 = 0.01 and c2 = 0.03. By substituting (5)-(7) into (4), the SSIM is given by SSIM(p, q) = (2µpµq + c1)(2 pq + c2) (µ2 p + µ2 q + c1)( 2 p + 2 q + c2) . (8) It holds that SSIM(p, q) 2 [ 1, 1]. In particular, SSIM(p, q) = 1 if and only if p and q are identical (Wang et al., 2004). Figure 2 shows the different percep- tions of the three similarity functions that form the SSIM index. Each of the patch pairs p and q has a constant `2- residual of 0.25 per pixel and hence assigns low defect scores to each of the three cases. SSIM on the other hand is sensitive to variations in the patches’ mean, variance, and covariance in its respective residual map and assigns low similarity to each of the patch pairs in one of the comparison functions. training them purely on defect-free image data. During testing, the autoencoder will fail to reconstruct defects that have not been observed during training, which can thus be segmented by comparing the original input to the recon- struction and computing a residual map R(x, ˆ x) 2 Rw⇥h. 3.1.1. `2-Autoencoder. To force the autoencoder to recon- struct its input, a loss function must be defined that guides it towards this behavior. For simplicity and computational speed, one often chooses a per-pixel error measure, such as the L2 loss L2(x, ˆ x) = h 1 X r=0 w 1 X c=0 (x(r, c) ˆ x(r, c))2 , (2) where x(r, c) denotes the intensity value of image x at the pixel (r, c). To obtain a residual map R`2 (x, ˆ x) during evaluation, the per-pixel `2-distance of x and ˆ x is com- puted. 3.1.2. Variational Autoencoder. Various extensions to the deterministic autoencoder framework exist. VAEs (Kingma and Welling, 2014) impose constraints on the latent variables to follow a certain distribution z ⇠ P(z). For simplicity, the distribution is typically chosen to be ਓ͕ؒײ͡Δҧ͍ ɹɾըૉ ً ͷมԽ ɹɾίϯτϥετͷมԽ ɹɾߏͷมԽ ͕ූ߸ԽલޙͰͲΕ͘Β͍มԽ͔ͨ͠Λද͢ࢦඪ ʢը૾Λখ͍͞8JOEPXʹ͚ͯܭࢉʣ IUUQWJTVBMJ[FIBUFOBCMPHDPNFOUSZ IUUQTEGUBMLKQ Q
"OP("/<> To be published in the proceedings of IPMI 2017 Fig. 2. (a) Deep convolutional generative adversarial network. (b) t-SNE embedding of normal (blue) and anomalous (red) images on the feature representation of the last convolution layer (orange in (a)) of the discriminator. 2.1 Unsupervised Manifold Learning of Normal Anatomical Variability ("/Λਖ਼ৗσʔλͰ܇࿅͠ɺਖ਼ৗσʔλͷੜϞσϧʢ֬ʣΛߏங IUUQTBSYJWPSHBCT
CNN Feature Dictionary

The k-means step takes the set of patch feature vectors and outputs the best k clusters and the corresponding k centroids. A centroid is the mean position of all the elements of the cluster. For each cluster, we take the feature vector that is nearest to its centroid. The set of these k feature vectors composes the dictionary W. Figure 4 shows the pipeline for dictionary building. Figure 5 shows examples of dictionaries learned from images of the training set (anomaly-free) with different patch sizes and numbers of clusters; it shows the subregions corresponding to each feature vector of the dictionary W.

[Figure: examples of dictionaries obtained with different patch sizes and numbers of subregions.]

Pipeline: image patches of a fixed size and stride are fed to a ResNet, whose average-pooling layer yields the feature vectors (this is the part that performs dimensionality reduction); k-means then selects, for each cluster, the vector closest to the cluster centroid as a dictionary entry.

At test time, feature vectors whose distance to the dictionary vectors exceeds a threshold are treated as anomalous regions.

https://www.ncbi.nlm.nih.gov/pubmed/…
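A minimal sketch of the dictionary-building and scoring idea, using torchvision's ResNet-18 average-pool features and scikit-learn's k-means. The backbone choice, patch preparation, k, and any decision threshold are assumptions for illustration; the paper's exact configuration may differ.

```python
import numpy as np
import torch
from torchvision.models import resnet18
from sklearn.cluster import KMeans

# Feature extractor: everything up to and including the global avg-pool.
backbone = resnet18(weights="IMAGENET1K_V1")
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

def patch_features(patches):
    """patches: float tensor (N, 3, H, W) -> (N, 512) pooled features."""
    with torch.no_grad():
        return extractor(patches).flatten(1).numpy()

def build_dictionary(train_feats, k=50):
    """Cluster anomaly-free features, then keep the real feature vector
    nearest each centroid as a dictionary entry."""
    km = KMeans(n_clusters=k, n_init=10).fit(train_feats)
    dictionary = []
    for c in km.cluster_centers_:
        nearest = np.argmin(np.linalg.norm(train_feats - c, axis=1))
        dictionary.append(train_feats[nearest])
    return np.stack(dictionary)

def anomaly_scores(test_feats, dictionary):
    """Distance of each test patch to its closest dictionary entry;
    patches scoring above a chosen threshold are flagged anomalous."""
    d = np.linalg.norm(test_feats[:, None, :] - dictionary[None, :, :], axis=2)
    return d.min(axis=1)
```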