Clinical-grade computational pathology using weakly supervised
deep learning on whole slide images
https://www.nature.com/articles/s41591-019-0508-1
(Published: 15 July 2019)
Gabriele Campanella1,2, Matthew G. Hanna1, … & Thomas J. Fuchs1,2
1Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
2Weill Cornell Graduate School of Medical Sciences, New York, NY, USA.
LPixel Inc. Presents
Image Analysis x Machine Learning #6
2nd Aug. 2019 @LPixel Inc.
Presenter: Tu-chan
ARTICLES
https://doi.org/10.1038/s41591-019-0508-1
Clinical-grade computational pathology using
weakly supervised deep learning on whole
slide images
Gabriele Campanella1,2, Matthew G. Hanna1, Luke Geneslaw1, Allen Miraflor1,
Vitor Werneck Krauss Silva1, Klaus J. Busam1, Edi Brogi1, Victor E. Reuter1, David S. Klimstra1
and Thomas J. Fuchs 1,2*
The development of decision support systems for pathology and their deployment in clinical practice have been hindered by
the need for large manually annotated datasets. To overcome this problem, we present a multiple instance learning-based deep
learning system that uses only the reported diagnoses as labels for training, thereby avoiding expensive and time-consuming
pixel-wise manual annotations. We evaluated this framework at scale on a dataset of 44,732 whole slide images from 15,187
patients without any form of data curation. Tests on prostate cancer, basal cell carcinoma and breast cancer metastases to
axillary lymph nodes resulted in areas under the curve above 0.98 for all cancer types. Its clinical application would allow
pathologists to exclude 65–75% of slides while retaining 100% sensitivity. Our results show that this system has the ability
to train accurate classification models at unprecedented scale, laying the foundation for the deployment of computational
decision support systems in clinical practice.
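§ The biggest obstacle to deploying machine-based diagnostic support for pathology in clinical practice is the shortage of training data supplied by pathologists. Annotating pathology images is extremely laborious.
§ Using 44,732 whole-slide images (WSIs) from 15,187 patients, CNN and RNN models were built with a weakly supervised learning method (multiple instance learning: MIL) that uses only the diagnosis as the label
§ For different cancer types (prostate cancer, basal cell carcinoma of the skin, breast cancer lymph node metastasis), the models achieved an AUC of 0.98, eliminating missed diagnoses by pathologists and cutting the workload by about 70%
Paper summary
! Existing WSI data
! Existing diagnostic reports
! No pathologist annotations
Enables efficient diagnostic work
Clinician who tells the patient the diagnosis (glamorous) Pathologist swamped with diagnostic work (grueling)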
Whole-slide image (WSI)
59,023 px x 36,364 px x 3 = 6.0 GB
461 x 284 = 130,924 tiles
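The two figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the slide does not state the tile edge, so the 128 px value below is an assumption chosen because it reproduces the 461 x 284 grid, and the size is computed for uncompressed 8-bit data with three channels.

```python
# Dimensions are taken from the slide; TILE_EDGE_PX is an assumption.
WIDTH_PX, HEIGHT_PX, CHANNELS = 59_023, 36_364, 3
TILE_EDGE_PX = 128  # assumed: 59,023 // 128 = 461 and 36,364 // 128 = 284


def uncompressed_bytes(width, height, channels, bytes_per_sample=1):
    """Raw size of the image with no compression (8-bit samples by default)."""
    return width * height * channels * bytes_per_sample


def tile_grid(width, height, tile_edge):
    """Number of whole, non-overlapping tiles along each axis."""
    return width // tile_edge, height // tile_edge


size_bytes = uncompressed_bytes(WIDTH_PX, HEIGHT_PX, CHANNELS)
cols, rows = tile_grid(WIDTH_PX, HEIGHT_PX, TILE_EDGE_PX)
print(f"{size_bytes / 2**30:.1f} GiB uncompressed")
print(f"{cols} x {rows} = {cols * rows:,} tiles")
```

Under these assumptions the "6.0 GB" on the slide corresponds to binary gigabytes (GiB).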
[...] effect on the retrieval performance and its optimum value is influenced not only by the physical storage medium but also by the viewer display resolution. While a tiled organization facilitates panning processes, the zooming process on the other hand is a challenge. At lower resolutions, larger areas of the image must be accessed in order to display the requested region, as shown in Fig. 1(middle). In the extreme case, a thumbnail requires the entire image data to be processed. This would probably be a [...]ensive task.
In order to optimize zooming, a lower resolution version of the image can be pre-calculated and stored alongside the full resolution image. Thus, the typical image pyramid organization arises, as shown in Fig. 1(right). According to this scheme, the WSI contains multiple images at different magnifications where the pyramid provides distinct zoom level. The base of the pyramid contains the highest resolution, while the top contains the lowest resolution [...] typically a thumbnail. The thumbnail is a very low-resolution version of the image, making it easy to see the entire [...]. One or more levels may be created, at intermediate resolutions, to facilitate the loading of arbitrary magnification levels. Each pyramid level follows the tiled organization described above. These strategies are valid for open and proprietary image formats.
DICOM Workgroup 26 also adopted the image pyramid concept and introduced it in the standard. The DICOM Supplement 145 [19] takes advantage of the existing multi-frame objects where each resolution image is stored in a separate multi-frame object, and the individual tiles are stored as separated frames. Each resolution level is assigned a different DICOM series. The standard also supports 3D microscopy, a technique that acquires multiple images of the slide at different depths [20]. Therefore, each resolution level may contain several Z-planes, which represent acquisitions at different focal points. The Z-planes can be stored as separate objects within the DICOM series or, alternatively, in the same object with the corresponding magnification level.
However, SOAP is considered a heavyweight technology for
web-services, because the verbose messages make them difficult
to include in lightweight applications, such as websites. Therefore,
the community demanded the inclusion of simpler services based
on the Restful technology. Moreover, web services for searching
and storage were also requested. So in 2014, the standard intro-
duced three REST web services: STOW-RS,1 QIDO-RS,2 and
WADO-RS.3
Besides the traditional DICOM Services and web-services,
DICOM also contemplates storage and streaming of JPEG2000
images. Currently, JPEG2000 is the most advanced format for gen-
eral purpose images. The standard defines a lossless compression
algorithm, as well as a protocol for ‘‘interacting with JPEG2000
based images in an efficient and effective manner” [23]. In fact,
the JPEG2000 standard has a special focus towards image interac-
tivity, supporting interesting features such as resolution scalability,
progressive refinement and spatial randomness [23,24]. The JPIP
protocol allows viewer application to interact with JPEG2000
images over networks. By using JPIP, viewer applications do not
have to download the whole image, and can instead request from
the JPIP server a particular region that best fits their visualization
purposes [23,25]. This image streaming strategy allows the storage
and consistency of image data to be optimized, as well as reducing
the required bandwidth to support visualization of remote images
[25].
The DICOM Standard supports both JPEG2000 images and the
JPIP protocol. Supplement 61 [26] introduced the JPEG2000 format
into regular DICOM images. The JPEG2000 pixel data is stored in
the DICOM image pixel data attribute, just like any other compres-
sion format such as JPEG Baseline or JPEG-LS. As such, this supple-
ment introduced two new transfer syntaxes: 1.2.840.10008.1.2.4.90,
and 1.2.840.10008.1.2.4.91. The support for JPIP was introduced by
supplement 106 [27]. In this case, the image pixel data is replaced
by the JPIP server URL. As a result, the viewer application can
request the image directly from the JPIP server, without mediation
from the PACS Archive. This behavior is supported by the transfer
Fig. 1. Single frame (left) format vs Tiled format (middle), and image pyramid example (right).
T. Marques Godinho et al. / Journal of Biomedical Informatics 71 (2017) 190–197
Glass slide: 7.5 x 2.5 cm
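The image pyramid described in the excerpt above can be illustrated by listing the dimensions of each level. This is a rough sketch, not how any particular scanner or file format builds its levels: the downsample factor of 2 and the 256 px thumbnail limit are assumptions, and the function name is hypothetical. The base dimensions are the ones from the WSI slide.

```python
def pyramid_levels(width, height, downsample=2, min_edge=256):
    """List (level, width, height) from full resolution down to a thumbnail.

    Level 0 is the base of the pyramid (highest resolution). Each further
    level divides both edges by `downsample` until the longer edge fits
    within `min_edge` pixels. Both parameters are illustrative assumptions.
    """
    levels = []
    level = 0
    while True:
        levels.append((level, width, height))
        if max(width, height) <= min_edge:
            break
        width = max(1, width // downsample)
        height = max(1, height // downsample)
        level += 1
    return levels


levels = pyramid_levels(59_023, 36_364)
for level, w, h in levels:
    print(f"level {level}: {w} x {h} px")
```

A viewer would pick the level closest to the requested zoom and read only the tiles that cover the visible region, which is why zooming out does not require touching the full-resolution data.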
Fig. 1a: datasets
Dataset               Years      Slides  Patients  Positive slides  External slides  ImageNet
Prostate in house     2016       12,132  836       2,402            0                19.8×
Prostate external     2015–2017  12,727  6,323     12,413           12,727           29.0×
Skin                  2016–2017  9,962   5,325     1,659            3,710            21.4×
Axillary lymph nodes  2013–2018  9,894   2,703     2,521            1,224            18.2×
Total                            44,732  15,187                                      88.4×
Fig. 1b: scale labels: 63,744 px / 31.9 mm; 28,649 px / 14.3 mm; 3,000 px / 1.5 mm; 1,200 px / 600 µm; 300 px / 150 µm
Cancerous glands
1% or less
Bag with a positive label
Bag with a negative label
Multiple instance learning
In a bag with a positive label, at least one tile is a cancer tile;
in a negative bag, none of the tiles is cancer
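The MIL assumption above can be sketched in a few lines. This is a toy illustration with made-up tile probabilities, not the authors' training code: the function names are hypothetical, and the choice of `k` is an assumption. It shows the two consequences of the assumption: a slide is scored by its most suspicious tile, and the top-ranked tiles inherit the slide-level label as training targets.

```python
import numpy as np


def slide_score(tile_probs):
    """Max-pooling: a bag is as positive as its most suspicious instance."""
    return float(np.max(tile_probs))


def top_k_training_pairs(tile_probs, slide_label, k=1):
    """Return (tile_index, target) pairs for the k highest-ranked tiles.

    For a positive slide the top tiles are the most likely cancer tiles;
    for a negative slide they are the tiles most at risk of a false alarm.
    """
    order = np.argsort(tile_probs)[::-1][:k]
    return [(int(i), slide_label) for i in order]


# Made-up probabilities for a slide with four tiles.
probs = np.array([0.02, 0.10, 0.93, 0.05])
score = slide_score(probs)
pairs = top_k_training_pairs(probs, slide_label=1, k=2)
print(score, pairs)
```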
[Fig. 1 workflow diagram. Learning: clinically relevant dataset -> slide tiling -> CNN classifier -> tile probability -> ranked tiles -> top tiles paired with slide targets (1 or 0). Inference: trained MIL model -> ranked tiles -> RNN aggregation of the top S tiles -> diagnosis (tumor).]
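The data were not curated: all WSIs were used as-is (except 10 slides that were unusable because of excessive pen markings)
No augmentation and no color normalization.
Datasets used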
Fig. 1 | Overview of the data and proposed deep learning framework presented in this study. a, Description of the datasets. This study is based on a total [...]
Extended Data Fig. 7 | Example of a slide tiled on a grid with no overlap at different magnifications. A slide represents a bag, and th[...] instances in that bag. In this work, instances at different magnifications are not part of the same bag. mpp, microns per pixel.
No tile overlap
50% overlap
67% overlap
Tile size 224x224
At test time, tiles were overlapped by 80% at all magnifications
Tiles were generated separately for each magnification,
multiscale ensemble
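The overlap settings on this slide translate into a step size between neighbouring tiles. The sketch below assumes that overlap is defined as the shared fraction of the tile edge and that the step is rounded to the nearest pixel; both are assumptions, and the function names are hypothetical.

```python
TILE = 224  # tile edge in pixels, from the slide


def stride_for_overlap(tile, overlap):
    """Step in pixels between tile origins for a given fractional overlap."""
    return max(1, round(tile * (1.0 - overlap)))


def tile_origins(length, tile, overlap):
    """Origins of all tiles that fit entirely within `length` pixels."""
    stride = stride_for_overlap(tile, overlap)
    return list(range(0, length - tile + 1, stride))


strides = {ov: stride_for_overlap(TILE, ov) for ov in (0.0, 0.5, 0.67, 0.8)}
for overlap, stride in strides.items():
    print(f"overlap {overlap:.0%}: stride {stride} px")
print(len(tile_origins(1000, TILE, 0.0)), len(tile_origins(1000, TILE, 0.5)))
```

Along one axis, going from no overlap to 50% overlap roughly doubles the number of tiles, so in two dimensions the tile count grows about fourfold.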
[Fig. 2 plots. a: minimum balanced validation error (0 to 0.5) vs. number of training WSIs (10^2 to 10^4). b, c: t-SNE embedding (t-SNE1 vs. t-SNE2, -20 to 20) with legends for tumor probability (0.25 to 1.00) and count (100, 200); example tiles labeled Benign, Malignant and Suspicious; scale bars 56 µm and 112 µm.]
Fig. 2 | Dataset size impact and model introspection. a, Dataset size plays an important role in achieving clinical-grade MIL classification performance.
Training of ResNet34 was performed with datasets of increasing size; for every reported training set size, five models were trained, and the validation
errors are reported as box plots (n=5). This experiment underlies the fact that a large number of slides are necessary for generalization of learning
under the MIL assumption. b,c, The prostate model has learned a rich feature representation of histopathology tiles. b, A ResNet34 model trained at 20×
data, but its performance was not better than that achieved by the
single-scale model trained at 20×.
Pathology expert analysis of the MIL-RNN error modes.
Pathologists specialized in each discipline analyzed the test set
errors made by MIL-RNN models trained at 20× magnification
(a selection of cases is presented in Fig. 4a–c). Several discrepan-
cies (six in prostate, eight in BCC and 23 in axillary lymph nodes;
see Fig. 4d) were found between the reported case diagnosis
and the true slide class (that is, presence/absence of tumor).
Because the ground truth is reliant on the diagnosis reported in
the LIS, the observed discrepancies can be due to several factors:
(1) under the current WSI scanning protocol, as only select slides
are scanned in each case, there exists the possibility of a mismatch
between the slide scanned and the reported LIS diagnosis linked to
each case; (2) a deeper slide level with no carcinoma present could
be selected for scanning; and (3) tissue was removed to create tis-
sue microarrays before slide scanning. Encouragingly, the training
In addition, two false positives were corrected to true positives.
False negative to true negative corrections were due to the tissue of
interest not being present on a deeper hematoxylin and eosin slide,
or sampling error at the time the frozen section was prepared. False
positive to true positive corrections were due to soft tissue meta-
static deposits or tumor emboli. The AUC improved from 0.965 to
0.989 given these corrections. Of the 23 false negatives, eight were
macro-metastasis, 13 were micro-metastasis and two were isolated
tumor cells (ITCs). Notably, 12 cases (four false negatives and eight
false positives) showed signs of treatment effect from neoadjuvant
chemotherapy.
Investigation of technical variability introduced by slide prepa-
ration at multiple institutions and different scanners. Several
sources of variability come into play in computational pathology. In
addition to all of the morphological variability, technical variability
is introduced during glass slide preparation and scanning. How this
variability can affect the prediction of an assistive model is a ques-
[Fig. 3 plots: ROC curves (axes: Specificity, Sensitivity; 0 to 1.00) with insets zoomed to 0.80 to 1.00]
a: MIL (AUC: 0.986), MIL-RNN (AUC: 0.991), P = 0.00023
b: MIL (AUC: 0.986), MIL-RNN (AUC: 0.988), P = 0.1
c: MIL (AUC: 0.965), MIL-RNN (AUC: 0.966), P = 0.9
Fig. 3 | Weakly supervised models achieve high performance across all tissue types. The performances of the models trained at 20× magnification on
the respective test datasets were measured in terms of AUC for each tumor type. a, For prostate cancer (n=1,784) the MIL-RNN model significantly
(P<0.001) outperformed the model trained with MIL alone, resulting in an AUC of 0.991. b,c, The BCC model (n=1,575) performed at 0.988 (b), while
breast metastases detection (n=1,473) achieved an AUC of 0.966 (c). For these latter datasets, adding an RNN did not significantly improve performance.
Statistical significance was assessed using DeLong’s test for two correlated ROC curves.
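As a reminder of what the AUC in this caption measures, here is a minimal sketch that computes it as the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one. The labels and scores are made up, the function name is hypothetical, and this does not implement DeLong's test, which additionally compares two correlated ROC curves.

```python
import numpy as np


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive with every negative; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Made-up slide-level labels and tumor probabilities.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = auc_from_scores(labels, scores)
print(f"AUC = {auc:.3f}")
```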
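Prostate cancer AUC 0.986; skin cancer AUC 0.986; breast cancer LN metastasis AUC 0.965 (MIL alone, see Fig. 3)
Accuracy stabilizes with more than 1,000 WSIs
It depends on the cancer type, but roughly 10,000 WSIs are needed
Dataset size and classification accuracy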
[Fig. 5 plots: bar charts of AUC (axis 0.5 to 1.0) on different test sets; annotated AUC changes: –5.84%, –2.65%, –7.15%, –20.2%]
a: MSK in-house test set scanned on Aperio (n = 1,784); MSK in-house test set scanned on Philips (n = 1,274); MSK external test set scanned on Aperio (n = 12,727)
b: Multiple instance learning (trained on MSK dataset): MSK test set (n = 1,473), CAMELYON16 test set (n = 129); Fully supervised learning (trained on CAMELYON16 dataset): CAMELYON16 test set (n = 129), MSK test set (n = 1,473)
Fig. 5 | Weak supervision on large datasets leads to higher generalization performance than fully supervised learning on small curated datasets. The
generalization performance of the proposed prostate and breast models were evaluated on different external test sets. a, Results of the prostate model
trained with MIL on MSK in-house slides and tested on: (1) the in-house test set (n=1,784) digitized on Leica Aperio AT2 scanners; (2) the in-house test
Accuracy does not change much even on consultation cases
Even a model built from annotated data generalizes poorly when the data are scarce
[Fig. 4a–c: example images of true positives, false negatives and false positives for each tumor type; scale bars 200 µm]
Fig. 4d
                           Prostate                  BCC                       Axillary lymph nodes
                           False neg.  False pos.    False neg.  False pos.    False neg.  False pos.
Benign/negative            3           56            3           2             17          1
Atypical/other/suspicious  3           16            1           11            4           31
Carcinoma/positive         6           0             12          4             23          2
True error rate            6/345       72/1,439      12/255      13/1,320      23/403      32/1,070
Fig. 4 | Pathology analysis of the misclassification errors on the test sets. a–c, Randomly selected examples of classification results on the
set. Examples of true positive, false negative and false positive classifications are shown for each tumor type. The MIL-RNN model trained a
magnification was run with a step size of 20 pixels across a region of interest, generating a tumor probability heat map. On every slide, the b
represents the enlarged area. For the prostate dataset (a), the true positive represents a difficult diagnosis due to tumor found next to atrop
inflammation; the false negative shows a very low tumor volume; and for the false positive the model identified atypical small acinar prolifer
a small focus of glands with atypical epithelial cells. For the BCC dataset (b), the true positive has a low tumor volume; the false negative ha
volume; and for the false positive the tongue of the epithelium abutting from the base of the epidermis shows an architecture similar to BCC
axillary lymph nodes dataset (c), the true positive shows ITCs with a neoadjuvant chemotherapy treatment effect; the false negative shows
focus cluster of ITCs missed due to the very low tumor volume and blurring; and the false positive shows displaced epithelium/benign papi
in a lymph node. d, Subspecialty pathologists analyzed the slides that were misclassified by the MIL-RNN models. While slides can either b
negative for a specific tumor, sometimes it is not possible to diagnose a single slide with certainty based on morphology alone. These cases
Breakdown of misclassifications on the test sets
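False negatives are mostly very small cancers
Atypical/suspicious cases are those that cannot be decided on morphology alone
The breast cancer set also included cases that had received neoadjuvant chemotherapy
(probably within an acceptable range in clinical practice)
Effect of magnification: accuracy was highest at x20 for prostate cancer and at x5 for skin.
For prostate, x20 gave fewer FNs and x5 gave fewer FPs. Overall, the ensemble was more accurate than any single magnification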
in Fig. 6 (see Extended Data Fig. 6 for BCC and breast metastases)
that our prostate model would allow the removal of more than 75%
of the slides from the workload of a pathologist without any loss in
sensitivity at the patient level. For pathologists who must operate in
the increasingly complex, detailed and data-driven environment of
cancer diagnostics, tools such as this will allow non-subspecialized
pathologists to confidently and efficiently classify cancer with 100%
sensitivity.
Online content
Any methods, additional references, Nature Research reporting
summaries, source data, statements of code and data availability and
associated accession codes are available at https://doi.org/10.1038/
s41591-019-0508-1.
Received: 23 October 2018; Accepted: 3 June 2019;
Published: xx xx xxxx
[Fig. 6 plots. a: cases ordered by tumor probability, with slides marked as predicted positive or predicted negative. b: sensitivity and probability (0 to 1.00) vs. % slides reviewed (0 to 100).]
Fig. 6 | Impact of the proposed decision support system on clinical practice. a, By ordering the cases, and slides within each case, based on their tumor
probability, pathologists can focus their attention on slides that are probably positive for cancer. b, Following the algorithm’s prediction would allow
pathologists to potentially ignore more than 75% of the slides while retaining 100% sensitivity for prostate cancer at the case level (n=1,784).
Each case contains a mix of slides with cancer and slides without
The only option is to go through them one by one, wondering whether cancer is there
These can be skipped
Focus on the positive slides
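Skin cancer (n=1,575)
threshold 0.025
Breast cancer LN metastasis (n=1,473)
65% 65%
threshold 0.25
What matters in diagnosis is that no cancer is missed at the patient level
-> 100% sensitivity is required
Even with sensitivity raised to 100%, only about 30% of the slides need a careful check. 65–75% of the workload can be handled with reduced effort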
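Prostate cancer needle biopsy (n=1,784)
75% of cases are cancer-negative
Sensitivity 1
positive prediction threshold 0.5
Slides are sorted by probability and sensitivity is computed cumulatively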
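The sort-then-accumulate procedure in the last line can be sketched as follows. This is a toy example on made-up slide-level data, not the paper's case-level evaluation; the function name is hypothetical, and it assumes that at least one slide is positive.

```python
import numpy as np


def review_fraction_at_full_sensitivity(labels, probs):
    """Fraction of slides to review, highest probability first, until every
    positive slide has been seen. Also returns the cumulative sensitivity."""
    labels = np.asarray(labels, dtype=bool)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs, kind="stable")  # descending probability
    sorted_labels = labels[order]
    sensitivity = np.cumsum(sorted_labels) / sorted_labels.sum()
    # First position at which all positives have been reviewed.
    n_needed = int(np.argmax(sensitivity >= 1.0)) + 1
    return n_needed / len(labels), sensitivity


# Made-up example: 8 slides, 2 of them positive.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
probs = [0.95, 0.60, 0.70, 0.10, 0.05, 0.20, 0.02, 0.01]
frac, curve = review_fraction_at_full_sensitivity(labels, probs)
print(f"review {frac:.0%} of slides, skip {1 - frac:.0%}")
```

In this toy case both positives rank first, so a quarter of the slides must be reviewed and the rest could be skipped without losing sensitivity.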
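Summary
§ Without adding extra labor, and by making use of existing data, the system achieves clinical-grade performance in cancer detection
§ What is remarkable about this study: the sheer amount of data (about 10,000 WSIs for a single cancer type)
§ Enough that the model even learns the biological diversity and technical artifacts present in raw data
§ What "clinical-grade" aims for
It is not about competing with human performance (diagnosis time, detection accuracy, etc.)
because a team of subspecialty pathologists is a body of collective knowledge with 100% sensitivity and specificity (a wall that cannot be surpassed)
Humans reach a decision by combining morphology with other modalities such as the clinical picture and molecular biology information
The goal is to miss no cancer at an acceptable false positive rate (humans check afterwards)
§ Pathologists must deal with diseases of every organ, which requires diverse, complex, detailed and vast knowledge; the hope is that this system provides an environment in which they can diagnose diseases outside their specialty with confidence and without missing anything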