LPixelLT20190802R.pdf

Clinical-grade computational pathology using weakly supervised deep learning on whole slide images
https://www.nature.com/articles/s41591-019-0508-1 (Published: 15 July 2019)
Gabriele Campanella1,2, Matthew G. Hanna1, … & Thomas J. Fuchs1,2
1 Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 2 Weill Cornell Graduate School of Medical Sciences, New York, NY, USA.
LPixel Inc. Presents Image Analysis x Machine Learning #6, 2nd Aug. 2019 @LPixel Inc. Presenter: Tu-chan

ARTICLES https://doi.org/10.1038/s41591-019-0508-1
Gabriele Campanella1,2, Matthew G. Hanna1, Luke Geneslaw1, Allen Miraflor1, Vitor Werneck Krauss Silva1, Klaus J. Busam1, Edi Brogi1, Victor E. Reuter1, David S. Klimstra1 and Thomas J. Fuchs1,2*
The development of decision support systems for pathology and their deployment in clinical practice have been hindered by the need for large manually annotated datasets. To overcome this problem, we present a multiple instance learning-based deep learning system that uses only the reported diagnoses as labels for training, thereby avoiding expensive and time-consuming pixel-wise manual annotations. We evaluated this framework at scale on a dataset of 44,732 whole slide images from 15,187 patients without any form of data curation. Tests on prostate cancer, basal cell carcinoma and breast cancer metastases to axillary lymph nodes resulted in areas under the curve above 0.98 for all cancer types. Its clinical application would allow pathologists to exclude 65–75% of slides while retaining 100% sensitivity. Our results show that this system has the ability to train accurate classification models at unprecedented scale, laying the foundation for the deployment of computational decision support systems in clinical practice.

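Gist of the paper!
§ The biggest reason machine-assisted diagnostic support for pathology cannot be deployed in clinical practice is the shortage of training data supplied by pathologists. Annotating pathology images is simply a huge amount of work.
§ Using 44,732 whole-slide images (WSIs) from 15,187 patients, the authors built a CNN & RNN model with a weakly supervised learning method (multiple instance learning: MIL) that uses only the diagnostic result as the label.
§ For the diagnosis of different cancer types (prostate cancer, basal cell carcinoma of the skin, breast cancer lymph node metastasis) the model reached an AUC of 0.98, eliminating missed diagnoses by pathologists and cutting the workload by about 70%.
Existing WSI data! Existing diagnostic reports! No pathologist annotation. Result: an efficient diagnostic workflow.
Clinicians who tell patients their diagnosis (glamorous) vs. pathologists swamped with diagnostic work (grueling).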

Whole-slide image (WSI): 59,023 px x 36,364 px
59,023 x 36,364 x 3 = 6.0 GB
461 x 284 = 130,924 tiles

Excerpt shown on the slide (the page is cropped at the left margin; restored letters are in square brackets and unrecoverable text is marked […]):
[…]ect on the retrieval performance and its optimum value is […]ced not only by the physical storage medium but also by [vie]wer display resolution. While a tiled organization facilitates [pan]ning processes, the zooming process on the other hand is [a c]hallenge. At lower resolutions, larger areas of the image […]e accessed in order to display the requested region, as [shown i]n Fig. 1 (middle). In the extreme case, a thumbnail requires […]re image data to be processed. This would probably be a […]ensive task. [In or]der to optimize zooming, a lower resolution version of the [ima]ge can be pre-calculated and stored alongside the full res[olution] image. Thus, the typical image pyramid organization arises, [as sho]wn in Fig. 1 (right). According to this scheme, the WSI con[tains] multiple images at different magnifications where the pyra[mid pro]vides distinct zoom level. The base of the pyramid contains [the hig]hest resolution, while the top contains the lowest resolution […] typically a thumbnail. The thumbnail is a very low-[resoluti]on version of the image, making it easy to see the entire […] One or more levels may be created, at intermediate resolu[tions, t]o facilitate the loading of arbitrary magnification levels. [Each py]ramid level follows the tiled organization described above. […s]trategies are valid for open and proprietary image formats. [DIC]OM Workgroup 26 also adopted the image pyramid concept [and intr]oduced it in the standard. The DICOM Supplement 145 [19] [takes a]dvantage of the existing multi-frame objects where each [res]olution image is stored in a separate multi-frame object, [and t]he individual tiles are stored as separated frames. Each res[olution] level is assigned a different DICOM series. The standard [also su]pports 3D microscopy, a technique that acquires multiple […] of the slide at different depths [20]. Therefore, each resolu[tion lev]el may contain several Z-planes, which represent acquisi[tions at] different focal points. The Z-planes can be stored as […]e objects within the DICOM series or, alternatively, in the […o]bject with the corresponding magnification level.
However, SOAP is considered a heavyweight technology for web-services, because the verbose messages make them difficult to include in lightweight applications, such as websites. Therefore, the community demanded the inclusion of simpler services based on the Restful technology. Moreover, web services for searching and storage were also requested. So in 2014, the standard introduced three REST web services: STOW-RS, QIDO-RS, and WADO-RS. Besides the traditional DICOM Services and web-services, DICOM also contemplates storage and streaming of JPEG2000 images. Currently, JPEG2000 is the most advanced format for general purpose images. The standard defines a lossless compression algorithm, as well as a protocol for "interacting with JPEG2000 based images in an efficient and effective manner" [23]. In fact, the JPEG2000 standard has a special focus towards image interactivity, supporting interesting features such as resolution scalability, progressive refinement and spatial randomness [23,24]. The JPIP protocol allows viewer application to interact with JPEG2000 images over networks. By using JPIP, viewer applications do not have to download the whole image, and can instead request from the JPIP server a particular region that best fits their visualization purposes [23,25]. This image streaming strategy allows the storage and consistency of image data to be optimized, as well as reducing the required bandwidth to support visualization of remote images [25]. The DICOM Standard supports both JPEG2000 images and the JPIP protocol. Supplement 61 [26] introduced the JPEG2000 format into regular DICOM images. The JPEG2000 pixel data is stored in the DICOM image pixel data attribute, just like any other compression format such as JPEG Baseline or JPEG-LS. As such, this supplement introduced two new transfer syntaxes: 1.2.840.10008.1.2.4.90, and 1.2.840.10008.1.2.4.91. The support for JPIP was introduced by supplement 106 [27].
In this case, the image pixel data is replaced by the JPIP server URL. As a result, the viewer application can request the image directly from the JPIP server, without mediation from the PACS Archive. This behavior is supported by the transfer […]
Fig. 1. Single frame (left) format vs Tiled format (middle), and image pyramid example (right).
Source: T. Marques Godinho et al. / Journal of Biomedical Informatics 71 (2017) 190–197

Glass slide: 7.5 x 2.5 cm

Datasets (Fig. 1a):
Dataset               Years      Slides  Patients  Positive slides  External slides  ImageNet (relative size)
Prostate in house     2016       12,132  836       2,402            0                19.8×
Prostate external     2015–2017  12,727  6,323     12,413           12,727           29.0×
Skin                  2016–2017  9,962   5,325     1,659            3,710            21.4×
Axillary lymph nodes  2013–2018  9,894   2,703     2,521            1,224            18.2×
Total                            44,732  15,187                                      88.4×

Zoom levels (Fig. 1b): 63,744 px / 31.9 mm; 28,649 px / 14.3 mm; 3,000 px / 1.5 mm; 1,200 px / 600 µm; 300 px / 150 µm
Cancerous glands: 1% or less
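The size arithmetic on the WSI slide (59,023 x 36,364 px, 6.0 GB, 461 x 284 tiles) can be checked with a short sketch. It assumes uncompressed 8-bit RGB and reads "GB" as GiB (2^30 bytes). The 128-pixel stride is an assumption inferred from the tile counts, not something the slide states, and the function names are hypothetical.

```python
# Sketch: reproduce the WSI size and tile-count arithmetic from the slide.
# Assumptions (not stated on the slide): 8-bit RGB, "GB" meaning GiB,
# and a 128-pixel tile stride with partial edge tiles dropped.

WIDTH_PX = 59_023
HEIGHT_PX = 36_364
CHANNELS = 3       # RGB, one byte per channel (assumed)
STRIDE_PX = 128    # hypothetical stride that reproduces 461 x 284


def raw_size_gib(width: int, height: int, channels: int) -> float:
    """Uncompressed size in GiB at one byte per channel."""
    return width * height * channels / 2**30


def tile_grid(width: int, height: int, stride: int) -> tuple[int, int]:
    """Number of whole tiles along each axis (edge remainders dropped)."""
    return width // stride, height // stride


if __name__ == "__main__":
    cols, rows = tile_grid(WIDTH_PX, HEIGHT_PX, STRIDE_PX)
    print(f"raw size: {raw_size_gib(WIDTH_PX, HEIGHT_PX, CHANNELS):.1f} GiB")
    print(f"tiles: {cols} x {rows} = {cols * rows:,}")
```

In decimal units the same image is about 6.4 GB, so the 6.0 figure only matches under the GiB reading.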

Multiple instance learning
Bag with a positive label / bag with a negative label
In a bag with a positive label, at least one tile is a cancer tile; in a negative bag, no tile is cancer.

Datasets used
The data were not manipulated: every WSI was used as is (except 10 slides that were unusable because of excessive pen markings). No augmentation and no color normalization.
Fig. 1a lists four datasets (prostate in house, prostate external, skin, axillary lymph nodes) totaling 44,732 slides from 15,187 patients (ImageNet column: 88.4×).
[Fig. 1c,d panel labels: Clinically relevant dataset; Slide tiling; Classifier (CNN); Tile probability; Ranked tiles; Top tiles; Slide targets; Learning; Inference; Trained MIL model; MIL feature representation; RNN aggregation; Diagnosis; Tumor probability (0 to 1.0)]
Fig. 1 | Overview of the data and proposed deep learning framework presented in this study. a, Description of the datasets. This study is based on a total […]

Tiling
Extended Data Fig. 7 | Example of a slide tiled on a grid with no overlap at different magnifications. A slide represents a bag, and th[…] instances in that bag. In this work, instances at different magnifications are not part of the same bag. mpp, microns per pixel.
Tile size: 224 x 224. Tile overlap: none, 50%, 67%.
At test time, tiles overlap by 80% at every magnification. Tiles are generated separately for each magnification and combined in a multiscale ensemble.
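The MIL assumption and the tile overlaps on these slides can be illustrated with a minimal sketch. The function names are hypothetical and the tile probabilities are made-up numbers. Max-pooling over tiles is used here because it is the direct reading of "at least one cancer tile makes the bag positive". This is not the authors' code, and it leaves out the CNN and RNN entirely.

```python
# Sketch of the MIL assumption: a slide (bag) is positive if at least one
# tile (instance) is positive, so the slide score is the max tile score.
# All names and numbers are illustrative, not taken from the paper's code.


def stride_for_overlap(tile_px: int, overlap: float) -> int:
    """Step in pixels between neighbouring tiles for a fractional overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return max(1, round(tile_px * (1.0 - overlap)))


def rank_tiles(tile_probs: list[float], k: int) -> list[int]:
    """Indices of the k tiles with the highest tumor probability."""
    order = sorted(range(len(tile_probs)),
                   key=lambda i: tile_probs[i], reverse=True)
    return order[:k]


def slide_score(tile_probs: list[float]) -> float:
    """MIL max-pooling: the most suspicious tile decides the slide score."""
    return max(tile_probs)


def predict_slide(tile_probs: list[float], threshold: float = 0.5) -> int:
    """1 if the top tile reaches the threshold, else 0."""
    return int(slide_score(tile_probs) >= threshold)


if __name__ == "__main__":
    for overlap in (0.0, 0.5, 0.67, 0.8):
        print(f"overlap {overlap:.2f} -> stride {stride_for_overlap(224, overlap)} px")
    probs = [0.02, 0.10, 0.93, 0.40]   # made-up tile probabilities
    print("top tiles:", rank_tiles(probs, k=2))
    print("slide label:", predict_slide(probs))
```

A smaller stride at test time means each region is seen by more tiles, at the cost of roughly 25 times more tiles at 80% overlap than with no overlap.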

Dataset size and classification accuracy

[Fig. 2 panels: a, minimum balanced validation error vs. number of training WSIs (10^2 to 10^4); b, t-SNE1 vs. t-SNE2 embedding colored by tumor probability; c, example tiles labeled benign, malignant, and suspicious; scale labels 56 µm and 112 µm]
Fig. 2 | Dataset size impact and model introspection. a, Dataset size plays an important role in achieving clinical-grade MIL classification performance. Training of ResNet34 was performed with datasets of increasing size; for every reported training set size, five models were trained, and the validation errors are reported as box plots (n=5). This experiment underlies the fact that a large number of slides are necessary for generalization of learning under the MIL assumption. b,c, The prostate model has learned a rich feature representation of histopathology tiles. b, A ResNet34 model trained at 20× […]

Excerpt from the paper shown on the slide:
[…] data, but its performance was not better than that achieved by the single-scale model trained at 20×.
Pathology expert analysis of the MIL-RNN error modes. Pathologists specialized in each discipline analyzed the test set errors made by MIL-RNN models trained at 20× magnification (a selection of cases is presented in Fig. 4a–c). Several discrepancies (six in prostate, eight in BCC and 23 in axillary lymph nodes; see Fig. 4d) were found between the reported case diagnosis and the true slide class (that is, presence/absence of tumor). Because the ground truth is reliant on the diagnosis reported in the LIS, the observed discrepancies can be due to several factors: (1) under the current WSI scanning protocol, as only select slides are scanned in each case, there exists the possibility of a mismatch between the slide scanned and the reported LIS diagnosis linked to each case; (2) a deeper slide level with no carcinoma present could be selected for scanning; and (3) tissue was removed to create tissue microarrays before slide scanning. Encouragingly, the training […]
[…] In addition, two false positives were corrected to true positives. False negative to true negative corrections were due to the tissue of interest not being present on a deeper hematoxylin and eosin slide, or sampling error at the time the frozen section was prepared. False positive to true positive corrections were due to soft tissue metastatic deposits or tumor emboli. The AUC improved from 0.965 to 0.989 given these corrections. Of the 23 false negatives, eight were macro-metastasis, 13 were micro-metastasis and two were isolated tumor cells (ITCs). Notably, 12 cases (four false negatives and eight false positives) showed signs of treatment effect from neoadjuvant chemotherapy.
Investigation of technical variability introduced by slide preparation at multiple institutions and different scanners. Several sources of variability come into play in computational pathology. In addition to all of the morphological variability, technical variability is introduced during glass slide preparation and scanning. How this variability can affect the prediction of an assistive model is a question […]

[Fig. 3 panels a–c: ROC curves, sensitivity vs. specificity, with insets zoomed to 0.80–1.00]
Tumor type                              n      MIL AUC  MIL-RNN AUC  P
Prostate (a)                            1,784  0.986    0.991        0.00023
BCC (b)                                 1,575  0.986    0.988        0.1
Breast metastases, axillary LN (c)      1,473  0.965    0.966        0.9
Fig. 3 | Weakly supervised models achieve high performance across all tissue types. The performances of the models trained at 20× magnification on the respective test datasets were measured in terms of AUC for each tumor type. a, For prostate cancer (n=1,784) the MIL-RNN model significantly (P<0.001) outperformed the model trained with MIL alone, resulting in an AUC of 0.991. b,c, The BCC model (n=1,575) performed at 0.988 (b), while breast metastases detection (n=1,473) achieved an AUC of 0.966 (c). For these latter datasets, adding an RNN did not significantly improve performance. Statistical significance was assessed using DeLong's test for two correlated ROC curves.

Prostate cancer: AUC 0.986. Skin cancer: AUC 0.986. Breast cancer LN metastasis: AUC 0.965.
Accuracy stabilizes above roughly 1,000 WSIs. It depends on the cancer type, but about 10,000 WSIs are needed.

[Fig. 5 panels: a, multiple instance learning (trained on MSK dataset), AUC on the MSK in-house test set scanned on Aperio (n = 1,784), the MSK in-house test set scanned on Philips (n = 1,274), and the MSK external test set scanned on Aperio (n = 12,727); b, multiple instance learning (trained on MSK dataset) and fully supervised learning (trained on CAMELYON16 dataset), each evaluated on the MSK test set (n = 1,473) and the CAMELYON16 test set (n = 129). AUC axis 0.5 to 1.0; AUC drops annotated in the figure: –5.84%, –2.65%, –7.15%, –20.2%]
Fig. 5 | Weak supervision on large datasets leads to higher generalization performance than fully supervised learning on small curated datasets. The generalization performance of the proposed prostate and breast models were evaluated on different external test sets. a, Results of the prostate model trained with MIL on MSK in-house slides and tested on: (1) the in-house test set (n=1,784) digitized on Leica Aperio AT2 scanners; (2) the in-house test […]

Accuracy does not change much even on consultation cases.
Even a model built from annotated data generalizes poorly when the dataset is small.

Breakdown of misclassifications on the test sets

[Fig. 4 panels a–c: true positives, false negatives, false positives for each tumor type; scale bars 200 µm]
Fig. 4d:
                           Prostate         BCC              Axillary lymph nodes
                           FN      FP       FN      FP       FN      FP
Benign/negative            3       56       3       2        17      1
Atypical/other/suspicious  3       16       1       11       4       31
Carcinoma/positive         6       0        12      4        23      2
True error rate            6/345   72/1,439 12/255  13/1,320 23/403  32/1,070
(FN: false negative; FP: false positive)

Fig. 4 | Pathology analysis of the misclassification errors on the test sets. a–c, Randomly selected examples of classification results on the […] set. Examples of true positive, false negative and false positive classifications are shown for each tumor type. The MIL-RNN model trained a[…] magnification was run with a step size of 20 pixels across a region of interest, generating a tumor probability heat map. On every slide, the b[…] represents the enlarged area. For the prostate dataset (a), the true positive represents a difficult diagnosis due to tumor found next to atrop[…] inflammation; the false negative shows a very low tumor volume; and for the false positive the model identified atypical small acinar prolifer[ation …] a small focus of glands with atypical epithelial cells. For the BCC dataset (b), the true positive has a low tumor volume; the false negative ha[…] volume; and for the false positive the tongue of the epithelium abutting from the base of the epidermis shows an architecture similar to BCC[…] axillary lymph nodes dataset (c), the true positive shows ITCs with a neoadjuvant chemotherapy treatment effect; the false negative shows […] focus cluster of ITCs missed due to the very low tumor volume and blurring; and the false positive shows displaced epithelium/benign papi[…] in a lymph node. d, Subspecialty pathologists analyzed the slides that were misclassified by the MIL-RNN models. While slides can either b[…] negative for a specific tumor, sometimes it is not possible to diagnose a single slide with certainty based on morphology alone. These cases […]

False negatives are mostly very small cancers.
Atypical/suspicious cases are those that cannot be decided on morphology alone.
The breast cancer set also included cases that had received neoadjuvant chemotherapy (probably acceptable in real clinical practice).
Differences by magnification: accuracy was highest at 20× for prostate cancer and at 5× for skin. For prostate, 20× gave fewer FNs and 5× gave fewer FPs. Overall, the ensemble was more accurate than any single magnification.
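The "true error rate" fractions in Fig. 4d are easier to compare as percentages. The sketch below copies the counts from the table. Reading each denominator as the number of truly positive slides (for false negatives) or truly negative slides (for false positives) is an assumption, which the code checks against the test-set sizes quoted in Fig. 3. The names are hypothetical.

```python
# Sketch: convert the Fig. 4d "true error rate" fractions into percentages.
# Counts are copied from the Fig. 4d table. Treating the denominators as the
# numbers of truly positive (FN) and truly negative (FP) test slides is an
# assumption, checked below against the test-set sizes quoted in Fig. 3.

TRUE_ERRORS = {
    "Prostate": {"FN": (6, 345), "FP": (72, 1439)},
    "BCC": {"FN": (12, 255), "FP": (13, 1320)},
    "Axillary lymph nodes": {"FN": (23, 403), "FP": (32, 1070)},
}
TEST_SET_SIZE = {"Prostate": 1784, "BCC": 1575, "Axillary lymph nodes": 1473}


def error_rates(dataset: str) -> dict[str, float]:
    """False-negative and false-positive rates for one dataset."""
    fn, n_pos = TRUE_ERRORS[dataset]["FN"]
    fp, n_neg = TRUE_ERRORS[dataset]["FP"]
    return {"fn_rate": fn / n_pos, "fp_rate": fp / n_neg}


def denominators_match(dataset: str) -> bool:
    """True if the FN and FP denominators add up to the test-set size."""
    n_pos = TRUE_ERRORS[dataset]["FN"][1]
    n_neg = TRUE_ERRORS[dataset]["FP"][1]
    return n_pos + n_neg == TEST_SET_SIZE[dataset]


if __name__ == "__main__":
    for name in TRUE_ERRORS:
        r = error_rates(name)
        print(f"{name}: FN {r['fn_rate']:.2%}, FP {r['fp_rate']:.2%}, "
              f"denominators match n: {denominators_match(name)}")
```

The denominators do add up to 1,784, 1,575, and 1,473, which supports the assumed reading.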

Excerpt from the paper shown on the slide:
[…] in Fig. 6 (see Extended Data Fig. 6 for BCC and breast metastases) that our prostate model would allow the removal of more than 75% of the slides from the workload of a pathologist without any loss in sensitivity at the patient level. For pathologists who must operate in the increasingly complex, detailed and data-driven environment of cancer diagnostics, tools such as this will allow non-subspecialized pathologists to confidently and efficiently classify cancer with 100% sensitivity.
Received: 23 October 2018; Accepted: 3 June 2019

[Fig. 6 panels: a, cases ordered by tumor probability, split into predicted positive and predicted negative; b, sensitivity vs. % slides reviewed]
Fig. 6 | Impact of the proposed decision support system on clinical practice. a, By ordering the cases, and slides within each case, based on their tumor probability, pathologists can focus their attention on slides that are probably positive for cancer. b, Following the algorithm's prediction would allow pathologists to potentially ignore more than 75% of the slides while retaining 100% sensitivity for prostate cancer at the case level (n=1,784).

Each case contains a mix of slides with and without cancer. All a pathologist can do is go through them one by one, wondering whether there might be cancer.
Predicted negative: these can be skipped. Predicted positive: focus here.

Prostate cancer needle biopsy (n=1,784): 75% of cases are cancer-negative. Sensitivity 1 at a positive prediction threshold of 0.5. Sensitivity is computed by sorting in order of probability.
Skin cancer (n=1,575): threshold 0.025, 65%.
Breast cancer LN metastasis (n=1,473): threshold 0.25, 65%.

What matters in diagnosis is that no cancer is missed at the patient level -> 100% sensitivity is required.
Even with sensitivity raised to 100%, only about 30% of the slides need to be checked with full concentration. For 65–75% of the workload, the effort can be dialed down.
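The idea of sorting by probability and computing sensitivity can be sketched as follows. The probabilities and labels are made-up toy values, the function names are hypothetical, and the sketch works per slide, whereas the paper reports the result at the case level. It is an illustration of the triage idea, not the authors' evaluation code.

```python
# Sketch: sort slides by predicted tumor probability, review from the top,
# and find how many slides can be skipped while still catching every
# positive. Toy data and hypothetical names, per-slide only.


def sensitivity_curve(probs: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """(fraction of slides reviewed, sensitivity) after each reviewed slide."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive slide")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    curve = []
    found = 0
    for rank, i in enumerate(order, start=1):
        found += labels[i]
        curve.append((rank / len(probs), found / n_pos))
    return curve


def fraction_skippable(probs: list[float], labels: list[int]) -> float:
    """Share of slides left unreviewed once sensitivity first reaches 100%."""
    for reviewed, sensitivity in sensitivity_curve(probs, labels):
        if sensitivity == 1.0:
            return 1.0 - reviewed
    return 0.0


if __name__ == "__main__":
    probs = [0.97, 0.85, 0.40, 0.10, 0.05, 0.03, 0.02, 0.01]   # toy values
    labels = [1, 1, 1, 0, 0, 0, 0, 0]                          # 1 = cancer
    print(f"skippable at 100% sensitivity: {fraction_skippable(probs, labels):.1%}")
```

The lowest-scoring positive slide sets the operating threshold, so a single low-probability cancer slide sharply reduces the share that can be skipped.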

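Summary
§ Without adding extra labor, the method uses existing data and delivers clinical-level performance in cancer detection.
§ What is impressive about this study: the sheer volume of data (about 10,000 WSIs for a single cancer type).
§ So much data that the model even learns the biological diversity and technical artifacts present in raw data.
§ What "clinical-grade" aims for: it is not about competing with human performance (diagnosis time, detection accuracy, etc.), because a team of specialist pathologists is a body of knowledge with 100% sensitivity and specificity (a wall that cannot be surpassed). Humans reach a decision by combining morphology with other modalities such as the clinical picture and molecular-biological information. The goal is to miss no cancer at an acceptable false-positive rate (humans check afterwards).
§ Pathologists have to deal with diseases of every organ, which demands diverse, complex, detailed, and vast knowledge. The hope is that this approach provides an environment in which they can diagnose diseases outside their specialty with confidence and without misses.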

Tsuyama
