LPixelLT20190802R.pdf

Tsuyama
August 03, 2019

Transcript

  1. Clinical-grade computational pathology using weakly supervised
    deep learning on whole slide images
    https://www.nature.com/articles/s41591-019-0508-1
    (Published: 15 July 2019)
    Gabriele Campanella1,2, Matthew G. Hanna1, … & Thomas J. Fuchs1,2
    1Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
    2Weill Cornell Graduate School of Medical Sciences, New York, NY, USA.
    LPixel Inc. Presents
    Image Analysis x Machine Learning #6
    2nd Aug. 2019 @LPixel Inc.
    Presenter: Tu-chan
    ARTICLES
    https://doi.org/10.1038/s41591-019-0508-1
    Gabriele Campanella1,2, Matthew G. Hanna1, Luke Geneslaw1, Allen Miraflor1,
    Vitor Werneck Krauss Silva1, Klaus J. Busam1, Edi Brogi1, Victor E. Reuter1, David S. Klimstra1
    and Thomas J. Fuchs 1,2*
    The development of decision support systems for pathology and their deployment in clinical practice have been hindered by
    the need for large manually annotated datasets. To overcome this problem, we present a multiple instance learning-based deep
    learning system that uses only the reported diagnoses as labels for training, thereby avoiding expensive and time-consuming
    pixel-wise manual annotations. We evaluated this framework at scale on a dataset of 44,732 whole slide images from 15,187
    patients without any form of data curation. Tests on prostate cancer, basal cell carcinoma and breast cancer metastases to
    axillary lymph nodes resulted in areas under the curve above 0.98 for all cancer types. Its clinical application would allow
    pathologists to exclude 65–75% of slides while retaining 100% sensitivity. Our results show that this system has the ability
    to train accurate classification models at unprecedented scale, laying the foundation for the deployment of computational
    decision support systems in clinical practice.

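  2. Summary of the paper
    § The biggest obstacle to deploying machine-assisted pathology diagnosis in clinical practice is the shortage of training data supplied by pathologists. Annotating pathology slides is simply very laborious.
    § Using 44,732 whole-slide images (WSIs) from 15,187 patients, the authors built CNN and RNN models with a weakly supervised learning method (multiple instance learning: MIL) that uses only the reported diagnosis as the label.
    § Across different cancer types (prostate cancer, basal cell carcinoma of the skin, breast cancer lymph node metastasis) the models reached an AUC of about 0.98, which would eliminate missed diagnoses and cut the pathologists' workload by roughly 70%.
    ! Existing WSI data
    ! Existing diagnostic reports
    ! No annotation by pathologists
    Result: efficient diagnostic work
    Clinicians who tell patients the diagnosis (glamorous) / Pathologists swamped with diagnostic work (overworked)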

  3. Whole-slide image (WSI): 59,023 px x 36,364 px
    59,023 x 36,364 x 3 bytes = about 6.0 GiB uncompressed
    461 x 284 = 130,924 tiles
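    The arithmetic on this slide can be checked with a short sketch. The image size and the three color channels come from the slide. The tile edge of 128 px is an assumption: it is not stated here, but it is the value that reproduces the 461 x 284 grid when partial tiles at the edges are dropped. The function name `wsi_footprint` is made up for illustration.

```python
# Values taken from the slide; TILE_PX is an assumption (see the lead-in).
WIDTH_PX = 59_023
HEIGHT_PX = 36_364
CHANNELS = 3          # RGB, one byte per channel
TILE_PX = 128         # assumed tile edge that reproduces the 461 x 284 grid


def wsi_footprint(width, height, channels=CHANNELS, tile=TILE_PX):
    """Return (raw bytes, size in GiB, tile columns, tile rows, tile count)."""
    raw_bytes = width * height * channels
    cols = width // tile    # partial tiles at the right edge are dropped
    rows = height // tile   # partial tiles at the bottom edge are dropped
    return raw_bytes, raw_bytes / 2**30, cols, rows, cols * rows


raw_bytes, gib, cols, rows, n_tiles = wsi_footprint(WIDTH_PX, HEIGHT_PX)
print(f"{raw_bytes:,} bytes = {gib:.1f} GiB; {cols} x {rows} = {n_tiles:,} tiles")
```

    The "6.0" on the slide matches binary gigabytes (GiB); in decimal units the same image is about 6.4 GB.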
    [Cropped background text on WSI storage: tiled organization, image pyramids with the full-resolution image at the base and a thumbnail at the top, and DICOM Supplement 145. The left edge of every line is cut off in the slide.]
    However, SOAP is considered a heavyweight technology for
    web-services, because the verbose messages make them difficult
    to include in lightweight applications, such as websites. Therefore,
    the community demanded the inclusion of simpler services based
    on the Restful technology. Moreover, web services for searching
    and storage were also requested. So in 2014, the standard intro-
    duced three REST web services: STOW-RS,1 QIDO-RS,2 and
    WADO-RS.3
    Besides the traditional DICOM Services and web-services,
    DICOM also contemplates storage and streaming of JPEG2000
    images. Currently, JPEG2000 is the most advanced format for gen-
    eral purpose images. The standard defines a lossless compression
    algorithm, as well as a protocol for ‘‘interacting with JPEG2000
    based images in an efficient and effective manner” [23]. In fact,
    the JPEG2000 standard has a special focus towards image interac-
    tivity, supporting interesting features such as resolution scalability,
    progressive refinement and spatial randomness [23,24]. The JPIP
    protocol allows viewer application to interact with JPEG2000
    images over networks. By using JPIP, viewer applications do not
    have to download the whole image, and can instead request from
    the JPIP server a particular region that best fits their visualization
    purposes [23,25]. This image streaming strategy allows the storage
    and consistency of image data to be optimized, as well as reducing
    the required bandwidth to support visualization of remote images
    [25].
    The DICOM Standard supports both JPEG2000 images and the
    JPIP protocol. Supplement 61 [26] introduced the JPEG2000 format
    into regular DICOM images. The JPEG2000 pixel data is stored in
    the DICOM image pixel data attribute, just like any other compres-
    sion format such as JPEG Baseline or JPEG-LS. As such, this supple-
    ment introduced two new transfer syntaxes: 1.2.840.10008.1.2.4.90,
    and 1.2.840.10008.1.2.4.91. The support for JPIP was introduced by
    supplement 106 [27]. In this case, the image pixel data is replaced
    by the JPIP server URL. As a result, the viewer application can
    request the image directly from the JPIP server, without mediation
    from the PACS Archive. This behavior is supported by the transfer
    Fig. 1. Single frame (left) format vs Tiled format (middle), and image pyramid example (right).
    T. Marques Godinho et al. / Journal of Biomedical Informatics 71 (2017) 190–197
    Glass slide: 7.5 x 2.5 cm
    Fig. 1a of the Nature Medicine article (datasets):
    Dataset                Years      Slides  Patients  Positive slides  External slides  Size relative to ImageNet
    Prostate in house      2016       12,132  836       2,402            0                19.8×
    Prostate external      2015–2017  12,727  6,323     12,413           12,727           29.0×
    Skin                   2016–2017  9,962   5,325     1,659            3,710            21.4×
    Axillary lymph nodes   2013–2018  9,894   2,703     2,521            1,224            18.2×
    Total                             44,732  15,187                                      88.4×
    Fig. 1b scale labels: 63,744 px / 31.9 mm; 28,649 px / 14.3 mm; 3,000 px / 1.5 mm; 1,200 px / 600 µm; 300 px / 150 µm
    Cancerous glands: 1% or less

  4. Multiple instance learning
    Bag with a positive label
    Bag with a negative label
    In a bag with a positive label, at least one tile is a cancer tile;
    in a negative bag, none of the tiles is cancer.
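    The MIL assumption above can be written down as a small sketch. A slide is a bag of tile probabilities, the slide-level score is the maximum over its tiles, and the top-ranked tiles inherit the slide label as training targets, as the "Ranked tiles / Top tiles / Slide targets" diagram suggests. The function names and the default `k=1` are assumptions made for illustration, not code from the paper.

```python
def slide_probability(tile_probs):
    """Max pooling: a slide is as suspicious as its most suspicious tile."""
    return max(tile_probs)


def top_tiles(tile_probs, k=1):
    """Indices of the k tiles with the highest tumor probability."""
    order = sorted(range(len(tile_probs)),
                   key=lambda i: tile_probs[i], reverse=True)
    return order[:k]


def training_pairs(bags, k=1):
    """Pair the top-k tiles of every slide with that slide's label.

    bags: list of (tile_probs, slide_label) with slide_label in {0, 1}.
    Returns a list of (slide_index, tile_index, target).
    """
    pairs = []
    for slide_idx, (tile_probs, label) in enumerate(bags):
        for tile_idx in top_tiles(tile_probs, k):
            pairs.append((slide_idx, tile_idx, label))
    return pairs


# Toy example: one positive slide with three tiles, one negative with two.
bags = [([0.1, 0.9, 0.3], 1), ([0.2, 0.05], 0)]
print([slide_probability(p) for p, _ in bags])
print(training_pairs(bags))
```

    In a negative bag the top-ranked tile is the model's most confident false alarm, so training on it with target 0 pushes that score down.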
    [Fig. 1 of the Nature Medicine article, pipeline diagram: slide tiling of the clinically relevant dataset; a CNN classifier assigns each tile a tumor probability; tiles are ranked, and the top tiles with the slide targets are used for learning; at inference, the MIL feature representation of the ranked tiles is passed to an RNN that aggregates it into the slide diagnosis.]
    All WSIs were used as-is with no data curation (except 10 slides that had too many pen markings to be usable).
    No augmentation and no color normalization.
    Datasets used
    Fig. 1 | Overview of the data and proposed deep learning framework presented in this study. a, Description of the datasets. This study is based on a total
    Extended Data Fig. 7 | Example of a slide tiled on a grid with no overlap at different magnifications. A slide represents a bag, and th
    instances in that bag. In this work, instances at different magnifications are not part of the same bag. mpp, microns per pixel.
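    No tile overlap
    50% overlap
    67% overlap
    Tile size: 224 x 224
    At test time, tiles were overlapped by 80% at every magnification.
    Tiles were generated separately for each magnification
    and combined in a multiscale ensemble.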
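    How an overlap percentage turns into tile positions can be sketched as follows. The tile size of 224 and the overlap values come from the slide. Rounding the stride to whole pixels and dropping partial tiles at the slide edge are assumptions made for illustration, and the function names are hypothetical.

```python
def tile_origins(length, tile=224, overlap=0.0):
    """Start offsets of tiles along one axis for a given overlap fraction."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, round(tile * (1.0 - overlap)))  # assumed rounding rule
    # Only tiles that fit entirely inside the image are kept.
    return list(range(0, length - tile + 1, stride))


def tile_grid(width, height, tile=224, overlap=0.0):
    """Top-left (x, y) corners of every tile on a regular grid."""
    xs = tile_origins(width, tile, overlap)
    ys = tile_origins(height, tile, overlap)
    return [(x, y) for y in ys for x in xs]


# A 448 x 448 px region tiled with the overlaps listed on the slide.
for overlap in (0.0, 0.5, 0.67, 0.8):
    print(overlap, len(tile_grid(448, 448, overlap=overlap)))
```

    Higher overlap multiplies the number of tiles per slide, which is presumably why the 80% setting is reserved for test time.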

  5. [Fig. 2 of the Nature Medicine article. Panel a: minimum balanced validation error (0 to 0.5) versus number of training WSIs (10^2 to 10^4). Panels b, c: t-SNE plot (t-SNE1 versus t-SNE2) colored by tumor probability, with counts, and example tiles labeled benign, suspicious and malignant (scale bars 56 µm and 112 µm).]
    Fig. 2 | Dataset size impact and model introspection. a, Dataset size plays an important role in achieving clinical-grade MIL classification performance.
    Training of ResNet34 was performed with datasets of increasing size; for every reported training set size, five models were trained, and the validation
    errors are reported as box plots (n=5). This experiment underlies the fact that a large number of slides are necessary for generalization of learning
    under the MIL assumption. b,c, The prostate model has learned a rich feature representation of histopathology tiles. b, A ResNet34 model trained at 20×
    data, but its performance was not better than that achieved by the
    single-scale model trained at 20×.
    Pathology expert analysis of the MIL-RNN error modes.
    Pathologists specialized in each discipline analyzed the test set
    errors made by MIL-RNN models trained at 20× magnification
    (a selection of cases is presented in Fig. 4a–c). Several discrepan-
    cies (six in prostate, eight in BCC and 23 in axillary lymph nodes;
    see Fig. 4d) were found between the reported case diagnosis
    and the true slide class (that is, presence/absence of tumor).
    Because the ground truth is reliant on the diagnosis reported in
    the LIS, the observed discrepancies can be due to several factors:
    (1) under the current WSI scanning protocol, as only select slides
    are scanned in each case, there exists the possibility of a mismatch
    between the slide scanned and the reported LIS diagnosis linked to
    each case; (2) a deeper slide level with no carcinoma present could
    be selected for scanning; and (3) tissue was removed to create tis-
    sue microarrays before slide scanning. Encouragingly, the training
    In addition, two false positives were corrected to true positives.
    False negative to true negative corrections were due to the tissue of
    interest not being present on a deeper hematoxylin and eosin slide,
    or sampling error at the time the frozen section was prepared. False
    positive to true positive corrections were due to soft tissue meta-
    static deposits or tumor emboli. The AUC improved from 0.965 to
    0.989 given these corrections. Of the 23 false negatives, eight were
    macro-metastasis, 13 were micro-metastasis and two were isolated
    tumor cells (ITCs). Notably, 12 cases (four false negatives and eight
    false positives) showed signs of treatment effect from neoadjuvant
    chemotherapy.
    Investigation of technical variability introduced by slide prepa-
    ration at multiple institutions and different scanners. Several
    sources of variability come into play in computational pathology. In
    addition to all of the morphological variability, technical variability
    is introduced during glass slide preparation and scanning. How this
    variability can affect the prediction of an assistive model is a ques-
    [Fig. 3 ROC curves (sensitivity versus specificity, with insets zoomed to the 0.80 to 1.00 range), MIL versus MIL-RNN. a: AUC 0.986 versus 0.991 (P = 0.00023). b: AUC 0.986 versus 0.988 (P = 0.1). c: AUC 0.965 versus 0.966 (P = 0.9).]
    Fig. 3 | Weakly supervised models achieve high performance across all tissue types. The performances of the models trained at 20× magnification on
    the respective test datasets were measured in terms of AUC for each tumor type. a, For prostate cancer (n=1,784) the MIL-RNN model significantly
    (P<0.001) outperformed the model trained with MIL alone, resulting in an AUC of 0.991. b,c, The BCC model (n=1,575) performed at 0.988 (b), while
    breast metastases detection (n=1,473) achieved an AUC of 0.966 (c). For these latter datasets, adding an RNN did not significantly improve performance.
    Statistical significance was assessed using DeLong’s test for two correlated ROC curves.
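    MIL without the RNN: prostate cancer AUC 0.986; skin cancer AUC 0.986; breast cancer LN metastasis AUC 0.965
    Accuracy stabilizes once there are 1,000 or more WSIs.
    It depends on the cancer type, but roughly 10,000 WSIs are needed.
    Dataset size and classification accuracy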
    [Fig. 5 bar charts of AUC (axis 0.5 to 1.0). a: multiple instance learning (trained on the MSK dataset), evaluated on the MSK in-house test set scanned on Aperio (n = 1,784), the MSK in-house test set scanned on Philips (n = 1,274) and the MSK external test set scanned on Aperio (n = 12,727). b: MIL tested on the MSK test set (n = 1,473) and the CAMELYON16 test set (n = 129), and fully supervised learning (trained on the CAMELYON16 dataset) tested on the CAMELYON16 test set (n = 129) and the MSK test set (n = 1,473). AUC drops marked in the figure: –5.84%, –2.65%, –7.15%, –20.2%.]
    Fig. 5 | Weak supervision on large datasets leads to higher generalization performance than fully supervised learning on small curated datasets. The
    generalization performance of the proposed prostate and breast models were evaluated on different external test sets. a, Results of the prostate model
    trained with MIL on MSK in-house slides and tested on: (1) the in-house test set (n=1,784) digitized on Leica Aperio AT2 scanners; (2) the in-house test
    Accuracy does not change much even on consultation cases.
    Even a model built from annotated data generalizes poorly
    when the dataset is small.
    [Fig. 4a–c: example true positives, false negatives and false positives for each tumor type; scale bars 200 µm.]
    Fig. 4d:
                                Prostate             BCC                  Axillary lymph nodes
                                False     False      False     False      False     False
                                negative  positive   negative  positive   negative  positive
    Benign/negative             3         56         3         2          17        1
    Atypical/other/suspicious   3         16         1         11         4         31
    Carcinoma/positive          6         0          12        4          23        2
    True error rate             6/345     72/1,439   12/255    13/1,320   23/403    32/1,070
    Fig. 4 | Pathology analysis of the misclassification errors on the test sets. a–c, Randomly selected examples of classification results on the
    set. Examples of true positive, false negative and false positive classifications are shown for each tumor type. The MIL-RNN model trained a
    magnification was run with a step size of 20 pixels across a region of interest, generating a tumor probability heat map. On every slide, the b
    represents the enlarged area. For the prostate dataset (a), the true positive represents a difficult diagnosis due to tumor found next to atrop
    inflammation; the false negative shows a very low tumor volume; and for the false positive the model identified atypical small acinar prolifer
    a small focus of glands with atypical epithelial cells. For the BCC dataset (b), the true positive has a low tumor volume; the false negative ha
    volume; and for the false positive the tongue of the epithelium abutting from the base of the epidermis shows an architecture similar to BCC
    axillary lymph nodes dataset (c), the true positive shows ITCs with a neoadjuvant chemotherapy treatment effect; the false negative shows
    focus cluster of ITCs missed due to the very low tumor volume and blurring; and the false positive shows displaced epithelium/benign papi
    in a lymph node. d, Subspecialty pathologists analyzed the slides that were misclassified by the MIL-RNN models. While slides can either b
    negative for a specific tumor, sometimes it is not possible to diagnose a single slide with certainty based on morphology alone. These cases
    Breakdown of the misclassifications on the test sets
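    Most false negatives are very small cancers.
    Atypical/suspicious cases are those that cannot be decided from morphology alone.
    The breast cancer set also included cases that had received neoadjuvant chemotherapy
    (probably within the acceptable range in real clinical practice).
    Effect of magnification: 20x gave the best accuracy for prostate cancer, and 5x for skin.
    For prostate, 20x produced fewer false negatives and 5x fewer false positives. Overall, the ensemble was more accurate than any single magnification.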

  6. in Fig. 6 (see Extended Data Fig. 6 for BCC and breast metastases)
    that our prostate model would allow the removal of more than 75%
    of the slides from the workload of a pathologist without any loss in
    sensitivity at the patient level. For pathologists who must operate in
    the increasingly complex, detailed and data-driven environment of
    cancer diagnostics, tools such as this will allow non-subspecialized
    pathologists to confidently and efficiently classify cancer with 100%
    sensitivity.
    Online content
    Any methods, additional references, Nature Research reporting
    summaries, source data, statements of code and data availability and
    associated accession codes are available at https://doi.org/10.1038/
    s41591-019-0508-1.
    Received: 23 October 2018; Accepted: 3 June 2019;
    Published: xx xx xxxx
    [Fig. 6 panels. a: cases ordered by tumor probability (0 to 1.00), split into predicted positive and predicted negative. b: sensitivity (0 to 1.00) versus % slides reviewed (0 to 100).]
    Fig. 6 | Impact of the proposed decision support system on clinical practice. a, By ordering the cases, and slides within each case, based on their tumor
    probability, pathologists can focus their attention on slides that are probably positive for cancer. b, Following the algorithm’s prediction would allow
    pathologists to potentially ignore more than 75% of the slides while retaining 100% sensitivity for prostate cancer at the case level (n=1,784).
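    Each case contains a mix of slides with cancer
    and slides without.
    Pathologists have no choice but to go through them in order,
    wondering whether cancer is there.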
    These can be skipped.
    Focus on the positive slides.
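    Skin cancer (n=1,575): threshold 0.025, 65%
    Breast cancer LN metastasis (n=1,473): threshold 0.25, 65%
    What matters in diagnosis is that no cancer is missed at the patient level
    -> 100% sensitivity is required.
    Even with sensitivity raised to 100%, only about 30% of the slides need careful checking.
    For 65-75% of the workload, the effort can be reduced.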
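    Prostate cancer needle biopsy (n=1,784)
    75% of the cases are cancer-negative
    Sensitivity 1
    positive prediction threshold 0.5
    Slides are sorted by probability, and sensitivity is computed going down the list.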
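    The sort-then-measure procedure can be sketched as below: rank the slides by predicted tumor probability, walk down the list, and record the sensitivity reached after each slide. The first point where sensitivity hits 1.0 gives the fraction of slides that must be reviewed; everything after it could be skipped. The function names and toy data are illustrative assumptions, and the sketch works per slide, whereas the paper reports the result at the case level.

```python
def review_curve(probs, labels):
    """(fraction of slides reviewed, sensitivity) after each ranked slide."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive slide")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    found = 0
    curve = []
    for rank, i in enumerate(order, start=1):
        found += labels[i]  # labels are 0 (benign) or 1 (cancer)
        curve.append((rank / len(probs), found / total_pos))
    return curve


def fraction_for_full_sensitivity(probs, labels):
    """Smallest fraction of ranked slides that contains every positive."""
    for fraction, sensitivity in review_curve(probs, labels):
        if sensitivity == 1.0:
            return fraction


# Toy data: 8 slides, 2 of them positive and both ranked at the top.
probs = [0.9, 0.8, 0.1, 0.3, 0.05, 0.6, 0.02, 0.01]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
needed = fraction_for_full_sensitivity(probs, labels)
print(f"review {needed:.0%}, skip {1 - needed:.0%}")
```

    A single low-scoring positive slide pushes the required fraction up sharply, so the figure depends on how well the worst-ranked positive is scored.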

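  7. Summary
    § Without adding extra effort, and using only existing data, the approach reaches clinical-grade performance in cancer detection.
    § What is impressive about this study: the sheer amount of data (10,000 WSIs for a single cancer type),
    § enough for the model to learn even the biological variability and technical artifacts present in raw data.
    § What "clinical-grade" aims for:
    It is not about competing with human performance (diagnosis time, detection accuracy, etc.),
    because a team of specialist pathologists is a body of collective knowledge with 100% sensitivity and specificity (a wall that cannot be surpassed).
    Humans decide by combining morphology with other modalities such as the clinical picture and molecular biology.
    The goal is to miss no cancer at an acceptable false positive rate (humans check afterwards).
    § Pathologists must deal with diseases of every organ, which demands diverse, complex, detailed and vast knowledge.
    The hope is that such a system provides an environment where they can diagnose diseases outside their specialty with confidence and without misses.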
