
Никитин.pdf

opentalks2
February 04, 2021

Transcript

  1. About us
    Celsus analyzes medical images in order to:
    1) automate part of the radiology workflow to support faster decisions
    2) reduce the risk of errors
    Our team:
    • Backend/frontend
    • Sales & Marketing
    • Data team
    • ML team (11 ML engineers)
  2. Example - Mammography datasets
    Dataset 1
    • Classes: mass (malignant & benign), calcifications (malignant & benign)
    • 1878 patients from 1 medical organization
    • 1 annotator per image, non-verified
    Dataset 2
    • Classes: malignant mass, benign mass, malignant calcification, benign calcification, skin thickening
    • 200 patients from 1 medical organization
    • 3 annotators per image, verified
    Dataset 3
    • Patient-level annotations (benign/malignant)
    • 5720 patients from ~30 medical organizations
    • Unknown annotators
  6. Terminology
    • X - input image or a series of images
    • y - target variable
      ◦ classification - normal/pathological
      ◦ sorting - risk score
      ◦ detection - bounding boxes
      ◦ segmentation - per-pixel masks
    (one way to represent these targets is sketched below)
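A minimal sketch (not from the deck) of how the four kinds of target y could be represented, assuming a PyTorch-style setup; the shapes and values are illustrative only.

```python
import torch

# A hypothetical single image X of size H x W; all values below are illustrative.
H, W = 512, 512
X = torch.zeros(H, W)

# classification - normal/pathological
y_classification = torch.tensor(1)                          # 0 = normal, 1 = pathological
# sorting - risk score
y_sorting = torch.tensor(0.83)                               # scalar risk score in [0, 1]
# detection - bounding boxes
y_detection = torch.tensor([[120., 200., 180., 260., 2.]])   # one [x1, y1, x2, y2, class_id] row per box
# segmentation - per-pixel masks
y_segmentation = torch.zeros(H, W, dtype=torch.long)         # class index per pixel
```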
  7. Problem sources
    1) X
      a) Different scanners
      b) Different image acquisition protocols
      c) Human errors
    2) Noisy y
      a) Inter-observer annotation variability
      b) Intra-observer annotation variability
      c) Unreliable data sources
    3) Different label sets of y
      a) Different classes in different datasets
  8. Problem sources #2
    4) Weak y
      a) Image-, volume-, patient-level annotations are cheaper and easier to acquire
    5) Scarce y
      a) Annotation of medical images is expensive
      b) Some conditions are naturally rare
  9. Types of solutions
    • Prevention
    • Data-centric solutions
    • Architectural modifications
    • Training process modifications
    • ...
  10. Xs

  11. Our solution
    Universal image preprocessor:
    • Artifact removal
    • Contrast/brightness normalization
    • Correct color inversion & flip
    • Crop to the region of interest
    • Unifying digitized film scans and digital scans
    • …
    (a minimal preprocessing sketch follows below)
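A minimal sketch (not the team's actual preprocessor) of what such a pipeline might look like for a single grayscale mammogram stored as a NumPy array; the percentile window, inversion heuristic, and ROI threshold are illustrative assumptions.

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Hypothetical universal preprocessor for one grayscale mammogram."""
    img = img.astype(np.float32)

    # Contrast/brightness normalization: robust rescaling to [0, 1].
    lo, hi = np.percentile(img, [1, 99])
    img = np.clip((img - lo) / max(hi - lo, 1e-6), 0.0, 1.0)

    # Correct color inversion: some digitized films are stored inverted; assume
    # the (mostly empty) background should be dark, so invert if the mean is high.
    if img.mean() > 0.5:
        img = 1.0 - img

    # Crop to the region of interest: bounding box of bright (tissue) pixels.
    mask = img > 0.05
    rows, cols = np.any(mask, axis=1), np.any(mask, axis=0)
    if rows.any() and cols.any():
        r0, r1 = np.where(rows)[0][[0, -1]]
        c0, c1 = np.where(cols)[0][[0, -1]]
        img = img[r0:r1 + 1, c0:c1 + 1]

    return img
```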
  12. Ambiguous “background” category
    Dataset 1 bounding boxes:
    • Malignant mass
    • Benign mass
    • Malignant calcification
    • Benign calcification
    • Skin thickening
    Dataset 2 bounding boxes:
    • Malignant mass
    • Benign mass
    If we train on both naively, we will penalize the network for proposing non-annotated regions in Dataset 2 that actually contain objects! (one possible mitigation is sketched below)
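One common mitigation (not stated in the deck) is to mask the loss for classes that a given source dataset never annotates, so a missing box is not treated as a confirmed negative. Below is a minimal image-level sketch in PyTorch; the class names come from the slide, while the function and dataset keys are hypothetical.

```python
import torch
import torch.nn.functional as F

# Shared class order across datasets (from the slide).
CLASSES = ["malignant_mass", "benign_mass",
           "malignant_calcification", "benign_calcification", "skin_thickening"]

# Which classes each source dataset actually annotates (hypothetical keys).
ANNOTATED = {
    "dataset1": {"malignant_mass", "benign_mass",
                 "malignant_calcification", "benign_calcification", "skin_thickening"},
    "dataset2": {"malignant_mass", "benign_mass"},
}

def masked_bce(logits: torch.Tensor, targets: torch.Tensor, source: str) -> torch.Tensor:
    """Image-level multi-label loss that ignores classes the source dataset never
    labels, so the absence of an annotation is not penalized as background."""
    mask = torch.tensor([c in ANNOTATED[source] for c in CLASSES],
                        dtype=torch.float32, device=logits.device)
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```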
  13. Reducing annotator variability
    • Prevention
      ◦ Creating and updating labeling guides
      ◦ Exams for annotators
    • Post factum
      ◦ Automatic resolution of minor conflicts
      ◦ Additional expert opinion on major conflicts
    (a sketch of automatic conflict resolution follows below)
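A minimal sketch (not the team's actual pipeline) of how automatic resolution of minor conflicts could work for bounding boxes from several annotators, assuming a simple IoU-based agreement rule; thresholds and function names are illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def resolve_boxes(annotators, iou_thr=0.5, min_agree=2):
    """annotators: one list of [x1, y1, x2, y2] boxes per annotator.
    Boxes on which at least `min_agree` annotators agree (IoU >= iou_thr) are
    averaged into a consensus box; the rest are escalated for an expert opinion."""
    flat = [(a, np.asarray(box, dtype=float))
            for a, boxes in enumerate(annotators) for box in boxes]
    used, consensus, disputed = set(), [], []
    for i, (ann_i, box_i) in enumerate(flat):
        if i in used:
            continue
        group = [(i, ann_i, box_i)]
        for j, (ann_j, box_j) in enumerate(flat):
            if j == i or j in used or ann_j == ann_i:
                continue
            if iou(box_i, box_j) >= iou_thr:
                group.append((j, ann_j, box_j))
        if len({a for _, a, _ in group}) >= min_agree:
            consensus.append(np.mean([b for _, _, b in group], axis=0))
            used.update(idx for idx, _, _ in group)
        else:
            disputed.append(box_i)
            used.add(i)
    return consensus, disputed
```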
  14. Dealing with noisy annotations during training
    • Dropping or re-weighting noisy examples
      ◦ Usually doesn’t work well in the medical domain
    • Label smoothing
      ◦ Soft labels
      ◦ Smoothing for mask boundaries
    • Online noise correction
      ◦ Bounding box correction
      ◦ Mask refinement
    (a label-smoothing sketch follows below)
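A minimal sketch of label smoothing with soft labels for a classification head; eps is an illustrative value, and recent PyTorch versions also expose a label_smoothing argument directly on F.cross_entropy. For segmentation, the same idea can be applied by softening mask boundaries before computing the loss.

```python
import torch
import torch.nn.functional as F

def smoothed_ce(logits: torch.Tensor, target: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Cross-entropy with label smoothing: each hard label keeps 1 - eps of the
    probability mass and spreads eps uniformly over all classes, so a single
    noisy annotation cannot push the network toward full confidence."""
    n = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    soft = torch.full_like(log_p, eps / n)
    soft.scatter_(-1, target.unsqueeze(-1), 1.0 - eps + eps / n)
    return -(soft * log_p).sum(dim=-1).mean()
```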
  15. Training meta-models
    1. Train a detector/segmentation model on the existing small annotated datasets
    2. Generate features for the large weakly labeled dataset
    3. Train a simple meta-model to predict the final target
    (a sketch of steps 2-3 follows below)
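A minimal sketch (not from the deck) of steps 2-3, assuming the detector from step 1 is already trained and its per-patient detections are available as dictionaries with a "score" field; the feature set and the logistic-regression meta-model are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def detector_features(detections):
    """Step 2 (hypothetical): summarize a trained detector's boxes for one patient
    into a fixed-length feature vector: max score, box count, mean score."""
    scores = [d["score"] for d in detections]
    return np.array([max(scores, default=0.0),
                     float(len(scores)),
                     float(np.mean(scores)) if scores else 0.0])

def fit_meta_model(per_patient_detections, patient_labels):
    """Step 3: a simple meta-model mapping detector-derived features to the weak,
    patient-level target (e.g. 0 = benign, 1 = malignant)."""
    X_meta = np.stack([detector_features(d) for d in per_patient_detections])
    y_meta = np.asarray(patient_labels)
    return LogisticRegression().fit(X_meta, y_meta)
```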
  16. Gold standard datasets
    It is absolutely necessary to create and update gold standard datasets:
    • Consensus labeling by at least 3 annotators
    • Preferably clinically verified
    • They must include all conditions that your system is supposed to detect
    But remember: every time you evaluate an experiment on this dataset, you are one step closer to overfitting to it.
  17. Main takeaways
    • You need to stay strong if you want to work with medical data
    • It is both challenging and rewarding
    • Read the literature, but stay creative - every problem is unique
    • Invest time and money into creating a gold standard dataset for pre-release testing of all your crazy ideas