
Никитин.pdf

opentalks2
February 04, 2021

Transcript

  1. About us
    Celsus analyzes medical images in order to:
    1) automate part of the radiology workflow to support faster decisions
    2) reduce the risk of errors
    Our team:
    • Backend/frontend
    • Sales & Marketing
    • Data team
    • ML team (11 ML engineers)
  2. Example - Mammography datasets
    Dataset 1
    • Classes: mass (malignant & benign), calcifications (malignant & benign)
    • 1878 patients from 1 medical organization
    • 1 annotator per image, non-verified
    Dataset 2
    • Classes: malignant mass, benign mass, malignant calcification, benign calcification, skin thickening
    • 200 patients from 1 medical organization
    • 3 annotators per image, verified
    Dataset 3
    • Patient-level annotations (benign/malignant)
    • 5720 patients from ~30 medical organizations
    • Unknown annotators
  6. Terminology
    • X - input image or a series of images
    • y - target variable
      ◦ classification - normal/pathological
      ◦ sorting - risk score
      ◦ detection - bounding boxes
      ◦ segmentation - per-pixel masks
    (one way to represent these targets is sketched below)
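A minimal sketch (not from the deck) of how the four kinds of target y could be represented, assuming a PyTorch-style setup; the shapes and values are illustrative only.

```python
import torch

# A hypothetical single image X of size H x W; all values below are illustrative.
H, W = 512, 512
X = torch.zeros(H, W)

# classification - normal/pathological
y_classification = torch.tensor(1)                          # 0 = normal, 1 = pathological
# sorting - risk score
y_sorting = torch.tensor(0.83)                               # scalar risk score in [0, 1]
# detection - bounding boxes
y_detection = torch.tensor([[120., 200., 180., 260., 2.]])   # one [x1, y1, x2, y2, class_id] row per box
# segmentation - per-pixel masks
y_segmentation = torch.zeros(H, W, dtype=torch.long)         # class index per pixel
```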
  7. Problem sources
    1) X
      a) Different scanners
      b) Different image acquisition protocols
      c) Human errors
    2) Noisy y
      a) Inter-observer annotation variability
      b) Intra-observer annotation variability
      c) Unreliable data sources
    3) Different label sets of y
      a) Different classes in different datasets
  8. Problem sources #2
    4) Weak y
      a) Image-, volume-, patient-level annotations are cheaper and easier to acquire
    5) Scarce y
      a) Annotation of medical images is expensive
      b) Some conditions are naturally rare
  9. Types of solutions
    • Prevention
    • Data-centric solutions
    • Architectural modifications
    • Training process modifications
    • ...
  10. Xs

  11. Our solution
    Universal image preprocessor:
    • Artifact removal
    • Contrast/brightness normalization
    • Correct color inversion & flip
    • Crop to the region of interest
    • Unifying digitized film scans and digital scans
    • …
    (a minimal preprocessing sketch follows below)
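A minimal sketch (not the team's actual preprocessor) of what such a pipeline might look like for a single grayscale mammogram stored as a NumPy array; the percentile window, inversion heuristic, and ROI threshold are illustrative assumptions.

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Hypothetical universal preprocessor for one grayscale mammogram."""
    img = img.astype(np.float32)

    # Contrast/brightness normalization: robust rescaling to [0, 1].
    lo, hi = np.percentile(img, [1, 99])
    img = np.clip((img - lo) / max(hi - lo, 1e-6), 0.0, 1.0)

    # Correct color inversion: some digitized films are stored inverted; assume
    # the (mostly empty) background should be dark, so invert if the mean is high.
    if img.mean() > 0.5:
        img = 1.0 - img

    # Crop to the region of interest: bounding box of bright (tissue) pixels.
    mask = img > 0.05
    rows, cols = np.any(mask, axis=1), np.any(mask, axis=0)
    if rows.any() and cols.any():
        r0, r1 = np.where(rows)[0][[0, -1]]
        c0, c1 = np.where(cols)[0][[0, -1]]
        img = img[r0:r1 + 1, c0:c1 + 1]

    return img
```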
  12. Ambiguous “background” category
    Dataset 1 bounding boxes:
    • Malignant mass
    • Benign mass
    • Malignant calcification
    • Benign calcification
    • Skin thickening
    Dataset 2 bounding boxes:
    • Malignant mass
    • Benign mass
    If we train on both naively, we will penalize the network for proposing non-annotated regions in Dataset 2 that actually contain objects! (one possible mitigation is sketched below)
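One common mitigation (not stated in the deck) is to mask the loss for classes that a given source dataset never annotates, so a missing box is not treated as a confirmed negative. Below is a minimal image-level sketch in PyTorch; the class names come from the slide, while the function and dataset keys are hypothetical.

```python
import torch
import torch.nn.functional as F

# Shared class order across datasets (from the slide).
CLASSES = ["malignant_mass", "benign_mass",
           "malignant_calcification", "benign_calcification", "skin_thickening"]

# Which classes each source dataset actually annotates (hypothetical keys).
ANNOTATED = {
    "dataset1": {"malignant_mass", "benign_mass",
                 "malignant_calcification", "benign_calcification", "skin_thickening"},
    "dataset2": {"malignant_mass", "benign_mass"},
}

def masked_bce(logits: torch.Tensor, targets: torch.Tensor, source: str) -> torch.Tensor:
    """Image-level multi-label loss that ignores classes the source dataset never
    labels, so the absence of an annotation is not penalized as background."""
    mask = torch.tensor([c in ANNOTATED[source] for c in CLASSES],
                        dtype=torch.float32, device=logits.device)
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```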
  13. Reducing annotator variability
    • Prevention
      ◦ Creating and updating labeling guides
      ◦ Exams for annotators
    • Post factum
      ◦ Automatic resolution of minor conflicts
      ◦ Additional expert opinion on major conflicts
    (a sketch of automatic conflict resolution follows below)
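A minimal sketch (not the team's actual pipeline) of how automatic resolution of minor conflicts could work for bounding boxes from several annotators, assuming a simple IoU-based agreement rule; thresholds and function names are illustrative.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def resolve_boxes(annotators, iou_thr=0.5, min_agree=2):
    """annotators: one list of [x1, y1, x2, y2] boxes per annotator.
    Boxes on which at least `min_agree` annotators agree (IoU >= iou_thr) are
    averaged into a consensus box; the rest are escalated for an expert opinion."""
    flat = [(a, np.asarray(box, dtype=float))
            for a, boxes in enumerate(annotators) for box in boxes]
    used, consensus, disputed = set(), [], []
    for i, (ann_i, box_i) in enumerate(flat):
        if i in used:
            continue
        group = [(i, ann_i, box_i)]
        for j, (ann_j, box_j) in enumerate(flat):
            if j == i or j in used or ann_j == ann_i:
                continue
            if iou(box_i, box_j) >= iou_thr:
                group.append((j, ann_j, box_j))
        if len({a for _, a, _ in group}) >= min_agree:
            consensus.append(np.mean([b for _, _, b in group], axis=0))
            used.update(idx for idx, _, _ in group)
        else:
            disputed.append(box_i)
            used.add(i)
    return consensus, disputed
```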
  14. Dealing with noisy annotations during training
    • Dropping or re-weighting noisy examples
      ◦ Usually doesn’t work well in the medical domain
    • Label smoothing
      ◦ Soft labels
      ◦ Smoothing for mask boundaries
    • Online noise correction
      ◦ Bounding box correction
      ◦ Mask refinement
    (a label-smoothing sketch follows below)
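A minimal sketch of label smoothing with soft labels for a classification head; eps is an illustrative value, and recent PyTorch versions also expose a label_smoothing argument directly on F.cross_entropy. For segmentation, the same idea can be applied by softening mask boundaries before computing the loss.

```python
import torch
import torch.nn.functional as F

def smoothed_ce(logits: torch.Tensor, target: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Cross-entropy with label smoothing: each hard label keeps 1 - eps of the
    probability mass and spreads eps uniformly over all classes, so a single
    noisy annotation cannot push the network toward full confidence."""
    n = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    soft = torch.full_like(log_p, eps / n)
    soft.scatter_(-1, target.unsqueeze(-1), 1.0 - eps + eps / n)
    return -(soft * log_p).sum(dim=-1).mean()
```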
  15. Training meta-models
    1. Train a detector/segmentation model on the existing small annotated datasets
    2. Generate features for the large weakly labeled dataset
    3. Train a simple meta-model to predict the final target
    (a sketch of steps 2-3 follows below)
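A minimal sketch (not from the deck) of steps 2-3, assuming the detector from step 1 is already trained and its per-patient detections are available as dictionaries with a "score" field; the feature set and the logistic-regression meta-model are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def detector_features(detections):
    """Step 2 (hypothetical): summarize a trained detector's boxes for one patient
    into a fixed-length feature vector: max score, box count, mean score."""
    scores = [d["score"] for d in detections]
    return np.array([max(scores, default=0.0),
                     float(len(scores)),
                     float(np.mean(scores)) if scores else 0.0])

def fit_meta_model(per_patient_detections, patient_labels):
    """Step 3: a simple meta-model mapping detector-derived features to the weak,
    patient-level target (e.g. 0 = benign, 1 = malignant)."""
    X_meta = np.stack([detector_features(d) for d in per_patient_detections])
    y_meta = np.asarray(patient_labels)
    return LogisticRegression().fit(X_meta, y_meta)
```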
  16. Gold standard datasets
    It is absolutely necessary to create and update gold standard datasets:
    • Consensus labeling by at least 3 annotators
    • Preferably clinically verified
    • They must include all conditions that your system is supposed to detect
    But remember: every time you evaluate an experiment on this dataset, you are one step closer to overfitting to it.
  17. Main takeaways
    • You need to stay strong if you want to work with medical data
    • It is both challenging and rewarding
    • Read the literature, but stay creative - every problem is unique
    • Invest time and money into creating a gold standard dataset for pre-release testing of all your crazy ideas