Slide 1

4th Place Solution for SpaceNet6* Challenge: Multi-Sensor All Weather Mapping
Motoki Kimura (handle: motokimura)
* Held as a part of the CVPR’20 EarthVision workshop

Slide 2

1. SpaceNet6 Challenge

Slide 3

SpaceNet6 Challenge
■ Extract building footprints using two modalities of remote sensing data: synthetic aperture radar (SAR) and electro-optical imagery
■ AOI: Rotterdam, the Netherlands (~120 km²)
Building footprint annotations overlaid on electro-optical imagery (left) and SAR imagery (right) (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

Slide 4

SpaceNet6 Dataset: Capella SAR + Maxar Optical
The formation of the SpaceNet6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Announcing the Winners”)

Slide 5

SpaceNet6 Dataset: Capella SAR Imagery
Pros of SAR
■ Works in any illumination setting (day or night)
■ Penetrates clouds
Cons of SAR
■ Various types of scattering
■ Complex geometric distortions, e.g., layover
See the SAR-101 blog post if interested in SAR!
SAR intensity (R: HH, G: VV, B: VH) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

Slide 6

SpaceNet6 Dataset: Capella SAR Imagery
Spec of Capella SAR
■ Captured from an aerial platform
■ 4 channels (quad polarization)
■ Spatial resolution ~0.5 m
■ Off-nadir look angle ~35°
SAR intensity (R: HH, G: VV, B: VH) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

Slide 7

SpaceNet6 Dataset: Maxar Optical Imagery
Spec of Maxar Optical
■ Captured by the WorldView-2 satellite
■ 4 channels (RGB + NIR)
■ Spatial resolution ~0.5 m
■ Off-nadir look angle ~17°
Only available in the training set
Visible spectrum imagery (R, G, B) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

Slide 8

SpaceNet6 Dataset: Annotations
Spec
■ Modified 3DBAG dataset
■ ~48,000 building footprints in the Rotterdam AOI
■ The footprint of each building is represented as a polygon
Only available in the training set
Building footprint annotations overlaid on electro-optical imagery (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

Slide 9

Evaluation Metric
■ If the IoU of a proposed polygon and a ground-truth polygon is > 0.5, the proposal is counted as a TP (otherwise it is counted as an FP)
■ A ground-truth polygon with no matched proposal is counted as an FN
■ Compute the F1 score (= evaluation metric)
IoU = area(GT ∩ Proposal) / area(GT ∪ Proposal)
[Diagram: a GT polygon and a proposed polygon with their overlap, illustrating the IoU computation]
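As a concrete illustration, here is a minimal sketch of this matching scheme using shapely polygons. The greedy matching order and the helper name are assumptions; the official SpaceNet metric additionally orders proposals by confidence.

```python
from shapely.geometry import Polygon

def f1_score(proposals, ground_truths, iou_thresh=0.5):
    """Greedy TP/FP/FN matching between shapely Polygons, then F1."""
    matched = set()
    tp = 0
    for prop in proposals:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truths):
            if j in matched:
                continue
            iou = prop.intersection(gt).area / prop.union(gt).area
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou > iou_thresh:  # IoU > 0.5 -> TP, else FP
            tp += 1
            matched.add(best_j)
    fp = len(proposals) - tp       # unmatched proposals
    fn = len(ground_truths) - tp   # unmatched ground truths
    return 2 * tp / (2 * tp + fp + fn)

# Tiny example: one proposal matching one of two ground truths
props = [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])]
gts = [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
       Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])]
print(f1_score(props, gts))  # TP=1, FP=0, FN=1 -> F1 = 2/3
```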

Slide 10

2. motokimura’s Solution

Slide 11

Pipeline of motokimura’s solution
[Pipeline diagram: SAR image → three U-Net variants (× 5 folds each) → average ensemble → building score map → watershed → building polygons → LightGBM (× 5 folds) → building polygons classified as TP]
■ U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet + optical images)
■ U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet w/ Noisy Student setting)
■ U-Net w/ EfficientNet-b8 enc. (pre-trained on ImageNet w/ AdvProp setting)
© SpaceNet LLC
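In code form, the whole pipeline is roughly the sketch below. The helper functions `watershed_polygons` and `morphological_features` and the model interfaces are hypothetical stand-ins for the stages detailed on the following slides.

```python
import numpy as np

def predict_buildings(sar_image, unet_models, lgbm_models):
    # 1. Average the building score maps over all folds and encoder variants
    score_map = np.mean([m.predict(sar_image) for m in unet_models], axis=0)
    # 2. Watershed splits the score map into per-building polygons
    polygons = watershed_polygons(score_map)
    # 3. LightGBM scores each polygon and drops likely false positives
    feats = np.array([morphological_features(p, sar_image, score_map)
                      for p in polygons])
    tp_prob = np.mean([m.predict_proba(feats)[:, 1] for m in lgbm_models], axis=0)
    return [p for p, prob in zip(polygons, tp_prob) if prob > 0.5]
```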

Slide 12

[Pipeline diagram from Slide 11, repeated]
© SpaceNet LLC

Slide 13

■ The U-Net models output segmentation scores for 2 classes: building body and building edge
■ Learning the building edge helps the networks learn the accurate shape of the buildings and separate neighboring ones
[Pipeline diagram from Slide 11, with the U-Net stage highlighted; example outputs: building body map and building edge map]
© SpaceNet LLC
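One common way to build such two-class targets from footprint polygons, sketched below with OpenCV, is to erode the rasterized footprint mask and take the difference as the edge band. This is an assumption for illustration; the exact edge definition used in the solution may differ.

```python
import cv2
import numpy as np

def make_targets(footprint_mask, edge_width=3):
    """footprint_mask: binary uint8 mask rasterized from building polygons."""
    kernel = np.ones((edge_width, edge_width), np.uint8)
    eroded = cv2.erode(footprint_mask, kernel)
    edge = footprint_mask - eroded   # thin band along each footprint boundary
    body = footprint_mask            # full footprint as the "body" class
    return np.stack([body, edge])    # 2-channel segmentation target
```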

Slide 14

■ Apply the watershed algorithm to the score map in order to extract building footprints as polygons
■ The watershed algorithm treats the score map as a topographic surface and allocates a region to each local maximum, which helps separate buildings that are close to each other
[Pipeline diagram from Slide 11, with the watershed stage highlighted]
[Diagram: 1-D slice of the building score on the image plane; “seed” regions above a seed threshold are expanded down to a lower threshold to form the building regions]
© SpaceNet LLC
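A minimal sketch of this seeded watershed with scikit-image; the two threshold values are assumptions, since the slide does not state them.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def score_map_to_instances(score, seed_thresh=0.7, region_thresh=0.5):
    """score: 2-D building score map in [0, 1]; returns a labeled instance map."""
    # Connected regions above the seed threshold become one marker per building
    seeds, _ = ndimage.label(score > seed_thresh)
    # Flood the inverted "topographic surface" from the seeds, but only
    # within pixels above the lower region threshold
    return watershed(-score, markers=seeds, mask=score > region_thresh)
```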

Slide 15

■ Remove false positives using LightGBM models trained on morphological features of the footprints
■ The LightGBM models were trained on the footprints predicted on the validation set of each fold
Morphological features (LightGBM input: building polygon → output: whether the polygon is a TP or an FP):
- area
- shape of the smallest enclosing rectangle
- major/minor axis length
- mean/std of SAR intensity
- mean/std of the predicted building score
- neighbor candidate counts in some distance ranges, etc.
[Pipeline diagram from Slide 11, with the LightGBM stage highlighted]
© SpaceNet LLC
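A minimal sketch of this false-positive filter; the feature matrix here is random dummy data standing in for the morphological features listed above, and the hyperparameters are assumptions.

```python
import lightgbm as lgb
import numpy as np

# Dummy stand-in: one row per candidate polygon with features such as
# area, enclosing-rectangle shape, axis lengths, SAR/score statistics, ...
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = rng.integers(0, 2, size=1000)  # 1 = polygon matched a GT footprint (TP)

clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X, y)

# Keep only candidates the model considers likely true positives
keep = clf.predict_proba(X)[:, 1] > 0.5
```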

Slide 16

Ablation study

Method                                                  | Public LB F-score (%) | Private LB F-score (%)
Baseline (EfficientNet-b7 × 5)                          | 39.29                 | -
+ Watershed                                             | 42.94                 | -
+ Ensemble (EfficientNet-b7 × 10 + EfficientNet-b8 × 5) | 44.38                 | -
+ LightGBM                                              | 44.80                 | 39.61

■ U-Net with an EfficientNet-B7 encoder (on 5 folds) achieves a score comparable to the top 15 on the public LB
■ Applying the watershed algorithm greatly improves the F-score (+3.65) compared to the simpler alternative used in the baseline: binarizing the score map with a threshold and then extracting isolated contours as polygons
■ Ensembling U-Net models with EfficientNet-B7/B8 encoders gives a moderate improvement (+1.44)
■ Post-processing with LightGBM models shows only a marginal improvement (+0.42)

Slide 17

Trick-1: EfficientNet encoder
■ U-Net with EfficientNet encoders achieved the best performance while having fewer parameters
■ I ensembled U-Net models with EfficientNet-B7/B8 encoders, which achieved the best segmentation scores
■ All encoders were pre-trained on ImageNet: this makes convergence much faster and improves accuracy
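With qubvel/segmentation_models.pytorch (which the implementation borrows from, see Slide 41), building such a model takes a few lines. The in_channels/classes values follow the slides; which pre-trained weight variants are available depends on the library version.

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="efficientnet-b7",  # or "efficientnet-b8" for the B8 variant
    encoder_weights="imagenet",      # ImageNet pre-training, as noted above
    in_channels=4,                   # quad-polarization SAR input
    classes=2,                       # building body + building edge
)
```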

Slide 18

Trick-2: loss function and optimizer
Loss = L_bce + L_dice
■ L_bce: binary cross-entropy loss
■ L_dice: dice loss (= 1 − dice coefficient)
■ As the dice coefficient evaluates spatial overlap (like the IoU metric) for each class, it works well on class-imbalanced data
■ Combining dice loss with binary cross-entropy made convergence faster and improved accuracy
Optimizer: Adam
■ Adam worked better than other optimizers
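A minimal PyTorch sketch of this combined loss, assuming raw logits and binary float targets of shape (N, C, H, W); the smoothing constant is an assumption.

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, targets, smooth=1.0):
    """Loss = Lbce + Ldice, with Ldice = 1 - soft dice coefficient."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    # Soft dice per sample and class, then averaged
    inter = (probs * targets).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + targets.sum(dim=(2, 3))
    dice = (2 * inter + smooth) / (union + smooth)
    return bce + (1.0 - dice.mean())
```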

Slide 19

Trick-3: dataset folds
■ It was crucial to separate folds by the spatial location of the images to avoid leakage, because most of the images overlap spatially
■ I split the dataset into 5 folds by longitude, which was extracted from the GeoTIFF metadata
Spatial distribution of training tiles overlaid on OpenStreetMap (only 786 tiles randomly sampled from 3401 are shown here)
Satellite images are from the SpaceNet6 dataset © SpaceNet Partners
Vector map is from OpenStreetMap © OpenStreetMap contributors
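A minimal sketch of such a longitude-based split with rasterio; splitting into strips with equal tile counts is an assumption, since the slide only says the split is by longitude.

```python
import numpy as np
import rasterio

def assign_folds(tif_paths, n_folds=5):
    """Assign each GeoTIFF tile to one of n_folds contiguous longitude strips."""
    lons = []
    for path in tif_paths:
        with rasterio.open(path) as src:
            lons.append(src.bounds.left)  # west edge of the tile
    ranks = np.argsort(np.argsort(lons))  # rank of each tile by longitude
    return ranks * n_folds // len(lons)   # fold id in [0, n_folds)
```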

Slide 20

Trick-4: aligning image orientation
■ Because of layover, tall objects appear laid over toward the direction from which the image was captured
■ Direction: either north (upward in the image) or south (downward in the image)
■ I selectively rotated SAR images before inputting them to the networks so that the layover direction stayed the same in every image
Example of layover (from CosmiQ Works blog post titled “SAR 101: An Introduction to Synthetic Aperture Radar”)
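A minimal sketch of this alignment, assuming a per-image flag that says whether the sensor looked from the south; SpaceNet6 ships per-strip orientation metadata, but the flag name here is hypothetical.

```python
import numpy as np

def align_layover(sar_image, captured_from_south):
    # A 180° rotation makes the layover point the same way in every image
    return np.rot90(sar_image, k=2) if captured_from_south else sar_image
```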

Slide 21

What did not work
■ Network architectures other than U-Net: FPN, PSPNet, PAN, and DeepLab-v3
■ Focal loss
■ Data augmentation
  - random flipping/rotation
  - random brightness
  - artificial speckle noise

Slide 22

Results
[Three panels: input SAR image; predicted polygons overlaid on the SAR image (white: true positives, yellow: false positives); ground-truth polygons overlaid on the optical image (blue: false negatives)]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 23

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 24

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 25

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 26

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 27

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 28

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 29

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 30

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 31

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 32

Results
[Another example with the same three-panel layout as Slide 22]
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC

Slide 33

3. Winners’ Solutions

Slide 34

Final ranking
Top 8 among 95 active teams on the leaderboard

Slide 35

Approaches by top-5 teams
Approaches by the top-5 teams (prize winners) (from CosmiQ Works blog post titled “SpaceNet 6: Announcing the Winners”)

Slide 36

1st place solution
Architecture
■ U-Net + EfficientNet-b5 (× 8 folds)
■ Input the SAR strip ID and Y-coordinate to the U-Net decoder
Loss design
■ Focal loss + dice loss
■ Weight the loss per image based on the number of buildings inside the image
Pre-processing
■ Cut out the black part of the images for faster training and stable BN stats
Augmentation
■ Random LR flip
Test-time augmentation
■ LR flip
■ Resize (1×, 0.8×, and 1.5×)
See the YouTube video by zbigniewwojna for details

Slide 37

4. Experiment Management & Implementation

Slide 38

Experiment management
Managing experiments is quite important in SpaceNet challenges because:
■ participants must submit complete training and inference code with a Dockerfile, in TopCoder Marathon Match style
■ hosts re-train and evaluate the models on their own server to determine the final score

Slide 39

Experiment management
Managing experiments is quite important in SpaceNet challenges because:
■ participants must submit complete training and inference code with a Dockerfile, in TopCoder Marathon Match style
■ hosts re-train and evaluate the models on their own server to determine the final score
I gave an ID to each experiment so that the config, trained weights, logs, git hash, and inference results are linked with this ID:
/mnt/efs/
  exp_0099/
  exp_0100/
    config_all.yaml
    best_weight.pth
    tensorboard.event
    git_hash.txt
  exp_0101/
./run_training.py --config config_specific.yaml --exp_id 100

Slide 40

Experiment management
Managing experiments is quite important in SpaceNet challenges because:
■ participants must submit complete training and inference code with a Dockerfile, in TopCoder Marathon Match style
■ hosts re-train and evaluate the models on their own server to determine the final score
I gave an ID to each experiment so that the config, trained weights, logs, git hash, and inference results are linked with this ID:
/mnt/efs/
  exp_0099/
  exp_0100/
    config_all.yaml
    best_weight.pth
    tensorboard.event
    git_hash.txt
    inference_results/
  exp_0101/
./run_training.py --config config_specific.yaml --exp_id 100
./run_inference.py --exp_id 100
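A minimal sketch of how such an experiment directory could be populated at the start of a run; the function and its details are hypothetical, and only the file names follow the listing above.

```python
import subprocess
from pathlib import Path

import yaml

def setup_experiment(exp_id, config, root="/mnt/efs"):
    exp_dir = Path(root) / f"exp_{exp_id:04d}"  # e.g. /mnt/efs/exp_0100/
    exp_dir.mkdir(parents=True, exist_ok=True)
    # Snapshot the fully resolved config and the current git commit,
    # so every artifact in this directory can be reproduced later
    (exp_dir / "config_all.yaml").write_text(yaml.safe_dump(config))
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    (exp_dir / "git_hash.txt").write_text(commit + "\n")
    return exp_dir
```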

Slide 41

Implementation
A large part of the code is borrowed from the OSS below:
Network modeling
■ qubvel/segmentation_models.pytorch
Processing imagery (GeoTIFF) and annotations (GeoJSON)
■ CosmiQ/solaris
Config management
■ rbgirshick/yacs

Slide 42

Implementation
Computational resources allowed in the final testing phase:
■ training: 48 hours with p3.8xlarge (V100 × 4)
■ inference: 3 hours with p3.8xlarge (V100 × 4)

Slide 43

Implementation
Computational resources allowed in the final testing phase:
■ training: 48 hours with p3.8xlarge (V100 × 4)
■ inference: 3 hours with p3.8xlarge (V100 × 4)
I took a single-GPU training strategy:
■ in the model development phase, I used p3.2xlarge (V100 × 1), which is much cheaper and more efficient for trial and error

Slide 44

Implementation
Computational resources allowed in the final testing phase:
■ training: 48 hours with p3.8xlarge (V100 × 4)
■ inference: 3 hours with p3.8xlarge (V100 × 4)
I took a single-GPU training strategy:
■ in the model development phase, I used p3.2xlarge (V100 × 1), which is much cheaper and more efficient for trial and error
■ in the final testing on p3.8xlarge, 4 models were trained in parallel (each trained on one V100 card)

Slide 45

Thank you!
Motoki Kimura
Computer Vision Research Engineer at Mobility Technologies Co., Ltd.
Interests: Object Recognition, Autonomous Driving, and Remote Sensing Imagery
Follow me
- LinkedIn: https://www.linkedin.com/in/motokimura
- GitHub: https://github.com/motokimura
- Twitter: https://twitter.com/motokimura1