4th Place Solution for SpaceNet6 Challenge

4th Place Solution for SpaceNet6* Challenge: Multi-Sensor All Weather Mapping
Motoki Kimura (handle: motokimura)
* Held as part of the CVPR’20 EarthVision workshop

1. SpaceNet6 Challenge

SpaceNet6 Challenge
▪ Extract building footprints using two modalities of remote sensing data: synthetic aperture radar (SAR) and electro-optical imagery
▪ AOI: Rotterdam, the Netherlands (~120 km²)
Figure: Building footprint annotations overlaid on electro-optical imagery (left) and SAR imagery (right) (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

SpaceNet6 Dataset: Capella SAR + Maxar Optical
Figure: The formation of the SpaceNet6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Announcing the Winners”)

SpaceNet6 Dataset: Capella SAR Imagery
Pros of SAR
▪ Works in any illumination setting (day or night)
▪ Penetrates clouds
Cons of SAR
▪ Various types of scattering
▪ Complex geometric distortions, e.g., layover
See the SAR 101 blog post if you are interested in SAR!
Figure: SAR intensity (R: HH, G: VV, B: VH) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

SpaceNet6 Dataset: Capella SAR Imagery
Spec of Capella SAR
▪ Captured by an aerial platform
▪ 4 channels (quad polarization)
▪ Spatial resolution ~0.5 m
▪ Off-nadir look angle ~35°

SpaceNet6 Dataset: Maxar Optical Imagery
Spec of Maxar Optical
▪ Captured by the WorldView-2 satellite
▪ 4 channels (RGB + NIR)
▪ Spatial resolution ~0.5 m
▪ Off-nadir look angle ~17°
Only available in the training set
Figure: Visible spectrum imagery (R, G, B) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

SpaceNet6 Dataset: Annotations
Spec
▪ Modified 3DBAG dataset
▪ ~48,000 building footprints in the Rotterdam AOI
▪ The footprint of each building is represented as a polygon
Only available in the training set
Figure: Building footprint annotations overlaid on electro-optical imagery (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)

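Evaluation Metric
▪ If the IoU of a proposed polygon and a ground-truth polygon is > 0.5, the proposal is counted as a TP (otherwise it is counted as an FP)
▪ A ground-truth polygon with no matched proposal is counted as an FN
▪ Compute the F1 score (= evaluation metric)
$$\mathrm{IoU} = \frac{\lvert P_{\mathrm{GT}} \cap P_{\mathrm{prop}} \rvert}{\lvert P_{\mathrm{GT}} \cup P_{\mathrm{prop}} \rvert}$$
$$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}, \quad \mathrm{precision} = \frac{TP}{TP + FP}, \quad \mathrm{recall} = \frac{TP}{TP + FN}$$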
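The metric can be sketched as follows. This is an illustrative simplification, not the official SpaceNet scorer: it assumes each footprint has already been rasterized into a boolean mask (the official scorer works on vector polygons), and the greedy highest-IoU-first matching and the names `iou`/`f1_score` are assumptions.

```python
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean masks of the same shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def f1_score(proposals, ground_truths, threshold=0.5):
    """F1 with one-to-one matching (assumed greedy, highest IoU first).

    A pair counts as a TP only if IoU > threshold; unmatched proposals are
    FPs and unmatched ground truths are FNs.
    """
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(proposals)
         for j, g in enumerate(ground_truths)),
        reverse=True,
    )
    matched_p, matched_g = set(), set()
    for score, i, j in pairs:
        if score <= threshold:
            break  # all remaining pairs have IoU <= threshold
        if i in matched_p or j in matched_g:
            continue  # each polygon may be matched at most once
        matched_p.add(i)
        matched_g.add(j)
    tp = len(matched_p)
    fp = len(proposals) - tp
    fn = len(ground_truths) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, one correct proposal, one spurious proposal and one missed building give TP = FP = FN = 1, so precision = recall = F1 = 0.5.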

2. motokimura’s Solution

Pipeline of motokimura’s solution
SAR image, fed in parallel to:
  ▪ U-Net w/ EfficientNet-b7 encoder (pre-trained on ImageNet + optical images) × 5 folds
  ▪ U-Net w/ EfficientNet-b7 encoder (pre-trained on ImageNet w/ Noisy Student setting) × 5 folds
  ▪ U-Net w/ EfficientNet-b8 encoder (pre-trained on ImageNet w/ AdvProp setting) × 5 folds
→ average ensemble → building score map
→ watershed → building polygons
→ LightGBM × 5 folds → building polygons classified as TP
(© SpaceNet LLC)


▪ U-Net models output segmentation scores for 2 classes: building body and building edge
▪ Learning the building edge helps the networks learn the accurate shape of buildings and separate neighboring ones
Figure: building body − building edge (© SpaceNet LLC)
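One way to derive the two training targets from a rasterized instance mask is sketched below. The slides do not say how the edge class was defined; here, as an assumption, an edge pixel is a building pixel whose 4-neighbour carries a different label (background or another building), so the boundary between two touching buildings is also marked. The subtraction in `combine_scores` is likewise an assumed reading of the "body − edge" figure, and both function names are hypothetical.

```python
import numpy as np


def body_and_edge_targets(instances: np.ndarray):
    """instances: (H, W) int array, 0 = background, k > 0 = building k.

    Returns (body, edge) boolean masks. Image borders are not treated as
    edges because padding replicates the border labels.
    """
    body = instances > 0
    padded = np.pad(instances, 1, mode="edge")
    h, w = instances.shape
    edge = np.zeros_like(body)
    # Compare every pixel with its up/down/left/right neighbour.
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        edge |= body & (neighbour != instances)
    return body, edge


def combine_scores(body_score: np.ndarray, edge_score: np.ndarray) -> np.ndarray:
    """Assumed combination: suppress edge pixels so touching buildings split."""
    return np.clip(body_score - edge_score, 0.0, 1.0)
```

With two buildings sharing a wall, only the two pixel columns on either side of the wall become edge pixels, so subtracting the edge score leaves a gap between the two bodies.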

▪ Apply the watershed algorithm to the score map in order to extract building footprints as polygons
▪ As the watershed algorithm treats the score map as a topographic surface and allocates a region to each local maximum, it helps to separate buildings that are close to each other
Figure (1-D slice on the image plane): “seed” regions, where the building score exceeds the seed threshold, are expanded with a lower threshold to obtain the building region (© SpaceNet LLC)
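The seed-and-expand step can be sketched as a simplified priority-flood (marker-based) watershed in pure NumPy. This is an illustration, not the author's code: the thresholds 0.7/0.4, 4-connectivity and the function names are assumptions. In practice a library routine such as `skimage.segmentation.watershed(-score, markers, mask=region)` does the same job.

```python
import heapq

import numpy as np

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumed)


def label_components(mask: np.ndarray) -> np.ndarray:
    """4-connected component labelling; 0 = background."""
    labels = np.zeros(mask.shape, dtype=int)
    h, w = mask.shape
    current = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                current += 1
                labels[y, x] = current
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            stack.append((ny, nx))
    return labels


def seeded_watershed(score: np.ndarray, seed_thr=0.7, region_thr=0.4) -> np.ndarray:
    """Grow seeds (score > seed_thr) into the region score > region_thr,
    always expanding the highest-scoring frontier pixel first."""
    labels = label_components(score > seed_thr)
    region = score > region_thr
    h, w = score.shape
    heap = [(-score[y, x], y, x) for y, x in zip(*np.nonzero(labels))]
    heapq.heapify(heap)
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and region[ny, nx] and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]  # inherit the seed's label
                heapq.heappush(heap, (-score[ny, nx], ny, nx))
    return labels
```

On the 1-D profile `[0.9, 0.5, 0.45, 0.5, 0.9]` the low-threshold region is a single connected blob, yet the two seeds split it into two buildings, which a plain threshold-and-contour approach would merge. Each label could then be vectorized into a polygon, e.g., with `rasterio.features.shapes`.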

▪ Remove false positives using LightGBM models trained on morphological features of the footprints
▪ The LightGBM models were trained on the predicted footprints on the validation set of each fold
Building polygon → morphological features → LightGBM → whether the input polygon is a TP or an FP
Morphological features:
- area
- shape of the smallest enclosing rectangle
- major/minor axis length
- mean/std value of SAR intensity
- mean/std value of predicted building score
- neighbor candidate counts in some distance ranges, etc.
(© SpaceNet LLC)
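A sketch of per-candidate feature extraction on a labeled candidate map (such as a watershed output) follows. It computes only a subset of the listed features and is not the author's code: the axis-aligned box stands in for the smallest enclosing rectangle, axis lengths use the common ellipse-fit convention (4·√eigenvalue of the pixel-coordinate covariance), and the single 20 px neighbour radius is an assumption.

```python
import numpy as np


def candidate_features(labels, sar, score, radius=20.0):
    """Return {label: feature_list} for every candidate in `labels`.

    labels: (H, W) int array (0 = background)
    sar:    (C, H, W) SAR intensity
    score:  (H, W) predicted building score
    """
    ids = [int(k) for k in np.unique(labels) if k != 0]
    centroids, feats = {}, {}
    for k in ids:
        ys, xs = np.nonzero(labels == k)
        coords = np.stack([ys, xs]).astype(float)
        centroids[k] = coords.mean(axis=1)
        # Major/minor axis lengths of the equivalent ellipse.
        eig = np.sort(np.linalg.eigvalsh(np.cov(coords, bias=True)))[::-1]
        major, minor = 4.0 * np.sqrt(np.clip(eig, 0.0, None))
        pix_sar = sar[:, ys, xs]
        feats[k] = [
            float(len(ys)),                  # area (pixels)
            float(ys.max() - ys.min() + 1),  # bbox height (axis-aligned simplification)
            float(xs.max() - xs.min() + 1),  # bbox width
            float(major), float(minor),
            *map(float, pix_sar.mean(axis=1)),  # mean SAR intensity per channel
            *map(float, pix_sar.std(axis=1)),   # std SAR intensity per channel
            float(score[ys, xs].mean()), float(score[ys, xs].std()),
        ]
    # Number of other candidates whose centroid lies within `radius` pixels.
    for k in ids:
        n = sum(1 for j in ids
                if j != k and np.linalg.norm(centroids[j] - centroids[k]) <= radius)
        feats[k].append(float(n))
    return feats
```

The resulting rows could be fed to `lightgbm.LGBMClassifier`, with TP/FP labels obtained by matching each fold's validation predictions against the ground truth.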

Ablation study
| Method | F-score at Public LB (%) | F-score at Private LB (%) |
|---|---|---|
| Baseline (EfficientNet-b7 × 5) | 39.29 | - |
| + Watershed | 42.94 | - |
| + Ensemble (EfficientNet-b7 × 10 + EfficientNet-b8 × 5) | 44.38 | - |
| + LightGBM | 44.80 | 39.61 |
▪ U-Net with an EfficientNet-B7 encoder (on 5 folds) achieves a score comparable to a top-15 entry on the public LB
▪ Applying the watershed algorithm greatly improves the F-score (+3.65) compared to the simpler alternative used in the baseline: binarizing the score map with a threshold and then extracting isolated contours as polygons
▪ Ensembling U-Net models with EfficientNet-B7/B8 encoders gives a moderate improvement (+1.44)
▪ Post-processing with LightGBM models gives only a marginal improvement (+0.42)

Trick-1: EfficientNet encoder
▪ U-Net with EfficientNet encoders achieved the best performance while having fewer parameters
▪ I ensembled U-Nets with EfficientNet-B7/B8 encoders, which achieved the best segmentation scores
▪ All encoders were pre-trained on ImageNet: this makes convergence much faster and improves accuracy

Trick-2: loss function and optimizer
Loss = L_bce + L_dice
▪ L_bce: binary cross-entropy loss
▪ L_dice: dice loss (= 1 - dice)
▪ As the dice coefficient evaluates spatial overlap (like the IoU metric) for each class, it works well on class-imbalanced data
▪ Combining dice loss with binary cross-entropy made convergence faster and improved accuracy
Optimizer: Adam
▪ Adam worked better than the other optimizers tried
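A NumPy sketch of the combined loss on per-pixel probabilities follows, to make the two terms concrete. The smoothing constant `eps`, the per-class averaging of dice and the function name are assumptions. A real training loop would compute the same expression on PyTorch tensors so that it can be back-propagated.

```python
import numpy as np


def bce_dice_loss(prob: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """prob, target: (C, H, W) arrays; prob in [0, 1], target in {0, 1}."""
    # Binary cross-entropy averaged over all pixels and classes.
    p = np.clip(prob, eps, 1.0 - eps)
    bce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    # Soft dice per class: 2|A ∩ B| / (|A| + |B|), then averaged over classes.
    inter = (prob * target).sum(axis=(1, 2))
    denom = prob.sum(axis=(1, 2)) + target.sum(axis=(1, 2))
    dice = (2 * inter + eps) / (denom + eps)
    return float(bce + (1.0 - dice.mean()))
```

Because dice is normalized per class by that class's own area, a small class such as building edge contributes as much to the dice term as a large one, which is why it helps on class-imbalanced data.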

Trick-3: dataset fold
▪ Because most of the images overlap spatially, it was crucial to separate folds by the spatial location of the images to avoid leakage
▪ I split the dataset into 5 folds by longitude, which was extracted from the GeoTIFF metadata
Figure: Spatial distribution of training tiles overlaid on OpenStreetMap (only 786 tiles randomly sampled from 3,401 are shown here). Satellite images are from the SpaceNet6 dataset © SpaceNet Partners; vector map is from OpenStreetMap © OpenStreetMap contributors
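A sketch of splitting tiles into contiguous longitude bands with roughly equal tile counts follows. It assumes the longitude of each tile has already been read from its GeoTIFF metadata (for example with rasterio). The function name and the equal-count banding rule are illustrative assumptions, not necessarily the author's exact rule.

```python
def split_folds_by_longitude(tile_lons, n_folds=5):
    """tile_lons: dict {tile_id: longitude}. Returns {tile_id: fold_index}.

    Tiles are sorted west to east and cut into `n_folds` contiguous bands,
    so spatially overlapping neighbours mostly end up in the same fold.
    """
    ordered = sorted(tile_lons, key=tile_lons.get)
    n = len(ordered)
    # i * n_folds // n maps rank 0..n-1 onto fold 0..n_folds-1 in order.
    return {tile_id: i * n_folds // n for i, tile_id in enumerate(ordered)}
```

Only tiles straddling a band boundary can still overlap a tile in another fold, which is far less leakage than a random split where almost every validation tile has an overlapping training neighbour.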

Trick-4: aligning image orientation
▪ Because of layover, tall objects appear to lean toward the direction from which the image was captured
▪ This direction is either north (upward in the image) or south (downward in the image)
▪ I selectively rotated SAR images before inputting them to the networks so that the layover direction was the same in every image
Figure: Example of layover (from CosmiQ Works blog post titled “SAR 101: An Introduction to Synthetic Aperture Radar”)
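A minimal sketch of the selective rotation follows. It assumes each image comes with a boolean flag saying whether its layover points south. How that flag is obtained, for example from per-strip metadata, is outside this sketch, and both function names are hypothetical. A 180° rotation turns "south" into "north", and applying the same rotation to the prediction maps it back onto the original GeoTIFF grid.

```python
import numpy as np


def align_orientation(image: np.ndarray, layover_south: bool) -> np.ndarray:
    """image: (C, H, W). Rotate 180 degrees so the layover always points north."""
    return np.rot90(image, 2, axes=(1, 2)) if layover_south else image


def restore_orientation(pred: np.ndarray, layover_south: bool) -> np.ndarray:
    """Undo align_orientation on a (C, H, W) prediction (180 degrees is self-inverse)."""
    return np.rot90(pred, 2, axes=(1, 2)) if layover_south else pred
```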

What did not work
▪ Network architectures other than U-Net: FPN, PSPNet, PAN, and DeepLab-v3
▪ Focal loss
▪ Data augmentation
- Random flipping/rotation
- Random brightness
- Artificial speckle noise

Results
Figure panels: input SAR image | predicted polygons overlaid on SAR image (white: true positives, yellow: false positives) | ground-truth polygons overlaid on optical image (blue: false negatives)
SAR and optical images are from the SpaceNet6 dataset © SpaceNet LLC


3. Winners’ Solution

Final ranking
Figure: Top 8 of the 95 active teams on the leaderboard

Approaches by top-5 teams
Figure: Approaches by the top-5 teams (prize winners) (from CosmiQ Works blog post titled “SpaceNet 6: Announcing the Winners”)

1st place solution
Architecture
▪ U-Net + EfficientNet-b5 (× 8 folds)
▪ Input the SAR strip ID and Y-coordinate to the U-Net decoder
Loss design
▪ Focal loss + dice loss
▪ Weight the loss per image based on the number of buildings inside the image
Pre-processing
▪ Cut out the black parts of images for faster training and stable BN statistics
Augmentation
▪ Random LR flip
Test time augmentation
▪ LR flip
▪ Resize (1×, 0.8×, and 1.5×)
See the YouTube video by zbigniewwojna for details
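The LR-flip part of the test-time augmentation can be sketched as follows. The model predicts on the original image and on its left-right mirror, the second prediction is flipped back, and the two are averaged. `model` is a hypothetical callable mapping a (C, H, W) array to a (K, H, W) score map, and the multi-scale resizing is omitted.

```python
import numpy as np


def predict_with_lr_flip(model, image: np.ndarray) -> np.ndarray:
    """Average the prediction on `image` with the un-flipped prediction on its mirror."""
    pred = model(image)
    # Flip the width axis, predict, then flip the prediction back so pixels line up.
    pred_flipped = model(image[:, :, ::-1])[:, :, ::-1]
    return (pred + pred_flipped) / 2.0
```

Flipping the prediction back before averaging is essential: without it, each output pixel would be averaged with its mirror-image counterpart.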

4. Experiment Management & Implementation

Experiment management
Managing experiments is quite important in SpaceNet challenges because:
▪ participants must submit complete training and inference code with a Dockerfile, in TopCoder Marathon Match style
▪ hosts re-train and evaluate the models on their own server to determine the final score


I gave an ID to each experiment so that the config, trained weights, logs, git hash, and inference results are all linked by this ID:
  /mnt/efs/
    exp_0099/
    exp_0100/
      config_all.yaml
      best_weight.pth
      tensorboard.event
      git_hash.txt
      inference_results/
    exp_0101/
  ./run_training.py --config config_specific.yaml --exp_id 100
  ./run_inference.py --exp_id 100
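A sketch of the ID-based layout follows, written as a helper that a training script might call at start-up. The `exp_0100` naming and the `config_all.yaml` and `git_hash.txt` file names come from the slide. The function itself, its arguments, and passing the merged config in as YAML text are illustrative assumptions. `git rev-parse HEAD` is the standard git command for the current commit hash.

```python
import subprocess
from pathlib import Path


def prepare_experiment_dir(root, exp_id: int, config_text: str) -> Path:
    """Create <root>/exp_XXXX/ and record the merged config and git hash."""
    exp_dir = Path(root) / f"exp_{exp_id:04d}"
    exp_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a past experiment
    (exp_dir / "config_all.yaml").write_text(config_text)
    try:
        git_hash = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        git_hash = "unknown"  # git missing or not inside a repository
    (exp_dir / "git_hash.txt").write_text(git_hash + "\n")
    return exp_dir
```

Failing loudly on an existing directory (`exist_ok=False`) means a reused ID can never silently overwrite the weights and logs of an earlier experiment.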

Implementation
A large part of the code is borrowed from the following open-source projects:
Network modeling
▪ qubvel/segmentation_models.pytorch
Processing imagery (GeoTIFF) and annotations (GeoJSON)
▪ CosmiQ/solaris
Config management
▪ rbgirshick/yacs

Implementation
Computational resources allowed in the final testing phase:
▪ training: 48 hours with p3.8xlarge (V100×4)
▪ inference: 3 hours with p3.8xlarge (V100×4)


I adopted a single-GPU training strategy:
▪ in the model development phase, I used p3.2xlarge (V100×1), which is much cheaper and more efficient for trial and error
▪ in the final testing on p3.8xlarge, 4 models were trained in parallel (each on one V100 card)
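A sketch of launching four independent single-GPU trainings on a V100×4 instance follows. Each process is pinned to one card with the `CUDA_VISIBLE_DEVICES` environment variable. The script name and flags mirror the `./run_training.py --config ... --exp_id ...` command from the experiment-management slide. The helper names, config file names and experiment IDs are hypothetical.

```python
import os
import subprocess


def build_jobs(configs_and_ids, n_gpus=4):
    """Pair each (config, exp_id) with its own GPU; returns [(env, argv), ...]."""
    if len(configs_and_ids) > n_gpus:
        raise ValueError("one model per GPU: more jobs than available cards")
    jobs = []
    for gpu, (config, exp_id) in enumerate(configs_and_ids):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # process sees one GPU only
        argv = ["./run_training.py", "--config", config, "--exp_id", str(exp_id)]
        jobs.append((env, argv))
    return jobs


def run_in_parallel(jobs):
    """Start all jobs at once and wait for every one to finish; returns exit codes."""
    procs = [subprocess.Popen(argv, env=env) for env, argv in jobs]
    return [p.wait() for p in procs]
```

Because each process believes it has a single GPU, the same training code runs unchanged on the p3.2xlarge used during development and on each card of the p3.8xlarge, with no multi-GPU synchronization to debug.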

Thank you!
Motoki Kimura
Computer Vision Research Engineer at Mobility Technologies Co., Ltd.
Interests: Object Recognition, Autonomous Driving, and Remote Sensing Imagery
Follow me
- LinkedIn: https://www.linkedin.com/in/motokimura
- GitHub: https://github.com/motokimura
- Twitter: https://twitter.com/motokimura1
