
4th Place Solution for SpaceNet6 Challenge


Presents the 4th place solution for the SpaceNet6 challenge, which was held as part of the CVPR 2020 EarthVision workshop.

For details of the SpaceNet6 challenge, see https://medium.com/the-downlinq/spacenet-6-announcing-the-winners-df817712b515

Motoki Kimura

July 17, 2020

Transcript

  1. 4th Place Solution for SpaceNet6* Challenge: Multi-Sensor All Weather Mapping

    Motoki Kimura (handle: motokimura) * Held as a part of CVPR’20 EarthVision workshop 1
  2. SpaceNet6 Challenge ▪ Extract building footprints using two modalities of

    remote sensing data: synthetic aperture radar (SAR) and electro-optical imagery ▪ AOI: Rotterdam, the Netherlands (~120 km^2) Building footprint annotations overlaid on electro-optical imagery (left) and SAR imagery (right) (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”) 3
  3. SpaceNet6 Dataset: Capella SAR + Maxar Optical 4 The formation

    of SpaceNet6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Announcing the Winners”)
  4. SpaceNet6 Dataset: Capella SAR Imagery Pros of SAR ▪ Any

    illumination setting (day or night) ▪ Cloud penetrating Cons of SAR ▪ Various types of scattering ▪ Complex geometric distortions, e.g., layover See the SAR 101 blog post if interested in SAR! 5 SAR intensity (R:HH, G:VV, B:VH) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
  5. SpaceNet6 Dataset: Capella SAR Imagery Spec of Capella SAR ▪

    Captured by aerial platform ▪ 4 channels (quad polarization) ▪ Spatial resolution ~0.5m ▪ Off-nadir look angle ~35° 6 SAR intensity (R:HH, G:VV, B:VH) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
  6. SpaceNet6 Dataset: Maxar Optical Imagery Spec of Maxar Optical ▪

    Captured by WorldView-2 satellite ▪ 4 channels (RGB + NIR) ▪ Spatial resolution ~0.5m ▪ Off-nadir look angle ~17° Only available in training set 7 Visible spectrum imagery (R, G, B) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
  7. SpaceNet6 Dataset: Annotations Spec ▪ Modified 3DBAG dataset ▪ ~48,000

    building footprints in Rotterdam AOI ▪ Footprint of each building is represented as a polygon Only available in training set 8 Building footprint annotations overlaid on electro-optical imagery (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
  8. Evaluation Metric ▪ If IoU of proposed polygon and ground

    truth polygon exceeds 0.5, it is counted as a TP (otherwise counted as an FP) ▪ A ground truth with no matched proposal is counted as an FN ▪ Compute the F1 score (= evaluation metric) [Diagram: IoU between the GT polygon and the proposed polygon] 9
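A minimal sketch of this matching scheme and F1 computation, using shapely polygons; it greedily matches each proposal to the best unmatched ground truth and is not the official SpaceNet scorer.

```python
from shapely.geometry import Polygon

def iou(a: Polygon, b: Polygon) -> float:
    union = a.union(b).area
    return a.intersection(b).area / union if union > 0 else 0.0

def f1_score(proposals, ground_truths, iou_threshold=0.5):
    matched_gt, tp = set(), 0
    for prop in proposals:
        # Greedily match each proposal to the best unmatched ground truth.
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(ground_truths):
            if i in matched_gt:
                continue
            score = iou(prop, gt)
            if score > best_iou:
                best_iou, best_idx = score, i
        if best_idx is not None and best_iou > iou_threshold:
            tp += 1
            matched_gt.add(best_idx)
    fp = len(proposals) - tp          # unmatched proposals
    fn = len(ground_truths) - tp      # unmatched ground truths
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```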
  9. Pipeline of motokimura’s solution U-Net w/ EfficientNet-b7 enc. (pre-trained on

    ImageNet + optical images) U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet w/ Noisy Student setting) U-Net w/ EfficientNet-b8 enc. (pre-trained on ImageNet w/ AdvProp setting) Average ensemble Watershed LightGBM × 5 folds × 5 folds × 5 folds × 5 folds SAR image Building score map Building polygons Building polygons classified as TP © SpaceNet LLC 11
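A minimal sketch of the "average ensemble" stage of this pipeline: the sigmoid score maps of all trained U-Nets (the three encoder variants, each trained on 5 folds) are simply averaged; `models` and the input tensor shape are placeholders.

```python
import torch

def ensemble_score_map(models, sar_tile: torch.Tensor) -> torch.Tensor:
    """Average the per-class sigmoid score maps of all trained U-Nets.

    sar_tile: (1, 4, H, W) pre-processed SAR tensor.
    Returns: (2, H, W) averaged building body / edge scores.
    """
    scores = []
    with torch.no_grad():
        for model in models:
            model.eval()
            logits = model(sar_tile)               # (1, 2, H, W)
            scores.append(torch.sigmoid(logits)[0])
    return torch.stack(scores).mean(dim=0)
```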
  10. U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet + optical images)

    U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet w/ Noisy Student setting) U-Net w/ EfficientNet-b8 enc. (pre-trained on ImageNet w/ AdvProp setting) Average ensemble Watershed LightGBM × 5 folds × 5 folds × 5 folds × 5 folds SAR image Building score map Building polygons Building polygons classified as TP © SpaceNet LLC 12
  11. ▪ U-Net models output segmentation scores for 2

    classes: building body and building edge ▪ Learning the building edge helps the networks learn accurate building shapes and separate neighboring buildings U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet + optical images) U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet w/ Noisy Student setting) U-Net w/ EfficientNet-b8 enc. (pre-trained on ImageNet w/ AdvProp setting) Average ensemble Watershed LightGBM × 5 folds × 5 folds × 5 folds SAR image Building score map Building polygons Building polygons classified as TP × 5 folds Building body Building edge 13 © SpaceNet LLC
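A minimal sketch of deriving the two training targets (building body and building edge) from a binary footprint mask by morphological erosion; the edge width used here is an assumption, not the solution's actual setting.

```python
import numpy as np
import cv2

def make_targets(footprint_mask: np.ndarray, edge_width: int = 3) -> np.ndarray:
    """footprint_mask: uint8 array (H, W), 1 inside buildings, 0 elsewhere."""
    kernel = np.ones((edge_width, edge_width), np.uint8)
    eroded = cv2.erode(footprint_mask, kernel, iterations=1)
    body = footprint_mask
    edge = footprint_mask - eroded        # thin ring along building boundaries
    return np.stack([body, edge], axis=0)  # (2, H, W) targets for the U-Net
```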
  12. U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet + optical images)

    U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet w/ Noisy Student setting) U-Net w/ EfficientNet-b8 enc. (pre-trained on ImageNet w/ AdvProp setting) Average ensemble Watershed LightGBM × 5 folds × 5 folds × 5 folds × 5 folds SAR image Building score map Building polygons Building polygons classified as TP ▪ Apply the watershed algorithm to the score map in order to extract building footprints as polygons ▪ As the watershed algorithm treats the score map as a topographic surface and allocates a region to each local maximum, it helps separate buildings that are close to each other Expand “seed” regions with lower threshold Building score Building region Seed threshold 1-D slice on image plane 14 © SpaceNet LLC
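A minimal sketch of the seeded watershed post-processing described above, using scikit-image; the seed and region thresholds are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def score_map_to_instances(score: np.ndarray,
                           seed_thresh: float = 0.7,
                           region_thresh: float = 0.4) -> np.ndarray:
    """score: (H, W) building score in [0, 1]. Returns an instance label map."""
    seeds, _ = ndi.label(score > seed_thresh)   # high-confidence building cores
    mask = score > region_thresh                # area the seeds are allowed to grow into
    # Watershed treats -score as a topographic surface, so each seed expands
    # downhill until it meets a neighboring building's basin.
    return watershed(-score, markers=seeds, mask=mask)
```

Each label in the returned map can then be vectorized into a polygon (e.g., with rasterio.features.shapes) to produce the building footprints.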
  13. U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet + optical images)

    U-Net w/ EfficientNet-b7 enc. (pre-trained on ImageNet w/ Noisy Student setting) U-Net w/ EfficientNet-b8 enc. (pre-trained on ImageNet w/ AdvProp setting) Average ensemble Watershed LightGBM × 5 folds × 5 folds × 5 folds × 5 folds SAR image Building score map Building polygons Building polygons classified as TP - area - shape of the minimum bounding rectangle - major/minor axis length - mean/std value of SAR intensity - mean/std value of predicted building score - neighbor candidate counts in some distance ranges, etc. LightGBM Morphological features Whether input polygon is TP or FP Building polygon ▪ Remove false positives using LightGBM models trained on footprint morphological features ▪ The LightGBM models were trained on the footprints predicted on the validation set of each fold 15 © SpaceNet LLC
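A minimal sketch of this false-positive filter: a LightGBM binary classifier trained on per-polygon morphological features. The feature extraction and hyperparameters below are illustrative assumptions, not the solution's exact setup.

```python
import numpy as np
import lightgbm as lgb

def polygon_features(poly, sar_stats, score_stats, n_neighbors):
    """poly: shapely polygon; sar_stats/score_stats: dicts with 'mean'/'std'."""
    minx, miny, maxx, maxy = poly.bounds
    return [
        poly.area,
        maxx - minx, maxy - miny,            # extent of the bounding rectangle
        sar_stats["mean"], sar_stats["std"],
        score_stats["mean"], score_stats["std"],
        n_neighbors,                          # neighboring candidates nearby
    ]

def train_fp_filter(X: np.ndarray, y: np.ndarray) -> lgb.LGBMClassifier:
    """X: (n_polygons, n_features); y: 1 if the predicted polygon matched a GT (TP), else 0."""
    clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
    clf.fit(X, y)
    return clf

# At inference, keep only polygons whose predicted TP probability is high enough:
# keep = clf.predict_proba(X_test)[:, 1] > 0.5
```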
  14. Ablation study

    Method | F-score at Public LB (%) | F-score at Private LB (%)
    Baseline (EfficientNet-b7 × 5) | 39.29 | -
    + Watershed | 42.94 | -
    + Ensemble (EfficientNet-b7 × 10 + EfficientNet-b8 × 5) | 44.38 | -
    + LightGBM | 44.80 | 39.61
    ▪ U-Net with an EfficientNet-B7 encoder (on 5 folds) achieves a score comparable to the top 15 on the public LB ▪ Applying the watershed algorithm greatly improves the F-score (+3.65) compared to the simpler alternative used in the baseline: binarize the score map with a threshold and then extract isolated contours as polygons ▪ Ensembling U-Net models with EfficientNet-B7/B8 encoders gives a moderate improvement (+1.44) ▪ Post-processing with LightGBM models shows only a marginal improvement (+0.42) 16
  15. Trick-1: EfficientNet encoder ▪ U-Net with EfficientNet encoders achieved the

    best performance while having fewer parameters ▪ I ensembled U-Nets with EfficientNet-B7/B8 encoders, which achieved the best segmentation scores ▪ All encoders were pre-trained on ImageNet: this makes convergence much faster and improves accuracy 17
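A minimal sketch of building such a model with qubvel/segmentation_models.pytorch (the library cited later in the Implementation slide), assuming a recent version of the package; channel counts follow the 4-channel SAR input and the 2-class (body/edge) output described above, and to my understanding the Noisy Student / AdvProp variants are available through the timm-efficientnet encoders of the same library.

```python
import segmentation_models_pytorch as smp

# U-Net with an ImageNet-pretrained EfficientNet-B7 encoder.
model = smp.Unet(
    encoder_name="efficientnet-b7",
    encoder_weights="imagenet",   # pre-training makes convergence much faster
    in_channels=4,                # quad-pol SAR intensity (HH, HV, VH, VV)
    classes=2,                    # building body + building edge
)
```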
  16. Trick-2: loss function and optimizer Loss = Lbce + Ldice

    ▪ Lbce : binary cross entropy loss ▪ Ldice : dice loss (= 1 - dice coefficient) ▪ As the dice coefficient evaluates spatial overlap (like the IoU metric) for each class, it works well on class-imbalanced data ▪ Combining dice loss with binary cross entropy made convergence faster and improved accuracy Optimizer: Adam ▪ Adam worked better than other optimizers 18
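A minimal sketch of the combined loss Loss = Lbce + Ldice for the 2-class score maps; the smoothing constant and the per-class averaging are assumptions.

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        """logits, targets: (N, 2, H, W); targets are 0/1 body and edge masks."""
        bce = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        # Dice coefficient per class, summed over batch and spatial dimensions.
        dims = (0, 2, 3)
        intersection = (probs * targets).sum(dims)
        cardinality = probs.sum(dims) + targets.sum(dims)
        dice = (2.0 * intersection + self.smooth) / (cardinality + self.smooth)
        return bce + (1.0 - dice.mean())   # Lbce + Ldice
```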
  17. Trick-3: dataset fold ▪ It was crucial to separate folds

    by spatial location of the images to avoid leakage, because most of the images overlap spatially ▪ I split the dataset into 5 folds by longitude, extracted from the GeoTIFF metadata Spatial distribution of training tiles overlaid on OpenStreetMap (only 786 tiles randomly sampled from 3401 are shown here) 19 Satellite images are from SpaceNet6 dataset © SpaceNet Partners Vector map is from OpenStreetMap © OpenStreetMap contributors
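A minimal sketch of such a longitude-based split using rasterio to read the tile bounds; the glob pattern and the quantile binning into 5 contiguous folds are assumptions, and the tile center x-coordinate is only a longitude if the tiles are in a geographic CRS (otherwise transform to lon/lat first).

```python
import glob
import numpy as np
import rasterio

tile_paths = sorted(glob.glob("train/SAR-Intensity/*.tif"))  # hypothetical layout
centers = []
for path in tile_paths:
    with rasterio.open(path) as src:
        left, bottom, right, top = src.bounds
        centers.append((left + right) / 2.0)   # tile center along the x (longitude) axis

centers = np.array(centers)
# Bin the centers into 5 spatially contiguous folds so overlapping tiles
# never end up in both the training and validation splits of a fold.
edges = np.quantile(centers, [0.2, 0.4, 0.6, 0.8])
folds = np.digitize(centers, edges)   # fold id in {0, ..., 4} per tile
```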
  18. Trick-4: aligning image orientation ▪ Because of layover, tall objects

    appear laid over toward the direction from which the image was captured ▪ Direction: either north (upward in the image) or south (downward in the image) ▪ I selectively rotated SAR images before inputting them to the networks so that the layover direction stayed the same in every image 20 Example of layover (from CosmiQ Works blog post titled “SAR 101: An Introduction to Synthetic Aperture Radar”)
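A minimal sketch of this alignment: tiles captured from the south are rotated 180 degrees so the layover always points the same way. The per-strip orientation flag is assumed to come from the dataset metadata; the lookup itself is not shown.

```python
import numpy as np

def align_orientation(image: np.ndarray, captured_from_south: bool) -> np.ndarray:
    """image: (H, W, C) SAR tile; flip when the strip was imaged from the south."""
    if captured_from_south:
        # A 180-degree rotation keeps the pixel grid and simply flips both axes,
        # making the layover direction consistent across all input images.
        return image[::-1, ::-1].copy()
    return image
```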
  19. What did not work ▪ Network architectures other than U-Net:

    FPN, PSPNet, PAN, and DeepLab-v3 ▪ Focal loss ▪ Data augmentation ▪ Random flipping/rotation ▪ Random brightness ▪ Artificial speckle noise 21
  20. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 22 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  21. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC 23
  22. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 24 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  23. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 25 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  24. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 26 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  25. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 27 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  26. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 28 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  27. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 29 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  28. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 30 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  29. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 31 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  30. Results Input SAR image Predicted polygons overlaid on SAR image

    White: True Positives, Yellow: False Positives Ground-truth polygons overlaid on optical image Blue: False Negatives 32 SAR and optical images are from SpaceNet6 dataset © SpaceNet LLC
  31. Approaches by top-5 teams 35 Approaches by top-5 teams (prize

    winners) (from CosmiQ Works blog post titled “SpaceNet 6: Announcing the Winners”)
  32. 1st place solution Architecture ▪ U-Net + EfficientNet-b5 (×8-fold) ▪

    Input SAR strip ID and Y-coordinate to U-Net decoder Loss design ▪ Focal loss + dice loss ▪ Weight loss per image based on the number of buildings inside the image Pre-processing ▪ Cut out black part of images for faster training and stable BN stats Augmentation ▪ Random LR flip Test time augmentation ▪ LR flip ▪ Resize (1×, 0.8×, and 1.5×) See YouTube video by zbigniewwojna for details 36
  33. Experiment management Managing experiments is quite important in SpaceNet challenges

    because: ▪ participants must submit complete training and inference code with a Dockerfile, in the TopCoder Marathon Match style ▪ hosts re-train and evaluate the models on their own servers to determine the final score 38
  34. Experiment management Managing experiments is quite important in SpaceNet challenges

    because: ▪ participants must submit complete training and inference code with a Dockerfile, in the TopCoder Marathon Match style ▪ hosts re-train and evaluate the models on their own servers to determine the final score I gave an ID to each experiment so that the config, trained weights, logs, git hash, and inference results are linked to this ID /mnt/efs/ exp_0100/ exp_0099/ exp_0101/ config_all.yaml best_weight.pth tensorboard.event git_hash.txt ./run_training.py --config config_specific.yaml --exp_id 100 39
  35. Experiment management Managing experiments is quite important in SpaceNet challenges

    because: ▪ participants must submit complete training and inference code with a Dockerfile, in the TopCoder Marathon Match style ▪ hosts re-train and evaluate the models on their own servers to determine the final score I gave an ID to each experiment so that the config, trained weights, logs, git hash, and inference results are linked to this ID /mnt/efs/ exp_0100/ exp_0099/ exp_0101/ config_all.yaml best_weight.pth tensorboard.event git_hash.txt inference_results/ ./run_training.py --config config_specific.yaml --exp_id 100 ./run_inference.py --exp_id 100 40
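A minimal sketch of how an experiment ID can tie the config, weights, logs, and git hash to one directory under /mnt/efs/, in the spirit of the run_training.py / run_inference.py commands above; this helper is an assumption, not the author's actual script.

```python
import argparse
import subprocess
from pathlib import Path

EXP_ROOT = Path("/mnt/efs")

def setup_experiment(exp_id: int) -> Path:
    """Create exp_XXXX/ and record the exact code revision for reproducibility."""
    exp_dir = EXP_ROOT / f"exp_{exp_id:04d}"
    exp_dir.mkdir(parents=True, exist_ok=True)
    git_hash = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    (exp_dir / "git_hash.txt").write_text(git_hash + "\n")
    return exp_dir

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--exp_id", type=int, required=True)
    args = parser.parse_args()
    exp_dir = setup_experiment(args.exp_id)
    # Training would then save config_all.yaml, best_weight.pth, and the
    # tensorboard event file into exp_dir, and inference would write
    # inference_results/ under the same directory.
```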
  36. Implementation A large part of the code is borrowed from the OSS

    below: Network modeling ▪ qubvel/segmentation_models.pytorch Processing imagery (GeoTIFF) and annotations (GeoJSON) ▪ CosmiQ/solaris Config management ▪ rbgirshick/yacs 41
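A minimal sketch of config management with rbgirshick/yacs as cited above; the config keys are illustrative, not the solution's actual schema.

```python
from yacs.config import CfgNode as CN

_C = CN()
_C.MODEL = CN()
_C.MODEL.ENCODER = "efficientnet-b7"
_C.MODEL.IN_CHANNELS = 4
_C.TRAIN = CN()
_C.TRAIN.LR = 1e-4
_C.TRAIN.EPOCHS = 250

def get_cfg_defaults() -> CN:
    return _C.clone()

# Typical usage: start from the defaults, overlay an experiment-specific file,
# then freeze the result so it cannot be mutated accidentally during a run.
cfg = get_cfg_defaults()
cfg.merge_from_file("config_specific.yaml")
cfg.freeze()
```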
  37. Implementation Computational resources allowed in final testing phase: ▪ training:

    48 hours with p3.8xlarge (V100×4) ▪ inference: 3 hours with p3.8xlarge (V100×4) 42
  38. Implementation Computational resources allowed in final testing phase: ▪ training:

    48 hours with p3.8xlarge (V100×4) ▪ inference: 3 hours with p3.8xlarge (V100×4) I took a single-GPU training strategy: ▪ in the model development phase, I used p3.2xlarge (V100×1), which is much cheaper and more efficient for trial and error 43
  39. Implementation Computational resources allowed in final testing phase: ▪ training:

    48 hours with p3.8xlarge (V100×4) ▪ inference: 3 hours with p3.8xlarge (V100×4) I took a single-GPU training strategy: ▪ in the model development phase, I used p3.2xlarge (V100×1), which is much cheaper and more efficient for trial and error ▪ in the final testing on p3.8xlarge, 4 models were trained in parallel (each on one V100 card) 44
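A minimal sketch of running four single-GPU trainings in parallel on a p3.8xlarge by pinning one process per V100 via CUDA_VISIBLE_DEVICES; the experiment IDs are illustrative, and the command line reuses the run_training.py invocation shown earlier.

```python
import os
import subprocess

procs = []
for gpu, exp_id in enumerate([100, 101, 102, 103]):
    # Each process sees only one GPU, so four models train side by side.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(
        ["./run_training.py", "--config", "config_specific.yaml",
         "--exp_id", str(exp_id)],
        env=env,
    ))
for p in procs:
    p.wait()
```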
  40. Thank you! Motoki Kimura Computer Vision Research Engineer at Mobility

    Technologies Co., Ltd. Interests: Object Recognition, Autonomous Driving, and Remote Sensing Imagery Follow me - LinkedIn: https://www.linkedin.com/in/motokimura - GitHub: https://github.com/motokimura - Twitter: https://twitter.com/motokimura1 45