#38 Lessons learned from organizing a Kaggle competition

From preparing the dataset and choosing the metric, through launching the competition and the inevitable leak, to the interviews with the winners and the validation of the results, running a Kaggle competition is anything but smooth sailing. A look back at an extraordinary adventure!

Bio: Jeff Faudi, Machine Learning and Open-Innovation at Airbus DS Intelligence.


Toulouse Data Science

June 17, 2019


  1. Lessons learnt from organizing a Kaggle competition

    Jean-Francois (Jeff) Faudi, Analytics & Open-Innovation @Airbus DS Intelligence
  2. Who Are We?

    Delivering value from data in our digitally connected world • 1,000+ CUSTOMERS in 100 COUNTRIES • 150 RESELLERS • 25 DRS PARTNERS • >100Bn km² ARCHIVE DATA • 1,800 EMPLOYEES • Secure Communications • Cyber Security • Security Solutions • Defence and Space • COMMUNICATIONS, INTELLIGENCE & SECURITY (CIS) • MILITARY AIRCRAFT • SPACE SYSTEMS • UNMANNED AERIAL SYSTEMS • Intelligence • Commercial Aircraft • Helicopters • Future Applications
  3. DEFENCE AND SPACE Airbus constellation of optical and radar satellites

  4. Multi-Sources, Multi-Resolution

    DMC Constellation: 22 m • SPOT 6/7: 1.5 m • TerraSAR-X: 25 cm to 40 m • Pléiades: 50 cm • Pléiades Neo: 30 cm • HAPS Zephyr: 20 cm
  5. www.intelligence-airbusds.com/sandbox

  6. Cloud detection

    Machine Learning helped reduce error rates from 11% to 3% in the critical process of creating cloud masks
  7. None
  8. Here’s the backstory

    Shipping traffic is growing fast. More ships increase the chances of infractions at sea like environmentally devastating ship accidents, piracy, illegal fishing, drug trafficking, and illegal cargo movement. This has compelled many organizations, from environmental protection agencies to insurance companies and national government authorities, to have a closer watch over the open seas. Photo by VanveenJF on Unsplash.
  9. Ready to prepare and release a great dataset?

    • Select areas • Define classes • Tag ships • Resolve issues • Analyze data • Prepare split • Release under a liberal license
  10. The Intelligence Playground

    Access any place on Earth • Run massive inference jobs on imagery • Tag ground truth and define taxonomy • Access all imagery online • Export machine learning datasets • Import qualification datasets • Connect to external processing (through Docker) • Compute metrics (scores)
  11. A dataset of 150,000 images of 768x768 pixels (at 1.5 m resolution) with over 80,000 labelled ships has been prepared
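To give a feel for the scale of this dataset, the figures on the slide can be turned into ground footprints. This is a back-of-the-envelope sketch only: it assumes square pixels at exactly 1.5 m, ignores any overlap between tiles, and treats "over 80,000" as a lower bound. The variable names are illustrative.

```python
# Figures taken from the slide.
TILE_SIDE_PX = 768        # tile width and height in pixels
GSD_M = 1.5               # ground sampling distance in metres per pixel
N_IMAGES = 150_000        # number of tiles in the dataset
N_SHIPS_MIN = 80_000      # "over 80,000" labelled ships, used as a lower bound

# Ground footprint of one tile.
side_m = TILE_SIDE_PX * GSD_M
tile_area_km2 = (side_m / 1000) ** 2

# Upper bound on covered area (assumes tiles do not overlap).
total_area_km2 = tile_area_km2 * N_IMAGES

# Lower bound on the average number of labelled ships per tile.
ships_per_image = N_SHIPS_MIN / N_IMAGES

print(f"tile side: {side_m:.0f} m")
print(f"tile area: {tile_area_km2:.3f} km²")
print(f"total area (no overlap assumed): {total_area_km2:,.0f} km²")
print(f"ships per image (lower bound): {ships_per_image:.2f}")
```

Under these assumptions each tile covers a little over one square kilometre, and the average tile holds fewer than one labelled ship, so many tiles contain no ship at all.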
  12. How to select the score?

    Table columns: GT, ML, TP, TN, FP, FN, P, R, F2. Values: 0 0 0 0 0 0 Inf. Inf. 1 0 1 0 0 1 0 0 Inf. 0 1 0 0 0 0 1 0 1 1 1 0 0 0 1 1 2 1 0 1 0 0.8 2 0 0 0 2 1 0 2 2 1 0 2 3 0
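The column names on this slide (TP, FP, FN, P, R, F2) point to an F2-style score, and the "Inf." entries hint at the edge cases that make the choice delicate: with no ships in the ground truth and none predicted, precision and recall are both 0/0. Below is a minimal sketch of the standard F-beta formula with beta = 2, written on counts. The function name and the convention of scoring the empty/empty case as 1.0 are assumptions for illustration. The actual competition metric may involve further steps (such as matching predictions to ground truth by overlap) that this sketch does not attempt to reproduce.

```python
def f2_from_counts(tp: int, fp: int, fn: int) -> float:
    """F-beta score with beta = 2, computed directly from counts.

    F2 = 5*TP / (5*TP + 4*FN + FP), which weights recall more than precision.
    """
    denominator = 5 * tp + 4 * fn + fp
    if denominator == 0:
        # No ground-truth ships and no predictions: precision and recall are
        # both 0/0. Returning 1.0 is an assumed convention, not a given.
        return 1.0
    return 5 * tp / denominator


# (TP, FP, FN) for a few single-image situations.
cases = {
    "empty image, no prediction": (0, 0, 0),
    "empty image, one false alarm": (0, 1, 0),
    "one ship, missed": (0, 0, 1),
    "one ship, found": (1, 0, 0),
    "one ship, found plus one false alarm": (1, 1, 0),
}

for label, (tp, fp, fn) in cases.items():
    print(f"{label}: F2 = {f2_from_counts(tp, fp, fn):.2f}")
```

The interesting rows are the first two: on an empty image, a single false alarm moves the per-image score from 1 to 0. When most images contain no ship, that convention weighs heavily on the average.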
  13. Launch!

    • A dataset of 150,000 images • $60,000 of prize money • 2 challenges • 3 months • 4 winners
  14. Kaggle Discussions and Kernels

    More than 1,000 posts and comments about Airbus images and the best strategies to extract ships have been shared with the community in the discussions. More than 100 kernels have been published, including source code and results. https://www.kaggle.com/meaninglesslives/airbus-ship-detection-data-visualization https://www.kaggle.com/ezietsman/airbus-eda ODS.AI > Привет я говорю по русски ("Hello, I speak Russian")
  15. None
  16. Leak management

  17. None
  18. Final submission deadline November 14th

    The public leaderboard is calculated with approximately 12% of the test data. The final results are based on the other 88%, so the final standings are different. A total of 1,876 competitors (in 883 teams) submitted 12,813 entries to the leaderboard in 3 months. The source code of the 3 winners has been delivered to Airbus together with short interviews.
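The 12% / 88% division of the test data mentioned on this slide can be illustrated with a deterministic split of image identifiers. This is a hypothetical sketch, not a description of how Kaggle actually partitions test data: the function name, the salt, and the image naming scheme are all invented for the example.

```python
import hashlib


def leaderboard_bucket(image_id: str,
                       first_fraction: float = 0.12,
                       salt: str = "demo-salt") -> str:
    """Assign an image to the '12%' or '88%' part of the test set.

    The assignment depends only on the image id and the salt, so it is
    reproducible and independent of the order in which ids are processed.
    """
    digest = hashlib.sha256(f"{salt}:{image_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return "12%" if value < first_fraction else "88%"


# Hypothetical image ids, for illustration only.
ids = [f"img_{i:05d}.jpg" for i in range(10_000)]
buckets = [leaderboard_bucket(image_id) for image_id in ids]
small_share = buckets.count("12%") / len(ids)

print(f"share of images in the 12% part: {small_share:.3f}")
```

Because only a small part of the test set is visible during the competition, a model tuned to that part can drop in rank once the remaining images are scored, which is one reason final standings differ.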
  19. Winners submissions

  20. More hints and tricks in kernels

    • 4th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71667#latest-427690
    • 6th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71782#latest-437667
    • 9th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71595#latest-457550
    • 10th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71607#latest-423229
    • 11th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71659#latest-457548
    • 14th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71664#latest-427689
    • https://www.youtube.com/watch?v=0Opb8gB1p4w
    • https://www.youtube.com/watch?v=MIbetMAnC04 (leak, Russian)
    • https://www.youtube.com/watch?v=fZS-h8fwRIk (Russian)
    • https://github.com/neptune-ml/open-solution-ship-detection
    • 8th: https://github.com/SeuTao/Airbus-Ship-Detection-Challenge-2018_8th_place_solution
    • 10th: https://github.com/tkuanlun350/Kaggle_Ship_Detection_2018
    • 21st: https://github.com/pascal1129/kaggle_airbus_ship_detection
    • https://github.com/toshi-k/kaggle-airbus-ship-detection-challenge (oriented SSD)
    And elsewhere…
  21. Speed Prize

    The top 100 participants could compete for a special prize for making the algorithm as fast as possible. The result is the capacity to process one full SPOT image in less than 1 min.
  22. None
  23. Measurable improvements

    Validation of the models has been done with (yet) another dataset composed of 30 images annotated by human image analysts in real use cases
  24. Improvements in detection

  25. Improvements in instance detection

  26. In production

  27. Thank you for your attention

    MAIN TAKEAWAYS • Get your OBJECTIVE clear • Release a LARGE unpublished database • Beware of the SCORE that you choose • Select your validation dataset CAREFULLY • Set aside time to READ discussions and code • Get ready for the LEAK • Implement corrective actions QUICKLY • Do not go too far off the beaten TRACK
    Jeff Faudi @jeffaudi