#38 Lessons learned (REX) from organizing a Kaggle competition

From preparing the dataset and choosing the metric, to launching the competition and the inevitable leak, and finally the interviews with the winners and the validation of the results, running a Kaggle competition is far from plain sailing. A look back at an extraordinary adventure!

Bio: Jeff Faudi, Machine Learning and Open Innovation at Airbus DS Intelligence.

Toulouse Data Science

June 17, 2019

Transcript

  1. Lessons learnt from organizing a Kaggle competition

    Jean-Francois (Jeff) Faudi, Analytics & Open-Innovation @Airbus DS Intelligence
  2. Who Are We? Delivering value from data in our digitally connected world

    1,000+ CUSTOMERS in 100 COUNTRIES • 150 RESELLERS • 25 DRS PARTNERS • >100 bn km² ARCHIVE DATA • 1,800 EMPLOYEES
    Commercial Aircraft • Helicopters • Defence and Space: Military Aircraft, Space Systems, Unmanned Aerial Systems, Communications, Intelligence & Security (CIS) • Future Applications
    CIS: Secure Communications • Cyber Security • Security Solutions • Intelligence
  3. Multi-Sources, Multi-Resolution

    DMC Constellation: 22 m • SPOT 6/7: 1.5 m • TerraSAR-X: 25 cm to 40 m • Pléiades: 50 cm • Pléiades Neo: 30 cm • HAPS Zephyr: 20 cm
  4. Cloud detection

    Machine learning helped reduce error rates from 11% to 3% in the critical process of creating cloud masks.
  5. Here’s the backstory

    Shipping traffic is growing fast. More ships increase the chances of infractions at sea, such as environmentally devastating ship accidents, piracy, illegal fishing, drug trafficking, and illegal cargo movement. This has compelled many organizations, from environmental protection agencies to insurance companies and national government authorities, to keep a closer watch over the open seas. (Photo by VanveenJF on Unsplash)
  6. Ready to prepare and release a great dataset?

    • Select areas
    • Define classes
    • Tag ships
    • Resolve issues
    • Analyze data
    • Prepare split
    • Release under a liberal license
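One way to approach the "Prepare split" step, sketched under assumptions: if the 768×768 tiles are cut from larger source scenes, neighbouring or overlapping tiles of the same scene can land on both sides of a train/test split and let information leak across it. The hypothetical `split_by_scene` helper below assigns whole scenes to one side by hashing a scene identifier; the tile and scene identifiers are illustrative, not the competition's actual schema.

```python
import hashlib
from collections import defaultdict


def split_by_scene(tile_to_scene, test_fraction=0.2, salt="split-v1"):
    """Assign every tile to 'train' or 'test', keeping all tiles of a scene together.

    tile_to_scene: dict mapping a (hypothetical) tile id to its source scene id.
    The split is deterministic: a scene's side depends only on its id and the salt.
    """
    tiles_by_scene = defaultdict(list)
    for tile_id, scene_id in tile_to_scene.items():
        tiles_by_scene[scene_id].append(tile_id)

    assignment = {}
    for scene_id, tiles in tiles_by_scene.items():
        # Hash the scene id to a pseudo-uniform number in [0, 1).
        digest = hashlib.sha256(f"{salt}:{scene_id}".encode()).hexdigest()
        bucket = int(digest[:8], 16) / 16**8
        side = "test" if bucket < test_fraction else "train"
        # Every tile of this scene inherits the scene's side.
        for tile_id in tiles:
            assignment[tile_id] = side
    return assignment


tiles = {"t1": "sceneA", "t2": "sceneA", "t3": "sceneB"}
print(split_by_scene(tiles, test_fraction=0.5))
```

Because the fraction applies to scenes rather than tiles, the share of tiles in the test set only approximates `test_fraction` when scenes have different numbers of tiles.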
  7. The Intelligence Playground

    • Access any place on Earth
    • Run massive inference jobs on imagery
    • Tag ground truth and define taxonomy
    • Access all imagery online
    • Export machine learning datasets
    • Import qualification datasets
    • Connect to external processing (through Docker)
    • Compute metrics (scores)
  8. A dataset of 150,000 images of 768×768 pixels (1.5 m resolution) with over 80,000 labelled ships has been prepared
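For a rough sense of scale, the slide's figures can be turned into ground area. This is a back-of-the-envelope sketch: it assumes the tiles do not overlap (the slide does not say), so the total is an upper bound on the unique area covered.

```python
TILE_PX = 768       # tile width/height in pixels (from the slide)
GSD_M = 1.5         # ground sampling distance in metres per pixel (from the slide)
N_TILES = 150_000   # number of images in the dataset (from the slide)

tile_side_km = TILE_PX * GSD_M / 1000      # side of one tile on the ground
tile_area_km2 = tile_side_km ** 2          # ground area of one tile
total_area_km2 = N_TILES * tile_area_km2   # upper bound: equals the unique area only if tiles never overlap

print(f"{tile_side_km:.3f} km per side, {tile_area_km2:.3f} km² per tile")
print(f"~{total_area_km2:,.0f} km² in total")
```

Each tile covers about 1.15 km × 1.15 km, so the whole dataset spans at most roughly 200,000 km².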
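  9. How to select the score?

    Per-image table comparing ground-truth ships (GT) with predicted ships (ML): TP, TN, FP, FN, precision (P), recall (R) and F2. Only the first two rows can be recovered from the transcript:
    • GT = 0, ML = 0: TP = TN = FP = FN = 0; P and R undefined (0/0, "Inf." on the slide); F2 = 1
    • GT = 0, ML = 1: TP = TN = FN = 0, FP = 1; P = 0; R undefined (0/0); F2 = 0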
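F2 weights recall more than precision (β = 2): F2 = 5·TP / (5·TP + 4·FN + FP). Below is a minimal per-image sketch. It assumes masks are given as sets of pixel indices and that, as the Airbus Ship Detection evaluation on Kaggle describes, the image score is F2 averaged over IoU thresholds 0.50 to 0.95 in steps of 0.05. It also assumes a prediction counts as a true positive when it matches a not-yet-matched ground-truth ship with IoU above the threshold. The greedy best-IoU matching and the helper names are illustrative assumptions; the first two rows of the table appear as special cases.

```python
def iou(a, b):
    """Intersection over union of two masks given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f2_at_threshold(gt, pred, t, beta=2.0):
    """F-beta score for one image at one IoU threshold t."""
    if not gt and not pred:
        return 1.0  # no ship and no prediction: scored 1, as in the table's first row
    # All GT/prediction pairs, best IoU first; each object is matched at most once.
    pairs = sorted(
        ((iou(g, p), i, j) for i, g in enumerate(gt) for j, p in enumerate(pred)),
        reverse=True,
    )
    used_gt, used_pred, tp = set(), set(), 0
    for score, i, j in pairs:
        if score <= t:
            break  # remaining pairs are not above the threshold
        if i not in used_gt and j not in used_pred:
            used_gt.add(i)
            used_pred.add(j)
            tp += 1
    fp = len(pred) - tp
    fn = len(gt) - tp
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


def image_score(gt, pred):
    """Mean F2 over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [k / 100 for k in range(50, 100, 5)]
    return sum(f2_at_threshold(gt, pred, t) for t in thresholds) / len(thresholds)


print(image_score([], []))        # table row 1: GT = 0, ML = 0
print(image_score([], [{1, 2}]))  # table row 2: GT = 0, ML = 1
```

The competition-level score would then be the mean of `image_score` over all test images, which is why the many ship-free images (row 1) weigh heavily in the final number.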
  10. Launch!

    • A dataset of 150,000 images
    • $60,000 of prize money
    • 2 challenges
    • 3 months
    • 4 winners
  11. Kaggle Discussions and Kernels

    More than 1,000 discussion posts and comments were shared with the community about Airbus images and the best strategies to extract ships. More than 100 kernels, including source code and results, have been published.
    • https://www.kaggle.com/meaninglesslives/airbus-ship-detection-data-visualization
    • https://www.kaggle.com/ezietsman/airbus-eda
    • ODS.AI > "Hi, I speak Russian"
  12. Final submission deadline: November 14th

    The public leaderboard is calculated on approximately 12% of the test data. The final results are based on the other 88%, so the final standings can differ from the public ones. A total of 1,876 competitors (in 883 teams) made 12,813 leaderboard submissions in 3 months. The source code of the 3 winners has been delivered to Airbus together with short interviews.
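A minimal sketch of a 12%/88% public/private split, assuming each submission gets one score per test image and each leaderboard shows the mean over its subset. The helper `leaderboard_means` and the seeded random split are illustrative, not Kaggle's implementation.

```python
import random
import statistics


def leaderboard_means(per_image_scores, public_fraction=0.12, seed=0):
    """Return (public_mean, private_mean) for one submission.

    per_image_scores: dict mapping a test image id to that image's score.
    The public subset is drawn once with a fixed seed, so every submission is
    evaluated on the same images; the public mean is what participants see
    during the competition, the private mean decides the final standings.
    """
    ids = sorted(per_image_scores)
    n_public = round(public_fraction * len(ids))
    public_ids = set(random.Random(seed).sample(ids, n_public))
    public = [per_image_scores[i] for i in ids if i in public_ids]
    private = [per_image_scores[i] for i in ids if i not in public_ids]
    return statistics.mean(public), statistics.mean(private)


scores = {image_id: float(image_id % 2) for image_id in range(1000)}
print(leaderboard_means(scores))
```

A mean over 12% of the images is noisier than one over 88%, so two submissions that are close on the public leaderboard can swap places on the private one.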
  13. More hints and tricks in kernels

    • 4th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71667#latest-427690
    • 6th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71782#latest-437667
    • 9th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71595#latest-457550
    • 10th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71607#latest-423229
    • 11th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71659#latest-457548
    • 14th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71664#latest-427689
    • https://www.youtube.com/watch?v=0Opb8gB1p4w
    • https://www.youtube.com/watch?v=MIbetMAnC04 (leak, Russian)
    • https://www.youtube.com/watch?v=fZS-h8fwRIk (Russian)
    • https://github.com/neptune-ml/open-solution-ship-detection
    • 8th: https://github.com/SeuTao/Airbus-Ship-Detection-Challenge-2018_8th_place_solution
    • 10th: https://github.com/tkuanlun350/Kaggle_Ship_Detection_2018
    • 21st: https://github.com/pascal1129/kaggle_airbus_ship_detection
    • https://github.com/toshi-k/kaggle-airbus-ship-detection-challenge (oriented SSD)
    And elsewhere…
  14. Speed Prize

    The top 100 participants could compete for a special prize for making the algorithm super-fast. The result: one full SPOT image can be processed in less than 1 minute.
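To make "less than 1 minute per full SPOT image" concrete, here is a back-of-the-envelope time budget per tile. It assumes a 60 km × 60 km SPOT 6/7 scene (an assumption, not stated on the slide), the 1.5 m resolution from slide 3, the dataset's 768×768 tile size, and non-overlapping tiles processed one after another.

```python
import math

SCENE_KM = 60.0   # ASSUMED SPOT 6/7 scene footprint (60 km x 60 km); not on the slide
GSD_M = 1.5       # SPOT 6/7 resolution in metres per pixel (slide 3)
TILE_PX = 768     # tile size used in the competition dataset
BUDGET_S = 60.0   # "less than 1 min" per full image

scene_px = SCENE_KM * 1000 / GSD_M               # pixels per scene side
tiles_per_side = math.ceil(scene_px / TILE_PX)   # last row/column of tiles is only partly filled
n_tiles = tiles_per_side ** 2                    # non-overlapping tiles covering the scene
ms_per_tile = BUDGET_S / n_tiles * 1000          # sequential time budget per tile

print(f"{n_tiles} tiles, {ms_per_tile:.1f} ms per tile")
```

Under these assumptions a scene is about 2,800 tiles, leaving roughly 21 ms per tile if tiles are processed sequentially; batching or parallel inference relaxes the per-tile constraint.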
  15. Measurable improvements

    The models were validated on (yet) another dataset of 30 images annotated by human image analysts in real use cases.
  16. Thank you for your attention

    MAIN TAKEAWAYS
    • Get your OBJECTIVE clear
    • Release a LARGE unpublished database
    • Beware of the SCORE that you choose
    • Select your validation dataset CAREFULLY
    • Spare time to READ discussions and code
    • Get ready for the LEAK
    • Implement corrective actions QUICKLY
    • Do not go too far off the beaten TRACK
    Jeff Faudi @jeffaudi