
#38 Lessons learned from organizing a Kaggle competition


From preparing the dataset and choosing the metric, to launching the competition and the inevitable leak, and finally the interviews with the winners and the validation of the results, running a Kaggle competition is anything but smooth sailing. A look back at an extraordinary adventure!

Bio: Jeff Faudi - Machine Learning and Open Innovation at Airbus DS Intelligence.

Toulouse Data Science

June 17, 2019


Transcript

  1. Lessons learnt
    from organizing a
    Kaggle competition
    Jean-Francois (Jeff) Faudi
    Analytics & Open-Innovation
    @Airbus DS Intelligence


  2. Who are we? Delivering value from data in
    our digitally connected world
    1,000+ customers in 100 countries
    150 resellers
    25 DRS partners
    >100Bn km² of archive data
    1,800 employees
    Airbus: Commercial Aircraft, Helicopters, Defence and Space
    Defence and Space: Communications, Intelligence & Security (CIS), Military Aircraft, Space Systems, Unmanned Aerial Systems
    Secure Communications, Cyber Security, Security Solutions, Intelligence, Future Applications


  3. DEFENCE AND SPACE
    Airbus constellation
    of optical and radar satellites


  4. Multi-Sources, Multi-Resolution
    • DMC Constellation: 22 m
    • SPOT 6/7: 1.5 m
    • TerraSAR-X: 25 cm to 40 m
    • Pléiades: 50 cm
    • Pléiades Neo: 30 cm
    • HAPS Zephyr: 20 cm


  5. www.intelligence-airbusds.com/sandbox


  6. Cloud detection
    Machine Learning helped reduce error rates from 11% to 3%
    in the critical process of creating cloud masks


  7. (no text on this slide)

  8. Here’s the
    backstory
    Shipping traffic is growing fast. More
    ships increase the chances of
    infractions at sea like environmentally
    devastating ship accidents, piracy,
    illegal fishing, drug trafficking, and
    illegal cargo movement. This has
    compelled many organizations, from
    environmental protection agencies to
    insurance companies and national
    government authorities, to have a
    closer watch over the open seas.
    Photo by VanveenJF on Unsplash


  9. Ready to prepare and
    release a great dataset?
    • Select areas
    • Define classes
    • Tag ships
    • Resolve issues
    • Analyze data
    • Prepare split
    • Release under a liberal license
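
The "Prepare split" step is where leakage between train and test is easiest to introduce. The slides do not describe how the split was actually made; the sketch below only illustrates one common precaution, under the assumption that tiles are cut from larger scenes and that each tile carries a scene identifier. The `tile_to_scene` mapping and the `split_by_scene` helper are hypothetical names.

```python
import random
from collections import defaultdict


def split_by_scene(tile_to_scene, test_fraction=0.2, seed=0):
    """Assign whole scenes to train or test so no scene feeds both sets.

    tile_to_scene: dict mapping a tile id to the id of the scene it was
    cut from (hypothetical structure, not taken from the slides).
    """
    scenes = sorted(set(tile_to_scene.values()))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(scenes)
    n_test = max(1, round(len(scenes) * test_fraction))
    test_scenes = set(scenes[:n_test])

    split = defaultdict(list)
    for tile, scene in sorted(tile_to_scene.items()):
        split["test" if scene in test_scenes else "train"].append(tile)
    return dict(split)


# Toy example: 40 tiles cut from 10 scenes, 4 tiles per scene.
tiles = {f"tile_{i}": f"scene_{i // 4}" for i in range(40)}
result = split_by_scene(tiles)
print(len(result["train"]), len(result["test"]))
```

Splitting on the scene rather than on the tile means that neighbouring or overlapping crops of the same scene cannot end up on both sides of the split.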


  10. The Intelligence Playground
    Access any place
    on Earth
    Run massive
    inference jobs on
    imagery
    Tag ground truth
    and
    define taxonomy
    Access all imagery
    online
    Export of machine
    learning datasets
    Import of
    qualification datasets
    Connect to
    external processings
    (through docker)
    Computation of
    metrics (scores)


  11. A dataset of 150,000
    images of 768x768
    pixels (at 1.5 m resolution)
    with over 80,000 labelled
    ships has been
    prepared
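
To give a sense of scale, the figures on this slide can be turned into ground coverage. This is a rough sketch, assuming square tiles, that 1.5 m is the ground size of one pixel, and that tiles do not overlap, so the total is an upper bound rather than a figure from the talk.

```python
TILE_SIDE_PX = 768   # tile width and height in pixels (from the slide)
PIXEL_SIZE_M = 1.5   # ground size of one pixel in metres (from the slide)
N_IMAGES = 150_000   # number of tiles in the dataset (from the slide)

side_m = TILE_SIDE_PX * PIXEL_SIZE_M   # ground length of one tile side
tile_km2 = (side_m / 1000) ** 2        # ground area of one tile
total_km2 = tile_km2 * N_IMAGES        # upper bound: assumes no overlap

print(f"tile side: {side_m:.0f} m")
print(f"tile area: {tile_km2:.2f} km²")
print(f"total:     {total_km2:,.0f} km² at most")
```

Each tile therefore covers a little over one square kilometre of sea or coast.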


  12. How to select the score?
    GT ML TP TN FP FN P R F2
    0 0 0 0 0 0 Inf. Inf. 1
    0 1 0 0 1 0 0 Inf. 0
    1 0 0 0 0 1 0
    1 1 1 0 0 0 1
    1 2 1 0 1 0 0.8
    2 0 0 0
    2 1 0
    2 2 1 0
    2 3 0
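
The F2 column can be reproduced from the TP, FP and FN counts. This is a minimal sketch, assuming the standard F-beta definition with beta = 2 and assuming, from the first table row, that an image with no ships and no detections scores 1. `f_beta` is an illustrative helper; how a detection is matched to a ground-truth ship is outside this sketch.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score computed directly from detection counts."""
    if tp == 0 and fp == 0 and fn == 0:
        # Nothing to find and nothing predicted: scored as perfect
        # (convention assumed from the first row of the table).
        return 1.0
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


# (GT, ML) pairs from the completed rows of the table, taking the best
# case where every detection that can match a ship does match.
for gt, ml in [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]:
    tp = min(gt, ml)
    fp = ml - tp
    fn = gt - tp
    print(gt, ml, tp, fp, fn, round(f_beta(tp, fp, fn), 2))
```

With beta = 2, a missed ship (FN) weighs four times as much as a false alarm (FP), which is why the GT = 1, ML = 2 row still scores about 0.8.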


  13. Launch!
    • A dataset of 150,000 images
    • $60,000 of prize money
    • 2 challenges
    • 3 months
    • 4 winners


  14. Kaggle
    Discussions
    and
    Kernels
    More than 1,000 posts in discussion
    and comments shared with the
    community about Airbus images and
    best strategies to extract ships.
    More than 100 kernels
    have been published.
    This includes source
    code and results.
    https://www.kaggle.com/meaninglesslives/airbus-ship-detection-data-visualization
    https://www.kaggle.com/ezietsman/airbus-eda
    ODS.AI > "Hello, I speak Russian"


  15. (no text on this slide)

  16. Leak management


  17. (no text on this slide)

  18. Final submission deadline
    November 14th
    The public leaderboard is
    calculated with approximately
    12% of the test data.
    The final results are based on
    the other 88%, so the final
    standings may differ.
    A total of 1,876 competitors
    (in 883 teams) submitted
    12,813 entries to the
    leaderboard in 3 months.
    The source code of the 3
    winners has been delivered to
    Airbus together with short
    interviews.


  19. Winners submissions


  20. More hints and tricks in kernels
    • 4th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71667#latest-427690
    • 6th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71782#latest-437667
    • 9th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71595#latest-457550
    • 10th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71607#latest-423229
    • 11th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71659#latest-457548
    • 14th: https://www.kaggle.com/c/airbus-ship-detection/discussion/71664#latest-427689
    • https://www.youtube.com/watch?v=0Opb8gB1p4w
    • https://www.youtube.com/watch?v=MIbetMAnC04 (leak, in Russian)
    • https://www.youtube.com/watch?v=fZS-h8fwRIk (in Russian)
    • https://github.com/neptune-ml/open-solution-ship-detection
    • 8th: https://github.com/SeuTao/Airbus-Ship-Detection-Challenge-2018_8th_place_solution
    • 10th: https://github.com/tkuanlun350/Kaggle_Ship_Detection_2018
    • 21st: https://github.com/pascal1129/kaggle_airbus_ship_detection
    • https://github.com/toshi-k/kaggle-airbus-ship-detection-challenge (oriented SSD)
    And elsewhere…


  21. Speed Prize
    The top 100 participants
    could compete for a special
    prize for making the algorithm
    super-fast, resulting in the
    capacity to process one full
    SPOT image in less than 1 min.


  22. (no text on this slide)

  23. Measurable
    improvements
    The models were validated with (yet)
    another dataset composed of 30 images annotated by
    human image analysts in real use cases


  24. DEFENCE AND SPACE Improvements in detection


  25. DEFENCE AND SPACE Improvements in instance detection


  26. In production


  27. Thank you
    for your
    attention
    MAIN TAKEAWAYS
    • Get your OBJECTIVE clear
    • Release a LARGE unpublished database
    • Beware of the SCORE that you choose
    • CAREFULLY select your validation dataset
    • Set aside time to READ discussions and code
    • Get ready for the LEAK
    • Implement corrective actions QUICKLY
    • Do not go too far off the beaten TRACK
    Jeff Faudi
    @jeffaudi
