
BirdCLEF2021 Summary

start
June 12, 2021



Transcript

  1. BirdCLEF2021 Summary
    Competition overview and top solutions

    English version also available
    (Still looking for an icon...)
    start
    (@startjapan)
    (Links are available from the Speaker Deck description)

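  2. Self-introduction

  3. Self-introduction
    A medical student in Hiroshima.
    While studying for the national medical licensing exam, I work as an intern at the medical venture MNES Inc., and I am honing my skills in LAIME, a machine learning study circle for students that the venture sponsors.
    I won this competition and became a Master (largely thanks to my teammates...).
    I am also looking for a good icon.
    kaggle: @startjapan
    twitter: @startjapanml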

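  4. Competition overview

  5. Competition overview
    • A competition to identify which bird species are calling in each 5-second audio segment
    (the same organizers held a similar competition in 2020, referred to below as the "2020 bird competition")
    • The train data is audio obtained from xeno-canto, a site for sharing bird call recordings (train_short_audio)
    • The test data is 80 audio files of 10 minutes each, which are split into 5-second segments for prediction (test_soundscapes)
    • Separately, 20 audio files of 10 minutes each were provided for validation (train_soundscapes)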

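  6. [test_soundscapes]
    Test data. 80 files of 10 minutes each, accessible only while a submission is running.
    Recorded at 4 locations in total.

    [train_short_audio]
    Training data. Audio files are grouped by bird species.
    62874 audio files in total.

    [train_soundscapes]
    Has an acoustic domain close to that of test_soundscapes.
    20 files of 10 minutes each, recorded at 2 of the 4 locations where test_soundscapes was recorded.

    [train_metadata.csv]
    Metadata for train_short_audio. Its shape is (62874, 14).

    [train_soundscape_labels.csv / test.csv]
    Provide the skeleton obtained when the 10-minute files are split into 5-second segments.
    train_soundscape_labels.csv corresponds to train_soundscapes and test.csv to test_soundscapes; only the former has ground-truth labels.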
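
As a rough illustration of the skeleton described above: a 10-minute file yields 600 / 5 = 120 segments, so 80 test files would give 9600 rows and 20 validation files 2400 rows. The sketch below enumerates the segment end times. The `row_id` layout (`<file id>_<site>_<end second>`) and the sample id and site are assumptions made for illustration; they are not taken from the slides.

```python
def segment_end_times(duration_sec: int = 600, segment_sec: int = 5) -> list[int]:
    """End time in seconds of every fixed-length segment in one file."""
    return list(range(segment_sec, duration_sec + 1, segment_sec))


def make_row_ids(file_id: str, site: str, end_times: list[int]) -> list[str]:
    # Hypothetical row_id layout: "<file id>_<site>_<segment end second>".
    return [f"{file_id}_{site}_{end}" for end in end_times]


ends = segment_end_times()
row_ids = make_row_ids("7019", "COR", ends)  # sample id and site are made up
print(len(ends), row_ids[0], row_ids[-1])
```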

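  7. Submission format & evaluation metric
    • Multiple bird species can be submitted as the prediction for a single segment
    • For segments in which no bird is calling, submit the string "nocall"
    • The evaluation metric is the mean of the row-wise micro-F1 scores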
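
One way to read the metric above is: compute an F1 score per row from the set of true labels and the set of predicted labels, then average over rows. The sketch below follows that reading. Treating "nocall" as an ordinary label and using space-separated label strings are assumptions; the species codes are placeholders.

```python
def row_f1(true_labels: set[str], pred_labels: set[str]) -> float:
    """F1 score of one row, computed on label sets."""
    tp = len(true_labels & pred_labels)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_labels)
    recall = tp / len(true_labels)
    return 2 * precision * recall / (precision + recall)


def mean_row_f1(true_rows: list[str], pred_rows: list[str]) -> float:
    """Average of the per-row F1 scores; rows are space-separated labels."""
    scores = [
        row_f1(set(t.split()), set(p.split()))
        for t, p in zip(true_rows, pred_rows)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Placeholder species codes, for illustration only.
truth = ["nocall", "amerob bkcchi", "nocall"]
preds = ["nocall", "amerob", "amerob"]
print(mean_row_f1(truth, preds))
```

In this example the first row is a perfect match, the second finds one of two species, and the third predicts a bird on a nocall row, so a false positive on a silent segment costs the whole row.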

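  8. Differences from the 2020 bird competition

  9. Differences from the 2020 bird competition
    • The existence of train_soundscapes
    - There is a large acoustic domain gap between train_short_audio and the test data
    - This competition provided train_soundscapes, whose acoustic domain is closer to the test data
    - It was mostly used for validation, but some participants found ways to use it for training
    • Location information for the test data was accessible
    - Each test file name was guaranteed to contain the recording location (and the date)
    - This information therefore had to be built into the pipeline in some form
    (Reference: Starter and some thoughts by @hidehisaarai1213)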
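
Since the location and date are carried in the file name, a pipeline can recover them with a small parser. The sketch below assumes a `<file id>_<site>_<YYYYMMDD>.ogg` naming pattern; that pattern and the sample name are assumptions for illustration, as the slides only state that the information is present.

```python
from datetime import date, datetime
from pathlib import Path


def parse_soundscape_name(filename: str) -> tuple[str, str, date]:
    """Split an assumed '<id>_<site>_<YYYYMMDD>.ogg' name into its parts."""
    file_id, site, date_str = Path(filename).stem.split("_")
    recorded_on = datetime.strptime(date_str, "%Y%m%d").date()
    return file_id, site, recorded_on


# Hypothetical file name, used only to exercise the parser.
file_id, site, recorded_on = parse_soundscape_name("7019_COR_20190904.ogg")
print(file_id, site, recorded_on.month)
```

The site code and the month can then be used as features or as filters on the predicted species.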

  10. EDA (train_short_audio)

  11. Audio length per file (train_short_audio)
    ※ 1000 audio files randomly sampled from train_short_audio (out of 62874)
    ※ Horizontal axis: audio length per file [seconds]
    ※ Vertical axis: frequency (1000 files in total)

  12. How many audio files are there per bird species? (train_short_audio)
    ※ Horizontal axis: number of audio files for each species (within train_short_audio)
    ※ Vertical axis: frequency (397 species in total)

  13. Secondary labels are explicitly stated to have missing entries (train_short_audio)
    (Quoted from BirdCLEF2021: Exploring the data)

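  14. EDA (soundscapes)

  15. Submitting "nocall" for every row reveals the nocall rate of the Public LB
    [Figure: Public / Private panels for BirdCLEF2021 and, for reference, the 2020 bird competition]

  16. On the other hand, the nocall rate is somewhat higher in train_soundscapes

  17. Distribution of the target variable in train_soundscapes (including nocall)
    • nocall is overwhelmingly the most common
    • Some 5-second segments contain two or more calling species

  18. Distribution of the target variable in train_soundscapes (nocall removed)
    • Some combinations of bird species are observed frequently

  19. Number of birds calling simultaneously within a 5-second segment in train_soundscapes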

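  20. Basic approach to audio recognition tasks

  21. Basic approach to audio recognition tasks
    Audio data can be converted into an image (a spectrogram) whose horizontal axis is time, whose vertical axis is frequency, and whose pixel values indicate the intensity of each signal component. Applying a CNN or similar model to it lets the task be handled as conventional image processing.
    ※ In this competition, mel spectrograms, which use the mel scale on the vertical (frequency) axis, were widely used
    ※ The mel scale: a scale of sound frequency on which equal differences correspond to equal differences in pitch as perceived by the human ear
    (Image quoted from BirdCLEF2021: Processing audio data)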
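
To make the mel scale note concrete, the sketch below uses one common conversion formula, mel = 2595 * log10(1 + f / 700). Other variants exist and the slides do not say which one participants' libraries used, so treat the formula choice as an assumption. It prints the frequency in Hz at equally spaced mel values.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel with the 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps on the mel axis map to increasingly wide steps in Hz.
for mel in (500, 1000, 1500, 2000):
    print(mel, round(mel_to_hz(mel), 1))
```

Equal 500-mel steps cover wider and wider Hz ranges as frequency rises, which is why a mel spectrogram spends more of its vertical resolution on low frequencies.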

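  22. Quirks specific to this competition

  23. Quirks specific to this competition
    • train_short_audio has only weak labels (weak label problem)
    - Labels are assigned to the whole audio clip, which is tens of seconds long
    - It is unknown which birds are calling at the level of 5-second segments
    • Some labels in train_short_audio are missing (noisy label problem)
    - In particular, secondary_labels (※) are explicitly stated to have missing entries
    • Metadata such as recording date and location also needs to be incorporated in some form (metadata incorporation)
    • Whether birds are calling in the segments before and after the target segment may also be informative (neighboring-segment context)
    • The nocall rate differs greatly between train_soundscapes and test_soundscapes (difficulty of establishing a CV strategy)
    ※ train_short_audio has two kinds of labels: primary_label and secondary_labels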

  24. Top solutions
    top solutions and approaches
    The discussion thread above collects links to the top solutions

  25. 1st place (ours!)
    [1st Place] Quick Solution
    [1st Place] Detailed Solution

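  26. tl;dr
    1st stage: build a binary nocall detector using external data (freefield1010) (1: some bird is calling / 0: nocall)
    2nd stage: train a 397-dimensional multi-label classifier on train_short_audio, using the nocall detector to reduce the weight of nocall portions
    3rd stage: build our own "table competition" from the nocall detector outputs, the metadata, the 2nd stage outputs, and so on

    Turning the task into our own table competition solves the weak label problem, the noisy label problem, metadata incorporation, and neighboring-segment context all at once!
    ※ Outline of the inference part only; the 1st stage is omitted

  27. Why does converting to a table solve the weak label & noisy label problems?
    • The 3rd stage target variable (0: miss row / 1: hit row) is determined by the following flow
    • For the primary & secondary labels assigned to an audio clip tens of seconds long, predictions can be produced per segment:
    combining the outputs of the nocall detector and the multi-label classifier solves the weak label problem
    • If a secondary label is missing, label 0 is assigned, but label-0 samples are relatively numerous,
    so the noise gets buried (mitigates the noisy label problem)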
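
A minimal sketch of how such a table could be assembled, under stated assumptions: each segment contributes one row per top-ranked candidate species from the 2nd stage, with the nocall detector output and the site as extra features, and the target is 1 when the candidate appears in the clip-level labels. The function name, the feature set, `top_k`, and the exact target rule are hypothetical; the slides only name the inputs and the 0/1 target.

```python
def build_candidate_rows(
    segment_probs: dict[str, float],  # 2nd stage output for one segment
    call_prob: float,                 # nocall detector output (1 = bird calling)
    clip_labels: set[str],            # primary + secondary labels of the clip
    site: str,
    top_k: int = 3,
) -> list[dict]:
    """Turn one segment into tabular rows, one per candidate species."""
    ranked = sorted(segment_probs.items(), key=lambda kv: kv[1], reverse=True)
    rows = []
    for rank, (species, prob) in enumerate(ranked[:top_k]):
        rows.append(
            {
                "species": species,
                "prob": prob,
                "rank": rank,
                "call_prob": call_prob,
                "site": site,
                # Assumed target rule: 1 = hit row, 0 = miss row.
                "target": int(species in clip_labels),
            }
        )
    return rows


# Placeholder species codes and probabilities.
rows = build_candidate_rows(
    {"amerob": 0.7, "bkcchi": 0.2, "norcar": 0.05, "houspa": 0.01},
    call_prob=0.9,
    clip_labels={"amerob", "norcar"},
    site="COR",
)
print([(r["species"], r["target"]) for r in rows])
```

Any tabular classifier can then be trained on these rows, and because a missing secondary label only flips a few targets among many label-0 rows, its effect is diluted.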

  28. more details...
    • My teammate kami (twitter: @634kami / kaggle: @kami634) wrote up the solution in Japanese:
    Kaggle の鳥コンペで1位を取った話：BirdCLEF 2021 優勝解法 (How we took 1st place in Kaggle's bird competition: the BirdCLEF 2021 winning solution)

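  29. 2nd place
    2nd place solution

  30. 2nd place
    (Quoted from 2nd place solution)

  31. 2nd place
    • Extract 30-second clips from train_short_audio, split them into 5-second pieces, and apply mixup (addresses the weak label problem)
    • Exclude the 3 train_soundscapes files in which no bird calls during the entire 10 minutes & use bootstrap sampling (robust CV strategy)
    • Label smoothing & weighting by the rating column in the metadata (addresses the noisy label problem)
    • Tips for threshold selection
    - The nocall rate is lower on the LB than in CV, so lower the threshold to predict more birds
    - The distribution of probabilities differs between models, so using a single probability value as the threshold makes little sense; a percentile-based threshold was used instead
    • Other (post-processing)
    - Adjust individual probabilities using the mean predicted probability of each bird
    - Use information from neighboring segments
    - Take the nocall detector output into account
    - Remove predictions of species that are implausible given the time and location (metadata incorporation)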
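
The percentile-based threshold above can be sketched as follows: instead of fixing a probability value, fix the fraction of predictions to keep and derive each model's threshold from its own score distribution. The function name and the `keep_fraction` value are assumptions for illustration and are not taken from the 2nd place writeup.

```python
def percentile_threshold(probs: list[float], keep_fraction: float) -> float:
    """Threshold such that roughly keep_fraction of probs are at or above it."""
    ranked = sorted(probs, reverse=True)
    k = max(1, round(len(ranked) * keep_fraction))
    return ranked[k - 1]


# Two hypothetical models with the same ranking but differently scaled scores.
model_a = [0.9, 0.6, 0.3, 0.2, 0.1, 0.05, 0.04, 0.03, 0.02, 0.01]
model_b = [p / 3 for p in model_a]

thr_a = percentile_threshold(model_a, keep_fraction=0.2)
thr_b = percentile_threshold(model_b, keep_fraction=0.2)
kept_a = sum(p >= thr_a for p in model_a)
kept_b = sum(p >= thr_b for p in model_b)
print(thr_a, thr_b, kept_a, kept_b)
```

Both models keep the same number of predictions even though their absolute thresholds differ, which is the property a single shared probability cut-off lacks.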

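  32. 4th place
    4th place solution

  33. 4th place
    • Used an SED model with 10-30 second inputs (addresses the weak label problem)
    (Reference: Introduction to Sound Event Detection by @hidehisaarai1213)
    • Also used mixup
    • Applied pseudo labeling (addresses the noisy label problem)
    • The final output combines the SED outputs for the target 5-second segment and for the 30-second segment centered on it (neighboring-segment context)
    - A small threshold was used for the 5-second segment and a large threshold for the 30-second segment
    • As in the 2nd place solution, species judged unlikely to be observed given the time and location were removed (metadata incorporation)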
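
Mixup comes up in several of the solutions, so here is a minimal sketch of the standard formulation: two samples and their label vectors are blended with a weight drawn from a Beta distribution. The `alpha` value, the linear blending of labels, and applying it to raw waveforms are assumptions; the slides do not say how each team configured it.

```python
import random


def mixup(wave_a, labels_a, wave_b, labels_b, alpha=0.4, rng=random):
    """Blend two equal-length waveforms and their label vectors."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed_wave = [lam * a + (1 - lam) * b for a, b in zip(wave_a, wave_b)]
    mixed_labels = [lam * a + (1 - lam) * b for a, b in zip(labels_a, labels_b)]
    return mixed_wave, mixed_labels, lam


rng = random.Random(0)  # seeded so the example is reproducible
wave, labels, lam = mixup(
    [0.0, 0.5, -0.5], [1.0, 0.0],  # clip A with species 0
    [0.2, 0.2, 0.2], [0.0, 1.0],   # clip B with species 1
    rng=rng,
)
print(round(lam, 3), [round(v, 3) for v in labels])
```

For multi-label audio, some practitioners take the union of the two label sets rather than the weighted blend, since both birds are audible in the mixed clip.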

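  34. 5th place
    5th place solution

  35. 5th place
    • Finished 2nd in the 2020 bird competition and based this solution on that one
    • Improvements over last time: switching to SED, +1% (※1) / threshold tuning, +1% / improved ensembling, +1%
    (also appears to have narrowed down predicted labels using regional information (※2))
    • Distinctive augmentation: raising the image to a power of 0.5-3 / n-times playback speed / adding sounds such as rain and conversation / adding noise / frequency adjustment with probability 0.5
    (my impression is that 1st-4th place went no further than mixup and noise addition)
    • primary label is given label 1 and secondary labels are given label 0.3
    • Adjusted so that a bird observed in one segment is picked up more easily across the whole 10-minute audio file (※3)
    ※1: addresses the weak label problem
    ※2: metadata incorporation
    ※3: neighboring-segment context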
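
The soft-label scheme above (1 for the primary label, 0.3 for secondary labels) can be sketched as a target vector builder. The function name, the class list, and the choice to ignore secondary labels outside the class list are assumptions for illustration.

```python
def build_soft_target(
    primary: str,
    secondary: list[str],
    classes: list[str],
    secondary_weight: float = 0.3,
) -> list[float]:
    """Target vector with 1.0 for the primary and a smaller weight for secondaries."""
    index = {name: i for i, name in enumerate(classes)}
    target = [0.0] * len(classes)
    for name in secondary:
        if name in index:  # assumption: skip labels outside the class list
            target[index[name]] = secondary_weight
    target[index[primary]] = 1.0  # the primary label always wins
    return target


# Placeholder species codes.
classes = ["amerob", "bkcchi", "norcar"]
print(build_soft_target("amerob", ["bkcchi"], classes))
```

The lower weight expresses that a secondary species is present somewhere in the clip but less reliably than the primary one.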

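  36. 8th place
    8th place writeup

  37. 8th place
    • Finished 6th in the 2020 bird competition; used SED again this time (addresses the weak label problem)
    • Used 5-second or 20-second segments for training and 40-second segments for inference; longer was better.
    Inference windows also overlapped, e.g. 0-40 seconds followed by 20-60 seconds (neighboring-segment context)
    • Augmentation: Gaussian noise, pink noise, volume adjustment, pitch shift
    (mixup also worked well but reportedly could not be included in the final submission because of compute constraints)
    • Distinctive loss function (BCEFocal2WayLoss)
    • primary label and secondary labels were treated the same way
    • Applied pseudo labeling (addresses the noisy label problem)
    • Two thresholds, a call threshold and a nocall threshold: species exceeding the call threshold are marked positive, while segments in which no species exceeds the nocall threshold also receive nocall (bird labels and nocall can coexist)
    • Species that cannot exist given the regional information are excluded even if predicted (metadata incorporation)
    • CV is computed as 0.54 * nocall_f1 + 0.46 * call_f1, with the F1 score calculated separately for rows with bird calls and nocall rows (robust CV strategy)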
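
The two-threshold decoding above can be sketched as follows. A species is emitted when its probability exceeds the call threshold, and "nocall" is added when no species exceeds the nocall threshold. For the two to coexist, the nocall threshold has to sit above the call threshold. The threshold values and species codes are placeholders, not the values used in the 8th place solution.

```python
def decode_segment(
    probs: dict[str, float],
    call_threshold: float = 0.3,
    nocall_threshold: float = 0.5,
) -> str:
    """Build the submission string for one segment from species probabilities."""
    labels = sorted(s for s, p in probs.items() if p > call_threshold)
    # Add nocall when no species is confidently present.
    if max(probs.values(), default=0.0) <= nocall_threshold:
        labels.append("nocall")
    return " ".join(labels)


print(decode_segment({"amerob": 0.9, "bkcchi": 0.1}))  # confident call
print(decode_segment({"amerob": 0.4, "bkcchi": 0.1}))  # uncertain: both labels
print(decode_segment({"amerob": 0.1, "bkcchi": 0.1}))  # silence
```

In the uncertain middle band the row hedges between the bird and nocall, which limits the loss under a row-wise F1 metric whichever turns out to be true.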

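  38. 9th place
    9th Place solution

  39. 9th place
    • Training inputs were 5-7 second segments
    • Secondary labels were given a small weight
    • Used mixup
    • Added as much diversity as possible
    - mel-spectrograms with different time resolutions, hop_length of 200 and 320
    - Various backbones
    - Augmentation: white noise, pink noise, band noise, nocall clips, raising the mel-spectrogram image to a power
    • Post-processing
    - Post-processing with each bird's maximum or mean call probability over the whole 10-minute audio (neighboring-segment context)
    - Post-processing based on an estimate, from regional information, of how likely each bird is to call (metadata incorporation)
    - Post-processing with each bird's maximum call probability within a day (metadata incorporation)
    • Squeeze width of test soundscapes by 2-5% (mostly to reverse far field effects)
    • Also set two thresholds (call, nocall), allowing bird labels and nocall to coexist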

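  40. 11th place
    My journey (11th solution)

  41. 11th place
    • CPMP, who held the top spot on the Public LB for a long time
    • Finished 18th in the 2020 bird competition and 11th in the Rainforest competition, and reportedly based this solution on a mix of the two
    - 2020 bird competition solution: 18th place solution: efficientnet b3
    - Rainforest competition solution: 11th place, The 0.931 Magic Explained: Image Classification
    • Like the 8th place solution, computed CV as 0.54 * nocall_f1 + 0.46 * call_f1 (robust CV strategy)
    • Regrets having been overconfident that ensembling would reliably raise the score and not submitting an ensemble version until a few days before the competition ended (the ensemble reportedly did not help in practice)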
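
The weighted CV score 0.54 * nocall_f1 + 0.46 * call_f1 can be sketched by splitting validation rows on whether the ground truth is "nocall", averaging the row-wise F1 within each group, and combining the two averages. Only the 0.54 / 0.46 weights come from the slides; the row format and the helper names are assumptions.

```python
def row_f1(true_labels: set[str], pred_labels: set[str]) -> float:
    """F1 score of one row, computed on label sets."""
    tp = len(true_labels & pred_labels)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_labels)
    recall = tp / len(true_labels)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    scores = [row_f1(set(t.split()), set(p.split())) for t, p in pairs]
    return sum(scores) / len(scores) if scores else 0.0


def weighted_cv(true_rows, pred_rows, nocall_weight: float = 0.54) -> float:
    """Combine F1 on nocall rows and on call rows with fixed weights."""
    pairs = list(zip(true_rows, pred_rows))
    nocall_f1 = mean_f1([(t, p) for t, p in pairs if t == "nocall"])
    call_f1 = mean_f1([(t, p) for t, p in pairs if t != "nocall"])
    return nocall_weight * nocall_f1 + (1 - nocall_weight) * call_f1


# Placeholder species codes.
truth = ["nocall", "nocall", "amerob", "amerob bkcchi"]
preds = ["nocall", "amerob", "amerob", "amerob"]
print(weighted_cv(truth, preds))
```

Fixing the weights makes the validation score independent of the nocall rate of the validation set itself, which matters here because that rate differs between train_soundscapes and the test data.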

  42. Thank you very much!