BirdCLEF2021 Summary

start
June 12, 2021


Transcript

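  1. Self-introduction
     • Medical student in Hiroshima.
     • Alongside studying for the national medical licensing exam, I intern at the medical venture MNES Inc. and build my skills in LAIME, a machine learning study circle for students that the venture sponsors.
     • Won this competition and became a Master (in large part thanks to my teammates...).
     • Also, I am looking for a good icon.
     • kaggle: @startjapan
     • twitter: @startjapanml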
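  2. Competition overview
     • A competition to identify which bird species are calling in audio segments of 5 seconds each.
       (The same organizers held a similar competition in 2020; below it is called "the 2020 bird competition".)
     • The train data is audio obtained from xeno-canto, a site for sharing bird call recordings. (train_short_audio)
     • The test data is 80 audio files of 10 minutes each; each file is split into 5-second segments for prediction. (test_soundscapes)
     • Separately, audio for validation (20 files of 10 minutes each) was also provided. (train_soundscapes)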
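  3. [test_soundscapes]
     Test data. 80 files of 10 minutes each, but they cannot be accessed without submitting.
     Recorded at 4 locations in total.
     [train_short_audio]
     Training data. The audio is grouped by bird species.
     62874 audio recordings in total.
     [train_soundscapes]
     Has an acoustic domain close to that of test_soundscapes.
     20 files of 10 minutes each, recorded at 2 of the 4 locations where test_soundscapes was recorded.
     [train_metadata.csv]
     Metadata for train_short_audio. Its shape is (62874, 14).
     [train_soundscape_labels.csv / test.csv]
     Provide the skeleton obtained when the 10-minute files are split into 5-second segments.
     train_soundscape_labels.csv corresponds to train_soundscapes and test.csv to test_soundscapes; only the former carries ground-truth labels.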
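The 5-second skeleton described above can be sketched as follows. This is a minimal illustration, not the organizers' code: it assumes file names of the form `<audio_id>_<site>_<date>.ogg` and row IDs of the form `<audio_id>_<site>_<end second>`, and the file name used in the usage note is hypothetical.

```python
def segment_rows(filename, duration_s=600, step_s=5):
    """Return one row per 5-second window of a 10-minute soundscape file."""
    # Assumed naming convention: <audio_id>_<site>_<date>.<ext>
    audio_id, site, date = filename.rsplit(".", 1)[0].split("_")
    rows = []
    for end in range(step_s, duration_s + 1, step_s):
        rows.append({
            "row_id": f"{audio_id}_{site}_{end}",  # assumed row_id layout
            "audio_id": audio_id,
            "site": site,
            "date": date,
            "seconds": end,  # end time of the window in seconds
        })
    return rows
```

For example, `segment_rows("1234_ABC_20210101.ogg")` yields 120 rows, one per 5-second window, each of which needs its own prediction.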
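  4. Differences from the 2020 bird competition
     • The existence of train_soundscapes
       - There is a large acoustic domain gap between train_short_audio and the test data.
       - This time, train_soundscapes was provided, whose acoustic domain is closer to the test data.
       - It was mostly used for validation, but some participants found ways to use it for training.
     • Location information for the test data was accessible
       - Each test file name was guaranteed to contain the recording location (and the date).
       - These pieces of information therefore had to be built into the pipeline in some form.
     (Reference: Starter and some thoughts by @hidehisaarai1213)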
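  5. The basic approach to audio recognition tasks
     Audio data can be converted into an image (a spectrogram) whose horizontal axis is time, whose vertical axis is frequency, and whose pixels show the intensity of each signal component. Applying a CNN or a similar model to this image lets the task be handled as ordinary image processing.
     ※ In this competition, mel spectrograms, which use the mel scale on the vertical (frequency) axis, were widely used.
     ※ The mel scale: a scale of sound frequency on which equal differences correspond to equal differences in the pitch a human hears.
     [Figure: spectrogram fed into a CNN; image quoted from "BirdCLEF2021: Processing audio data"]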
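To make the mel scale note concrete, here is a small sketch using one common conversion formula (the HTK-style `2595 * log10(1 + f / 700)`). The slides do not say which variant participants used, and libraries differ, so treat the formula choice as an assumption.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula, one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Band edges that are equally spaced in mels...
edges_hz = [mel_to_hz(m) for m in (0, 500, 1000, 1500, 2000)]
# ...get wider and wider in Hz, mirroring how pitch perception compresses high frequencies.
widths_hz = [hi - lo for lo, hi in zip(edges_hz, edges_hz[1:])]
```

Equal steps on the mel axis therefore cover progressively wider frequency bands, which is why a mel spectrogram spends more vertical resolution on low frequencies.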
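  6. Quirks specific to this competition
     • train_short_audio has only weak labels (the weak label problem)
       - Labels are assigned to a whole audio clip of several tens of seconds.
       - It is unknown which bird is calling at the level of 5-second segments.
     • Some labels in train_short_audio are missing (the noisy label problem)
       - In particular, secondary_labels (※) is explicitly stated to be incomplete.
     • Metadata such as recording date and location has to be incorporated in some form (incorporating metadata)
     • Whether birds are calling in the segments before and after the target segment may also be informative (incorporating neighboring-segment information)
     • The nocall rate differs greatly between train_soundscapes and test_soundscapes (difficulty of establishing a CV strategy)
     ※ train_short_audio has two kinds of labels: primary_label and secondary_labels.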
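  7. tl;dr
     1st stage : Build a binary nocall detector using external data (freefield1010) (1 : some bird is calling / 0 : nocall)
     2nd stage : Use the nocall detector to reduce the weight of the nocall parts of train_short_audio, then build a 397-dimensional multi-label classifier
     3rd stage : Build our own table competition from the nocall detector output, the metadata, the 2nd stage output, and so on
     Turning the task into our own table competition in the end solves the weak label problem, the noisy label problem, incorporating metadata, incorporating neighboring-segment information, and more, all in one go!!
     ※ This is an outline of the Inference Part only; the 1st stage part is omitted.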
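  8. Why does turning the task into a table solve the weak label and noisy label problems?
     • The target variable of the 3rd stage (0 : miss row / 1 : hit row) is determined by the following flow.
     • For the primary and secondary labels assigned to an audio clip of several tens of seconds, predictions can be produced per segment.
       Combining the outputs of the nocall detector and the multi-label classifier solves the weak label problem.
     • If a secondary label is missing, the row gets label 0, but rows with label 0 are comparatively numerous, so the noise is largely buried (mitigating the noisy label problem).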
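A rough sketch of how such a table could be assembled is shown below. It is an illustration under assumptions, not the winning team's code: the candidate selection (top-k species per segment), the feature names, and the rule "target is 1 when the candidate appears among the clip's primary or secondary labels" are all hypothetical simplifications of the flow described above.

```python
def build_rows(clip_labels, segment_probs, call_probs, top_k=2):
    """Turn one weakly labeled clip into (segment, candidate species) rows.

    clip_labels   : set of primary + secondary labels for the whole clip
    segment_probs : list with one {species: probability} dict per 5 s segment
                    (multi-label classifier output)
    call_probs    : list with one nocall-detector output per segment
                    (probability that some bird is calling)
    """
    rows = []
    for seg_idx, (probs, call_p) in enumerate(zip(segment_probs, call_probs)):
        # Hypothetical candidate rule: keep the top_k species of each segment.
        candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
        for species in candidates:
            rows.append({
                "segment": seg_idx,
                "candidate": species,
                "prob": probs[species],
                "call_prob": call_p,
                # 1 = hit row, 0 = miss row (assumed labeling rule)
                "target": int(species in clip_labels),
            })
    return rows
```

A gradient-boosting model or similar tabular learner could then be trained on these rows, with metadata and neighboring-segment features added as extra columns.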
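  9. more details...
     • My teammate kami (twitter : @634kami / kaggle : @kami634) wrote up the solution in Japanese here:
       "Kaggle の鳥コンペで1位を取った話：BirdCLEF 2021 優勝解法" (How we took 1st place in Kaggle's bird competition: the BirdCLEF 2021 winning solution)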
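  10. 2nd place
     • Extract 30-second chunks from train_short_audio, split them into 5-second pieces, and apply mixup (handling the weak label problem)
     • Exclude the 3 train_soundscapes files in which no bird calls at all during the 10 minutes, and use bootstrap sampling (robust CV strategy)
     • Label smoothing, plus weighting by the rating column in the metadata (handling the noisy label problem)
     • Tips for choosing thresholds
       - The nocall rate is lower on the LB than in CV, so lower the threshold and predict more birds.
       - The distribution of probabilities differs between models, so using a single probability value as the threshold makes no sense. Percentile-based thresholds were used instead.
     • Other (post-processing)
       - Adjust individual probabilities using the mean predicted probability of each bird.
       - Use neighboring-segment information.
       - Take the nocall detector output into account.
       - Remove predictions of bird species that are impossible given the time and location (incorporating metadata).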
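The percentile-based threshold tip can be illustrated with a short sketch. The nearest-rank percentile below and the toy probability lists are assumptions for illustration; the slides do not specify how the 2nd place team computed the percentile.

```python
import math

def percentile_threshold(probs, pct):
    """Nearest-rank percentile: smallest value with at least pct% of probs at or below it."""
    ordered = sorted(probs)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def count_positives(probs, pct):
    """Number of predictions that exceed the model's own percentile threshold."""
    threshold = percentile_threshold(probs, pct)
    return sum(p > threshold for p in probs)

# Two toy models that rank predictions identically but on different scales.
model_a = [i / 100 for i in range(1, 101)]
model_b = [p / 10 for p in model_a]
```

A fixed cutoff such as 0.5 would accept half of `model_a`'s predictions and none of `model_b`'s, whereas `count_positives(model_a, 90)` and `count_positives(model_b, 90)` keep the same number of predictions from each model.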
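  11. 4th place
     • Used an SED model with 10-30 second inputs (handling the weak label problem)
       (Reference: Introduction to Sound Event Detection by @hidehisaarai1213)
     • Also used mixup
     • Applied pseudo labeling (handling the noisy label problem)
     • The final output combines the SED outputs for the 5-second target segment and for the 30-second segment centered on it (incorporating neighboring-segment information)
       - A small threshold is used for the 5-second segment and a large threshold for the 30-second segment.
     • As in the 2nd place solution, bird species judged unlikely to be observed given the time and location were removed (incorporating metadata)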
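One way to picture the two-threshold combination is the sketch below. The slide only states that a small threshold is applied to the 5-second output and a large one to the 30-second output; combining them with a logical AND, and the threshold values themselves, are assumptions made here for illustration.

```python
def combine_sed_outputs(p_5s, p_30s, low_th=0.2, high_th=0.5):
    """Keep a species when the 30 s context is confident about it AND
    the 5 s target segment is at least weakly positive (assumed rule).

    p_5s  : {species: probability} for the 5-second target segment
    p_30s : {species: probability} for the 30-second window centered on it
    """
    return sorted(
        species
        for species, prob in p_5s.items()
        if prob > low_th and p_30s.get(species, 0.0) > high_th
    )
```

With this rule, a faint call in the target segment can still be accepted as long as the surrounding 30 seconds strongly support the species.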
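  12. 5th place
     • Placed 2nd in the 2020 bird competition and used that solution as the base again this time
     • Improvements over last time: switching to SED +1% (※1) / threshold tuning +1% / improved ensembling +1%
       (+ apparently also narrowed down the predicted labels using region information (※2))
     • Distinctive augmentation: raising the image to a power of 0.5-3 / n-times playback speed / adding sounds such as rain and conversation / adding noise / frequency adjustment with probability 0.5
       (my impression is that 1st-4th place stuck to mixup and adding noise)
     • The primary label gets label 1 and secondary labels get label 0.3
     • Tuned so that a bird observed in one segment is picked up more easily throughout the whole 10-minute audio file (※3)
     ※1 : handling the weak label problem ※2 : incorporating metadata ※3 : incorporating neighboring-segment information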
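The soft labels and the power augmentation mentioned above could look roughly like this. It is a sketch under assumptions: the class list in the usage note is a toy example, pixel values are assumed to be normalized to [0, 1], and drawing the exponent uniformly from [0.5, 3] is a guess at how "power of 0.5-3" was sampled.

```python
import random

def soft_target(primary, secondary, classes, secondary_value=0.3):
    """Build a target vector: 1.0 for the primary label, 0.3 for secondary labels."""
    index = {name: i for i, name in enumerate(classes)}
    target = [0.0] * len(classes)
    for name in secondary:
        if name in index:  # ignore labels outside the class list
            target[index[name]] = secondary_value
    target[index[primary]] = 1.0
    return target

def power_augment(image, rng, low=0.5, high=3.0):
    """Raise every pixel of a [0, 1]-normalized spectrogram to a random power."""
    exponent = rng.uniform(low, high)  # assumed sampling scheme
    return [[pixel ** exponent for pixel in row] for row in image]
```

For instance, `soft_target("b", ["c"], ["a", "b", "c"])` gives `[0.0, 1.0, 0.3]`, and `power_augment(image, random.Random(0))` changes the contrast of the spectrogram while leaving pixel values of 0 and 1 untouched.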
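  13. 8th place
     • Placed 6th in the 2020 bird competition; used SED again this time (handling the weak label problem)
     • Trained on 5-second or 20-second segments and ran inference on 40-second segments; longer worked better.
       Inference windows also overlapped: 0-40 seconds, then 20-60 seconds, and so on (incorporating neighboring-segment information)
     • Augmentation: Gaussian noise, pink noise, volume adjustment, pitch shift
       (mixup also worked well, but reportedly could not be included in the final submission because of compute constraints)
     • Distinctive loss function (BCEFocal2WayLoss)
     • primary label and secondary labels were treated the same way
     • Applied pseudo labeling (handling the noisy label problem)
     • There are two thresholds, a call threshold and a nocall threshold. Bird species above the call threshold are marked positive, while segments in which no species exceeds the nocall threshold are also given nocall (bird labels and nocall can coexist)
     • Bird species that cannot exist given the region information are excluded even if predicted (incorporating metadata)
     • The F1 score is computed separately for rows where birds are calling and for nocall rows, and CV is derived as 0.54 * nocall_f1 + 0.46 * call_f1 (robust CV strategy)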
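The two-threshold decoding and the weighted CV score can be sketched as follows. The threshold values are illustrative, and computing F1 per row over label sets and then averaging within each group is an assumption about how `nocall_f1` and `call_f1` were obtained; only the 0.54 / 0.46 weighting comes from the slide.

```python
from statistics import mean

def decode_segment(probs, call_th=0.3, nocall_th=0.5):
    """Species above call_th are positive; if no species exceeds nocall_th,
    'nocall' is added as well, so bird labels and nocall can coexist."""
    labels = sorted(s for s, p in probs.items() if p > call_th)
    if all(p <= nocall_th for p in probs.values()):
        labels.append("nocall")
    return labels

def row_f1(true_labels, pred_labels):
    """F1 between two label sets for a single row."""
    true_set, pred_set = set(true_labels), set(pred_labels)
    if not true_set and not pred_set:
        return 1.0
    return 2 * len(true_set & pred_set) / (len(true_set) + len(pred_set))

def weighted_cv(rows):
    """rows: list of (true_labels, pred_labels); both groups must be non-empty."""
    nocall_f1 = mean(row_f1(t, p) for t, p in rows if set(t) == {"nocall"})
    call_f1 = mean(row_f1(t, p) for t, p in rows if set(t) != {"nocall"})
    return 0.54 * nocall_f1 + 0.46 * call_f1
```

A segment with `{"a": 0.4, "b": 0.1}` decodes to both `a` and `nocall`, which is the coexistence case the slide describes. Fixing the weights of the two row groups keeps the CV score from being driven by the nocall rate of the validation files.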
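  14. 9th place
     • Training inputs were 5-7 second segments
     • secondary labels were given a smaller weight
     • Used mixup
     • Added as much diversity as possible
       - mel-spectrograms with different time resolutions; hop_length of 200 and 320
       - various backbones
       - augmentation: white noise, pink noise, band noise, nocall clips, raising the mel-spectrogram image to a power
     • Post-processing
       - Post-process with each bird's maximum or mean call probability over the whole 10-minute audio file (incorporating neighboring-segment information)
       - Estimate from region information how likely each bird is to call and post-process with the result (incorporating metadata)
       - Post-process using each bird's maximum call probability within one day (incorporating metadata)
     • Squeeze width of test soundscapes by 2-5% (mostly to reverse far field effects)
     • Also set two thresholds (call, nocall) and allowed bird labels and nocall to coexist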
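To see what the two hop_length values mean for time resolution, the arithmetic can be sketched as below. The 32 kHz sample rate is an assumption (the slide does not state the rate this team used), and the frame count ignores STFT padding.

```python
def frames_and_resolution(hop_length, sample_rate=32000, segment_s=5):
    """Approximate number of spectrogram frames in one segment and the
    time step between frames in milliseconds (padding ignored)."""
    n_frames = segment_s * sample_rate // hop_length
    step_ms = 1000 * hop_length / sample_rate
    return n_frames, step_ms

fine = frames_and_resolution(200)    # (800, 6.25)
coarse = frames_and_resolution(320)  # (500, 10.0)
```

Under that assumption, the same 5-second segment becomes an image about 800 frames wide with hop_length 200 and about 500 frames wide with hop_length 320, so models trained on the two views see the calls at different temporal scales.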
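  15. 11th place
     • CPMP, who held first place on the Public LB for a long time
     • Placed 18th in the 2020 bird competition and 11th in the Rainforest competition, and reportedly based this solution on a mix of the two
       - 2020 bird competition solution : 18th place solution: efficientnet b3
       - Rainforest competition solution : 11th place, The 0.931 Magic Explained: Image Classification
     • As in the 8th place solution, CV was computed as 0.54 * nocall_f1 + 0.46 * call_f1 (robust CV strategy)
     • Was overconfident that ensembling would reliably raise the score and regrets not submitting an ensemble version until a few days before the competition ended (in practice, ensembling reportedly did not help)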