Slide 1

Slide 1 text

BirdCLEF2021 Summary: Competition Overview and Top Solutions
English version also available
(still looking for an icon...) start (@startjapan)
(Each link can be reached from the Speaker Deck description)

Slide 2

Slide 2 text

Self-introduction

Slide 3

Slide 3 text

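Self-introduction
A medical student in Hiroshima.
Alongside studying for the national medical licensing exam, I intern at the medical startup MNES Inc. and sharpen my skills in LAIME, a machine learning study circle for students that the same company sponsors.
I won this competition and became a Master.
(Largely thanks to my teammates...)
Also, I am looking for a good icon.
kaggle: @startjapan
twitter: @startjapanml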

Slide 4

Slide 4 text

Competition overview

Slide 5

Slide 5 text

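Competition overview
• A competition to identify which bird species are calling in each 5-second audio segment
  (the same host ran a similar competition in 2020 → referred to below as "the 2020 bird competition")
• The train data is audio obtained from xeno-canto, a site for sharing bird call recordings
  (train_short_audio)
• The test data is 80 audio files of 10 minutes each, which are split into 5-second segments for prediction
  (test_soundscapes)
• Separately from the above, audio for validation (20 files of 10 minutes each) was also provided
  (train_soundscapes)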

Slide 6

Slide 6 text

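[test_soundscapes] Test data. 80 files of 10 minutes each, but accessible only by submitting.
  Recorded at 4 locations in total.
[train_short_audio] Training data. The audio is grouped by bird species.
  62874 audio files in total.
[train_soundscapes] Has an acoustic domain close to test_soundscapes.
  20 files of 10 minutes each, recorded at 2 of the 4 locations where test_soundscapes was recorded.
[train_metadata.csv] Metadata for train_short_audio. Its shape is (62874, 14).
[train_soundscape_labels.csv / test.csv] Provide the skeleton obtained when each 10-minute file is split into 5-second segments.
  train_soundscape_labels.csv corresponds to train_soundscapes and test.csv to test_soundscapes;
  only the former carries ground-truth labels.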
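As a rough illustration of the "skeleton" these CSV files provide, the sketch below enumerates the 5-second segments of one 10-minute file. The `row_id` layout (`audio_id`, site, end second joined by underscores) and the example values `7019` and `COR` are assumptions for illustration, not something stated on the slide.

```python
def segment_rows(audio_id, site, duration_s=600, segment_s=5):
    """Return one row per 5-second segment of a single soundscape file."""
    rows = []
    for end in range(segment_s, duration_s + 1, segment_s):
        rows.append({
            "row_id": f"{audio_id}_{site}_{end}",  # assumed id layout
            "audio_id": audio_id,
            "site": site,
            "seconds": end,  # end time of the segment in seconds
        })
    return rows


# Hypothetical file: a 10-minute recording yields 600 / 5 = 120 segments.
rows = segment_rows(7019, "COR")
print(len(rows), rows[0]["row_id"], rows[-1]["row_id"])
```

With 80 test files of 10 minutes each, this gives 80 × 120 = 9600 rows to predict.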

Slide 7

Slide 7 text

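Submission format & evaluation metric
• Multiple bird species can be submitted as the prediction for a single segment
• For segments in which no bird is calling, submit the string "nocall"
• The evaluation metric is the mean of the per-row micro-F1 scores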
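The metric can be sketched as follows, assuming each row holds space-separated labels and that "nocall" is treated like any other label. This is one reading of the slide, not the official scoring code; the species codes in the example are only illustrative.

```python
def row_f1(true_labels, pred_labels):
    """F1 between the label sets of one row (space-separated strings)."""
    true_set = set(true_labels.split())
    pred_set = set(pred_labels.split())
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    # F1 = 2TP / (2TP + FP + FN) = 2TP / (|true| + |pred|)
    return 2 * tp / (len(true_set) + len(pred_set))


def mean_row_f1(y_true, y_pred):
    """Average the per-row F1 over all rows."""
    scores = [row_f1(t, p) for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores)


y_true = ["nocall", "amerob", "amerob bkcchi"]
y_pred = ["nocall", "nocall", "amerob"]
score = mean_row_f1(y_true, y_pred)  # rows score 1, 0 and 2/3
print(round(score, 4))
```

Because a wrongly predicted "nocall" scores 0 for the whole row, the nocall rate of the evaluation data has a large effect on where the best threshold lies.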

Slide 8

Slide 8 text

Differences from the 2020 bird competition

Slide 9

Slide 9 text

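Differences from the 2020 bird competition
• The existence of train_soundscapes
  - There is a large acoustic domain gap between train_short_audio and the test data
  - This competition provided train_soundscapes, whose acoustic domain is closer to the test data
  - It was mostly used for validation, but some participants found ways to use it for training
• The location information of the test data was accessible
  - It was guaranteed that each test file name contains the recording location (and the date)
  - This information therefore had to be built into the pipeline in some form
(Reference: Starter and some thoughts by @hidehisaarai1213)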

Slide 10

Slide 10 text

EDA (train_short_audio)

Slide 11

Slide 11 text

Audio length per file (train_short_audio)
※ 1000 audio files randomly sampled from train_short_audio (out of 62874)
※ Horizontal axis: audio length per file [seconds]
※ Vertical axis: frequency (1000 files in total)

Slide 12

Slide 12 text

How many audio files are there per bird species? (train_short_audio)
※ Horizontal axis: number of audio files for each bird (within train_short_audio)
※ Vertical axis: frequency (397 species in total)

Slide 13

Slide 13 text

The secondary labels are explicitly stated to be incomplete (train_short_audio)
(Quoted from BirdCLEF2021: Exploring the data)

Slide 14

Slide 14 text

EDA (soundscapes)

Slide 15

Slide 15 text

Submitting nocall for every row reveals the nocall rate of the Public LB
[Figure: Public and Private panels for BirdCLEF2021 and, for reference, the 2020 bird competition]

Slide 16

Slide 16 text

train_soundscapes, on the other hand, has a somewhat higher nocall rate

Slide 17

Slide 17 text

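Distribution of the target variable in train_soundscapes (including nocall)
• nocall is overwhelmingly the most common
• Some 5-second segments contain two or more calling species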

Slide 18

Slide 18 text

Distribution of the target variable in train_soundscapes (nocall removed)
• Some combinations of species are observed together frequently

Slide 19

Slide 19 text

Number of birds calling simultaneously within a 5-second segment in train_soundscapes

Slide 20

Slide 20 text

The basic approach to audio recognition tasks

Slide 21

Slide 21 text

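The basic approach to audio recognition tasks
Audio data can be converted into an image (a spectrogram) whose horizontal axis is time, whose vertical axis is frequency, and in which each pixel shows the strength of the signal component. Applying a CNN or a similar model to it lets the task be handled as conventional image processing.
※ In this competition, mel spectrograms, which use the mel scale for the vertical (frequency) axis, were widely used
※ The mel scale: a scale of sound frequency on which equal differences correspond to equal differences in the pitch perceived by the human ear
[Figure: spectrogram fed into a CNN] (image quoted from BirdCLEF2021: Processing audio data)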
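To make the conversion concrete, here is a minimal NumPy-only sketch of a log-mel spectrogram. It assumes the common formula mel = 2595 · log10(1 + f / 700) for the mel scale; the sample rate, `n_fft`, `hop` and `n_mels` values are illustrative assumptions, and participants typically relied on an audio library rather than hand-written code like this.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_spectrogram(y, sr, n_fft=1024, hop=320, n_mels=64):
    """Return a (n_mels, n_frames) log-power mel spectrogram in dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T


sr = 32000                                # assumed sample rate
t = np.arange(5 * sr) / sr                # one 5-second segment
y = np.sin(2 * np.pi * 4000.0 * t)        # synthetic 4 kHz tone instead of a bird call
spec = mel_spectrogram(y, sr)
print(spec.shape)
```

The resulting 2-D array can be treated as a single-channel image and passed to any image backbone.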

Slide 22

Slide 22 text

Quirks specific to this competition

Slide 23

Slide 23 text

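Quirks specific to this competition
• train_short_audio only has weak labels (the weak label problem)
  - Labels are attached to the whole of an audio clip tens of seconds long
  - At the level of 5-second segments, it is unknown which birds are calling
• Some labels in train_short_audio are missing (the noisy label problem)
  - In particular, secondary_labels (※) is explicitly stated to be incomplete
• Metadata such as the recording date and location has to be incorporated in some form (incorporating metadata)
• Whether birds are calling in the segments before and after the target segment may also be informative
  (incorporating neighboring-segment information)
• The nocall rate differs greatly between train_soundscapes and test_soundscapes (difficulty of establishing a CV strategy)
※ The labels of train_short_audio come in two kinds: primary_label and secondary_labels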

Slide 24

Slide 24 text

Top solutions
top solutions and approaches
Links to the top solutions are collected in the discussion above

Slide 25

Slide 25 text

1st place (ours!) [1st Place] Quick Solution [1st Place] Detailed Solution

Slide 26

Slide 26 text

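tl;dr
1st stage: build a binary nocall detector using external data (freefield1010) (1: some bird is calling / 0: nocall)
2nd stage: use the nocall detector to reduce the weight of the nocall parts of train_short_audio, then build a 397-dimensional multi-label classifier
3rd stage: build our own "table competition" from the nocall detector results, the metadata, the 2nd-stage results, and so on
Turning the task into our own table competition in the end solves the weak label problem, the noisy label problem, incorporating metadata, and incorporating neighboring-segment information all at once!!
※ The diagram outlines the Inference Part only; the 1st stage is omitted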

Slide 27

Slide 27 text

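Why does tabularization solve the weak label & noisy label problems?
• The 3rd-stage target variable (0: miss row / 1: hit row) is determined by the following flow
• Combining the outputs of the nocall detector and the multi-label classifier yields segment-level predictions for the primary & secondary labels that were attached to audio clips tens of seconds long, which solves the weak label problem
• If a secondary label is missing, the row receives label 0, but label-0 samples are comparatively numerous, so the noise is largely buried among them (mitigating the noisy label problem)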
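One possible reading of this tabularization, sketched under stated assumptions: for each 5-second segment, the top-k species from the multi-label classifier become candidate rows, the nocall detector output is attached as a feature, and the target is 1 when the candidate appears among the clip's primary or secondary labels. The function name, the feature names, `top_k`, and the species codes are all hypothetical; the actual feature set of the 1st place solution is described in the linked write-ups.

```python
import numpy as np


def build_candidate_rows(clip_probs, call_probs, species, clip_labels, top_k=3):
    """Turn per-segment model outputs into rows of a binary table problem.

    clip_probs : (n_segments, n_species) multi-label classifier outputs
    call_probs : (n_segments,) nocall detector outputs (1 = some bird is calling)
    clip_labels: set of primary + secondary labels given to the whole clip
    """
    rows = []
    for seg, probs in enumerate(clip_probs):
        top = np.argsort(probs)[::-1][:top_k]        # most likely species first
        for rank, idx in enumerate(top):
            rows.append({
                "segment": seg,
                "species": species[idx],
                "species_prob": float(probs[idx]),
                "rank": rank,
                "call_prob": float(call_probs[seg]),
                "target": int(species[idx] in clip_labels),  # 1: hit row / 0: miss row
            })
    return rows


species = ["amerob", "bkcchi", "norcar"]             # illustrative species codes
clip_probs = np.array([[0.9, 0.2, 0.1],
                       [0.1, 0.3, 0.05]])
call_probs = np.array([0.95, 0.2])
rows = build_candidate_rows(clip_probs, call_probs, species, {"amerob"}, top_k=2)
for row in rows:
    print(row)
```

A tabular model trained on such rows can learn, for example, that a hit is unlikely when `call_prob` is low, even though the clip-level label says the species is present somewhere in the recording.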

Slide 28

Slide 28 text

more details...
• Our teammate kami (twitter: @634kami / kaggle: @kami634) wrote up the solution in Japanese in the following post:
  "How we took 1st place in Kaggle's bird competition: BirdCLEF 2021 winning solution"

Slide 29

Slide 29 text

2nd place 2nd place solution

Slide 30

Slide 30 text

2nd place
(Quoted from 2nd place solution)

Slide 31

Slide 31 text

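2nd place
• Extract 30-second chunks from train_short_audio, split them into 5-second pieces, and apply mixup
  (addressing the weak label problem)
• Exclude the 3 train_soundscapes files in which no bird calls for the entire 10 minutes & use bootstrap sampling
  (a robust CV strategy)
• Label smoothing & weighting by the rating column in the metadata (addressing the noisy label problem)
• Tips for threshold selection
  - The nocall rate is lower on the LB than in CV, so lower the threshold and predict more birds
  - The distribution of probabilities differs between models, so using a single probability value as the threshold makes little sense;
    percentile-based thresholds were used instead
• Other (post-processing)
  - Adjust individual probabilities using the mean predicted probability of each bird
  - Use neighboring-segment information
  - Take the nocall detector results into account
  - Remove predictions of species that are impossible given the time and location (incorporating metadata)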
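The percentile-based threshold tip can be illustrated with a small sketch. The data is synthetic and the percentile value is an arbitrary assumption; the point is only that two models with differently shaped probability distributions select the same cells at the same percentile, whereas a fixed probability cutoff would not.

```python
import numpy as np


def percentile_mask(probs, percentile):
    """Mark as positive every prediction at or above the given percentile."""
    threshold = float(np.percentile(probs, percentile))
    return probs >= threshold, threshold


rng = np.random.default_rng(0)
model_a = rng.random((200, 5))   # synthetic (segments, species) probabilities
model_b = model_a ** 3           # same ranking, much lower raw probabilities

mask_a, thr_a = percentile_mask(model_a, 99.0)
mask_b, thr_b = percentile_mask(model_b, 99.0)

print(thr_a, thr_b)                      # very different raw thresholds
print(int(mask_a.sum()), int(mask_b.sum()))
print(bool((mask_a == mask_b).all()))    # identical selections
```

Lowering the percentile directly raises the share of predicted bird labels, which matches the tip about the LB having a lower nocall rate than CV.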

Slide 32

Slide 32 text

4th place 4th place solution

Slide 33

Slide 33 text

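4th place
• Used an SED model with 10-30 second inputs (addressing the weak label problem)
  (Reference: Introduction to Sound Event Detection by @hidehisaarai1213)
• This team also used mixup
• Applied pseudo labeling (addressing the noisy label problem)
• The final output combines the SED outputs for the target 5-second segment and for the 30-second segment centered on it (incorporating neighboring-segment information)
  - A small threshold was used for the 5-second segment and a large threshold for the 30-second segment
• As in the 2nd place solution, species judged unlikely to be observed given the time and location were removed
  (incorporating metadata)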

Slide 34

Slide 34 text

5th place 5th place solution

Slide 35

Slide 35 text

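5th place
• Finished 2nd in the 2020 bird competition and again built on that solution
• Improvements over last time: switching to SED, +1% (※1) / threshold tuning, +1% / improved ensembling, +1%
  (+ apparently also narrowed down the predicted labels using regional information (※2))
• Distinctive augmentation: raising the image to a power of 0.5-3 / n-times playback speed / adding sounds such as rain and conversation / adding noise / frequency adjustment with probability 0.5
  (my impression is that the 1st-4th place teams stayed with mixup and noise addition)
• The primary label is given label 1 and secondary labels are given label 0.3
• Tuned so that a bird observed in one segment is more easily picked up across the whole 10-minute audio file (※3)
※1: addressing the weak label problem ※2: incorporating metadata ※3: incorporating neighboring-segment information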
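Two of these ideas are simple enough to sketch: soft targets (1 for the primary label, 0.3 for secondary labels) and the power augmentation on the spectrogram image. The function names, the species codes, and the assumption that the spectrogram is scaled to [0, 1] before exponentiation are illustrative choices, not details taken from the 5th place write-up.

```python
import numpy as np


def soft_target(primary, secondaries, species, secondary_weight=0.3):
    """Build a target vector: 1.0 for the primary label, 0.3 for secondary labels."""
    index = {name: i for i, name in enumerate(species)}
    target = np.zeros(len(species), dtype=np.float32)
    for name in secondaries:
        if name in index:                 # ignore labels outside the class list
            target[index[name]] = secondary_weight
    target[index[primary]] = 1.0
    return target


def power_augment(spec01, rng, low=0.5, high=3.0):
    """Raise a spectrogram scaled to [0, 1] to a random power in [low, high]."""
    return spec01 ** rng.uniform(low, high)


species = ["amerob", "bkcchi", "norcar"]            # illustrative species codes
target = soft_target("amerob", ["norcar"], species)
print(target)

rng = np.random.default_rng(0)
spec01 = np.array([[0.0, 0.25, 1.0]])
augmented = power_augment(spec01, rng)
print(augmented)
```

Exponents below 1 brighten faint components and exponents above 1 suppress them, so the same call is seen at varying contrast.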

Slide 36

Slide 36 text

8th place 8th place writeup

Slide 37

Slide 37 text

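8th place
• Finished 6th in the 2020 bird competition and used SED again (addressing the weak label problem)
• Used 5-second or 20-second segments for training and 40-second segments for inference; longer was better
  Inference windows also overlapped, e.g. 0-40 seconds followed by 20-60 seconds (incorporating neighboring-segment information)
• Augmentation: Gaussian noise, pink noise, volume adjustment, pitch shift
  (mixup also worked, but reportedly could not be included in the final submission because of limited compute)
• A distinctive loss function (BCEFocal2WayLoss)
• primary label and secondary labels were treated the same way
• Applied pseudo labeling (addressing the noisy label problem)
• There are two thresholds, a call threshold and a nocall threshold: species above the call threshold are marked positive, while
  segments in which no species exceeded the nocall threshold are also given nocall (bird labels and nocall can coexist)
• Species that cannot exist according to regional information are excluded even if predicted (incorporating metadata)
• F1 scores are computed separately for rows with calling birds and for nocall rows, and CV is derived as 0.54 * nocall_f1 + 0.46 * call_f1 (a robust CV strategy)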
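The two-threshold rule and the weighted CV score can be sketched as below. The threshold values are hypothetical; the only structural assumption is that the nocall threshold sits above the call threshold, which is what allows a bird label and nocall to appear in the same row.

```python
def decode_segment(probs, species, call_thr=0.3, nocall_thr=0.5):
    """Return the space-separated labels for one segment."""
    labels = [name for name, p in zip(species, probs) if p > call_thr]
    if all(p <= nocall_thr for p in probs):
        labels.append("nocall")           # no species is confident enough
    return " ".join(labels)


def weighted_cv(nocall_f1, call_f1):
    """Blend the F1 on nocall rows and on call rows with the weights from the slide."""
    return 0.54 * nocall_f1 + 0.46 * call_f1


species = ["amerob", "bkcchi"]            # illustrative species codes
confident = decode_segment([0.8, 0.1], species)
uncertain = decode_segment([0.4, 0.1], species)
silent = decode_segment([0.1, 0.1], species)
print(confident, "|", uncertain, "|", silent)
print(weighted_cv(1.0, 0.5))
```

Hedging with both labels costs some precision on a row but keeps it from scoring zero when the model is unsure whether a bird is present.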

Slide 38

Slide 38 text

9th place 9th Place solution

Slide 39

Slide 39 text

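9th place
• Training inputs were 5-7 second segments
• The weight of secondary labels was reduced
• Used mixup
• Introduced as much diversity as possible
  - mel-spectrograms with different time resolutions; hop_length of 200 and 320
  - Various backbones
  - Augmentation: white noise, pink noise, band noise, nocall clips, raising the mel-spectrogram image to a power
• Post-processing
  - Post-process with the maximum or mean probability of each bird calling over the whole 10-minute audio (incorporating neighboring-segment information)
  - Estimate from regional information how likely each bird is to call and post-process with the result (incorporating metadata)
  - Post-process using the maximum probability of each bird calling within a day (incorporating metadata)
• Squeeze width of test soundscapes by 2-5% (mostly to reverse far field effects)
• This team also set two thresholds (call, nocall) and allowed bird labels and nocall to coexist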
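Since mixup appears in several of the solutions above, here is a minimal sketch of its generic formulation: two inputs and their label vectors are blended with a weight drawn from a Beta distribution. The `alpha` value and array shapes are assumptions, and individual teams may have combined labels differently (for example by taking the union instead of a weighted sum).

```python
import numpy as np


def mixup(x1, y1, x2, y2, rng, alpha=0.4):
    """Blend two samples and their multi-label targets with a Beta-distributed weight."""
    lam = float(rng.beta(alpha, alpha))
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam


rng = np.random.default_rng(0)
x1, x2 = np.ones((4, 8)), np.zeros((4, 8))      # stand-ins for two spectrograms
y1 = np.array([1.0, 0.0, 0.0])                  # bird A only
y2 = np.array([0.0, 1.0, 0.0])                  # bird B only
x_mix, y_mix, lam = mixup(x1, y1, x2, y2, rng)
print(lam, y_mix)
```

For bird audio the blended sample resembles a recording in which both birds call at once, which is close to what actually happens in soundscapes.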

Slide 40

Slide 40 text

11th place My journey (11th solution)

Slide 41

Slide 41 text

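11th place
• CPMP, who held first place on the Public LB for a long time
• Finished 18th in the 2020 bird competition and 11th in the Rainforest competition, and reportedly based this solution on a mix of the two
  - Solution for the 2020 bird competition: 18th place solution: efficientnet b3
  - Solution for the Rainforest competition: 11th place, The 0.931 Magic Explained: Image Classification
• Like the 8th place solution, computed CV as 0.54 * nocall_f1 + 0.46 * call_f1 (a robust CV strategy)
• Was overconfident that ensembling would reliably raise the score, and regrets not submitting an ensemble version until a few days before the competition ended (in practice the ensemble reportedly did not help)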

Slide 42

Slide 42 text

Thank you very much!