vak: software for automated annotation of vocalizations with neural networks

David Nicholson

October 21, 2019

Transcript

  1. vak (वाच्, vāc): software for automated annotation of vocalizations with neural networks

    David Nicholson, Emory University, Biology Dept. NickleDave @nicholdav
  2. Acknowledgements Gardner lab: Yarden Cohen, Alexa Sanchioni,

    Emily Mallaber, Vika Skidanova. Sober lab: Jonah Queen.
  3. Outline 1. Introduction ◦ Why automate annotation of vocalizations? 2.

    Methods ◦ software we made ◦ machine learning 3. Results 4. Discussion ◦ Plans for software development (now that you care)
  4. Introduction What do I mean by "annotation"? Example: birdsong (but

    this definition and this software can apply to any type of vocalization)
  5. Introduction Why automate annotation of vocalizations? 1. save time 2.

    answer research questions a. answer usual questions, but increase statistical power
  6. Introduction Why automate annotation of vocalizations? 1. save time 2.

    answer research questions a. answer usual questions, but increase statistical power b. answer new questions we couldn't answer before
  7. Introduction What would a good auto-labeler do for us?

    Criteria, each followed by the software we developed to meet it: • segment audio into vocalizations, predict labels for segments: TweetyNet (neural network)
  8. Introduction What would a good auto-labeler do for us?

    Criteria, each followed by the software we developed to meet it: • segment audio into vocalizations, predict labels for segments: TweetyNet (neural network) • make it easy for anyone to use: vak (library)
  9. Introduction What would a good auto-labeler do for us?

    Criteria, each followed by the software we developed to meet it: • segment audio into vocalizations, predict labels for segments: TweetyNet (neural network) • make it easy for anyone to use: vak (library) • work with many different data formats: vak, crowsetta (libraries)
  10. Methods TweetyNet: a hybrid convolutional-recurrent neural network that segments and

    labels birdsong and other vocalizations https://github.com/yardencsGitHub/tweetynet convolutional layers
  11. Methods TweetyNet: a hybrid convolutional-recurrent neural network that segments and

    labels birdsong and other vocalizations https://github.com/yardencsGitHub/tweetynet convolutional layers recurrent layers
  12. Methods TweetyNet: a hybrid convolutional-recurrent neural network that segments and

    labels birdsong and other vocalizations https://github.com/yardencsGitHub/tweetynet convolutional layers recurrent layers output layers Labels
  13. Methods Question: how do I use TweetyNet? Doing science is

    already hard enough, I don't want to have to learn how to program neural networks on top of that
  14. Methods vak: automated annotation of vocalizations for everybody

    [Diagram labels: audio files, spectrograms in array files, annotation files, Dataset, train, predict]
  15. Methods What would a good auto-labeler do for us? A

    case study: Annotate many songs sung by several individuals of two species of songbirds, Bengalese finches and canaries
  16. Methods 1. benchmark our software on publicly available repositories of

    Bengalese Finch song ◦ compare error with previously published hybrid HMM-neural network model 2. apply our software to canary song (lengthy bouts, large vocabulary of song syllables) ◦ currently no automated annotation methods available
  17. Methods 1. Measure error on a separate test set 2.

    Plot error as a function of training set size a. what's the lowest error we can get with the least amount of data
  18. Methods Error metrics: • frame error ◦ "For every time

    bin, does the predicted label equal the true label?" ◦ Frame error ranges between 0 (no mistakes) and 1 (all mistakes) ◦ shown as a percent
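The frame error described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not code from vak or TweetyNet: the function name `frame_error` and the example label sequences are hypothetical, and it assumes the true and predicted labels are already aligned, one label per time bin.

```python
def frame_error(true_labels, predicted_labels):
    """Fraction of time bins whose predicted label differs from the true label.

    Returns a value between 0.0 (no mistakes) and 1.0 (all mistakes).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same number of time bins")
    if len(true_labels) == 0:
        raise ValueError("label sequences must not be empty")
    # count the time bins where prediction and ground truth disagree
    mistakes = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return mistakes / len(true_labels)


# Hypothetical labels, one per time bin: 0 = silence, 1 and 2 = syllable classes
true_frames = [0, 1, 1, 1, 0, 2, 2, 0]
pred_frames = [0, 1, 1, 0, 0, 2, 2, 2]

error = frame_error(true_frames, pred_frames)
print(f"frame error: {error:.1%}")  # 2 of 8 bins differ: prints "frame error: 25.0%"
```

Because every time bin counts equally, a prediction that only gets syllable onsets and offsets slightly wrong still scores a low frame error, which is one reason to pair it with a sequence-level metric.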
  19. Methods (again) Error metrics: • syllable error rate ◦ analogous

    to word error rate used in speech recognition / speech-to-text
  20. Methods (again) Error metrics: • syllable error rate ◦ analogous

    to word error rate used in speech recognition / speech-to-text ◦ an edit distance: how many edits do I have to make to the predicted sequence to recover the true sequence?
  21. Methods (again) Error metrics: • syllable error rate ◦ analogous

    to word error rate used in speech recognition / speech-to-text ◦ an edit distance: how many edits do I have to make to the predicted sequence to recover the true sequence? https://medium.com/descript/challenges-in-measuring-automatic-transcription-accuracy-f322bf5994f
  22. Methods (again) Error metrics: • syllable error rate ◦ analogous

    to word error rate used in speech recognition / speech-to-text ◦ an edit distance: how many edits do I have to make to the predicted sequence to recover the true sequence? https://medium.com/descript/challenges-in-measuring-automatic-transcription-accuracy-f322bf5994f
  23. Methods (again) Error metrics: • syllable error rate ◦ Syllable

    error rate can be greater than one. It is a distance.
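To make the edit-distance idea concrete, here is a minimal sketch of a syllable error rate in plain Python. It is an illustration under stated assumptions, not the implementation used in vak: the function names and example sequences are hypothetical, the edit distance is the standard Levenshtein distance (insertions, deletions, substitutions), and the distance is divided by the length of the true sequence, by analogy with word error rate.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target`."""
    # previous[j] = distance between the source prefix seen so far and target[:j]
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # turning source[:i] into an empty target takes i deletions
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s from the source
                current[j - 1] + 1,          # insert t into the source
                previous[j - 1] + (s != t),  # substitute s with t (free if equal)
            ))
        previous = current
    return previous[-1]


def syllable_error_rate(true_seq, predicted_seq):
    """Edit distance from predicted to true sequence, normalized by the
    length of the true sequence (assumed normalization, as in word error rate)."""
    if len(true_seq) == 0:
        raise ValueError("true sequence must not be empty")
    return levenshtein(predicted_seq, true_seq) / len(true_seq)


# Hypothetical syllable label sequences
print(syllable_error_rate("abcd", "abcd"))  # identical sequences: 0.0
print(syllable_error_rate("abcd", "abd"))   # one missing syllable: 0.25
print(syllable_error_rate("ab", "abcde"))   # three extra syllables: 1.5
```

The last example shows why the rate can exceed one: when the prediction contains many extra syllables, the number of edits is larger than the length of the true sequence.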
  24. Conclusion • Using the vak library, we show that TweetyNet

    ◦ achieves low frame error across individual Bengalese finches ◦ achieves a lower syllable error rate with less data than a previously proposed model ◦ can for the first time automate annotation of Bengalese • Our vision for vak is: ◦ an open-source, community-developed tool ◦ that will enable researchers studying vocalizations to perform high-throughput automated annotation ◦ using neural networks, without needing detailed knowledge of machine learning
  25. vak वाच् vāc • https://github.com/NickleDave/vak • https://github.com/yardencsGitHub/tweetynet • https://crowsetta.readthedocs.io/en/latest/ David

    Nicholson, Emory University, Biology Dept. NickleDave @nicholdav Yarden Cohen, Boston University, Biology Dept. yardencsGitHub @YardenJCohen