Slide 1

Slide 1 text

David Nicholson, Emory University, Biology Dept. NickleDave @nicholdav vak वाच् vāc software for automated annotation of vocalizations with neural networks

Slide 2

Slide 2 text

Acknowledgements Gardner lab - Yarden Cohen - Alexa Sanchioni - Emily Mallaber - Vika Skidanova Sober lab - Jonah Queen

Slide 3

Slide 3 text

Outline 1. Introduction ○ Why automate annotation of vocalizations? 2. Methods ○ software we made ○ machine learning 3. Results 4. Discussion ○ Plans for software development (now that you care)

Slide 4

Slide 4 text

Introduction What do I mean by "annotation"? Example: birdsong (but this definition and this software can apply to any type of vocalization)

Slide 5

Slide 5 text

Introduction Why automate annotation of vocalizations? 1. save time

Slide 6

Slide 6 text

Introduction Why automate annotation of vocalizations? 1. save time 2. answer research questions a. answer the usual questions, but with increased statistical power

Slide 7

Slide 7 text

Introduction Why automate annotation of vocalizations? 1. save time 2. answer research questions a. answer the usual questions, but with increased statistical power b. answer new questions we couldn't answer before

Slide 8

Slide 8 text

Introduction Do we really need to automate annotation of vocalizations?

Slide 9

Slide 9 text

Introduction What would a good auto-labeler do for us? Criterion → software we developed to meet it: ● segment audio into vocalizations, predict labels for segments → TweetyNet (neural network)

Slide 10

Slide 10 text

Introduction What would a good auto-labeler do for us? Criterion → software we developed to meet it: ● segment audio into vocalizations, predict labels for segments → TweetyNet (neural network) ● make it easy for anyone to use → vak (library)

Slide 11

Slide 11 text

Introduction What would a good auto-labeler do for us? Criterion → software we developed to meet it: ● segment audio into vocalizations, predict labels for segments → TweetyNet (neural network) ● make it easy for anyone to use → vak (library) ● work with many different data formats → vak, crowsetta (libraries)

Slide 12

Slide 12 text

Methods TweetyNet: a hybrid convolutional-recurrent neural network that segments and labels birdsong and other vocalizations https://github.com/yardencsGitHub/tweetynet convolutional layers

Slide 13

Slide 13 text

Methods TweetyNet: a hybrid convolutional-recurrent neural network that segments and labels birdsong and other vocalizations https://github.com/yardencsGitHub/tweetynet convolutional layers recurrent layers

Slide 14

Slide 14 text

Methods TweetyNet: a hybrid convolutional-recurrent neural network that segments and labels birdsong and other vocalizations https://github.com/yardencsGitHub/tweetynet convolutional layers recurrent layers output layers Labels
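[Speaker note: a minimal PyTorch sketch of the hybrid architecture these slides build up. This is not the actual TweetyNet implementation (see https://github.com/yardencsGitHub/tweetynet); layer sizes and names here are illustrative assumptions. Convolutional layers extract local features from a spectrogram, recurrent layers integrate them across time, and an output layer predicts a label for every time bin.]

import torch
from torch import nn

class ConvRecurrentTagger(nn.Module):
    def __init__(self, n_freq_bins=256, n_classes=10, hidden_size=64):
        super().__init__()
        # convolutional layers: local spectro-temporal features
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(8, 1)),  # pool over frequency only, keeping time resolution
        )
        # recurrent layers: integrate features across time bins
        self.rnn = nn.LSTM(
            input_size=32 * (n_freq_bins // 8),
            hidden_size=hidden_size,
            bidirectional=True,
            batch_first=True,
        )
        # output layer: one label per time bin
        self.classify = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, spect):
        # spect: (batch, 1, n_freq_bins, n_time_bins)
        feats = self.conv(spect)
        batch, channels, freq, time = feats.shape
        # flatten channels x frequency into one feature vector per time bin
        feats = feats.permute(0, 3, 1, 2).reshape(batch, time, channels * freq)
        out, _ = self.rnn(feats)
        return self.classify(out)  # (batch, n_time_bins, n_classes)

spect = torch.randn(1, 1, 256, 1000)  # one made-up spectrogram: 256 freq bins x 1000 time bins
print(ConvRecurrentTagger()(spect).shape)  # torch.Size([1, 1000, 10])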

Slide 15

Slide 15 text

Methods Question: how do I use TweetyNet? Doing science is already hard enough; I don't want to have to learn how to program neural networks on top of that.

Slide 16

Slide 16 text

Methods vak: automated annotation of vocalizations for everybody

Slide 17

Slide 17 text

Methods vak: automated annotation of vocalizations for everybody [workflow diagram: audio files, spectrograms in array files, and annotation files are combined into a Dataset, which vak uses to train networks and to predict annotations]
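[Speaker note: the core data transformation behind this workflow is mapping an annotation's segments (labeled onset/offset times) onto a label for every spectrogram time bin, the form a network like TweetyNet learns from. A minimal sketch in Python; the function name and example values are made up for illustration, not vak's API.]

import numpy as np

def labels_per_time_bin(times, onsets, offsets, labels, unlabeled=0):
    """Map segment annotations onto a label for each spectrogram time bin."""
    lbl_tb = np.full(times.shape, unlabeled)
    for on, off, lbl in zip(onsets, offsets, labels):
        lbl_tb[(times >= on) & (times < off)] = lbl
    return lbl_tb

times = np.arange(0, 1.0, 0.01)   # time of each spectrogram bin, in seconds
onsets = np.array([0.1, 0.5])     # syllable onset times
offsets = np.array([0.3, 0.8])    # syllable offset times
labels = np.array([1, 2])         # integer label for each syllable
print(labels_per_time_bin(times, onsets, offsets, labels))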

Slide 18

Slide 18 text

Methods What would a good auto-labeler do for us? A case study: Annotate many songs sung by several individuals of two species of songbirds, Bengalese finches and canaries

Slide 19

Slide 19 text

Methods 1. benchmark our software on publicly available repositories of Bengalese finch song ○ compare error with a previously published hybrid HMM-neural network model 2. apply our software to canary song (lengthy bouts, large vocabulary of song syllables) ○ currently no automated annotation methods available

Slide 20

Slide 20 text

Methods 1. Measure error on a separate test set 2. Plot error as a function of training set size a. what's the lowest error we can get with the least amount of data?

Slide 21

Slide 21 text

Methods Error metrics: ● frame error ○ "For every time bin, does the predicted label equal the true label?" ○ Frame error ranges between 0 (no mistakes) and 1 (all mistakes) ○ shown as a percent
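[Speaker note: frame error as defined on this slide takes only a few lines to compute. A minimal sketch in Python with made-up label arrays, not vak's implementation.]

import numpy as np

def frame_error(y_true, y_pred):
    """Fraction of time bins labeled incorrectly, between 0 and 1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true != y_pred)

y_true = np.array(["a", "a", "b", "b", "c"])
y_pred = np.array(["a", "b", "b", "b", "c"])
print(f"{frame_error(y_true, y_pred):.0%}")  # 20%: one of five bins is wrong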

Slide 22

Slide 22 text

Methods Error metrics: ● frame error

Slide 23

Slide 23 text

Results TweetyNet achieves low frame error across individuals

Slide 24

Slide 24 text

Methods (again) Error metrics: ● syllable error rate ○ analogous to word error rate used in speech recognition / speech-to-text

Slide 25

Slide 25 text

Methods (again) Error metrics: ● syllable error rate ○ analogous to word error rate used in speech recognition / speech-to-text ○ an edit distance: how many edits do I have to make to the predicted sequence to recover the true sequence?

Slide 26

Slide 26 text

Methods (again) Error metrics: ● syllable error rate ○ analogous to word error rate used in speech recognition / speech-to-text ○ an edit distance: how many edits do I have to make to the predicted sequence to recover the true sequence? https://medium.com/descript/challenges-in-measuring-automatic-transcription-accuracy-f322bf5994f

Slide 27

Slide 27 text

Methods (again) Error metrics: ● syllable error rate ○ analogous to word error rate used in speech recognition / speech-to-text ○ an edit distance: how many edits do I have to make to the predicted sequence to recover the true sequence? https://medium.com/descript/challenges-in-measuring-automatic-transcription-accuracy-f322bf5994f

Slide 28

Slide 28 text

Methods (again) Error metrics: ● syllable error rate ○ Syllable error rate can be greater than one. It is a distance.
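[Speaker note: a minimal sketch of this metric, the Levenshtein edit distance between predicted and true label sequences, normalized by the length of the true sequence, which is why it can exceed one. An illustration with made-up sequences, not the exact implementation used in the benchmarks.]

def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def syllable_error_rate(y_true, y_pred):
    """Edit distance normalized by true sequence length; can be greater than 1."""
    return levenshtein(y_pred, y_true) / len(y_true)

print(syllable_error_rate("abdb", "abcb"))  # 0.25: one substitution over four syllables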

Slide 29

Slide 29 text

Methods (again) Error metrics: ● syllable error rate

Slide 30

Slide 30 text

Results TweetyNet achieves lower syllable error rate with less training data

Slide 31

Slide 31 text

Results TweetyNet achieves lower syllable error rate with less training data

Slide 32

Slide 32 text

Results TweetyNet achieves low error across days when trained with just the first minute of song

Slide 33

Slide 33 text

Results TweetyNet is accurate across large datasets of canary song with many syllables and lengthy bouts

Slide 34

Slide 34 text

Conclusion ● Using the vak library, we show that TweetyNet ○ achieves low frame error across individual Bengalese finches ○ achieves a lower syllable error rate with less data than a previously proposed model ○ can, for the first time, automate annotation of canary song ● Our vision for vak is: ○ an open-source, community-developed tool ○ that will enable researchers studying vocalizations to perform high-throughput automated annotation ○ using neural networks, without needing detailed knowledge of machine learning

Slide 35

Slide 35 text

vak वाच् vāc ● https://github.com/NickleDave/vak ● https://github.com/yardencsGitHub/tweetynet ● https://crowsetta.readthedocs.io/en/latest/ David Nicholson, Emory University, Biology Dept. NickleDave @nicholdav Yarden Cohen, Boston University, Biology Dept. yardencsGitHub @YardenJCohen