vak: software for automated annotation of vocalizations with neural networks

David Nicholson, Emory University, Biology Dept. NickleDave @nicholdav vak वाच्
vāc software for automated annotation of vocalizations with neural networks

Acknowledgements Gardner lab - Yarden Cohen - Alexa Sanchioni -
Emily Mallaber - Vika Skidanova Sober lab - Jonah Queen

Outline 1. Introduction ◦ Why automate annotation of vocalizations? 2.
Methods ◦ software we made ◦ machine learning 3. Results 4. Discussion ◦ Plans for software development (now that you care)

Introduction What do I mean by "annotation"? Example: birdsong (but
this definition and this software can apply to any type of vocalization)
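One way to make this definition concrete is to treat an annotation as a sequence of labeled segments, each with an onset time, an offset time, and a label. The sketch below is only an illustration under that assumption: the `Segment` class, its field names, and the example values are hypothetical, not the data model used by vak or crowsetta.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One annotated vocalization (hypothetical structure, for illustration only)."""
    onset_s: float   # time the segment starts, in seconds
    offset_s: float  # time the segment ends, in seconds
    label: str       # class of the vocalization, e.g. a syllable label


def labels_of(annotation):
    """Return the labels of an annotation, ordered by segment onset time."""
    return [seg.label for seg in sorted(annotation, key=lambda seg: seg.onset_s)]


# A made-up annotation of one short bout of song
annotation = [
    Segment(0.10, 0.18, "a"),
    Segment(0.25, 0.31, "b"),
    Segment(0.40, 0.47, "b"),
]
print(labels_of(annotation))
```

Annotating by hand means producing every one of these segments for every recording, which is the work an auto-labeler would take over.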

Introduction Why automate annotation of vocalizations? 1. save time 2.
answer research questions a. answer usual questions, but increase statistical power b. answer new questions we couldn't answer before

Introduction Do we really need to automate annotation of vocalizations?

Introduction What would a good auto-labeler do for us?
Criterion: software we developed to meet this criterion
• segment audio into vocalizations: TweetyNet (neural network)
• predict labels for segments: TweetyNet (neural network)
• make it easy for anyone to use: vak (library)
• work with many different data formats: vak, crowsetta (libraries)

Methods TweetyNet: a hybrid convolutional-recurrent neural network that segments and
labels birdsong and other vocalizations https://github.com/yardencsGitHub/tweetynet
[Figure: network architecture, in order: convolutional layers, recurrent layers, output layers, labels]
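The frame error metric used later in this talk implies that the network predicts a label for every time bin. One simple way to turn such per-bin labels into segments is to group consecutive bins that share a label and drop runs of a "silence" label. The sketch below illustrates that idea only; it is an assumption, not TweetyNet's actual post-processing, and `frames_to_segments`, the bin duration, and the label names are hypothetical.

```python
from itertools import groupby


def frames_to_segments(frame_labels, bin_duration_s, silence_label="silence"):
    """Group runs of identical per-bin labels into (onset, offset, label) tuples.

    Illustrative assumption: one label per time bin, with a dedicated
    label for bins that contain no vocalization.
    """
    segments = []
    start_bin = 0
    for label, run in groupby(frame_labels):
        run_length = len(list(run))
        if label != silence_label:
            onset = start_bin * bin_duration_s
            offset = (start_bin + run_length) * bin_duration_s
            segments.append((onset, offset, label))
        start_bin += run_length
    return segments


# Made-up per-bin labels, each bin lasting 0.5 s
frames = ["silence", "a", "a", "a", "silence", "b", "b"]
segments = frames_to_segments(frames, bin_duration_s=0.5)
print(segments)
```

With this grouping, segmenting and labeling happen in a single pass, because segment boundaries fall wherever the predicted label changes.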

Methods Question: how do I use TweetyNet? Doing science is
already hard enough; I don't want to have to learn how to program neural networks on top of that.

Methods vak: automated annotation of vocalizations for everybody
[Figure: vak workflow. Inputs: audio files or spectrograms in array files, plus annotation files. These become a Dataset, which is used to train and to predict.]

Methods What would a good auto-labeler do for us? A
case study: Annotate many songs sung by several individuals of two species of songbirds, Bengalese finches and canaries

Methods 1. benchmark our software on publicly available repositories of
Bengalese finch song ◦ compare error with a previously published hybrid HMM-neural network model 2. apply our software to canary song (lengthy bouts, large vocabulary of song syllables) ◦ currently no automated annotation methods available

Methods 1. Measure error on a separate test set 2.
Plot error as a function of training set size a. what's the lowest error we can get with the least amount of data?

Methods Error metrics: • frame error ◦ "For every time
bin, does the predicted label equal the true label?" ◦ Frame error ranges between 0 (no mistakes) and 1 (all mistakes) ◦ shown as a percent
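The definition above can be sketched in a few lines. This is a minimal illustration, not the code vak uses: the function name `frame_error` and the example labels are made up, and the sketch assumes the true and predicted labels are equal-length sequences with one entry per time bin.

```python
def frame_error(true_labels, predicted_labels):
    """Fraction of time bins where the predicted label differs from the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted labels must have one entry per time bin")
    if not true_labels:
        raise ValueError("need at least one time bin")
    mistakes = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return mistakes / len(true_labels)


# Made-up labels for five time bins; the prediction is wrong in one bin
true_labels = ["a", "a", "b", "b", "silence"]
predicted_labels = ["a", "b", "b", "b", "silence"]

error = frame_error(true_labels, predicted_labels)
print(f"frame error: {100 * error:.1f}%")
```

Because every time bin counts equally, a single mislabeled bin at a syllable boundary contributes as much to this metric as one in the middle of a syllable.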

Results TweetyNet achieves low frame error across individuals

Methods (again) Error metrics: • syllable error rate ◦ analogous
to word error rate used in speech recognition / speech-to-text ◦ an edit distance: how many edits do I have to make to the predicted sequence to recover the true sequence? https://medium.com/descript/challenges-in-measuring-automatic-transcription-accuracy-f322bf5994f

Methods (again) Error metrics: • syllable error rate ◦ Syllable
error rate can be greater than one. It is a distance.
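To see why the rate can exceed one, here is a minimal sketch of an edit-distance-based error rate. It assumes the usual word-error-rate convention of dividing the Levenshtein distance by the length of the true sequence; the function names and example sequences are illustrative, not taken from vak.

```python
def edit_distance(predicted, true):
    """Levenshtein distance: minimum number of single-label insertions,
    deletions, and substitutions that turn `predicted` into `true`."""
    previous = list(range(len(true) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, t in enumerate(true, start=1):
            substitution = previous[j - 1] + (p != t)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def syllable_error_rate(predicted, true):
    """Edit distance normalized by the length of the true sequence (assumed convention)."""
    if not true:
        raise ValueError("true sequence must not be empty")
    return edit_distance(predicted, true) / len(true)


# One substitution in a four-syllable song
print(syllable_error_rate(list("abxd"), list("abcd")))

# Many spurious predicted syllables for a one-syllable song: the rate exceeds one
print(syllable_error_rate(list("aabbaab"), list("c")))
```

The second example needs seven edits to recover a true sequence of length one, so the normalized distance is far above one.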

Results TweetyNet achieves lower syllable error rate with less training
data

Results TweetyNet achieves low error across days when trained with
just the first minute of song

Results TweetyNet is accurate across large datasets of canary song
with many syllables and lengthy bouts

Conclusion • Using the vak library, we show that TweetyNet
◦ achieves low frame error across individual Bengalese finches ◦ achieves a lower syllable error rate with less data than a previously proposed model ◦ can for the first time automate annotation of canary song • Our vision for vak is: ◦ an open-source, community-developed tool ◦ that will enable researchers studying vocalizations to perform high-throughput automated annotation ◦ using neural networks, without needing detailed knowledge of machine learning

vak वाच् vāc • https://github.com/NickleDave/vak • https://github.com/yardencsGitHub/tweetynet • https://crowsetta.readthedocs.io/en/latest/ David
Nicholson, Emory University, Biology Dept. NickleDave @nicholdav Yarden Cohen, Boston University, Biology Dept. yardencsGitHub @YardenJCohen
