
hybrid-vocal-classifier tutorial

David Nicholson
November 13, 2017

presentation part of tutorial: why and what hybrid-vocal-classifier is

Transcript

  1. • Birdsong
     • consists of elements called syllables
     • segment the sound file into syllables by threshold crossings of amplitude
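The thresholding step above can be sketched in a few lines of numpy. This is a minimal illustration of segmenting by amplitude threshold crossings, not the library's actual segmentation code; the envelope, threshold, and sampling rate are toy values.

```python
import numpy as np

def segment_syllables(amplitude, threshold, samp_freq):
    """Return (onset, offset) times where amplitude crosses threshold."""
    above = amplitude > threshold
    # +1 where the signal rises above threshold, -1 where it falls below
    crossings = np.diff(above.astype(int))
    onsets = np.where(crossings == 1)[0] + 1
    offsets = np.where(crossings == -1)[0] + 1
    return onsets / samp_freq, offsets / samp_freq

# toy "smoothed amplitude envelope" with two bursts ("syllables")
env = np.zeros(1000)
env[100:200] = 1.0
env[500:650] = 1.0
onsets, offsets = segment_syllables(env, threshold=0.5, samp_freq=1000)
# two segments: 0.1–0.2 s and 0.5–0.65 s
```

In practice the raw waveform is first rectified and smoothed before thresholding, so that brief dips within a syllable don't split it in two.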
  2. • Each bird’s song is similar to the song of its tutor
     • But each individual of a species will have a unique song
     • So syllable “c” for Bengalese finch #1 is not syllable “c” for Bengalese finch #2
  3. • Problem:
     • Birds sing 100s of songs a day
     • Many more than can be labeled by hand
  4. • Problem:
     • Birds sing 100s of songs a day
     • Many more than can be labeled by hand
     • Previous work:
     • Sound Analysis Pro (soundanalysispro.com):
       • Great software! Open source! Drove the field forward!
       • Avoids labeling song; produces similarity scores based on cross-correlation of spectrograms
     • Some groups automate labeling with clustering
     • Clustering doesn’t work very well for some bird species
     [figure: similarity score (0%–100%) from cross-correlating spectrogram 1 and spectrogram 2]
  5. • Problem:
     • Birds sing 100s of songs a day
     • Many more than can be labeled by hand
     • Previous work:
     • Sound Analysis Pro
     • Other machine learning methods applied to Bengalese finch song:
       • k-nearest neighbors (k-NN)
       • Support Vector Machines (SVM)
       • Convolutional Neural Network (CNN)
  6. • Problem:
     • Birds sing 100s of songs a day
     • Many more than can be labeled by hand
     • Previous work:
     • Sound Analysis Pro
     • Other machine learning methods applied to Bengalese finch song
     • Hard to compare different machine learning methods:
       • not all open source
       • not all well-documented software
       • very little in the way of publicly available repositories of song
  7. • Enter hybrid-vocal-classifier, a library that automates labeling vocalizations
     • what it is not: Shazam for songbirds
     • what it is: like voice-to-text, but for songbirds
  8. • hybrid-vocal-classifier:
     • open source
     • built on the scipy-numpy stack
     • implements previously proposed approaches:
       • SVM and k-NN via scikit-learn
       • neural nets using Keras
     • easy to use, runs on YAML config scripts
     • released with a large data set:
       • hand-labeled data
       • well-segmented song
       • days of song, ~20k data points/day
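To give a feel for the "run on YAML config scripts" workflow, here is a hypothetical sketch of what such a config might look like. The key names below are illustrative placeholders invented for this example, not hybrid-vocal-classifier's documented schema; consult the library's docs for the real config format.

```yaml
# hypothetical feature-extraction config -- key names are illustrative only
extract:
  data_dirs:
    - ./bird1/day1          # directories of segmented, labeled song
  labels_to_use: iabcdef    # which syllable labels to train on
  feature_group: svm        # e.g. the SVM feature set (Tachibana et al. 2014)
  output_dir: ./features
```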
  9. • goals of the library:
     1. make it easy to label song in an automated way
     2. make it easy to compare different previously proposed machine learning methods for automated labeling of song
     3. make it easy to test new machine learning methods
  10. • Comparing previous models:
      • Support Vector Machine (Tachibana et al. 2014)
      https://en.wikipedia.org/wiki/Support_vector_machine#/media/File:Svm_max_sep_hyperplane_with_margin.png
  11. • Comparing previous models:
      • Support Vector Machine (Tachibana et al. 2014)
      • features:
        • average spectra and cepstra
        • plus many from the CUIDADO feature set (Peeters 2004):
          • spectral centroid
          • spectral spread
          • etc.
      https://en.wikipedia.org/wiki/Support_vector_machine#/media/File:Svm_max_sep_hyperplane_with_margin.png
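Two of the CUIDADO-style features named above are simple to compute from a power spectrum: the spectral centroid is the amplitude-weighted mean frequency, and the spectral spread is the weighted standard deviation around it. A minimal numpy sketch, using a toy spectrum rather than real song:

```python
import numpy as np

def spectral_centroid_spread(power, freqs):
    """Centroid = weighted mean frequency; spread = weighted std. dev."""
    weights = power / power.sum()
    centroid = np.sum(freqs * weights)
    spread = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))
    return centroid, spread

# toy power spectrum: a Gaussian bump of energy around 4 kHz
freqs = np.linspace(0, 10_000, 512)
power = np.exp(-((freqs - 4000) ** 2) / (2 * 500.0 ** 2))
centroid, spread = spectral_centroid_spread(power, freqs)
# centroid ~ 4000 Hz, spread ~ 500 Hz
```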
  12. • Comparing previous models:
      • Convolutional neural net
      • architecture:
        • convolutional layer
        • max-pooling
        • “window” layer
      • goal: segmentation + classification
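The first two building blocks in that architecture can be illustrated in plain numpy: a "valid" 2-D convolution slides a kernel over a spectrogram-like array, and non-overlapping max-pooling then downsamples the feature map. This is a shape-level sketch with toy values, not the network from the talk (which is built in Keras).

```python
import numpy as np

def conv2d_valid(x, kernel):
    """2-D cross-correlation with no padding ('valid')."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping size x size max-pooling."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

spect = np.random.rand(64, 100)                      # freq bins x time bins
feature_map = conv2d_valid(spect, np.ones((5, 5)) / 25.0)  # -> (60, 96)
pooled = max_pool(feature_map, size=2)               # -> (30, 48)
```

A real network stacks several such layers (with learned kernels and nonlinearities); the "window" layer then classifies each time window, which is what lets one pass do segmentation and classification together.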
  13. • Comparisons
      • Plot learning curves: accuracy vs. amount of training data (# of hand-labeled songs)
      • I want the best model for the least data
  14. • Comparisons
      • Plot learning curves: accuracy vs. amount of training data (# of hand-labeled songs)
      • I want the best model for the least data
      • 5-fold cross-validation
  15. • Comparisons
      • Plot learning curves: accuracy vs. amount of training data (# of hand-labeled songs)
      • I want the best model for the least data
      • 5-fold cross-validation
      • For each fold: randomly grab n songs from the training set; measure average accuracy across syllables
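The learning-curve procedure above can be sketched as a loop over training-set sizes and folds. The classifier here is a trivial majority-label stand-in and the labels are synthetic; the point is only the resampling structure, not the models actually compared in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_songs, syllables_per_song = 100, 20
# synthetic stand-in for hand-labeled syllables: 5 label classes
labels = rng.integers(0, 5, size=(n_songs, syllables_per_song))

def majority_classifier(train_labels):
    """Toy 'model': always predict the most common training label."""
    values, counts = np.unique(train_labels, return_counts=True)
    return values[np.argmax(counts)]

train_set_sizes = [5, 10, 20, 40]
curve = {}
for n in train_set_sizes:
    fold_accs = []
    for fold in range(5):                         # 5-fold cross-validation
        songs = rng.permutation(n_songs)
        train, test = songs[:n], songs[n:]        # random grab of n songs
        pred = majority_classifier(labels[train])
        acc = np.mean(labels[test] == pred)       # accuracy across syllables
        fold_accs.append(acc)
    curve[n] = np.mean(fold_accs)                 # one learning-curve point
```

Plotting `curve` (mean accuracy vs. n) for each model gives the learning curves used to pick the best model for the least hand-labeled data.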
  16. • Using hybrid-vocal-classifier on our data set, I find:
      • SVM outperforms k-NN if a radial basis function kernel is used
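The kind of comparison reported above looks roughly like the following in scikit-learn, which the library uses for its SVM and k-NN models. This runs on synthetic 2-D blobs, not the birdsong data set from the talk, so the accuracies here say nothing about which model wins on real song.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# synthetic stand-in for syllable feature vectors with 4 label classes
X, y = make_blobs(n_samples=300, centers=4, cluster_std=1.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# SVM with a radial basis function kernel, vs. k-nearest neighbors
svm_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)
knn_acc = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).score(X_test, y_test)
```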
  17. • Using hybrid-vocal-classifier on our data set, I find:
      • A simple convolutional neural net with minimal training data outperforms SVM