About Google Magenta

Taein Kim
August 19, 2021

This is an introductory presentation on the Google Magenta project.

Transcript

  1. Music AI with Google Magenta
    DSP Lab, Inha University, Aug 19, 2021
    Taein Kim ([email protected]), Department of Electronic Engineering, Inha University, South Korea
  4. Outline
    • Music and Audio Data
    • Google Magenta
      – Wave2Midi2Wave
      – MAESTRO dataset
      – Music Transformer
    • Conclusion
  5. Music and Audio Data: Symbolic Domain
    • Symbolic representation (human readable)
    • Sequence of:
      – Notes
      – Rhythm
      – Duration
      – Intensity
    • Example: musical score (figure axes: Time, Pitch)
  6. Music and Audio Data: Symbolic Domain
    • Symbolic representation (computer readable)
    • List of events, each with:
      – Pitch
      – Start time (onset)
      – End time (offset)
      – Volume (velocity)
    • Example: piano roll / MIDI (figure axes: Time, Events)
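To make the event-list view concrete, here is a minimal sketch of how a list of note events (pitch, onset, offset, velocity) can be rasterized into a piano roll. The `Note` record, `to_piano_roll` helper, and the frame rate are hypothetical choices for illustration, not Magenta APIs; note that onsets and offsets get snapped to the frame grid, so the roll is a lossy view of the events.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int      # MIDI pitch number (0-127)
    onset: float    # start time in seconds
    offset: float   # end time in seconds
    velocity: int   # MIDI velocity (1-127)


def to_piano_roll(notes, frames_per_second=100, num_pitches=128):
    """Rasterize note events into a [pitch][frame] grid holding velocities (0 = silent)."""
    end_time = max((n.offset for n in notes), default=0.0)
    num_frames = int(round(end_time * frames_per_second))
    roll = [[0] * num_frames for _ in range(num_pitches)]
    for n in notes:
        # Onset and offset times are rounded to the nearest frame boundary.
        start = int(round(n.onset * frames_per_second))
        stop = int(round(n.offset * frames_per_second))
        for frame in range(start, min(stop, num_frames)):
            roll[n.pitch][frame] = n.velocity
    return roll


# A C major arpeggio with overlapping notes, sampled at 4 frames per second.
notes = [Note(60, 0.0, 0.5, 80), Note(64, 0.25, 0.75, 64), Note(67, 0.5, 1.0, 72)]
roll = to_piano_roll(notes, frames_per_second=4)
for pitch in (60, 64, 67):
    print(pitch, roll[pitch])
```

With 4 frames per second this prints `60 [80, 80, 0, 0]`, `64 [0, 64, 64, 0]`, and `67 [0, 0, 72, 72]`.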
  7. Music and Audio Data: Time-Frequency Domain
    • Spectrogram (figure axes: Time, Frequency, Magnitude)
    • Audio as an image
    • Lossy transformation
    • Multiple parameters need to be tuned
    • Image: https://stackoverflow.com/questions/41457036/make-matplotlib-pyplot-color-bar-span-two-rows-alongside-waveform-and-specgram
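Both points on this slide (lossy, and parameters to tune) show up in even the simplest magnitude spectrogram. The sketch below is a naive short-time Fourier transform written with a plain DFT for readability; the frame size, hop, and Hann window are example choices, and keeping only `abs()` of each bin is where the phase information is thrown away.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_size=64, hop=32):
    """Naive STFT: window each frame, take its DFT, keep magnitudes only (phase is dropped)."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        bins = []
        for k in range(frame_size // 2 + 1):  # non-negative frequency bins only
            acc = sum(x * cmath.exp(-2j * math.pi * k * i / frame_size)
                      for i, x in enumerate(chunk))
            bins.append(abs(acc))
        frames.append(bins)
    return frames  # shape: [num_frames][frame_size // 2 + 1]


sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]  # 0.1 s of 1 kHz
spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)  # 24 33 1000.0
```

Changing `frame_size` trades time resolution for frequency resolution (each bin here spans 8000 / 64 = 125 Hz), which is one of the tuning parameters the slide refers to.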
  8. Music and Audio Data: Data Summary
    Example            Dimensionality   Domain
    (Raw) audio        1D               Time
    Score              N/A              Symbolic
    Piano roll (MIDI)  2D               Symbolic
    Spectrogram        2D               Time-frequency
  9. Music and Audio Data: Challenges
    • High dimensionality
      – 1 sec ≈ 44 k data points
      – 1 min ≈ 2.6 M data points
      – 1 song ≈ 10 M data points
      – (Average novel ≈ 100 k words)*
    • Multilevel dependencies
      – Short term: timbre and pitch
      – Medium term: rhythm
      – Long term: song structure
    • Non-linear perception of sound
      – Similar waveforms can sound very different
      – Dissimilar waveforms can sound the same
    • Image: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
    * Source: https://self-publishingschool.com/how-many-words-in-a-novel/
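The data-point counts on this slide follow directly from the sample rate. A quick check, assuming CD-quality 44.1 kHz mono audio and a 4-minute song (both assumptions, since the slide gives rounded figures only):

```python
SAMPLE_RATE = 44_100    # CD-quality sample rate, one channel (mono) assumed
SONG_SECONDS = 4 * 60   # assumed song length: 4 minutes
NOVEL_WORDS = 100_000   # "average novel" figure quoted on the slide

per_second = SAMPLE_RATE
per_minute = SAMPLE_RATE * 60
per_song = SAMPLE_RATE * SONG_SECONDS

print(f"1 sec  = {per_second:,} samples")              # 1 sec  = 44,100 samples
print(f"1 min  = {per_minute:,} samples")              # 1 min  = 2,646,000 samples
print(f"1 song = {per_song:,} samples")                # 1 song = 10,584,000 samples
print(f"song / novel = {per_song / NOVEL_WORDS:.0f}x")  # song / novel = 106x
```

Stereo audio would double every count.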
  10. Music and Audio Data: Challenges (same points as slide 9)
    • Image adapted from Hawthorne, 2019: https://youtube.videoken.com/embed/1ohtSlux9EQ?tocitem=38
  11. Music and Audio Data: Challenges (same points as slide 9)
    • Which one of these waveforms sounds different?
    • Adapted from Jordi Pons and Jesse Engel slides, 2019
  12. Google Magenta
    • "An open-source research project exploring the role of machine learning as a tool in the creative process."
    • Focus on generative models
    • Tools for artists and developers
    • Open-source code
    • Standalone demos
  13. Wave2Midi2Wave
    • Music audio with a structured prior (notes)
    • Figure panels: Score; Performance (piano roll / MIDI); Audio
    • Slide adapted from https://youtube.videoken.com/embed/1ohtSlux9EQ?tocitem=38; image: jp.Fotolia.com
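The "structured prior" idea can be written as a factorization of the audio distribution through an intermediate note sequence. This is the form given, as we read it, in the MAESTRO paper (Hawthorne et al.): a transcription model infers notes from audio, Music Transformer models the prior over notes, and a WaveNet conditioned on notes models the audio given those notes.

```latex
P(\text{audio}) = \mathbb{E}_{\text{notes}}\big[P(\text{audio} \mid \text{notes})\big]
                = \sum_{\text{notes}} P(\text{audio} \mid \text{notes})\, P(\text{notes})
```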
  14. Wave2Midi2Wave: Performance Matters
    • MIDI data of scores is widely available
    • The score is quantized
    • Human performance adds:
      – Micro-timing (variations in time)
      – Expression (variations in note velocity)
    • Very few datasets available
    • Figure: Score (quantized) vs. Performance (unquantized)
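The difference between a quantized score and an unquantized performance can be shown by snapping played onset times to a metrical grid: what remains after snapping is the micro-timing that a score discards. The grid (16th notes at 120 BPM) and the onset times below are hypothetical values chosen for illustration.

```python
def quantize(onsets, beat_seconds=0.5, subdivisions=4):
    """Snap onset times (s) to the nearest grid step (default: 16th notes at 120 BPM)."""
    step = beat_seconds / subdivisions
    return [round(t / step) * step for t in onsets]


performed = [0.0, 0.131, 0.242, 0.389, 0.507]  # hypothetical played onsets with micro-timing
scored = quantize(performed)
micro_timing = [round(p - s, 3) for p, s in zip(performed, scored)]

print(scored)        # [0.0, 0.125, 0.25, 0.375, 0.5]
print(micro_timing)  # [0.0, 0.006, -0.008, 0.014, 0.007]
```

Velocity (expression) would be lost in the same way if every note were written with a single dynamic marking.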
  15. Wave2Midi2Wave: MAESTRO Dataset
    • MAESTRO: MIDI and Audio Edited for Synchronous TRacks and Organization
    • Data:
      – Recorded performances from virtuoso piano competitions (http://piano-e-competition.com/)
      – Audio and MIDI recordings
      – MIDI data collected using Yamaha Disklavier* pianos
      – Audio and MIDI aligned with ~3 ms accuracy
    • 1,814 performances, 430 compositions, 172.3 hours of audio and MIDI, 102.8 GB, 6.18 million notes
    * A Disklavier is a piano with a high-resolution MIDI capture system: https://en.wikipedia.org/wiki/Disklavier
  16. Wave2Midi2Wave: MAESTRO Dataset (statistics repeated from slide 15)
    • Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C. Z. A., Dieleman, S., ... & Eck, D. (2018). Enabling factorized piano music modeling and generation with the MAESTRO dataset. arXiv preprint arXiv:1810.12247.
  17. Wave2Midi2Wave: Onsets and Frames
    • Encoder, based on an improved Onsets and Frames model
    • Translates a mel-spectrogram into a piano roll
    • Offset loss (L_offsets): defined similarly to the onset loss
    • Figure: model diagram with the mel-spectrogram as input
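In the Onsets and Frames papers, each output head (onsets, frames, and in the improved model also offsets) is trained with a binary cross-entropy summed over every pitch and frame of a piano roll, which is why "similar to onsets" is enough to define the offset loss. The sketch below assumes a hypothetical dict of piano rolls per head and sums the three terms with equal weight; the published models may weight terms differently and also train a velocity head, which this sketch does not reproduce.

```python
import math

HEADS = ("onsets", "offsets", "frames")


def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one (pitch, frame) cell; prob is clipped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def roll_loss(labels, probs):
    """Sum BCE over all pitches and frames of one piano-roll-shaped head."""
    return sum(
        bce(label, prob)
        for label_row, prob_row in zip(labels, probs)
        for label, prob in zip(label_row, prob_row)
    )


def total_loss(targets, predictions):
    """Unweighted sum of the onset, offset and frame losses (simplified)."""
    return sum(roll_loss(targets[head], predictions[head]) for head in HEADS)


# Toy example: 1 pitch x 2 frames per head, every prediction uncertain (0.5).
targets = {head: [[1, 0]] for head in HEADS}
predictions = {head: [[0.5, 0.5]] for head in HEADS}
print(round(total_loss(targets, predictions), 4))  # 4.1589 (= 6 * ln 2)
```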
  18. Wave2Midi2Wave: figure color legend
    • Purple: correct estimation
    • Red: missed estimation (false negative)
    • Blue: incorrect estimation (false positive)
    • Magenta: overlapping onsets and frames
    • Black: onset predictions
    • Cyan: frames without onset
  19. Wave2Midi2Wave: same color legend as slide 18, with the prediction colors (magenta, black, cyan) listed first
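The legend colors map onto the usual confusion counts for transcription: purple cells are true positives, red cells are false negatives, and blue cells are false positives. A minimal frame-level sketch, assuming two binary piano rolls of equal shape, turns those counts into precision, recall, and F1. Published transcription results also report note-level scores with onset tolerances, which are a different and stricter measure than this.

```python
def frame_metrics(reference, estimate):
    """Frame-level precision, recall and F1 from two binary [pitch][frame] piano rolls."""
    tp = fp = fn = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            if r and e:
                tp += 1   # purple: correct estimation
            elif e:
                fp += 1   # blue: incorrect estimation (false positive)
            elif r:
                fn += 1   # red: missed estimation (false negative)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


reference = [[1, 1, 0, 0],
             [0, 1, 1, 0]]
estimate = [[1, 1, 1, 0],
            [0, 0, 1, 0]]
print(frame_metrics(reference, estimate))  # (0.75, 0.75, 0.75)
```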
  20. Wave2Midi2Wave
    • Wave2Midi: translates raw audio into MIDI
    • Captures performance nuances
    • Problem: very few datasets with matched audio and MIDI
  21. Music Transformer as a Language Model
    • Transformer with relative attention
    • Modified to work with very long sequences
    • Intermediate memory for relative attention reduced from O(L²D) to O(LD), where L = sequence length and D = hidden-state size
    • Figure: vanilla attention vs. relative attention
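The O(L²D) to O(LD) saving comes from the "skewing" procedure described in the Music Transformer paper. Instead of gathering a separate relative embedding for every query/key pair (an L×L×D intermediate), it multiplies the queries by the L relative embeddings once and rearranges the resulting L×L matrix by padding, reshaping, and slicing. This is a pure-Python sketch of the causal case with toy integer values; `Q`, `Er`, and the helper names are illustrative, not taken from Magenta's code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def skew(qer):
    """'Skewing' for causal relative attention.

    qer[i][k] = q_i . Er[k], where row k of Er encodes relative distance k - (L - 1).
    Returns s_rel with s_rel[i][j] = q_i . Er[j - i + L - 1] for every j <= i.
    Entries above the diagonal are meaningless and would be masked out.
    """
    L = len(qer)
    padded = [[0] + row for row in qer]          # add a dummy column on the left: L x (L + 1)
    flat = [x for row in padded for x in row]    # row-major flatten: L * (L + 1) values
    reshaped = [flat[r * L:(r + 1) * L] for r in range(L + 1)]  # reinterpret as (L + 1) x L
    return reshaped[1:]                          # drop the first row: L x L


# Toy queries Q (L x D) and relative embeddings Er (L x D); only Q . Er^T (L x L) is built.
L, D = 4, 3
Q = [[(i * D + d) % 5 - 2 for d in range(D)] for i in range(L)]
Er = [[(k + 2 * d) % 3 - 1 for d in range(D)] for k in range(L)]
qer = [[dot(q, e) for e in Er] for q in Q]
s_rel = skew(qer)

# Compare against the direct (memory-hungry) definition on the causal entries.
ok = all(s_rel[i][j] == dot(Q[i], Er[j - i + L - 1]) for i in range(L) for j in range(i + 1))
print(ok)  # True
```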
  22. Music Transformer: Sequence Generation from an Initial Primer
    • PerformanceRNN (LSTM)
    • Vanilla Transformer
    • Music Transformer (relative attention)
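Treating a performance as a "language" requires turning it into tokens, and a primer is simply such a token sequence fed to the model before sampling continues. As we understand it, the piano experiments compared here use the event vocabulary introduced with Performance RNN: 128 note-on, 128 note-off, 100 time-shift (10 ms steps up to 1 s), and 32 velocity events. The sketch below encodes notes under those assumptions; the token strings and the `encode_performance` helper are hypothetical.

```python
def encode_performance(notes, step=0.01, max_shift_steps=100, velocity_bins=32):
    """Encode (pitch, onset_s, offset_s, velocity) tuples as Performance RNN-style tokens.

    Assumed vocabulary: NOTE_ON_0..127, NOTE_OFF_0..127, TIME_SHIFT_1..100 (10 ms steps,
    up to 1 s) and VELOCITY_0..31, i.e. 128 + 128 + 100 + 32 = 388 token types.
    """
    events = []
    for pitch, onset, offset, velocity in notes:
        events.append((onset, 1, pitch, velocity))   # 1 = note-on
        events.append((offset, 0, pitch, velocity))  # 0 = note-off, sorts first at equal times
    events.sort()

    tokens, current_step, current_bin = [], 0, None
    for time, is_on, pitch, velocity in events:
        target_step = round(time / step)
        while target_step > current_step:            # long gaps become several shifts
            shift = min(target_step - current_step, max_shift_steps)
            tokens.append(f"TIME_SHIFT_{shift}")
            current_step += shift
        if is_on:
            vel_bin = velocity * velocity_bins // 128  # MIDI 0-127 -> bins 0-31
            if vel_bin != current_bin:                 # velocity persists until it changes
                tokens.append(f"VELOCITY_{vel_bin}")
                current_bin = vel_bin
            tokens.append(f"NOTE_ON_{pitch}")
        else:
            tokens.append(f"NOTE_OFF_{pitch}")
    return tokens


primer = encode_performance([(60, 0.0, 0.5, 80), (64, 0.5, 1.75, 80)])
print(primer)
```

This prints `['VELOCITY_20', 'NOTE_ON_60', 'TIME_SHIFT_50', 'NOTE_OFF_60', 'NOTE_ON_64', 'TIME_SHIFT_100', 'TIME_SHIFT_25', 'NOTE_OFF_64']`: the 1.25 s gap is split into two shifts because a single shift covers at most 1 s.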
  23. Music Transformer: Piano Synthesis with a WaveNet-Based Model
    • Other synthesis methods (not based on machine learning):
      – Concatenative synthesis / sampling
      – Physical modeling
    • (Animated) https://storage.googleapis.com/deepmind-live-cms/documents/BlogPost-Fig2-Anim-160908-r01.gif
    • Image adapted from Aaron van den Oord et al., 2016
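The animated figure referenced on this slide illustrates WaveNet's stack of dilated causal convolutions. A single-channel sketch in plain Python (it leaves out the gated activations, residual and skip connections, and many channels of the full model) shows why doubling the dilation per layer grows the receptive field exponentially. With kernel size 2 and dilations 1, 2, ..., 512, one output depends on 1024 samples, the block size used as the example in the WaveNet paper.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation]; samples before t = 0 count as zero."""
    return [
        sum(w * x[t - k * dilation] for k, w in enumerate(weights) if t - k * dilation >= 0)
        for t in range(len(x))
    ]


def receptive_field(dilations, kernel_size=2):
    """Number of input samples (current one included) that can influence a single output."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Stack of kernel-size-2 layers with dilations 1, 2, 4 applied to a unit impulse.
signal = [1] + [0] * 15
for d in (1, 2, 4):
    signal = dilated_causal_conv(signal, [1, 1], d)

print(signal)                                        # eight 1s followed by eight 0s
print(receptive_field([1, 2, 4]))                    # 8
print(receptive_field([2 ** i for i in range(10)]))  # 1024 (dilations 1, 2, ..., 512)
```

The impulse response spans exactly as many samples as the computed receptive field, and no output ever depends on a future sample, which is what "causal" means here.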
  24. Music Transformer: WaveNet Demos
    • Frédéric Chopin, Mazurka in D Major, Op. 33, No. 2
    • Original audio / WaveNet / other synthesis
    • https://storage.googleapis.com/magentadata/papers/maestro/index.html
  25. Conclusion
    • The Google Magenta project provides research and code for studying music AI
    • DDSP makes it easy to manipulate audio and music data
    • Wave2Midi can transcribe music audio into MIDI
    • The MAESTRO dataset is a good starting point for learning piano performance
    • Music Transformer can generate expert-level improvisations while maintaining the initial motif
  26. References
    • Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse H. Engel, Douglas Eck. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset." ICLR 2019.
    • Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse H. Engel, Sageev Oore, Douglas Eck. "Onsets and Frames: Dual-Objective Piano Transcription." ISMIR 2018.
    • Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck. "Music Transformer: Generating Music with Long-Term Structure." ICLR (Poster) 2019.
    • J. W. Kim, J. P. Bello. "Adversarial Learning for Improved Onsets and Frames Music Transcription." ISMIR 2019. http://archives.ismir.net/ismir2019/paper/000081.pdf
    • Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu. "WaveNet: A Generative Model for Raw Audio." arXiv:1609.03499 [cs], September 12, 2016. http://arxiv.org/abs/1609.03499
    • Curtis Hawthorne, talk at ICLR 2019. https://youtube.videoken.com/embed/1ohtSlux9EQ?tocitem=38