

Taketo Akama
February 06, 2025

Tokyo Music AI Gathering - Feb 2025. Music AI that Resonates with People: Compositional Modeling and Controllability

These are the slides from Taketo Akama's talk at Tokyo Music AI Gathering - Feb 2025. https://lu.ma/gmzgnkkh


Transcript

  1. 2025/02/05. Taketo Akama, Project Researcher, Flow Machines Studio Tokyo, Sony Computer Science Laboratories, Inc. Tokyo Music AI Gathering - Feb 2025. Music AI that Resonates with People: Compositional Modeling and Controllability
  2. AI and humans collaborate and co-create ⚫ Individuals focus on value-added activities (slide illustrations of tasks generated by DALL-E 3)
  3. AI and humans collaborate and co-create ⚫ AI assists with activities that are challenging for humans: complex tasks such as audio transformation and synthesizer parameter manipulation, and overwhelming tasks such as statistical analysis and causal inference over a music library or audio on the web (slide images generated by DALL-E 3)
  4. Music AI elevates the value of music ⚫ Our music understanding and expressivity are enhanced (slide diagram: analysis and generation link Music with its components: MIDI, timbre, lyrics, ...)
  5. Music AI elevates the value of music ⚫ Expand the opportunities for applying music (slide diagram: analysis and generation connect Music, its components (MIDI, timbre, lyrics, ...), and usage contexts such as video and theme)
  6. To realize such a future, what are the challenges for Music AI? ⚫ Understanding music compositionally, as humans do ⚫ Interpreting human intent (= controllability)
  7. Our focus: compositionality and controllability (slide diagram: generation and analysis connect Music, a music library, brain & body signals (retrieval, generation), and music components: MIDI, timbre, lyrics, ...)
  8. Issue: out-of-domain performance (same overview diagram as the previous slide)
  9. Learning without MIDI-audio paired data ⚫ CoSaRef: MIDI-to-audio synthesis ✓ Annotation-free: MIDI-audio paired data is NOT required ☺ ✓ Performance comparable to MIDI-DDSP ☺ ✓ Timbre controllability ☺ ✓ Aligns with DAW music creation ☺ Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement, Osamu Take and Taketo Akama, arXiv 2024. Paper: https://arxiv.org/abs/2410.16785
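The concatenative stage named in the CoSaRef title can be sketched roughly as follows. This is a toy construction of my own, not the paper's code: each MIDI note selects the nearest-pitch one-shot from a sample library, is pitch-shifted by naive resampling, and is placed at its onset; the generative refinement stage that polishes the result is only mentioned in a comment.

```python
import math

SR = 16000  # sample rate in Hz

def sine_oneshot(freq, dur=0.25):
    """Toy stand-in for a recorded one-shot instrument sample."""
    n = int(SR * dur)
    return [math.sin(2 * math.pi * freq * t / SR) for t in range(n)]

# toy sample library: MIDI pitch -> waveform (a real library holds recordings)
library = {60: sine_oneshot(261.63), 64: sine_oneshot(329.63)}

def concatenative_render(notes, total_dur=1.0):
    """notes: list of (midi_pitch, onset_seconds). Returns a mono buffer."""
    out = [0.0] * int(SR * total_dur)
    for pitch, onset in notes:
        # select the library sample whose pitch is nearest to the note
        src_pitch = min(library, key=lambda p: abs(p - pitch))
        sample = library[src_pitch]
        # naive pitch shift by resampling at the frequency ratio
        ratio = 2 ** ((pitch - src_pitch) / 12)
        shifted = [sample[min(int(i * ratio), len(sample) - 1)]
                   for i in range(int(len(sample) / ratio))]
        # place the shifted sample at its onset time
        start = int(onset * SR)
        for i, v in enumerate(shifted):
            if start + i < len(out):
                out[start + i] += v
    return out

audio = concatenative_render([(60, 0.0), (62, 0.5)])
# a generative refinement model would then turn `audio` into natural sound
```

Because both stages work from MIDI plus an unpaired sample library, no MIDI-audio paired data is needed, which is the slide's "annotation-free" point.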
  10. Learning without MIDI-audio paired data ⚫ Audio-to-MIDI transcription ✓ Annotation-free: MIDI-audio paired data is NOT required ☺ ✓ Performance comparable to state-of-the-art methods in the out-of-domain setting ☺ Annotation-free Automatic Music Transcription with Scalable Synthetic Data and Adversarial Domain Confusion, Gakusei Sato and Taketo Akama, ICME 2024. DEMO: https://complex-degree-f38.notion.site/Annotation-free-AMT-Demo-33db7972162846059c2118e3fbc9db75
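The "domain confusion" idea in the title can be illustrated in one dimension. The paper trains an adversarial domain classifier behind a gradient-reversal layer; in this toy of my own construction, that adversarial pressure is approximated by a simple penalty on the gap between mean synthetic and mean real features, so the trade-off is visible without a GAN-style training loop.

```python
# labeled synthetic data (labels are free because the audio is rendered
# from known MIDI) and unlabeled real audio
synth_x = [1.0, 2.0, 3.0]
synth_y = [2.0, 4.0, 6.0]
real_x = [2.0, 4.0, 6.0]

w, lam, lr = 0.0, 0.1, 0.01  # shared parameter: feature/prediction f(x) = w * x
for _ in range(500):
    # supervised transcription loss, computable on synthetic data only
    task_grad = sum(2 * (w * x - y) * x
                    for x, y in zip(synth_x, synth_y)) / len(synth_x)
    # penalty on the gap between mean features of the two domains
    diff = sum(real_x) / len(real_x) - sum(synth_x) / len(synth_x)
    gap_grad = 2 * (w * diff) * diff
    w -= lr * (task_grad + lam * gap_grad)
```

Without the penalty the fit converges to w = 2 on synthetic data alone; with it, the shared parameter is pulled toward features that look alike across domains, which is what lets the transcriber generalize out of domain.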
  11. Our focus: compositionality and controllability (section divider; same overview diagram as slide 7)
  12. Our focus: compositionality and controllability (section divider; same overview diagram as slide 7)
  13. Our focus: compositionality and controllability (section divider; same overview diagram as slide 7)
  14. Lower hierarchical-level generation: MIDI & timbre (same overview diagram, highlighting the MIDI and timbre components)
  15. MIDI generation controllability ⚫ Music proofreading with RefinPaint ✓ Inpainting doesn't tell us where to modify; RefinPaint does ☺ ✓ Annotation-free: generated data simulate lower-quality compositions ☺ Music Proofreading with RefinPaint: Where and How to Modify Compositions given Context, Pedro Ramoneda, Martin Rocamora, and Taketo Akama, ISMIR 2024. DEMO: https://refinpaint.github.io/
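The "where and how" loop from the title can be sketched as a critic-plus-inpainter cycle. This is my own toy stand-in, not the RefinPaint models: a critic scores every note, the worst-scoring note marks *where* to modify, and an inpainter decides *how*; the loop repeats until nothing is flagged.

```python
def critic(notes):
    """Toy critic (assumption): flags notes outside the C-major scale."""
    scale = {0, 2, 4, 5, 7, 9, 11}
    return [1.0 if n % 12 in scale else 0.0 for n in notes]

def inpaint(notes, idx):
    """Toy inpainter: raise the flagged note a semitone (every out-of-scale
    pitch class in C major sits one semitone below a scale tone)."""
    fixed = list(notes)
    fixed[idx] = notes[idx] + 1
    return fixed

def refinpaint_loop(notes, max_iters=10):
    for _ in range(max_iters):
        scores = critic(notes)
        if min(scores) == 1.0:
            break                          # nothing left to modify
        worst = scores.index(min(scores))  # "where to modify"
        notes = inpaint(notes, worst)      # "how to modify"
    return notes

melody = refinpaint_loop([60, 61, 64, 66, 67])
```

The slide's annotation-free trick fits this picture: training pairs for the critic can be manufactured by deliberately degrading generated compositions, so no human-annotated "bad passages" are needed.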
  16. Timbre generation: neural synthesizer ⚫ GANStrument ✓ Conditional GAN with an encoder: accepts user input and interpolates it to create new timbres ✓ Outperforms GAN-inversion baselines GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning, Gaku Narita, Junichi Shimizu, and Taketo Akama, ICASSP 2023. DEMO: https://ganstrument.github.io/ganstrument-demo/ Website: https://www.flow-machines.com/flow-machines_update_ganstrument/ https://cslmusicteam.sony.fr/prototypes/ganstrument/ Article: https://the-decoder.com/sonys-ganstrument-turns-a-rooster-into-a-cello/
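The interpolation idea on the slide can be sketched conceptually. This is my own construction, not the GANStrument networks: two sounds are encoded into timbre embeddings, mixed linearly, and handed to a pitch-conditioned generator; keeping pitch as a separate condition is what makes the timbre embedding pitch-invariant (the rooster/cello pairing echoes the linked article).

```python
def lerp(a, b, t):
    """Linear interpolation between two timbre embeddings."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def generate(timbre_emb, midi_pitch):
    """Stand-in generator: the fundamental frequency comes from the pitch
    condition, everything else from the timbre embedding."""
    f0 = 440.0 * 2 ** ((midi_pitch - 69) / 12)
    return {"f0": f0, "timbre": timbre_emb}

emb_cello = [0.9, 0.1]    # pretend encoder output for a cello note
emb_rooster = [0.2, 0.8]  # pretend encoder output for a rooster crow
hybrid = generate(lerp(emb_cello, emb_rooster, 0.5), midi_pitch=69)
```

Because the same mixed embedding can be rendered at any MIDI pitch, the user plays the hybrid instrument across the keyboard rather than getting a single morphed sample.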
  17. Timbre generation: neural synthesizer ⚫ HyperGANStrument ✓ Improves the reconstruction-vs-interpolation trade-off HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks, Zhe Zhang and Taketo Akama, ICASSP 2024. DEMO: https://lukibibi.notion.site/Demo-Page-for-HyperGANStrument-b7cf7b02ddfd4831ac8c64e9e246642b
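The hypernetwork idea the title names can be sketched in miniature. This is an assumption of mine, not the actual HyperGANStrument architecture: a small network looks at the input sound's embedding and predicts per-input offsets for the frozen generator's weights, so reconstruction of that specific sound can improve without retraining the generator or distorting the embedding space used for interpolation.

```python
base_weights = [0.5, -0.3]  # frozen generator weights (stand-ins)

def hypernet(embedding):
    """Stand-in hypernetwork: small, input-dependent weight offsets."""
    return [0.1 * e for e in embedding]

def adapted_generate(embedding):
    """Generate with base weights shifted by the predicted offsets."""
    w = [bw + dw for bw, dw in zip(base_weights, hypernet(embedding))]
    # stand-in synthesis: a dot product of adapted weights and embedding
    return sum(wi * ei for wi, ei in zip(w, embedding))

out = adapted_generate([1.0, 2.0])
```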
  18. Our focus: compositionality and controllability (same overview diagram; brain & body signals now marked for both generation and analysis)
  19. More natural control of music? (same overview diagram, highlighting brain & body signals)
  20. Brainwave-to-music generation ⚫ General music reconstruction with non-invasive EEG ⚫ No manual preprocessing (demo: audio reconstructed by our model vs. ground truth) Naturalistic Music Decoding from EEG Data via Latent Diffusion Models, Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Luca Cosmo, and Taketo Akama, ICASSP 2025. DEMO: https://emilianpostolache.com/brainwave/
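The latent-diffusion setup the title names has a generic conditional sampling loop, sketched below. This is not the paper's model: the "denoiser" here is a stand-in that simply nudges the latent toward the conditioning vector, where the real system uses a trained network and a pretrained decoder to turn the final latent into audio.

```python
import random

random.seed(0)

def denoiser(latent, step, eeg_emb):
    """Stand-in conditional denoiser (not a trained network): moves the
    latent a fraction of the way toward the EEG conditioning embedding."""
    return [z + 0.3 * (c - z) for z, c in zip(latent, eeg_emb)]

def sample(eeg_emb, steps=20):
    latent = [random.gauss(0.0, 1.0) for _ in eeg_emb]  # start from noise
    for step in reversed(range(steps)):
        latent = denoiser(latent, step, eeg_emb)
    return latent  # a decoder would map this latent to a waveform

latent = sample([0.2, -0.5, 0.7])
```

Conditioning every denoising step on the EEG embedding is what steers the otherwise unconditional sampler toward the music the listener actually heard.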
  21. Brainwave representation learning ⚫ Cortical and ANN representations are reported to be similar ⚫ Predict ANN representations to learn brainwave representations Predicting Artificial Neural Network Representations to Learn Recognition Model for Music Identification from Brain Recordings, Taketo Akama, Zhuohao Zhang, Pengcheng Li, Kotaro Hongo, Hiroaki Kitano, Shun Minamikawa, and Natalia Polouliakh, arXiv 2024. Paper: https://arxiv.org/abs/2412.15560
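The training signal on this slide can be sketched as a regression. This is my own construction with synthetic numbers: a brainwave encoder is fit by regressing EEG features onto the embeddings a pretrained audio ANN assigns to the same music clips; a 1-D linear map fit by gradient descent stands in for the deep encoder.

```python
eeg_feat = [0.0, 1.0, 2.0, 3.0]  # pretend EEG features, one per clip
ann_emb = [1.0, 3.0, 5.0, 7.0]   # pretend ANN embeddings (here 2x + 1)

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    # mean-squared-error gradients for the linear map w * x + b
    grad_w = sum(2 * (w * x + b - y) * x
                 for x, y in zip(eeg_feat, ann_emb)) / len(eeg_feat)
    grad_b = sum(2 * (w * x + b - y)
                 for x, y in zip(eeg_feat, ann_emb)) / len(eeg_feat)
    w -= lr * grad_w
    b -= lr * grad_b
```

Once fitted, the map places brain recordings in the ANN's music representation space, where nearest-neighbor lookup can identify which piece was heard, which is the recognition task the paper targets.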
  22. Dance-to-Music project ⚫ Artist & scientist collaboration ✓ Challenge: reversing the relationship, so that movement creates music X post: https://x.com/tuktoe/status/1790338620051873851
  23. Interested? Please join us! ⚫ Positions ✓ Research assistant (students) ✓ Part-time / full-time ⚫ Flexible working hours and days ⚫ Remote work is possible Please DM Taketo Akama