

Taketo Akama
February 06, 2025

Tokyo Music AI Gathering - Feb 2025. Music AI that Resonates with People: Compositional Modeling and Controllability

These are the slides from Taketo Akama's talk at Tokyo Music AI Gathering - Feb 2025. https://lu.ma/gmzgnkkh


Transcript

  1. 2025/02/05. Taketo Akama, Project Researcher, Flow Machines Studio Tokyo, Sony Computer Science Laboratories, Inc. Tokyo Music AI Gathering - Feb 2025. Music AI that Resonates with People: Compositional Modeling and Controllability
  2. AI and humans collaborate and co-create ⚫ Individuals focus on value-added activities (slide illustrations of tasks generated by DALL-E 3)
  3. AI and humans collaborate and co-create ⚫ AI assists with activities that are challenging for humans: complex tasks such as audio transformation and synthesizer parameter manipulation, and overwhelming tasks such as statistical analysis and causal inference over a music library or audio on the web (slide images generated by DALL-E 3)
  4. Music AI elevates the value of music ⚫ Our music understanding and expressivity are enhanced (slide diagram: analysis and generation link Music with its components: MIDI, timbre, lyrics, ...)
  5. Music AI elevates the value of music ⚫ Expand the opportunities for applying music (slide diagram: analysis and generation connect Music, its components (MIDI, timbre, lyrics, ...), and usage contexts such as video and theme)
  6. To realize such a future, what are the challenges for Music AI? ⚫ Understanding music compositionally, as humans do ⚫ Interpreting human intent (= controllability)
  7. Our focus: compositionality and controllability (slide diagram: generation and analysis connect Music, a music library, brain & body signals (retrieval, generation), and music components: MIDI, timbre, lyrics, ...)
  8. Issue: out-of-domain performance (same overview diagram as the previous slide)
  9. Learning without MIDI-audio paired data ⚫ CoSaRef: MIDI-to-audio synthesis ✓ Annotation-free: MIDI-audio paired data is NOT required ☺ ✓ Performance comparable to MIDI-DDSP ☺ ✓ Timbre controllability ☺ ✓ Aligns with DAW music creation ☺ Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement, Osamu Take and Taketo Akama, arXiv 2024. Paper: https://arxiv.org/abs/2410.16785
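The concatenative stage named in the CoSaRef title can be sketched roughly as follows. This is a toy construction of my own, not the paper's code: each MIDI note selects the nearest-pitch one-shot from a sample library, is pitch-shifted by naive resampling, and is placed at its onset; the generative refinement stage that polishes the result is only mentioned in a comment.

```python
import math

SR = 16000  # sample rate in Hz

def sine_oneshot(freq, dur=0.25):
    """Toy stand-in for a recorded one-shot instrument sample."""
    n = int(SR * dur)
    return [math.sin(2 * math.pi * freq * t / SR) for t in range(n)]

# toy sample library: MIDI pitch -> waveform (a real library holds recordings)
library = {60: sine_oneshot(261.63), 64: sine_oneshot(329.63)}

def concatenative_render(notes, total_dur=1.0):
    """notes: list of (midi_pitch, onset_seconds). Returns a mono buffer."""
    out = [0.0] * int(SR * total_dur)
    for pitch, onset in notes:
        # select the library sample whose pitch is nearest to the note
        src_pitch = min(library, key=lambda p: abs(p - pitch))
        sample = library[src_pitch]
        # naive pitch shift by resampling at the frequency ratio
        ratio = 2 ** ((pitch - src_pitch) / 12)
        shifted = [sample[min(int(i * ratio), len(sample) - 1)]
                   for i in range(int(len(sample) / ratio))]
        # place the shifted sample at its onset time
        start = int(onset * SR)
        for i, v in enumerate(shifted):
            if start + i < len(out):
                out[start + i] += v
    return out

audio = concatenative_render([(60, 0.0), (62, 0.5)])
# a generative refinement model would then turn `audio` into natural sound
```

Because both stages work from MIDI plus an unpaired sample library, no MIDI-audio paired data is needed, which is the slide's "annotation-free" point.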
  10. Learning without MIDI-audio paired data ⚫ Audio-to-MIDI transcription ✓ Annotation-free: MIDI-audio paired data is NOT required ☺ ✓ Performance comparable to state-of-the-art methods in the out-of-domain setting ☺ Annotation-free Automatic Music Transcription with Scalable Synthetic Data and Adversarial Domain Confusion, Gakusei Sato and Taketo Akama, ICME 2024. DEMO: https://complex-degree-f38.notion.site/Annotation-free-AMT-Demo-33db7972162846059c2118e3fbc9db75
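The "domain confusion" idea in the title can be illustrated in one dimension. The paper trains an adversarial domain classifier behind a gradient-reversal layer; in this toy of my own construction, that adversarial pressure is approximated by a simple penalty on the gap between mean synthetic and mean real features, so the trade-off is visible without a GAN-style training loop.

```python
# labeled synthetic data (labels are free because the audio is rendered
# from known MIDI) and unlabeled real audio
synth_x = [1.0, 2.0, 3.0]
synth_y = [2.0, 4.0, 6.0]
real_x = [2.0, 4.0, 6.0]

w, lam, lr = 0.0, 0.1, 0.01  # shared parameter: feature/prediction f(x) = w * x
for _ in range(500):
    # supervised transcription loss, computable on synthetic data only
    task_grad = sum(2 * (w * x - y) * x
                    for x, y in zip(synth_x, synth_y)) / len(synth_x)
    # penalty on the gap between mean features of the two domains
    diff = sum(real_x) / len(real_x) - sum(synth_x) / len(synth_x)
    gap_grad = 2 * (w * diff) * diff
    w -= lr * (task_grad + lam * gap_grad)
```

Without the penalty the fit converges to w = 2 on synthetic data alone; with it, the shared parameter is pulled toward features that look alike across domains, which is what lets the transcriber generalize out of domain.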
  11. Our focus: compositionality and controllability (section divider; same overview diagram as slide 7)
  12. Our focus: compositionality and controllability (section divider; same overview diagram as slide 7)
  13. Our focus: compositionality and controllability (section divider; same overview diagram as slide 7)
  14. Lower hierarchical-level generation: MIDI & timbre (same overview diagram, highlighting the MIDI and timbre components)
  15. MIDI generation controllability ⚫ Music proofreading with RefinPaint ✓ Inpainting doesn't tell us where to modify; RefinPaint does ☺ ✓ Annotation-free: generated data simulate lower-quality compositions ☺ Music Proofreading with RefinPaint: Where and How to Modify Compositions given Context, Pedro Ramoneda, Martin Rocamora, and Taketo Akama, ISMIR 2024. DEMO: https://refinpaint.github.io/
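The "where and how" loop from the title can be sketched as a critic-plus-inpainter cycle. This is my own toy stand-in, not the RefinPaint models: a critic scores every note, the worst-scoring note marks *where* to modify, and an inpainter decides *how*; the loop repeats until nothing is flagged.

```python
def critic(notes):
    """Toy critic (assumption): flags notes outside the C-major scale."""
    scale = {0, 2, 4, 5, 7, 9, 11}
    return [1.0 if n % 12 in scale else 0.0 for n in notes]

def inpaint(notes, idx):
    """Toy inpainter: raise the flagged note a semitone (every out-of-scale
    pitch class in C major sits one semitone below a scale tone)."""
    fixed = list(notes)
    fixed[idx] = notes[idx] + 1
    return fixed

def refinpaint_loop(notes, max_iters=10):
    for _ in range(max_iters):
        scores = critic(notes)
        if min(scores) == 1.0:
            break                          # nothing left to modify
        worst = scores.index(min(scores))  # "where to modify"
        notes = inpaint(notes, worst)      # "how to modify"
    return notes

melody = refinpaint_loop([60, 61, 64, 66, 67])
```

The slide's annotation-free trick fits this picture: training pairs for the critic can be manufactured by deliberately degrading generated compositions, so no human-annotated "bad passages" are needed.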
  16. Timbre generation: neural synthesizer ⚫ GANStrument ✓ Conditional GAN with an encoder: accepts user input and interpolates it to create new timbres ✓ Outperforms GAN-inversion baselines GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning, Gaku Narita, Junichi Shimizu, and Taketo Akama, ICASSP 2023. DEMO: https://ganstrument.github.io/ganstrument-demo/ Website: https://www.flow-machines.com/flow-machines_update_ganstrument/ https://cslmusicteam.sony.fr/prototypes/ganstrument/ Article: https://the-decoder.com/sonys-ganstrument-turns-a-rooster-into-a-cello/
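The interpolation idea on the slide can be sketched conceptually. This is my own construction, not the GANStrument networks: two sounds are encoded into timbre embeddings, mixed linearly, and handed to a pitch-conditioned generator; keeping pitch as a separate condition is what makes the timbre embedding pitch-invariant (the rooster/cello pairing echoes the linked article).

```python
def lerp(a, b, t):
    """Linear interpolation between two timbre embeddings."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def generate(timbre_emb, midi_pitch):
    """Stand-in generator: the fundamental frequency comes from the pitch
    condition, everything else from the timbre embedding."""
    f0 = 440.0 * 2 ** ((midi_pitch - 69) / 12)
    return {"f0": f0, "timbre": timbre_emb}

emb_cello = [0.9, 0.1]    # pretend encoder output for a cello note
emb_rooster = [0.2, 0.8]  # pretend encoder output for a rooster crow
hybrid = generate(lerp(emb_cello, emb_rooster, 0.5), midi_pitch=69)
```

Because the same mixed embedding can be rendered at any MIDI pitch, the user plays the hybrid instrument across the keyboard rather than getting a single morphed sample.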
  17. Timbre generation: neural synthesizer ⚫ HyperGANStrument ✓ Improves the reconstruction-vs-interpolation trade-off HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks, Zhe Zhang and Taketo Akama, ICASSP 2024. DEMO: https://lukibibi.notion.site/Demo-Page-for-HyperGANStrument-b7cf7b02ddfd4831ac8c64e9e246642b
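The hypernetwork idea the title names can be sketched in miniature. This is an assumption of mine, not the actual HyperGANStrument architecture: a small network looks at the input sound's embedding and predicts per-input offsets for the frozen generator's weights, so reconstruction of that specific sound can improve without retraining the generator or distorting the embedding space used for interpolation.

```python
base_weights = [0.5, -0.3]  # frozen generator weights (stand-ins)

def hypernet(embedding):
    """Stand-in hypernetwork: small, input-dependent weight offsets."""
    return [0.1 * e for e in embedding]

def adapted_generate(embedding):
    """Generate with base weights shifted by the predicted offsets."""
    w = [bw + dw for bw, dw in zip(base_weights, hypernet(embedding))]
    # stand-in synthesis: a dot product of adapted weights and embedding
    return sum(wi * ei for wi, ei in zip(w, embedding))

out = adapted_generate([1.0, 2.0])
```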
  18. Our focus: compositionality and controllability (same overview diagram; brain & body signals now marked for both generation and analysis)
  19. More natural control of music? (same overview diagram, highlighting brain & body signals)
  20. Brainwave-to-music generation ⚫ General music reconstruction with non-invasive EEG ⚫ No manual preprocessing (demo: audio reconstructed by our model vs. ground truth) Naturalistic Music Decoding from EEG Data via Latent Diffusion Models, Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Luca Cosmo, and Taketo Akama, ICASSP 2025. DEMO: https://emilianpostolache.com/brainwave/
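The latent-diffusion setup the title names has a generic conditional sampling loop, sketched below. This is not the paper's model: the "denoiser" here is a stand-in that simply nudges the latent toward the conditioning vector, where the real system uses a trained network and a pretrained decoder to turn the final latent into audio.

```python
import random

random.seed(0)

def denoiser(latent, step, eeg_emb):
    """Stand-in conditional denoiser (not a trained network): moves the
    latent a fraction of the way toward the EEG conditioning embedding."""
    return [z + 0.3 * (c - z) for z, c in zip(latent, eeg_emb)]

def sample(eeg_emb, steps=20):
    latent = [random.gauss(0.0, 1.0) for _ in eeg_emb]  # start from noise
    for step in reversed(range(steps)):
        latent = denoiser(latent, step, eeg_emb)
    return latent  # a decoder would map this latent to a waveform

latent = sample([0.2, -0.5, 0.7])
```

Conditioning every denoising step on the EEG embedding is what steers the otherwise unconditional sampler toward the music the listener actually heard.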
  21. Brainwave representation learning ⚫ Cortical and ANN representations are reported to be similar ⚫ Predict ANN representations to learn brainwave representations Predicting Artificial Neural Network Representations to Learn Recognition Model for Music Identification from Brain Recordings, Taketo Akama, Zhuohao Zhang, Pengcheng Li, Kotaro Hongo, Hiroaki Kitano, Shun Minamikawa, and Natalia Polouliakh, arXiv 2024. Paper: https://arxiv.org/abs/2412.15560
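The training signal on this slide can be sketched as a regression. This is my own construction with synthetic numbers: a brainwave encoder is fit by regressing EEG features onto the embeddings a pretrained audio ANN assigns to the same music clips; a 1-D linear map fit by gradient descent stands in for the deep encoder.

```python
eeg_feat = [0.0, 1.0, 2.0, 3.0]  # pretend EEG features, one per clip
ann_emb = [1.0, 3.0, 5.0, 7.0]   # pretend ANN embeddings (here 2x + 1)

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    # mean-squared-error gradients for the linear map w * x + b
    grad_w = sum(2 * (w * x + b - y) * x
                 for x, y in zip(eeg_feat, ann_emb)) / len(eeg_feat)
    grad_b = sum(2 * (w * x + b - y)
                 for x, y in zip(eeg_feat, ann_emb)) / len(eeg_feat)
    w -= lr * grad_w
    b -= lr * grad_b
```

Once fitted, the map places brain recordings in the ANN's music representation space, where nearest-neighbor lookup can identify which piece was heard, which is the recognition task the paper targets.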
  22. Dance-to-Music project ⚫ Artist & scientist collaboration ✓ Challenge: reversing the relationship, so that movement creates music X post: https://x.com/tuktoe/status/1790338620051873851
  23. Interested? Please join us! ⚫ Positions ✓ Research assistant (students) ✓ Part-time / full-time ⚫ Flexible working hours and days ⚫ Remote work is possible Please DM Taketo Akama