
Piano Velocity Prediction Using a Seq2Seq Model with Attention Mechanism

Taein Kim
September 29, 2023

Presented at the 2023 Autumn Meeting of the Acoustical Society of Japan.
Presentation slot: 3-4-16 (14:45~15:00)
Title: Piano Velocity Prediction Using a Seq2Seq Model with Attention Mechanism
Presenter: ☆Taein Kim (Inha Univ.)

Code: https://github.com/sappho192/midi-velocity-infer-v2
Demo: http://demo1.sapphosound.com/

Code(demo): https://github.com/sappho192/midi-velocity-infer-demo

Transcript

  1. Piano Velocity Prediction Using a Seq2Seq Model with Attention Mechanism

    ☆Taein Kim¹, Yunho Kim² ¹ Inha University ² Andong National University Sep 28, 2023
  4. Introduction | Theoretic Approach | Proposed Model | Experiments | Discussion

    Composer as a hobby: so-called DTM (Desktop Music)
  5. You need to set hundreds or thousands of velocities, one for every note!
  6. Difference between two groups of MIDI scores

    • Score created by a random user
      • Gymnopédie No. 1 (Satie), written by ClassicMan
      • https://musescore.com/classicman/satie-gymnopedie-no-1
      • Velocity information is artificial and unnatural
    • Score created by a professional pianist
      • Concert Etude No. 2, "Gnomenreigen", S. 145/2 (Franz Liszt), performed by Xing Yu Lu
      • https://www.piano-e-competition.com/midi_2011.asp
      • Intensive, dynamic changes of velocity within a short time
      • Yet the changes show periodic tendencies
  7. Objective of this research

    • Performance data of professional pianists → training the velocity prediction model
    • Input to the trained model:
      • Any public piano MIDI data
      • Score data created by me!
    • Output: MIDI with predicted velocities
  8. Music information from the perspective of events

    • Event-based notation
    • Easy to process in a program
    • Key event:
      • Pitch
      • Onset time (NoteOn)
      • Offset time (NoteOff)
      • Volume (Velocity)
    • MIDI / Piano roll (axes: time, event)
      • Real-time notation of electronic instruments
      • Offline notation of the score
    → Play MIDI with your instruments
  9. Characteristics of velocity data (from a musical perspective)

    • Velocity: how fast (strong) the note should be played
    • What can determine the velocity?
      • Timestamp
      • Note pitch
      • Note duration
    • Adjacent notes have similar, gradually changing velocities
      • It is hard to press the keys with exactly the same force
    • Nearly simultaneous notes can differ greatly in velocity
    • In the long term, velocity changes with a periodic trend
    [Figure: velocity plot with regions annotated ①, ②, ③]
    → Multivariate prediction
  10. Characteristics of velocity (from a data perspective)

    time   time_diff   note_num   length   velocity
    755    0           66         209      46
    939    184         70         45       55
    1096   157         73         60       65
    1263   167         70         195      68
    1442   179         66         40       65
    1596   154         73         73       76
    1752   156         54         186      48
    1770   18          75         103      66
    1879   109         73         47       64
    1929   50          58         45       56
    1938   9           75         55       60
    2031   93          73         34       47
    2078   47          75         41       63
    ...    ...         ...        ...      ...

    • MIDI data that correlates with velocity information:
      • Time (timestamp) [0~N]
      • Note number (pitch) [21~108] = [A0~C8], the 88 keys of a piano
      • Length (duration) [0~N]
      • Velocity (intensity) [0~127]
    • Absolute time is inappropriate as a feature since it can increase indefinitely
      • Use the time difference from the previous note instead
      • Notes pressed at nearly the same time have a very small time difference
    • Length can theoretically grow indefinitely, but since the data is human-created, its range is finite in practice
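
The time-difference feature from this slide can be sketched in a few lines. This is only an illustration, not the authors' preprocessing code: the `Note` class and the `to_features` function are hypothetical names, and the time unit is assumed to be whatever the table on the slide uses.

```python
from dataclasses import dataclass


@dataclass
class Note:
    time: int       # absolute onset time (unit as in the slide's table; assumed)
    note_num: int   # MIDI note number
    length: int     # note duration, same unit as time
    velocity: int   # MIDI velocity, 0..127 (the prediction target)


def to_features(notes):
    """Build (time_diff, note_num, length) rows; the first time_diff is 0."""
    ordered = sorted(notes, key=lambda n: n.time)
    rows = []
    prev_time = None
    for note in ordered:
        # Replace the unbounded absolute time with the gap to the previous note.
        time_diff = 0 if prev_time is None else note.time - prev_time
        rows.append((time_diff, note.note_num, note.length))
        prev_time = note.time
    return rows


# First four rows of the table on the slide.
notes = [
    Note(755, 66, 209, 46),
    Note(939, 70, 45, 55),
    Note(1096, 73, 60, 65),
    Note(1263, 70, 195, 68),
]
features = to_features(notes)
targets = [n.velocity for n in notes]
```

The resulting `time_diff` values match the second column of the table, which suggests the column is simply the difference between consecutive timestamps.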
  11. MAESTRO Dataset (MIDI and Audio Edited for Synchronous TRacks and Organization)

    By the Google Magenta project
    • 1,184 performances
    • 430 compositions
    • 172.3 hours of audio and MIDI
    • 6.18 million notes
    About the dataset:
    • Recorded competition performances by world-class pianists (http://piano-e-competition.com/)
    • Both audio (PCM) and MIDI were collected
    • A Yamaha Disklavier* piano captures the MIDI information as the pianist plays
    • Audio and MIDI data are aligned with a tolerance of ~3 ms
    * Disklavier: a grand piano that can record high-resolution MIDI information https://en.wikipedia.org/wiki/Disklavier
    Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset." In International Conference on Learning Representations, 2019.
    Link: https://magenta.tensorflow.org/datasets/maestro
  12. seq2seq model with Luong¹ attention

    [Figure: encoder-decoder diagram. Labels: Score (alignment), Softmax, context vector c(t), Combine, decoder state h_t, attentional state h̃_t, outputs p(1) to p(4)]

    Example encoder input:
    TimeDiff   NoteNum   NoteNumDiff   Length
    0          74        0             104
    444        74        0             140
    223        78        4             223
    280        74        -4            137

    Example decoder sequence:
    Step                                1     2    3    4
    Decoder input (previous velocity)   (0)   57   57   69
    Output velocity p(t)                57    57   69   61

    ¹ Minh-Thang Luong et al., "Effective Approaches to Attention-based Neural Machine Translation," arXiv, 2015
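
To make the attention step on this slide concrete, here is a minimal pure-Python sketch of the score, softmax, and context-vector computation. It assumes the dot-product score, which is one of the variants in Luong et al.; the slide does not say which score function the model uses. The function names are hypothetical, and the final "Combine" step, which in Luong et al. is h̃_t = tanh(W_c [c_t; h_t]), is left out because it needs learned weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def luong_dot_attention(decoder_state, encoder_states):
    """Return (alignment weights, context vector) for one decoder step.

    decoder_state:  h_t, a list of floats
    encoder_states: one hidden-state list per input note
    """
    # Alignment score of h_t against every encoder state (dot variant).
    scores = [dot(decoder_state, h_s) for h_s in encoder_states]
    weights = softmax(scores)
    # Context vector c(t): attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


weights, context = luong_dot_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

In this toy call the first encoder state aligns with the decoder state, so it receives the larger weight and dominates the context vector.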
  13. Development environment

    Programming environment
    • OS: Ubuntu 20.04 LTS
    • CPU: AMD Ryzen 9 3900X (12 cores / 24 threads)
    • RAM: DDR4 128GB 2666MHz
    • GPU: NVIDIA RTX A6000 48GB
    • GPU Driver: 515.65.01
    • CUDA: 11.7
    Experiment environment
    • Deep Learning
      • TensorFlow: 2.11.0
    • Language
      • Python: 3.9.0
      • C#: 10.0
      • .NET: 6.0.1
    • Dataset: MAESTRO (MIDI only)
  14. Details on the baseline model

    • Base network: Uni-LSTM seq2seq (4+4 cells)
    • Dropout: 0.2
    • Loss function: MSE & Cosine Similarity (α = 0.15)

      $$\mathrm{loss}(y_{label}, y_{pred}) = \alpha \left(1 + \mathrm{CosineSimilarity}(y_{label}, y_{pred})\right) + (1 - \alpha)\,\mathrm{MSE}(y_{label}, y_{pred})$$

    • Most balanced model between short-term and long-term performance
    • MAE: 0.1062
    • F1 Score¹: 0.4795

      $$F1 = \frac{2 \times \frac{SD_{predict}}{SD_{original}} \times \left(1 - 3 \times MAE_{predict}\right)}{\frac{SD_{predict}}{SD_{original}} + \left(1 - 3 \times MAE_{predict}\right)}$$

    ¹ Chien-Sheng Kuo et al., "Velocity Prediction for MIDI Notes with Deep Learning," 2021 ICCE-TW. IEEE, 2021
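
The loss and the F1 metric on this slide can be written out as plain Python to show how the terms interact. This is a sketch under stated assumptions, not the authors' training code: `CosineSimilarity` is assumed to follow the Keras loss convention (negative cosine similarity, so −1 means identical direction), velocities are assumed to be scaled to [0, 1], SD is taken as the population standard deviation, and all function names are hypothetical.

```python
import math
from statistics import pstdev


def mse(y_label, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_label, y_pred)) / len(y_label)


def mae(y_label, y_pred):
    return sum(abs(a - b) for a, b in zip(y_label, y_pred)) / len(y_label)


def cosine_similarity_loss(y_label, y_pred):
    """Negative cosine similarity (assumed Keras-style sign convention).

    Undefined for all-zero vectors (division by zero).
    """
    dot = sum(a * b for a, b in zip(y_label, y_pred))
    norm_label = math.sqrt(sum(a * a for a in y_label))
    norm_pred = math.sqrt(sum(b * b for b in y_pred))
    return -dot / (norm_label * norm_pred)


def combined_loss(y_label, y_pred, alpha=0.15):
    """alpha * (1 + CosineSimilarity) + (1 - alpha) * MSE, as on the slide."""
    cosine_term = 1 + cosine_similarity_loss(y_label, y_pred)
    return alpha * cosine_term + (1 - alpha) * mse(y_label, y_pred)


def velocity_f1(y_label, y_pred):
    """Harmonic mean of the SD ratio and (1 - 3 * MAE).

    Undefined when the labels are constant or both terms are zero.
    """
    sd_ratio = pstdev(y_pred) / pstdev(y_label)
    accuracy = 1 - 3 * mae(y_label, y_pred)
    return 2 * sd_ratio * accuracy / (sd_ratio + accuracy)


original = [0.2, 0.5, 0.8]
flat = [0.5, 0.5, 0.5]          # a prediction with no dynamics at all
perfect_f1 = velocity_f1(original, original)
flat_f1 = velocity_f1(original, flat)
```

The flat prediction has an MAE of only about 0.2, yet its F1 is 0 because its standard deviation is 0. This illustrates how a model can reach a small error while losing expressiveness.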
  15. Ablation study

    [Figures: H4-attention, H32-attention]
    Increasing the number of LSTM cells improves quantitative performance, including the F1 score, but the actual expressiveness diminishes.
  16. Quantitative comparison (dataset)

    Pieces with high F1:
    Artist                  Title                                                      Duration   MAE      F1
    Johann Sebastian Bach   Prelude and Fugue in F-sharp Major, WTC I, BWV 858         3m47s      0.0658   0.6550
    Joseph Haydn            Sonata in D Major, Hob. XVI:24                             4m32s      0.0746   0.6514
    Johann Sebastian Bach   Prelude and Fugue in D Major, WTC I, BWV 850               2m54s      0.0789   0.6294
    Domenico Scarlatti      Sonata K. 54                                               3m25s      0.0870   0.6202
    Claude Debussy          Prelude No. 6, Book I, "Des pas sur la neige"              4m7s       0.0917   0.6169

    Pieces with low F1:
    Artist                  Title                                                      Duration   MAE      F1
    Franz Schubert          Erlkönig (The Erlking)                                     4m34s      0.1183   0.3484
    Ludwig van Beethoven    Sonata No. 16 Op. 31 No. 1 in G Major, I. Allegro vivace   4m41s      0.1192   0.3523
    Franz Liszt             Transcendental Etude No. 8 "Wilde Jagd"                    4m58s      0.1383   0.3614
    Sergei Rachmaninoff     Etude-Tableau in C Minor, Op. 39, No. 1                    3m17s      0.1216   0.3640
    Ludwig van Beethoven    Sonata No. 16 in G Major, Op. 31 No. 1, First movement     4m37s      0.1084   0.3656
  17. Quantitative comparison

    [Table: metrics of method ¹, method ², and the proposed method]
    All metrics except those of the "Proposed" method are taken from Kuo's paper.
    • The MAE and MSE of our model are the lowest
    • But the F1 score is worse than the others
    • This suggests that the model learned average trends rather than the detailed expressiveness of the original data, which is not what was intended.
    ¹ Neural Translation of Musical Style (Bi-LSTM) by Iman Malik, Carl Henrik Ek, arXiv, 2017
    ² Velocity Prediction for MIDI Notes with Deep Learning (ConvAE) by Chien-Sheng Kuo et al., 2021 ICCE-TW. IEEE
  18. Qualitative comparison (Musical Turing Test¹)

    • 42 participants
      • 27 non-musicians
      • 15 musicians
    • Held July 24 to 29, 2023
    • 32 participants from Korea, 10 from Japan
    • Survey provided in Korean and Japanese
    • Songs: the 2 best and 2 worst by F1, plus 1 special
    Both the musician and non-musician groups favored the human performance over the AI, and the gap was larger in the musician group.
    ¹ Iman Malik, Carl Henrik Ek, "Neural Translation of Musical Style," arXiv, 2017
  19. Demo (Melodic progression)

    • Original performance
    • Prediction by the model (better dynamics)
    Original MIDI source: https://bitmidi.com/deb_clai_format0-mid
    * MIDI score acquired from the public web
  20. Demo (Chord progression)

    • Original performance
    • Prediction by the model (similar performance)
    Original MIDI source: https://bitmidi.com/deb_clai_format0-mid
    * MIDI score acquired from the public web
  21. Demo (Live demo application)

    Beethoven Symphony No. 6 "Pastoral" (1st movement), arranged by Franz Liszt
    https://musescore.com/user/24069/scores/2322751
    Note count: 10,203
  22. Demo (Live demo application)

    demo1.sapphosound.com
    Processing 10,203 notes took roughly 3 seconds
    • The TensorFlow model was converted to an ONNX model (17.8 KB, 352 parameters)
    • An ASP.NET Core web server runs the model natively with ONNX Runtime (ONNX model ↔ Web server ↔ Web browser)
  23. Demo (Live demo application)
  24. Discussion

    • The quantitative assessment shows that the model is limited in its ability to capture velocity changes compared with other studies
    • New metrics for velocity performance are needed, since a small error can be achieved without a good F1 score
    • Since the current model is deterministic, it is not suitable for practical use
    • Potential solutions:
      • A classification-based model, especially over the velocity range [0, 127]
      • A probabilistic model based on a Bayesian LSTM
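
One way to picture the classification-based idea is to treat each integer velocity in [0, 127] as its own class. The sketch below is purely illustrative and is not part of the presented model: the names `one_hot`, `decode_argmax`, and `decode_sample` are hypothetical, and the probability vector stands in for a softmax output that a future classifier might produce.

```python
import random

NUM_CLASSES = 128  # one class per MIDI velocity value 0..127


def one_hot(velocity):
    """Encode an integer velocity as a 128-way one-hot target."""
    if not 0 <= velocity < NUM_CLASSES:
        raise ValueError("velocity must be in [0, 127]")
    return [1.0 if i == velocity else 0.0 for i in range(NUM_CLASSES)]


def decode_argmax(probs):
    """Deterministic decoding: pick the most probable velocity."""
    return max(range(NUM_CLASSES), key=lambda i: probs[i])


def decode_sample(probs, rng):
    """Stochastic decoding: draw a velocity from the predicted distribution."""
    return rng.choices(range(NUM_CLASSES), weights=probs, k=1)[0]


target = one_hot(64)
picked = decode_argmax(target)
sampled = decode_sample(target, random.Random(0))
```

With a real, spread-out distribution, `decode_sample` would return different velocities on different runs, which is one possible way around the determinism noted in the discussion.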
  26. Comparison between the MSE and Cosine Similarity loss functions

    • MSE tracks long-term trends well
    • Cosine Similarity gives a better representation of detailed changes