Piano Velocity Prediction Using a Seq2Seq Model with Attention Mechanism

☆Taein Kim¹, Yunho Kim²
¹ Inha University, ² Andong National University
Sep 28, 2023

Index: Introduction • Theoretic Approach • Proposed Model • Experiments & Results • Discussion

Composing as a hobby: so-called DTM (desktop music)

You need to set hundreds or thousands of velocities, one for each note!

Difference between two groups of MIDI scores
• Score created by a random user: Gymnopédie No. 1 (Satie), written by ClassicMan (https://musescore.com/classicman/satie-gymnopedie-no-1)
  • Velocity information is artificial and unnatural
• Score created by a professional pianist: Concert Etude No. 2, "Gnomenreigen", S. 145/2 (Franz Liszt), performed by Xing Yu Lu (https://www.piano-e-competition.com/midi_2011.asp)
  • Intensive, dynamic changes of velocity within a short time
  • Yet the changes have periodic tendencies

Objective of this research
• Train a velocity prediction model on performance data (MIDI) from professional pianists
• Apply the model to any public piano MIDI data, or to score data created by me, to obtain MIDI with predicted velocities

Music information from an event perspective
• Event-based notation, easy to process in a program
• Key events:
  • Pitch
  • Onset time (NoteOn)
  • Offset time (NoteOff)
  • Volume (velocity)
• MIDI / piano roll: events laid out over time
  • Real-time notation from electronic instruments
  • Offline notation of a score
  → Play the MIDI with your own instruments
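As a rough illustration of the event view above, the following minimal sketch (not the project's code) pairs NoteOn and NoteOff events into notes carrying pitch, onset, offset, and velocity. It assumes the events were already parsed from a MIDI file into hypothetical `(time, kind, pitch, velocity)` tuples in time order, and it follows the common MIDI convention that a NoteOn with velocity 0 acts as a NoteOff.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int     # MIDI note number (21-108 on an 88-key piano)
    onset: int     # time of the NoteOn event
    offset: int    # time of the matching NoteOff event
    velocity: int  # 1-127, taken from the NoteOn event


def pair_events(events):
    """Pair time-ordered (time, kind, pitch, velocity) tuples into Notes.

    `kind` is "on" or "off" (a hypothetical pre-parsed format). A NoteOn with
    velocity 0 is treated as a NoteOff, following the usual MIDI convention.
    """
    active = {}  # pitch -> (onset, velocity) of the note currently sounding
    notes = []
    for time, kind, pitch, velocity in events:
        if kind == "on" and velocity > 0:
            # Simplification: re-striking a sounding pitch replaces the old note.
            active[pitch] = (time, velocity)
        elif pitch in active:  # "off", or "on" with velocity 0
            onset, vel = active.pop(pitch)
            notes.append(Note(pitch, onset, time, vel))
    # Order notes by onset, then pitch, as in a piano roll.
    notes.sort(key=lambda n: (n.onset, n.pitch))
    return notes
```

A note's length (duration), used as a feature later, is then simply `offset - onset`.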

Characteristics of velocity data (from a musical perspective)
• Velocity: how fast (how strongly) a note should be played
• What can determine the velocity?
  • Timestamp
  • Note pitch
  • Note duration
• Adjacent notes have similar, gradually changing velocities
• It is hard to press keys with exactly the same force, so nearly simultaneous notes can differ considerably in velocity
• In the long term, velocity changes with a periodic trend
→ Velocity prediction is a multivariate prediction problem

Characteristics of velocity (from a data perspective)
time   time_diff  note_num  length  velocity
 755           0        66     209        46
 939         184        70      45        55
1096         157        73      60        65
1263         167        70     195        68
1442         179        66      40        65
1596         154        73      73        76
1752         156        54     186        48
1770          18        75     103        66
1879         109        73      47        64
1929          50        58      45        56
1938           9        75      55        60
2031          93        73      34        47
2078          47        75      41        63
 ...         ...       ...     ...       ...
• MIDI data provides information that correlates with velocity:
  • Time (timestamp) [0 ~ N]
  • Note number (pitch) [21 ~ 108] = [A0 ~ C8]
  • Length (duration) [0 ~ N]
  • Velocity (intensity) [0 ~ 127]
• Time is inappropriate as a feature because it grows without bound
  • Use the time difference from the previous note instead
  • Notes pressed at nearly the same time have a very small time difference
• Length can theoretically grow without bound, but since performances are human-created, it has a finite range in practice
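The switch from absolute time to time difference can be sketched as follows, assuming notes arrive as hypothetical `(time, note_num, length, velocity)` tuples sorted by onset. Dividing velocity by 127 to get a target in [0, 1] is an assumption made for this sketch; the slides do not state the normalization, although the MAE values reported later (around 0.1) suggest a scaled target.

```python
def to_features(rows):
    """Turn (time, note_num, length, velocity) rows into model inputs/targets.

    Features are (time_diff, note_num, length): the unbounded absolute time is
    replaced by the gap to the previous note (0 for the first note).
    Targets are velocity / 127, an assumed scaling to [0, 1].
    """
    features, targets = [], []
    prev_time = None
    for time, note_num, length, velocity in rows:
        time_diff = 0 if prev_time is None else time - prev_time
        prev_time = time
        features.append((time_diff, note_num, length))
        targets.append(velocity / 127)
    return features, targets
```

Applied to the first four rows of the table above, the time differences come out as 0, 184, 157, and 167, matching the time_diff column.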

MAESTRO dataset (MIDI and Audio Edited for Synchronous TRacks and Organization), by the Google Magenta project
About the dataset:
• Recorded competition performances by world-class pianists
• Collected as both audio (PCM) and MIDI
• Recorded on Yamaha Disklavier* pianos, which capture MIDI information as the pianist plays
• Audio and MIDI are aligned to within about 3 ms
• 1,184 performances, 430 compositions, 172.3 hours of audio and MIDI, 6.18 million notes
*Disklavier: a grand piano that can record high-resolution MIDI information (https://en.wikipedia.org/wiki/Disklavier, http://piano-e-competition.com/)
Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset." In International Conference on Learning Representations, 2019. Link: https://magenta.tensorflow.org/datasets/maestro

seq2seq model with Luong¹ attention
[Figure: encoder-decoder with attention. Example encoder input, one row per note:
TimeDiff  NoteNum  NoteNumDiff  Length
       0       74            0     104
     444       74            0     140
     223       78            4     223
     280       74           -4     137
Decoder input: previous velocities (0), 57, 57, 69; decoder output: velocities p(1) to p(4) = 57, 57, 69, 61. Attention components: alignment score, softmax, context vector c(t), decoder state h(t), combine, attentional state h̃(t).]
¹ Minh-Thang Luong et al., “Effective Approaches to Attention-based Neural Machine Translation,” arXiv, 2015
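The attention step in the figure can be sketched in plain Python. This is a minimal single-step version of Luong global attention, assuming the "dot" score function; Luong et al. also define "general" and "concat" scores, and the slides do not say which one the model uses. The weight matrix `W_c` and the function names are illustrative only. In the original paper the attentional state h̃(t) feeds a softmax over a vocabulary; for velocity regression it would presumably feed a dense output layer instead.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention(h_t, encoder_states, W_c):
    """One decoder step of Luong global attention with the "dot" score.

    h_t:            decoder hidden state, a list of d floats
    encoder_states: encoder hidden states h_bar_s, each a list of d floats
    W_c:            d x 2d matrix (hypothetical weights) for the attentional state
    Returns (alignment weights a_t, context vector c_t, attentional state h_tilde_t).
    """
    # score(h_t, h_bar_s) = h_t . h_bar_s for every source position s
    scores = [sum(a * b for a, b in zip(h_t, h_s)) for h_s in encoder_states]
    weights = softmax(scores)
    # c_t = sum_s a_t(s) * h_bar_s
    d = len(h_t)
    context = [sum(w * h_s[i] for w, h_s in zip(weights, encoder_states)) for i in range(d)]
    # h_tilde_t = tanh(W_c [c_t; h_t])
    concat = context + h_t
    h_tilde = [math.tanh(sum(w * x for w, x in zip(row, concat))) for row in W_c]
    return weights, context, h_tilde
```

The alignment weights always sum to 1, so the context vector is a weighted average of the encoder states, leaning toward the notes whose states best match the current decoder state.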

Development environment
Programming environment:
• OS: Ubuntu 20.04 LTS
• CPU: AMD Ryzen 9 3900X (12 cores, 24 threads)
• RAM: 128 GB DDR4 2666 MHz
• GPU: NVIDIA RTX A6000 48 GB
• GPU driver: 515.65.01
• CUDA: 11.7
Experiment environment:
• Deep learning: TensorFlow 2.11.0
• Languages: Python 3.9.0, C# 10.0 (.NET 6.0.1)
• Dataset: MAESTRO (MIDI only)

Details on the baseline model
• Base network: unidirectional LSTM seq2seq (4 + 4 cells)
• Dropout: 0.2
• Loss function: MSE combined with cosine similarity (α = 0.15):
$$\mathrm{loss}(y_{label}, y_{pred}) = \alpha \cdot \bigl(1 + \mathrm{CosineSimilarity}(y_{label}, y_{pred})\bigr) + (1 - \alpha) \cdot \mathrm{MSE}(y_{label}, y_{pred})$$
Most balanced model between short-term and long-term performance: MAE 0.1062, F1 score 0.4795
F1 score¹, the harmonic mean of the standard-deviation ratio and 1 − 3 × MAE:
$$F1 = \frac{2 \times \frac{SD_{predict}}{SD_{original}} \times (1 - 3 \times MAE_{predict})}{\frac{SD_{predict}}{SD_{original}} + (1 - 3 \times MAE_{predict})}$$
¹ Chien-Sheng Kuo et al., “Velocity Prediction for MIDI Notes with Deep Learning,” 2021 ICCE-TW, IEEE, 2021
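The loss and the F1 metric above can be sketched in plain Python. Two assumptions: velocities are scaled to [0, 1], and CosineSimilarity presumably follows the sign convention of `tf.keras.losses.cosine_similarity`, which returns the negative cosine similarity, so that the term 1 + CosineSimilarity is 0 for perfectly aligned vectors and minimizing the loss pulls predictions toward the labels. The function names are hypothetical, and the F1 is computed over one sequence (for example one piece).

```python
import math
import statistics


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_cosine_similarity(y_true, y_pred):
    """Negative cosine similarity, the sign convention of
    tf.keras.losses.cosine_similarity: -1 means same direction."""
    norm_t = math.sqrt(sum(t * t for t in y_true))
    norm_p = math.sqrt(sum(p * p for p in y_pred))
    if norm_t == 0 or norm_p == 0:
        return 0.0  # treat an all-zero vector as orthogonal
    dot = sum(t * p for t, p in zip(y_true, y_pred))
    return -dot / (norm_t * norm_p)


def combined_loss(y_true, y_pred, alpha=0.15):
    """alpha * (1 + CosineSimilarity) + (1 - alpha) * MSE, as on the slide."""
    return alpha * (1 + neg_cosine_similarity(y_true, y_pred)) + (1 - alpha) * mse(y_true, y_pred)


def velocity_f1(y_true, y_pred):
    """Harmonic mean of SD_pred / SD_orig and 1 - 3 * MAE (Kuo et al.'s F1)."""
    # Population SD; the ratio is identical with sample SD (same n for both).
    sd_ratio = statistics.pstdev(y_pred) / statistics.pstdev(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    accuracy = 1 - 3 * mae
    return 2 * sd_ratio * accuracy / (sd_ratio + accuracy)
```

A flat prediction such as `[0.5, 0.5, 0.5, 0.5]` against `[0.2, 0.4, 0.6, 0.8]` has an MAE of only 0.2 yet an F1 of 0, because its standard deviation is zero. That is the kind of failure the SD ratio in the F1 metric penalizes.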

Ablation study
[Figure panels: H4-attention, H32-attention]
Increasing the number of LSTM cells (H4 vs. H32) gives better quantitative performance, including the F1 score, but the actual expressiveness diminishes.

Quantitative comparison (by piece in the dataset)
Highest F1 scores:
Artist | Title | Duration | MAE | F1
Johann Sebastian Bach | Prelude and Fugue in F-sharp Major, WTC I, BWV 858 | 3m47s | 0.0658 | 0.6550
Joseph Haydn | Sonata in D Major, Hob. XVI:24 | 4m32s | 0.0746 | 0.6514
Johann Sebastian Bach | Prelude and Fugue in D Major, WTC I, BWV 850 | 2m54s | 0.0789 | 0.6294
Domenico Scarlatti | Sonata K. 54 | 3m25s | 0.0870 | 0.6202
Claude Debussy | Prelude No. 6, Book I, "Des pas sur la neige" | 4m7s | 0.0917 | 0.6169
Lowest F1 scores:
Artist | Title | Duration | MAE | F1
Franz Schubert | Erlkönig (The Erlking) | 4m34s | 0.1183 | 0.3484
Ludwig van Beethoven | Sonata No. 16 Op. 31 No. 1 in G Major, I. Allegro vivace | 4m41s | 0.1192 | 0.3523
Franz Liszt | Transcendental Etude No. 8 "Wilde Jagd" | 4m58s | 0.1383 | 0.3614
Sergei Rachmaninoff | Etude-Tableau in C Minor, Op. 39, No. 1 | 3m17s | 0.1216 | 0.3640
Ludwig van Beethoven | Sonata No. 16 in G Major, Op. 31 No. 1, First movement | 4m37s | 0.1084 | 0.3656

Quantitative comparison (with Bi-LSTM¹ and ConvAE² models)
All metrics except those of the proposed method are taken from Kuo et al.²
• Our model has the lowest MAE and MSE
• But its F1 score is worse than the others
• This suggests the model learned average trends rather than the detailed expressiveness of the original data, which was not the intended outcome.
¹ Iman Malik, Carl Henrik Ek, “Neural Translation of Musical Style” (Bi-LSTM), arXiv, 2017
² Chien-Sheng Kuo et al., “Velocity Prediction for MIDI Notes with Deep Learning” (ConvAE), 2021 ICCE-TW, IEEE

Qualitative comparison (Musical Turing Test¹)
• 42 participants: 27 non-musicians, 15 musicians
• Held July 24 to 29, 2023
• 32 Korean and 10 Japanese participants; survey provided in Korean and Japanese
• Pieces: the 2 best and 2 worst by F1 score, plus 1 special piece
Both musician and non-musician groups favored the human performance over the AI, but the gap was larger in the musician group.
¹ Iman Malik, Carl Henrik Ek, “Neural Translation of Musical Style,” arXiv, 2017

Demo (Melodic progression): original performance vs. prediction by the model (better dynamics)
Original MIDI source: https://bitmidi.com/deb_clai_format0-mid (MIDI score acquired from the public web)

Demo (Chord progression): original performance vs. prediction by the model (similar performance)
Original MIDI source: https://bitmidi.com/deb_clai_format0-mid (MIDI score acquired from the public web)

Demo (Live demo application)
Beethoven Symphony No. 6 "Pastoral" (1st movement), arranged by Franz Liszt (https://musescore.com/user/24069/scores/2322751)
Note count: 10,203

Live demo: demo1.sapphosound.com
• Processing 10,203 notes took roughly 3 seconds
• The TensorFlow model was converted to an ONNX model (17.8 KB, 352 parameters)
• An ASP.NET Core web server runs the model natively with ONNX Runtime (ONNX model ↔ web server ↔ web browser)


• The quantitative assessment reveals the model's limited ability to capture velocity changes compared to other studies.
• New metrics for evaluating velocity prediction are needed, since a small error can be achieved even with a poor F1 score.
• Since the current model is deterministic, it is not suitable for practical use.
• Possible solutions:
  • A classification-based model, especially over the velocity range [0, 127]
  • A probabilistic model based on a Bayesian LSTM
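To illustrate the first proposed direction, the sketch below assumes a hypothetical classifier that outputs 128 logits per note, one per velocity value in [0, 127]. Sampling from the softmax distribution, instead of taking the argmax or a regression output, gives a different rendition on each run, which addresses the determinism noted above. This is not part of the presented model; the function name and the temperature parameter are illustrative.

```python
import math
import random


def sample_velocity(logits, temperature=1.0, rng=None):
    """Sample a velocity from a hypothetical 128-way classifier's logits.

    logits[v] scores velocity value v (0-127). Lower temperature sharpens the
    distribution toward the argmax; higher temperature makes output more varied.
    """
    if len(logits) != 128:
        raise ValueError("expected one logit per velocity value 0..127")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax
    return rng.choices(range(128), weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes a rendition reproducible when needed, while leaving it out yields a fresh variation each time.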

Try my code and live demo on GitHub! https://github.com/sappho192/midi-velocity-infer-v2

Comparison between the MSE and cosine similarity loss functions
• MSE tracks long-term trends well
• Cosine similarity better represents detailed changes
