MIDI × Transformer
• Music Transformer [C.-Z. Huang et al., 2019]
  • MIDI-like representation × Transformer (w/ relative positional encoding)
  • Generates long and coherent music
• Pop Music Transformer [Y.-S. Huang et al., 2020]
  • REMI representation × Transformer-XL
  • Generates beat-aligned music
Transformers work well with MIDI-level token representations.
→ How about score-level token representations?
Score × Transformer
Questions:
• Can transformers generate musical scores?
• Which token representation is effective?
We obtain score-level token representations in two ways:
1. Design a new token representation
  • Tokenize each musical symbol or attribute into a token
2. Utilize existing score formats
  • Existing text-like score formats: ABC notation, Humdrum, and LilyPond
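As a rough illustration of approach 2 (not part of the original slides), the Python sketch below parses a tiny, made-up ABC notation fragment with music21; the fragment and the word-level splitting at the end are only illustrative, not the tokenization used in the paper.

from music21 import converter

# A tiny made-up ABC fragment (X: index, M: meter, L: unit length, K: key),
# shown only to illustrate what a text-like score format looks like.
abc_fragment = """X:1
M:2/4
L:1/8
K:A
E2 D2 | [EB,]4 |"""

score = converter.parse(abc_fragment, format="abc")
score.show("text")  # print the parsed stream structure

# One simple way to feed such formats to a transformer is to split the raw
# text into tokens, e.g. on whitespace:
print(abc_fragment.split())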
Score Tokens
Design principles:
• One token per symbol or attribute
• Combined sequences of staves → to make scores consistent
• Compatible with music21 attributes → to build scores easily
[Figure: example score token sequence. The right (R) and left (L) staff sequences are concatenated; each staff is a stream of tokens such as bar, clef_treble / clef_bass, key_sharp_1, time_2/4, and <voice> … </voice> blocks containing note tokens like note_E3 len_1/2 stem_up beam_start … note_B3 len_1/2 stem_up beam_stop.]
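Because the tokens are designed to be compatible with music21 attributes, a tokenizer can walk the music21 stream and emit one token per symbol or attribute. The sketch below is a simplified illustration, not the released tool (see https://github.com/suzuqn/ScoreTransformer); the exact token names and the length encoding are assumptions inferred from the example above.

from music21 import converter, note, clef, key, meter, stream

def tokenize_part(part):
    """Emit one token per symbol or attribute, in the spirit of the score
    tokens above (simplified: voices, rests, ties, etc. are omitted)."""
    tokens = []
    for el in part.recurse():
        if isinstance(el, stream.Measure):
            tokens.append("bar")
        elif isinstance(el, clef.Clef):
            tokens.append("clef_" + type(el).__name__.replace("Clef", "").lower())  # clef_treble, clef_bass
        elif isinstance(el, key.KeySignature):
            tokens.append(f"key_sharp_{el.sharps}")        # e.g. key_sharp_1
        elif isinstance(el, meter.TimeSignature):
            tokens.append(f"time_{el.ratioString}")        # e.g. time_2/4
        elif isinstance(el, note.Note):
            tokens.append(f"note_{el.pitch.nameWithOctave}")   # e.g. note_F#3
            tokens.append(f"len_{el.duration.quarterLength}")  # length encoding is an assumption
            if el.stemDirection in ("up", "down"):
                tokens.append(f"stem_{el.stemDirection}")
            for b in el.beams.beamsList:
                tokens.append(f"beam_{b.type}")            # beam_start / beam_continue / beam_stop
    return tokens

score = converter.parse("example.musicxml")                # hypothetical piano score file
right, left = score.parts[0], score.parts[1]
sequence = ["R"] + tokenize_part(right) + ["L"] + tokenize_part(left)  # concatenate staves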
Training Scheme
Train the model to restore musical scores from down-converted MIDI in a sequence-to-sequence manner. This training scheme needs musical scores only.
1. Down-convert: musical score → MIDI
2. Tokenize: the score into a score token sequence (e.g., R bar clef_treble key_sharp_1 time_2/4 note_E4 note_D4 note_G#3 len_2 stem_up bar … L bar clef_bass key_sharp_1 … bar) and the MIDI into a MIDI token sequence (e.g., note_64 len_48 note_62 len_48 note_56 len_48 note_52 len_12 note_47 len_24 pos_12 note_54 len_12 beat …)
3. Restore: train the model to generate the score token sequence from the MIDI token sequence
Model: a small vanilla Transformer (~4M parameters)
Dataset: piano scores (~7k scores, split into segments of ~4 measures)
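One way to realize the down-convert and MIDI-tokenize steps is sketched below: drop all notation information and keep only pitch, onset, and duration, quantized to ticks. The token names follow the MIDI token example above, but the tick resolution and exact vocabulary are assumptions, and tokenize_part refers to the score-token sketch above.

from music21 import converter

TICKS_PER_QUARTER = 48   # assumption, matching len_48 for a quarter note above

def down_convert_to_midi_tokens(score):
    """Keep only pitch / onset / duration; all notation is dropped."""
    tokens, last_beat = [], None
    for n in score.flatten().notes:                      # notes and chords in onset order
        beat = int(n.offset)                             # offset measured in quarter notes
        if beat != last_beat:
            tokens.append("beat")
            last_beat = beat
        pos = int(round((n.offset - beat) * TICKS_PER_QUARTER))
        if pos:
            tokens.append(f"pos_{pos}")                  # e.g. pos_12
        length = int(round(float(n.duration.quarterLength) * TICKS_PER_QUARTER))
        for p in n.pitches:
            tokens.append(f"note_{p.midi}")              # e.g. note_64
            tokens.append(f"len_{length}")               # e.g. len_48
    return tokens

score = converter.parse("example.musicxml")              # hypothetical score file
source = down_convert_to_midi_tokens(score)                                            # model input
target = ["R"] + tokenize_part(score.parts[0]) + ["L"] + tokenize_part(score.parts[1])  # model output
# (source, target) pairs like this train the ~4M-parameter seq2seq Transformer;
# only the scores themselves are needed to build both sides.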
Evaluation: count the differences in notation between generated and reference scores on 12 musical aspects (e.g., stem direction, beams) → calculate error rates.
The metric is based on "A metric for music notation transcription accuracy" [Cogliati et al., 2017].
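As a rough illustration of an aspect-wise error rate (not the exact metric of Cogliati et al., which defines its own alignment and counting), one can compare the symbols of a single aspect in the generated and reference scores with an edit distance normalized by the reference length:

def error_rate(predicted, reference):
    """Levenshtein distance between two symbol sequences, normalized by the
    reference length (smaller is better). Illustrative only."""
    m, n = len(predicted), len(reference)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if predicted[i - 1] == reference[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / max(n, 1)

# e.g. the stem-direction aspect: compare only the stem_* tokens of both scores
pred_stems = ["stem_up", "stem_up", "stem_down"]     # toy example
ref_stems = ["stem_up", "stem_down", "stem_down"]
print(f"stem direction error rate: {100 * error_rate(pred_stems, ref_stems):.1f}%")  # 33.3%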
Baselines
• CTD: automatic music transcription framework [Cogliati et al., 2016]
• Finale 26 and MuseScore 3: music notation software
Score Transformer outperforms the baselines on all 12 musical aspects by a large margin (error rates in %; smaller is better).
Comparison of token representations (error rates in %; smaller is better): the designed score tokens vs. score-format-based tokens.
• The designed score tokens show the most stable performance across the aspects.
• Some score formats are prone to format errors (length disagreement, corrupted format), which lead to practical issues.
Unquantized MIDI: add some noise to onset timings and note lengths, then train and infer with the temporally deviated data.
The generated scores show that Score Transformer also works well with unquantized MIDI.
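A minimal sketch of the temporal-deviation augmentation described above; the noise ranges and the note layout are assumptions, not the paper's settings.

import random

def temporally_deviate(notes, max_shift=0.05, max_scale=0.1):
    """notes: list of (onset_sec, duration_sec, pitch) tuples.
    Jitter onset timings and note lengths to imitate unquantized MIDI."""
    deviated = []
    for onset, duration, pitch in notes:
        onset += random.uniform(-max_shift, max_shift)            # shift onset timing
        duration *= 1.0 + random.uniform(-max_scale, max_scale)   # stretch/shrink note length
        deviated.append((max(onset, 0.0), max(duration, 1e-3), pitch))
    return deviated

# Train and infer on such deviated data so the model sees unquantized input.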
Can transformers generate musical scores? ✓ Yes, and they work much better than existing methods.
Which token representation is effective? ✓ The designed score tokens are the most effective.
+ The tokenization tools are publicly available → https://github.com/suzuqn/ScoreTransformer — get the tokenization tools!
Possible future works:
• Extending the score token representation (e.g., to various instruments, multi-part scores, or other symbols)
• Application to score-related tasks (e.g., score generation from scratch; score-to-performance generation)