Score Transformer (MMAsia'21)

MMAsia'21 Oral Presentation

Masahiro Suzuki

July 08, 2022
Transcript

  1. Score Transformer: Generating Musical Score from Note-level Representation
    Masahiro Suzuki, R&D Division, Yamaha Corporation
    MMAsia’21 | Dec 1-3, 2021
  2. Representations of Music
    • Musical score: notation-level representation
    • MIDI: note-level representation
    Musical symbols enable us to visually comprehend music.
  3. From MIDI to Score
    This work focuses on musical score generation, which needs to estimate various kinds of musical elements.
  4. MIDI × Transformer
    MIDI-level token representations have been actively explored in music generation.
    • Music Transformer [CZ Huang et al, 2019]
      • MIDI-like representation × Transformer
      • Generates long and coherent music
    • Pop Music Transformer [YS Huang et al, 2020]
      • REMI representation × Transformer-XL
      • Generates beat-aligned music
    Transformers work well with MIDI-level token representations (with relative positional encoding).
    How about score-level token representations?
  5. Score × Transformer
    Score-level token representations have been unexplored. This work explores score token representations in two ways:
    1. Design a new token representation: tokenizing each musical symbol or attribute into a token
    2. Utilize existing score formats: existing text-like score formats (ABC notation, Humdrum, and LilyPond)
    Questions:
    • Can transformers generate musical scores?
    • Which token representation is effective?
  6. Proposed Score Token Representation
    Score tokens (the sequences of the R and L staves are concatenated):
    R bar clef_treble key_sharp_1 time_2/4 note_E4 note_D4 note_G#3 len_2 stem_up bar …
    L bar clef_bass key_sharp_1 time_2/4 <voice> note_E3 len_1/2 stem_up beam_start note_F#3 len_1/2 stem_up beam_continue note_C#4 stem_up beam_continue note_B3 len_1/2 stem_up beam_stop </voice> <voice> note_B2 len_1 stem_down note_E3 len_1 stem_down </voice> bar …
    Design Principles
    • One token per symbol or attribute
    • Combined sequences of staves
    • Compatible with music21 attributes
    → to make scores consistent
    → to build scores easily
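
The "one token per symbol or attribute" principle on slide 6 can be illustrated with a minimal Python sketch. The `Note` record and both function names are hypothetical and are not taken from the authors' tokenization tools. The sketch assumes that `len_` values are in quarter notes, which is what the 2/4 example on the slide suggests, and it handles a single measure without `<voice>` tags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    """Hypothetical note record (not a type from the authors' tools)."""
    pitches: List[str]          # one entry per chord member, e.g. ["E4", "D4", "G#3"]
    length: str                 # assumed to be in quarter notes, e.g. "2" or "1/2"
    stem: Optional[str] = None  # "up" or "down"
    beam: Optional[str] = None  # "start", "continue", or "stop"


def note_to_tokens(note: Note) -> List[str]:
    """Emit one token per symbol or attribute of a note (or chord)."""
    tokens = [f"note_{pitch}" for pitch in note.pitches]
    tokens.append(f"len_{note.length}")
    if note.stem:
        tokens.append(f"stem_{note.stem}")
    if note.beam:
        tokens.append(f"beam_{note.beam}")
    return tokens


def measure_to_tokens(staff: str, clef: str, key: str, time: str,
                      notes: List[Note]) -> List[str]:
    """Tokenize one measure of one staff, opened and closed by `bar`."""
    tokens = [staff, "bar", f"clef_{clef}", f"key_{key}", f"time_{time}"]
    for note in notes:
        tokens.extend(note_to_tokens(note))
    tokens.append("bar")
    return tokens


right_hand = measure_to_tokens(
    "R", "treble", "sharp_1", "2/4",
    [Note(pitches=["E4", "D4", "G#3"], length="2", stem="up")],
)
print(" ".join(right_hand))
```

For the chord in the slide's right-hand staff, this prints the same token order as the slide: staff, bar, clef, key, time, chord pitches, length, stem, bar.
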
  7. Training the Score Transformer
    Training Scheme
    1. down-convert: Score (MusicXML) → MIDI (SMF)
    2. tokenize: Score → score token sequence; MIDI → MIDI token sequence
    3. restore: MIDI token sequence → score token sequence
    Score token sequence: R bar clef_treble key_sharp_1 time_2/4 note_E4 note_D4 note_G#3 len_2 stem_up bar … L bar clef_bass key_sharp_1 … bar
    MIDI token sequence: note_64 len_48 note_62 len_48 note_56 len_48 note_52 len_12 note_47 len_24 pos_12 note_54 len_12 beat …
    Model: A small vanilla Transformer model (~4M params)
    Dataset: Piano scores (~7k scores, split by ~4 measures)
    Train the model to restore musical scores from down-converted MIDI in a sequence-to-sequence manner.
    This training scheme needs musical scores only.
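
The relation between the two token sequences on slide 7 can be sketched in Python. This is a simplification under stated assumptions, not the authors' pipeline: the slide down-converts MusicXML to SMF and tokenizes each side, whereas the sketch maps score tokens directly to MIDI-like tokens. The value of 24 ticks per quarter note is inferred from the slide's example (`len_2` becomes `len_48`, `len_1/2` becomes `len_12`), all function names are hypothetical, and the `pos_` and `beat` tokens are left out.

```python
from fractions import Fraction
from typing import List

# Semitone offsets of the natural pitch classes above C.
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Assumption: inferred from the slide, where len_2 -> len_48 and len_1/2 -> len_12.
TICKS_PER_QUARTER = 24


def pitch_to_midi(name: str) -> int:
    """Convert a pitch name such as 'E4' or 'G#3' to a MIDI note number (C4 = 60)."""
    letter, rest = name[0], name[1:]
    accidental = 0
    while rest and rest[0] in "#b":
        accidental += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + PITCH_CLASS[letter] + accidental


def length_to_ticks(length: str) -> int:
    """Convert a length in quarter notes ('2', '1/2') to ticks."""
    return int(Fraction(length) * TICKS_PER_QUARTER)


def down_convert(score_tokens: List[str]) -> List[str]:
    """Keep pitch and length only; drop clefs, stems, beams, voices, and bars."""
    midi_tokens: List[str] = []
    pending: List[int] = []  # chord members waiting for their shared length token
    for token in score_tokens:
        if token.startswith("note_"):
            pending.append(pitch_to_midi(token[len("note_"):]))
        elif token.startswith("len_"):
            ticks = length_to_ticks(token[len("len_"):])
            for pitch in pending:
                midi_tokens += [f"note_{pitch}", f"len_{ticks}"]
            pending = []
    return midi_tokens


score = "R bar clef_treble key_sharp_1 time_2/4 note_E4 note_D4 note_G#3 len_2 stem_up bar".split()
print(" ".join(down_convert(score)))
```

Applied to the right-hand measure from the slide, the sketch reproduces the first six MIDI tokens shown there, which is why the notation-specific tokens are exactly what the model has to restore.
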
  8. Generated Example
    (Figure: Input MIDI, the Generated Score inferred from it, and the Original Score.)
    The Transformer model learned to generate readable scores in the form of score token representation.
  9. Evaluation Metric
    Measure differences between the Original Score and the Generated Score on 12 musical aspects (e.g., voice separation, stem direction, beam).
    Count the differences of notations → Calculate error rates
    The metric is based on “A Metric for music notation transcription accuracy” [Cogliati et al, 2017].
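
The "count the differences, then calculate error rates" step on slide 9 can be sketched as follows. This is a simplified illustration and not the cited metric itself: it assumes the notes of the two scores are already aligned one-to-one, and the function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def error_rates(original: List[Dict[str, str]],
                generated: List[Dict[str, str]]) -> Dict[str, float]:
    """Per-aspect error rate in %, assuming the two note lists are aligned."""
    if len(original) != len(generated):
        raise ValueError("this sketch assumes one-to-one aligned notes")
    aspects = sorted({aspect for note in original for aspect in note})
    rates: Dict[str, float] = {}
    for aspect in aspects:
        # Only notes that carry this aspect in the original score are counted.
        total = sum(1 for note in original if aspect in note)
        errors = sum(
            1
            for ref, hyp in zip(original, generated)
            if aspect in ref and ref[aspect] != hyp.get(aspect)
        )
        rates[aspect] = 100.0 * errors / total
    return rates


original = [
    {"stem": "up", "voice": "1"},
    {"stem": "down", "voice": "1"},
    {"stem": "up", "voice": "2"},
    {"stem": "up", "voice": "2"},
]
generated = [
    {"stem": "up", "voice": "1"},
    {"stem": "up", "voice": "1"},  # one stem direction differs
    {"stem": "up", "voice": "2"},
    {"stem": "up", "voice": "2"},
]
print(error_rates(original, generated))
```
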
  10. Score Transformer vs. Baselines
    Error rates in % (smaller is better)
    Baselines:
    • CTD: Automatic music transcription framework [Cogliati et al, 2016]
    • Music notation software: Finale 26, MuseScore 3
    Score Transformer (proposed) outperforms the baselines on all 12 musical aspects, with much lower error rates.
  11. Score Token vs. Score Formats
    Error rates in % (smaller is better), comparing the designed score token with score-format-based representations.
    The designed score token shows the most stable performance across the aspects.
    Some score formats are prone to format errors (length disagreement, corrupted format), which leads to practical issues.
  12. Robustness on Timing Deviation
    • Clean MIDI (quantized)
    • Noisy MIDI (unquantized): add some noise to onset timing and note length
    Training with temporally deviated data shows that Score Transformer also works well with unquantized MIDI.
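
Adding noise to onset timing and note length, as described on slide 12, could look like the sketch below. The slide does not state the noise distribution or its magnitude, so the Gaussian noise, the default standard deviations, and the function name are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

# (MIDI pitch, onset, length), with times in quarter notes.
NoteEvent = Tuple[int, float, float]


def add_timing_noise(notes: List[NoteEvent],
                     onset_sd: float = 0.05,
                     length_sd: float = 0.05,
                     min_length: float = 0.01,
                     seed: Optional[int] = None) -> List[NoteEvent]:
    """Perturb onsets and lengths with Gaussian noise (an assumed noise model)."""
    rng = random.Random(seed)
    noisy: List[NoteEvent] = []
    for pitch, onset, length in notes:
        new_onset = max(0.0, onset + rng.gauss(0.0, onset_sd))
        new_length = max(min_length, length + rng.gauss(0.0, length_sd))
        noisy.append((pitch, new_onset, new_length))
    return noisy


clean = [(52, 0.0, 0.5), (54, 0.5, 0.5), (47, 0.0, 1.0)]
noisy = add_timing_noise(clean, seed=0)
for before, after in zip(clean, noisy):
    print(before, "->", after)
```

Pitches are left untouched, onsets are clamped at zero, and lengths are kept positive, so the noisy sequence still describes the same notes with deviated timing.
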
  13. Conclusion
    Can transformers generate musical scores?
    ✓ Yes. Transformers work much better than existing methods.
    Which token representation is effective?
    ✓ The designed score token is the most effective.
    + The tokenization tools are publicly available -> https://github.com/suzuqn/ScoreTransformer
    Possible future work:
    • Extending the score token representation, e.g., to various instruments, multi-part scores, or other symbols
    • Application to score-related tasks, e.g., score generation from scratch; score-to-performance generation