Slide 1

Slide 1 text

Analyzing song structure with spectral clustering
Brian McFee, Dan Ellis

Slide 2

Slide 2 text

Musical structure analysis
1. Detect change-points (verse → chorus)
2. Label repeated sections (ABAC)
● … also representation and visualization

Slide 3

Slide 3 text

2012

Slide 4

Slide 4 text

The Infinite Jukebox [Lamere, 2012]
1. Build a chain graph, each beat is a vertex
2. Connect repeated beats
3. Playback follows a random walk
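
A minimal sketch of the three steps above, in Python. The function names, the uniform choice among successors, and the restart at beat 0 when a beat has no successor are illustrative assumptions, not details of Lamere's implementation.

```python
import random

def build_jukebox_graph(n_beats, repeats):
    """Chain graph over beats, plus links between beats marked as repeats."""
    successors = {i: [] for i in range(n_beats)}
    for i in range(n_beats - 1):
        successors[i].append(i + 1)      # chain edge: play the next beat
    for i, j in repeats:                 # hypothetical list of similar beat pairs
        successors[i].append(j)          # jump edge in both directions
        successors[j].append(i)
    return successors

def random_walk(successors, start, n_steps, seed=0):
    """Playback order produced by a uniform random walk on the graph."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        # Assumption: restart at beat 0 if the current beat has no successor.
        options = successors[path[-1]] or [0]
        path.append(rng.choice(options))
    return path

edges = build_jukebox_graph(16, repeats=[(3, 11), (7, 15)])
print(random_walk(edges, start=0, n_steps=20))
```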

Slide 5

Slide 5 text

What does this have to do with structure analysis?

Slide 6

Slide 6 text

Structure from the ∞ Jukebox
1. Sequential repetitions form dense subgraphs
   ○ Ignore edge directions, examine connectivity

Slide 7

Slide 7 text

Structure from the ∞ Jukebox
1. Sequential repetitions form dense subgraphs
   ○ Ignore edge directions, examine connectivity
2. Links between subgraphs are sparse
   ○ Pruning these should reveal structure

Slide 8

Slide 8 text

Our strategy
1. Construct a graph over beats
2. Partition the graph to recover structure
3. Vary the partition size to expose multi-level structure

Slide 9

Slide 9 text

Building the graph [1/3]: The local graph
1. Add a vertex for each beat
2. Add local edges (i, i ± 1)
3. Weight edges by MFCC similarity
[Figure: chain graph over beats 1, 2, 3]
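
The local graph can be sketched with NumPy as below. The slide only says edges are weighted by MFCC similarity; the Gaussian kernel and the median bandwidth used here are assumptions, as is the function name `local_graph`.

```python
import numpy as np

def local_graph(mfcc):
    """Affinity matrix linking each beat to its neighbours i - 1 and i + 1.

    mfcc: array of shape (n_beats, n_coeffs), one feature vector per beat.
    """
    n = mfcc.shape[0]
    # Squared distance between each beat and the beat that follows it.
    d2 = np.sum(np.diff(mfcc, axis=0) ** 2, axis=1)
    # Assumed bandwidth: median neighbour distance (guarded against zero).
    sigma = max(np.median(d2), np.finfo(float).eps)
    w = np.exp(-d2 / sigma)              # assumed similarity: Gaussian kernel
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = w                      # edge (i, i + 1)
    A[i + 1, i] = w                      # edge (i + 1, i), keeps A symmetric
    return A
```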

Slide 10

Slide 10 text

Building the graph [2/3]: The repetition graph
1. Link k-nearest neighbors (in CQT space)
[Figure: graph over beats 1 to 8]

Slide 11

Slide 11 text

Building the graph [2/3]: The repetition graph
1. Link k-nearest neighbors (in CQT space)
2. Enhance sequences by windowed majority vote
[Figure: graph over beats 1 to 8]

Slide 12

Slide 12 text

Building the graph [2/3]: The repetition graph
1. Link k-nearest neighbors (in CQT space)
2. Enhance sequences by windowed majority vote
[Figure: graph over beats 1 to 8]

Slide 13

Slide 13 text

Building the graph [2/3]: The repetition graph
1. Link k-nearest neighbors (in CQT space)
2. Enhance sequences by windowed majority vote
3. Weight edges by feature similarity
[Figure: graph over beats 1 to 8]
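
One way to read the three steps above, as a NumPy sketch. Several choices are assumptions not stated on the slide: links are symmetrised with a logical OR, the majority vote runs along diagonals of the link matrix (so that runs of consecutive repeated beats reinforce each other), and weights come from a Gaussian kernel with a median bandwidth.

```python
import numpy as np

def repetition_graph(cqt, k=3, window=3):
    """cqt: (n_beats, n_bins) beat-synchronous features; window should be odd."""
    n = cqt.shape[0]
    # Pairwise squared distances; the diagonal is masked to forbid self-links.
    d2 = ((cqt[:, None, :] - cqt[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)

    # 1. Link each beat to its k nearest neighbours, then symmetrise.
    R = np.zeros((n, n), dtype=bool)
    nn = np.argsort(d2, axis=1)[:, :k]
    R[np.repeat(np.arange(n), k), nn.ravel()] = True
    R = R | R.T

    # 2. Majority vote over (i + t, j + t) for t in the window.
    half = window // 2
    votes = np.zeros((n, n), dtype=int)
    for t in range(-half, half + 1):
        shifted = np.zeros_like(R)
        if t >= 0:
            shifted[: n - t, : n - t] = R[t:, t:]
        else:
            shifted[-t:, -t:] = R[: n + t, : n + t]
        votes += shifted
    keep = 2 * votes > window

    # 3. Weight the surviving links by feature similarity (assumed Gaussian).
    if not keep.any():
        return np.zeros((n, n))
    sigma = max(np.median(d2[keep]), np.finfo(float).eps)
    return np.where(keep, np.exp(-d2 / sigma), 0.0)
```

Isolated nearest-neighbour links tend to lose the vote, while links that sit inside a run of consecutive matches survive.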

Slide 14

Slide 14 text

Building the graph [3/3]: The combination
1. Take a weighted combination of local and repetition
   A = μ * Local + (1 - μ) * Repetition

Slide 15

Slide 15 text

Building the graph [3/3]: The combination
1. Take a weighted combination of local and repetition
   A = μ * Local + (1 - μ) * Repetition
2. Optimize μ: P[Local move] ≅ P[Repetition move]

Slide 16

Slide 16 text

Building the graph [3/3]: The combination
1. Take a weighted combination of local and repetition
   A = μ * Local + (1 - μ) * Repetition
2. Optimize μ: P[Local move] ≅ P[Repetition move]
3. μ* has a closed-form solution
   ∀ i: μ ∑_j Local[i, j] ≅ (1 - μ) ∑_j Repetition[i, j]
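
A sketch of the closed form, assuming "≅" is read as a least-squares fit over all beats i. Writing d_loc[i] = ∑_j Local[i, j] and d_rep[i] = ∑_j Repetition[i, j], minimizing ∑_i (μ d_loc[i] - (1 - μ) d_rep[i])² gives μ* = ⟨d_rep, d_loc + d_rep⟩ / ‖d_loc + d_rep‖². The function names are illustrative.

```python
import numpy as np

def balance_mu(local, repetition):
    """Least-squares mu balancing local and repetition degree at every beat."""
    d_loc = local.sum(axis=1)
    d_rep = repetition.sum(axis=1)
    total = d_loc + d_rep
    # Setting the derivative of sum_i (mu * total[i] - d_rep[i])**2 to zero.
    return float(d_rep @ total) / float(total @ total)

def combine(local, repetition):
    """A = mu * Local + (1 - mu) * Repetition with the balanced mu."""
    mu = balance_mu(local, repetition)
    return mu * local + (1 - mu) * repetition, mu
```

When the repetition graph is much denser than the local graph, μ moves toward 1 so that local moves stay about as likely as repetition moves.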

Slide 17

Slide 17 text

Example: The Beatles - Come Together

Slide 18

Slide 18 text

Partitioning via spectral clustering
● Affinity matrix A, degree matrix D_ii = ∑_j A_ij
● Normalized Laplacian L = I - D^{-1} A
   ○ Bottom eigenvectors encode component membership for each beat
   ○ … i.e., the regions likely to trap a random walk
● Cluster the eigenvectors of L to reveal structure
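
A small NumPy sketch of the eigenvector computation. For numerical convenience it solves the symmetric problem I - D^{-1/2} A D^{-1/2} and rescales; the result is an eigenbasis of L = I - D^{-1} A as defined above. It assumes every beat has positive degree, which the local chain edges would guarantee. The function name is illustrative.

```python
import numpy as np

def laplacian_eigenvectors(A):
    """Eigenvalues (ascending) and eigenvectors of L = I - D^{-1} A."""
    d = A.sum(axis=1)
    s = 1.0 / np.sqrt(d)                          # D^{-1/2} as a vector
    L_sym = np.eye(len(A)) - s[:, None] * A * s[None, :]
    evals, V = np.linalg.eigh(L_sym)              # symmetric, so eigh applies
    # If L_sym v = lam * v, then y = D^{-1/2} v satisfies L y = lam * y.
    return evals, s[:, None] * V

# Two tightly connected groups of four beats joined by one weak link.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 0.01

evals, Y = laplacian_eigenvectors(A)
print(Y[:, 1] > 0)   # the sign of the second eigenvector separates the groups
```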

Slide 19

Slide 19 text

Example: The Beatles - Come Together

Slide 20

Slide 20 text

Example: The Beatles - Come Together
Low-rank reconstructions expose structure
L ≅ Y[:n, :m] Y[:n, :m]^T

Slide 21

Slide 21 text

Multi-level segmentation
1. Construct the n-by-n graph A
2. Compute Laplacian eigenvectors Y
3. for m in [2, 3, …]
   a. Partitions[m] := spectral_clustering(Y[:n, :m], n_components=m)
4. Return Partitions
One discrete parameter controls complexity
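
The loop above might look as follows in NumPy. The clustering step is swapped for a tiny deterministic k-means with farthest-point initialisation, and rows of Y[:, :m] are normalised before clustering; both are assumptions for the sake of a self-contained example, and all function names are hypothetical.

```python
import numpy as np

def bottom_eigenvectors(A, m_max):
    """First m_max eigenvectors of L = I - D^{-1} A (smallest eigenvalues)."""
    s = 1.0 / np.sqrt(A.sum(axis=1))
    _, V = np.linalg.eigh(np.eye(len(A)) - s[:, None] * A * s[None, :])
    return (s[:, None] * V)[:, :m_max]

def kmeans(X, k, n_iter=50):
    """Plain k-means; centres start from a farthest-point traversal."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def multilevel_segmentation(A, m_max=4):
    """Partitions[m] for m = 2..m_max, one label per beat."""
    Y = bottom_eigenvectors(A, m_max)
    partitions = {}
    for m in range(2, m_max + 1):
        X = Y[:, :m]
        X = X / np.linalg.norm(X, axis=1, keepdims=True)   # assumed row scaling
        partitions[m] = kmeans(X, m)
    return partitions
```

Segment boundaries at level m would then fall wherever consecutive beats receive different labels in partitions[m].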

Slide 22

Slide 22 text

Interactive visualization demo

Slide 23

Slide 23 text

Quantitative evaluation
● Metrics: boundary detection and pairwise frame labeling
   ○ Beatles_TUT (174 tracks)
   ○ SALAMI small (735 tracks) and functions
● Choosing the number of components
   ○ Maximize label entropy with duration constraints
   ○ Oracle: Best m per track, per metric (simulates interactive display)
● Baseline: [Serrà, Müller, Grosche & Arcos, 2012]
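
For readers unfamiliar with the pairwise frame metric, a simplified sketch: it treats every pair of frames as a binary decision (same label or not) and compares estimate to reference. It assumes both labelings are already sampled on the same frame grid and is not the evaluation code used for the results that follow.

```python
import numpy as np

def pairwise_f(ref_labels, est_labels):
    """Pairwise frame F-measure between two per-frame labelings."""
    ref = np.asarray(ref_labels)
    est = np.asarray(est_labels)
    upper = np.triu_indices(len(ref), k=1)            # each unordered pair once
    same_ref = (ref[:, None] == ref[None, :])[upper]
    same_est = (est[:, None] == est[None, :])[upper]
    hits = np.sum(same_ref & same_est)
    precision = hits / max(same_est.sum(), 1)
    recall = hits / max(same_ref.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

# Label names do not matter, only which frames are grouped together.
print(pairwise_f(list("AABB"), [1, 1, 0, 0]))
```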

Slide 24

Slide 24 text

Results: Beatles
Method        F 0.5s         F 3.0s         F-Pairwise
Automatic m   0.312 ± 0.15   0.579 ± 0.16   0.628 ± 0.13
Oracle        0.414 ± 0.14   0.684 ± 0.13   0.694 ± 0.12
SMGA          0.293 ± 0.13   0.699 ± 0.16   0.715 ± 0.15

Slide 25

Slide 25 text

Results: SALAMI small
Method        F 0.5s         F 3.0s         F-Pairwise
Automatic m   0.192 ± 0.11   0.344 ± 0.15   0.448 ± 0.16
Oracle        0.292 ± 0.15   0.525 ± 0.19   0.561 ± 0.16
SMGA          0.173 ± 0.08   0.518 ± 0.12   0.493 ± 0.16

Slide 26

Slide 26 text

Results: SALAMI functions
Method        F 0.5s         F 3.0s         F-Pairwise
Automatic m   0.304 ± 0.13   0.455 ± 0.16   0.546 ± 0.14
Oracle        0.406 ± 0.13   0.579 ± 0.15   0.652 ± 0.13
SMGA          0.224 ± 0.11   0.550 ± 0.18   0.553 ± 0.15

Slide 27

Slide 27 text

Summary
● We demonstrated a novel graphical method for musical structure analysis
● Laplacian eigenvectors encode relevant musical information
● Future directions:
   ○ Improve edge prediction/weighting
   ○ Enforce consistency between layers

Slide 28

Slide 28 text

Thanks!
[email protected]
https://github.com/bmcfee/laplacian_segmentation
https://github.com/urinieto/msaf