Language Model for Music Recommendation

raghothams
September 20, 2023

Talk at EuroPython 2023 on alternate methods of music discovery and recommendation.

Transcript

  1. Who We Are
    Nischal HP: VP Data and ML at scoutbee, Berlin. Building Large Language Models, Knowledge Graphs and MLOps. Twitter: @nischalhp, linkedin.com/in/nischalhp, https://github.com/deep-learning-for-humans
    Raghotham Sripadraj: AI Architect at PayPal, Bangalore. Building Document AI, Large Language and Computer Vision Models at PayPal. Twitter: @raghothams, linkedin.com/in/raghothams/
  2. Introduction
    Le us circa 2022 - There are so many cool music producers. How can we find them based on what we like?
    Le us circa 2023 - Here is shazam.nearest_neighbors, music discovery using language models and graphs. https://github.com/deep-learning-for-humans
  3. Problem with Music Discovery #1: Finding artists in your city who perform the music you like is a challenge. Genres can be hard to work with as a filter.
  4. Problem with Music Discovery #2: You like a certain guitar solo from David Gilmour's rendition of Comfortably Numb and you would love to find other tracks that have similar solos, but can you?
  5. Problem with Music Discovery #3: As a music producer, would it not be cool to have an assistant who could help me build great DJ sets?
  6. Music Discovery: Recommendation and discovery on these platforms happens via Content Filtering + Collaborative Filtering. Explore vs Exploit.
  7. Music Discovery
    > Music exploration via search is limited.
    > Recommendations tend to memorise tracks from your listening history.
    > New artists are not often recommended.
  8. Our Goal: Given a 10 second sample, can we identify other songs and artists that contain a similar pattern of music?
  9. Dataset: 600 songs, mostly without lyrics. 10 second samples at 16000 Hz and 48000 Hz sampling rates.
  10. Trivia: Sampling Rate. Sampling rate refers to the number of samples taken per second of audio, measured in Hz. The higher the sampling rate, the higher the frequencies that can be captured, and generally the better the quality.
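As a rough illustration of the trivia above (not part of the talk), the sketch below computes how many samples a 10 second mono clip holds at the two sampling rates used in the dataset, along with the highest frequency each rate can represent (the Nyquist limit, which is half the sampling rate). The function names are hypothetical.

```python
def samples_in_clip(sample_rate_hz: int, duration_s: float) -> int:
    """Number of amplitude values stored for a mono clip of the given length."""
    return int(sample_rate_hz * duration_s)


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate_hz / 2


# The two sampling rates mentioned for the dataset.
for rate in (16_000, 48_000):
    print(rate, samples_in_clip(rate, 10), nyquist_hz(rate))
```

A 48000 Hz clip carries three times as many values as a 16000 Hz clip of the same length, which is the trade-off between quality and the amount of data a model has to process.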
  11. Our Experiments with Transformers: We looked at 3 Transformer-based audio models: Wav2Vec2, Audio Spectrogram Transformer (AST) and CLAP.
  12. Our Experiments with Transformers: We took a 10 second sample, generated embeddings using the transformer models and ran a quick similarity search using a vector store to find the most similar samples. Models: Wav2Vec2, Audio Spectrogram Transformer, CLAP.
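The lookup described above can be sketched as follows. This is a minimal, brute-force stand-in for a vector store, assuming each 10 second sample has already been reduced to one fixed-length embedding (how the model output is pooled into a single vector is not stated in the talk). The three-dimensional vectors, the `track:offset` sample IDs and the function names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=3):
    """Return the k (sample_id, score) pairs closest to the query embedding."""
    scored = [(sid, cosine_similarity(query, vec)) for sid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings keyed by "track:offset" sample IDs.
index = {
    "track_a:00s": [0.9, 0.1, 0.0],
    "track_a:10s": [0.8, 0.2, 0.1],
    "track_b:00s": [0.1, 0.9, 0.2],
    "track_c:20s": [0.0, 0.2, 0.9],
}

query = [1.0, 0.1, 0.0]
top = most_similar(query, index, k=2)
print(top)
```

A dedicated vector store does the same job with optimised indexes, which matters once the number of samples grows beyond what a linear scan can handle.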
  13. Our Experiments with Transformers (input sample spectrum analysis): Our hypothesis was that the closest matches would be other parts of the same track. Wav2Vec2 completely surprised us. We wanted to understand a bit further using visualisation.
  14. Our Experiments with Transformers (Wav2Vec2): Our hypothesis was that the closest matches would be other parts of the same track. Wav2Vec2 completely surprised us. We wanted to understand a bit further using visualisation.
  15. Our Experiments with Transformers (Audio Spectrogram Transformer): Our hypothesis was that the closest matches would be other parts of the same track. Wav2Vec2 completely surprised us. We wanted to understand a bit further using visualisation.
  16. Our Experiments with Transformers (CLAP): Our hypothesis was that the closest matches would be other parts of the same track. Wav2Vec2 completely surprised us. We wanted to understand a bit further using visualisation.
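One way to put a number on this hypothesis is to measure, across all query samples, what fraction of the retrieved neighbours come from the same track as the query. The sketch below is an assumption about how such a check could look, not the analysis shown in the talk; the neighbour lists and the `track:offset` ID scheme are hypothetical.

```python
def track_of(sample_id: str) -> str:
    """Extract the track part of a "track:offset" sample ID."""
    return sample_id.split(":", 1)[0]


def same_track_rate(neighbours: dict[str, list[str]]) -> float:
    """Fraction of retrieved neighbours that share a track with their query."""
    hits = 0
    total = 0
    for query_id, neighbour_ids in neighbours.items():
        for neighbour_id in neighbour_ids:
            hits += track_of(neighbour_id) == track_of(query_id)
            total += 1
    return hits / total if total else 0.0


# Hypothetical top-2 neighbours for two query samples.
neighbours = {
    "track_a:00s": ["track_a:10s", "track_b:30s"],
    "track_b:00s": ["track_c:10s", "track_d:20s"],
}
print(same_track_rate(neighbours))
```

A rate close to 1.0 would support the hypothesis for a given model, while a low rate would mean the model places samples from different tracks closer together.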
  17. Audio Transformers - Our Findings: We realised Wav2Vec2 was different, and went deeper to understand the datasets used to train those models and their intended purposes.
  18. Audio Transformers - Our Findings: Wav2Vec2 was trained for speech recognition on the LibriSpeech corpus, which contains 960 hours of speech audio.
  19. Audio Transformers - Our Findings: AST was trained for audio classification on the AudioSet dataset, a collection of over 2 million 10-second audio clips excised from YouTube videos and labeled with the sounds each clip contains, drawn from a set of 527 labels.
  20. Audio Transformers - Our Findings: CLAP was designed to build good audio representations and was trained using the Freesound Dataset 50k (FSD50K for short), an open dataset of human-labeled sound events. It consists mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more.
  21. Audio Transformers - Summary: The dataset had 600 songs, cut into 10 second samples at 16000 Hz and 48000 Hz sampling rates. Embeddings were generated for all the samples using the Wav2Vec2, CLAP and AST transformer models. The embeddings were indexed into a vector store (FAISS) for similarity search. Spectrum analysis of the samples and their most similar samples helped us better understand the transformers.
  22. Just to clarify: We are not looking to replace current methods of recommendation. We just wanted to see if we can build something to change how users discover music.
  23. Music Discovery: Given a sample, we can now identify other samples that sound similar. How do we enable discovery though?
  24. Music Discovery: We introduced graphs as a way for users to discover and explore similar tracks and artists.
  25. Music Discovery: Using a simple ontology, we generated a graph data model with Artist and Audio entities. Audio can refer to a song or a sample. Audio has relationships with other Audio entities and with Artist entities.
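A minimal sketch of such a data model, using plain Python dataclasses rather than a graph database. The relationship names (`PERFORMED_BY`, `PART_OF`, `SIMILAR_TO`), the fields and the example entities are assumptions for illustration; the talk does not name them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artist:
    name: str


@dataclass(frozen=True)
class Audio:
    audio_id: str
    kind: str  # "song" or "sample"


@dataclass
class MusicGraph:
    # Each edge is a (source, relationship, target) triple.
    edges: list = field(default_factory=list)

    def relate(self, source, relationship, target):
        """Add a directed, labelled edge between two entities."""
        self.edges.append((source, relationship, target))

    def targets(self, source, relationship):
        """All entities reachable from source via the given relationship."""
        return [t for s, r, t in self.edges if s == source and r == relationship]


graph = MusicGraph()
artist = Artist("Artist X")
song = Audio("song-1", "song")
sample = Audio("song-1:00s", "sample")
other = Audio("song-2:30s", "sample")

graph.relate(song, "PERFORMED_BY", artist)   # Audio -> Artist
graph.relate(sample, "PART_OF", song)        # Audio -> Audio
graph.relate(sample, "SIMILAR_TO", other)    # Audio -> Audio
print(graph.targets(sample, "SIMILAR_TO"))
```

Keeping songs and samples as one `Audio` type mirrors the ontology on the slide: the same relationship types can then connect any pair of audio entities.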
  26. Music Discovery: Graphs make discovery fun. For starters, they make it easier to find similar songs that are not from the same song or artist. With the help of graphs we also saw some interesting patterns, such as certain 10 second samples being used by the same artist in different songs.
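Both observations above can be sketched as filters over similarity edges: one keeps neighbours from other artists (discovery), the other keeps neighbours from the same artist (reuse across songs). The dictionaries, IDs and function names below are hypothetical and stand in for whatever graph store is actually used.

```python
# Hypothetical graph data: the artist behind each sample, and similarity edges.
artist_of = {
    "song1:00s": "Artist A",
    "song2:40s": "Artist A",
    "song3:10s": "Artist B",
    "song4:20s": "Artist C",
}
similar_to = {
    "song1:00s": ["song2:40s", "song3:10s", "song4:20s"],
}


def discover(sample_id, similar_to, artist_of):
    """Similar samples from artists other than the query sample's artist."""
    own_artist = artist_of[sample_id]
    return [n for n in similar_to.get(sample_id, []) if artist_of[n] != own_artist]


def reused_by_same_artist(sample_id, similar_to, artist_of):
    """Similar samples by the same artist, e.g. a pattern reused across songs."""
    own_artist = artist_of[sample_id]
    return [n for n in similar_to.get(sample_id, []) if artist_of[n] == own_artist]


print(discover("song1:00s", similar_to, artist_of))
print(reused_by_same_artist("song1:00s", similar_to, artist_of))
```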
  27. Future is wild: After enabling the discovery of music with language models and graphs, our goal is to train a generative model to create DJ sets from playlists.