Upgrade to Pro — share decks privately, control downloads, hide ads and more …

「LINE MUSIC」におけるハイブリッド検索や略称抽出を用いた曖昧検索への挑戦 / Cha...

「LINE MUSIC」におけるハイブリッド検索や略称抽出を用いた曖昧検索への挑戦 / Challenges in Ambiguous Search Using Hybrid Search and Abbreviation Extraction in "LINE MUSIC"

ミスタイプや略称などのクエリに対してもユーザの望む検索結果を届けたい。これは多くのサービスが抱える課題だと考えており、LINE MUSICもその一つでした。このセッションでは、この問題を解決するために機械学習技術を用いた曖昧検索の取り組みについてお話いたします。私達が実践した単語埋め込みモデルを活用したベクトル検索と生成AIを活用したタイアップの略称抽出の2つのアプローチについて、アーキテクチャ・実装上の工夫などの技術的解説と本番導入までのプロセスを取り上げる予定です。ぜひこのセッションをお聴きいただき、サービス開発に役立てていただければと思います。

More Decks by LINEヤフーTech (LY Corporation Tech)

Other Decks in Technology

Transcript

  1. About Speakers Tada Taku Music Business Group / LY Corporation

    Backend Engineer Music Business Group / LY Corporation Backend Engineer Mori Jumpei
  2. Ø Keyword Search Ø Issue Ø Traditional Method Ø Vector

    Search Ø Hybrid Search Ø Architecture Ø Reranking Ø Summary Hybrid Search
  3. Issue 7BSJPVTFYQSFTTJPOT :VLJOPIBOB ઇͷՖ 4OPX'MPXFS :VLJOPIBOB ઇͷ՚ :VLJOPIBOB Ώ͖ͷ͸ͳ :VLJOPIBOB

    ઇͷϋφ 4OPX'SPXFS εϊ΢ϑϥϫʔ 8IJUF'MPXFS *This is a fictitious song title and bears no relation to any existing or real-world music.
  4. Traditional Method 'V[[Z4FBSDI /PSNBMJ[BUJPO &YQBOEJOH%JDUJPOBSZ ÅˠB ͋ˠΞ G PXFS ˣ

    qPXFS GSPXFS < TOPXqPXFS XIJUFqPXFS TOPXqPVS ઇͷՖ ઇͷϋφ ʜ UPPNBOZ > MˠS GQPXFS
  5. How to ensure quality in this major change? )ZCSJE4FBSDI 

    ,FZXPSE4FBSDI 7FDUPS4FBSDI in this minor change
  6. Hybrid Search $POUFOU" $POUFOUT# $POUFOUT% $POUFOU# $POUFOU$ $POUFOU" $POUFOUT# $POUFOU$

    $POUFOUT% ,FZXPSE4FBSDI3FTVMU 7FDUPS4FBSDI3FTVMU )ZCSJE4FBSDI3FTVMU
  7. Architecture Search Server Text Embedding Server OpenSearch ,FZXPSE3FTVMU Keyword Search

    )ZCSJE3FTVMU Query Query Vector 7FDUPS3FTVMU Vector Search /POpOFUVOFE NVMUJMJOHVBMFTNBMM Reranking
  8. Reranking (Motivation) TOPXGSPXFS 4OPX'MPXFS 4OPX1PXEFS The order of vector search

    results ≠ search results wanted by users Similarity (snow frower, Snow Flower) < Similarity (snow frower, Snow Powder) But users may want “Snow Flower”
  9. Summary ⎯ Users input various expressions for a same content.

    ⎯ Vector Search is powerful method to resolve kind problem. ⎯ Hybrid Search can ensure minimum quality (= keyword search quality). ⎯ With rule based reranking, we can integrate non-fine-tuned ML model. ⎯ Our core logic is Longest Common Subsequence in reranking logic  )ZCSJE4FBSDI
  10. Problem: Search songs by tie-up aliases %3"(0/#"-- $)"-")&"%$)"-" An opening

    song of the anime DRAGON BALL. %# Not found One of aliases of DRAGON BALL
  11. First idea: Just using text generation AI *OTVGpDJFOUQFSGPSNBODFJOBMJBT HFOFSBUJPOʜ DRAGON

    BALL DB 4FBSDI &OHJOF Search by “DB” ① ② List the aliases for the anime DRAGON BALL. (FOFSBUFUJFVQBMJBTFTVTJOHUFYU HFOFSBUJPO"*
  12. Core idea &YUSBDUBMJBTFTGSPN8JLJQFEJBBSUJDMFVTJOHUFYU HFOFSBUJPO"* DRAGON BALL is Japanese manga. It

    was published in 1984. It is abbreviated as “DB”. Its Story is … "OFYBNQMFBSUJDMFGPS%3"(0/#"-- 5FYUHFOFSBUJPO"* * This article defers from the actual Wikipedia article.
  13.  $PMMFDUUJFVQJOGPSNBUJPOGSPN8JLJQFEJB &YUSBDUUJFVQBMJBTFTVTJOHUFYU HFOFSBUJPO"* *OEFYUJFVQBMJBTFTBOETFBSDICZUIFN Solution: Extract from external knowledge

    DRAGON BALL DB 4FBSDI &OHJOF Search by “DB” ① ② ③ Extract aliases for DRAGON BALL from following text. Text: DRAGON BALL is Japanese manga. It was published in 1984. It is abbreviated as “DB”. Its Story is …
  14. Prompt technique: Few-shot prompting Note that you must follow these:

    - Output empty if there are no tie-up aliases Example of text: ECHO-ECHO is Japanese anime. Story of Mount Fuji. Your answer: Output empty because there are no aliases. (JWFGFXTIPUFYBNQMFTJTNPSFFGGFDUJWFUIBOHJWJOHEFUBJMFEUBTL FYQMBOBUJPO * ECHO-ECHO is a fictional anime title.
  15. Prompt technique: Chain-of-thought Output your thought before answer. Example of

    text: ECHO-ECHO is Japanese anime. Story of Mount Fuji. Your answer: <thinking> Mount fuji is just object appearing in the story. </thinking> <answer></answer> -FU"*UPPVUQVUIJTIFSUIPVHIUCFGPSFBOTXFS * ECHO-ECHO is a fictional anime title.
  16. Result and summary ⎯ 1SFDJTJPO ⎯ 3FDBMM 5FYUHFOFSBUJPO"*XJUIFYUFSOBMLOPXMFEHFIBTUIFQPUFOUJBMUPUBDLMF OJDIFUBTLT ⎯

    'FXTIPUQSPNQUJOH ⎯ $IBJOPG5IPVHIU +VTUVTJOHTJNQMFQSPNQUUFDIOJRVFDBOJNQSPWFQFSGPSNBODF )JHI1SFDJTJPOBOE3FDBMMPODPMMFDUJPOPGUJFVQBMJBTFT
  17. Conclusion ⎯ By using a non-fine-tuned model and LLM, we

    were able to launch in a short period ⎯ The cancel rate decreased by 15% ⎯ Fine-tuning for a text embedding model ⎯ Other external knowledges Future work
  18. Thank you *The song titles, anime titles and related information

    mentioned in this document are used for the sake of explanation and are not associated with the respective copyright or trademark holders.