Utilizing BERT to make keyword-based search smart

Ching-Hsiang Tsai (Shawn Tsai)
LINE Taiwan Data Team Team Lead
https://linedevday.linecorp.com/jp/2019/sessions/S1-12

LINE DevDay 2019

November 20, 2019
Transcript

  1. 2019 DevDay Utilizing BERT To Make Keyword-Based Search Smart

    > Ching-Hsiang Tsai (Shawn Tsai)
    > LINE Taiwan Data Team Team Lead
  2. Search Result Relevance

    > The main goal is to reduce the semantic gap between user query and documents.
    > The key points: semantic features and ranking function.
    > Search is a ranking problem. The ordering is more important than the predicted probability of a single instance.
  3. Search Scoring & Limitation

    > Standard similarity function: TF-IDF
    > tf(t in d) = frequency
    > idf(t) = 1 + log([operands not recoverable from the extracted text])
    > _score = tf * idf
    > Example texts: "Sometimes, LINE doesn’t show notifications"; "Why messages don’t display"; "Chat history is still compressing"; "No matter how I click backup button, it doesn’t work"
    > Limitation: Different description
    > Limitation: No shared keywords
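The TF-IDF scoring on this slide, and the "no shared keywords" limitation, can be illustrated with a minimal sketch. The logarithm's operands are garbled in the transcript, so the form `idf(t) = 1 + log(N / (df + 1))` used here is an assumption borrowed from Lucene's classic similarity, not a confirmed reading of the slide. The tokenizer and the function names are hypothetical as well.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption); a real analyzer would also
    # stem words and drop stop words.
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]

def tfidf_score(query, doc, corpus):
    """Sum tf * idf over the query terms found in doc.

    Assumed idf form: idf(t) = 1 + log(N / (df + 1)).
    """
    doc_tf = Counter(tokenize(doc))
    n_docs = len(corpus)
    score = 0.0
    for term in set(tokenize(query)):
        tf = doc_tf[term]
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        df = sum(1 for d in corpus if term in tokenize(d))
        score += tf * (1.0 + math.log(n_docs / (df + 1)))
    return score

corpus = [
    "Why messages don't display",
    "No matter how I click backup button, it doesn't work",
]

# Related in meaning, but no shared keyword, so the score is zero.
no_overlap = tfidf_score("Chat history is still compressing", corpus[1], corpus)
# Shares "backup" and "button" with the document, so the score is positive.
overlap = tfidf_score("backup button", corpus[1], corpus)
print(no_overlap, overlap)
```

A purely keyword-based score cannot rank the backup document for the "chat history" query at all, which is the gap the embedding-based approach in the following slides is meant to close.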
  4. Word Embedding

    > Vector representation
    > Capturing context of a word in a document, semantic/syntactic similarity, relation with other words
    > Source: Efficient Estimation of Word Representations in Vector Space
  5. BERT

    > Bidirectional Encoder Representations from Transformers
    > BERT is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
    > Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  6. Querying By Vector Representation

    > Sentence encoding by pre-trained BERT model
    > Offline: Documents -> Document Vecs -> Build NN Index
    > Online: Query -> Query Vec -> Nearest Neighbor Search against the Document Vecs Index
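The offline/online split on this slide can be sketched as below. This is a toy illustration under stated assumptions: the vectors are hand-made stand-ins rather than BERT output, `BruteForceIndex` is a hypothetical class that does exhaustive cosine search, and a production system would more likely use an approximate nearest-neighbor index.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class BruteForceIndex:
    """Exhaustive cosine search; a stand-in for a real NN index (assumption)."""

    def __init__(self):
        self.ids = []
        self.vecs = []

    def add(self, doc_id, vec):
        self.ids.append(doc_id)
        self.vecs.append(vec)

    def search(self, query_vec, k=1):
        # Score every stored vector, then keep the k most similar.
        scored = [(doc_id, cosine(query_vec, vec))
                  for doc_id, vec in zip(self.ids, self.vecs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

# Offline: encode documents and build the index.
# Hand-made toy vectors; in the talk these would come from a BERT encoder.
doc_vecs = {
    "notifications": [0.9, 0.1, 0.0],
    "backup": [0.1, 0.9, 0.2],
}
index = BruteForceIndex()
for doc_id, vec in doc_vecs.items():
    index.add(doc_id, vec)

# Online: encode the query (again a toy vector) and search the index.
query_vec = [0.2, 0.8, 0.1]
top = index.search(query_vec, k=1)
print(top[0][0])  # backup
```

Because matching happens in vector space, a query can retrieve a document with which it shares no keywords, provided the encoder places them close together.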
  7. Learning To Rank

    > Applying machine learning to construct ranking models for information retrieval systems
    > Caring more about ranking rather than rating prediction
    > Scoring by machine learning
      • Creating document index by Elasticsearch
      • Using embeddings to train ranking models
      • Serving search queries by Elasticsearch with ranking models
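To make "ranking rather than rating prediction" concrete, here is a minimal pairwise sketch: the model only learns to score the better document of a pair above the worse one, and never predicts an absolute rating. The talk does not say which ranking model was used, so the pairwise perceptron below is a swapped-in illustration. The two features (`keyword_score`, `embedding_similarity`) and all values are hypothetical.

```python
def score(weights, features):
    """Linear scoring function: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """Pairwise perceptron.

    pairs: list of (better_features, worse_features) for the same query.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            if score(weights, better) - score(weights, worse) <= 0:
                # Mis-ordered pair: shift weights toward the better document.
                weights = [w + lr * (b - x)
                           for w, b, x in zip(weights, better, worse)]
    return weights

# Hypothetical features per document: [keyword_score, embedding_similarity]
pairs = [
    ([0.2, 0.9], [0.8, 0.1]),
    ([0.5, 0.8], [0.6, 0.3]),
]
weights = train_pairwise(pairs, n_features=2)

candidates = {"doc_a": [0.7, 0.2], "doc_b": [0.3, 0.9]}
ranked = sorted(candidates,
                key=lambda d: score(weights, candidates[d]),
                reverse=True)
print(ranked)  # ['doc_b', 'doc_a']
```

Only the order of the scores matters here; their absolute values carry no meaning, which matches the slide's emphasis on ranking over rating prediction.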
  8. Search Architecture

    > Diagram labels: Filters, Documents, Query, Filter Index, ES + Re-ranking, BERT, Matches, Ranked Results, NER, …, Scoring Index, Ranking Models
  9. Search Workflow With Learning To Rank

    > Diagram labels: User’s Needs, Measure Relevance, Pre-process, Inverted-index, Features Selection, Ranking Models, Scoring Function, NDCG, Precision@k, MAP, Deploy, Monitoring, Feedback, Evaluation, Build Index, Learning To Rank, Serve, Data
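The three evaluation metrics named in this workflow (NDCG, Precision@k, MAP) can be sketched as follows. The relevance labels are made up for illustration. The DCG gain `2^rel - 1` is one common convention and is an assumption here. `average_precision` divides by the number of relevant results in the list, which assumes the list contains every relevant document for the query.

```python
import math

def precision_at_k(rels, k):
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k

def average_precision(rels):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def mean_average_precision(rels_per_query):
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r) for r in rels_per_query) / len(rels_per_query)

def dcg(rels, k):
    """Discounted cumulative gain with gain 2^rel - 1 (assumed convention)."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:k], start=1))

def ndcg(rels, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal else 0.0

# Graded relevance labels of one query's results, in ranked order (made up).
rels = [3, 0, 2, 1]
print(precision_at_k(rels, 2))  # 0.5
print(average_precision(rels))
print(ndcg(rels, 4))
```

NDCG uses graded labels and discounts by position, so it penalizes placing a highly relevant document low in the list. Precision@k and MAP treat relevance as binary.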
  10. More Consideration

    > Good judgment lists that match users' expectations of search quality
    > Good metrics for measuring search results
    > Incorporating embeddings into the scoring function
    > Keeping versions synchronized between the indexing and serving layers
    > A/B testing