Utilizing BERT to make keyword-based search smart

Ching-Hsiang Tsai (Shawn Tsai)
LINE Taiwan Data Team Team Lead
https://linedevday.linecorp.com/jp/2019/sessions/S1-12

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay Utilizing BERT To Make Keyword-Based Search Smart

    > Ching-Hsiang Tsai (Shawn Tsai) > LINE Taiwan Data Team Team Lead
  2. Agenda

    > Search Everywhere
    > Search Result Relevance
    > Embeddings
    > Learning To Rank
    > Search Workflow
  3. Search Everywhere

    > Life on LINE

  4. Search Result Relevance

    > The main goal is to reduce the semantic gap between user query and documents.
    > The key points: semantic features and ranking function.
    > Search is a ranking problem. The ordering is more important than the predicted probability of a single instance.
  5. Search Scoring & Limitation

    > Standard similarity function: TF-IDF
    > $\mathrm{tf}(t \text{ in } d) = \sqrt{\mathrm{frequency}}$
    > $\mathrm{idf}(t) = 1 + \log\frac{\mathrm{numDocs} + 1}{\mathrm{docFreq} + 1}$
    > $\mathrm{score} = \mathrm{tf} \cdot \mathrm{idf}$
    > Sometimes, LINE doesn’t show notifications
    > Why messages don’t display
    > Chat history is still compressing
    > No matter how I click backup button, it doesn’t work
    > Limitation: Different description
    > Limitation: No shared keywords
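A minimal sketch of TF-IDF scoring and its keyword limitation. It assumes the Lucene-style form (tf = sqrt(frequency), idf = 1 + log((numDocs + 1) / (docFreq + 1))), plain whitespace tokenization, and a score summed over the query terms. The helper names and the three-document corpus are illustrative and not taken from the talk.

```python
import math

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real analyzer does more.
    return text.lower().split()

def tf(term, doc_tokens):
    # tf(t in d) = sqrt(frequency)
    return math.sqrt(doc_tokens.count(term))

def idf(term, corpus_tokens):
    # idf(t) = 1 + log((numDocs + 1) / (docFreq + 1))
    num_docs = len(corpus_tokens)
    doc_freq = sum(1 for tokens in corpus_tokens if term in tokens)
    return 1.0 + math.log((num_docs + 1) / (doc_freq + 1))

def score(query, doc_tokens, corpus_tokens):
    # Assumption: the document score is the sum of tf * idf over query terms.
    return sum(tf(t, doc_tokens) * idf(t, corpus_tokens)
               for t in set(tokenize(query)))

corpus = [
    "Chat history is still compressing",
    "No matter how I click backup button, it doesn't work",
    "Why messages don't display",
]
corpus_tokens = [tokenize(doc) for doc in corpus]

query = "Chat history is still compressing"
scores = [score(query, tokens, corpus_tokens) for tokens in corpus_tokens]
for doc, s in zip(corpus, scores):
    print(f"{s:.3f}  {doc}")
```

Only the document that shares words with the query gets a nonzero score. The other two score exactly zero because they share no keyword with it, whatever they mean.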
  6. Embeddings

  7. Word Embedding

    > Vector representation
    > Capturing context of a word in a document, semantic/syntactic similarity, relation with other words
    > Source: Efficient Estimation of Word Representations in Vector Space
  8. BERT

    > Bidirectional Encoder Representations from Transformers
    > BERT is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
    > Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  9. Querying By Vector Representation

    > Sentence encoding by pre-trained BERT model
    > Offline: Documents -> Document Vecs -> Build NN Index -> Document Vecs Index
    > Online: Query -> Query Vec -> Nearest Neighbor Search
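The offline and online halves of this slide can be sketched as follows. This is only an illustration under stated assumptions: `encode` is a hypothetical stand-in for a pre-trained BERT sentence encoder (a hashed bag-of-words vector with no semantic knowledge), and the nearest neighbor search is brute-force cosine similarity rather than a real NN index.

```python
import math
import zlib

DIM = 64  # hypothetical embedding size, chosen only for this example

def encode(sentence):
    # Stand-in for a pre-trained BERT sentence encoder (assumption):
    # hashed bag-of-words, L2-normalized. It keeps the example runnable
    # but does not capture meaning the way BERT embeddings would.
    vec = [0.0] * DIM
    for token in sentence.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def build_index(documents):
    # Offline: encode every document once and keep the vectors.
    return [(doc, encode(doc)) for doc in documents]

def search(index, query, k=2):
    # Online: encode the query, then rank documents by cosine similarity
    # (a plain dot product, since all vectors are unit length).
    q = encode(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

documents = [
    "Sometimes, LINE doesn't show notifications",
    "Chat history is still compressing",
    "Why messages don't display",
]
index = build_index(documents)
results = search(index, "Why messages don't display", k=1)
print(results)
```

With a real sentence encoder in place of `encode`, the same two functions would return documents that are close in meaning even when they share no words with the query.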
  10. Learning To Rank

  11. Learning To Rank

    > Applying machine learning to construct ranking models for information retrieval systems
    > Caring more about ranking rather than rating prediction
    > Scoring by machine learning
      • Creating document index by Elasticsearch
      • Using embeddings to train ranking models
      • Serving search queries by Elasticsearch with ranking models
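To make "ranking rather than rating prediction" concrete, here is a small pairwise sketch. It is not the model used in the talk: it swaps in a linear scorer trained with a pairwise logistic loss, and the judgment list, the two features (keyword score, embedding similarity), and all values are invented for illustration.

```python
import math

# Hypothetical judgment list for one query: (features, relevance grade).
# features = [keyword score, embedding similarity]; values are made up.
judgments = [
    ([0.2, 0.9], 2),
    ([0.8, 0.3], 1),
    ([0.7, 0.1], 0),
    ([0.1, 0.2], 0),
]

def predict(weights, features):
    # Linear scoring function: higher means "rank earlier".
    return sum(w * x for w, x in zip(weights, features))

def train_pairwise(judgments, epochs=200, lr=0.1):
    # For every pair where doc i is more relevant than doc j, minimize
    # log(1 + exp(-(score_i - score_j))) so that score_i ends up higher.
    weights = [0.0] * len(judgments[0][0])
    pairs = [(fi, fj) for fi, ri in judgments for fj, rj in judgments if ri > rj]
    for _ in range(epochs):
        for fi, fj in pairs:
            diff = [a - b for a, b in zip(fi, fj)]
            margin = predict(weights, diff)
            grad_scale = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
            weights = [w - lr * grad_scale * d for w, d in zip(weights, diff)]
    return weights

weights = train_pairwise(judgments)
ranked = sorted(judgments, key=lambda j: predict(weights, j[0]), reverse=True)
ranked_grades = [grade for _, grade in ranked]
print(weights, ranked_grades)
```

The loss only looks at score differences between documents, so the absolute score values carry no meaning; only the resulting order does.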
  12. Search Architecture

    > Diagram labels: Filters, Documents, Query, Filter, Index, ES + Re-ranking, BERT, Matches, Ranked Results, NER, …, Scoring, Index, Ranking Models
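One way to picture the re-ranking stage is a second pass over the first-stage matches that blends the keyword score with an embedding similarity. The sketch below assumes exactly that; the field names (`keyword_score`, `vector`), the `alpha` weight, and the sample hits are hypothetical, and no Elasticsearch API is called.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rerank(query_vec, matches, alpha=0.5):
    # matches: first-stage hits, each with a keyword score and a
    # precomputed document vector (hypothetical field names).
    if not matches:
        return []
    max_kw = max(m["keyword_score"] for m in matches) or 1.0
    rescored = []
    for m in matches:
        kw = m["keyword_score"] / max_kw      # scale keyword score to [0, 1]
        sem = cosine(query_vec, m["vector"])  # semantic similarity
        rescored.append((alpha * kw + (1 - alpha) * sem, m["id"]))
    return sorted(rescored, reverse=True)

query_vec = [1.0, 0.0]
matches = [
    {"id": "doc-a", "keyword_score": 4.0, "vector": [0.0, 1.0]},
    {"id": "doc-b", "keyword_score": 3.0, "vector": [1.0, 0.0]},
    {"id": "doc-c", "keyword_score": 1.0, "vector": [0.6, 0.8]},
]
ranked_ids = [doc_id for _, doc_id in rerank(query_vec, matches)]
print(ranked_ids)
```

Here `doc-a` has the best keyword score but `doc-b` moves to the top once the vector similarity is taken into account.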
  13. Custom Scoring Function

  14. Search Workflow

    > Diagram labels: User’s Needs, Measure Relevance, Pre-process, Inverted-index, Deploy, Monitoring, Feedback, Evaluation, Serve, Data, Build Index
  15. Search Workflow With Learning To Rank

    > Diagram labels: User’s Needs, Measure Relevance, Pre-process, Inverted-index, Features Selection, Ranking Models, Scoring Function, NDCG, Precision@k, MAP, Deploy, Monitoring, Feedback, Evaluation, Build Index, Learning To Rank, Serve, Data
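The three evaluation metrics named on this slide can be computed from a ranked list of relevance grades. The sketch below uses common textbook definitions, which may differ in detail from what the team used: grades above 0 count as relevant, DCG uses the (2^grade - 1) gain, and average precision assumes every relevant document appears in the list.

```python
import math

def precision_at_k(grades, k):
    # Fraction of the top-k results that are relevant (grade > 0).
    return sum(1 for g in grades[:k] if g > 0) / k

def average_precision(grades):
    # Mean of precision@i over the ranks i that hold a relevant result.
    hits, total = 0, 0.0
    for i, g in enumerate(grades, start=1):
        if g > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def mean_average_precision(grades_per_query):
    # MAP: average precision averaged over queries.
    return sum(average_precision(g) for g in grades_per_query) / len(grades_per_query)

def dcg(grades, k):
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(grades[:k], start=1))

def ndcg(grades, k):
    # DCG normalized by the DCG of the ideal (sorted) ordering.
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal else 0.0

ranked_grades = [2, 0, 1, 0]  # hypothetical grades in result order
p_at_2 = precision_at_k(ranked_grades, 2)
ap = average_precision(ranked_grades)
map_score = mean_average_precision([ranked_grades, [0, 1]])
ndcg_at_4 = ndcg(ranked_grades, 4)
print(p_at_2, ap, map_score, ndcg_at_4)
```

All three reward placing relevant results early, which is why they suit a ranking problem better than per-document accuracy.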
  16. More Considerations

    > Good judgment lists that match users' needs for search quality
    > Good metrics for measuring search results
    > Incorporating embeddings into the scoring function
    > Synchronizing versions between the indexing and serving layers
    > A/B testing
  17. Thank you