Utilizing BERT to make keyword-based search smart

Ching-Hsiang Tsai (Shawn Tsai)
LINE Taiwan Data Team Team Lead


LINE DevDay 2019

November 20, 2019


  1. 2019 DevDay Utilizing BERT To Make Keyword-Based Search Smart

    > Ching-Hsiang Tsai (Shawn Tsai)
    > LINE Taiwan Data Team Team Lead
  2. Agenda

    > Search Everywhere
    > Search Result Relevance
    > Embeddings
    > Learning To Rank
    > Search Workflow
  3. Search Everywhere Life on LINE

  4. Search Result Relevance

    > The main goal is to reduce the semantic gap between the user query and the documents.
    > The key points: semantic features and the ranking function.
    > Search is a ranking problem. The ordering is more important than the predicted probability of a single instance.
  5. Search Scoring & Limitation

    > Standard similarity function: TF-IDF
    > $\mathrm{tf}(t \text{ in } d) = \mathrm{frequency}$
    > $\mathrm{idf}(t) = 1 + \log\frac{\mathrm{numDocs} + 1}{\mathrm{docFreq} + 1}$
    > $\mathrm{\_score} = \mathrm{tf} \ast \mathrm{idf}$
    > Limitation: Different description
    > Sometimes, LINE doesn’t show notifications
    > Why messages don’t display
    > Limitation: No shared keywords
    > Chat history is still compressing
    > No matter how I click backup button, it doesn’t work
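The "no shared keywords" limitation can be made concrete with a small sketch. It assumes raw term frequency for tf and the smoothed form idf(t) = 1 + log((numDocs + 1) / (docFreq + 1)); the tokenizer and the function names `tokenize`, `idf`, and `score` are illustrative, not taken from the talk or from any search engine's API.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase tokenizer (assumption); real analyzers do much more.
    return text.lower().replace(",", " ").split()

def idf(term, docs_tokens):
    # Assumed smoothing: idf(t) = 1 + log((numDocs + 1) / (docFreq + 1)).
    num_docs = len(docs_tokens)
    doc_freq = sum(1 for tokens in docs_tokens if term in tokens)
    return 1.0 + math.log((num_docs + 1) / (doc_freq + 1))

def score(query, doc_tokens, docs_tokens):
    # Sum tf * idf over the distinct query terms that occur in the document.
    counts = Counter(doc_tokens)
    return sum(counts[t] * idf(t, docs_tokens)
               for t in set(tokenize(query)) if counts[t] > 0)

docs = [
    "Chat history is still compressing",
    "Sometimes, LINE doesn't show notifications",
]
docs_tokens = [tokenize(d) for d in docs]

query = "No matter how I click backup button, it doesn't work"
scores = [score(query, tokens, docs_tokens) for tokens in docs_tokens]
print(scores)
```

In this toy corpus the backup-related document shares no token with the query and scores 0, while the notification document scores above 0 only because both texts contain "doesn't".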
  6. Embeddings

  7. Word Embedding

    > Vector representation
    > Capturing context of a word in a document, semantic/syntactic similarity, relation with other words
    > Source: Efficient Estimation of Word Representations in Vector Space
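One way to picture "relation with other words" is the vector-offset analogy popularized by the cited paper. The sketch below uses hand-made 3-dimensional toy vectors (hypothetical values, not trained embeddings), and the helper names `cosine` and `analogy` are illustrative only.

```python
import math

# Hand-made toy vectors (assumption): real word embeddings are learned
# from a corpus and usually have hundreds of dimensions.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    # "a is to b as c is to ?": find the word closest to b - a + c,
    # excluding the three input words.
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("man", "king", "woman")
print(result)
```

With these toy values the nearest remaining word to king - man + woman is "queen"; trained embeddings show the same kind of regularity only approximately.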
  8. BERT

    > Bidirectional Encoder Representations from Transformers
    > BERT is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
    > Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  9. Querying By Vector Representation

    > Sentence encoding by a pre-trained BERT model
    > Offline: Documents, Document Vecs, Build NN Index
    > Online: Query, Query Vec, Nearest Neighbor Search against the Document Vecs Index
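The offline/online split on this slide can be sketched as follows. The `encode` function is only a stand-in for a pre-trained BERT sentence encoder: it looks up hand-made toy vectors (hypothetical values), and the brute-force scan stands in for whatever nearest-neighbor index is actually built. `build_index` and `search` are illustrative names.

```python
import math

# Stand-in for a BERT sentence encoder (assumption): hand-made toy vectors.
TOY_VECTORS = {
    "Chat history is still compressing": [0.9, 0.1, 0.2],
    "Sometimes, LINE doesn't show notifications": [0.1, 0.9, 0.1],
    "No matter how I click backup button, it doesn't work": [0.8, 0.2, 0.3],
}

def encode(sentence):
    # Return a unit-length vector so that dot product equals cosine similarity.
    vec = TOY_VECTORS[sentence]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def build_index(documents):
    # Offline step: encode every document once and keep the vectors.
    return [(doc, encode(doc)) for doc in documents]

def search(index, query, k=1):
    # Online step: encode the query, then rank documents by cosine
    # similarity (exact brute-force scan over the whole index).
    q = encode(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    return sorted(scored, reverse=True)[:k]

index = build_index([
    "Chat history is still compressing",
    "Sometimes, LINE doesn't show notifications",
])
top = search(index, "No matter how I click backup button, it doesn't work", k=1)
print(top)
```

Because the toy vectors were chosen to place the two backup-related sentences close together, the query retrieves the chat-history document even though the texts share no keywords. At larger scale an approximate nearest-neighbor index would typically replace the linear scan.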
  10. Learning To Rank

  11. Learning To Rank

    > Applying machine learning to construct ranking models for information retrieval systems
    > Caring more about ranking rather than rating prediction
    > Scoring by machine learning
      • Creating document index by Elasticsearch
      • Using embeddings to train ranking models
      • Serving search queries by Elasticsearch with ranking models
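"Caring more about ranking rather than rating prediction" is commonly expressed as a pairwise objective: the model only has to score the better document above the worse one. Below is a minimal sketch of a linear pairwise ranker trained with a logistic loss. The two features (a keyword score and an embedding similarity), the toy judgment pairs, and the hyperparameters are all assumptions for illustration, not the models used in the talk.

```python
import math

def score(weights, features):
    # Linear scoring function: dot product of weights and features.
    return sum(w * x for w, x in zip(weights, features))

def train_pairwise(pairs, dims, lr=0.5, epochs=200):
    # pairs: (better_features, worse_features) taken from a judgment list.
    weights = [0.0] * dims
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - w for b, w in zip(better, worse)]
            margin = score(weights, diff)
            # Loss is log(1 + exp(-margin)); its gradient w.r.t. the weights
            # is -sigmoid(-margin) * diff, so we step along +g * diff.
            g = 1.0 / (1.0 + math.exp(margin))
            weights = [w + lr * g * d for w, d in zip(weights, diff)]
    return weights

# Toy features per document: [keyword_score, embedding_similarity].
pairs = [
    ([0.0, 0.98], [1.4, 0.36]),  # relevant doc with no keyword overlap
    ([2.0, 0.90], [2.5, 0.30]),
    ([1.0, 0.80], [0.2, 0.20]),
]
weights = train_pairwise(pairs, dims=2)
print(weights)
```

On this toy data the learned weight on embedding similarity ends up positive and every preferred document outranks its counterpart; the absolute score values carry no meaning, only their order does.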
  12. Search Architecture

    > Diagram labels: Filters, Documents, Query, Filter, Index, ES + Re-ranking, BERT, Matches, Ranked Results, NER, …, Scoring, Index, Ranking Models
  13. Custom Scoring Function

  14. Search Workflow

    > Diagram labels: User’s Needs, Measure Relevance, Pre-process, Inverted-index, Deploy, Monitoring, Feedback, Evaluation, Serve, Data, Build Index
  15. Search Workflow With Learning To Rank

    > Diagram labels: User’s Needs, Measure Relevance, Pre-process, Inverted-index, Features Selection, Ranking Models, Scoring Function, Deploy, Monitoring, Feedback, Evaluation, Build Index, Learning To Rank, Serve, Data
    > Evaluation metrics: NDCG, Precision@k, MAP
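The three evaluation metrics named on this slide can be sketched in a few lines. This version assumes relevance labels listed in ranked order, treats any label above 0 as relevant, uses the linear-gain form of DCG (another common variant uses 2^rel - 1), and normalizes average precision by the relevant documents present in the list. Function names are illustrative.

```python
import math

def precision_at_k(relevances, k):
    # Fraction of the top-k results that are relevant (label > 0).
    return sum(1 for r in relevances[:k] if r > 0) / k

def average_precision(relevances):
    # Mean of precision values taken at each relevant position.
    hits, total = 0, 0.0
    for i, r in enumerate(relevances, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    # MAP: average precision averaged over all queries.
    return sum(average_precision(r) for r in rankings) / len(rankings)

def dcg_at_k(relevances, k):
    # Discounted cumulative gain with linear gain and log2 discount.
    return sum(r / math.log2(i + 1)
               for i, r in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    # DCG divided by the DCG of the ideal (descending) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

ranking = [1, 0, 1]  # relevance labels of the top three results for one query
print(precision_at_k(ranking, 2), average_precision(ranking), ndcg_at_k(ranking, 3))
```

NDCG and MAP both reward placing relevant documents earlier, which matches the point that ordering matters more than any single predicted score.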
  16. More Considerations

    > Good judgment lists that match users' needs for search quality
    > Good metrics for measuring search results
    > Incorporating embeddings into the scoring function
    > Synchronizing versions between the indexing and serving layers
    > A/B testing
  17. Thank you