
Utilizing BERT to make keyword-based search smart


Ching-Hsiang Tsai (Shawn Tsai)
Team Lead, LINE Taiwan Data Team
https://linedevday.linecorp.com/jp/2019/sessions/S1-12

LINE DevDay 2019

November 20, 2019

Transcript

  1. 2019 DevDay: Utilizing BERT to Make Keyword-Based Search Smart

    > Ching-Hsiang Tsai (Shawn Tsai), Team Lead, LINE Taiwan Data Team
  2. Search Result Relevance

    > The main goal is to reduce the semantic gap between user queries and documents.
    > The key points: semantic features and the ranking function.
    > Search is a ranking problem: the ordering of the results matters more than the predicted probability for any single instance.
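
A small illustration of why ordering matters more than per-instance probabilities, using hypothetical scores and relevance labels for a single query: a monotonic rescaling of a model's scores leaves a rank-based metric such as Precision@k unchanged, while a model whose individual predictions differ can swap two documents and lose precision.

```python
import math

def rank(scores):
    """Order document ids by descending score, breaking ties by id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

relevant = {"d1", "d2"}  # hypothetical relevance labels for one query

# Hypothetical predicted probabilities of relevance.
model_a = {"d1": 0.91, "d2": 0.55, "d3": 0.12}
# Model B squashes every probability toward 0.5 with a monotonic function:
# each individual prediction changes, but the ordering does not.
model_b = {doc: 1 / (1 + math.exp(-0.2 * p)) for doc, p in model_a.items()}
# Model C ranks the irrelevant d3 just above the relevant d2.
model_c = {"d1": 0.95, "d2": 0.30, "d3": 0.35}

for name, scores in [("A", model_a), ("B", model_b), ("C", model_c)]:
    ranked = rank(scores)
    print(name, ranked, precision_at_k(ranked, relevant, k=2))
```

Models A and B yield the same ranking and the same Precision@2 (1.0) although their probabilities differ; model C drops to 0.5.
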
  3. Search Scoring & Limitation

    > Standard similarity function: TF-IDF
      $\mathrm{tf}(t \text{ in } d) = \sqrt{\mathrm{frequency}}$
      $\mathrm{idf}(t) = 1 + \log\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq} + 1}\right)$
      $\mathrm{Score} = \mathrm{tf} \cdot \mathrm{idf}$
    > Example texts:
      Sometimes, LINE doesn’t show notifications
      Why messages don’t display
      Chat history is still compressing
      No matter how I click backup button, it doesn’t work
    > Limitation: different descriptions of the same problem
    > Limitation: no shared keywords
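
The TF-IDF scoring on this slide can be sketched as follows, assuming Lucene's classic formulas tf = √frequency and idf = 1 + ln(numDocs / (docFreq + 1)). Lucene's full classic score also applies boosts and normalization factors, which this sketch omits, and the naive lowercase tokenizer is an assumption. Using three of the slide's example sentences as a tiny corpus shows the limitation: a query with no shared keywords scores zero against every document.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately naive analyzer)."""
    return re.findall(r"[a-z']+", text.lower())

def idf(term, docs):
    """idf(t) = 1 + ln(numDocs / (docFreq + 1))."""
    doc_freq = sum(term in doc for doc in docs)
    return 1 + math.log(len(docs) / (doc_freq + 1))

def tfidf_score(query, doc, docs):
    """Sum of tf * idf over query terms, with tf = sqrt(term frequency in doc)."""
    counts = Counter(doc)
    return sum(math.sqrt(counts[t]) * idf(t, docs) for t in tokenize(query))

texts = [
    "Sometimes, LINE doesn't show notifications",
    "Chat history is still compressing",
    "No matter how I click backup button, it doesn't work",
]
docs = [tokenize(t) for t in texts]

# No shared keywords: every document scores 0.
print([tfidf_score("Why messages don't display", d, docs) for d in docs])
# Shared keywords: the third document matches both query terms.
print([tfidf_score("doesn't work", d, docs) for d in docs])
```
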
  4. Word Embedding

    > Vector representation
    > Capturing the context of a word in a document, semantic/syntactic similarity, and relations with other words
    Source: Efficient Estimation of Word Representations in Vector Space
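
To make "vector representation" concrete: each word becomes a vector, and semantic similarity becomes cosine similarity between vectors. The 3-dimensional vectors below are made-up values for illustration only; models such as word2vec learn vectors with hundreds of dimensions from large corpora.

```python
import math

# Hypothetical 3-dimensional word vectors (real embeddings are learned from
# large corpora and typically have hundreds of dimensions).
embeddings = {
    "notification": [0.9, 0.1, 0.0],
    "alert": [0.8, 0.2, 0.1],
    "backup": [0.0, 0.9, 0.4],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(embeddings["notification"], embeddings["alert"]), 3))
print(round(cosine(embeddings["notification"], embeddings["backup"]), 3))
```

With these toy values, "notification" is close to "alert" (about 0.98) and far from "backup" (about 0.10), even though the words share no characters.
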
  5. BERT (Bidirectional Encoder Representations from Transformers)

    > BERT is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
    Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  6. Querying by Vector Representation

    > Sentence encoding by a pre-trained BERT model
    > Offline: Documents → sentence encoding → document vectors → build nearest-neighbor (NN) index
    > Online: Query → sentence encoding → query vector → nearest-neighbor search over the document-vector index
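
A minimal sketch of the offline/online split on this slide, under two assumptions: `toy_encode` is a hypothetical stand-in for a pre-trained BERT sentence encoder (it just maps a few hand-picked keywords to shared "concept" dimensions), and `BruteForceIndex` does exact cosine search, where a production system would more likely use an approximate nearest-neighbor library.

```python
import math
import re
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

def normalize(v: Sequence[float]) -> Vector:
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

class BruteForceIndex:
    """Exact cosine nearest-neighbor index (stand-in for an ANN index)."""

    def __init__(self) -> None:
        self.ids: List[str] = []
        self.vectors: List[Vector] = []

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        self.ids.append(doc_id)
        self.vectors.append(normalize(vector))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        q = normalize(query)
        scored = [(doc_id, sum(a * b for a, b in zip(q, v)))
                  for doc_id, v in zip(self.ids, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

def build_index(docs: Dict[str, str], encode: Callable[[str], Vector]) -> BruteForceIndex:
    """Offline: encode every document and add its vector to the index."""
    index = BruteForceIndex()
    for doc_id, text in docs.items():
        index.add(doc_id, encode(text))
    return index

def search(index: BruteForceIndex, query: str,
           encode: Callable[[str], Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Online: encode the query with the same encoder, then search the index."""
    return index.search(encode(query), k)

# Hypothetical stand-in for a BERT sentence encoder: maps a few keywords to
# shared "concept" dimensions so that related words land near each other.
CONCEPTS = {"notifications": 0, "show": 0, "messages": 0, "display": 0,
            "history": 1, "compressing": 1, "backup": 1}

def toy_encode(text: str) -> Vector:
    vec = [0.0, 0.0, 0.1]  # small constant dimension avoids all-zero vectors
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec

docs = {
    "d1": "Sometimes, LINE doesn't show notifications",
    "d2": "Chat history is still compressing",
    "d3": "No matter how I click backup button, it doesn't work",
}
index = build_index(docs, toy_encode)
results = search(index, "Why messages don't display", toy_encode, k=2)
print(results)
```

The query shares no keywords with d1, yet d1 comes out on top because both texts map to nearby vectors; the key design point is that the same encoder must be used on both the offline and online paths.
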
  7. Learning to Rank

    > Applying machine learning to construct ranking models for information retrieval systems
    > Caring more about ranking than about rating prediction
    > Scoring by machine learning:
      • Creating the document index with Elasticsearch
      • Using embeddings to train ranking models
      • Serving search queries with Elasticsearch plus the ranking models
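
A sketch of pairwise learning to rank, which is one common LTR approach (the slide does not say which algorithm was used): a linear, RankNet-style model trained by gradient descent on a pairwise logistic loss. The two features (a keyword score and an embedding similarity) and the graded training data are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical judged results for one query: (features, relevance grade).
# Features: [keyword (TF-IDF) score, embedding cosine similarity].
training = [
    ([2.0, 0.90], 2),
    ([1.2, 0.60], 1),
    ([1.5, 0.20], 0),
    ([0.1, 0.30], 0),
]

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def train_pairwise(samples, epochs=500, lr=0.1):
    """Linear ranker trained on pairwise logistic loss (RankNet-style).

    For every pair with different grades, take a gradient step that pushes
    the more relevant document's score above the less relevant one's.
    """
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for (x_i, y_i), (x_j, y_j) in combinations(samples, 2):
            if y_i == y_j:
                continue  # no preference between equally relevant documents
            if y_i < y_j:
                x_i, x_j = x_j, x_i  # make x_i the more relevant document
            diff = [a - b for a, b in zip(x_i, x_j)]
            # Loss log(1 + exp(-w.diff)); its negative gradient is sigmoid(-w.diff) * diff.
            step = lr / (1 + math.exp(dot(w, diff)))
            w = [wk + step * dk for wk, dk in zip(w, diff)]
    return w

w = train_pairwise(training)
ranked = sorted(training, key=lambda sample: dot(w, sample[0]), reverse=True)
print([grade for _, grade in ranked])
```

The loss only looks at score differences within a pair, which matches the slide's point: the model is rewarded for getting the order right, not for predicting a calibrated rating.
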
  8. Search Architecture

    > [Diagram] Components: Documents, Index, Query, Filters (NER, …), ES + Re-ranking, BERT Matches, Scoring, Ranking Models, Ranked Results
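
One possible reading of this architecture, written as a sketch rather than the system's actual design: filter the document pool, collect candidates from keyword matching (standing in for Elasticsearch) and from embedding similarity (standing in for BERT matches), then re-rank the union. The `category` filter, the 2-dimensional vectors, the fourth document and the fixed 50/50 weights are all hypothetical; in practice a trained ranking model would supply the scoring.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Sequence, Set

@dataclass
class Doc:
    doc_id: str
    text: str
    category: str         # hypothetical metadata that the filter stage uses
    vector: List[float]   # precomputed sentence embedding (e.g. from BERT)

def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def search(query: str, query_vec: Sequence[float], category: str, docs: List[Doc],
           k: int = 3, w_kw: float = 0.5, w_sem: float = 0.5) -> List[str]:
    # 1. Filter: restrict the pool (the category could come from NER on the query).
    pool = [d for d in docs if d.category == category]
    q_tokens = tokens(query)
    keyword = {d.doc_id: len(q_tokens & tokens(d.text)) / max(len(q_tokens), 1) for d in pool}
    semantic = {d.doc_id: cosine(query_vec, d.vector) for d in pool}
    # 2. Candidates: keyword matches (ES stand-in) plus top-k embedding matches.
    keyword_hits = {doc_id for doc_id, s in keyword.items() if s > 0}
    semantic_hits = set(sorted(semantic, key=semantic.get, reverse=True)[:k])
    candidates = keyword_hits | semantic_hits
    # 3. Re-rank candidates with a hypothetical linear scoring function.
    final = {i: w_kw * keyword[i] + w_sem * semantic[i] for i in candidates}
    return sorted(candidates, key=lambda i: (-final[i], i))[:k]

docs = [
    Doc("d1", "Sometimes, LINE doesn't show notifications", "notifications", [0.9, 0.1]),
    Doc("d2", "Chat history is still compressing", "backup", [0.1, 0.9]),
    Doc("d3", "No matter how I click backup button, it doesn't work", "backup", [0.2, 0.8]),
    Doc("d4", "Notification sound settings", "notifications", [0.7, 0.3]),
]
print(search("Why messages don't display", [0.85, 0.15], "notifications", docs))
print(search("backup button doesn't work", [0.2, 0.8], "backup", docs))
```

The first query has no keyword hits, so its results come entirely from the embedding side; the second gets both a keyword match and a semantic match for d3.
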
  9. Search Workflow with Learning to Rank

    > User's needs → measure relevance
    > Data: pre-process
    > Build Index: inverted index
    > Learning to Rank: feature selection, ranking models, scoring function
    > Evaluation: NDCG, Precision@k, MAP
    > Serve: deploy, monitoring, feedback
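
The evaluation metrics named in this workflow can be computed as follows. This sketch assumes common definitions: NDCG with gain 2^rel - 1 and a log2(rank + 1) discount (other gain variants exist), with the ideal ordering taken over the returned results only (a simplification; a full IDCG uses all judged documents); binary relevance (grade > 0) for Precision@k and average precision; and MAP as the mean of per-query average precision. The judgments are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k results with a relevance grade above 0."""
    return sum(1 for r in rels[:k] if r > 0) / k

def average_precision(rels: Sequence[int], num_relevant: int) -> float:
    """Sum of precision@i over ranks i holding a relevant result,
    divided by the total number of relevant documents for the query."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / num_relevant if num_relevant else 0.0

def mean_average_precision(runs: List[Tuple[Sequence[int], int]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(rels, n) for rels, n in runs) / len(runs)

def dcg_at_k(rels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with gain 2^rel - 1 and discount log2(rank + 1)."""
    return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))

def ndcg_at_k(rels: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering of the same results."""
    idcg = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / idcg if idcg else 0.0

# Hypothetical graded judgments (0 = irrelevant) of two queries' ranked results.
query_1 = [2, 0, 1, 0]
query_2 = [0, 1]
print(precision_at_k(query_1, 2))
print(ndcg_at_k(query_1, 4))
print(mean_average_precision([(query_1, 2), (query_2, 1)]))
```

For query_1 this gives Precision@2 = 0.5 and NDCG@4 ≈ 0.96; MAP over both queries is (5/6 + 1/2) / 2 ≈ 0.67.
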
  10. More Considerations

    > Good judgment lists that reflect users' search-quality needs
    > Good metrics for measuring search results
    > Incorporating embeddings into the scoring function
    > Synchronizing versions between the indexing and serving layers
    > A/B testing
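
Version synchronization matters because query vectors and document vectors are only comparable when the same encoder produced both. A minimal guard, with hypothetical names and version strings throughout: record the encoder version alongside the index and refuse to deploy a searcher whose query-side encoder differs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

class VersionMismatchError(RuntimeError):
    """Raised when query and document vectors come from different encoders."""

@dataclass
class VectorIndex:
    encoder_version: str  # encoder that produced the stored document vectors
    vectors: Dict[str, List[float]] = field(default_factory=dict)

@dataclass
class Searcher:
    encoder_version: str  # encoder used for queries at serving time
    index: VectorIndex

    def check_compatible(self) -> None:
        if self.encoder_version != self.index.encoder_version:
            raise VersionMismatchError(
                f"query encoder {self.encoder_version!r} does not match "
                f"index encoder {self.index.encoder_version!r}")

def deploy(new_index: VectorIndex, query_encoder_version: str) -> Searcher:
    """Build a searcher and validate it before it receives traffic."""
    searcher = Searcher(query_encoder_version, new_index)
    searcher.check_compatible()
    return searcher

ok = deploy(VectorIndex("bert-v1"), "bert-v1")
try:
    deploy(VectorIndex("bert-v1"), "bert-v2")  # index not rebuilt after an encoder upgrade
except VersionMismatchError as err:
    print("refused:", err)
```

Checking at deploy time, rather than per query, keeps the serving path cheap while still catching an index that was not rebuilt after a model upgrade.
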