Utilizing BERT to make keyword-based search smart

2019 DevDay
Utilizing BERT To Make Keyword-Based Search Smart
> Ching-Hsiang Tsai (Shawn Tsai)
> LINE Taiwan Data Team Team Lead

Agenda
> Search Everywhere
> Search Result Relevance
> Embeddings
> Learning To Rank
> Search Workflow

Search Everywhere
> Life on LINE

Search Result Relevance
> The main goal is to reduce the semantic gap between user queries and documents.
> The key points: semantic features and the ranking function.
> Search is a ranking problem. The ordering matters more than the predicted probability of any single instance.

Search Scoring & Limitation
Standard similarity function: TF-IDF
> $\mathrm{tf}(t \text{ in } d) = \sqrt{\text{frequency}}$
> $\mathrm{idf}(t) = 1 + \log\left(\frac{\text{numDocs}}{\text{docFreq} + 1}\right)$
> $\mathrm{\_score} = \mathrm{tf} \times \mathrm{idf}$
> Limitation: Different description
> Limitation: No shared keywords
Example sentences:
> Sometimes, LINE doesn’t show notifications
> Why messages don’t display
> Chat history is still compressing
> No matter how I click backup button, it doesn’t work
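The limitation above can be made concrete with a small sketch. It assumes the classic Lucene form of TF-IDF (tf = sqrt(frequency), idf = 1 + log(numDocs / (docFreq + 1))) and a naive whitespace tokenizer; the helper names `tokenize`, `tf`, `idf`, and `score` are illustrative and not taken from the talk.

```python
import math

def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); a real analyzer
    # would also stem words and drop stop words.
    return [w.strip(".,?!") for w in text.lower().split()]

docs = [
    "Why messages don't display",
    "No matter how I click backup button, it doesn't work",
]
doc_tokens = [tokenize(d) for d in docs]

def tf(term, tokens):
    # Term frequency: square root of the raw count in one document.
    return math.sqrt(tokens.count(term))

def idf(term):
    # Inverse document frequency over the toy corpus above.
    doc_freq = sum(1 for tokens in doc_tokens if term in tokens)
    return 1.0 + math.log(len(doc_tokens) / (doc_freq + 1))

def score(query, tokens):
    # Sum of tf * idf over the distinct query terms.
    return sum(tf(t, tokens) * idf(t) for t in set(tokenize(query)))

query = "Sometimes, LINE doesn't show notifications"
for doc, tokens in zip(docs, doc_tokens):
    print(round(score(query, tokens), 3), doc)
```

In this toy corpus the sentence about messages not displaying shares no term with the query and scores zero, while the backup sentence scores higher only because both contain "doesn't". That is the keyword-overlap gap the slide describes.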

Embeddings

Word Embedding
> Vector representation
> Capturing the context of a word in a document, semantic/syntactic similarity, and relations with other words
Source: Efficient Estimation of Word Representations in Vector Space
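As a rough illustration of how vector representations express similarity, the sketch below compares toy word vectors by cosine similarity. The three-dimensional vectors are invented for this example; real word embeddings are learned from a corpus and have far more dimensions.

```python
import math

# Toy vectors, made up for illustration only.
vectors = {
    "message": [0.9, 0.1, 0.0],
    "notification": [0.8, 0.3, 0.1],
    "backup": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(cosine(vectors["message"], vectors["notification"]))
print(cosine(vectors["message"], vectors["backup"]))
```

With these made-up values, "message" lands much closer to "notification" than to "backup", even though the words share no characters in common as keywords.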

BERT
Bidirectional Encoder Representations from Transformers
> BERT is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Querying By Vector Representation
> Sentence encoding by a pre-trained BERT model
> Offline: Documents -> Document Vecs -> Build NN Index -> Document Vecs Index
> Online: Query -> Query Vec -> Nearest Neighbor Search over the Document Vecs Index
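The offline and online halves of this flow can be sketched as follows. The `encode` function here is only a placeholder that hashes tokens into a small vector so the example runs without a model; in the setup the slide describes, it would be replaced by a pre-trained BERT sentence encoder. The names `encode`, `build_index`, `search`, and `DIM` are hypothetical, and the brute-force scan stands in for a real nearest-neighbor index.

```python
import math
import zlib

DIM = 16  # toy dimensionality (assumption); BERT-base vectors have 768 dimensions

def encode(text):
    # Placeholder encoder (assumption): hashes tokens into a fixed-size,
    # L2-normalized vector. It carries no semantics, unlike BERT.
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def build_index(documents):
    # Offline: encode every document once and keep the vectors.
    return [(doc, encode(doc)) for doc in documents]

def search(index, query, k=3):
    # Online: encode the query, then scan all document vectors and
    # return the k highest cosine similarities (vectors are unit length).
    q = encode(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

index = build_index(["chat history backup", "notification settings"])
print(search(index, "chat history backup", k=1))
```

A linear scan is fine for a handful of documents; at production scale an approximate nearest-neighbor index would normally take its place.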

Learning To Rank

Learning To Rank
> Applying machine learning to construct ranking models for information retrieval systems
> Caring more about ranking than about rating prediction
> Scoring by machine learning
  • Creating the document index with Elasticsearch
  • Using embeddings to train ranking models
  • Serving search queries from Elasticsearch with ranking models
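One common way to train a ranking model is the pairwise approach: learn weights so that, within each query, the more relevant document outscores the less relevant one. The sketch below is a minimal linear version with a logistic pairwise loss. The talk does not say which algorithm was used, and the features, grades, and function names here are all made up for illustration.

```python
import math

# Hypothetical judgment list: per query, (features, relevance grade) per document.
# Features are [keyword_score, embedding_similarity]; all values are invented.
judgments = {
    "q1": [([0.4, 0.1], 0), ([0.9, 0.8], 2), ([0.5, 0.6], 1)],
    "q2": [([0.3, 0.2], 0), ([0.6, 0.9], 1)],
}

def pairs(judgments):
    # Yield feature differences for every (more relevant, less relevant)
    # document pair within the same query.
    for docs in judgments.values():
        for xi, yi in docs:
            for xj, yj in docs:
                if yi > yj:
                    yield [a - b for a, b in zip(xi, xj)]

def train(judgments, epochs=200, lr=0.5):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for diff in pairs(judgments):
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # Derivative of log(1 + exp(-margin)) with respect to margin.
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            w = [wk - lr * grad_scale * dk for wk, dk in zip(w, diff)]
    return w

def rank(w, docs):
    # Order documents by the learned linear score, best first.
    return sorted(docs, key=lambda d: sum(wk * xk for wk, xk in zip(w, d[0])), reverse=True)

w = train(judgments)
print([grade for _, grade in rank(w, judgments["q1"])])
```

The loss only looks at the relative order of document pairs, which matches the slide's point that ranking matters more than predicting an absolute rating.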

Search Architecture
[Diagram labels: Filters, Documents, Query, Filter Index, ES + Re-ranking, BERT, Matches, Ranked Results, NER, …, Scoring Index, Ranking Models]

Custom Scoring Function
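The slide content for the scoring function did not survive extraction, so the following is only a guess at the general shape of such a function: a weighted blend of the keyword score and the embedding similarity, used to re-rank the matches returned by the keyword index. The blending weight `alpha`, the match fields, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def custom_score(keyword_score, query_vec, doc_vec, alpha=0.5):
    # alpha (assumption) weights keyword evidence against embedding evidence.
    # keyword_score is assumed to be normalized to the range [0, 1].
    return alpha * keyword_score + (1.0 - alpha) * cosine(query_vec, doc_vec)

def rerank(matches, query_vec, alpha=0.5):
    # matches: list of dicts with "id", "keyword_score", and "vector" keys.
    return sorted(
        matches,
        key=lambda m: custom_score(m["keyword_score"], query_vec, m["vector"], alpha),
        reverse=True,
    )

matches = [
    {"id": "a", "keyword_score": 0.9, "vector": [0.0, 1.0]},
    {"id": "b", "keyword_score": 0.6, "vector": [1.0, 0.0]},
]
print([m["id"] for m in rerank(matches, [1.0, 0.0])])
```

With `alpha=1.0` the function falls back to pure keyword ordering, which makes it easy to compare the blended ranking against the baseline.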

Search Workflow
[Diagram labels: Data, Build Index, Pre-process, Inverted-index, Serve, Deploy, Monitoring, Evaluation, Feedback, User’s Needs, Measure Relevance]

Search Workflow With Learning To Rank
[Diagram labels: Data, Build Index, Pre-process, Inverted-index, Learning To Rank, Features Selection, Ranking Models, Scoring Function, Serve, Deploy, Monitoring, Evaluation, NDCG, Precision@k, MAP, Feedback, User’s Needs, Measure Relevance]
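The three evaluation metrics named in the workflow can be sketched in a few lines. Each function takes the relevance labels of a result list in ranked order. This version uses the exponential-gain form of DCG and normalizes average precision by the number of relevant items present in the list, both of which are common conventions rather than details confirmed by the talk.

```python
import math

def precision_at_k(relevances, k):
    # Fraction of the top-k results that are relevant (label > 0).
    return sum(1 for r in relevances[:k] if r > 0) / k

def average_precision(relevances):
    # Mean of precision values taken at each relevant position.
    hits, total = 0, 0.0
    for i, r in enumerate(relevances, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    # MAP: average precision averaged over several queries.
    return sum(average_precision(r) for r in rankings) / len(rankings)

def dcg(relevances, k):
    # Discounted cumulative gain with gain 2^rel - 1 and a log2 position discount.
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(relevances[:k], start=1))

def ndcg(relevances, k):
    # DCG divided by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal else 0.0

print(precision_at_k([1, 0, 1, 0], 2))
print(mean_average_precision([[1, 0, 1], [0, 1]]))
print(ndcg([0, 1], 2))
```

NDCG is the only one of the three that uses graded labels; Precision@k and MAP treat any positive grade as relevant.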

More Consideration
> Good judgment lists that match users’ needs for search quality
> Good metrics for measuring search results
> Incorporating embeddings into the scoring function
> Synchronizing versions between the indexing and serving layers
> A/B testing

Thank you

LINE DevDay 2019
