Search Quality at LinkedIn

Abhimanyu Lad, Senior Software Engineer
Satya Kanduri, Senior Software Engineer
Recruiting Solutions

2 tag: skill OR title; related skills: search, ranking, …
tag: company; id: 1337; industry: internet
verticals: people, jobs; intent: exploratory

3 SEARCH USE CASES How do people use LinkedIn’s search?

4 PEOPLE SEARCH Search for people by name

5 PEOPLE SEARCH Search for people by other attributes

6 EXPLORATORY PEOPLE SEARCH

7 JOB SEARCH

8 COMPANY SEARCH

9 AND MUCH MORE…

10 OUR GOAL
• Universal Search – Single search box
• High Recall – Spelling correction, synonym expansion, …
• High Precision – Entity-oriented search: match things, not strings

11 QUERY UNDERSTANDING PIPELINE

12 QUERY UNDERSTANDING PIPELINE
Raw query → Spellcheck → Query Tagging → Vertical Intent Prediction → Query Expansion → Structured query + Annotations

14 SPELLING CORRECTION
• Fix obvious typos
• Help users spell names

15 SPELLING OUT THE DETAILS
Sources: PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES
• N-grams: marissa => ma ar ri is ss sa
• Metaphone: mark/marc => MRK
• Co-occurrence counts: marissa:mayer = 1000
Example tokens: marisa, marissa, meyer, mayer, yahoo
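A minimal sketch of how the n-gram and co-occurrence signals on this slide could combine: candidates are retrieved by character-bigram overlap, then the pair with the highest co-occurrence count wins. The lexicon, the Dice-coefficient threshold, the function names, and every count except marissa:mayer = 1000 are illustrative assumptions, not the deck's implementation. The Metaphone signal is left out.

```python
from collections import Counter

def bigrams(word):
    """Character bigrams: 'marissa' -> ['ma', 'ar', 'ri', 'is', 'ss', 'sa']."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def overlap(a, b):
    """Dice coefficient over bigram multisets (1.0 means identical bigrams)."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    shared = sum((ca & cb).values())
    total = sum(ca.values()) + sum(cb.values())
    return 2 * shared / total if total else 0.0

# Hypothetical lexicon and counts; only marissa:mayer = 1000 comes from the slide.
LEXICON = ["marissa", "marisa", "mayer", "meyer", "yahoo"]
COOCCURRENCE = {("marissa", "mayer"): 1000, ("marisa", "meyer"): 3}

def candidates(token, threshold=0.5):
    """Lexicon entries whose bigram overlap with the token meets the threshold."""
    return [w for w in LEXICON if overlap(token, w) >= threshold]

def best_pair(first, second):
    """Pick the candidate pair with the highest co-occurrence count."""
    pairs = [(a, b) for a in candidates(first) for b in candidates(second)]
    return max(pairs, key=lambda p: COOCCURRENCE.get(p, 0))
```

With these toy numbers, `best_pair("marisa", "meyer")` prefers `("marissa", "mayer")` because that pair co-occurs far more often than the literal spelling.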

16 SPELLING OUT THE DETAILS
PROBLEM: Corpus as well as query logs contain many spelling errors
• Certain spelling errors are quite frequent
• While genuine words (especially names) might be infrequent

17 SPELLING OUT THE DETAILS
PROBLEM: Corpus as well as query logs contain many spelling errors
SOLUTION: Use query chains to infer correct spelling
[product manger] → [product manager] → CLICK
[marissa mayer] → CLICK
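One way to read "use query chains" is sketched below: within a session, a query with no click followed by a near-identical query that did get a click is counted as evidence for a correction. The session format, the edit-distance cutoff, and the function names are assumptions made for illustration.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def mine_corrections(sessions, max_distance=2):
    """Count (query, rewrite) pairs where a close rewrite led to a click.

    Each session is a time-ordered list of (query, clicked) tuples.
    """
    counts = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            close = 0 < edit_distance(q1, q2) <= max_distance
            if not clicked1 and clicked2 and close:
                counts[(q1, q2)] += 1
    return counts
```

Because the evidence is the click on the rewritten query rather than raw frequency, a frequent misspelling does not get promoted over an infrequent but genuine name.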

19 QUERY TAGGING
IDENTIFYING ENTITIES IN THE QUERY (tags shown: TITLE, CO, GEO)
• TITLE-237: software engineer, software developer, programmer, …
• CO-1441: Google Inc.; Industry: Internet
• GEO-7583: Country: US; Lat: 42.3482 N; Long: 75.1890 W
(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)

20 QUERY TAGGING
IDENTIFYING ENTITIES IN THE QUERY (tags shown: TITLE, CO, GEO)
MORE PRECISE MATCHING WITH DOCUMENTS

21 ENTITY-BASED FILTERING (figure: BEFORE)

22 ENTITY-BASED FILTERING (figure: BEFORE and AFTER)

23 ENTITY-BASED FILTERING (figure: BEFORE)

24 ENTITY-BASED FILTERING (figure: BEFORE and AFTER)

25 ENTITY-BASED SUGGESTIONS

26 ENTITY-BASED SUGGESTIONS

27 QUERY TAGGING : SEQUENTIAL MODEL
TRAINING
• EMISSION PROBABILITIES (Learned from user profiles)
• TRANSITION PROBABILITIES (Learned from query logs)

28 QUERY TAGGING : SEQUENTIAL MODEL
INFERENCE
Given a query, find the most likely sequence of tags
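For a sequential model with emission and transition probabilities, the most likely tag sequence is commonly found with the Viterbi algorithm. The sketch below assumes a plain hidden Markov model; the deck does not name the exact model, and the probability tables, the floor value for unseen events, and the example query are made up for illustration.

```python
import math

FLOOR = 1e-6  # assumed probability for events missing from the tables

def viterbi(tokens, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence for the tokens under an HMM."""
    def log(p):
        return math.log(max(p, FLOOR))

    # best[t] = (log-probability, path) of the best path ending in tag t
    best = {
        t: (log(start_p.get(t, 0)) + log(emit_p[t].get(tokens[0], 0)), [t])
        for t in tags
    }
    for token in tokens[1:]:
        nxt = {}
        for t in tags:
            score, path = max(
                (best[p][0] + log(trans_p[p].get(t, 0)), best[p][1]) for p in tags
            )
            nxt[t] = (score + log(emit_p[t].get(token, 0)), path + [t])
        best = nxt
    return max(best.values())[1]

# Hypothetical tables: emissions would come from profiles, transitions from query logs.
TAGS = ["TITLE", "CO", "GEO"]
START = {"TITLE": 0.6, "CO": 0.3, "GEO": 0.1}
TRANS = {
    "TITLE": {"TITLE": 0.5, "CO": 0.4, "GEO": 0.1},
    "CO": {"TITLE": 0.2, "CO": 0.3, "GEO": 0.5},
    "GEO": {"TITLE": 0.3, "CO": 0.3, "GEO": 0.4},
}
EMIT = {
    "TITLE": {"software": 0.3, "engineer": 0.4},
    "CO": {"google": 0.5, "software": 0.05},
    "GEO": {"boston": 0.6},
}

tagged = viterbi("software engineer google boston".split(), TAGS, START, TRANS, EMIT)
```

Working in log space avoids underflow on longer queries, and the floor keeps an unseen token from zeroing out every path.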

30 VERTICAL INTENT PREDICTION
JOBS, PEOPLE, COMPANIES (Probability distribution over verticals)

31 VERTICAL INTENT PREDICTION : SIGNALS
1. Past query counts in each vertical + Query tags
2. Personalization: User’s search history
Figure labels: [Company], [Employees], [Jobs], [Name Search], (TAG:COMPANY), (TAG:NAME)
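The two signals above can be turned into a probability distribution over verticals in many ways. The sketch below assumes the simplest one: smooth and normalize each set of counts, then take a weighted average. The blending weight, the add-one smoothing, and the function name are assumptions, not the deck's method, and query tags are not modeled here.

```python
def vertical_distribution(query_counts, user_counts, user_weight=0.5, smoothing=1.0):
    """Blend global per-vertical query counts with the searcher's own history.

    Returns a dict mapping each vertical to a probability; values sum to 1.
    """
    verticals = sorted(set(query_counts) | set(user_counts))

    def normalize(counts):
        total = sum(counts.get(v, 0) + smoothing for v in verticals)
        return {v: (counts.get(v, 0) + smoothing) / total for v in verticals}

    global_p = normalize(query_counts)
    user_p = normalize(user_counts)
    return {
        v: (1 - user_weight) * global_p[v] + user_weight * user_p[v]
        for v in verticals
    }
```

A searcher whose history is dominated by job searches shifts probability mass toward the jobs vertical for the same query text.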

33 QUERY EXPANSION GOAL: Improve recall through synonym expansion

34 QUERY EXPANSION : NAME SYNONYMS

35 QUERY EXPANSION : JOB TITLE SYNONYMS

36 QUERY EXPANSION : SIGNALS
Trained using query chains:
[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK
[software engineer] → [software developer] → CLICK
Symmetric but not transitive! [francis] ⇔ [frank] and [franklin] ⇔ [frank], but [francis] ≠ [franklin]
Context based! [software engineer] => [software developer], but [civil engineer] ≠ [civil developer]
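The two constraints on this slide can be sketched with a small lookup structure. Synonyms are stored as unordered pairs, so lookups are symmetric, and expansion follows one hop only, so it is never transitive. Title synonyms are keyed on the whole phrase rather than on single words, which keeps them context dependent. The table contents mirror the slide's examples; the data structure and function name are assumptions.

```python
# Unordered pairs make every relation symmetric by construction.
SYNONYM_PAIRS = {
    frozenset(pair)
    for pair in [
        ("francis", "frank"),
        ("franklin", "frank"),
        ("jon", "jonathan"),
        ("software engineer", "software developer"),  # phrase-level, not word-level
    ]
}

def expand(query):
    """Return the query plus its direct synonyms (one hop, no transitive closure)."""
    expansions = [query]
    for pair in sorted(SYNONYM_PAIRS, key=sorted):
        if query in pair:
            expansions.extend(sorted(pair - {query}))
    return expansions
```

Here `expand("frank")` reaches both francis and franklin, while `expand("francis")` reaches only frank, and `expand("civil engineer")` is left unchanged because no phrase-level pair matches.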

38 QUERY UNDERSTANDING: SUMMARY
• High degree of structure in queries as well as corpus (user profiles, job postings, companies, …)
• Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search
• Query tagging and query log analysis play a big role in query understanding

39 RANKING

WHAT’S IN A NAME QUERY?

BUT NAMES CAN BE AMBIGUOUS (figure label: kevin scott ≠)

SEARCHING FOR A COMPANY’S EMPLOYEES

SEARCHING FOR PEOPLE WITH A SKILL

RANKING IS COMPLICATED
• Seemingly similar queries require dissimilar scoring functions
• Personalization matters
  – Multiple dimensions to personalize on
  – Dimensions vary with query class

TRAINING
Documents for training → Features
Human evaluation → Labels
Features + Labels → Machine learning model

ASSESSING RELEVANCE

RELEVANCE DEPENDS ON WHO’S SEARCHING
What if the searcher is a job seeker? Or a recruiter? Or…

THE QUERY IS NOT ENOUGH

WE NEED USER FEATURES
• Non-personalized relevance model: score = f(Document | Query)
• Personalized relevance model: score = f(Document | Query, User)

COLLECTING RELEVANCE JUDGMENTS WON’T SCALE

TRAINING
Documents for training → Features
Human evaluation, Search logs → Labels
Features + Labels → Machine learning model

CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Not-Clicked = Not Relevant
• Unfairly penalized? Good results not seen are marked Not Relevant.
(Figure: user eye scan direction)

CLICKS AS TRAINING DATA
Approach: Clicked = Relevant, Skipped = Not Relevant
• Only penalize results that the user has seen but ignored
• Risks inverting model by overweighting low-ranked results
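A minimal sketch of the Clicked/Skipped labeling, assuming the user scans top-down so that everything ranked above the lowest click was seen. That scanning assumption and the function name are illustrative; results below the lowest click get no label at all rather than a negative one.

```python
def label_impressions(results, clicked):
    """Label results: clicked = 1 (relevant), skipped = 0 (not relevant).

    A result counts as skipped only if it is unclicked and ranked above the
    lowest click. Unclicked results below the lowest click stay unlabeled.
    """
    click_positions = [i for i, doc in enumerate(results) if doc in clicked]
    if not click_positions:
        return {}  # no click means no evidence about what was seen
    lowest_click = max(click_positions)
    return {doc: int(doc in clicked) for doc in results[: lowest_click + 1]}
```

Since every positive example sits at or below the negatives drawn from the same search, training on these labels alone can teach a model that lower rank means more relevant, which is the inversion risk noted on the slide.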

FAIR PAIRS [Radlinski and Joachims, AAAI’06]
• Fair Pairs: Randomize, Clicked = R, Skipped = NR
• Great at dealing with position bias
• Does not invert models
(Figure label: Flipped)
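A simplified sketch of the Fair Pairs idea: adjacent results are grouped into pairs, each pair is flipped with probability 0.5 before display, and a label is recorded only when the lower result of a pair is clicked while the upper one is not. This omits details of the published algorithm (for example, randomly choosing whether pairing starts at the first or second result), and the function names are assumptions.

```python
import random

def present_fair_pairs(ranking, rng=random):
    """Split the ranking into adjacent pairs and flip each with probability 0.5."""
    shown = []
    for i in range(0, len(ranking) - 1, 2):
        pair = [ranking[i], ranking[i + 1]]
        if rng.random() < 0.5:
            pair.reverse()
        shown.extend(pair)
    if len(ranking) % 2:
        shown.append(ranking[-1])  # an unpaired last result is shown as-is
    return shown

def pair_labels(shown, clicked):
    """Label lower = R (1), upper = NR (0) when only the lower result of a pair is clicked."""
    labels = {}
    for i in range(0, len(shown) - 1, 2):
        upper, lower = shown[i], shown[i + 1]
        if lower in clicked and upper not in clicked:
            labels[lower], labels[upper] = 1, 0
    return labels
```

Because each result is equally likely to appear in the upper or lower slot of its pair, position bias affects both members equally, so the collected labels do not systematically favor low-ranked results.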

EASY NEGATIVES
• Assumption: A decent current model would push out bad results to the very end.
• Easy Negatives: Some of the results at the end are picked up as negative examples
(Figure labels: Page 1, Page 99)

EASY NEGATIVES
• Use strategies that sample across the feature space
• Searches with fewer results preferred
• Always sample from a given page, say page 10
(Figure labels: 2 pages, 90+ pages)
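The "always sample from a given page" strategy can be sketched as a slice of the current ranking. The page size of 10 and the function name are assumptions; searches that do not reach the chosen page simply contribute no negatives in this sketch.

```python
def easy_negatives(ranked_doc_ids, page=10, page_size=10):
    """Return (doc_id, 0) negatives taken from one fixed page of the ranking.

    Pages are 1-indexed. Returns an empty list when the search has too few
    results to reach the requested page.
    """
    start = (page - 1) * page_size
    return [(doc_id, 0) for doc_id in ranked_doc_ids[start:start + page_size]]
```

Fixing the page keeps the sampled negatives at a comparable depth for every search, whether it returns a handful of pages or more than ninety.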

PUTTING IT ALL TOGETHER
• Human evaluation is not practical for personalized searches
• Learn from user behavior
  – Multiple heuristics depending on the need
  – Different pros and cons

66 EFFICIENCY VS EXPRESSIVENESS
• Build tree with logistic regression leaves.
• By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document.
(Figure: decision tree with nodes X2 = ? and X4 = ?, each branching on the values 0 and 1)

SCORING
New document → Features → Machine learning model → Score (repeated for each new document)
Scores → Ordered list

68 A SIMPLIFIED EXAMPLE
Name Query? Yes / No
Skill Query? Yes / No
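A tree with logistic regression leaves, restricted to query-level decision nodes, might look like the sketch below. The routing step runs once per query, so each document is scored by exactly one leaf model. The segment names follow the Name Query / Skill Query example; the feature names and weights are invented for illustration, and user-based decision nodes are left out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-segment logistic regression weights over document features.
LEAF_MODELS = {
    "name": {"bias": -1.0, "name_match": 3.0, "network_distance": -0.5},
    "skill": {"bias": -1.0, "skill_match": 2.0, "same_industry": 1.0},
    "other": {"bias": -0.5, "text_match": 1.5},
}

def pick_segment(query_tags):
    """Decision nodes inspect only the query, never the document."""
    if "NAME" in query_tags:
        return "name"
    if "SKILL" in query_tags:
        return "skill"
    return "other"

def rank(query_tags, documents):
    """Route once per query, then score every document with that one leaf model."""
    weights = LEAF_MODELS[pick_segment(query_tags)]

    def score(features):
        z = weights["bias"] + sum(
            w * features.get(name, 0.0) for name, w in weights.items() if name != "bias"
        )
        return sigmoid(z)

    return sorted(documents, key=lambda d: score(d["features"]), reverse=True)
```

Scoring cost per document stays that of a single linear model, while different query segments can still weight features in entirely different ways.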

69 TEST, TEST, TEST
Interleaving [Radlinski et al., CIKM 2008]
Model 1: a b c d g h
Model 2: b e a f g h
Interleaved: a b c e d f
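One interleaving method is team-draft interleaving, sketched below. Whether the deck uses this exact variant is an assumption. In each round both models contribute their best result not yet shown, and a coin flip decides which model picks first. The `a_first` hook is a hypothetical parameter added so the coin can be fixed for a reproducible example.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, a_first=None):
    """Interleave two rankings and record which model contributed each result.

    a_first is an optional zero-argument callable returning True when model A
    picks first in a round; it defaults to a fair coin flip.
    """
    if a_first is None:
        a_first = lambda: random.random() < 0.5
    interleaved, team = [], {}

    def pick(ranking, name):
        for doc in ranking:
            if doc not in team:
                interleaved.append(doc)
                team[doc] = name
                return True
        return False  # this ranking has nothing new to offer

    while len(interleaved) < length:
        order = [(ranking_a, "A"), (ranking_b, "B")]
        if not a_first():
            order.reverse()
        progressed = False
        for ranking, name in order:
            if len(interleaved) < length:
                progressed |= pick(ranking, name)
        if not progressed:
            break  # both rankings are exhausted
    return interleaved, team

merged, team = team_draft_interleave(
    list("abcdgh"), list("beafgh"), length=6, a_first=lambda: True
)
```

With Model 1 picking first in every round, `merged` matches the interleaved list on the slide. Clicks are then credited to the model in `team` that contributed the clicked result, and the model with more credited clicks is preferred.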

SUMMARY
• Query understanding leverages the rich structure of LinkedIn’s content and information needs.
• Query tagging and rewriting allow us to deliver precision and recall.
• For ranking, personalization is both the biggest challenge and the core of our solution.
• Segmenting relevance models by query type helps us efficiently address the diversity of search needs.

71 Abhimanyu Lad: https://linkedin.com/in/abhilad
Satya Kanduri: https://linkedin.com/in/skanduri
