Search Quality at LinkedIn

This 2014 Bay Area Search Meetup presentation discusses LinkedIn's challenges in delivering high-quality search results to 277M+ members. Results are highly personalized, requiring machine-learned relevance models that combine document, query, and user features. An emphasis on entities (names, companies, job titles, etc.) affects query processing and understanding. This presentation discusses these challenges in detail and describes some of the solutions used to address them.

Daniel Tunkelang

May 26, 2026

Transcript

  1. Search Quality at LinkedIn

    Abhimanyu Lad (Abhi), Senior Software Engineer; Satya Kanduri (Satya), Senior Software Engineer. Recruiting Solutions
  2. (Figure: annotations on an example query)

    tag: skill OR title; related skills: search, ranking, …
    tag: company; id: 1337; industry: internet
    verticals: people, jobs; intent: exploratory
  3. OUR GOAL

    • Universal Search – single search box
    • High Recall – spelling correction, synonym expansion, …
    • High Precision – entity-oriented search: match things, not strings
  4. QUERY UNDERSTANDING PIPELINE

    Raw query → Spellcheck → Query Tagging → Vertical Intent Prediction → Query Expansion → Structured query + Annotations
  5. QUERY UNDERSTANDING PIPELINE (same diagram as slide 4)
  6. SPELLING OUT THE DETAILS

    Sources: people names, companies, titles, past queries
    • N-grams: marissa => ma ar ri is ss sa
    • Metaphone: mark/marc => MRK
    • Co-occurrence counts: marissa:mayer = 1000
    (Example: [marisa meyer yahoo] => [marissa mayer yahoo])
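The slide names three signals for generating and ranking spelling candidates: character n-grams, Metaphone codes, and co-occurrence counts. Below is a minimal Python sketch of two of them, assuming a small in-memory name dictionary and a table of name co-occurrence counts. `char_bigrams`, `candidates`, `correct_pair`, and the 0.3 Jaccard threshold are illustrative choices, not LinkedIn's implementation, and Metaphone is left out.

```python
from typing import Dict, List, Set, Tuple


def char_bigrams(word: str) -> List[str]:
    """Overlapping character bigrams, e.g. marissa -> ma ar ri is ss sa."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidates(token: str, dictionary: List[str], min_sim: float = 0.3) -> List[str]:
    """Dictionary words whose bigram sets overlap the token's by at least min_sim (assumed threshold)."""
    grams = set(char_bigrams(token))
    return [w for w in dictionary if jaccard(grams, set(char_bigrams(w))) >= min_sim]


def correct_pair(first: str, last: str, dictionary: List[str],
                 cooccur: Dict[Tuple[str, str], int]) -> Tuple[str, str]:
    """Among candidate (first, last) spellings, pick the pair seen together most often."""
    best, best_count = (first, last), -1
    for first_c in candidates(first, dictionary) or [first]:
        for last_c in candidates(last, dictionary) or [last]:
            count = cooccur.get((first_c, last_c), 0)
            if count > best_count:
                best, best_count = (first_c, last_c), count
    return best
```

With the dictionary `["marissa", "marisa", "mayer", "meyer"]` and the count `("marissa", "mayer"): 1000`, `correct_pair("marisa", "meyer", ...)` returns `("marissa", "mayer")`: both spellings of each name pass the n-gram filter, and co-occurrence decides between them.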
  7. SPELLING OUT THE DETAILS

    PROBLEM: The corpus as well as the query logs contain many spelling errors. Certain spelling errors are quite frequent, while genuine words (especially names) might be infrequent.
  8. SPELLING OUT THE DETAILS (continued)

    SOLUTION: Use query chains to infer correct spelling
    • [product manger] → [product manager] → CLICK
    • [marissa mayer] → CLICK
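Query chains can be mined with a simple rule: an abandoned query followed by a clicked reformulation that differs by only a few characters is likely a misspelling of that reformulation. The sketch below assumes sessions arrive as ordered `(query, clicked)` pairs; the edit-distance cutoff and `min_count` are hypothetical knobs, not values from the talk.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One search session: the ordered queries a user issued, each with a clicked flag (assumed format).
Session = List[Tuple[str, bool]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def mine_corrections(sessions: List[Session], max_dist: int = 2,
                     min_count: int = 1) -> Dict[str, str]:
    """Map a likely misspelled query to the close reformulation users clicked most often."""
    pairs: Counter = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            # Abandoned query followed by a similar, clicked reformulation.
            if not clicked1 and clicked2 and q1 != q2 and edit_distance(q1, q2) <= max_dist:
                pairs[(q1, q2)] += 1
    corrections: Dict[str, str] = {}
    for (q1, q2), n in pairs.most_common():
        if n >= min_count and q1 not in corrections:
            corrections[q1] = q2
    return corrections
```

The edit-distance filter keeps topic changes (such as [java] followed by [python]) from being mistaken for spelling fixes.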
  9. QUERY UNDERSTANDING PIPELINE (same diagram as slide 4)
  10. QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY

    (Figure: a query with segments tagged TITLE, CO, and GEO)
    • TITLE-237: software engineer, software developer, programmer, …
    • CO-1441: Google Inc.; Industry: Internet
    • GEO-7583: Country: US; Lat: 42.3482 N; Long: 75.1890 W
    Recognized tags: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL
  11. QUERY TAGGING: IDENTIFYING ENTITIES IN THE QUERY

    Tagged segments (TITLE, CO, GEO) enable more precise matching with documents.
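To make "more precise matching" concrete, here is a hypothetical Python sketch contrasting string matching with entity matching. It assumes profiles carry standardized entity IDs per field; the `Profile` schema is invented for illustration, and IDs 237 and 1441 are borrowed from the TITLE-237 and CO-1441 examples on slide 10.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Profile:
    """Hypothetical profile schema: free text plus standardized entity IDs per field."""
    text: str
    entities: Dict[str, Set[int]] = field(default_factory=dict)  # e.g. {"TITLE": {237}}


def string_match(query: str, doc: Profile) -> bool:
    """Baseline: every query term occurs somewhere in the profile text."""
    words = set(doc.text.lower().split())
    return all(term in words for term in query.lower().split())


def entity_match(tagged: Dict[str, int], doc: Profile) -> bool:
    """Entity-oriented match: each tagged entity ID must appear in the matching field."""
    return all(entity_id in doc.entities.get(tag, set()) for tag, entity_id in tagged.items())
```

For [software engineer google] tagged as `{"TITLE": 237, "CO": 1441}`, a profile reading "programmer at google" with title ID 237 and company ID 1441 matches by entity although `string_match` rejects it, while a profile that merely mentions "google" in free text passes `string_match` but fails `entity_match`.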
  12. QUERY TAGGING: SEQUENTIAL MODEL (TRAINING)

    • Emission probabilities: learned from user profiles
    • Transition probabilities: learned from query logs
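The training slide splits the sequential model's parameters by data source. Assuming a hidden Markov model (the slide does not name the model family), both tables can be estimated by counting. This sketch treats profile fields as `(tag, text)` pairs and assumes the query logs have already been reduced to per-token tag sequences, which is itself a simplification; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]


def _normalize(counts: Dict[str, Counter]) -> Table:
    """Turn per-condition counts into conditional probabilities."""
    table: Table = {}
    for cond, c in counts.items():
        total = sum(c.values())
        table[cond] = {k: n / total for k, n in c.items()}
    return table


def estimate_emissions(profile_fields: List[Tuple[str, str]]) -> Table:
    """P(word | tag) from profile fields, e.g. ("TITLE", "software engineer")."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tag, value in profile_fields:
        for word in value.lower().split():
            counts[tag][word] += 1
    return _normalize(counts)


def estimate_transitions(tag_sequences: List[List[str]]) -> Table:
    """P(tag_i | tag_{i-1}) from tagged queries; "<s>" marks the start of a query."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tags in tag_sequences:
        for prev, cur in zip(["<s>"] + tags, tags):
            counts[prev][cur] += 1
    return _normalize(counts)
```

Profiles supply plenty of clean (tag, word) evidence, while queries are the only place to observe how users order entities, which is why the two tables come from different sources.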
  13. QUERY TAGGING: SEQUENTIAL MODEL (INFERENCE)

    Given a query, find the most likely sequence of tags.
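For a hidden Markov model (an assumption; the slide says only "sequential model"), finding the most likely tag sequence is the job of the Viterbi algorithm. The sketch below assumes emission and transition tables plus a start distribution; the floor probability for unseen events and the toy numbers in the usage note are made up.

```python
import math
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]


def viterbi(words: List[str], tags: List[str], start: Dict[str, float],
            trans: Table, emit: Table, floor: float = 1e-6) -> List[str]:
    """Most likely tag sequence under a first-order HMM, computed in log space."""
    if not words:
        return []

    def logp(table: Dict[str, float], key: str) -> float:
        # Unseen (or zero-probability) events get a small floor instead of log(0).
        return math.log(max(table.get(key, 0.0), floor))

    # best[t] = (log-probability of the best path ending in tag t, that path)
    best: Dict[str, Tuple[float, List[str]]] = {
        t: (logp(start, t) + logp(emit.get(t, {}), words[0]), [t]) for t in tags
    }
    for word in words[1:]:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for t in tags:
            # Extend every surviving path with tag t and keep the highest-scoring extension.
            new_best[t] = max(
                ((score + logp(trans.get(prev, {}), t) + logp(emit.get(t, {}), word), path + [t])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

With `start = {"TITLE": 0.7, "CO": 0.3}`, `trans = {"TITLE": {"TITLE": 0.6, "CO": 0.4}, "CO": {"TITLE": 0.3, "CO": 0.7}}`, and `emit = {"TITLE": {"software": 0.4, "engineer": 0.4}, "CO": {"google": 0.9}}`, the query [software engineer google] is tagged `["TITLE", "TITLE", "CO"]`.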
  14. QUERY UNDERSTANDING PIPELINE (same diagram as slide 4)
  15. VERTICAL INTENT PREDICTION: SIGNALS

    1. Past query counts in each vertical + query tags
    2. Personalization: user’s search history
    (Diagram: candidate verticals [Company], [Employees], [Jobs], [Name Search]; query tags (TAG:COMPANY), (TAG:NAME))
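One simple way to combine the two signals is a weighted mixture of a global vertical distribution and the user's own history. In this hypothetical sketch, unseen queries back off to counts aggregated by tag pattern (e.g. "TAG:COMPANY"); the back-off scheme, the 0.3 user weight, and the vertical names are assumptions, not details from the talk.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # key (query or tag pattern) -> {vertical: count}


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts to a probability distribution (empty if there are no counts)."""
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()} if total else {}


def vertical_intent(query: str, tag_pattern: str, query_counts: Counts, tag_counts: Counts,
                    user_history: Dict[str, int], user_weight: float = 0.3) -> Dict[str, float]:
    """Mix a global vertical distribution with the user's own search history."""
    # Signal 1: past query counts per vertical; unseen queries back off to their tag pattern.
    global_dist = normalize(query_counts.get(query) or tag_counts.get(tag_pattern, {}))
    # Signal 2: personalization from the verticals this user searched before.
    user_dist = normalize(user_history)
    if not global_dist or not user_dist:
        return global_dist or user_dist
    verticals = set(global_dist) | set(user_dist)
    return {v: (1 - user_weight) * global_dist.get(v, 0.0) + user_weight * user_dist.get(v, 0.0)
            for v in verticals}
```

For an unseen company query whose tag pattern has counts `{"company": 60, "jobs": 30, "employees": 10}`, a user who has only searched jobs before gets jobs ≈ 0.51 and company ≈ 0.42, so the same query leans toward a different vertical for that user.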
  16. QUERY UNDERSTANDING PIPELINE (same diagram as slide 4)
  17. QUERY EXPANSION: SIGNALS

    • Trained using query chains: [jon] → [jonathan] → CLICK; [programmer] → [developer] → CLICK; [software engineer] → [software developer] → CLICK
    • Symmetric but not transitive! [francis] ⇔ [frank] and [franklin] ⇔ [frank], but [francis] ≠ [franklin]
    • Context based! [software engineer] => [software developer], but [civil engineer] ≠ [civil developer]
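The three properties on the slide fall out naturally if expansions are stored as an undirected table of directly observed pairs, keyed by whole phrases. This sketch assumes the pairs have already been mined from query chains; `build_expansions` and `expand` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_expansions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Store each mined pair in both directions (symmetric), without any transitive closure."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        table[a].add(b)
        table[b].add(a)
    return dict(table)


def expand(query: str, table: Dict[str, Set[str]]) -> Set[str]:
    """The query plus its direct expansions; whole-phrase keys keep expansion context-aware."""
    return {query} | table.get(query, set())
```

Storing both directions makes the table symmetric; not computing a closure keeps [francis] from reaching [franklin] through [frank]; and keying on the full phrase means [civil engineer] is never rewritten to [civil developer].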
  18. QUERY UNDERSTANDING PIPELINE (same diagram as slide 4)
  19. QUERY UNDERSTANDING: SUMMARY

    • High degree of structure in queries as well as corpus (user profiles, job postings, companies, …)
    • Query understanding allows us to optimally balance recall and precision by supporting entity-oriented search
    • Query tagging and query log analysis play a big role in query understanding
  20. RANKING IS COMPLICATED

    • Seemingly similar queries require dissimilar scoring functions
    • Personalization matters
      – Multiple dimensions to personalize on
      – Dimensions vary with query class
  21. TRAINING

    (Diagram: documents for training → features; human evaluation → labels; features and labels → machine learning model)
  22. TRAINING (same diagram as slide 21)
  23. WE NEED USER FEATURES

    • Non-personalized relevance model: score = f(Document | Query)
    • Personalized relevance model: score = f(Document | Query, User)
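The difference between the two models is just which inputs f sees. Here is a hypothetical sketch with a logistic f, assuming a query-document feature `title_match` and a user-document feature `network_distance` (both invented here for illustration, with disjoint feature names):

```python
import math
from typing import Dict

Features = Dict[str, float]


def logistic(features: Features, weights: Features, bias: float = 0.0) -> float:
    """Logistic regression score over named features (missing weights count as 0)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def non_personalized_score(query_doc: Features, weights: Features) -> float:
    """score = f(Document | Query): only query-document features."""
    return logistic(query_doc, weights)


def personalized_score(query_doc: Features, user_doc: Features, weights: Features) -> float:
    """score = f(Document | Query, User): adds user-document features (names assumed disjoint)."""
    return logistic({**query_doc, **user_doc}, weights)
```

With weights `{"title_match": 2.0, "network_distance": -1.0}`, the same document scores higher for a searcher one hop away (`network_distance = 1`) than for one three hops away, a distinction `f(Document | Query)` cannot express.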
  24. TRAINING

    (Diagram: documents for training → features; human evaluation and search logs → labels; features and labels → machine learning model)
  25. CLICKS AS TRAINING DATA

    Approach: Clicked = Relevant, Not-Clicked = Not Relevant
    • Unfairly penalized? Good results the user never saw are marked Not Relevant.
    (Diagram: user eye scan direction)
  26. CLICKS AS TRAINING DATA

    Approach: Clicked = Relevant, Skipped = Not Relevant
    • Only penalize results that the user has seen but ignored
  27. CLICKS AS TRAINING DATA (continued)

    Same approach as slide 26, with one caveat:
    • Risks inverting the model by overweighting low-ranked results
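Here is a minimal sketch of the skip heuristic. It assumes results are scanned top-down, so every result ranked above the lowest click counts as seen; this is a common simplification, and the slides do not say how "seen" was determined. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def click_skip_labels(results: List[str], clicked: Set[str]) -> List[Tuple[str, int]]:
    """Label clicked results 1 and skipped results 0.

    Assumption: a result counts as seen only if it is ranked at or above the
    lowest click; results below the last click are left unlabeled.
    """
    positions = [i for i, doc in enumerate(results) if doc in clicked]
    if not positions:
        return []  # no clicks: nothing is known to have been seen
    last_click = max(positions)
    return [(doc, int(doc in clicked)) for doc in results[:last_click + 1]]
```

Because a click deep in the list turns every unclicked result above it into a negative, these labels systematically favor low-ranked results, which is the inversion risk the slide warns about.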
  28. FAIR PAIRS [Radlinski and Joachims, AAAI’06]

    • Fair Pairs: randomize (flip adjacent result pairs at random), Clicked = R, Skipped = NR
    (Diagram: a flipped pair of results)
  29. FAIR PAIRS (continued)

    Same approach as slide 28, plus:
    • Great at dealing with position bias
    • Does not invert models
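FairPairs groups adjacent results into pairs, starting at a random offset, and swaps each pair with probability 1/2 before display; a click on the lower result of a pair with no click on the upper one yields a preference. The sketch below is a simplified rendering of that idea, not the paper's full procedure; `fair_pairs` and `preferences` are hypothetical names.

```python
import random
from typing import List, Set, Tuple


def fair_pairs(ranking: List[str], rng: random.Random) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Pair adjacent results and flip each pair with probability 1/2.

    Returns the displayed list and the (upper, lower) display positions of each pair.
    """
    shown = list(ranking)
    offset = rng.randint(0, 1)  # pairs start at rank 1 or rank 2, chosen at random
    pairs: List[Tuple[int, int]] = []
    for i in range(offset, len(shown) - 1, 2):
        if rng.random() < 0.5:
            shown[i], shown[i + 1] = shown[i + 1], shown[i]
        pairs.append((i, i + 1))
    return shown, pairs


def preferences(shown: List[str], pairs: List[Tuple[int, int]],
                clicked: Set[str]) -> List[Tuple[str, str]]:
    """Record (preferred, other) when a pair's lower result is clicked and its upper one is not."""
    prefs = []
    for upper, lower in pairs:
        if shown[lower] in clicked and shown[upper] not in clicked:
            prefs.append((shown[lower], shown[upper]))
    return prefs
```

Since each pair is shown in both orders equally often, position bias cancels in expectation, and preferences are only drawn within a pair, so low-ranked results are not systematically favored.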
  30. EASY NEGATIVES

    • Assumption: a decent current model pushes bad results out to the very end.
    • Easy negatives: some of the results at the end are picked as negative examples.
    (Diagram: page 1 vs. page 99)
  31. EASY NEGATIVES (continued)

    • Use strategies that sample across the feature space
    • Searches with fewer results are preferred
    • Always sample from a given page, say page 10
    (Diagram: searches with 2 pages vs. 90+ pages of results)
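Here is a sketch of the fixed-page strategy. It assumes the full ranked list for a search is available and that a page holds 10 results (an assumption); the other strategies on the slide, sampling across the feature space and preferring searches with fewer results, are not shown. `easy_negatives` is a hypothetical name.

```python
import random
from typing import List, Optional


def easy_negatives(results: List[str], page_size: int = 10, page: int = 10,
                   k: int = 3, rng: Optional[random.Random] = None) -> List[str]:
    """Sample up to k 'easy negative' results from a fixed deep page (1-indexed)."""
    start = (page - 1) * page_size
    page_results = results[start:start + page_size]
    rng = rng or random.Random()
    return rng.sample(page_results, min(k, len(page_results)))
```

Searches with fewer than `page` pages yield no samples under this rule, so in practice it would be combined with the other strategies.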
  32. PUTTING IT ALL TOGETHER

    • Human evaluation is not practical for personalized searches
    • Learn from user behavior
      – Multiple heuristics depending on the need
      – Different pros and cons
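  33. EFFICIENCY VS. EXPRESSIVENESS

    • Build a tree with logistic regression leaves.
    • By restricting decision nodes to (Query, User) segments, only one regression model needs to be evaluated for each document: every document returned for a given query and user lands in the same leaf.
    (Diagram: decision nodes X2 = ? and X4 = ?, each branching on 0 and 1, with logistic regression models at the leaves)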
  34. SCORING

    (Diagram: for each new document, features → machine learning model → score; the scores produce an ordered list)
  35. TEST, TEST, TEST

    Interleaving [Radlinski et al., CIKM 2008]
    Model 1: a b c d g h
    Model 2: b e a f g h
    Interleaved: a b c e d f
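The cited CIKM 2008 paper evaluates team-draft interleaving: the team with fewer picks chooses next, a coin flip breaks ties, and each pick is that ranker's highest-ranked result not yet shown. A sketch of that procedure follows; the `winner` helper's click-counting rule is a simple assumption.

```python
import random
from typing import Callable, List, Set, Tuple


def team_draft(a: List[str], b: List[str], length: int,
               coin: Callable[[], bool] = lambda: random.random() < 0.5
               ) -> Tuple[List[str], Set[str], Set[str]]:
    """Team-draft interleaving of rankings a and b; coin() returning True lets A pick first on a tie."""
    shown: List[str] = []
    team_a: Set[str] = set()
    team_b: Set[str] = set()
    while len(shown) < length:
        next_a = next((d for d in a if d not in shown), None)
        next_b = next((d for d in b if d not in shown), None)
        if next_a is None or next_b is None:
            break  # stop once either ranking is exhausted
        if len(team_a) < len(team_b) or (len(team_a) == len(team_b) and coin()):
            shown.append(next_a)
            team_a.add(next_a)
        else:
            shown.append(next_b)
            team_b.add(next_b)
    return shown, team_a, team_b


def winner(team_a: Set[str], team_b: Set[str], clicked: Set[str]) -> str:
    """Credit each click to the team that contributed the result."""
    ca, cb = len(team_a & clicked), len(team_b & clicked)
    return "A" if ca > cb else "B" if cb > ca else "tie"
```

With Model 1 winning every coin flip, the rankings above interleave to `a b c e d f`, matching the slide; clicks only on e and f would count as a win for Model 2.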
  36. SUMMARY

    • Query understanding leverages the rich structure of LinkedIn’s content and information needs.
    • Query tagging and rewriting allow us to deliver both precision and recall.
    • For ranking, personalization is both the biggest challenge and the core of our solution.
    • Segmenting relevance models by query type helps us efficiently address the diversity of search needs.