[In]formation Retrieval: Search at LinkedIn

This 2013 Bay Area Search Meetup presentation discusses LinkedIn's search platform.

Daniel Tunkelang

May 24, 2026

Transcript

  1. [In]formation Retrieval: Search at LinkedIn

    Shakti Sinha, Head, Search Relevance
    Daniel Tunkelang, Head, Query Understanding
  2. Let’s talk a bit about how it all works.

    - Query Understanding
    - Search Spam
    - Unified Search

    More at http://data.linkedin.com/search.
  3. People are semi-structured objects.

    for i in [1..n]
        s ← w1 w2 … wi
        if Pc(s) > 0
            a ← new Segment()
            a.segs ← {s}
            a.prob ← Pc(s)
            B[i] ← {a}
        for j in [1..i-1]
            for b in B[j]
                s ← wj wj+1 … wi
                if Pc(s) > 0
                    a ← new Segment()
                    a.segs ← b.segs U {s}
                    a.prob ← b.prob * Pc(s)
                    B[i] ← B[i] U {a}
        sort B[i] by prob
        truncate B[i] to size k
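The pseudocode on this slide reads like a dynamic program that keeps the k most probable segmentations of each query prefix w1 … wi. The following is a minimal Python sketch of that idea, not LinkedIn's implementation. The names `Segmentation`, `top_k_segmentations`, and `TOY_PROBS` are hypothetical, and the toy probability table stands in for Pc, whose real definition the slide does not give. Two assumptions differ from the slide's wording: a partial result covering the first j words is extended with the words after position j (so no word is counted twice), and an empty segmentation with probability 1 seeds the table so the whole-prefix case needs no special branch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Segmentation:
    segs: Tuple[str, ...]  # segments in query order
    prob: float            # product of the segment probabilities


def top_k_segmentations(
    words: Sequence[str],
    pc: Callable[[str], float],
    k: int = 3,
) -> List[Segmentation]:
    """Return up to k most probable segmentations of the whole query."""
    n = len(words)
    # best[i] holds up to k segmentations of the first i words.
    best: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    best[0] = [Segmentation((), 1.0)]  # empty prefix (assumed seed)
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        for j in range(i):
            segment = " ".join(words[j:i])  # words after position j, up to i
            p = pc(segment)
            if p <= 0:
                continue  # unknown segment: skip, as in "if Pc(s) > 0"
            for b in best[j]:
                candidates.append(Segmentation(b.segs + (segment,), b.prob * p))
        candidates.sort(key=lambda a: a.prob, reverse=True)
        best[i] = candidates[:k]  # "truncate B[i] to size k"
    return best[n]


# Hypothetical probabilities, for illustration only.
TOY_PROBS = {
    "software": 0.1,
    "engineer": 0.1,
    "software engineer": 0.05,
    "google": 0.2,
}

result = top_k_segmentations(
    ["software", "engineer", "google"],
    lambda s: TOY_PROBS.get(s, 0.0),
)
print(result[0].segs)  # ('software engineer', 'google')
```

Truncating each `best[i]` to k entries bounds the work per prefix, at the cost of possibly discarding a partial segmentation that would have led to a better full one.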
  4. Query structure has many applications.

    - Boost results that match query interpretation.
    - Bucket search log analysis by query classes.
    - Query rewriting specific to query classes.
    - …

    Query understanding focuses on set-level metrics. Not just about best answer, but getting to best question.
  5. How we train our search spam classifier.

    - Find the queries targeted by spammers.
        - 10,000 most common non-name queries.
    - Look at top results for a generic user.
        - i.e., show unpersonalized search results.
    - Remove private profiles.
        - Members first! Can’t sacrifice privacy to fight spammers.
    - Label data by crowdsourcing.
        - Relevance is subjective, but spam is relatively objective.
  6. ROC curve for spam thresholding.

    [Figure: ROC curve, both axes running from 0 to 1, with two spam score thresholds a and b marked, where 0 < a < b < 1.]
  7. Integrate spamminess into relevance score.

    - Spam model yields a probability between 0 and 1.
    - Use spam score as piecewise linear factor:

        if score < spam_min:    # not a spammer
            relevance *= 1.0
        elif score > spam_max:  # spammer
            relevance *= 0.0
        else:                   # linear function of spamminess
            relevance *= (spam_max - score) / (spam_max - spam_min)
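The piecewise linear factor on this slide can be wrapped in a small function. This is a sketch under assumptions: the function names `spam_factor` and `adjusted_relevance` are hypothetical, and the threshold values in the usage lines are made up for illustration, since the slides do not give the real ones.

```python
def spam_factor(score: float, spam_min: float, spam_max: float) -> float:
    """Multiplier in [0, 1] applied to relevance, given a spam probability."""
    if not 0.0 <= spam_min < spam_max <= 1.0:
        raise ValueError("expected 0 <= spam_min < spam_max <= 1")
    if score < spam_min:   # not a spammer: leave relevance untouched
        return 1.0
    if score > spam_max:   # spammer: zero out relevance
        return 0.0
    # In between: falls linearly from 1 at spam_min to 0 at spam_max.
    return (spam_max - score) / (spam_max - spam_min)


def adjusted_relevance(relevance: float, score: float,
                       spam_min: float, spam_max: float) -> float:
    """Relevance after the spam penalty is applied."""
    return relevance * spam_factor(score, spam_min, spam_max)


# Illustrative thresholds only; not values from the talk.
print(adjusted_relevance(10.0, 0.1, spam_min=0.2, spam_max=0.8))  # 10.0
print(adjusted_relevance(10.0, 0.9, spam_min=0.2, spam_max=0.8))  # 0.0
```

The factor is continuous at both thresholds, so a small change in the spam score never causes a sudden jump in ranking.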
  8. Spam is an arms race.

    - We can’t reveal precisely which features we use for spam detection, or spammers will work around them.
    - Spammers will try to reverse-engineer us anyway.
    - Personalization benefits us and our legitimate users: it’s hard to spam your way to high personalized ranking.
    - Fighting spam is all about making the investment less profitable for the spammer.
  9. Introducing LinkedIn Unified Search!

    Goal: make all of our content more discoverable.

    Three new features:

    - Query Auto-Complete
    - Content Type Suggestions
    - Unified Search Result Page
  10. Best completion not always the most popular.

    - In a heavy-tailed distribution, even the most popular queries account for a small fraction of the distribution.
    - We don’t want to suggest generic queries that would produce useless results.
        - e.g., c -> company, j -> jobs
    - Goal is not only to infer the user’s intent but also to suggest a search that yields relevant results across content types.
  11. How we compute content type suggestions.

    - Rank content types by likelihood of a successful search.
        - Consider click-through behavior as well as downstream actions.
    - Bootstrap using what we know from pre-unified search behavior.
        - Tricky part is compensating for findability bias.
    - Continuously evaluate and collect feedback through user behavior.
        - E.g., members using the left rail to select a particular vertical.
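One simple way to rank content types by likelihood of a successful search is a smoothed success rate computed from search logs. The sketch below is an illustration, not the method from the talk. The function name `rank_content_types`, the `(searches, successes)` log summary, and the prior parameters are all hypothetical. What counts as a "success" (for example, a click followed by a downstream action) is left to the caller, and the sketch does nothing to correct for findability bias.

```python
from typing import Dict, List, Tuple


def rank_content_types(
    stats: Dict[str, Tuple[int, int]],
    prior_rate: float = 0.1,
    prior_weight: float = 20.0,
) -> List[str]:
    """Order content types by smoothed success rate, best first.

    stats maps content type -> (searches, successes). The prior acts like
    prior_weight extra searches with success rate prior_rate, so types with
    very little data are not ranked on a handful of lucky searches.
    """
    def smoothed(entry: Tuple[int, int]) -> float:
        searches, successes = entry
        return (successes + prior_rate * prior_weight) / (searches + prior_weight)

    return sorted(stats, key=lambda t: smoothed(stats[t]), reverse=True)


# Made-up counts, for illustration only.
logs = {"people": (1000, 400), "jobs": (200, 120), "groups": (2, 2)}
print(rank_content_types(logs))  # ['jobs', 'people', 'groups']
```

In this toy data, "groups" has a perfect raw success rate on two searches, but the prior keeps it below the types with substantial evidence.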
  12. Intent Detection and Page Construction

    - Relevance is now a two-part computation:

        P(Content Type | User, Query) × P(Document | User, Query, Content Type)

    - Intent detection comes first: inefficient to send all queries to all verticals.
    - Secondary components introduce diversity.
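The two-part computation can be sketched as a product of two probability tables, with intent detection used first to decide which verticals to query at all. This is a minimal illustration under assumptions: `rank_unified`, the dictionary inputs, and the `min_type_prob` cutoff are hypothetical, and the probabilities are taken as already conditioned on a given user and query.

```python
from typing import Dict, List, Tuple


def rank_unified(
    p_type: Dict[str, float],
    p_doc: Dict[str, Dict[str, float]],
    min_type_prob: float = 0.1,
) -> List[Tuple[str, str, float]]:
    """Rank (content type, document) pairs for one user and query.

    p_type[t]    ~ P(Content Type = t | User, Query)
    p_doc[t][d]  ~ P(Document = d | User, Query, Content Type = t)
    """
    ranked: List[Tuple[str, str, float]] = []
    for ctype, pt in p_type.items():
        if pt < min_type_prob:
            continue  # intent detection first: skip unlikely verticals
        for doc, pd in p_doc.get(ctype, {}).items():
            ranked.append((ctype, doc, pt * pd))  # two-part relevance score
    ranked.sort(key=lambda r: r[2], reverse=True)
    return ranked


# Made-up probabilities, for illustration only.
p_type = {"people": 0.7, "jobs": 0.25, "groups": 0.05}
p_doc = {
    "people": {"p1": 0.9, "p2": 0.4},
    "jobs": {"j1": 0.8},
    "groups": {"g1": 0.99},
}
for ctype, doc, score in rank_unified(p_type, p_doc):
    print(ctype, doc, round(score, 3))
```

Here the "groups" vertical is never consulted because its content type probability falls below the cutoff, even though its top document scores highly within that vertical.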
  13. Summary

    - Personalize every search and leverage structure.
    - Understand queries as early as possible.
    - Fight the spammers that be.
    - Unify and simplify the search experience.

    Goal: help LinkedIn’s 200M+ members find and be found.
  14. Want to learn more?

    - Check out http://data.linkedin.com/search.
    - Contact us:
        - Shakti: [email protected] http://linkedin.com/in/sdsinha
        - Daniel: [email protected] http://linkedin.com/in/dtunkelang
    - Did we mention that we’re hiring?