[In]formation Retrieval: Search at LinkedIn

Shakti Sinha, Head, Search Relevance
Daniel Tunkelang, Head, Query Understanding

Why do 200M+ people use LinkedIn? 2

People use LinkedIn because of other people. 3

Search helps members find and be found. 4

Rich collection of professional content. 5

Every search is personalized. 6

Let’s talk a bit about how it all works. 7
• Query Understanding
• Search Spam
• Unified Search
More at http://data.linkedin.com/search.

Query Understanding 8

People are semi-structured objects. 9
    for i in [1..n]
        s ← w_1 w_2 … w_i
        if P_c(s) > 0
            a ← new Segment()
            a.segs ← {s}
            a.prob ← P_c(s)
            B[i] ← {a}
        for j in [1..i-1]
            for b in B[j]
                s ← w_j w_j+1 … w_i
                if P_c(s) > 0
                    a ← new Segment()
                    a.segs ← b.segs U {s}
                    a.prob ← b.prob * P_c(s)
                    B[i] ← B[i] U {a}
        sort B[i] by prob
        truncate B[i] to size k
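The segmentation pseudocode above can be sketched in Python as a top-k dynamic program. This is a minimal sketch, not LinkedIn's implementation: `Segmentation`, `segment_query`, and `TOY_PC` are hypothetical names, the probabilities are made up, and `pc` stands in for the probability function in the pseudocode. It reads each new segment as starting at the word after position j so that segments do not overlap, which is an interpretation of the slide rather than something it states.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Segmentation:
    segs: Tuple[str, ...]  # segments chosen so far, in query order
    prob: float            # product of the segment probabilities


def segment_query(words: Sequence[str],
                  pc: Callable[[str], float],
                  k: int = 3) -> List[Segmentation]:
    """Return up to k most probable segmentations of the whole query."""
    n = len(words)
    # best[i] holds the top-k segmentations of the first i words.
    best: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    best[0] = [Segmentation((), 1.0)]  # empty prefix, used as the base case
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):
            s = " ".join(words[j:i])  # candidate last segment
            p = pc(s)
            if p <= 0:
                continue
            # Extend every kept segmentation of the first j words with s.
            for b in best[j]:
                candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        candidates.sort(key=lambda a: a.prob, reverse=True)
        best[i] = candidates[:k]  # truncate to size k
    return best[n]


# Hypothetical phrase probabilities, invented for illustration only.
TOY_PC = {
    "software": 0.2,
    "engineer": 0.2,
    "software engineer": 0.3,
    "google": 0.4,
}

results = segment_query("software engineer google".split(),
                        lambda s: TOY_PC.get(s, 0.0), k=2)
for r in results:
    print(r.segs, round(r.prob, 4))
```

With these toy numbers the two-segment reading ("software engineer" followed by "google") outranks the three-word split, because the phrase probability is higher than the product of its parts.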

Word sense is contextual. 10

Understand queries as early as possible. 11

Query structure has many applications. 12
• Boost results that match query interpretation.
• Bucket search log analysis by query classes.
• Query rewriting specific to query classes.
• …
Query understanding focuses on set-level metrics. Not just about best answer, but getting to best question.

Search Spam 13

Let’s look at a search spammer. 14

Summary is verbose but legitimate. 15

But then comes the keyword stuffing. 16

How we train our search spam classifier. 17
• Find the queries targeted by spammers.
  – 10,000 most common non-name queries.
• Look at top results for a generic user.
  – i.e., show unpersonalized search results.
• Remove private profiles.
  – Members first! Can’t sacrifice privacy to fight spammers.
• Label data by crowdsourcing.
  – Relevance is subjective, but spam is relatively objective.

ROC curve for spam thresholding. 18
[Figure: ROC curve with two marked points, a and b, which are spam score thresholds with 0 < a < b < 1.]

Integrate spamminess into relevance score. 19
• Spam model yields a probability between 0 and 1.
• Use spam score as piecewise linear factor:
    if score < spam_min:    # not a spammer
        relevance *= 1.0
    elif score > spam_max:  # spammer
        relevance *= 0.0
    else:                   # linear function of spamminess
        relevance *= (spam_max - score) / (spam_max - spam_min)
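The piecewise linear factor above can be wrapped in a small function. This is a sketch under stated assumptions: the function names are hypothetical, and the default thresholds 0.3 and 0.8 are illustrative values, not the ones LinkedIn uses.

```python
def spam_factor(score: float, spam_min: float, spam_max: float) -> float:
    """Multiplier in [0, 1] applied to relevance, given a spam probability."""
    if not 0.0 <= spam_min < spam_max <= 1.0:
        raise ValueError("need 0 <= spam_min < spam_max <= 1")
    if score < spam_min:   # not a spammer: leave relevance unchanged
        return 1.0
    if score > spam_max:   # spammer: zero out relevance
        return 0.0
    # In between: falls linearly from 1 at spam_min to 0 at spam_max.
    return (spam_max - score) / (spam_max - spam_min)


def adjusted_relevance(relevance: float, score: float,
                       spam_min: float = 0.3, spam_max: float = 0.8) -> float:
    # Thresholds are hypothetical defaults chosen only for this example.
    return relevance * spam_factor(score, spam_min, spam_max)


for s in (0.1, 0.55, 0.9):
    print(s, adjusted_relevance(2.0, s))
```

A factor rather than a hard cutoff means a borderline profile is demoted in proportion to its score instead of disappearing outright.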

Spam is an arms race. 20
• We can’t reveal precisely which features we use for spam detection, or spammers will work around them.
• Spammers will try to reverse-engineer us anyway.
• Personalization benefits us and our legitimate users – it’s hard to spam your way to high personalized ranking.
• Fighting spam is all about making the investment less profitable for the spammer.

Unified Search 21

Un-Unified Search 22

Introducing LinkedIn Unified Search! 23
Goal: make all of our content more discoverable.
Three new features:
• Query Auto-Complete
• Content Type Suggestions
• Unified Search Result Page

Query Auto-Complete 24

Best completion not always the most popular. 25
• In a heavy-tailed distribution, even the most popular queries account for a small fraction of the distribution.
• We don’t want to suggest generic queries that would produce useless results.
  – e.g., c -> company, j -> jobs
• Goal is not only to infer the user’s intent but also to suggest a search that yields relevant results across content types.
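One hypothetical way to keep generic completions from winning on popularity alone is to weight each candidate's frequency by how often searches for it succeed. The scoring formula, the function name `rank_completions`, and all the numbers below are assumptions for illustration; the slides do not say how completions are actually scored.

```python
from typing import Dict, List, Tuple


def rank_completions(prefix: str,
                     stats: Dict[str, Tuple[int, float]],
                     limit: int = 3) -> List[str]:
    """Rank completions of prefix by frequency * success rate.

    stats maps a query to (frequency, success_rate); both are toy values.
    """
    scored = [(query, freq * success)
              for query, (freq, success) in stats.items()
              if query.startswith(prefix)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [query for query, _ in scored[:limit]]


# Invented statistics: "company" is typed often but rarely leads anywhere.
TOY_STATS = {
    "company": (1000, 0.05),
    "cisco": (400, 0.6),
    "coca-cola": (300, 0.5),
    "jobs": (2000, 0.05),
}

print(rank_completions("c", TOY_STATS))
```

In this toy data "company" is the most frequent query starting with "c", yet it ranks last because few of those searches are assumed to produce a useful result.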

Content Type Suggestions 26

How we compute content type suggestions. 27
• Rank content types by likelihood of a successful search.
  – Consider click-through behavior as well as downstream actions.
• Bootstrap using what we know from pre-unified search behavior.
  – Tricky part is compensating for findability bias.
• Continuously evaluate and collect feedback through user behavior.
  – E.g., members using the left rail to select a particular vertical.

Unified Search Result Page 28

Intent Detection and Page Construction 29
• Relevance is now a two-part computation:
    P(Content Type | User, Query) × P(Document | User, Query, Content Type)
• Intent detection comes first: inefficient to send all queries to all verticals.
• Secondary components introduce diversity.
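The two-part computation can be sketched as follows: estimate the content type probabilities first, call only the verticals that clear a threshold, then score each document by the product of the two probabilities. Everything named here (`unified_rank`, `make_vertical`, the 0.1 cutoff, the toy probabilities) is a hypothetical stand-in, and the sketch does not model the secondary components that introduce diversity.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[], Dict[str, float]]  # returns P(doc | user, query, type)


def unified_rank(type_probs: Dict[str, float],
                 doc_scorers: Dict[str, Scorer],
                 min_type_prob: float = 0.1) -> List[Tuple[str, str, float]]:
    """Blend verticals by P(type | user, query) * P(doc | user, query, type)."""
    ranked = []
    for ctype, p_type in type_probs.items():
        if p_type < min_type_prob:
            continue  # intent detection first: skip unlikely verticals
        for doc, p_doc in doc_scorers[ctype]().items():
            ranked.append((doc, ctype, p_type * p_doc))
    ranked.sort(key=lambda r: r[2], reverse=True)
    return ranked


queried: List[str] = []  # records which verticals were actually called


def make_vertical(name: str, scores: Dict[str, float]) -> Scorer:
    def scorer() -> Dict[str, float]:
        queried.append(name)
        return scores
    return scorer


# Toy probabilities, invented for illustration.
type_probs = {"people": 0.7, "jobs": 0.25, "groups": 0.05}
doc_scorers = {
    "people": make_vertical("people", {"p1": 0.6, "p2": 0.2}),
    "jobs": make_vertical("jobs", {"j1": 0.9}),
    "groups": make_vertical("groups", {"g1": 0.99}),
}

page = unified_rank(type_probs, doc_scorers)
for doc, ctype, score in page:
    print(doc, ctype, round(score, 3))
print("verticals queried:", queried)
```

Here the groups vertical is never called, even though its one document scores highly within that vertical, because the query is unlikely to be a groups search in the first place.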

Summary 30
• Personalize every search and leverage structure.
• Understand queries as early as possible.
• Fight the spammers that be.
• Unify and simplify the search experience.
Goal: help LinkedIn’s 200M+ members find and be found.

Thank you! 31

Want to learn more? 32
• Check out http://data.linkedin.com/search.
• Contact us:
  – Shakti: [email protected] http://linkedin.com/in/sdsinha
  – Daniel: [email protected] http://linkedin.com/in/dtunkelang
• Did we mention that we’re hiring?
