Socializing Search. Professionally.

Sriram Sankar (Principal Staff Engineer)
Daniel Tunkelang (Head, Query Understanding)

Whether you’ve tried to find an Apache committer…

3 …or an Apache commander,

4 you’ve probably used LinkedIn Search.

5 Let’s talk about…
• Infrastructure (Sriram)
• Quality (Daniel)

6 LinkedIn Search leverages the economic graph.

7 Social means that relevance is highly personalized.

8 Machine-learned ranking, socially.
• Relevance models incorporate user features: score = P(Document | Query, User)
• Our model: tree with logistic regression leaves.
[Figure: decision tree whose internal nodes test features (X2 = 0 / ? / 1; X10 < 0.1234: yes / no).]
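To make "tree with logistic regression leaves" concrete, here is a minimal sketch. It assumes numeric features and threshold splits only; the class names (`Leaf`, `Split`), the weights, and the thresholds are hypothetical and are not taken from the talk.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    """Logistic regression applied to the examples routed to this leaf."""
    weights: Sequence[float]
    bias: float

    def predict(self, x: Sequence[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Split:
    """Internal node: compares one feature against a threshold."""
    feature: int
    threshold: float
    below: Union["Split", Leaf]      # taken when x[feature] < threshold
    otherwise: Union["Split", Leaf]  # taken in every other case


def score(node: Union[Split, Leaf], x: Sequence[float]) -> float:
    """Walk the tree to a leaf, then return that leaf's probability."""
    while isinstance(node, Split):
        node = node.below if x[node.feature] < node.threshold else node.otherwise
    return node.predict(x)


# Hypothetical two-leaf tree: a different linear model on each side of the split.
tree = Split(feature=0, threshold=0.5,
             below=Leaf(weights=[0.0, 0.0], bias=0.0),
             otherwise=Leaf(weights=[2.0, 0.0], bias=0.0))
```

The tree partitions the feature space (for example by query type or user segment), and each leaf fits its own linear model, so the same feature can carry a different weight in each region.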

9 LinkedIn’s focus: entity-oriented search (company, employees, jobs, name search).

10 Query understanding can act as a relevance filter.

    for i in [1..n]
        s ← w_1 w_2 … w_i
        if P_c(s) > 0
            a ← new Segment()
            a.segs ← {s}
            a.prob ← P_c(s)
            B[i] ← {a}
        for j in [1..i-1]
            for b in B[j]
                s ← w_j w_j+1 … w_i
                if P_c(s) > 0
                    a ← new Segment()
                    a.segs ← b.segs ∪ {s}
                    a.prob ← b.prob * P_c(s)
                    B[i] ← B[i] ∪ {a}
        sort B[i] by prob
        truncate B[i] to size k
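The segmentation pseudocode above can be sketched in Python as a beam search over query prefixes. This is one reading of the slide, not the authors' implementation: the slide does not define P_c, so `p_c` here is any caller-supplied function, and the probability table below is made up for illustration. The sketch also starts each new segment at the word after position j, so that segments never overlap.

```python
from typing import Callable, List, Sequence, Tuple

# A segmentation is (segments, probability).
Segmentation = Tuple[Tuple[str, ...], float]


def segment(words: Sequence[str],
            p_c: Callable[[str], float],
            k: int = 5) -> List[Segmentation]:
    """Return up to k most probable segmentations of the whole query."""
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    # Seeding beams[0] with the empty segmentation covers the
    # single-segment case without a separate branch.
    beams: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    beams[0] = [((), 1.0)]
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        for j in range(i):
            s = " ".join(words[j:i])  # last segment covers words j+1..i
            p = p_c(s)
            if p > 0:
                for segs, prob in beams[j]:
                    candidates.append((segs + (s,), prob * p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams[i] = candidates[:k]  # truncate to beam width k
    return beams[n]


# Hypothetical segment probabilities, for illustration only.
toy_probs = {"warren": 0.1, "buffett": 0.1, "warren buffett": 0.5}
result = segment(["warren", "buffett"], lambda s: toy_probs.get(s, 0.0))
```

With this toy table the top segmentation keeps "warren buffett" together as one segment, which is what lets the query act as a filter on a single entity rather than on two independent terms.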

11 Less is more. warren buffett

12 Coming soon: entity-driven search assist.
Search: link
• Jobs at LinkedIn
• People currently working at LinkedIn
• People who used to work at LinkedIn

13 Infrastructure: Lucene
• Map of terms to documents: the index
• Provides an API to add and remove documents to the index
• Provides an API to query the index

14 [Figure: two example documents (1 and 2) mentioning Daniel, Sriram, and LinkedIn, shown with their Inverted Index (terms to document numbers) and Forward Index (document numbers to documents).]
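The two index structures can be illustrated with a small in-memory sketch. It uses plain Python dictionaries and is not Lucene's actual data layout or API; the function names and the two sample documents are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_indexes(docs: Dict[int, str]) -> Tuple[Dict[str, List[int]], Dict[int, List[str]]]:
    """Build an inverted index (term -> doc ids) and a forward index (doc id -> terms)."""
    inverted: Dict[str, List[int]] = defaultdict(list)
    forward: Dict[int, List[str]] = {}
    for doc_id in sorted(docs):
        terms = docs[doc_id].lower().split()
        forward[doc_id] = terms
        for term in sorted(set(terms)):
            inverted[term].append(doc_id)  # posting lists stay sorted by doc id
    return dict(inverted), forward


def search_all(inverted: Dict[str, List[int]], query: str) -> List[int]:
    """Return ids of documents that contain every query term (AND semantics)."""
    postings = [set(inverted.get(t, ())) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Invented sample documents.
docs = {1: "Daniel works at LinkedIn", 2: "Sriram works at LinkedIn"}
inverted, forward = build_indexes(docs)
```

The inverted index answers "which documents contain this term?", while the forward index answers "what does this document contain?", which is what a scorer needs once a document has been retrieved.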

15 A standard scoring capability is built in

16
• Extremely easy to build a search engine
• But difficult to get sophisticated

17 The LinkedIn Search Stack
[Diagram labels: Request, Query Rewriter, Index Retrieval, Scorer, Sorter/Blender, Response; Offline Data Building, Live Updates, Data.]

18 Search Index Served by Lucene
• Inverted index
• Forward index
• Static rank based document ordering

19 Offline Data Builds on Hadoop
• Multi-stage map-reduce pipeline allows complex data processing
• Produces sharded single segment Lucene index with documents sorted by static rank
• Produces data models for use in query rewriting

20 Live Data Updates
• Feed based framework to support updates to offline data builds
• Lucene enhanced with a partial index update capability

21 Query Rewriting (and Planning)
• Accepts raw query and user metadata
• Produces Lucene retrieval query and metadata for scoring
• May use data models built offline

22 Index Retrieval
• Lucene query built by query rewriter is used to retrieve documents from the Lucene index
• Documents are retrieved in static rank order (best document first)
• Retrieval may be early-terminated, given that retrieval is in static rank order
• No scoring is performed during retrieval
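Early termination over a static-rank-ordered index can be sketched as follows. The sketch assumes document ids are assigned in static rank order (id 0 is the best document), so every posting list is already sorted best-first; the `retrieve` function and the sample postings are hypothetical and do not use Lucene's API.

```python
from typing import Dict, List, Sequence


def retrieve(postings: Dict[str, List[int]],
             terms: Sequence[str],
             limit: int) -> List[int]:
    """Intersect posting lists in static rank order, stopping after `limit` hits."""
    lists = [postings.get(t, []) for t in terms]
    if not lists or limit <= 0:
        return []
    cursors = [0] * len(lists)
    hits: List[int] = []
    # Stop as soon as any list is exhausted: no further document can match all terms.
    while all(c < len(lst) for c, lst in zip(cursors, lists)):
        current = [lst[c] for c, lst in zip(cursors, lists)]
        target = max(current)
        if all(doc == target for doc in current):
            hits.append(target)
            if len(hits) == limit:
                break  # early termination: remaining matches rank lower
            cursors = [c + 1 for c in cursors]
        else:
            # Advance only the cursors that are behind the current target.
            cursors = [c + 1 if lst[c] < target else c
                       for c, lst in zip(cursors, lists)]
    return hits


# Hypothetical posting lists; smaller id means better static rank.
postings = {"engineer": [0, 2, 3, 5, 8], "lucene": [2, 3, 4, 8, 9]}
```

Because matches come out best-first, the loop can stop after `limit` hits without visiting the rest of the posting lists, and no per-document score is needed to decide which matches to keep.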

23 Scoring
• Scoring is performed after retrieval
• Its input is the retrieved document (i.e., includes the forward index), a description of how the retrieval query matched the document, and the scoring metadata produced by the rewriter
• Costly features can be computed offline during the index building process in Hadoop, e.g., tf/idf calculations
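A scorer that runs after retrieval might look roughly like the sketch below. Only the three kinds of input come from the slide; the field names (`precomputed_tfidf`, `field_weights`, `connections`, `connection_boost`) and the additive scoring formula are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Retrieved:
    """One retrieved document as handed to the scorer."""
    doc_id: int
    forward: Dict[str, float]  # forward-index data, incl. features computed offline
    matched_fields: Set[str]   # how the retrieval query matched the document


def score_document(doc: Retrieved, scoring_meta: dict) -> float:
    """Combine an offline feature, match information, and rewriter metadata."""
    total = doc.forward.get("precomputed_tfidf", 0.0)  # computed at index build time
    for name in doc.matched_fields:
        total += scoring_meta["field_weights"].get(name, 0.0)
    if doc.doc_id in scoring_meta.get("connections", ()):
        total += scoring_meta.get("connection_boost", 0.0)  # personalized signal
    return total


def rank(retrieved: List[Retrieved], scoring_meta: dict) -> List[Retrieved]:
    """Order retrieved documents by descending score."""
    return sorted(retrieved, key=lambda d: score_document(d, scoring_meta), reverse=True)


# Hypothetical inputs.
candidates = [
    Retrieved(1, {"precomputed_tfidf": 0.2}, {"title"}),
    Retrieved(2, {"precomputed_tfidf": 0.1}, {"name"}),
]
meta = {"field_weights": {"name": 1.0, "title": 0.5},
        "connections": {2}, "connection_boost": 0.3}
```

Keeping the scorer separate from retrieval means the expensive, personalized part of ranking only runs on the documents that survive early termination.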

24 Summary
Quality
• LinkedIn Search leverages the economic graph.
• Social means that relevance is highly personalized.
• Less is more: query understanding is a relevance filter.
• Moving in the direction of suggesting structured queries.
System
• Powered by Lucene, but with additional components.
• Offline data builds on Hadoop, partial index updates.
• Index uses static ranking and early termination.
• Scoring performed outside of Lucene.

25 Sriram Sankar: https://linkedin.com/in/sriramxsankar
Daniel Tunkelang: https://linkedin.com/in/dtunkelang
