


Search as Communication: Lessons from a Personal Journey

This presentation, given at Etsy's Code as Craft, discusses lessons learned from my personal journey in search engineering. It covers insights from library science about viewing search in its information-seeking context and communicating with users. It also discusses the importance of entity detection and how to leverage corpus features to improve extraction. Finally, it describes my realization that queries vary in difficulty and that systems need to recognize this and adapt accordingly. The key takeaway is that search should be treated as a communication problem rather than just a ranking task.


Daniel Tunkelang

May 25, 2026



Transcript

  1. Search as Communication: Lessons from a Personal Journey

     Daniel Tunkelang, Head of Query Understanding, LinkedIn
  2. Unfortunately, I never read them in school. But I did study graphs and stuff.
  3. Outline

     1. Lessons from Library Science
     2. Adventures with Information Extraction
     3. A Moment of Clarity
  4. A bird's-eye view of how search engines work.

     USER: information need, query, select from results
     SYSTEM: rank using IR model (tf-idf, PageRank)
  5. Take-away for search engine developers:

     Act like a librarian. Communicate with your user.
  6. People search for entities. Recognize them!

     for i in [1..n]
         s ← w1 w2 … wi
         if Pc(s) > 0
             a ← new Segment()
             a.segs ← {s}
             a.prob ← Pc(s)
             B[i] ← {a}
         for j in [1..i-1]
             for b in B[j]
                 s ← wj wj+1 … wi
                 if Pc(s) > 0
                     a ← new Segment()
                     a.segs ← b.segs ∪ {s}
                     a.prob ← b.prob * Pc(s)
                     B[i] ← B[i] ∪ {a}
         sort B[i] by prob
         truncate B[i] to size k
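The segmentation pseudocode on slide 6 can be sketched in Python roughly as follows. This is an illustrative reading, not the speaker's implementation: `segment_query`, `Segmentation`, and the toy probability table are hypothetical names and data, `concept_prob` stands in for Pc, and the sketch assumes that a segmentation of the first j words is extended by a segment starting at word j+1.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Segmentation:
    segs: Tuple[str, ...]  # segments chosen so far, in query order
    prob: float            # product of the segment probabilities


def segment_query(
    words: Sequence[str],
    concept_prob: Callable[[str], float],  # stands in for Pc (assumption)
    k: int = 3,
) -> List[Segmentation]:
    """Return up to k most probable segmentations of the whole query."""
    n = len(words)
    # best[i] holds up to k segmentations of the first i words.
    best: List[List[Segmentation]] = [[] for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Case 1: the whole prefix w1..wi is a single segment.
        s = " ".join(words[:i])
        p = concept_prob(s)
        if p > 0:
            best[i].append(Segmentation((s,), p))
        # Case 2: extend a segmentation of the first j words
        # with one more segment covering words j+1..i.
        for j in range(1, i):
            s = " ".join(words[j:i])
            p = concept_prob(s)
            if p > 0:
                for b in best[j]:
                    best[i].append(Segmentation(b.segs + (s,), b.prob * p))
        # Beam step: keep only the k most probable candidates.
        best[i].sort(key=lambda a: a.prob, reverse=True)
        del best[i][k:]
    return best[n]


# Toy probabilities, invented purely for illustration.
TOY_PROBS = {
    "new": 0.02, "york": 0.01, "times": 0.02, "square": 0.01,
    "new york": 0.05, "times square": 0.04, "new york times": 0.03,
}

result = segment_query(
    "new york times square".split(),
    lambda s: TOY_PROBS.get(s, 0.0),
)
for seg in result:
    print(seg.segs, seg.prob)
```

With these made-up numbers the top segmentation is "new york" followed by "times square". The beam width `k` trades accuracy for speed: a larger beam keeps more partial segmentations alive at each prefix.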
  7. Problem: they process each document separately.

     [Diagram: Entity Detection System]

     Why not take advantage of corpus features?
  8. Give your documents the right to vote!

     Use a high-recall method to collect candidates.
     • e.g., all title-case spans of words other than a single word beginning a sentence.

     Process each document separately.
     • Each candidate is assigned an entity type, or no type at all.

     If a candidate is mostly assigned a single entity type, extrapolate to all its occurrences.
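The voting step on slide 8 might look something like the sketch below. It assumes the per-document entity detector has already run and produced, for each document, a mapping from candidate span to an entity type (or `None` for no type). The function name `vote_entity_types`, the one-vote-per-document rule, and the `min_share` and `min_votes` thresholds are all assumptions made for illustration; the slide only says "mostly assigned a single entity type".

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional


def vote_entity_types(
    per_doc_labels: Iterable[Dict[str, Optional[str]]],
    min_share: float = 0.7,  # hypothetical cutoff for "mostly"
    min_votes: int = 2,      # hypothetical minimum evidence
) -> Dict[str, str]:
    """Map each candidate to its consensus entity type, if it has one."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    # Each document casts one vote per candidate it contains.
    for labels in per_doc_labels:
        for candidate, entity_type in labels.items():
            votes[candidate][entity_type] += 1

    consensus: Dict[str, str] = {}
    for candidate, counts in votes.items():
        top_type, top_count = counts.most_common(1)[0]
        total = sum(counts.values())
        # Extrapolate only when one real type clearly dominates.
        if (
            top_type is not None
            and total >= min_votes
            and top_count / total >= min_share
        ):
            consensus[candidate] = top_type
    return consensus


# Invented per-document outputs, for illustration only.
doc_labels = [
    {"Etsy": "ORG", "Brooklyn": "LOC"},
    {"Etsy": "ORG", "Brooklyn": None},
    {"Etsy": None, "Brooklyn": "LOC"},
    {"Etsy": "ORG"},
]
consensus = vote_entity_types(doc_labels)
print(consensus)
```

Here "Etsy" is typed as an organization in three of four documents, so that type would be applied to every occurrence, including the document where the detector assigned no type. "Brooklyn" falls short of the 70% share and is left alone.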
  9. Looking for topics? Use idf, and its cousin ridf.

     Inverse document frequency (idf)
     • Too low? Probably a stop word.
     • Too high? Could be noise.

     Residual inverse document frequency (ridf)
     • Predict idf using Poisson model.
     • Difference between idf and predicted idf.

     “a good keyword is far from Poisson” [Church and Gale, 1995]
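As a rough illustration of slide 9, the sketch below computes idf and ridf over a tiny tokenized corpus. It assumes the usual formulation: idf is -log2(df/N), and a Poisson model with rate cf/N predicts that a term appears in a fraction 1 - exp(-cf/N) of documents, so the predicted idf is -log2(1 - exp(-cf/N)). The function name `idf_and_ridf` and the sample documents are made up for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def idf_and_ridf(docs: List[List[str]]) -> Dict[str, Tuple[float, float]]:
    """Return {term: (idf, ridf)} for a corpus of tokenized documents."""
    n_docs = len(docs)
    df: Counter = Counter()  # number of documents containing the term
    cf: Counter = Counter()  # total occurrences across the corpus
    for tokens in docs:
        cf.update(tokens)
        df.update(set(tokens))

    stats: Dict[str, Tuple[float, float]] = {}
    for term in df:
        idf = -math.log2(df[term] / n_docs)
        # Poisson with rate cf/N: P(term occurs at least once in a doc).
        p_present = 1.0 - math.exp(-cf[term] / n_docs)
        predicted_idf = -math.log2(p_present)
        stats[term] = (idf, idf - predicted_idf)
    return stats


# Invented corpus: "the" is spread evenly, "etsy" is bursty.
docs = [
    ["the", "etsy", "etsy", "etsy", "etsy"],
    ["the", "shop"],
    ["the", "craft"],
    ["the", "code"],
]
stats = idf_and_ridf(docs)
for term, (idf, ridf) in sorted(stats.items()):
    print(f"{term:6s} idf={idf:.3f} ridf={ridf:.3f}")
```

Both "the" and "etsy" occur four times in this corpus, so the Poisson model predicts the same idf for each. Only "etsy" is concentrated in a single document, which pushes its observed idf well above the prediction and gives it the higher ridf.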
  10. Take-away for search engine developers:

      Entity detection is crucial. And it isn’t that hard.
  11. Let’s go back to our pigeons for a moment.

      USER: information need, query, select from results
      SYSTEM: rank using IR model (tf-idf, PageRank)
  12. And here’s what it looks like to the user.

      [Two examples, labeled GOOD and NOT SO GOOD]

      But can the system tell the difference?
  13. We can segment information need from the query.

      for i in [1..n]
          s ← w1 w2 … wi
          if Pc(s) > 0
              a ← new Segment()
              a.segs ← {s}
              a.prob ← Pc(s)
              B[i] ← {a}
          for j in [1..i-1]
              for b in B[j]
                  s ← wj wj+1 … wi
                  if Pc(s) > 0
                      a ← new Segment()
                      a.segs ← b.segs ∪ {s}
                      a.prob ← b.prob * Pc(s)
                      B[i] ← B[i] ∪ {a}
          sort B[i] by prob
          truncate B[i] to size k
  14. Claudia Hauff, Query Difficulty for Digital Libraries [2009]

      There are many pre- and post-retrieval signals.
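To make "pre- and post-retrieval signals" concrete, here is a minimal sketch of one signal of each kind. The two chosen here, average idf of the query terms (pre-retrieval) and a simplified clarity-style divergence between the top results and the whole collection (post-retrieval), are common examples from the query-performance-prediction literature and are not necessarily the ones shown on the slide. The function names, the add-one idf smoothing, and the toy corpus are assumptions.

```python
import math
from collections import Counter
from typing import List, Mapping, Sequence


def average_idf(
    query_terms: Sequence[str], doc_freq: Mapping[str, int], n_docs: int
) -> float:
    """Pre-retrieval signal: mean smoothed idf of the query terms.

    Low values suggest a query made of common, unspecific words.
    """
    if not query_terms:
        return 0.0
    idfs = [
        math.log2((n_docs + 1) / (doc_freq.get(t, 0) + 1))  # add-one smoothing
        for t in query_terms
    ]
    return sum(idfs) / len(idfs)


def simple_clarity(
    top_docs: List[List[str]], collection_counts: Mapping[str, int]
) -> float:
    """Post-retrieval signal: KL divergence (in bits) between the unigram
    distribution of the top results and that of the whole collection.

    Simplified: every top document is weighted equally, and the top
    documents are assumed to come from the collection itself.
    """
    top = Counter(t for doc in top_docs for t in doc)
    top_total = sum(top.values())
    coll_total = sum(collection_counts.values())
    score = 0.0
    for term, count in top.items():
        p_top = count / top_total
        p_coll = collection_counts[term] / coll_total
        score += p_top * math.log2(p_top / p_coll)
    return score


# Invented corpus, for illustration only.
corpus = [
    ["etsy", "craft", "shop"],
    ["etsy", "craft", "code"],
    ["graph", "theory", "code"],
    ["search", "query", "code"],
]
doc_freq = Counter(t for doc in corpus for t in set(doc))
collection_counts = Counter(t for doc in corpus for t in doc)

print(average_idf(["etsy"], doc_freq, len(corpus)))
print(average_idf(["code"], doc_freq, len(corpus)))
print(simple_clarity(corpus[:2], collection_counts))
```

A result set that looks like a random sample of the collection scores near zero on the second signal, which hints that the query did not single out any topic. A system could use signals like these to decide when to ask the user for clarification instead of returning a ranked list.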
  15. Take-away for search engine developers:

      Queries vary in difficulty. Recognize and adapt.
  16. Review

      1. Lessons from Library Science
         • Act like a librarian. Communicate with users.
      2. Adventures with Information Extraction
         • Entity detection is crucial. And isn’t that hard.
      3. A Moment of Clarity
         • Queries vary in difficulty. Recognize and adapt.