Exploiting Intent Tagging Information to Improve Vertical Search Query Intent Segmentation

bryanyuan2
June 26, 2013


Transcript

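  1. Problem Definition: how to determine whether the user's intent has changed

    q1: 築地鮮魚 台北 · q2: 築地鮮魚 台北 日本料理 · q3: 國父紀念館 日本料理 (築地鮮魚 is a restaurant name; 台北 = Taipei; 日本料理 = Japanese cuisine; 國父紀念館 = Sun Yat-sen Memorial Hall)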
  2. Daniel E Rose and Danny Levinson Understanding User Goals in

    Web Search WWW 2004 Search Intent Classification 5 Informational Navigational Transactional 國父紀念館 日本料理 築地鮮魚 台北 築地鮮魚線上定位 Query Related Work
  3. Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic

    Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 Related Work Identifying Goals and Missions session 6
  4. Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa

    Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 A search mission is a related set of information needs, resulting in one or more goals. session mission mission mission 7
  5. Related Work Identifying Goals and Missions session mission mission mission

    goal goal goal goal goal goal Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 A search goal is an atomic information need, resulting in one or more queries. A search mission is a related set of information needs, resulting in one or more goals. 8
  6. q1 q2 q3 築地鮮魚 台北 築地鮮魚 台北 日本料理 國父紀念館 日本料理

    Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 9
  7. Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa

    Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 goal q1 q2 q3 築地鮮魚 台北 築地鮮魚 台北 日本料理 國父紀念館 日本料理 10
  8. Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa

    Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 mission goal q1 q2 q3 築地鮮魚 台北 築地鮮魚 台北 日本料理 國父紀念館 日本料理 11
  9. Our Proposal: we propose a new feature set to improve

    goal and mission boundary detection in the vertical search domain. q q q q q q q
  10. Our Proposal: we propose a new feature set to improve

    goal and mission boundary detection in the vertical search domain. Goal Goal Goal Goal Goal Mission Mission Mission q q q q q q q
  11. Reasons for Goal/Mission Hierarchical Structure

    •  Evaluating the importance of websites to users •  Evaluating the accuracy of a search engine
  12. Reasons for Goal/Mission Hierarchical Structure

    Goal 1 Goal 2 Goal 3 Mission 1 A A B A •  Evaluating the importance of websites to users: site A is more important than site B
  13. Reasons for Goal/Mission Hierarchical Structure

    Goal 1 Goal 2 Goal 3 Mission 1 Goal 4 Goal 1 Goal 2 Mission 1 •  Evaluating the accuracy of a search engine: Engine A Engine B <
  14. System Overview Baseline Rosie Jones, Kristina Lisa Klinkner.Beyond the Session

    Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008 18 Intent Tagging Query log Preprocessing Statement Transition
  15. Preprocessing Query A Query B Query A Query A Query

    B Query A Query B Remove duplicate queries Preprocessing Intent tagging 19
  16. … Query log Query pairs Preprocessing Query 1 Query 2

    Query 3 Query 4 Preprocessing Intent tagging 20
  17. … Query log Query pairs Preprocessing Query 1 Query 2

    Query 3 Query 4 Query 1 Query 2 Query 2 Query 3 Query 3 Query 4 Query 4 Query 5 Preprocessing Intent tagging 21
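The preprocessing step above (remove duplicate queries, then form query pairs) could be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "duplicate" means a query repeated immediately after itself, and the function names are hypothetical.

```python
from itertools import groupby


def dedupe_consecutive(queries):
    """Collapse runs of identical consecutive queries into a single query.

    Assumption: only immediate repeats count as duplicates.
    """
    return [query for query, _ in groupby(queries)]


def adjacent_pairs(queries):
    """Pair each query with the one that follows it: (q1, q2), (q2, q3), ..."""
    return list(zip(queries, queries[1:]))


log = ["Query 1", "Query 1", "Query 2", "Query 3", "Query 3", "Query 4"]
cleaned = dedupe_consecutive(log)
pairs = adjacent_pairs(cleaned)
```

Each pair then becomes one classification instance for boundary detection.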
  18. Intent Head (IH) Intent Modifier (IM) 公館韓式燒肉 愛樂廚房 Xiao Li

    Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 Preprocessing Intent tagging 22 •  IH •  IM:Location •  IM:Type Intent Tagging
  19. Intent Head (IH) Intent Modifier (IM) 公館韓式燒肉 愛樂廚房 Xiao Li

    Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 Preprocessing Intent tagging •  IH = 愛樂廚房 •  IM:Location = 公館 •  IM:Type = 韓式燒肉 23 Intent Tagging
  20. •  Lexical: current word •  Syntactic: POS tag of the

    current word using CKIP (中文斷詞系統, the CKIP Chinese word segmentation system) •  Yahoo! Life+ Title Match: whether the current character N-gram (N = 1 to 7) occurs in a Yahoo! Life+ title, using maximum matching. Preprocessing Intent tagging Intent Tagging
  21. Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Lexical 公 館 韓

    式 燒 肉 愛 樂 廚 房 •  Syntactic 公館 (N) 韓式燒肉 (N) 愛樂廚房 (N) •  Yahoo! Life+ Title Match 公館韓式燒肉 (Y) 愛樂廚房 (Y) 25 •  Lexical: Current word •  Syntactic: POS tag of the current word using CKIP (中文斷詞系統) •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching Intent Tagging
  22. Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match:

    Current N (N=1~7) in Yahoo! Life+ title using maximum matching 26 Intent Tagging
  23. Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 公 館 韓 式 燒

    肉 愛 樂 廚 房 公館 館韓 韓式 式燒 燒肉 愛樂 樂廚 廚房 公館韓 館韓式 韓式燒 式燒肉 愛樂廚 樂廚房 公館韓式 館韓式燒 韓式燒肉 愛樂廚房 公館韓式燒 館韓式燒肉 公館韓式燒肉 •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching 27 Intent Tagging
  24. Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 公 館 韓 式 燒

    肉 愛 樂 廚 房 公館 館韓 韓式 式燒 燒肉 愛樂 樂廚 廚房 公館韓 館韓式 韓式燒 式燒肉 愛樂廚 樂廚房 公館韓式 館韓式燒 韓式燒肉 愛樂廚房 公館韓式燒 館韓式燒肉 公館韓式燒肉 •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching 28 Intent Tagging
  25. Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match:

    Current N (N=1~7) in Yahoo! Life+ title using maximum matching 公館韓式燒肉 公館 + 韓式燒肉 (Location) 29 Intent Tagging
  26. Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match:

    Current N (N=1~7) in Yahoo! Life+ title using maximum matching 公館韓式燒肉 公館 + 韓式燒肉 (Location) 基隆活海鮮餐廳 金山鴨肉 彰化肉圓 萬巒豬腳大王 30 Intent Tagging
  27. Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match:

    Current N (N=1~7) in Yahoo! Life+ title using maximum matching 公館韓式燒肉 公館 + 韓式燒肉 (Location) 基隆活海鮮餐廳 金山鴨肉 彰化肉圓 萬巒豬腳大王 Yahoo Life+ Database query Taiwan Road & Street Database 31 Intent Tagging
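The title-match feature above enumerates character N-grams and segments the query by maximum matching. A minimal sketch of both steps, assuming forward (left-to-right) maximum matching and a toy lexicon that stands in for the Yahoo! Life+ and road/street databases; the function names and lexicon contents are hypothetical:

```python
def char_ngrams(text, max_n=7):
    """All character N-grams of `text` for N = 1..max_n, shortest first."""
    return [
        text[i:i + n]
        for n in range(1, min(max_n, len(text)) + 1)
        for i in range(len(text) - n + 1)
    ]


def max_match(text, lexicon, max_n=7):
    """Forward maximum matching: at each position take the longest lexicon
    entry (up to max_n characters); fall back to a single character."""
    segments = []
    i = 0
    while i < len(text):
        for n in range(min(max_n, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                segments.append((piece, piece in lexicon))
                i += n
                break
    return segments


# Toy lexicon, for illustration only.
lexicon = {"公館", "韓式燒肉", "愛樂廚房"}
segments = max_match("公館韓式燒肉", lexicon)
```

With this toy lexicon, `公館韓式燒肉` splits into `公館` + `韓式燒肉`, as on the slide.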
  28. 公 B-N B-Y B-IM:L 館 I-N I-Y I-IM:L 韓 B-N

    I-Y B-IM:T 式 I-N I-Y I-IM:T 燒 I-N I-Y I-IM:T 肉 I-N I-Y I-IM:T 愛 B-N B-Y B-IH 樂 I-N I-Y I-IH 廚 I-N I-Y I-IH 房 I-N I-Y I-IH 公館韓式燒肉 愛樂廚房 CRF++ 0.58 bi-gram Preprocessing Intent tagging 32 Intent Tagging
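The rows above follow the usual CRF++ layout: one token (here, one character) per line, whitespace-separated feature columns, and the label in the last column. A small sketch of how such rows could be produced from segment-level annotations; the helper names are hypothetical and the segment lists are copied from the slide's example:

```python
def bio(segments):
    """Expand (text, tag) segments into per-character (char, B-/I- tag) pairs."""
    tags = []
    for text, tag in segments:
        for k, ch in enumerate(text):
            tags.append((ch, ("B-" if k == 0 else "I-") + tag))
    return tags


def crf_rows(pos_segs, title_segs, label_segs):
    """Build 'char POS title-match label' rows, one per character."""
    pos, title, label = bio(pos_segs), bio(title_segs), bio(label_segs)
    # All three annotations must cover the same character sequence.
    assert [c for c, _ in pos] == [c for c, _ in title] == [c for c, _ in label]
    return [
        f"{ch} {p} {t} {l}"
        for (ch, p), (_, t), (_, l) in zip(pos, title, label)
    ]


rows = crf_rows(
    pos_segs=[("公館", "N"), ("韓式燒肉", "N"), ("愛樂廚房", "N")],
    title_segs=[("公館韓式燒肉", "Y"), ("愛樂廚房", "Y")],
    label_segs=[("公館", "IM:L"), ("韓式燒肉", "IM:T"), ("愛樂廚房", "IH")],
)
```

Note that `韓` is `B-N` in the POS column but `I-Y` in the title-match column, because the title match covers `公館韓式燒肉` as one span.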
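  29. Collecting Dataset: user search hint generation

    Method 1: Find Target. Template: for a [restaurant name], find [information] subject to a [condition] (店名 找 具備 條件 的 資訊). Method 2: Search from Location. Template: near a [location], find a [restaurant type] subject to a [condition] (地點 找 附近的 餐廳類型 具備 條件).
  30. Collecting Dataset - Method 1: Find Target

    Restaurant name: randomly selected from the top 700 popular restaurants on iPeen (愛評網). Information: address, opening hours, whether it is open on holidays, signature dishes, negative reviews, positive reviews, contact phone number. Condition: plenty of review information, places nearby to continue the evening, convenient transportation, open late, not too noisy, good atmosphere and decor.
  31. Collecting Dataset - Method 2: Search from Location

    Location: randomly selected from the popular areas on iPeen (愛評網). Restaurant type: Chinese cuisine, Japanese cuisine, Cantonese cuisine, Hong Kong-style dim sum, stir-fry, Sichuan cuisine, Hakka cuisine, seafood restaurant, izakaya, Japanese ramen, … Condition: plenty of review information, places nearby to continue the evening, convenient transportation, open late, not too noisy, good atmosphere and decor.
  32. Collecting Dataset: example search hints

    Method 1: Find Target
    •  I am looking for 「老乾杯(延吉店)」 and want to know its "signature dishes"; the condition is "transportation must be convenient"; otherwise I need a backup option.
    •  I am looking for 「築地鮮魚」 and want to know "whether it has negative reviews"; the condition is "places nearby to continue the evening"; otherwise I need a backup option.
    •  I am looking for 「永康街芋頭大王」 and want to know "whether it has positive reviews"; the condition is "find plenty of information"; otherwise I need a backup option.
    Method 2: Search from Location
    •  Find a "sports-themed restaurant" near 「松山機場」 (Songshan Airport); I want to know "whether it is open on Saturdays and Sundays"; if not, I also need other backup options.
    •  Find "kaiseki cuisine" near 「台北新光山越站前」; the condition is "the environment must not be too noisy"; I want to know "whether it is open on Saturdays and Sundays"; if not, I also need other backup options.
    … …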
  33. Collecting Dataset User query log Search hint User query Yahoo!

    Life+ (Yahoo! 生活+) http://tw.ipeen.lifestyle.yahoo.net/
  34. Collecting Dataset User query log Search hint User query user

    40 q1 q2 q3 台北車站 窯烤 pizza 台北車站 BELLINI PASTA PASTA 台北車站 pizza
  35. Collecting Dataset User query log Search hint User query user

    41 q1 q2 q3 台北車站 窯烤 pizza 台北車站 BELLINI PASTA PASTA 台北車站 pizza
  36. Dataset Statistics

    General stats
      Total users              54
      Total user search logs   2008
      Total goals              1275
      Total missions           712

    Queries per unit           Goal      Mission
      Avg                      1.5699    2.8101
      Max                      7         11
      Min                      1         1
  37. Intent Tagging Evaluation

    L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic. CRF++ 0.58 with 10-fold cross validation.

                         L + Y                L + S                Y + S                L + Y + S
                         P     R     F1       P     R     F1       P     R     F1       P     R     F1
    IH (All)             .8811 .8656 .8733    .8780 .8676 .8728    .8011 .7044 .7497    .8874 .8688 .8780
    IM:Location (All)    .9456 .9359 .9408    .9419 .9385 .9402    .7222 .6699 .6951    .9530 .9400 .9465
    IM:Type (All)        .8457 .8660 .8557    .8537 .8531 .8534    .6517 .6714 .6614    .8523 .8660 .8591
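The F1 columns in this evaluation are consistent with the standard harmonic mean of precision and recall. A short check, using two P/R pairs from the table (the function name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ih_l_y = f1_score(0.8811, 0.8656)         # IH, L + Y
iml_l_y_s = f1_score(0.9530, 0.9400)      # IM:Location, L + Y + S
```

Rounded to four decimals these give the reported .8733 and .9465.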
  38. Data Analysis for Wrong Prediction 木柵 (IM:Location) 壽司專賣山中屋壽司酒房 (IH) 新竹

    (IM:Location) 懷石料理竹北 (IM:Location) 中山北路 (IM:Location) 羊肉爐台北 (IM:Location) 高雄 (IM:Location) 左營樂品咖啡館 (IH) 師大 (IM:Location) 熱炒海鮮尊王活海鮮小吃店 (IH) 內湖 (IM:Location) 拉麵六丁目拉麵 (IM:Type) 居酒屋 (IM:Type) 風林火山blog (O) 44 Wrong Combination Wrong Segmentation 德記雲吞 (IH) 茶 (IM:Type) 館 (IH) 絲襪奶茶 (IM:Type)
  39. Intent tagging Statement Transition Statement Transition 45 Goal boundary Mission

    boundary IH IM:L IM:T Is the intent tag changed ? q1 q2
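  40. Statement Transition (Intent tagging → Statement Transition), computed per intent tag for a query pair q1, q2

    $$
    T(t_1, t_2) =
    \begin{cases}
    \text{NONE}   & \text{if } t_1 = \varnothing \wedge t_2 = \varnothing \\
    \text{INSERT} & \text{if } t_1 = \varnothing \wedge t_2 \neq \varnothing \\
    \text{DELETE} & \text{if } t_1 \neq \varnothing \wedge t_2 = \varnothing \\
    \text{SAME}   & \text{if } t_1 \neq \varnothing \wedge t_2 \neq \varnothing \wedge t_1 = t_2 \\
    \text{MODIFY} & \text{if } t_1 \neq \varnothing \wedge t_2 \neq \varnothing \wedge \text{JaccardDistance}(t_1, t_2) \le 0.5 \\
    \text{NEW}    & \text{if } t_1 \neq \varnothing \wedge t_2 \neq \varnothing \wedge \text{JaccardDistance}(t_1, t_2) > 0.5
    \end{cases}
    $$

    where $t_1$ and $t_2$ are the spans tagged with the target intent in q1 and q2.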
  41. IM:T IM:L IM:L IH = DELETE IM:L = SAME IH

    = NEW 台北車站 LUCCA PASTA 台北車站 義大利麵 •  INSERT Target intent only appeared in q2 •  DELETE Target intent has been deleted in q2 •  NEW Target intent has been changed if Jaccard distance>=0.5 •  MODIFY Target intent has been changed if Jaccard distance<0.5 •  SAME Target intent is equal in both q1 and q2 •  NONE Target intent has not been tagged in both q1 and q2 Intent tagging Statement Transition Statement Transition 47 IM:T q1 q2
  42. 公館 麻辣鍋 公館 麻辣鍋 天麻 •  INSERT Target intent only

    appeared in q2 •  DELETE Target intent has been deleted in q2 •  NEW Target intent has been changed if Jaccard distance>=0.5 •  MODIFY Target intent has been changed if Jaccard distance<0.5 •  SAME Target intent is equal in both q1 and q2 •  NONE Target intent has not been tagged in both q1 and q2 IM:T = SAME IM:L = SAME IH = INSERT IM:L IM:L IM:T IM:T Intent tagging Statement Transition Statement Transition 48 IH q1 q2
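The six transition states listed above can be sketched as a small function. This is an illustrative sketch under stated assumptions: Jaccard distance is taken over the character sets of the two tagged spans (the slides do not say which unit is used), and the 0.5 boundary follows the bullets (distance >= 0.5 gives NEW), whereas the formula on the Statement Transition slide places exactly 0.5 on the MODIFY side. All names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over character sets (an assumption)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def transition(t1, t2):
    """State transition of one intent tag between q1 (t1) and q2 (t2).

    t1 / t2 are the tagged spans, or None/"" when the tag is absent.
    """
    if not t1 and not t2:
        return "NONE"
    if not t1:
        return "INSERT"
    if not t2:
        return "DELETE"
    if t1 == t2:
        return "SAME"
    return "NEW" if jaccard_distance(t1, t2) >= 0.5 else "MODIFY"


def transitions(tags1, tags2, kinds=("IH", "IM:L", "IM:T")):
    """One transition per intent tag for a query pair."""
    return {k: transition(tags1.get(k), tags2.get(k)) for k in kinds}


# q1 = 公館 麻辣鍋, q2 = 公館 麻辣鍋 天麻
states = transitions(
    {"IM:L": "公館", "IM:T": "麻辣鍋"},
    {"IM:L": "公館", "IM:T": "麻辣鍋", "IH": "天麻"},
)
```

For this pair the sketch yields IH = INSERT, IM:L = SAME and IM:T = SAME, matching the slide.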
  43. Temporal Features Word and Character Edit Features Query Log Sequence

    Features Web Search Features •  inter-query time •  time diff •  sequential-queries •  lev •  word_pov •  word_suf •  nsubst_X_q1 •  nsubst_X_q2 •  nsubst_q2_X •  p_change •  pq12 •  entropy_X_q1 •  entropy_q1_X •  prisma •  commonw •  wordr Baseline 49 Preprocessing Intent tagging baseline
  44. Temporal Features q1 q2 q2 q3 q3 q4 •  inter-query

    time = threshold as a binary feature (5 mins, 30 mins, 60 mins, 120 mins) •  time diff = inter-query time in seconds •  sequential-queries = binary feature which is positive if the queries are sequential in time, with no intervening queries from the same user 50 Preprocessing Intent tagging baseline
  45. •  inter-query time = threshold as a binary feature (5

    mins, 30 mins, 60 mins, 120 mins) •  time diff = inter-query time in seconds •  sequential-queries = binary feature which is positive if the queries are sequential in time, with no intervening queries from the same user Temporal Features 築地鮮魚 台北 築地鮮魚 台北 blog 築地鮮魚 台北 blog 大大茶樓 南京 大大茶樓 南京 築地鮮魚 台北 blog 51 Preprocessing Intent tagging baseline
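A minimal sketch of the three temporal features for one query pair. The slides give the thresholds but not the direction of the comparison, so "gap exceeds the threshold" is an assumption here, as are the feature key names:

```python
from datetime import datetime

THRESHOLDS_MIN = (5, 30, 60, 120)


def temporal_features(t1, t2, sequential):
    """Temporal features for a query pair issued at t1 and t2.

    `sequential` is True when no other query from the same user
    falls between the two queries.
    """
    diff = (t2 - t1).total_seconds()
    feats = {"time_diff": diff, "sequential_queries": int(sequential)}
    for minutes in THRESHOLDS_MIN:
        # Assumption: the binary feature fires when the gap exceeds the threshold.
        feats[f"inter_query_time_{minutes}m"] = int(diff > minutes * 60)
    return feats


feats = temporal_features(
    datetime(2013, 6, 26, 12, 0, 0),
    datetime(2013, 6, 26, 12, 10, 0),
    sequential=True,
)
```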
  46. Word and Character Edit Features •  lev = normalized Levenshtein

    distance •  word_pov = num. characters in common starting from the left •  word_suf = num. characters in common starting from the right •  commonw = num. words in common •  wordr = jaccard distance between sets of words q1 q2 52 Preprocessing Intent tagging baseline
  47. Word and Character Edit Features 新竹 冰淇淋餐廳 新竹 冰淇淋餐廳 莫凡彼

    •  lev = 3 •  word_pov = 7 •  word_suf = 0 •  commonw = 7 •  wordr = 0.7 53 Preprocessing Intent tagging baseline
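The example values above can be reproduced with the sketch below. It rests on an observation rather than on the authors' code: the slide's numbers match when whitespace is removed, characters are used as the unit for every feature, `lev` is the raw (unnormalized) edit distance, and `wordr` is the Jaccard ratio |A ∩ B| / |A ∪ B|. The helper names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def edit_features(q1, q2):
    a, b = q1.replace(" ", ""), q2.replace(" ", "")
    sa, sb = set(a), set(b)
    union = sa | sb
    return {
        "lev": levenshtein(a, b),
        "word_pov": common_prefix_len(a, b),
        "word_suf": common_prefix_len(a[::-1], b[::-1]),
        "commonw": len(sa & sb),
        "wordr": len(sa & sb) / len(union) if union else 0.0,
    }


feats = edit_features("新竹 冰淇淋餐廳", "新竹 冰淇淋餐廳 莫凡彼")
```

A normalized variant would divide `lev` by the longer string's length, and a Jaccard distance would be `1 - wordr`.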
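  48. Query Log Sequence Features (q1, q2: the query pair; qi, X: other queries in the log)

    •  pq12: based on $p(q_1 \to q_2)$ (the full expression is not recoverable from the extracted text)
    •  entropy_X_q1 = $\sum_i p(q_i \mid q_1) \log_2 p(q_i \mid q_1)$
    •  entropy_q1_X = $\sum_i p(q_1 \mid q_i) \log_2 p(q_1 \mid q_i)$
    •  nsubst_X_q1 = $\mathrm{count}(X : \exists\, p(X \to q_1))$
    •  nsubst_X_q2 = $\mathrm{count}(X : \exists\, p(X \to q_2))$
    •  nsubst_q2_X = $\mathrm{count}(X : \exists\, p(q_2 \to X))$
    •  p_change = $p(q_1 \to q_i : q_i \neq q_1)$

    Preprocessing Intent tagging baseline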
  49. Web Search Features •  Prisma = cosine distance between vectors derived from the

    first 50 search results for the query terms. Yahoo! BOSS Search API, http://developer.yahoo.com/boss/search/ Preprocessing Intent tagging baseline
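A minimal sketch of the cosine-distance step of this feature. How the vectors are derived from the search results is not specified on the slide, so plain term-frequency vectors are assumed here, and the inputs are hypothetical term lists rather than real API output:

```python
import math
from collections import Counter


def cosine_distance(terms_a, terms_b):
    """1 - cosine similarity between term-frequency vectors.

    terms_a / terms_b: lists of terms collected from the top search
    results of each query (assumed already fetched and tokenized).
    """
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty result set as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


same = cosine_distance(["ramen", "taipei"], ["ramen", "taipei"])
disjoint = cosine_distance(["ramen"], ["pizza"])
```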
  50. Goal / Mission Boundary Detection B I B B I

    I Sequence Based q1 q2 q3 q4 q5 q6 Token Based
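The B/I labels above mark where a new goal (or mission) begins in a user's query sequence. A small sketch of how such labels could be derived from annotated goal ids, assuming each goal's queries are contiguous; the function name and the example ids are hypothetical:

```python
def boundary_labels(goal_ids):
    """Label each query 'B' if it starts a new goal, else 'I'.

    Assumption: queries of one goal are contiguous in the sequence.
    """
    return [
        "B" if i == 0 or goal != goal_ids[i - 1] else "I"
        for i, goal in enumerate(goal_ids)
    ]


# q1..q6 annotated with goal ids 1, 1, 2, 3, 3, 3
labels = boundary_labels([1, 1, 2, 3, 3, 3])
```

This reproduces the B I B B I I pattern shown for q1 to q6.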
  51. Token Based – Goal Boundary Detection

    LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

    Goal boundary                   P        R        F1
    Baseline [1]                    0.6601   0.5739   0.6140
    Baseline *[2]                   0.7608   0.7869   0.7736
    Our *[3]                        0.7680   0.8247   0.7954
    Fselect (Baseline+Our) *[4]     0.7792   0.8247   0.8013

    [1] commonw, prisma, time
    [2] lev, prisma, wordr, word_suf, word_pov, n_subst_X_q1
    [3] crf_ih, crf_iml, crf_imt
    [4] lev, wordr, word_pov, commonw, prisma, crf_ih_state, crf_imt_state, crf_iml_state, pq12
  52. Token Based – Mission Boundary Detection

    LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

    Mission boundary                P        R        F1
    Baseline [1]                    0.9172   0.9375   0.9272
    Baseline *[2]                   0.9313   0.9250   0.9281
    Our *[3]                        0.9201   0.9298   0.9249
    Fselect (Baseline+Our) *[4]     0.9282   0.9692   0.9483

    [1] commonw, prisma, time
    [2] lev, wordr, peos_q1, prisma, time_diff, peos_q2, n_subst_X_q2
    [3] crf_ih, crf_iml, crf_imt
    [4] wordr, commonw, lev, inter_query_time, word_pov, prisma, crf_ih_state, crf_iml_state, crf_imt_state, word_suf
  53. Sequence Based – Goal Boundary Detection

    LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

    Goal boundary                   P        R        F1
    Baseline [1]                    0.2964   0.2573   0.2755
    Baseline *[2]                   0.5150   0.5317   0.5232
    Our *[3]                        0.5399   0.5798   0.5591
    Fselect (Baseline+Our) *[4]     0.5397   0.5712   0.5550

    [1] commonw, prisma, time
    [2] lev, prisma, wordr, word_suf, word_pov, n_subst_X_q1
    [3] crf_ih, crf_iml, crf_imt
    [4] lev, wordr, word_pov, commonw, prisma, crf_ih_state, crf_imt_state, crf_iml_state, pq12
  54. Sequence Based – Mission Boundary Detection

    LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

    Mission boundary                P        R        F1
    Baseline [1]                    0.8194   0.8375   0.8283
    Baseline *[2]                   0.8219   0.8163   0.8191
    Our *[3]                        0.8030   0.8115   0.8073
    Fselect (Baseline+Our) *[4]     0.8462   0.8837   0.8645

    [1] commonw, prisma, time
    [2] lev, wordr, peos_q1, prisma, time_diff, peos_q2, n_subst_X_q2
    [3] crf_ih, crf_iml, crf_imt
    [4] wordr, commonw, lev, inter_query_time, word_pov, prisma, crf_ih_state, crf_iml_state, crf_imt_state, word_suf
  55. Token-based and Sequence-based results side by side

    LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

    Token-based                     Goal boundary               Mission boundary
                                    P        R        F1        P        R        F1
    Baseline [1]                    0.6601   0.5739   0.6140    0.9172   0.9375   0.9272
    Baseline *[2]                   0.7608   0.7869   0.7736    0.9313   0.9250   0.9281
    Our *[3]                        0.7680   0.8247   0.7954    0.9201   0.9298   0.9249
    Fselect (Baseline+Our) *[4]     0.7792   0.8247   0.8013    0.9282   0.9692   0.9483

    Sequence-based                  Goal boundary               Mission boundary
                                    P        R        F1        P        R        F1
    Baseline [1]                    0.2964   0.2573   0.2755    0.8194   0.8375   0.8283
    Baseline *[2]                   0.5150   0.5317   0.5232    0.8219   0.8163   0.8191
    Our *[3]                        0.5399   0.5798   0.5591    0.8030   0.8115   0.8073
    Fselect (Baseline+Our) *[4]     0.5397   0.5712   0.5550    0.8462   0.8837   0.8645
  56. Data Analysis q1 q2 Type1: Add Location The Diner 樂子美式餐廳

    (IH) The Diner 樂子美式餐廳 (IH) 科技大樓 (IM:Location) Same goal Informational Informational
  57. Data Analysis q1 q2 Type2: Add Intent Head 中山北路 (IM:Location)

    早午餐 (IM:Type) 中山北路 (IM:Location) 早午餐 (IM:Type) HANA BI (IH) Goal boundary Navigational Informational
  58. Data Analysis q2 q3 Type3: Add descriptions ( IH in

    q1 ) 洛城牛肉河粉 (IH) blog (O) 洛城牛肉河粉 (IH) blog (O) 推薦菜 (O) q1 洛城牛肉河粉 (IH) Same goal Informational Informational Informational
  59. Data Analysis q2 q3 Type4: Add descriptions ( without IH

    in q1 ) 永康街 (IM:Location) 懷舊料理 (IM:Type) 豐盛食堂 (IH) 永康街 (IM:Location) 懷舊料理 (IM:Type) 豐盛食堂 (IH) blog (O) Goal boundary q1 永康街 (IM:Location) 懷舊料理 (IM:Type) Navigational Informational Informational
  60. Conclusion A.  Intent tagging model •  Achieves good evaluation results

    in predicting vertical search query intent, and lets the search engine know what users want. B.  Using intent tagging to predict goal/mission boundaries •  It not only improves accuracy by 3% to 5% for both goal and mission boundary detection, but also tags each search query with IH, IM:Type, IM:Location or Other in the vertical search domain.
  61. Future Work A.  Intent tagging model •  Because manual tagging

    is expensive, we would like to generate the answers automatically to reduce the cost. B.  Using intent tagging to predict goal/mission boundaries •  Apply our method to other, similar vertical search domains, for example automobiles and movies.
  62. References

    •  Rosie Jones and Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008.
    •  Xiao Li. Understanding the Semantic Structure of Noun Phrase Queries. ACL 2010.
    •  Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan. Modeling and Analysis of Cross-Session Search Tasks. SIGIR 2011.
  63. Intent Tagging Evaluation, single feature types (Backup slides)

    CRF++ 0.58 with 10-fold cross validation

                           L (Lexical)           Y (Yahoo! Life+ Title Match)   S (Syntactic)
                           P      R      F1      P      R      F1               P      R      F1
    IH           Left      .93780 .9502  .9440   .8822  .7940  .8358            .9121  .8594  .8850
                 Right     .8869  .8482  .8671   .6622  .5923  .6253            .7956  .6626  .7230
                 All       .8698  .8616  .8657   .6337  .5884  .6102            .7733  .6741  .7203
    IM:Location  Left      .9771  .9765  .9768   .9225  .8735  .8973            .8929  .8702  .8814
                 Right     .9543  .9424  .9483   .8117  .6941  .7483            .7940  .6931  .7402
                 All       .9414  .9426  .9420   .7682  .6960  .7303            .7399  .6934  .7159
    IM:Type      Left      .9519  .9470  .9495   .7325  .8804  .7997            .8614  .8186  .8394
                 Right     .8923  .8390  .8648   .5626  .7005  .6240            .6784  .6303  .6534
                 All       .8584  .8437  .8510   .4694  .7000  .5620            .6117  .6290  .6202
  64. Intent Tagging Evaluation (Backup slides)

    L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic. CRF++ 0.58 with 10-fold cross validation.

                   Acc      F1
    L + Y          0.8985   0.8910
    L + S          0.8972   0.8826
    Y + S          0.7827   0.7471
    L + Y + S      0.9038   0.8934
  65. Intent Tagging Analysis 中壢 (IM:Location) 赤坂拉麵 (IH) 燒鳥串燒 (IH) 台北

    (IM:Location) 師大 (IM:Location) 中式早餐 (IM:Type) 台北市忠孝東路四段 (IM:Location) 咖啡 (IM:Type) 台大 (IM:Location) 墨西哥菜 (IM:Type) 國父紀念館 (IM:Location) 中菜吃到飽 (IM:Type) 忠孝新生 (IM:Location) 懷舊餐廳 (IM:Type) 科技大樓 (IM:Location) 韓式燒肉 (IM:Type) 台大醫院 (IM:Location) 藝奇新日式料理 (IH) 台北101 (IM:Location) 石鍋拌飯 (IM:Type) query 72 Backup slides
  66. Goal Boundary Features Weight (Backup slides)

    Goal boundary feature weight using fselect in libsvm

    Feature            Set                        Weight
    lev                Word and Character Edit    0.5103
    wordr              Word and Character Edit    0.4660
    word_pov           Word and Character Edit    0.4185
    commonw            Word and Character Edit    0.3287
    Prisma             Web Search                 0.1980
    crf_ih_state       Intent Tagging             0.1593
    crf_imt_state      Intent Tagging             0.1302
    crf_iml_state      Temporal                   0.0835
    crf_iml_state      Intent Tagging             0.0511
    pq12               Query Log Sequence         0.0279
  67. Mission Boundary Features Weight (Backup slides)

    Mission boundary feature weight using fselect in libsvm

    Feature            Set                        Weight
    wordr              Word and Character Edit    0.8284
    commonw            Word and Character Edit    0.7014
    lev                Word and Character Edit    0.6206
    inter_query_time   Temporal                   0.4788
    word_pov           Word and Character Edit    0.4109
    Prisma             Web Search                 0.1164
    crf_iml_state      Intent Tagging             0.0835
    word_suf           Word and Character Edit    0.0348
    crf_ih_state       Intent Tagging             0.0291
    entropy_q1_X       Query Log Sequence         0.0183
  68. Joint Modeling mission mission mission goal goal goal goal goal

    Mission boundary Goal boundary Predict Backup slides
  69. Joint Modeling mission mission mission goal goal goal goal Mission

    boundary Goal boundary Predict X •  Although mission boundary detection scores higher than goal boundary detection, joint modeling does not improve the result. goal Backup slides
  70. Goal Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout

    Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008 •  A goal can be thought of as a group of related queries to accomplish a single discrete task. •  The queries need not be contiguous, but may be interleaved with queries from other goals A search goal is an atomic information need, resulting in one or more queries. Problem Definition Backup slides
  71. •  A mission is then an extended information need A

    search mission is a related set of information needs, resulting in one or more goals. Problem Definition Mission Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008 Backup slides
  72. IM:Category IM:City IM:County IM:Employer IM:Level IM:Salary IM:State IM:Type Intent head

    Intent modifier Related Work Intent Tagging Xiao Li Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 query Backup slides
  73. Transition: Transiting from state a to b Lexical: Current word

    Semantic: Current N-gram occurs in lexicon L (N=1~4) Syntactic: POS tag of the current word Related Work Intent Tagging Xiao Li Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 query Backup slides
  74. Related Work Mentioning Goal and Mission •  Debora Donato

    et al. Do you want to take notes? Identifying research missions in Yahoo! Search Pad. WWW 2010. •  Claudio Lucchese, Salvatore Orlando et al. Identifying Task-based Sessions in Search Engine Query Logs. WSDM 2011. Backup slides
  75. Conditional Random Fields Feature 1 Feature 2 Feature 3 …

    Feature 1 Feature 2 Feature 3 … Feature 1 Feature 2 Feature 3 … Relation Label 1 Relation Label 2 Relation Label 3 Relation Label N Feature 1 Feature 2 Feature 3 … Backup slides
  76. Conditional Random Fields Feature 1 Feature 2 Feature 3 …

    Feature 1 Feature 2 Feature 3 … Feature 1 Feature 2 Feature 3 … Relation Label 1 Relation Label 2 Relation Label 3 Relation Label N Feature 1 Feature 2 Feature 3 … Backup slides
  77. Baseline CRF Format F1 F2 F3 F4 F5 … F1

    F2 F3 F4 F5 … F1 F2 F3 F4 F5 … F1 F2 F3 F4 F5 … ( q1 , q2 ) F1 F2 F3 F4 F5 … User 1 User 2 … Label Features O B I O B ( q2 , q3 ) ( q3 , q4 ) ( q4 , q5 ) ( q5 , q6 ) SPACE Backup slides
  78. answer predict IH IM:Type IH IM:Type Jin-Dong KIM, Tomoko OHTA

    et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. Intent Tagging Evaluation 85 Backup slides
  79. IH IM:Type IH IM:Type answer predict Left boundary Left boundary

    Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. Intent Tagging Evaluation 86 Backup slides
  80. Intent Tagging Evaluation IH IM:Type IH IM:Type answer predict IM:Type

    Right boundary Right boundary Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. 87 Backup slides
  81. IH IM:Type IH IM:Type Intent Tagging Evaluation answer predict Complete

    match Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. 88 Backup slides
  82. Intent Tagging Evaluation, feature combinations (Backup slides)

    L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic. CRF++ 0.58 with 10-fold cross validation.

                           L + Y                L + S                Y + S                L + Y + S
                           P     R     F1       P     R     F1       P     R     F1       P     R     F1
    IH           Left      .9540 .9521 .9530    .9510 .9397 .9453    .9171 .8591 .8872    .9482 .9445 .9464
                 Right     .8930 .8522 .8721    .8861 .8710 .8785    .8221 .6960 .7538    .8972 .8667 .8817
                 All       .8811 .8656 .8733    .8780 .8676 .8728    .8011 .7044 .7497    .8874 .8688 .8780
    IM:Location  Left      .9785 .9750 .9768    .9759 .9741 .9750    .9472 .9415 .9443    .9808 .9781 .9795
                 Right     .9568 .9355 .9460    .9560 .9398 .9478    .8602 .8320 .8458    .9650 .9393 .9520
                 All       .9456 .9359 .9408    .9419 .9385 .9402    .7222 .6699 .6951    .9530 .9400 .9465
    IM:Type      Left      .9481 .9543 .9512    .9499 .9465 .9482    .8639 .8441 .8539    .9503 .9525 .9514
                 Right     .8876 .8598 .8735    .8891 .8477 .8679    .7221 .6699 .6951    .8797 .8595 .8695
                 All       .8457 .8660 .8557    .8537 .8531 .8534    .6517 .6714 .6614    .8523 .8660 .8591
  83. Conditional Random Fields •  Conditional Random Fields (CRFs) are

    a class of statistical modeling methods often applied in pattern recognition and machine learning, where they are used for structured prediction. CRF++, http://crfpp.googlecode.com/svn/trunk/doc/index.html Backup slides
  84. Goal / Mission Boundary Detection: Real Case (Backup slides)

    Query 1 | Query 2 | IH Real / Predict | IM:Location Real / Predict | IM:Type Real / Predict
    燒鳥 台北 | 燒鳥串燒 台北 | MODIFY / MODIFY | SAME / SAME | NONE / DELETE
    燒鳥串燒 台北 | 師大 早餐 | DELETE / DELETE | NEW / NEW | INSERT / INSERT
    師大 早餐 | 師大 中式早餐 | NONE / NONE | SAME / SAME | MODIFY / MODIFY
    中壢 拉麵 | 中壢 赤坂拉麵 | INSERT / INSERT | SAME / SAME | DELETE / DELETE
    中壢 赤坂拉麵 | 中壢 赤坂拉麵 時間 | SAME / SAME | SAME / SAME | DELETE / NEW
    中壢 拉麵 推薦 | 中壢 伊太郎 | INSERT / INSERT | SAME / SAME | DELETE / NEW
    中壢 伊太郎 | 中壢 伊太郎 推薦 | SAME / SAME | SAME / SAME | NONE / DELETE
    中壢 伊太郎 推薦 | 風車 故鄉餐廳 時間 | NEW / NEW | DELETE / NEW | NONE / INSERT
    內湖 水鳥22 法式小館 | 桃園 川菜 | DELETE / DELETE | NEW / NEW | INSERT / NEW
    桃園 川菜 | 桃園 福利川菜 | INSERT / NONE | SAME / SAME | DELETE / MODIFY
  85. Intent Tagging Confusion Matrix (Backup slides)

    Using the L + Y + S feature combination. Axes: predicted class, actual class.

                       IH            IM:Location    IM:Type        O
                       B      I      B      I       B      I       O
    IH           B     871    41     56     1       57     8       38
                 I     40     3942   52     316     1      101     102
    IM:Location  B     63     33     746    1       9      8       15
                 I     1      249    8      1689    0      14      31
    IM:Type      B     55     5      4      0       1309   16      7
                 I     0      117    0      6       11     2332    18
    O            O     31     161    31     45      21     41      4240
  86. 2. Given a multi-query task for a user, predict whether

    the user will return to this task in the future 1.  Given a user query, identify all related queries from previous sessions that the user has issued Task Continuation Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan. Modeling and Analysis of Cross-Session Search Tasks. SIGIR 2011 93 Same Task Related Research Related Work Backup slides