Slide 1

Slide 1 text

Exploiting Intent Tagging Information to Improve Vertical Search Query Intent Segmentation

Slide 2

Slide 2 text

Overview
• Problem Definition
• Related Work
• System Overview
• Experiment
• Error Analysis
• Conclusion

Slide 3

Slide 3 text

Problem Definition
q1: 築地鮮魚 台北 (築地鮮魚, Taipei)
q2: 築地鮮魚 台北 日本料理 (築地鮮魚, Taipei, Japanese cuisine)
q3: 國父紀念館 日本料理 (Sun Yat-sen Memorial Hall, Japanese cuisine)

Slide 4

Slide 4 text

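Problem Definition
How can we describe whether the user's intent has changed between queries?
q1: 築地鮮魚 台北 (築地鮮魚, Taipei)
q2: 築地鮮魚 台北 日本料理 (築地鮮魚, Taipei, Japanese cuisine)
q3: 國父紀念館 日本料理 (Sun Yat-sen Memorial Hall, Japanese cuisine)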

Slide 5

Slide 5 text

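Related Work: Search Intent Classification
Daniel E. Rose and Danny Levinson. Understanding User Goals in Web Search. WWW 2004.
Query examples by intent class:
• Informational: 國父紀念館 日本料理 (Sun Yat-sen Memorial Hall, Japanese cuisine)
• Navigational: 築地鮮魚 台北 (築地鮮魚, Taipei)
• Transactional: 築地鮮魚線上定位 (築地鮮魚 online reservation)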

Slide 6

Slide 6 text

Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 Related Work Identifying Goals and Missions session 6

Slide 7

Slide 7 text

Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 A search mission is a related set of information needs, resulting in one or more goals. session mission mission mission 7

Slide 8

Slide 8 text

Related Work: Identifying Goals and Missions
Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008.
Hierarchy: a session contains missions, and each mission contains goals.
• A search mission is a related set of information needs, resulting in one or more goals.
• A search goal is an atomic information need, resulting in one or more queries.

Slide 9

Slide 9 text

q1 q2 q3 築地鮮魚 台北 築地鮮魚 台北 日本料理 國父紀念館 日本料理 Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 9

Slide 10

Slide 10 text

Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 goal q1 q2 q3 築地鮮魚 台北 築地鮮魚 台北 日本料理 國父紀念館 日本料理 10

Slide 11

Slide 11 text

Related Work Identifying Goals and Missions Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs CIKM 2008 mission goal q1 q2 q3 築地鮮魚 台北 築地鮮魚 台北 日本料理 國父紀念館 日本料理 11

Slide 12

Slide 12 text

Our Proposal
We propose a new feature set to improve goal and mission boundary detection in the vertical search domain.
(Figure: a sequence of queries q.)

Slide 13

Slide 13 text

Our Proposal
We propose a new feature set to improve goal and mission boundary detection in the vertical search domain.
(Figure: a sequence of queries q grouped into goals, and goals grouped into missions.)

Slide 14

Slide 14 text

Reasons for Goal/Mission Hierarchical Structure
• Evaluating the importance of websites to users
• Evaluating the accuracy of a search engine

Slide 15

Slide 15 text

Reasons for Goal/Mission Hierarchical Structure
• Evaluating the importance of websites to users
Example: Mission 1 consists of Goal 1, Goal 2, and Goal 3, served by sites A, A, B, A. Site A is more important than site B.

Slide 16

Slide 16 text

Reasons for Goal/Mission Hierarchical Structure Goal 1 Goal 2 Goal 3 Mission 1 Goal 4 Goal 1 Goal 2 Mission 1 !   Evaluation the accuracy of search engine Engine A Engine B < 16

Slide 17

Slide 17 text

Intent Tagging System Overview Query log Preprocessing Statement Transition 17

Slide 18

Slide 18 text

System Overview Baseline Rosie Jones, Kristina Lisa Klinkner.Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008 18 Intent Tagging Query log Preprocessing Statement Transition

Slide 19

Slide 19 text

Preprocessing Query A Query B Query A Query A Query B Query A Query B Remove duplicate queries Preprocessing Intent tagging 19

Slide 20

Slide 20 text

… Query log Query pairs Preprocessing Query 1 Query 2 Query 3 Query 4 Preprocessing Intent tagging 20

Slide 21

Slide 21 text

… Query log Query pairs Preprocessing Query 1 Query 2 Query 3 Query 4 Query 1 Query 2 Query 2 Query 3 Query 3 Query 4 Query 4 Query 5 Preprocessing Intent tagging 21
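The preprocessing steps above (remove duplicate queries, then form pairs of adjacent queries) can be sketched as follows. This is a minimal illustration, assuming that "duplicate" means an immediately repeated query within one user's log; the function names are hypothetical and not taken from the slides.

```python
from itertools import groupby


def dedupe_consecutive(queries):
    """Collapse runs of identical consecutive queries into a single query.

    Assumption: only immediate repeats count as duplicates.
    """
    return [query for query, _ in groupby(queries)]


def query_pairs(queries):
    """Pair each query with its successor: (q1, q2), (q2, q3), ..."""
    return list(zip(queries, queries[1:]))


log = ["Query A", "Query A", "Query B", "Query B", "Query C"]
cleaned = dedupe_consecutive(log)
pairs = query_pairs(cleaned)
```

Each resulting pair is the unit on which the later features and boundary labels are computed.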

Slide 22

Slide 22 text

Intent Head (IH) Intent Modifier (IM) 公館韓式燒肉 愛樂廚房 Xiao Li Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 Preprocessing Intent tagging 22 •  IH •  IM:Location •  IM:Type Intent Tagging

Slide 23

Slide 23 text

Intent Head (IH) Intent Modifier (IM) 公館韓式燒肉 愛樂廚房 Xiao Li Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 Preprocessing Intent tagging •  IH = 愛樂廚房 •  IM:Location = 公館 •  IM:Type = 韓式燒肉 23 Intent Tagging

Slide 24

Slide 24 text

Intent Tagging: features
• Lexical: the current word
• Syntactic: POS tag of the current word, obtained with CKIP (中文斷詞系統, the Chinese Word Segmentation System)
• Yahoo! Life+ Title Match: whether the current character N-gram (N = 1 to 7) appears in a Yahoo! Life+ title, using maximum matching

Slide 25

Slide 25 text

Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Lexical 公 館 韓 式 燒 肉 愛 樂 廚 房 •  Syntactic 公館 (N) 韓式燒肉 (N) 愛樂廚房 (N) •  Yahoo! Life+ Title Match 公館韓式燒肉 (Y) 愛樂廚房 (Y) 25 •  Lexical: Current word •  Syntactic: POS tag of the current word using CKIP (中文斷詞系統) •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching Intent Tagging

Slide 26

Slide 26 text

Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching 26 Intent Tagging

Slide 27

Slide 27 text

Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 公 館 韓 式 燒 肉 愛 樂 廚 房 公館 館韓 韓式 式燒 燒肉 愛樂 樂廚 廚房 公館韓 館韓式 韓式燒 式燒肉 愛樂廚 樂廚房 公館韓式 館韓式燒 韓式燒肉 愛樂廚房 公館韓式燒 館韓式燒肉 公館韓式燒肉 •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching 27 Intent Tagging
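The N-gram enumeration shown above can be sketched in a few lines. This is an illustration only: `titles` stands in for the Yahoo! Life+ title list (a hypothetical in-memory set here), and the matcher simply returns the longest N-gram found in that set, which is a simplified reading of "maximum matching".

```python
def char_ngrams(segment, max_n=7):
    """All character N-grams of `segment` for N = 1..max_n, shortest first."""
    grams = []
    for n in range(1, min(max_n, len(segment)) + 1):
        for start in range(len(segment) - n + 1):
            grams.append(segment[start:start + n])
    return grams


def longest_title_match(segment, titles, max_n=7):
    """Return the longest N-gram of `segment` present in `titles`, or None."""
    for n in range(min(max_n, len(segment)), 0, -1):
        for start in range(len(segment) - n + 1):
            gram = segment[start:start + n]
            if gram in titles:
                return gram
    return None


titles = {"愛樂廚房"}  # hypothetical stand-in for the title database
grams = char_ngrams("公館韓式燒肉")
match = longest_title_match("愛樂廚房", titles)
```

For the six-character segment 公館韓式燒肉 this yields 6 + 5 + 4 + 3 + 2 + 1 = 21 N-grams, the same set listed on the slide.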

Slide 29

Slide 29 text

Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching 公館韓式燒肉 公館 + 韓式燒肉 (Location) 29 Intent Tagging

Slide 30

Slide 30 text

Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching 公館韓式燒肉 公館 + 韓式燒肉 (Location) 基隆活海鮮餐廳 金山鴨肉 彰化肉圓 萬巒豬腳大王 30 Intent Tagging

Slide 31

Slide 31 text

Preprocessing Intent tagging 公館韓式燒肉 愛樂廚房 •  Yahoo! Life+ Title Match: Current N (N=1~7) in Yahoo! Life+ title using maximum matching 公館韓式燒肉 公館 + 韓式燒肉 (Location) 基隆活海鮮餐廳 金山鴨肉 彰化肉圓 萬巒豬腳大王 Yahoo Life+ Database query Taiwan Road & Street Database 31 Intent Tagging

Slide 32

Slide 32 text

Intent Tagging: CRF input for the query 公館韓式燒肉 愛樂廚房
Tool: CRF++ 0.58, bi-gram template
Columns: character, POS tag (syntactic), Yahoo! Life+ title match, intent label

公  B-N  B-Y  B-IM:L
館  I-N  I-Y  I-IM:L
韓  B-N  I-Y  B-IM:T
式  I-N  I-Y  I-IM:T
燒  I-N  I-Y  I-IM:T
肉  I-N  I-Y  I-IM:T
愛  B-N  B-Y  B-IH
樂  I-N  I-Y  I-IH
廚  I-N  I-Y  I-IH
房  I-N  I-Y  I-IH
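The character-level rows above can be generated from span-level annotations. The sketch below is an assumption about how such rows might be produced, not the authors' code: each feature column gets its own list of (text, label) spans, every span is expanded to B-/I- labels per character, and the columns are zipped together. CRF++ reads whitespace-separated columns with the answer label last.

```python
def bio(spans):
    """Expand (text, label) spans into per-character (char, B-/I- label) pairs."""
    expanded = []
    for text, label in spans:
        for index, char in enumerate(text):
            prefix = "B-" if index == 0 else "I-"
            expanded.append((char, prefix + label))
    return expanded


# Each column has its own segmentation of the same character sequence.
pos = bio([("公館", "N"), ("韓式燒肉", "N"), ("愛樂廚房", "N")])
title = bio([("公館韓式燒肉", "Y"), ("愛樂廚房", "Y")])
intent = bio([("公館", "IM:L"), ("韓式燒肉", "IM:T"), ("愛樂廚房", "IH")])

rows = [(char, p, y, t) for (char, p), (_, y), (_, t) in zip(pos, title, intent)]
lines = [" ".join(row) for row in rows]  # one training line per character
```

Keeping the columns independent is what allows 韓 to be B-N in the POS column while still being I-Y in the title-match column.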

Slide 33

Slide 33 text

Collecting Dataset
Our vertical domain: Yahoo! 生活+ (Yahoo! Life+), http://tw.ipeen.lifestyle.yahoo.net/

Slide 34

Slide 34 text

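Collecting Dataset: user search hint generation
Method 1: Find Target
Template: for a [store name], find [information] that meets a [condition].
Method 2: Search from Location
Template: near a [location], find a [restaurant type] that meets a [condition].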

Slide 35

Slide 35 text

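Collecting Dataset - Method 1: Find Target
Template: for a [store name], find [information] that meets a [condition]. Each slot is filled by random selection.
• Store name: one of the top 700 popular stores on 愛評網 (iPeen)
• Information: address, opening hours, whether it is open on holidays, signature dishes, negative reviews, positive reviews, contact phone number
• Condition: plenty of review information, places nearby to continue the evening, convenient transportation, open late, not too noisy, good atmosphere and decor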

Slide 36

Slide 36 text

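Collecting Dataset - Method 2: Search from Location
Template: near a [location], find a [restaurant type] that meets a [condition]. Each slot is filled by random selection.
• Location: one of the popular areas on 愛評網 (iPeen)
• Restaurant type: Chinese cuisine, Japanese cuisine, Cantonese cuisine, Hong Kong-style dim sum, stir-fry, Sichuan cuisine, Hakka cuisine, seafood restaurant, izakaya, Japanese ramen, …
• Condition: plenty of review information, places nearby to continue the evening, convenient transportation, open late, not too noisy, good atmosphere and decor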

Slide 37

Slide 37 text

Collecting Dataset Method 1: Find Target Method 2: Search from Location * 5000 * 5000 37

Slide 38

Slide 38 text

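Collecting Dataset: generated search hints
Method 1: Find Target
• I want to find "老乾杯 (延吉店)". I want to know its "signature dishes". The condition is "transportation must be convenient"; otherwise I need a backup option.
• I want to find "築地鮮魚". I want to know "whether it has negative reviews". The condition is "there are places nearby to continue the evening"; otherwise I need a backup option.
• I want to find "永康街芋頭大王". I want to know "whether it has positive reviews". The condition is "more information should be found"; otherwise I need a backup option.
…
Method 2: Search from Location
• Find a "sports-themed restaurant" near "松山機場" (Songshan Airport). I want to know "whether it is open on Saturdays and Sundays"; if not, I also need other backup options.
• Find "kaiseki cuisine" near "台北新光山越站前". The condition is "the environment should not be too noisy". I want to know "whether it is open on Saturdays and Sundays"; if not, I also need other backup options.
…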

Slide 39

Slide 39 text

Collecting Dataset User query log Search hint User query Yahoo! 生活+ http://tw.ipeen.lifestyle.yahoo.net/ 39

Slide 40

Slide 40 text

Collecting Dataset
Flow: the user reads a search hint, issues queries, and the queries are recorded in the user query log.
q1: 台北車站 窯烤 pizza
q2: 台北車站 BELLINI PASTA PASTA
q3: 台北車站 pizza

Slide 42

Slide 42 text

Dataset Statistics

General Stats
Total users              54
Total user search log    2008
Total goals              1275
Total missions           712

Goal and Mission Stats
                         Goal      Mission
Avg queries per unit     1.5699    2.8101
Max queries per unit     7         11
Min queries per unit     1         1

Slide 43

Slide 43 text

Intent Tagging Evaluation
CRF++ 0.58 with 10-fold cross validation
L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic

                    L + Y                  L + S                  Y + S                  L + Y + S
                    P      R      F1       P      R      F1       P      R      F1       P      R      F1
IH (All)            .8811  .8656  .8733    .8780  .8676  .8728    .8011  .7044  .7497    .8874  .8688  .8780
IM:Location (All)   .9456  .9359  .9408    .9419  .9385  .9402    .7222  .6699  .6951    .9530  .9400  .9465
IM:Type (All)       .8457  .8660  .8557    .8537  .8531  .8534    .6517  .6714  .6614    .8523  .8660  .8591

Slide 44

Slide 44 text

Data Analysis for Wrong Prediction

Wrong Combination
• 木柵 (IM:Location) 壽司專賣山中屋壽司酒房 (IH)
• 新竹 (IM:Location) 懷石料理竹北 (IM:Location)
• 中山北路 (IM:Location) 羊肉爐台北 (IM:Location)
• 高雄 (IM:Location) 左營樂品咖啡館 (IH)
• 師大 (IM:Location) 熱炒海鮮尊王活海鮮小吃店 (IH)
• 內湖 (IM:Location) 拉麵六丁目拉麵 (IM:Type)
• 居酒屋 (IM:Type) 風林火山blog (O)

Wrong Segmentation
• 德記雲吞 (IH) 茶 (IM:Type) 館 (IH) 絲襪奶茶 (IM:Type)

Slide 45

Slide 45 text

Intent tagging Statement Transition Statement Transition 45 Goal boundary Mission boundary IH IM:L IM:T Is the intent tag changed ? q1 q2

Slide 46

Slide 46 text

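Statement Transition
For each intent tag (IH, IM:L, IM:T), let $t_1$ and $t_2$ be the text carrying that tag in q1 and q2, with $\emptyset$ meaning the tag is absent:

$$
T(t_1, t_2) =
\begin{cases}
\text{NONE}   & \text{if } t_1 = \emptyset \wedge t_2 = \emptyset \\
\text{INSERT} & \text{if } t_1 = \emptyset \wedge t_2 \neq \emptyset \\
\text{DELETE} & \text{if } t_1 \neq \emptyset \wedge t_2 = \emptyset \\
\text{SAME}   & \text{if } t_1 \neq \emptyset \wedge t_2 \neq \emptyset \wedge t_1 = t_2 \\
\text{MODIFY} & \text{if } t_1 \neq \emptyset \wedge t_2 \neq \emptyset \wedge \text{JaccardDistance}(t_1, t_2) \le 0.5 \\
\text{NEW}    & \text{if } t_1 \neq \emptyset \wedge t_2 \neq \emptyset \wedge \text{JaccardDistance}(t_1, t_2) > 0.5
\end{cases}
$$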

Slide 47

Slide 47 text

Statement Transition
• INSERT: the target intent appears only in q2
• DELETE: the target intent has been deleted in q2
• NEW: the target intent has changed and the Jaccard distance is >= 0.5
• MODIFY: the target intent has changed and the Jaccard distance is < 0.5
• SAME: the target intent is identical in q1 and q2
• NONE: the target intent is tagged in neither q1 nor q2

Example
q1: 台北車站 LUCCA PASTA (tags: IM:L, IH, IM:T)
q2: 台北車站 義大利麵 (tags: IM:L, IM:T)
Result: IH = DELETE, IM:L = SAME, IM:T = NEW

Slide 48

Slide 48 text

Statement Transition
• INSERT: the target intent appears only in q2
• DELETE: the target intent has been deleted in q2
• NEW: the target intent has changed and the Jaccard distance is >= 0.5
• MODIFY: the target intent has changed and the Jaccard distance is < 0.5
• SAME: the target intent is identical in q1 and q2
• NONE: the target intent is tagged in neither q1 nor q2

Example
q1: 公館 麻辣鍋 (tags: IM:L, IM:T)
q2: 公館 麻辣鍋 天麻 (tags: IM:L, IM:T, IH)
Result: IM:T = SAME, IM:L = SAME, IH = INSERT
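The transition rules can be sketched as a small function. This is a hedged illustration rather than the authors' implementation: it assumes the Jaccard distance is computed over character sets, checks SAME before the distance test, and uses a strict "> 0.5" for NEW (the slides differ on whether a distance of exactly 0.5 counts as NEW or MODIFY). The dictionary layout and function names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the character sets of two non-empty strings."""
    set_a, set_b = set(a), set(b)
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def transition(t1, t2, threshold=0.5):
    """Transition label for one intent tag; None or '' means 'not tagged'."""
    if not t1 and not t2:
        return "NONE"
    if not t1:
        return "INSERT"
    if not t2:
        return "DELETE"
    if t1 == t2:
        return "SAME"
    # Assumption: strict '>' at the threshold.
    return "NEW" if jaccard_distance(t1, t2) > threshold else "MODIFY"


def statement_transition(tags1, tags2, slots=("IH", "IM:L", "IM:T")):
    """Map each intent slot to its transition label between two tagged queries."""
    return {slot: transition(tags1.get(slot), tags2.get(slot)) for slot in slots}


q1 = {"IM:L": "公館", "IM:T": "麻辣鍋"}
q2 = {"IM:L": "公館", "IM:T": "麻辣鍋", "IH": "天麻"}
result = statement_transition(q1, q2)
```

The three resulting labels can then be fed to the boundary classifier as categorical features alongside the baseline features.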

Slide 49

Slide 49 text

Baseline features

Temporal Features
• inter-query time
• time diff
• sequential-queries

Word and Character Edit Features
• lev
• word_pov
• word_suf
• commonw
• wordr

Query Log Sequence Features
• nsubst_X_q1
• nsubst_X_q2
• nsubst_q2_X
• p_change
• pq12
• entropy_X_q1
• entropy_q1_X

Web Search Features
• prisma

Slide 50

Slide 50 text

Temporal Features q1 q2 q2 q3 q3 q4 •  inter-query time = threshold as a binary feature (5 mins, 30 mins, 60 mins, 120 mins) •  time diff = inter-query time in seconds •  sequential-queries = binary feature which is positive if the queries are sequential in time, with no intervening queries from the same user 50 Preprocessing Intent tagging baseline

Slide 51

Slide 51 text

•  inter-query time = threshold as a binary feature (5 mins, 30 mins, 60 mins, 120 mins) •  time diff = inter-query time in seconds •  sequential-queries = binary feature which is positive if the queries are sequential in time, with no intervening queries from the same user Temporal Features 築地鮮魚 台北 築地鮮魚 台北 blog 築地鮮魚 台北 blog 大大茶樓 南京 大大茶樓 南京 築地鮮魚 台北 blog 51 Preprocessing Intent tagging baseline
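A minimal sketch of the temporal features for one query pair follows. It assumes that each threshold feature fires when the gap exceeds the threshold (the slide does not state the direction) and that the caller already knows whether the two queries are adjacent in the user's log; the feature names are hypothetical.

```python
from datetime import datetime

THRESHOLDS_MIN = (5, 30, 60, 120)


def temporal_features(time_q1, time_q2, sequential=True):
    """Temporal features for a query pair issued at time_q1 and time_q2."""
    diff_seconds = (time_q2 - time_q1).total_seconds()
    features = {
        "time_diff": diff_seconds,
        "sequential_queries": int(sequential),
    }
    for minutes in THRESHOLDS_MIN:
        # Assumption: 1 means the gap is longer than the threshold.
        features[f"inter_query_time_{minutes}m"] = int(diff_seconds > minutes * 60)
    return features


feats = temporal_features(datetime(2013, 1, 1, 12, 0), datetime(2013, 1, 1, 12, 10))
```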

Slide 52

Slide 52 text

Word and Character Edit Features •  lev = normalized Levenshtein distance •  word_pov = num. characters in common starting from the left •  word_suf = num. characters in common starting from the right •  commonw = num. words in common •  wordr = jaccard distance between sets of words q1 q2 52 Preprocessing Intent tagging baseline

Slide 53

Slide 53 text

Word and Character Edit Features: example
q1: 新竹 冰淇淋餐廳
q2: 新竹 冰淇淋餐廳 莫凡彼
• lev = 3
• word_pov = 7
• word_suf = 0
• commonw = 7
• wordr = 0.7
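The example values above can be reproduced with a short sketch. It follows the worked example rather than the feature definitions: spaces are dropped, each character is treated as a unit, `lev` is the raw (unnormalized) edit distance, and `wordr` is the overlap ratio |A ∩ B| / |A ∪ B|. These readings are assumptions inferred from the numbers on the slide.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                      # deletion
                               current[j - 1] + 1,                   # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def common_prefix(a, b):
    """Number of leading characters shared by a and b."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def edit_features(q1, q2):
    a, b = q1.replace(" ", ""), q2.replace(" ", "")
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return {
        "lev": levenshtein(a, b),
        "word_pov": common_prefix(a, b),
        "word_suf": common_prefix(a[::-1], b[::-1]),
        "commonw": len(set_a & set_b),
        "wordr": len(set_a & set_b) / len(union) if union else 0.0,
    }


feats = edit_features("新竹 冰淇淋餐廳", "新竹 冰淇淋餐廳 莫凡彼")
```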

Slide 54

Slide 54 text

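Query Log Sequence Features
Here $q_1 \to q_2$ denotes q1 being rewritten as q2 in the query log, and $q_i$, $q_j$ range over queries in the log.
• pq12 = $p(q_1 \to q_2)$ divided by a normalizing term over the rewrites $(q_i \to q_j)$; the normalizing operator is illegible in the source
• entropy_X_q1 = $\sum_i p(q_i \mid q_1) \log_2 p(q_i \mid q_1)$
• entropy_q1_X = $\sum_i p(q_1 \mid q_i) \log_2 p(q_1 \mid q_i)$
• nsubst_X_q1 = $\mathrm{count}(X : \exists\, p(X \to q_1))$
• nsubst_X_q2 = $\mathrm{count}(X : \exists\, p(X \to q_2))$
• nsubst_q2_X = $\mathrm{count}(X : \exists\, p(q_2 \to X))$
• p_change = $p(q_i \to q_j : q_i \neq q_j)$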

Slide 55

Slide 55 text

Web Search Features
• Prisma = cosine distance between vectors derived from the first 50 search results for the query terms
Source of search results: Yahoo! BOSS Search API, http://developer.yahoo.com/boss/search/
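The cosine distance itself can be sketched as below. How the vectors are derived from the 50 search results is not specified on the slide, so this example assumes plain term-frequency vectors built from token lists; fetching results from the search API is left out entirely.

```python
import math
from collections import Counter


def cosine_distance(terms_a, terms_b):
    """1 - cosine similarity between term-frequency vectors of two token lists."""
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty vector is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


same = cosine_distance(["pizza", "taipei"], ["pizza", "taipei"])
disjoint = cosine_distance(["pizza"], ["ramen"])
```

A small distance suggests the two queries retrieve similar content, which is evidence against a goal boundary between them.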

Slide 56

Slide 56 text

Goal / Mission Boundary Detection B I B B I I Sequence Based q1 q2 q3 q4 q5 q6 Token Based
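The B/I labelling for the sequence-based setting can be illustrated with a short sketch. It assumes each query carries an annotated goal identifier and that the queries of one goal are contiguous; interleaved goals are not handled. The function name is hypothetical.

```python
def boundary_labels(goal_ids):
    """Label each query 'B' if it starts a new goal, else 'I'."""
    labels = []
    previous = object()  # sentinel that never equals a real goal id
    for goal_id in goal_ids:
        labels.append("B" if goal_id != previous else "I")
        previous = goal_id
    return labels


labels = boundary_labels([1, 1, 2, 3, 3, 3])  # goal ids for q1..q6
```

The same function applies to mission identifiers when labelling mission boundaries.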

Slide 57

Slide 57 text

Token Based – Goal Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

Goal boundary                    P        R        F1
Baseline [1]                     0.6601   0.5739   0.6140
Baseline *[2]                    0.7608   0.7869   0.7736
Our *[3]                         0.7680   0.8247   0.7954
Fselect (Baseline+Our) *[4]      0.7792   0.8247   0.8013

[1] commonw, prisma, time
[2] lev, prisma, wordr, word_suf, word_pov, n_subst_X_q1
[3] crf_ih, crf_iml, crf_imt
[4] lev, wordr, word_pov, commonw, prisma, crf_ih_state, crf_imt_state, crf_iml_state, pq12
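As a reading aid for the P, R, and F1 columns, F1 is the harmonic mean of precision and recall. The sketch below recomputes it for the first row; the reported figures come from a 30-set evaluation, so other rows may differ from this formula in the last digit if they are averages of per-set scores (an assumption, not stated on the slide).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


baseline_f1 = f1_score(0.6601, 0.5739)  # Baseline [1], goal boundary
```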

Slide 58

Slide 58 text

Token Based – Mission Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

Mission boundary                 P        R        F1
Baseline [1]                     0.9172   0.9375   0.9272
Baseline *[2]                    0.9313   0.9250   0.9281
Our *[3]                         0.9201   0.9298   0.9249
Fselect (Baseline+Our) *[4]      0.9282   0.9692   0.9483

[1] commonw, prisma, time
[2] lev, wordr, peos_q1, prisma, time_diff, peos_q2, n_subst_X_q2
[3] crf_ih, crf_iml, crf_imt
[4] wordr, commonw, lev, inter_query_time, word_pov, prisma, crf_ih_state, crf_iml_state, crf_imt_state, word_suf

Slide 59

Slide 59 text

Sequence Based – Goal Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

Goal boundary                    P        R        F1
Baseline [1]                     0.2964   0.2573   0.2755
Baseline *[2]                    0.5150   0.5317   0.5232
Our *[3]                         0.5399   0.5798   0.5591
Fselect (Baseline+Our) *[4]      0.5397   0.5712   0.5550

[1] commonw, prisma, time
[2] lev, prisma, wordr, word_suf, word_pov, n_subst_X_q1
[3] crf_ih, crf_iml, crf_imt
[4] lev, wordr, word_pov, commonw, prisma, crf_ih_state, crf_imt_state, crf_iml_state, pq12

Slide 60

Slide 60 text

Sequence Based – Mission Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

Mission boundary                 P        R        F1
Baseline [1]                     0.8194   0.8375   0.8283
Baseline *[2]                    0.8219   0.8163   0.8191
Our *[3]                         0.8030   0.8115   0.8073
Fselect (Baseline+Our) *[4]      0.8462   0.8837   0.8645

[1] commonw, prisma, time
[2] lev, wordr, peos_q1, prisma, time_diff, peos_q2, n_subst_X_q2
[3] crf_ih, crf_iml, crf_imt
[4] wordr, commonw, lev, inter_query_time, word_pov, prisma, crf_ih_state, crf_iml_state, crf_imt_state, word_suf

Slide 61

Slide 61 text

Summary of results
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

Token-based
Goal boundary                    P        R        F1
Baseline [1]                     0.6601   0.5739   0.6140
Baseline *[2]                    0.7608   0.7869   0.7736
Our *[3]                         0.7680   0.8247   0.7954
Fselect (Baseline+Our) *[4]      0.7792   0.8247   0.8013

Mission boundary                 P        R        F1
Baseline [1]                     0.9172   0.9375   0.9272
Baseline *[2]                    0.9313   0.9250   0.9281
Our *[3]                         0.9201   0.9298   0.9249
Fselect (Baseline+Our) *[4]      0.9282   0.9692   0.9483

Sequence-based
Goal boundary                    P        R        F1
Baseline [1]                     0.2964   0.2573   0.2755
Baseline *[2]                    0.5150   0.5317   0.5232
Our *[3]                         0.5399   0.5798   0.5591
Fselect (Baseline+Our) *[4]      0.5397   0.5712   0.5550

Mission boundary                 P        R        F1
Baseline [1]                     0.8194   0.8375   0.8283
Baseline *[2]                    0.8219   0.8163   0.8191
Our *[3]                         0.8030   0.8115   0.8073
Fselect (Baseline+Our) *[4]      0.8462   0.8837   0.8645

Slide 62

Slide 62 text

Data Analysis
Type 1: Add Location
q1: The Diner 樂子美式餐廳 (IH), Informational
q2: The Diner 樂子美式餐廳 (IH) 科技大樓 (IM:Location), Informational
Result: same goal

Slide 63

Slide 63 text

Data Analysis
Type 2: Add Intent Head
q1: 中山北路 (IM:Location) 早午餐 (IM:Type), Navigational
q2: 中山北路 (IM:Location) 早午餐 (IM:Type) HANA BI (IH), Informational
Result: goal boundary

Slide 64

Slide 64 text

Data Analysis
Type 3: Add descriptions (IH in q1)
q1: 洛城牛肉河粉 (IH), Informational
q2: 洛城牛肉河粉 (IH) blog (O), Informational
q3: 洛城牛肉河粉 (IH) blog (O) 推薦菜 (O), Informational
Result: same goal

Slide 65

Slide 65 text

Data Analysis
Type 4: Add descriptions (without IH in q1)
q1: 永康街 (IM:Location) 懷舊料理 (IM:Type), Navigational
q2: 永康街 (IM:Location) 懷舊料理 (IM:Type) 豐盛食堂 (IH), Informational
q3: 永康街 (IM:Location) 懷舊料理 (IM:Type) 豐盛食堂 (IH) blog (O), Informational
Result: goal boundary

Slide 66

Slide 66 text

Conclusion
A. Intent tagging model
• The model achieves good evaluation results in predicting vertical search query intent, and it lets the search engine know what users want.
B. Using intent tagging to predict goal/mission boundaries
• It not only improves accuracy by 3% to 5% for both goal and mission boundary detection, but also tags each search query with IH, IM:Type, IM:Location, or Other in the vertical search domain.

Slide 67

Slide 67 text

Future Work
A. Intent tagging model
• Because manual tagging is expensive, we would like to generate the answer labels automatically to reduce the cost.
B. Using intent tagging to predict goal/mission boundaries
• Apply our method to other similar vertical search domains, for example automobiles and movies.

Slide 68

Slide 68 text

Reference
• Rosie Jones and Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008.
• Xiao Li. Understanding the Semantic Structure of Noun Phrase Queries. ACL 2010.
• Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan. Modeling and Analysis of Cross-Session Search Tasks. SIGIR 2011.

Slide 69

Slide 69 text

Q / A

Slide 70

Slide 70 text

Backup slides: Intent Tagging Evaluation
CRF++ 0.58 with 10-fold cross validation

                      L (Lexical)            Y (Yahoo! Life+ Title Match)   S (Syntactic)
                      P      R      F1       P      R      F1               P      R      F1
IH           Left     .9378  .9502  .9440    .8822  .7940  .8358            .9121  .8594  .8850
IH           Right    .8869  .8482  .8671    .6622  .5923  .6253            .7956  .6626  .7230
IH           All      .8698  .8616  .8657    .6337  .5884  .6102            .7733  .6741  .7203
IM:Location  Left     .9771  .9765  .9768    .9225  .8735  .8973            .8929  .8702  .8814
IM:Location  Right    .9543  .9424  .9483    .8117  .6941  .7483            .7940  .6931  .7402
IM:Location  All      .9414  .9426  .9420    .7682  .6960  .7303            .7399  .6934  .7159
IM:Type      Left     .9519  .9470  .9495    .7325  .8804  .7997            .8614  .8186  .8394
IM:Type      Right    .8923  .8390  .8648    .5626  .7005  .6240            .6784  .6303  .6534
IM:Type      All      .8584  .8437  .8510    .4694  .7000  .5620            .6117  .6290  .6202

Slide 71

Slide 71 text

Backup slides: Intent Tagging Evaluation
CRF++ 0.58 with 10-fold cross validation
L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic

Features     Acc      F1
L + Y        0.8985   0.8910
L + S        0.8972   0.8826
Y + S        0.7827   0.7471
L + Y + S    0.9038   0.8934

Slide 72

Slide 72 text

Intent Tagging Analysis 中壢 (IM:Location) 赤坂拉麵 (IH) 燒鳥串燒 (IH) 台北 (IM:Location) 師大 (IM:Location) 中式早餐 (IM:Type) 台北市忠孝東路四段 (IM:Location) 咖啡 (IM:Type) 台大 (IM:Location) 墨西哥菜 (IM:Type) 國父紀念館 (IM:Location) 中菜吃到飽 (IM:Type) 忠孝新生 (IM:Location) 懷舊餐廳 (IM:Type) 科技大樓 (IM:Location) 韓式燒肉 (IM:Type) 台大醫院 (IM:Location) 藝奇新日式料理 (IH) 台北101 (IM:Location) 石鍋拌飯 (IM:Type) query 72 Backup slides

Slide 73

Slide 73 text

Backup slides: Goal Boundary Features Weight
Goal boundary feature weight using fselect in libsvm

Feature          Set                       Weight
lev              Word and Character Edit   0.5103
wordr            Word and Character Edit   0.4660
word_pov         Word and Character Edit   0.4185
commonw          Word and Character Edit   0.3287
Prisma           Web Search                0.1980
crf_ih_state     Intent Tagging            0.1593
crf_imt_state    Intent Tagging            0.1302
crf_iml_state    Temporal                  0.0835
crf_iml_state    Intent Tagging            0.0511
pq12             Query Log Sequence        0.0279

Slide 74

Slide 74 text

Backup slides: Mission Boundary Features Weight
Mission boundary feature weight using fselect in libsvm

Feature            Set                       Weight
wordr              Word and Character Edit   0.8284
commonw            Word and Character Edit   0.7014
lev                Word and Character Edit   0.6206
inter_query_time   Temporal                  0.4788
word_pov           Word and Character Edit   0.4109
Prisma             Web Search                0.1164
crf_iml_state      Intent Tagging            0.0835
word_suf           Word and Character Edit   0.0348
crf_ih_state       Intent Tagging            0.0291
entropy_q1_X       Query Log Sequence        0.0183

Slide 75

Slide 75 text

Join Modeling mission mission mission goal goal goal goal goal Mission boundary Goal boundary Predict Backup slides

Slide 76

Slide 76 text

Backup slides: Joint Modeling
Idea: use the predicted mission boundaries to help predict goal boundaries (mission > goal hierarchy).
• Although mission boundary detection is more accurate than goal boundary detection, joint modeling did not improve the result.

Slide 77

Slide 77 text

Goal Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008 •  A goal can be thought of as a group of related queries to accomplish a single discrete task. •  The queries need not be contiguous, but may be interleaved with queries from other goals A search goal is an atomic information need, resulting in one or more queries. Problem Definition Backup slides

Slide 78

Slide 78 text

•  A mission is then an extended information need A search mission is a related set of information needs, resulting in one or more goals. Problem Definition Mission Rosie Jones, Kristina Lisa Klinkner. Beyond the Session Timeout Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008 Backup slides

Slide 79

Slide 79 text

IM:Category IM:City IM:County IM:Employer IM:Level IM:Salary IM:State IM:Type Intent head Intent modifier Related Work Intent Tagging Xiao Li Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 query Backup slides

Slide 80

Slide 80 text

Transition: Transiting from state a to b Lexical: Current word Semantic: Current N-gram occurs in lexicon L (N=1~4) Syntactic: POS tag of the current word Related Work Intent Tagging Xiao Li Understanding the Semantic Structure of Noun Phrase Queries ACL 2010 query Backup slides

Slide 81

Slide 81 text

Related Work Mentioned Goal and Mission !   Debora Donato et al. Do you want to take notes? Identifying research missions in Yahoo! Search Pad. WWW 2010. !   Claudio Lucchese, Salvatore Orlando et al. Identifying Task-based Sessions in Search Engine Query Logs. WSDM 2011. Backup slides

Slide 82

Slide 82 text

Conditional Random Fields Feature 1 Feature 2 Feature 3 … Feature 1 Feature 2 Feature 3 … Feature 1 Feature 2 Feature 3 … Relation Label 1 Relation Label 2 Relation Label 3 Relation Label N Feature 1 Feature 2 Feature 3 … Backup slides

Slide 84

Slide 84 text

Baseline CRF Format F1 F2 F3 F4 F5 … F1 F2 F3 F4 F5 … F1 F2 F3 F4 F5 … F1 F2 F3 F4 F5 … ( q1 , q2 ) F1 F2 F3 F4 F5 … User 1 User 2 … Label Features O B I O B ( q2 , q3 ) ( q3 , q4 ) ( q4 , q5 ) ( q5 , q6 ) SPACE Backup slides

Slide 85

Slide 85 text

answer predict IH IM:Type IH IM:Type Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. Intent Tagging Evaluation 85 Backup slides

Slide 86

Slide 86 text

IH IM:Type IH IM:Type answer predict Left boundary Left boundary Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. Intent Tagging Evaluation 86 Backup slides

Slide 87

Slide 87 text

Intent Tagging Evaluation IH IM:Type IH IM:Type answer predict IM:Type Right boundary Right boundary Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. 87 Backup slides

Slide 88

Slide 88 text

IH IM:Type IH IM:Type Intent Tagging Evaluation answer predict Complete match Jin-Dong KIM, Tomoko OHTA et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceeding JNLPBA '04. 88 Backup slides

Slide 89

Slide 89 text

Backup slides: Intent Tagging Evaluation
CRF++ 0.58 with 10-fold cross validation
L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic

                      L + Y                  L + S                  Y + S                  L + Y + S
                      P      R      F1       P      R      F1       P      R      F1       P      R      F1
IH           Left     .9540  .9521  .9530    .9510  .9397  .9453    .9171  .8591  .8872    .9482  .9445  .9464
IH           Right    .8930  .8522  .8721    .8861  .8710  .8785    .8221  .6960  .7538    .8972  .8667  .8817
IH           All      .8811  .8656  .8733    .8780  .8676  .8728    .8011  .7044  .7497    .8874  .8688  .8780
IM:Location  Left     .9785  .9750  .9768    .9759  .9741  .9750    .9472  .9415  .9443    .9808  .9781  .9795
IM:Location  Right    .9568  .9355  .9460    .9560  .9398  .9478    .8602  .8320  .8458    .9650  .9393  .9520
IM:Location  All      .9456  .9359  .9408    .9419  .9385  .9402    .7222  .6699  .6951    .9530  .9400  .9465
IM:Type      Left     .9481  .9543  .9512    .9499  .9465  .9482    .8639  .8441  .8539    .9503  .9525  .9514
IM:Type      Right    .8876  .8598  .8735    .8891  .8477  .8679    .7221  .6699  .6951    .8797  .8595  .8695
IM:Type      All      .8457  .8660  .8557    .8537  .8531  .8534    .6517  .6714  .6614    .8523  .8660  .8591

Slide 90

Slide 90 text

Backup slides: Conditional Random Fields
• Conditional Random Fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning, where they are used for structured prediction.
CRF++, http://crfpp.googlecode.com/svn/trunk/doc/index.html

Slide 91

Slide 91 text

Backup slides: Goal / Mission Boundary Detection Real Case
Columns: Query 1 | Query 2 | IH (Real, Predict) | IM:Location (Real, Predict) | IM:Type (Real, Predict)

燒鳥 台北 | 燒鳥串燒 台北 | MODIFY, MODIFY | SAME, SAME | NONE, DELETE
燒鳥串燒 台北 | 師大 早餐 | DELETE, DELETE | NEW, NEW | INSERT, INSERT
師大 早餐 | 師大 中式早餐 | NONE, NONE | SAME, SAME | MODIFY, MODIFY
中壢 拉麵 | 中壢 赤坂拉麵 | INSERT, INSERT | SAME, SAME | DELETE, DELETE
中壢 赤坂拉麵 | 中壢 赤坂拉麵 時間 | SAME, SAME | SAME, SAME | DELETE, NEW
中壢 拉麵 推薦 | 中壢 伊太郎 | INSERT, INSERT | SAME, SAME | DELETE, NEW
中壢 伊太郎 | 中壢 伊太郎 推薦 | SAME, SAME | SAME, SAME | NONE, DELETE
中壢 伊太郎 推薦 | 風車 故鄉餐廳 時間 | NEW, NEW | DELETE, NEW | NONE, INSERT
內湖 水鳥22 法式小館 | 桃園 川菜 | DELETE, DELETE | NEW, NEW | INSERT, NEW
桃園 川菜 | 桃園 福利川菜 | INSERT, NONE | SAME, SAME | DELETE, MODIFY

Slide 92

Slide 92 text

Backup slides: Intent Tagging Confusion Matrix
Using the L + Y + S feature combination. Rows are the actual class, columns are the predicted class.

                     IH             IM:Location    IM:Type        O
                     B      I       B      I       B      I       O
IH           B       871    41      56     1       57     8       38
IH           I       40     3942    52     316     1      101     102
IM:Location  B       63     33      746    1       9      8       15
IM:Location  I       1      249     8      1689    0      14      31
IM:Type      B       55     5       4      0       1309   16      7
IM:Type      I       0      117     0      6       11     2332    18
O            O       31     161     31     45      21     41      4240

Slide 93

Slide 93 text

Backup slides: Related Work, Related Research
Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan. Modeling and Analysis of Cross-Session Search Tasks. SIGIR 2011.
1. Same Task: given a user query, identify all related queries from previous sessions that the user has issued.
2. Task Continuation: given a multi-query task for a user, predict whether the user will return to this task in the future.

Slide 94

Slide 94 text

30-Sets Train/Test Statistics Evaluation Independent Set (fsselect) Testing Testing Training 30-fold … 1 4 :
