Exploiting Intent Tagging Information to Improve Vertical Search Query Intent Segmentation


Overview
•  Problem Definition
•  Related Work
•  System Overview
•  Experiment
•  Error Analysis
•  Conclusion

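Problem Definition
How can we describe whether the user's intent has changed from one query to the next?
q1: 築地鮮魚台北 (a restaurant name, Taipei)
q2: 築地鮮魚台北日本料理 (the same restaurant, Taipei, Japanese cuisine)
q3: 國父紀念館日本料理 (Sun Yat-sen Memorial Hall, Japanese cuisine)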

Related Work: Search Intent Classification
Daniel E. Rose and Danny Levinson. Understanding User Goals in Web Search. WWW 2004.
•  Intent classes: Informational, Navigational, Transactional
•  Example queries: 國父紀念館日本料理; 築地鮮魚台北; 築地鮮魚線上定位 (online reservation)

Related Work: Identifying Goals and Missions
Rosie Jones and Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008.
•  A search goal is an atomic information need, resulting in one or more queries.
•  A search mission is a related set of information needs, resulting in one or more goals.
•  Hierarchy: a session contains missions, a mission contains goals, and a goal contains queries.
•  Example: the queries q1 築地鮮魚台北, q2 築地鮮魚台北日本料理, and q3 國父紀念館日本料理 are grouped first into goals and then into a mission.

Our Proposal
We propose a new feature set to improve goal and mission boundary detection in the vertical search domain.
[Figure: a sequence of queries (q) grouped into goals, and the goals grouped into missions]

Reasons for the Goal/Mission Hierarchical Structure
•  Evaluating the importance of websites to users
   Example: within Mission 1 (Goal 1, Goal 2, Goal 3) the visited sites are A, A, B, A, so site A is more important than site B.
•  Evaluating the accuracy of search engines
   Example: for the same Mission 1, Engine A takes Goal 1 to Goal 4 while Engine B takes only Goal 1 and Goal 2, so Engine A < Engine B.

System Overview
Pipeline: Query log → Preprocessing → Intent Tagging → Statement Transition
Baseline: Rosie Jones and Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008.

Preprocessing
•  Remove duplicate queries from the sequence of queries (Query A, Query B, ...) in the query log.
•  Turn the query log into query pairs: from Query 1, Query 2, Query 3, Query 4, … build (Query 1, Query 2), (Query 2, Query 3), (Query 3, Query 4), (Query 4, Query 5), …
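The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "duplicate" means an immediate repeat of the previous query, and the function names are hypothetical.

```python
def remove_consecutive_duplicates(queries):
    """Drop a query when it repeats the one just before it (assumed meaning of 'duplicate')."""
    cleaned = []
    for q in queries:
        if not cleaned or cleaned[-1] != q:
            cleaned.append(q)
    return cleaned


def query_pairs(queries):
    """Pair every query with its successor: (Query 1, Query 2), (Query 2, Query 3), ..."""
    return list(zip(queries, queries[1:]))


log = ["Query A", "Query B", "Query A", "Query A", "Query B"]
for pair in query_pairs(remove_consecutive_duplicates(log)):
    print(pair)
```

Each resulting pair is the unit on which the later transition and baseline features are computed.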

Intent Tagging
Xiao Li. Understanding the Semantic Structure of Noun Phrase Queries. ACL 2010.
•  Intent Head (IH)
•  Intent Modifier (IM): IM:Location, IM:Type
Example query: 公館韓式燒肉愛樂廚房
•  IH = 愛樂廚房 (restaurant name)
•  IM:Location = 公館 (Gongguan)
•  IM:Type = 韓式燒肉 (Korean barbecue)

Intent Tagging: Features
•  Lexical: the current word
•  Syntactic: POS tag of the current word, using CKIP (中文斷詞系統, the Chinese word segmentation system)
•  Yahoo! Life+ Title Match: the current N-gram (N = 1 to 7) found in a Yahoo! Life+ title, using maximum matching
Example query: 公館韓式燒肉愛樂廚房
•  Lexical: 公館韓式燒肉愛樂廚房
•  Syntactic: 公館 (N), 韓式燒肉 (N), 愛樂廚房 (N)
•  Yahoo! Life+ Title Match: 公館韓式燒肉 (Y), 愛樂廚房 (Y)

Intent Tagging: Yahoo! Life+ Title Match
Current N-gram (N = 1 to 7) in a Yahoo! Life+ title, using maximum matching.
Candidate N-grams for the query 公館韓式燒肉愛樂廚房:
•  N = 1: 公, 館, 韓, 式, 燒, 肉, 愛, 樂, 廚, 房
•  N = 2: 公館, 館韓, 韓式, 式燒, 燒肉, 愛樂, 樂廚, 廚房
•  N = 3: 公館韓, 館韓式, 韓式燒, 式燒肉, 愛樂廚, 樂廚房
•  N = 4: 公館韓式, 館韓式燒, 韓式燒肉, 愛樂廚房
•  N = 5: 公館韓式燒, 館韓式燒肉
•  N = 6: 公館韓式燒肉

Intent Tagging: Yahoo! Life+ Title Match
Current N-gram (N = 1 to 7) in a Yahoo! Life+ title, using maximum matching.
•  A matched title can begin with a location: 公館韓式燒肉 = 公館 (Location) + 韓式燒肉
•  Similar titles: 基隆活海鮮餐廳, 金山鴨肉, 彰化肉圓, 萬巒豬腳大王
•  Each query is looked up in the Yahoo Life+ Database and the Taiwan Road & Street Database.
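A minimal sketch of maximum matching with N up to 7 is given below. It assumes a greedy, left-to-right (forward) variant and an in-memory set standing in for the Yahoo! Life+ titles; the function name and the tiny lexicon are hypothetical, and the slides do not say which matching direction was used.

```python
def max_match(text, lexicon, max_n=7):
    """Greedy forward maximum matching: at each position take the longest
    N-gram (N <= max_n) found in the lexicon, else emit a single character."""
    spans = []
    i = 0
    while i < len(text):
        for n in range(min(max_n, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in lexicon:
                spans.append((candidate, True))   # matched a title
                i += n
                break
        else:
            spans.append((text[i], False))        # no title covers this character
            i += 1
    return spans


# Hypothetical lexicon standing in for Yahoo! Life+ titles.
titles = {"公館韓式燒肉", "愛樂廚房"}
print(max_match("公館韓式燒肉愛樂廚房", titles))
```

With this toy lexicon the query splits into 公館韓式燒肉 and 愛樂廚房, the same two (Y) spans shown in the feature example.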

Intent Tagging: CRF Input
Query: 公館韓式燒肉愛樂廚房 (CRF++ 0.58, bi-gram)
Char  Syntactic  Title Match  Intent Tag
公    B-N        B-Y          B-IM:L
館    I-N        I-Y          I-IM:L
韓    B-N        I-Y          B-IM:T
式    I-N        I-Y          I-IM:T
燒    I-N        I-Y          I-IM:T
肉    I-N        I-Y          I-IM:T
愛    B-N        B-Y          B-IH
樂    I-N        I-Y          I-IH
廚    I-N        I-Y          I-IH
房    I-N        I-Y          I-IH
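The character-level rows above can be produced from span-level annotations roughly as follows. This is a sketch under the assumption that every annotation layer covers the whole query; the helper names are hypothetical. CRF++ itself only requires whitespace-separated columns with the label in the last column and a blank line between sequences.

```python
def bio_labels(spans):
    """Expand [(text, tag), ...] into one B-/I- label per character."""
    labels = []
    for text, tag in spans:
        for k in range(len(text)):
            labels.append(("B-" if k == 0 else "I-") + tag)
    return labels


def crf_rows(query, pos_spans, title_spans, intent_spans):
    """Build one tab-separated row per character: char, POS, title match, intent tag."""
    columns = [bio_labels(s) for s in (pos_spans, title_spans, intent_spans)]
    if any(len(col) != len(query) for col in columns):
        raise ValueError("every span list must cover the whole query")
    return ["\t".join(cells) for cells in zip(query, *columns)]


rows = crf_rows(
    "公館韓式燒肉愛樂廚房",
    [("公館", "N"), ("韓式燒肉", "N"), ("愛樂廚房", "N")],
    [("公館韓式燒肉", "Y"), ("愛樂廚房", "Y")],
    [("公館", "IM:L"), ("韓式燒肉", "IM:T"), ("愛樂廚房", "IH")],
)
print("\n".join(rows))
```

Characters outside any intent span (tag O) are not handled in this sketch.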

Collecting Dataset
Our vertical domain: Yahoo! 生活+ (Yahoo! Life+), http://tw.ipeen.lifestyle.yahoo.net/

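Collecting Dataset: User Search Hint Generation
•  Method 1 (Find Target): shop name → information to find → required condition
•  Method 2 (Search from Location): location → type of nearby restaurant → required condition

Collecting Dataset - Method 1: Find Target
•  Shop name: randomly selected from the top 700 popular shops on iPeen (愛評網)
•  Information to find: address, opening hours, whether it is open on holidays, signature dishes, negative reviews, positive reviews, contact phone number
•  Condition: plenty of review information, places nearby for an after-party, convenient transportation, open late, not too noisy, good atmosphere and decor

Collecting Dataset - Method 2: Search from Location
•  Location: randomly selected from the popular areas on iPeen (愛評網)
•  Restaurant type: Chinese cuisine, Japanese cuisine, Cantonese cuisine, Hong Kong dim sum, stir-fry, Sichuan cuisine, Hakka cuisine, seafood restaurant, izakaya, Japanese ramen, …
•  Condition: plenty of review information, places nearby for an after-party, convenient transportation, open late, not too noisy, good atmosphere and decor

Collecting Dataset
•  Method 1: Find Target × 5000
•  Method 2: Search from Location × 5000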

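Collecting Dataset: Example Search Hints
Method 1: Find Target
•  I want to find 「老乾杯(延吉店)」 and I want to know its signature dishes; the condition is convenient transportation, otherwise I need a backup option.
•  I want to find 「築地鮮魚」 and I want to know whether it has negative reviews; the condition is places nearby for an after-party, otherwise I need a backup option.
•  I want to find 「永康街芋頭大王」 and I want to know whether it has positive reviews; the condition is finding plenty of information, otherwise I need a backup option.
Method 2: Search from Location
•  Find a sports-themed restaurant (運動主題餐廳) near 「松山機場」; I want to know whether it is open on Saturdays and Sundays, and if not I also need other backup options.
•  Find kaiseki cuisine (懷石料理) near 「台北新光山越站前」; the condition is an environment that is not too noisy; I want to know whether it is open on Saturdays and Sundays, and if not I also need other backup options.
•  …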

Collecting Dataset: User Query Log
Each user is given a search hint and issues queries on Yahoo! 生活+ (http://tw.ipeen.lifestyle.yahoo.net/); the queries are recorded in the user query log.
Example queries from one user:
q1: 台北車站窯烤 pizza
q2: 台北車站 BELLINI PASTA PASTA
q3: 台北車站 pizza

Dataset Statistics
General Stats
  Total users               54
  Total user search log     2008
  Total goals               1275
  Total missions            712
Goal Stats / Mission Stats
                            Goal      Mission
  Avg queries of each       1.5699    2.8101
  Max queries of each       7         11
  Min queries of each       1         1

Intent Tagging Evaluation
CRF++ 0.58 with 10-fold cross-validation. L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic.
                    L + Y                  L + S                  Y + S                  L + Y + S
                    P      R      F1       P      R      F1       P      R      F1       P      R      F1
IH (All)            .8811  .8656  .8733    .8780  .8676  .8728    .8011  .7044  .7497    .8874  .8688  .8780
IM:Location (All)   .9456  .9359  .9408    .9419  .9385  .9402    .7222  .6699  .6951    .9530  .9400  .9465
IM:Type (All)       .8457  .8660  .8557    .8537  .8531  .8534    .6517  .6714  .6614    .8523  .8660  .8591

Data Analysis for Wrong Prediction
Error types: Wrong Combination and Wrong Segmentation.
Wrongly tagged queries:
•  木柵 (IM:Location) 壽司專賣山中屋壽司酒房 (IH)
•  新竹 (IM:Location) 懷石料理竹北 (IM:Location)
•  中山北路 (IM:Location) 羊肉爐台北 (IM:Location)
•  高雄 (IM:Location) 左營樂品咖啡館 (IH)
•  師大 (IM:Location) 熱炒海鮮尊王活海鮮小吃店 (IH)
•  內湖 (IM:Location) 拉麵六丁目拉麵 (IM:Type)
•  居酒屋 (IM:Type) 風林火山blog (O)
•  德記雲吞 (IH) 茶 (IM:Type) 館 (IH) 絲襪奶茶 (IM:Type)

Statement Transition
For each query pair (q1, q2) and each intent tag (IH, IM:L, IM:T): is the intent tag changed?
The transition states are then used to detect goal boundaries and mission boundaries.

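Statement Transition
For a target intent tag, let $q_1$ and $q_2$ be the text carrying that tag in the first and second query ($\varnothing$ if the tag is absent):

$$
T(q_1, q_2) =
\begin{cases}
\text{NONE}   & \text{if } q_1 = \varnothing \wedge q_2 = \varnothing \\
\text{DELETE} & \text{if } q_1 \neq \varnothing \wedge q_2 = \varnothing \\
\text{MODIFY} & \text{if } q_1 \neq \varnothing \wedge q_2 \neq \varnothing \wedge \mathrm{JaccardDistance}(q_1, q_2) \leq 0.5 \\
\text{NEW}    & \text{if } q_1 \neq \varnothing \wedge q_2 \neq \varnothing \wedge \mathrm{JaccardDistance}(q_1, q_2) > 0.5 \\
\text{SAME}   & \text{if } q_1 \neq \varnothing \wedge q_2 \neq \varnothing \wedge q_1 = q_2 \\
\text{INSERT} & \text{if } q_1 = \varnothing \wedge q_2 \neq \varnothing
\end{cases}
$$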
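The transition function can be sketched in a few lines. Two points are assumptions rather than something the slides state: the Jaccard distance is taken over sets of characters, and equality (SAME) is checked before the distance test so that identical values are not reported as MODIFY. The function names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over character sets (character level is an assumption)."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


def transition(v1, v2):
    """Transition state of one intent tag between two queries.
    v1 / v2 are the tagged text in q1 / q2, or None when the tag is absent."""
    if not v1 and not v2:
        return "NONE"
    if v1 and not v2:
        return "DELETE"
    if not v1 and v2:
        return "INSERT"
    if v1 == v2:
        return "SAME"      # checked first so that distance 0 is not MODIFY
    return "MODIFY" if jaccard_distance(v1, v2) <= 0.5 else "NEW"


# q1 = 公館麻辣鍋, q2 = 公館麻辣鍋天麻
print(transition("公館", "公館"))      # IM:L
print(transition("麻辣鍋", "麻辣鍋"))  # IM:T
print(transition(None, "天麻"))        # IH
```

Under these assumptions, 燒鳥 → 燒鳥串燒 has distance 0.5 and comes out as MODIFY, while 台北 → 師大 has distance 1 and comes out as NEW.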

Statement Transition
•  INSERT: the target intent appears only in q2
•  DELETE: the target intent has been deleted in q2
•  NEW: the target intent has changed and the Jaccard distance is > 0.5
•  MODIFY: the target intent has changed and the Jaccard distance is ≤ 0.5
•  SAME: the target intent is equal in q1 and q2
•  NONE: the target intent is tagged in neither q1 nor q2

Example 1
q1: 台北車站 (IM:L) LUCCA (IH) PASTA (IM:T)
q2: 台北車站 (IM:L) 義大利麵 (IM:T)
Result: IH = DELETE, IM:L = SAME, IM:T = NEW

Example 2
q1: 公館 (IM:L) 麻辣鍋 (IM:T)
q2: 公館 (IM:L) 麻辣鍋 (IM:T) 天麻 (IH)
Result: IM:T = SAME, IM:L = SAME, IH = INSERT

Baseline
•  Temporal Features: inter-query time, time diff, sequential-queries
•  Word and Character Edit Features: lev, word_pov, word_suf, commonw, wordr
•  Query Log Sequence Features: nsubst_X_q1, nsubst_X_q2, nsubst_q2_X, p_change, pq12, entropy_X_q1, entropy_q1_X
•  Web Search Features: prisma

Baseline: Temporal Features
Computed for each pair of queries: (q1, q2), (q2, q3), (q3, q4).
•  inter-query time = threshold as a binary feature (5 mins, 30 mins, 60 mins, 120 mins)
•  time diff = inter-query time in seconds
•  sequential-queries = binary feature which is positive if the queries are sequential in time, with no intervening queries from the same user
Example query sequence: 築地鮮魚台北; 築地鮮魚台北 blog; 築地鮮魚台北 blog; 大大茶樓南京; 大大茶樓南京; 築地鮮魚台北 blog
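A rough sketch of the three temporal features for one query pair is shown below. The direction of the threshold test (feature is 1 when the gap exceeds the threshold) and the feature key names are assumptions; the slides only list the thresholds.

```python
from datetime import datetime

THRESHOLDS_MIN = (5, 30, 60, 120)


def temporal_features(t1, t2, intervening=0):
    """Temporal features for a pair of queries issued at t1 and t2.
    `intervening` is the number of other queries by the same user in between."""
    diff = (t2 - t1).total_seconds()
    feats = {
        "time_diff": diff,
        "sequential_queries": int(intervening == 0),
    }
    for minutes in THRESHOLDS_MIN:
        # Assumed direction: 1 when the gap is longer than the threshold.
        feats[f"inter_query_time_{minutes}m"] = int(diff > minutes * 60)
    return feats


print(temporal_features(datetime(2013, 1, 1, 12, 0), datetime(2013, 1, 1, 12, 10)))
```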

Baseline: Word and Character Edit Features
Computed between q1 and q2.
•  lev = normalized Levenshtein distance
•  word_pov = number of characters in common starting from the left
•  word_suf = number of characters in common starting from the right
•  commonw = number of words in common
•  wordr = Jaccard distance between sets of words
Example: q1 = 新竹冰淇淋餐廳, q2 = 新竹冰淇淋餐廳莫凡彼
•  lev = 3
•  word_pov = 7
•  word_suf = 0
•  commonw = 7
•  wordr = 0.7
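The example values can be reproduced with the sketch below, which works at the character level. Note the assumptions: the slide's lev = 3 matches the raw (unnormalized) Levenshtein distance, and wordr = 0.7 matches intersection over union of the character sets, so that is what this code computes. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def common_prefix(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def edit_features(q1, q2):
    s1, s2 = set(q1), set(q2)
    union = s1 | s2
    return {
        "lev": levenshtein(q1, q2),
        "word_pov": common_prefix(q1, q2),
        "word_suf": common_prefix(q1[::-1], q2[::-1]),
        "commonw": len(s1 & s2),
        "wordr": len(s1 & s2) / len(union) if union else 0.0,
    }


print(edit_features("新竹冰淇淋餐廳", "新竹冰淇淋餐廳莫凡彼"))
```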

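Baseline: Query Log Sequence Features
Computed from the query log, where $q_i \to q_j$ denotes query $q_i$ being rewritten as (followed by) query $q_j$.
•  pq12 = normalized probability of the rewrite $q_1 \to q_2$
•  entropy_X_q1 = entropy (log base 2) of the probabilities of rewrites $X \to q_1$
•  entropy_q1_X = entropy (log base 2) of the probabilities of rewrites $q_1 \to X$
•  nsubst_X_q1 = $\mathrm{count}(\exists X : X \to q_1)$
•  nsubst_X_q2 = $\mathrm{count}(\exists X : X \to q_2)$
•  nsubst_q2_X = $\mathrm{count}(\exists X : q_2 \to X)$
•  p_change = $p(q_i \to q_j : q_i \neq q_j)$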
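As a rough illustration of how such features can be derived from logged query pairs, the sketch below counts distinct rewrite sources and targets and computes the entropy of the rewrites leaving a query. The representation of the log as a list of (previous query, next query) tuples and the function names are assumptions, not the baseline's actual implementation.

```python
import math
from collections import Counter


def nsubst_X_q(pairs, q):
    """Number of distinct queries X observed with a rewrite X -> q."""
    return len({a for a, b in pairs if b == q})


def nsubst_q_X(pairs, q):
    """Number of distinct queries X observed with a rewrite q -> X."""
    return len({b for a, b in pairs if a == q})


def entropy_q_X(pairs, q):
    """Entropy (log base 2) of the empirical distribution of rewrites q -> X."""
    targets = Counter(b for a, b in pairs if a == q)
    total = sum(targets.values())
    return -sum(c / total * math.log2(c / total) for c in targets.values())


# Toy log of (previous query, next query) pairs.
pairs = [("a", "b"), ("a", "c"), ("d", "b"), ("a", "b")]
print(nsubst_X_q(pairs, "b"), nsubst_q_X(pairs, "a"), entropy_q_X(pairs, "a"))
```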

Baseline: Web Search Features
•  prisma = cosine distance between vectors derived from the first 50 search results for the query terms
Search results come from the Yahoo! BOSS Search API, http://developer.yahoo.com/boss/search/
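The cosine-distance part of this feature can be sketched as below. How the vectors are built from the 50 search results is not specified in the slides, so this example simply assumes each query is represented by a bag of terms collected from its results; fetching the results is left out, and the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_distance(terms1, terms2):
    """1 - cosine similarity between two term-frequency vectors."""
    v1, v2 = Counter(terms1), Counter(terms2)
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (n1 * n2)


# Hypothetical terms taken from the search results of two queries.
r1 = ["sushi", "taipei", "fish", "sushi"]
r2 = ["sushi", "taipei", "ramen"]
print(cosine_distance(r1, r2))
```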

Goal / Mission Boundary Detection
Two settings are evaluated: Token Based and Sequence Based.
Example labeling of a query sequence:
q1  q2  q3  q4  q5  q6
B   I   B   B   I   I
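To make the B/I labeling concrete, the sketch below turns a labeled query sequence into segments. It assumes the usual reading in which B starts a new goal (or mission) and I continues the current one; the function name is hypothetical.

```python
def segments_from_labels(queries, labels):
    """Group queries into segments: 'B' opens a new segment, 'I' extends the last one."""
    segments = []
    for query, label in zip(queries, labels):
        if label == "B" or not segments:
            segments.append([query])
        else:
            segments[-1].append(query)
    return segments


queries = ["q1", "q2", "q3", "q4", "q5", "q6"]
labels = ["B", "I", "B", "B", "I", "I"]
print(segments_from_labels(queries, labels))
```

For the example labeling this yields three segments: q1 and q2, then q3 alone, then q4 to q6.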

Token Based – Goal Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)
Goal boundary                  P       R       F1
Baseline [1]                   0.6601  0.5739  0.6140
Baseline *[2]                  0.7608  0.7869  0.7736
Our *[3]                       0.7680  0.8247  0.7954
Fselect (Baseline+Our) *[4]    0.7792  0.8247  0.8013
[1] commonw, prisma, time
[2] lev, prisma, wordr, word_suf, word_pov, n_subst_X_q1
[3] crf_ih, crf_iml, crf_imt
[4] lev, wordr, word_pov, commonw, prisma, crf_ih_state, crf_imt_state, crf_iml_state, pq12

Token Based – Mission Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)
Mission boundary               P       R       F1
Baseline [1]                   0.9172  0.9375  0.9272
Baseline *[2]                  0.9313  0.9250  0.9281
Our *[3]                       0.9201  0.9298  0.9249
Fselect (Baseline+Our) *[4]    0.9282  0.9692  0.9483
[1] commonw, prisma, time
[2] lev, wordr, peos_q1, prisma, time_diff, peos_q2, n_subst_X_q2
[3] crf_ih, crf_iml, crf_imt
[4] wordr, commonw, lev, inter_query_time, word_pov, prisma, crf_ih_state, crf_iml_state, crf_imt_state, word_suf

Sequence Based – Goal Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)
Goal boundary                  P       R       F1
Baseline [1]                   0.2964  0.2573  0.2755
Baseline *[2]                  0.5150  0.5317  0.5232
Our *[3]                       0.5399  0.5798  0.5591
Fselect (Baseline+Our) *[4]    0.5397  0.5712  0.5550
[1] commonw, prisma, time
[2] lev, prisma, wordr, word_suf, word_pov, n_subst_X_q1
[3] crf_ih, crf_iml, crf_imt
[4] lev, wordr, word_pov, commonw, prisma, crf_ih_state, crf_imt_state, crf_iml_state, pq12

Sequence Based – Mission Boundary Detection
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)
Mission boundary               P       R       F1
Baseline [1]                   0.8194  0.8375  0.8283
Baseline *[2]                  0.8219  0.8163  0.8191
Our *[3]                       0.8030  0.8115  0.8073
Fselect (Baseline+Our) *[4]    0.8462  0.8837  0.8645
[1] commonw, prisma, time
[2] lev, wordr, peos_q1, prisma, time_diff, peos_q2, n_subst_X_q2
[3] crf_ih, crf_iml, crf_imt
[4] wordr, commonw, lev, inter_query_time, word_pov, prisma, crf_ih_state, crf_iml_state, crf_imt_state, word_suf

Summary: Token-based vs. Sequence-based
LIBSVM 3.17 classifier (30-sets train/test statistics evaluation)

Token-based                    Goal boundary             Mission boundary
                               P       R       F1        P       R       F1
Baseline [1]                   0.6601  0.5739  0.6140    0.9172  0.9375  0.9272
Baseline *[2]                  0.7608  0.7869  0.7736    0.9313  0.9250  0.9281
Our *[3]                       0.7680  0.8247  0.7954    0.9201  0.9298  0.9249
Fselect (Baseline+Our) *[4]    0.7792  0.8247  0.8013    0.9282  0.9692  0.9483

Sequence-based                 Goal boundary             Mission boundary
                               P       R       F1        P       R       F1
Baseline [1]                   0.2964  0.2573  0.2755    0.8194  0.8375  0.8283
Baseline *[2]                  0.5150  0.5317  0.5232    0.8219  0.8163  0.8191
Our *[3]                       0.5399  0.5798  0.5591    0.8030  0.8115  0.8073
Fselect (Baseline+Our) *[4]    0.5397  0.5712  0.5550    0.8462  0.8837  0.8645
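The F1 column in these tables is the harmonic mean of precision and recall, which can be checked with a few lines. The helper name is hypothetical; the input values are taken from the token-based goal boundary rows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Token-based goal boundary: Baseline [1] and Fselect (Baseline+Our) [4]
print(round(f1_score(0.6601, 0.5739), 4))
print(round(f1_score(0.7792, 0.8247), 4))
```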

Data Analysis
Type 1: Add Location (same goal)
  q1: The Diner 樂子美式餐廳 (IH)  [Informational]
  q2: The Diner 樂子美式餐廳 (IH) 科技大樓 (IM:Location)  [Informational]

Type 2: Add Intent Head (goal boundary)
  q1: 中山北路 (IM:Location) 早午餐 (IM:Type)  [Navigational]
  q2: 中山北路 (IM:Location) 早午餐 (IM:Type) HANA BI (IH)  [Informational]

Type 3: Add descriptions, IH in q1 (same goal)
  q1: 洛城牛肉河粉 (IH)  [Informational]
  q2: 洛城牛肉河粉 (IH) blog (O)  [Informational]
  q3: 洛城牛肉河粉 (IH) blog (O) 推薦菜 (O)  [Informational]

Type 4: Add descriptions, without IH in q1 (goal boundary)
  q1: 永康街 (IM:Location) 懷舊料理 (IM:Type)  [Navigational]
  q2: 永康街 (IM:Location) 懷舊料理 (IM:Type) 豐盛食堂 (IH)  [Informational]
  q3: 永康街 (IM:Location) 懷舊料理 (IM:Type) 豐盛食堂 (IH) blog (O)  [Informational]

Conclusion
A.  Intent tagging model
•  The model achieves good evaluation results when predicting vertical search query intent, and it lets the search engine know what users want.
B.  Using intent tagging to predict goal/mission boundaries
•  It not only improves accuracy by 3% to 5% for both goal and mission boundary detection, but also tags each search query with IH, IM:Type, IM:Location, or Others in the vertical search domain.

Future Work
A.  Intent tagging model
•  Because manual tagging is expensive, we would like to generate the answers automatically to reduce the cost.
B.  Using intent tagging to predict goal/mission boundaries
•  Apply our method to other, similar vertical search domains, for example automobiles and movies.

References
•  Rosie Jones and Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008.
•  Xiao Li. Understanding the Semantic Structure of Noun Phrase Queries. ACL 2010.
•  Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan. Modeling and Analysis of Cross-Session Search Tasks. SIGIR 2011.

Backup slides

Intent Tagging Evaluation (single feature sets)
CRF++ 0.58 with 10-fold cross-validation
                      L (Lexical)              Y (Yahoo! Life+ Title Match)   S (Syntactic)
                      P      R      F1         P      R      F1               P      R      F1
IH           Left     .9378  .9502  .9440      .8822  .7940  .8358            .9121  .8594  .8850
             Right    .8869  .8482  .8671      .6622  .5923  .6253            .7956  .6626  .7230
             All      .8698  .8616  .8657      .6337  .5884  .6102            .7733  .6741  .7203
IM:Location  Left     .9771  .9765  .9768      .9225  .8735  .8973            .8929  .8702  .8814
             Right    .9543  .9424  .9483      .8117  .6941  .7483            .7940  .6931  .7402
             All      .9414  .9426  .9420      .7682  .6960  .7303            .7399  .6934  .7159
IM:Type      Left     .9519  .9470  .9495      .7325  .8804  .7997            .8614  .8186  .8394
             Right    .8923  .8390  .8648      .5626  .7005  .6240            .6784  .6303  .6534
             All      .8584  .8437  .8510      .4694  .7000  .5620            .6117  .6290  .6202

Intent Tagging Evaluation (overall)
CRF++ 0.58 with 10-fold cross-validation. L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic.
               Acc     F1
L + Y          0.8985  0.8910
L + S          0.8972  0.8826
Y + S          0.7827  0.7471
L + Y + S      0.9038  0.8934

Intent Tagging Analysis
Tagged example queries:
•  中壢 (IM:Location) 赤坂拉麵 (IH)
•  燒鳥串燒 (IH) 台北 (IM:Location)
•  師大 (IM:Location) 中式早餐 (IM:Type)
•  台北市忠孝東路四段 (IM:Location) 咖啡 (IM:Type)
•  台大 (IM:Location) 墨西哥菜 (IM:Type)
•  國父紀念館 (IM:Location) 中菜吃到飽 (IM:Type)
•  忠孝新生 (IM:Location) 懷舊餐廳 (IM:Type)
•  科技大樓 (IM:Location) 韓式燒肉 (IM:Type)
•  台大醫院 (IM:Location) 藝奇新日式料理 (IH)
•  台北101 (IM:Location) 石鍋拌飯 (IM:Type)

Goal Boundary Features Weight
Goal boundary feature weights using fselect in libsvm
Feature          Set                        Weight
lev              Word and Character Edit    0.5103
wordr            Word and Character Edit    0.4660
word_pov         Word and Character Edit    0.4185
commonw          Word and Character Edit    0.3287
Prisma           Web Search                 0.1980
crf_ih_state     Intent Tagging             0.1593
crf_imt_state    Intent Tagging             0.1302
crf_iml_state    Temporal                   0.0835
crf_iml_state    Intent Tagging             0.0511
pq12             Query Log Sequence         0.0279

Mission Boundary Features Weight
Mission boundary feature weights using fselect in libsvm
Feature            Set                        Weight
wordr              Word and Character Edit    0.8284
commonw            Word and Character Edit    0.7014
lev                Word and Character Edit    0.6206
inter_query_time   Temporal                   0.4788
word_pov           Word and Character Edit    0.4109
Prisma             Web Search                 0.1164
crf_iml_state      Intent Tagging             0.0835
word_suf           Word and Character Edit    0.0348
crf_ih_state       Intent Tagging             0.0291
entropy_q1_X       Query Log Sequence         0.0183

Joint Modeling
Idea: use the mission boundaries to predict the goal boundaries.
[Figure: missions containing goals, with an arrow "Predict" from mission boundary to goal boundary]
•  Although mission boundary detection scores higher than goal boundary detection, the goal boundary result does not improve.

Problem Definition: Goal and Mission
Rosie Jones and Kristina Lisa Klinkner. Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs. CIKM 2008.
Goal
•  A search goal is an atomic information need, resulting in one or more queries.
•  A goal can be thought of as a group of related queries to accomplish a single discrete task.
•  The queries need not be contiguous, but may be interleaved with queries from other goals.
Mission
•  A search mission is a related set of information needs, resulting in one or more goals.
•  A mission is then an extended information need.

Related Work: Intent Tagging
Xiao Li. Understanding the Semantic Structure of Noun Phrase Queries. ACL 2010.
A query is tagged with an intent head and intent modifiers.
•  Intent modifiers: IM:Category, IM:City, IM:County, IM:Employer, IM:Level, IM:Salary, IM:State, IM:Type
Features:
•  Transition: transiting from state a to b
•  Lexical: current word
•  Semantic: current N-gram occurs in lexicon L (N = 1 to 4)
•  Syntactic: POS tag of the current word

Related Work That Mentions Goals and Missions
•  Debora Donato et al. Do you want to take notes? Identifying research missions in Yahoo! Search Pad. WWW 2010.
•  Claudio Lucchese, Salvatore Orlando et al. Identifying Task-based Sessions in Search Engine Query Logs. WSDM 2011.

Conditional Random Fields
[Figure: a chain of items, each with its own features (Feature 1, Feature 2, Feature 3, …), linked to Relation Label 1, Relation Label 2, Relation Label 3, …, Relation Label N]

Baseline CRF Format
[Figure: one row per query pair, (q1, q2), (q2, q3), (q3, q4), (q4, q5), (q5, q6), each with features F1, F2, F3, F4, F5, … and a label (O, B, I); the rows of User 1 and User 2 are separated by a SPACE line]

Intent Tagging Evaluation
Jin-Dong Kim, Tomoko Ohta et al. Introduction to the Bio-Entity Recognition Task at JNLPBA. Proceedings of JNLPBA '04.
The predicted tags (for example IH and IM:Type) are compared with the answer tags under three criteria:
•  Left boundary match
•  Right boundary match
•  Complete match

Intent Tagging Evaluation (feature combinations)
CRF++ 0.58 with 10-fold cross-validation. L = Lexical, Y = Yahoo! Life+ Title Match, S = Syntactic.
                      L + Y                  L + S                  Y + S                  L + Y + S
                      P      R      F1       P      R      F1       P      R      F1       P      R      F1
IH           Left     .9540  .9521  .9530    .9510  .9397  .9453    .9171  .8591  .8872    .9482  .9445  .9464
             Right    .8930  .8522  .8721    .8861  .8710  .8785    .8221  .6960  .7538    .8972  .8667  .8817
             All      .8811  .8656  .8733    .8780  .8676  .8728    .8011  .7044  .7497    .8874  .8688  .8780
IM:Location  Left     .9785  .9750  .9768    .9759  .9741  .9750    .9472  .9415  .9443    .9808  .9781  .9795
             Right    .9568  .9355  .9460    .9560  .9398  .9478    .8602  .8320  .8458    .9650  .9393  .9520
             All      .9456  .9359  .9408    .9419  .9385  .9402    .7222  .6699  .6951    .9530  .9400  .9465
IM:Type      Left     .9481  .9543  .9512    .9499  .9465  .9482    .8639  .8441  .8539    .9503  .9525  .9514
             Right    .8876  .8598  .8735    .8891  .8477  .8679    .7221  .6699  .6951    .8797  .8595  .8695
             All      .8457  .8660  .8557    .8537  .8531  .8534    .6517  .6714  .6614    .8523  .8660  .8591

Conditional Random Fields
•  Conditional Random Fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning, where they are used for structured prediction.
CRF++, http://crfpp.googlecode.com/svn/trunk/doc/index.html

Goal / Mission Boundary Detection: Real Case
Each row gives the real and predicted transition state per intent tag.
Query 1 → Query 2                          IH (Real / Predict)    IM:Location (Real / Predict)    IM:Type (Real / Predict)
燒鳥台北 → 燒鳥串燒台北                    MODIFY / MODIFY        SAME / SAME                     NONE / DELETE
燒鳥串燒台北 → 師大早餐                    DELETE / DELETE        NEW / NEW                       INSERT / INSERT
師大早餐 → 師大中式早餐                    NONE / NONE            SAME / SAME                     MODIFY / MODIFY
中壢拉麵 → 中壢赤坂拉麵                    INSERT / INSERT        SAME / SAME                     DELETE / DELETE
中壢赤坂拉麵 → 中壢赤坂拉麵時間            SAME / SAME            SAME / SAME                     DELETE / NEW
中壢拉麵推薦 → 中壢伊太郎                  INSERT / INSERT        SAME / SAME                     DELETE / NEW
中壢伊太郎 → 中壢伊太郎推薦                SAME / SAME            SAME / SAME                     NONE / DELETE
中壢伊太郎推薦 → 風車故鄉餐廳時間          NEW / NEW              DELETE / NEW                    NONE / INSERT
內湖水鳥22 法式小館 → 桃園川菜             DELETE / DELETE        NEW / NEW                       INSERT / NEW
桃園川菜 → 桃園福利川菜                    INSERT / NONE          SAME / SAME                     DELETE / MODIFY

Intent Tagging Confusion Matrix
Using the L + Y + S feature combination (axes: predicted class and actual class)
                  IH-B   IH-I   IM:Location-B   IM:Location-I   IM:Type-B   IM:Type-I   O
IH-B              871    41     56              1               57          8           38
IH-I              40     3942   52              316             1           101         102
IM:Location-B     63     33     746             1               9           8           15
IM:Location-I     1      249    8               1689            0           14          31
IM:Type-B         55     5      4               0               1309        16          7
IM:Type-I         0      117    0               6               11          2332        18
O                 31     161    31              45              21          41          4240

Related Work: Related Research
Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, and Jaime Teevan. Modeling and Analysis of Cross-Session Search Tasks. SIGIR 2011.
1.  Same Task: given a user query, identify all related queries from previous sessions that the user has issued.
2.  Task Continuation: given a multi-query task for a user, predict whether the user will return to this task in the future.

30-Sets Train/Test Statistics Evaluation
[Figure: an independent set used for feature selection (fselect), plus training and testing sets repeated over 30 folds]
