Slide 1

Slide 1 text

The Wisdom of Advertisers: Mining Subgoals via Query Clustering
CIKM 2012, Session: Advertising
Takehiro Yamamoto (Kyoto University), Tetsuya Sakai (MSRA), Mayu Iwata (Osaka University), Yu Chen (MSRA), Ji-Rong Wen (MSRA), Katsumi Tanaka (Kyoto University)

Slide 2

Slide 2 text

Background (1/2)
Information needs of searchers are sometimes complex.
Search goal: Travel Maui
- Book flights: "Hawaiian Air", "cheap flight tickets"
- Book a hotel: "Sheraton Maui", "cheap Maui hotel"
- Join a tour: "travel agents Maui", "one day Maui tour"
- Find restaurants: "sea foods Maui", "Maui restaurants"

Slide 3

Slide 3 text

Background (2/2)
Information needs of searchers are sometimes complex.
Search goal: Lose Weight
- Do physical exercise: "fitness center", "swimming school"
- Control calorie intake: "healthy recipes", "weight loss foods"
- Have diet pills: "diet pills", "HCG drops"
- Have surgery: "lose weight surgery", "Lap band"

Slide 4

Slide 4 text

Search Goal and Subgoals (Our Definition)
Search goal: an action that the searcher wants to achieve, often represented by a verb, possibly followed by a noun phrase.
Subgoal: a search goal x is a subgoal of another search goal y if achieving x helps the searcher to achieve y, either wholly or partially.
Examples:
- Search goal "lose weight": the subgoals "Do physical exercise" and "Have diet pills" each (wholly) achieve it
- Search goal "travel Maui": the subgoals "Book flights" and "Book a hotel" each (partially) achieve it

Slide 5

Slide 5 text

Our Goal: Subgoal Mining
Automatically mine subgoals of a given search goal from data by means of query clustering.
Data: sponsored search data
INPUT: a query that represents a search goal (e.g., "Lose Weight")
OUTPUT: related queries clustered from the viewpoint of subgoals, e.g.:
- fitness, workout, health clubs
- diet recipes, healthy recipes
- jillian michaels, denise austin, kathy smith

Slide 6

Slide 6 text

Application Example
The searcher issues the query "lose weight pills".
Provide the searcher with alternative solutions: suggest other solutions that the searcher may not be aware of.

Slide 7

Slide 7 text

Application Example
The searcher issues the query "lose weight pills"; the search goal behind it is "lose weight".
Provide the searcher with alternative solutions: suggest other solutions that the searcher may not be aware of.

Slide 8

Slide 8 text

Application Example
The searcher issues the query "lose weight pills"; the search goal behind it is "lose weight".
Provide the searcher with alternative solutions: suggest other solutions that the searcher may not be aware of.
Related searches:
- fitness centers
- health clubs
- swimming clubs
- healthy foods
- diet recipes
- protein foods
- protein

Slide 9

Slide 9 text

Conventional Approach
Conventional query clustering uses:
- Session data [Boldi2008][Wang2009]
- Clickthrough data [Beeferman2000][Wen2001]
- Session + clickthrough data [Sadikov2010] (our baseline)
Session data: queries that co-occur in the same search session are assumed to represent the same user intent.
Clickthrough data: queries that share similar document clicks are assumed to represent the same user intent.
Example: the queries "Maui hotels" and "Sheraton Maui" both lead to clicks on www.sheraton-maui.com (Sheraton Maui Resort & Spa).
[Sadikov2010] Sadikov et al., Query Refinements by User Intent, WWW 2010

Slide 10

Slide 10 text

Problems from the Viewpoint of Subgoal Mining
- Session data
  - A single session may contain multiple goals (e.g., "Hawaiian Air" and "Maui hotels")
  - A user's search process may span multiple sessions
- Clickthrough data
  - A document may contain multiple goals (e.g., a webpage "All about Maui" clicked for both "flights Maui" and "Maui hotels")
Session and clickthrough data are not sufficient for subgoal mining.

Slide 11

Slide 11 text

Our Idea
Leverage sponsored search (ads) data for mining subgoals.
Example ad: "Curves women's fitness. Lose weight and keep it off! Request a free week pass today." (www.curves.com)
Bid phrases for this ad: "health clubs", "weight loss exercise"

Slide 12

Slide 12 text

Strong Points of Sponsored Search Data
- One ad is directly connected to one goal
  - Ads are designed to make the user perform an action or transaction
- Ads reflect the advertisers' tremendous effort
  - Advertisers try to match searchers' various queries with the underlying user intents through diverse bid phrases
  - We can obtain high-quality and diverse ad-query relationships
Example ad: "Curves women's fitness. Lose weight and keep it off! Request a free week pass today." (www.curves.com); bid phrases: "health clubs", "weight loss exercise"

Slide 13

Slide 13 text

Strong Points of Sponsored Search Data
Hypothesis: ads are a useful resource for mining subgoals.

Slide 14

Slide 14 text

Overview of Our Method
Data: sponsored search data, session data, clickthrough data
Conventional method (≈ [Sadikov2010]), starting from the input query:
1. Collecting related queries: queries co-occurring in the same session
2. Clustering queries: random walk on the query-document graph
Our method, starting from the input query:
1. Collecting related queries: bid phrases of ads
2. Clustering queries: random walk on the query-ad graph
[Sadikov2010] Sadikov et al., Query Refinements by User Intent, WWW 2010

Slide 15

Slide 15 text

1. Collecting Related Queries
Use the contents of ads to obtain related queries:
1. Retrieve ads that contain the input query in their contents
2. Collect the top n queries (bid phrases) that have the most impressions for the retrieved ads
Note: impression = display of an ad to the searcher
Example (input query: "lose weight"):
- Ad: "Curves women's fitness. Lose weight and keep it off! Request a free week pass today." (www.curves.com); bid phrases: health club, fitness center
- Ad: "Official wu long diet tea. Original wu long slimming tea. Wu long tea helps you lose weight." (www.officialwulongtea.com); bid phrases: diet tea, wu long tea
- Impression counts shown in the figure: 100, 50, 70, 30
- Related queries: health club, fitness center, wu long tea
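The two collection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (`ads` as a mapping from ad ID to ad text, `impressions` as `(bid_phrase, ad_id, count)` tuples), the function name, and the choice to sum impressions per bid phrase are all hypothetical, and the toy counts are assigned to bid phrases arbitrarily.

```python
from collections import defaultdict


def collect_related_queries(input_query, ads, impressions, n=80):
    """Return the top-n bid phrases by impressions over ads containing the input query.

    ads: dict ad_id -> ad text (hypothetical schema).
    impressions: iterable of (bid_phrase, ad_id, count) tuples (hypothetical schema).
    """
    needle = input_query.lower()
    # Step 1: retrieve ads whose content contains the input query.
    matched = {ad_id for ad_id, text in ads.items() if needle in text.lower()}
    # Step 2: sum impressions per bid phrase over the retrieved ads (an assumption).
    totals = defaultdict(int)
    for phrase, ad_id, count in impressions:
        if ad_id in matched:
            totals[phrase] += count
    # Rank by impressions, descending; break ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:n]]


# Toy data; the assignment of counts to bid phrases is made up for illustration.
ads = {
    "curves": "Lose weight and keep it off! Request a free week pass today.",
    "wulong": "Original wu long slimming tea. Wu long tea helps you lose weight.",
    "other": "Cheap flights to Maui.",
}
impressions = [
    ("health club", "curves", 100),
    ("fitness center", "curves", 50),
    ("diet tea", "wulong", 30),
    ("wu long tea", "wulong", 70),
    ("maui flights", "other", 500),
]
related = collect_related_queries("lose weight", ads, impressions, n=3)
print(related)  # ['health club', 'wu long tea', 'fitness center']
```

A plain substring match stands in for "ads that contain the input query"; a real system would more likely use tokenized or indexed retrieval.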

Slide 16

Slide 16 text

2. Clustering Queries into Subgoals (1/2)
Combine two kinds of information for clustering queries into subgoals:
1. Ad impressions in sponsored search data
2. Query co-occurrence in session data
Example: for "health club", "fitness center", and "wu long tea", do the queries co-occur frequently, and do they have similar ad impressions?

Slide 17

Slide 17 text

2. Clustering Queries into Subgoals (2/2)
Combine two kinds of information for clustering queries into subgoals:
1. Ad impressions in sponsored search data
2. Query co-occurrence in session data
Random walk on the query-ad graph (nodes: queries and ads):
- Query-to-query transition: probability that query q_i co-occurs with query q_j in the same session
- Query-to-ad transition: probability that ad a is displayed in response to query q
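The slides name the two transition types but not how the walk is turned into clusters. The sketch below is therefore a guess at one plausible instantiation, not the authors' algorithm: ads are treated as absorbing states, a mixing parameter `alpha` (assumed) decides between stepping to an ad and stepping to another query, and each query is described by the distribution over ads at which its walk ends. Queries with similar distributions would then be candidates for the same subgoal. All names and the toy data are hypothetical.

```python
import math


def normalize(weights):
    """Turn non-negative counts into probabilities; empty if there is no mass."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total > 0 else {}


def absorption_vectors(query_ad, query_query, alpha=0.5, steps=50):
    """Approximate, per query, the distribution over ads where a walk is absorbed.

    query_ad[q][a]: impressions of ad a for query q (hypothetical input).
    query_query[q][q2]: session co-occurrence count of q and q2 (hypothetical input).
    alpha: probability of stepping to an ad rather than to a query (assumed).
    """
    queries = set(query_ad) | set(query_query)
    to_ad = {q: normalize(query_ad.get(q, {})) for q in queries}
    to_query = {q: normalize(query_query.get(q, {})) for q in queries}
    vectors = {q: dict(to_ad[q]) for q in queries}
    for _ in range(steps):
        updated = {}
        for q in queries:
            # With no query neighbours the walk always goes to an ad, and vice versa.
            a = 1.0 if not to_query[q] else (0.0 if not to_ad[q] else alpha)
            vec = {ad: a * p for ad, p in to_ad[q].items()}
            for q2, p in to_query[q].items():
                for ad, mass in vectors.get(q2, {}).items():
                    vec[ad] = vec.get(ad, 0.0) + (1.0 - a) * p * mass
            updated[q] = vec
        vectors = updated
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy graph (made-up counts).
query_ad = {
    "health club": {"curves": 100},
    "fitness center": {"curves": 50},
    "wu long tea": {"wulong": 30},
}
query_query = {
    "health club": {"fitness center": 5},
    "fitness center": {"health club": 5},
}
vectors = absorption_vectors(query_ad, query_query)
print(cosine(vectors["health club"], vectors["fitness center"]))
print(cosine(vectors["health club"], vectors["wu long tea"]))
```

In this toy graph "health club" and "fitness center" end at the same ad and co-occur in sessions, so their vectors coincide, while "wu long tea" ends at a different ad. Any standard clustering method could be applied to these vectors; which one the authors used is not stated on the slide.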

Slide 18

Slide 18 text

Example
Input query: Relieve Stress
Queries in the resulting clusters:
- cheap lexapro
- generic lexapro
- lexapro side effects
- wellbutrin xl 150 mg
- baseball stress balls
- stress ball
- stress relief toy
- stress toys
- body massage
- massage therapy
- massage therapist
- stress factory
- exercise heal
- gaiam
- holden
- qigong
- anxiety medications
- herbs anxiety
- zen garden

Slide 19

Slide 19 text

Experiment: Proposed and Baseline Methods
Data: sponsored search data, session data, clickthrough data
Baseline method (DocClick) ≈ [Sadikov2010]
- Collecting related queries: queries that frequently co-occur with the input query
- Clustering queries: document clicks and query co-occurrence
Proposed method (AdImp)
- Collecting related queries: queries that have high impressions for ads
- Clustering queries: ad impressions and query co-occurrence

Slide 20

Slide 20 text

Queries Used in the Experiment
Domain: example queries
- Business: car insurance, debt relief, lawn care, resume writing, project management
- Health: back pain relief, eye care, lose weight, teeth whitening, quit smoking
- Recreation: disney cruise, fly fishing, hiking, vegas shows, whale watching
- Society: iq test, learn spanish, sat prep, us immigration, wedding
- Sports: bodybuilding, kayaking, skateboarding, workout routines, hockey equipment
121 queries in total, which are:
- frequent queries in sponsored search data
- judged by assessors to contain multiple subgoals

Slide 21

Slide 21 text

Ground Truth Construction
Manually cluster queries into subgoals.
- Collection A: 98 queries
- Collection B-1 and Collection B-2: 23 queries
Query pool for each input query (e.g., "lose weight"):
- 80 queries obtained from sponsored search data (our method)
- 80 queries obtained from session data (baseline method)
Manual clustering of the pooled queries, e.g., "fitness", "health club", "exercising" -> subgoal "Do physical exercise"

Slide 22

Slide 22 text

Ground Truth Subgoal Construction
Collection A (98 queries): evaluate our method with a large data set.

Slide 23

Slide 23 text

Ground Truth Subgoal Construction
Collection B-1 and Collection B-2 (23 queries): find general trends across different datasets/assessors.

Slide 24

Slide 24 text

Statistics of Test Collection
                           Collection A     Collection B-1   Collection B-2
                           Ads    Session   Ads    Session   Ads    Session
Non-relevant queries (%)   8.2    41.3      28.8   66.1      11.4   52.4
Overlap of subgoals        .71              .51              .62

Slide 25

Slide 25 text

Statistics of Test Collection
Non-relevant queries (%), Ads vs. Session: 8.2 vs. 41.3 (Collection A), 28.8 vs. 66.1 (Collection B-1), 11.4 vs. 52.4 (Collection B-2)
Sponsored search data yields a higher share of queries relevant to the given topic.
- Session data contained many topic-shifted queries, e.g., "Sheraton Maui" -> "C# Programming"

Slide 26

Slide 26 text

Statistics of Test Collection
Overlap of subgoals: .71 (Collection A), .51 (Collection B-1), .62 (Collection B-2)
The overlap of subgoals between sponsored search data and session data is high.
- We can extract searchers' intents from sponsored search data

Slide 27

Slide 27 text

Quality of Clustering Results
[Figure: F1 and purity of AdImp and DocClick; panels (a) Collection A, (b) Collection B-1, (c) Collection B-2]
Purity = homogeneity of clustering results
F1 = accuracy of classification
Our method (AdImp) consistently outperformed the conventional (DocClick) method (session + clickthrough).
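The slide gives only one-line glosses of the two measures. As a rough illustration, the sketch below computes purity in its usual form (each predicted cluster is credited with its majority ground-truth subgoal) and a pairwise F1 over query pairs. The pairwise formulation is an assumption; the talk may define F1 differently. Function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def purity(predicted, truth):
    """predicted, truth: dict query -> cluster label; only shared queries are scored."""
    common = [q for q in predicted if q in truth]
    if not common:
        return 0.0
    by_cluster = {}
    for q in common:
        by_cluster.setdefault(predicted[q], []).append(truth[q])
    # Credit each predicted cluster with the size of its majority ground-truth label.
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in by_cluster.values())
    return majority / len(common)


def pairwise_f1(predicted, truth):
    """F1 over query pairs: a pair is positive when both queries share a cluster."""
    common = sorted(q for q in predicted if q in truth)
    tp = fp = fn = 0
    for a, b in combinations(common, 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: three exercise queries and one pill query (labels are made up).
truth = {"fitness": "exercise", "health club": "exercise",
         "exercising": "exercise", "diet pills": "pills"}
predicted = {"fitness": 0, "health club": 0, "exercising": 1, "diet pills": 1}
print(purity(predicted, truth))       # 0.75
print(pairwise_f1(predicted, truth))
```

Purity alone rewards many tiny clusters, which is presumably why the slide reports it together with F1.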

Slide 28

Slide 28 text

Difference between Sponsored Search and Session Data (1/2)
Subgoals that contain queries obtained only from session data:
- query = "back pain relief"; related queries: back pain causes, causes of back pain; subgoal: learn about the cause
- query = "moving"; related queries: moving to do list, moving checklist; subgoal: make a TODO list
No ads are associated with these related queries.
These related queries are not related to money.
Our method cannot identify this kind of subgoal.

Slide 29

Slide 29 text

Difference between Sponsored Search and Session Data (2/2)
Subgoals that contain queries obtained only from sponsored search data:
- query = "relieve stress"; related queries: zen garden; subgoal: visit a zen garden
- query = "quit smoking"; related queries: acupuncture, acupuncture stop smoking; subgoal: have acupuncture treatment
The searchers may not be aware of these solutions.
Our method (sponsored search data) can provide unexpected solutions to the searcher.

Slide 30

Slide 30 text

Summary
Mining subgoals via query clustering
- Utilized sponsored search data
- Our method outperformed the conventional (session + clickthrough) approach
Future Work
- Finding relationships among goals
  - While some subgoals wholly achieve their search goal, others only partially achieve it
  - Some subgoals need to be satisfied after others are achieved, e.g., "Book hotels" should be achieved before "Find restaurant"

Slide 31

Slide 31 text

Appendix: Developed Interface for Manual Clustering