The Wisdom of Advertisers: Mining Subgoals via Query Clustering

CIKM2012 Session: Advertising

Takehiro Yamamoto

October 10, 2012

Transcript

  1. The Wisdom of Advertisers: Mining Subgoals via Query Clustering

    CIKM2012 Session: Advertising. Takehiro Yamamoto (Kyoto University), Tetsuya Sakai (MSRA), Mayu Iwata (Osaka University), Yu Chen (MSRA), Ji-Rong Wen (MSRA), Katsumi Tanaka (Kyoto University)
  2. Background (1/2): Information needs of searchers are sometimes complex

    Example search goal: Travel Maui
    - Book flights: Hawaiian Air, cheap flight tickets
    - Book a hotel: Sheraton Maui, cheap Maui hotel
    - Join a tour: travel agents Maui, one day Maui tour
    - Find restaurants: sea foods Maui, Maui restaurants
  3. Background (2/2): Information needs of searchers are sometimes complex

    Example search goal: Lose Weight
    - Do physical exercise: fitness center, swimming school
    - Control calorie intake: healthy recipes, weight loss foods
    - Have diet pills: diet pills, HCG drops
    - Have surgery: lose weight surgery, Lap band
  4. Search Goal and Subgoals (Our Definition)

    - Search goal: an action that the searcher wants to achieve, often represented by a verb plus, possibly, a noun phrase.
    - Subgoal: a search goal x is a subgoal of another search goal y if achieving x helps the searcher also achieve y, either wholly or partially.

    Example: "Do physical exercise" and "Have diet pills" are subgoals that (wholly) achieve the search goal "lose weight"; "Book flights" and "Book a hotel" are subgoals that (partially) achieve the search goal "travel Maui".
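The definition can be made concrete as a small data structure. This is a minimal sketch for illustration only: the `Goal` class, the `Achieves` labels, and the `subgoals_of` helper are hypothetical names, and the wholly/partially labels on the two example goals follow our reading of the slide's diagram.

```python
from dataclasses import dataclass, field
from enum import Enum


class Achieves(Enum):
    """How achieving a subgoal contributes to its parent search goal."""
    WHOLLY = "wholly"
    PARTIALLY = "partially"


@dataclass
class Goal:
    """A search goal: an action, usually a verb plus an optional noun phrase."""
    action: str
    # (subgoal, contribution) pairs; a subgoal is itself a Goal, so hierarchies nest.
    subgoals: list[tuple["Goal", Achieves]] = field(default_factory=list)

    def add_subgoal(self, sub: "Goal", how: Achieves) -> "Goal":
        self.subgoals.append((sub, how))
        return self


def subgoals_of(goal: Goal, how: Achieves) -> list[str]:
    """Actions of the direct subgoals that contribute to `goal` in the given way."""
    return [sub.action for sub, h in goal.subgoals if h is how]


# Hypothetical labels, following our reading of the diagram on this slide.
lose_weight = (Goal("lose weight")
               .add_subgoal(Goal("do physical exercise"), Achieves.WHOLLY)
               .add_subgoal(Goal("have diet pills"), Achieves.WHOLLY))
travel_maui = (Goal("travel Maui")
               .add_subgoal(Goal("book flights"), Achieves.PARTIALLY)
               .add_subgoal(Goal("book a hotel"), Achieves.PARTIALLY))

print(subgoals_of(lose_weight, Achieves.WHOLLY))
print(subgoals_of(travel_maui, Achieves.PARTIALLY))
```

Because a subgoal is itself a `Goal`, the same structure can hold deeper hierarchies (a subgoal with its own subgoals).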
  5. Our Goal: Subgoal Mining

    Automatically mine the subgoals of a given search goal from data by means of query clustering.
    - INPUT: a query that represents a search goal (e.g., Lose Weight)
    - DATA: sponsored search data
    - OUTPUT: related queries clustered from the viewpoint of subgoals (e.g., fitness workout, health clubs; diet recipes, healthy recipes; Jillian Michaels, Denise Austin, Kathy Smith)
  8. Application Example

    Provide the searcher with alternative solutions: suggest other solutions that the searcher may not be aware of.
    - Search: lose weight pills
    - Related searches for "lose weight": fitness centers, health clubs, swimming clubs, healthy foods, diet recipes, protein foods, protein
  9. Conventional Approach

    Conventional query clustering uses:
    - Session data [Boldi2008][Wang2009]
    - Clickthrough data [Beeferman2000][Wen2001]
    - Session + clickthrough data [Sadikov2010] (our baseline)

    The underlying assumptions:
    - Session data: queries that co-occur in the same search session represent the same user intent (e.g., "Maui hotels" followed by "Sheraton Maui").
    - Clickthrough data: queries that share similar document clicks represent the same user intent (e.g., "Sheraton Maui" and "Sheraton Maui Resort & Spa" both lead to clicks on www.sheraton-maui.com).

    [Sadikov2010] Sadikov et al., Clustering Query Refinements by User Intent, WWW2010
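The two conventional signals can be sketched on toy logs. This is a minimal illustration under our own assumptions: co-occurrence is counted once per session for each unordered query pair, click similarity is measured as Jaccard overlap of clicked documents (the cited works may use other measures), and the logs and `hotels.example.com` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def session_cooccurrence(sessions: list[list[str]]) -> Counter:
    """Number of sessions in which each unordered pair of distinct queries co-occurs."""
    counts: Counter = Counter()
    for session in sessions:
        # set(): a query repeated within one session counts once;
        # sorted(): each pair gets one canonical (alphabetical) key.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


def click_overlap(clicks: dict[str, set[str]], q1: str, q2: str) -> float:
    """Jaccard overlap between the sets of documents clicked for two queries."""
    d1, d2 = clicks.get(q1, set()), clicks.get(q2, set())
    union = d1 | d2
    return len(d1 & d2) / len(union) if union else 0.0


# Hypothetical toy logs.
sessions = [
    ["maui hotels", "sheraton maui"],
    ["hawaiian air", "maui hotels", "sheraton maui"],
    ["hawaiian air"],
]
clicks = {
    "sheraton maui": {"www.sheraton-maui.com"},
    "sheraton maui resort & spa": {"www.sheraton-maui.com"},
    "maui hotels": {"hotels.example.com"},
}

cooc = session_cooccurrence(sessions)
print(cooc[("maui hotels", "sheraton maui")])  # 2
print(click_overlap(clicks, "sheraton maui", "sheraton maui resort & spa"))  # 1.0
```

Note that the toy session counts also pair "hawaiian air" with "maui hotels", which illustrates the multiple-goals-in-one-session problem raised on slide 10.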
  10. Problems from the Viewpoint of Subgoal Mining

    - Session data
      - A single session may contain multiple goals (e.g., "Hawaiian Air" and "Maui hotels").
      - A user's search process may span multiple sessions.
    - Clickthrough data
      - A document may contain multiple goals (e.g., an "All about Maui" webpage clicked for both "flights Maui" and "Maui hotels").

    Session and clickthrough data are therefore not sufficient for subgoal mining.
  11. Our Idea

    Leverage sponsored search (ads) data for mining subgoals.
    - Ad: curves women's fitness. "Lose weight and keep it off! Request a free week pass today." www.curves.com
    - Bid phrases: health clubs, weight loss, exercise
  13. Strong Points of Sponsored Search Data

    - One ad is directly connected to one goal.
      - Ads are designed to make the user perform an action or transaction.
    - Ads reflect the advertisers' tremendous effort.
      - Advertisers try to match searchers' various queries with the underlying user intents through diverse bid phrases.
      - We can therefore obtain high-quality and diverse ad-query relationships.
    - Example: the curves women's fitness ad ("Lose weight and keep it off! Request a free week pass today.", www.curves.com) with the bid phrases health clubs, weight loss, and exercise.

    Hypothesis: ads are a useful resource for mining subgoals.
  14. Overview of Our Method

    Given an input query, both methods first collect related queries and then cluster them.
    - Conventional method (≒ [Sadikov2010]), using session and clickthrough data:
      1. Collecting related queries: queries co-occurring with the input query in the same session
      2. Clustering queries: random walk on the query-document graph
    - Our method, using sponsored search data (plus session data for query co-occurrence):
      1. Collecting related queries: bid phrases of ads
      2. Clustering queries: random walk on the query-ad graph

    [Sadikov2010] Sadikov et al., Clustering Query Refinements by User Intent, WWW2010
  15. Step 1: Collecting Related Queries

    1. Retrieve ads that contain the input query in their contents.
    2. Collect the top n queries (bid phrases) that have the most impressions for the retrieved ads (an impression is a display of an ad to the searcher).

    Example (input query: lose weight): the curves women's fitness ad ("lose weight and keep it off! request a free week pass today.", www.curves.com; bid phrases: health club, fitness center) and the Wu long tea ad ("Official wu long diet tea. Original wu long slimming tea. Wu long tea helps you lose weight.", www.officialwulongtea.com; bid phrases: diet tea, wu long tea) both contain "lose weight". The slide shows impression counts of 100, 50, 70, and 30 for these bid phrases; the resulting related queries are health club, fitness center, and wu long tea.

    Key idea: use the contents of ads to obtain related queries.
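Step 1 can be sketched as a filter over ads followed by a ranking of their bid phrases. This is a minimal sketch, assuming a case-insensitive substring match as a stand-in for real ad retrieval and summing impressions per bid phrase; the `Ad` class, `collect_related_queries`, the third ad, and the assignment of the impression counts to specific phrases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    text: str                    # ad title, description, and display URL
    impressions: dict[str, int]  # bid phrase -> impressions of this ad for that phrase


def collect_related_queries(ads: list[Ad], input_query: str, n: int) -> list[str]:
    """Bid phrases of ads whose text contains the input query, top n by impressions."""
    totals: dict[str, int] = {}
    for ad in ads:
        # Case-insensitive substring match stands in for real ad retrieval.
        if input_query.lower() in ad.text.lower():
            for phrase, count in ad.impressions.items():
                totals[phrase] = totals.get(phrase, 0) + count
    # Most impressions first; ties broken alphabetically so the output is deterministic.
    return sorted(totals, key=lambda p: (-totals[p], p))[:n]


# Hypothetical ads; the impression counts are assigned for illustration only.
ads = [
    Ad("curves women's fitness. Lose weight and keep it off! "
       "Request a free week pass today. www.curves.com",
       {"health club": 100, "fitness center": 50}),
    Ad("Official wu long diet tea. Original wu long slimming tea. "
       "Wu long tea helps you lose weight. www.officialwulongtea.com",
       {"wu long tea": 70, "diet tea": 30}),
    Ad("Cheap flights to Maui.", {"hawaiian air": 500}),
]

print(collect_related_queries(ads, "lose weight", n=3))
# ['health club', 'wu long tea', 'fitness center']
```

The flight ad never mentions "lose weight", so its high-impression bid phrase is excluded regardless of its count.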
  16. Step 2: Clustering Queries into Subgoals (1/2)

    Combine two kinds of information for clustering queries into subgoals:
    1. Ad impressions in sponsored search data
    2. Query co-occurrence in session data

    Example: for health club, fitness center, and wu long tea, do the queries co-occur frequently, and do they have similar ad impressions?
  17. Step 2: Clustering Queries into Subgoals (2/2)

    Combine the same two kinds of information (ad impressions in sponsored search data and query co-occurrence in session data) through a random walk on the query-ad graph, whose nodes are queries and ads:
    - Query-to-query transition: the probability that query q_i co-occurs with query q_j in the same session
    - Query-to-ad transition: the probability that ad a is displayed in response to query q
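A random walk of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mixing weight `alpha` between query-to-query and query-to-ad moves, the ad-to-query transition (proportional to impressions), the number of steps, and the final grouping (cosine similarity of walk distributions plus single-link merging at a threshold) are all our assumptions, and the toy counts are hypothetical.

```python
import math
from collections import defaultdict


def build_transitions(cooc, impressions, alpha=0.5):
    """Transition probabilities on a query-ad graph; nodes are ("q", query) or ("a", ad).

    cooc: {(q_i, q_j): co-occurrence count}; impressions: {(q, ad): impression count}.
    From a query, move to a query with weight alpha and to an ad with 1 - alpha
    (alpha is an assumption); from an ad, move to a query in proportion to impressions.
    """
    q2q, q2a, a2q = (defaultdict(dict) for _ in range(3))
    for (qi, qj), c in cooc.items():
        q2q[qi][qj] = q2q[qi].get(qj, 0) + c
        q2q[qj][qi] = q2q[qj].get(qi, 0) + c
    for (q, a), c in impressions.items():
        q2a[q][a] = q2a[q].get(a, 0) + c
        a2q[a][q] = a2q[a].get(q, 0) + c

    def normalize(row):
        total = sum(row.values())
        return {k: v / total for k, v in row.items()} if total else {}

    trans = {}
    for q in set(q2q) | set(q2a):
        qq, qa = normalize(q2q.get(q, {})), normalize(q2a.get(q, {}))
        # If a query has only one kind of edge, all probability goes there.
        w = alpha if (qq and qa) else (1.0 if qq else 0.0)
        row = {("q", k): w * p for k, p in qq.items()}
        row.update({("a", k): (1 - w) * p for k, p in qa.items()})
        trans[("q", q)] = row
    for a, row in a2q.items():
        trans[("a", a)] = {("q", k): p for k, p in normalize(row).items()}
    return trans


def walk(trans, start, steps):
    """Probability distribution over nodes after `steps` steps from `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            for nb, tp in trans.get(node, {}).items():
                nxt[nb] += p * tp
        dist = dict(nxt)
    return dist


def cosine(u, v):
    dot = sum(p * v.get(k, 0.0) for k, p in u.items())
    nu = math.sqrt(sum(p * p for p in u.values()))
    nv = math.sqrt(sum(p * p for p in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_queries(queries, trans, steps=2, threshold=0.5):
    """Single-link grouping (union-find) of queries with similar walk distributions."""
    vecs = {q: walk(trans, ("q", q), steps) for q in queries}
    parent = {q: q for q in queries}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(list)
    for q in queries:
        groups[find(q)].append(q)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data for the input query "lose weight".
queries = ["health club", "fitness center", "wu long tea", "diet tea"]
cooc = {("health club", "fitness center"): 5, ("wu long tea", "diet tea"): 4}
impressions = {("health club", "curves"): 100, ("fitness center", "curves"): 50,
               ("wu long tea", "wulong"): 70, ("diet tea", "wulong"): 30}
trans = build_transitions(cooc, impressions)
clusters = cluster_queries(queries, trans)
print(clusters)  # [['diet tea', 'wu long tea'], ['fitness center', 'health club']]
```

Queries that share an ad or co-occur in sessions reach overlapping nodes within a few steps, so their walk distributions become similar; queries with disjoint neighborhoods get a cosine similarity of zero.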
  18. Example

    Input query: relieve stress. Query clusters:
    ・cheap lexapro ・generic lexapro ・lexapro side effects ・wellbutrin xl 150 mg ・baseball stress balls ・stress ball ・stress relief toy ・stress toys ・body massage ・massage therapy ・massage therapist ・stress factory ・exercise heal ・gaiam ・holden ・qigong ・anxiety medications ・herbs anxiety ・zen garden
  19. Experiment: Proposed and Baseline Methods

    - Baseline method (DocClick, ≒ [Sadikov2010]), using session and clickthrough data
      - Collecting related queries: queries that frequently co-occur with the input query
      - Clustering queries: document clicks + query co-occurrence
    - Proposed method (AdImp), using sponsored search and session data
      - Collecting related queries: queries that have high impressions for ads
      - Clustering queries: ad impressions + query co-occurrence
  20. Queries Used in the Experiment

    | Domain | Example queries |
    |---|---|
    | Business | car insurance, debt relief, lawn care, resume writing, project management |
    | Health | back pain relief, eye care, lose weight, teeth whitening, quit smoking |
    | Recreation | disney cruise, fly fishing, hiking, vegas shows, whale watching |
    | Society | iq test, learn spanish, sat prep, us immigration, wedding |
    | Sports | bodybuilding, kayaking, skateboarding, workout routines, hockey equipment |

    121 queries in total:
    - frequent queries in sponsored search data
    - judged by assessors to contain multiple subgoals
  21. Ground Truth Construction

    Manually cluster queries into subgoals.
    - Collections: Collection A (98 input queries); Collections B-1 and B-2 (23 input queries)
    - Query pool for each input query (e.g., lose weight): 80 queries obtained from sponsored search data (our method) and 80 queries obtained from session data (baseline method)
    - Manual clustering: the pooled queries are grouped into subgoals, e.g., fitness, health club, and exercising into "Do physical exercise"
  22. Ground Truth Subgoal Construction (cont.)

    Same construction as slide 21 (Collection A: 98 queries; Collections B-1 and B-2: 23 queries; 80 queries from sponsored search data and 80 from session data per input query, manually clustered into subgoals). Purpose: evaluate our method with a large data set.
  23. Ground Truth Subgoal Construction (cont.)

    Same construction as slide 21 (Collection A: 98 queries; Collections B-1 and B-2: 23 queries; 80 queries from sponsored search data and 80 from session data per input query, manually clustered into subgoals). Purpose: find general trends across different datasets/assessors.
  24. Statistics of Test Collection

    | Collection | Non-relevant queries, Ads (%) | Non-relevant queries, Session (%) | Overlap of subgoals |
    |---|---|---|---|
    | Collection A | 8.2 | 41.3 | .71 |
    | Collection B-1 | 28.8 | 66.1 | .51 |
    | Collection B-2 | 11.4 | 52.4 | .62 |
  25. Statistics of Test Collection (cont.)

    Same statistics as slide 24 (non-relevant queries, Ads vs. Session: 8.2% vs. 41.3% in Collection A, 28.8% vs. 66.1% in B-1, 11.4% vs. 52.4% in B-2).
    - Sponsored search data yields more queries relevant to the given topic.
    - Session data contained many topic-shifted queries (e.g., "Sheraton Maui" -> "C# Programming").
  26. Statistics of Test Collection (cont.)

    Same statistics as slide 24 (overlap of subgoals between sponsored search and session data: .71 in Collection A, .51 in B-1, .62 in B-2).
    - The overlap of subgoals between sponsored search data and session data is high.
    - We can therefore extract searchers' intents from sponsored search data.
  27. Quality of Clustering Results

    [Figure: F1 (y-axis, 0 to 0.3) vs. purity (x-axis, 0 to 0.8) for AdImp and DocClick; panels (a) Collection A, (b) Collection B-1, (c) Collection B-2]
    - Purity = homogeneity of the clustering results
    - F1 = accuracy of classification
    - Our method (AdImp) consistently outperformed the conventional method (DocClick, session + clickthrough).
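The two evaluation measures can be sketched as below. Purity follows its standard definition (the share of items that carry the majority ground-truth label of their cluster). The slide does not specify which F1 variant is used, so the pairwise F1 here (a query pair counts as positive when both queries share a cluster, and as correct when they also share a ground-truth subgoal) is our assumption and may differ from the paper's measure; the labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def purity(clusters: list[list[str]], truth: dict[str, str]) -> float:
    """Share of queries that carry the majority ground-truth subgoal of their cluster."""
    n = sum(len(c) for c in clusters)
    hits = sum(Counter(truth[q] for q in c).most_common(1)[0][1] for c in clusters if c)
    return hits / n if n else 0.0


def pairwise_f1(clusters: list[list[str]], truth: dict[str, str]) -> float:
    """F1 over query pairs: predicted same cluster vs. same ground-truth subgoal."""
    predicted = {frozenset(p) for c in clusters for p in combinations(c, 2)}
    gold = {frozenset((a, b)) for a, b in combinations(sorted(truth), 2)
            if truth[a] == truth[b]}
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth (query -> subgoal) and system output.
truth = {"fitness center": "exercise", "health club": "exercise",
         "swimming school": "exercise", "diet recipes": "diet",
         "healthy recipes": "diet"}
clusters = [["fitness center", "health club"],
            ["swimming school", "diet recipes", "healthy recipes"]]
print(purity(clusters, truth))       # 0.8
print(pairwise_f1(clusters, truth))  # 0.5
```

Purity alone rewards over-splitting (all singletons score 1.0), which is why it is reported together with an F1-style measure that also penalizes separating queries of the same subgoal.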
  28. Difference between Sponsored Search and Session Data (1/2)

    Subgoals that contain queries obtained only from session data:
    - Query "back pain relief": related queries "back pain causes", "causes of back pain" -> subgoal "learn about cause"
    - Query "moving": related queries "moving to do list", "moving checklist" -> subgoal "make a TODO list"

    No ads are associated with these related queries; they are not related to money. Our method cannot identify this kind of subgoal.
  29. Difference between Sponsored Search and Session Data (2/2)

    Subgoals that contain queries obtained only from sponsored search data:
    - Query "relieve stress": related query "zen garden" -> subgoal "visit a zen garden"
    - Query "quit smoking": related queries "acupuncture", "acupuncture stop smoking" -> subgoal "have acupuncture treatment"

    Searchers may not be aware of these solutions; our method (using sponsored search data) can provide unexpected solutions to the searcher.
  30. Summary

    - Mining subgoals via query clustering
      - Utilized sponsored search data
      - Our method outperformed the conventional (session + clickthrough) approach
    - Future work: finding relationships among goals
      - Some subgoals wholly achieve their search goal, while others achieve it only partially.
      - Some subgoals need to be satisfied after others are achieved (e.g., "Book hotels" should be achieved before "Find restaurants").
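The ordering relationship in the future work can be illustrated as a topological sort over precedence constraints. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); `order_subgoals` is a hypothetical helper, only the "book hotels" before "find restaurants" constraint comes from the slide, and the second constraint is made up for illustration.

```python
from graphlib import TopologicalSorter


def order_subgoals(prerequisites: dict[str, set[str]]) -> list[str]:
    """Order subgoals so that every prerequisite precedes the subgoals that need it.

    prerequisites maps a subgoal to the subgoals that must be achieved first.
    Raises graphlib.CycleError if the constraints contradict each other.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# "Book hotels" before "find restaurants" comes from the slide; the other
# constraint is a hypothetical addition for illustration.
plan = order_subgoals({
    "find restaurants": {"book hotels"},
    "join a tour": {"book flights"},
})
print(plan)
```

Several orders can satisfy the same constraints; `static_order()` returns one of them, so only the relative order of constrained pairs is guaranteed.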