
Fraudulent Click Detection in Online Advertising: Methods and History

2016-09-03
Slides from my talk at Data Mining + WEB @ Tokyo (#TokyoWebmining).

Takashi Nishibayashi

September 03, 2016

Transcript

  1. Fraudulent Click Detection in Online Advertising: Methods and History Takashi Nishibayashi (@hagino3000) 2016-09-03 The 56th #TokyoWebmining

  2. Agenda 1. Introduction 2. Background of the problem 3. Click Fraud techniques 4. Paper review: the history of Click Fraud detection methods 5. Each method and notes on implementing it

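  3. Self-introduction ID: hagino3000 Name: Takashi Nishibayashi Job: Software Engineer. Currently at an ad network company, working on optimizing ad delivery efficiency (from the design through the implementation of bid price adjustment logic and ad selection logic)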
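  4. What this talk covers ✴ What I will talk about ✴ What Click Fraud is ✴ The history of Click Fraud detection methods ✴ Rule-based and anomaly detection approaches ✴ What I cannot talk about ✴ Our company's real data and specific detection rules ✴ Why Click Fraud? ✴ Among the various forms of Ad Fraud, Click Fraud is the one closest to home for me, since I work on CPC-model display advertising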
  5. 1. Background of the problem ✴ What is an Ad Network? ✴ Pay-per-click advertising ✴ Terminology ✴ Invalid clicks and Click Fraud ✴ What is wrong with Click Fraud?
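  6. What is an Ad Network? ✴ In internet display advertising, a system that aggregates multiple advertisers and multiple publishers and delivers ads between them ✴ Its job is to bring revenue to publishers and conversions to advertisers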

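  7. Pay-per-click advertising ✴ One of the compensation models for banner ads ✴ Each click on an ad banner generates revenue for the publisher and a cost for the advertiser ✴ Introduced in Google AdWords in 2002 [1] ✴ Abbreviated as PPC (Pay Per Click) or CPC (Cost Per Click)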
  8. The players in this talk (diagram: Ad Network, Advertiser, Publisher, Advertising medium, and Audience, connected by flows labeled ad submission, ad delivery, ad spend, ad revenue, and Click)
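  9. Terms and abbreviations ✴ IMP (Impression) / Click / Conversion ✴ Display of an ad / a click on an ad / an action (result) specified by the advertiser ✴ CTR (Click-through rate) ✴ Commonly computed as the number of clicks divided by the number of impressions over a given period ✴ Frame ✴ An ad slot ✴ Advertising medium / Publisher ✴ Web media or mobile apps ✴ Web media operators or mobile app developers ✴ Ad Fraud ✴ Fraud against online advertising in general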
  10. Definition of invalid clicks ✴ Quoted from Google AdWords' "Types of invalid traffic" [2] ✴ Accidental clicks that provide no value, such as the second click of a double-click ✴ Manual clicks intended to increase someone's advertising costs ✴ Manual clicks intended to increase profits for website owners hosting your ads ✴ Clicks and impressions by automated tools, robots, or other deceptive software
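  11. Click Fraud ✴ Invalid clicks that are made intentionally ✴ By publishers, for their own profit ✴ Common ✴ By advertisers, to drain competitors' ad budgets ✴ Attacks on search listing ads ✴ Clicks that have no value to the advertiser ✴ In Japan the term is confusing, because "one-click fraud" (ワンクリック詐欺), which refers to a different scam, became widespread first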
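  12. What is the problem? ✴ Advertisers pay for clicks that have no value ✴ Indirectly, everyone who is not committing fraud suffers losses ✴ Some estimate that 36% of online advertising is Click Fraud [4] ✴ It erodes trust in online marketing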
  13. 2. Click Fraud techniques ✴ Click Fraud techniques ✴ What makes detection hard ✴ How the defending side should respond
  14. Early Click Fraud ✴ In 2004, THE TIMES OF INDIA published an interview with professional clickers in India [5]
 "It's boring, but it is extra money for a couple of hours of clicking weblinks every day," ✴ In 2004, Google sued a publisher that had been committing Click Fraud with a team of 12 people, and won the case in 2005 [6]
  16. Operation Ghost Click ✴ A case in which malware was operated for four years, infected 4 million users, and fraudulently earned $14 million [8] ✴ Hijacks users' genuine ad clicks through DNS Changer and click hijacking
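  17. Click Fraud techniques [3] ✴ Handcrafted clicks made manually by individuals, with a warm human touch ✴ Crowdsourcing ✴ Click farms ✴ Click bots ✴ Malware (botnets) ✴ Others ✴ → Try searching for 「不正クリック 代行」 ("click fraud service") right now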
  18.  http://ever-click.com

  19. Features of a click bot

  20. A case where a human is operating it

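  21. What we want ✴ In a real system, we want to invalidate invalid clicks in bulk ✴ Unintentional duplicate clicks (the second one is invalid) ✴ Induced misclicks and forced redirects ✴ Repeated clicks made intentionally by website operators ✴ If a judgment can be made per click, we can withhold payment for exactly those clicks and refund the advertiser ✴ If the judgment is made per Publisher → the response varies ✴ Because a human makes the final decision, we want a reason explaining what is wrong ✴ If they do no harm, we would like them to keep operating as live training data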
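Slide 21 treats the second click of an unintended double-click as invalid. A minimal sketch of that per-click rule, assuming a hypothetical click record with a user ID, an ad ID and a timestamp, and an assumed 3-second window (the slides give no threshold):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    # Hypothetical log record; the field names are illustrative only.
    user_id: str  # e.g. a cookie or device ID
    ad_id: str
    ts: float     # Unix time in seconds


def split_duplicates(clicks, window_sec=3.0):
    """Split clicks into (valid, invalid).

    A click is invalid if the same user clicked the same ad less than
    window_sec seconds after their previous click on it (window_sec is an
    assumed threshold, not a value from the slides)."""
    last_seen = {}  # (user_id, ad_id) -> timestamp of the most recent click
    valid, invalid = [], []
    for c in sorted(clicks, key=lambda c: c.ts):
        key = (c.user_id, c.ad_id)
        prev = last_seen.get(key)
        if prev is not None and c.ts - prev < window_sec:
            invalid.append(c)
        else:
            valid.append(c)
        last_seen[key] = c.ts  # compare against the latest click, valid or not
    return valid, invalid


clicks = [
    Click("u1", "ad1", 0.0),
    Click("u1", "ad1", 0.3),   # second click of a double-click
    Click("u2", "ad1", 0.2),
    Click("u1", "ad1", 10.0),  # a later, separate click
]
valid, invalid = split_duplicates(clicks)
print(len(valid), invalid)  # 3 [Click(user_id='u1', ad_id='ad1', ts=0.3)]
```

Returning the invalid clicks instead of silently dropping them keeps the per-click amount available for the refund to the advertiser that the slide describes.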
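  22. Difficulties ✴ If attackers find out how the defender is responding, it turns into a cat-and-mouse game ✴ When responding, we must pretend not to have noticed ✴ Example: not showing ads to bots → bad, because it gives attackers a chance for trial and error ✴ We can only judge from circumstantial evidence ✴ Training data is hard to build ✴ Invalidation must run before the numbers in the reports shown to advertisers/publishers are finalized (the grace period is 1 hour)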
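  23. Difficulties (continued) ✴ HTTP is a simple, stateless text protocol ✴ UserAgent and cookie contents are easy to spoof ✴ IP addresses can be switched as often as you like ✴ On Android, anything is possible ✴ Touch events can be generated programmatically [9] ✴ The Android ID can be rewritten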
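  24. Techniques vary ✴ We cannot see the adversary, but we know their motivation ✴ They want to earn a lot, quickly and easily ✴ They do not want to be caught by the ad network ✴ Hypotheses can be formed from circumstantial evidence and motivation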

  25. 3. Papers on Click Fraud detection ✴ History ✴ Microsoft's approach ✴ Google's approach ✴ Machine learning models and features ✴ Others
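  26. History of research against Click Fraud ✴ In 2005, Microsoft Research published a research paper ✴ Click Fraud Resistant Methods for Learning Click-Through Rates [7] ✴ It states that it is practically impossible for machine learning algorithms to identify clicks made by hired workers in India ✴ It assumes a Poisson distribution to predict a CTR that excludes fraudulent clicks ✴ In the 2000s, machine learning was not used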
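The slide only says that [7] assumes a Poisson distribution; the sketch below is not the algorithm of [7]. It shows, assuming that a frame's hourly clicks follow a Poisson distribution with a known expected rate, how that assumption turns an observed count into a tail probability that can be thresholded. The expected rate and the 1e-4 cutoff are illustrative.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Uses the recurrence P(X = i) = P(X = i - 1) * lam / i, which is fine for
    the small rates typical of per-frame hourly clicks (exp(-lam) underflows
    to 0 for very large lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_suspicious(observed_clicks, expected_clicks, alpha=1e-4):
    """Flag a count that would be very unlikely under the Poisson assumption."""
    return poisson_sf(observed_clicks, expected_clicks) < alpha


print(is_suspicious(3, 2.0), is_suspicious(10, 2.0))  # False True
```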
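  27. A. Tuzhilin. The Lane's Gifts v. Google Report (2006) [1] ✴ An evaluation report on Google's invalid click detection methods from 2002 to 2006 ✴ Mainly rule-based and anomaly-detection-based ✴ Rich in operational details, such as the flow for suspending Publisher accounts ✴ No specific detection rules are described ✴ Google has since acquired spider.io, among others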
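  28. Click Fraud Detection: Adversarial Pattern Recognition over 5 Years at Microsoft (2015) [10] ✴ The evolution of Microsoft's invalid click filtering system ✴ They knew that machine learning would take away their ability to tune the system, so they deliberately chose a rule-based approach ✴ As a model's features grow, it becomes hard to trace a problem back to the responsible weights when something goes wrong ✴ They concluded that a rule-based approach has advantages for a large-scale fraud detection system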
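  29. ✴ Clicks are evaluated by a combination of multiple rules ✴ Decision trees are also used ✴ Advantages of the rule-based approach ✴ Every rule that fired can be logged ✴ Which rules are right and which are wrong ✴ In the early stages, records judged fraudulent were discarded ✴ This was a mistake ✴ It made the impact of updates to the detection system impossible to measure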
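The rule-based design described on slides 28 and 29 (combine rules, log every rule that fired, keep flagged records instead of discarding them) can be sketched as follows. The three rules, their thresholds and the click fields are hypothetical placeholders, not Microsoft's rules, and a click is treated as invalid when any rule fires.

```python
from collections import Counter

# Hypothetical rules; names, fields and thresholds are illustrative only.
RULES = {
    "too_fast_after_imp": lambda c: c["sec_since_imp"] < 0.5,
    "datacenter_ip": lambda c: c["ip_is_datacenter"],
    "many_clicks_from_ip": lambda c: c["ip_clicks_last_hour"] > 30,
}


def evaluate(click, rules=RULES):
    """Return the click annotated with every rule that fired.

    The record is kept (not dropped) so that the effect of rule changes
    can be measured later."""
    fired = [name for name, rule in rules.items() if rule(click)]
    return {**click, "fired_rules": fired, "invalid": bool(fired)}


def rule_fire_counts(evaluated):
    """How often each rule fired: a starting point for reviewing rules."""
    return Counter(name for c in evaluated for name in c["fired_rules"])


clicks = [
    {"sec_since_imp": 0.1, "ip_is_datacenter": False, "ip_clicks_last_hour": 40},
    {"sec_since_imp": 4.0, "ip_is_datacenter": False, "ip_clicks_last_hour": 2},
]
results = [evaluate(c) for c in clicks]
print([r["fired_rules"] for r in results])
# [['too_fast_after_imp', 'many_clicks_from_ip'], []]
```

Because every evaluated click survives with its `fired_rules`, comparing these counts before and after a rule change gives the measurement the slide says was lost when flagged records were thrown away.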
  30. (diagram: an operations team; crawlers are sent to suspicious websites; evaluation of clicks)

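  31. Fraud Detection in Mobile Advertising (FDMA) 2012 Competition [11] ✴ Describes the methods of the top teams in a competition to identify, from click logs, the Publishers committing click fraud ✴ Features of the winning team ✴ Mainly statistics (counts, standard deviations, etc.) ✴ Repeated clicks evaluated with Chao-Shen entropy ✴ Click counts aggregated over various dimensions and periods ✴ Generalized boosted regression model (GBM) ✴ GBM performed better than Random Forest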
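The slides do not show how the winning team applied Chao-Shen entropy, so the following is only an assumed illustration: the coverage-adjusted entropy estimator of Chao and Shen (2003), applied to the distribution of one publisher's clicks over source IP addresses. Grouping by IP is an assumption made for the example; clicks concentrated on a few IPs give a low value.

```python
import math
from collections import Counter


def chao_shen_entropy(counts):
    """Chao-Shen (2003) coverage-adjusted entropy estimate, in nats.

    counts: observed frequency of each category (e.g. clicks per IP)."""
    y = [c for c in counts if c > 0]
    n = sum(y)
    if n == 0:
        return 0.0
    f1 = sum(1 for c in y if c == 1)  # number of singletons
    if f1 == n:                       # all singletons: avoid zero coverage
        f1 = n - 1
    coverage = 1 - f1 / n             # Good-Turing estimate of sample coverage
    h = 0.0
    for c in y:
        pa = coverage * c / n                # coverage-adjusted probability
        seen = 1 - (1 - pa) ** n             # chance the category is observed
        h -= pa * math.log(pa) / seen        # Horvitz-Thompson weighting
    return h


def ip_entropy(click_ips):
    """Entropy of one publisher's clicks over source IPs (hypothetical feature)."""
    return chao_shen_entropy(Counter(click_ips).values())


concentrated = ["10.0.0.1"] * 20 + ["10.0.0.2"]
spread = [f"10.0.1.{i}" for i in range(20)]
print(ip_entropy(concentrated) < ip_entropy(spread))  # True
```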
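  33. Model features (continued) ✴ IP address country is India ✴ IP address country is Singapore ✴ Click count per IP address ✴ Click count per referrer - UA - IP address .. combination ✴ Nighttime click count per referrer ✴ Small standard deviation of hourly click counts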
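Most of the listed features are plain aggregates over the click log. A sketch computing them per publisher, assuming a hypothetical log row (publisher, IP, IP country, referrer, UserAgent, hour of day), one day of data, and night defined as 0:00 to 5:59. Taking the maximum over IPs and over combinations is also an assumption about how per-key counts become a single publisher-level feature.

```python
import statistics
from collections import Counter, defaultdict
from typing import NamedTuple


class ClickRow(NamedTuple):
    # Hypothetical click-log schema for one day of data.
    publisher: str
    ip: str
    country: str      # country of the IP address, ISO code
    referrer: str
    user_agent: str
    hour: int         # hour of day, 0-23


def publisher_features(rows, night_hours=range(0, 6)):
    by_pub = defaultdict(list)
    for r in rows:
        by_pub[r.publisher].append(r)

    features = {}
    for pub, rs in by_pub.items():
        n = len(rs)
        per_ip = Counter(r.ip for r in rs)
        per_combo = Counter((r.referrer, r.user_agent, r.ip) for r in rs)
        night_per_ref = Counter(r.referrer for r in rs if r.hour in night_hours)
        per_hour = Counter(r.hour for r in rs)
        hourly = [per_hour.get(h, 0) for h in range(24)]
        features[pub] = {
            "share_ip_country_in": sum(r.country == "IN" for r in rs) / n,
            "share_ip_country_sg": sum(r.country == "SG" for r in rs) / n,
            "max_clicks_per_ip": max(per_ip.values()),
            "max_clicks_per_ref_ua_ip": max(per_combo.values()),
            "max_night_clicks_per_ref": max(night_per_ref.values(), default=0),
            # A bot clicking at a steady rate gives a very small value here.
            "std_hourly_clicks": statistics.pstdev(hourly),
        }
    return features
```

The resulting dictionaries could then be fed to a classifier such as the GBM mentioned on slide 31; the feature list here covers only what this slide shows.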
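  35. Other methods ✴ Bluff Ads ✴ Deliver fake ads or ads invisible to humans, and classify whoever clicks them as a bot [12] ✴ Bot detection from mouse cursor trajectories ✴ On smartphones, scrolling is used instead ✴ Also used for estimating user interest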
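On the analysis side, a bluff-ad check reduces to counting. The sketch below assumes a hypothetical click log of (source, ad ID) pairs and a known set of bluff ad IDs, and reports what share of each source's clicks landed on a bluff ad; how high a share justifies a bot label is left open, since the slide gives no threshold.

```python
from collections import defaultdict


def bluff_click_share(clicks, bluff_ad_ids):
    """clicks: iterable of (source_id, ad_id), e.g. source_id = IP address.

    Returns {source_id: fraction of its clicks that landed on bluff ads}."""
    total = defaultdict(int)
    on_bluff = defaultdict(int)
    for source, ad in clicks:
        total[source] += 1
        if ad in bluff_ad_ids:
            on_bluff[source] += 1
    return {s: on_bluff[s] / total[s] for s in total}


clicks = [("ip1", "bluff-1"), ("ip1", "ad-7"), ("ip2", "ad-7"), ("ip2", "ad-8")]
print(bluff_click_share(clicks, {"bluff-1"}))  # {'ip1': 0.5, 'ip2': 0.0}
```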
  36. 4. Implementation ✴ Rule-based, anomaly-detection-based, and model-based ✴ Implementation cost and maintenance

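  37. What can be implemented right away ✴ Combinations of rules ✴ Easy to explain; per-click detection is straightforward ✴ Can be written in SQL ✴ Anomaly-detection-based ✴ Can sometimes be written in SQL. With window functions, entropy, KL divergence, and Z-tests are all feasible. ✴ Model-based (machine learning) ✴ Accumulating training data takes time...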
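To illustrate the point about window functions, here is a Z-score check written as a single SQL query and run through Python's built-in sqlite3 module so the example is self-contained. It assumes a hypothetical table of hourly click counts per frame, SQLite 3.25 or later (for window functions), and a threshold of 3. Because SQLite may lack SQRT, one is registered from Python; engines such as BigQuery or Redshift provide STDDEV_POP, which would shorten the query.

```python
import math
import sqlite3

# Z-score of each hour's clicks within its frame, using window functions.
# Population variance is computed as E[x^2] - E[x]^2.
ZSCORE_SQL = """
SELECT frame, hour, clicks,
       (clicks - AVG(clicks) OVER (PARTITION BY frame))
       / py_sqrt(AVG(clicks * clicks) OVER (PARTITION BY frame)
                 - AVG(clicks) OVER (PARTITION BY frame)
                   * AVG(clicks) OVER (PARTITION BY frame)) AS z
FROM hourly_clicks
"""


def zscore_outliers(rows, threshold=3.0):
    """rows: (frame, hour, clicks) tuples. Returns (frame, hour) with z > threshold."""
    conn = sqlite3.connect(":memory:")
    try:
        # Clamp at 0 to absorb floating-point error before taking the root.
        conn.create_function("py_sqrt", 1, lambda v: math.sqrt(max(v, 0.0)))
        conn.execute("CREATE TABLE hourly_clicks (frame TEXT, hour INTEGER, clicks INTEGER)")
        conn.executemany("INSERT INTO hourly_clicks VALUES (?, ?, ?)", rows)
        # A frame with zero variance divides by 0.0, which SQLite turns into
        # NULL, so such a frame is never flagged.
        return conn.execute(
            f"SELECT frame, hour FROM ({ZSCORE_SQL}) WHERE z > ? ORDER BY frame, hour",
            (threshold,),
        ).fetchall()
    finally:
        conn.close()


rows = [("A", h, 100 if h == 12 else 10) for h in range(24)]
rows += [("B", h, 5) for h in range(24)]
print(zscore_outliers(rows))  # [('A', 12)]
```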
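  38. Implementation cost ✴ Things go much faster if you have a data store that accepts SQL queries, such as BigQuery, Amazon Redshift, or a Hadoop cluster ✴ Schedule the queries with re:dash and send notifications to Slack ✴ You can reach a point where hypotheses can be verified in about 2 hours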

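  39. Anomaly detection approach ✴ A model can be built from data collected under normal conditions ✴ Anomaly score = ✴ Bursts in click counts or CTR ✴ Anomaly detection on time series data ✴ Note that ad delivery logs are periodic ✴ To avoid CTR burst detection, some click bots generate about 100 times as many IMPs as clicks at the same time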
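The last point can be made concrete with a toy comparison against a baseline period. The function, its dictionary format and the 3x ratio are hypothetical; the numbers model a bot that adds 50 clicks plus 100 IMPs per click, which leaves CTR unchanged while click and IMP volumes both jump.

```python
def burst_flags(baseline, current, ratio=3.0):
    """Report which metrics exceed `ratio` times their baseline value.

    baseline/current: dicts with 'imp' and 'click' counts for comparable
    periods (e.g. the same hour of day); a hypothetical format."""
    base_ctr = baseline["click"] / baseline["imp"]
    cur_ctr = current["click"] / current["imp"]
    return {
        "ctr_burst": cur_ctr > ratio * base_ctr,
        "click_burst": current["click"] > ratio * baseline["click"],
        "imp_burst": current["imp"] > ratio * baseline["imp"],
    }


baseline = {"imp": 1000, "click": 10}                          # CTR 1%
naive_bot = {"imp": 1000, "click": 60}                         # clicks only: CTR 6%
imp_padding_bot = {"imp": 1000 + 50 * 100, "click": 10 + 50}   # CTR stays 1%

print(burst_flags(baseline, naive_bot))
# {'ctr_burst': True, 'click_burst': True, 'imp_burst': False}
print(burst_flags(baseline, imp_padding_bot))
# {'ctr_burst': False, 'click_burst': True, 'imp_burst': True}
```

A CTR-only check catches the naive bot but misses the padded one, which is why volume bursts are worth watching too, compared against the same hour of day because of the periodicity noted above.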
  40. Anomaly detection on periodic time series ✴ Introducing practical and robust anomaly detection in a time series [14] ✴ If you only have a few metrics, an external service is also worth considering
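The Twitter post [14] describes Seasonal Hybrid ESD (S-H-ESD), which removes the seasonal component and then applies a robust, median-based variant of the generalized ESD test. The much simpler stand-in below only conveys the core idea for daily-periodic ad logs: compare each hourly value with the median of the same hour on earlier days, scaled by the median absolute deviation (MAD). The period of 24, the minimum of 3 days of history, the 3.5 threshold and the floor on the scale are all assumptions.

```python
import statistics


def seasonal_anomalies(series, period=24, min_history=3, threshold=3.5, min_scale=1.0):
    """Return indices whose value is far above the same phase in earlier periods.

    series: hourly counts, oldest first. For index i the history is
    series[i - period], series[i - 2 * period], ... (same hour on earlier days)."""
    anomalies = []
    for i in range(len(series)):
        history = series[i % period:i:period]
        if len(history) < min_history:
            continue
        med = statistics.median(history)
        mad = statistics.median(abs(x - med) for x in history)
        # 1.4826 * MAD estimates the standard deviation for normal data;
        # min_scale keeps near-constant histories from producing huge scores.
        scale = max(1.4826 * mad, min_scale)
        if (series[i] - med) / scale > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical hourly clicks for 5 days: quiet at night, busy 9:00-17:59.
day_profile = [10 if 9 <= h < 18 else 2 for h in range(24)]
series = [day_profile[i % 24] + (i // 24) % 2 for i in range(5 * 24)]
series[4 * 24 + 3] = 10  # a daytime-level count at 3 a.m. on day 5
print(seasonal_anomalies(series))  # [99]
```

A single threshold over all hours would not flag index 99, because 10 clicks is a normal daytime count; comparing against the same hour on earlier days is what exposes it.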
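  41. Distance between two probability distributions that should normally be equal ✴ Example: IMP count and click count. Normally the two are correlated: IMPs follow a daily cycle, so both are low in the early morning and both rise at lunchtime ✴ What about ad slots where this does not hold? ✴ A constant number of clicks occurs regardless of the time of day ✴ Clicks burst independently of the IMP trend ✴ The former is a bot; in the latter, a human is clicking whenever they have the time ✴ Computation ✴ For one-dimensional data, the coefficient of variation and the Pearson correlation coefficient ✴ Two or more dimensions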
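A sketch of these computations for one ad slot, assuming 24 hourly IMP and click counts. The coefficient of variation and Pearson correlation are the ones named on the slide; KL divergence between the hourly click distribution and the hourly IMP distribution is one possible distance between the two distributions (slide 37 mentions it), with add-one smoothing as an assumption so that empty hours do not produce log(0).

```python
import math
import statistics


def coefficient_of_variation(xs):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(xs)
    return statistics.pstdev(xs) / mean if mean else float("nan")


def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


def kl_divergence(p_counts, q_counts, alpha=1.0):
    """KL(P || Q) in nats between two histograms over the same bins,
    with add-alpha smoothing (an assumption) to avoid log(0)."""
    k = len(p_counts)
    p_total = sum(p_counts) + alpha * k
    q_total = sum(q_counts) + alpha * k
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + alpha) / p_total
        q = (qc + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical hourly counts for one ad slot: quiet at night, peak at lunch.
imps = [100] * 7 + [500] * 5 + [1000] + [500] * 11
human_clicks = [x // 100 for x in imps]  # clicks track IMPs
bot_clicks = [5] * 24                    # same number of clicks every hour

print(coefficient_of_variation(bot_clicks))  # 0.0
print(pearson(imps, human_clicks))           # close to 1.0
print(kl_divergence(human_clicks, imps) < kl_divergence(bot_clicks, imps))  # True
```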
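  42. Graph-Based Anomaly Detection [13] ✴ Trace fraudulent operators through shared resources (IP addresses, sets of UserAgents) ✴ Conversely, logs from websites that only bots visit contain little noise, so they can be put to use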
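[13] uses a graph database; as a dependency-free sketch of the same idea, the following treats publishers and (IP address, UserAgent) pairs as nodes of a bipartite graph, links each publisher to the pairs seen in its clicks, and uses union-find to return groups of publishers connected through shared resources. Using exact (IP, UA) pairs as the shared resource is an assumption made for the example.

```python
from collections import defaultdict


def shared_resource_groups(clicks):
    """clicks: iterable of (publisher, ip, user_agent).

    Returns sets of publishers connected through shared (ip, user_agent) pairs;
    publishers that share nothing with others are omitted."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for pub, ip, ua in clicks:
        union(("pub", pub), ("res", ip, ua))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "pub":
            groups[find(node)].add(node[1])
    return [g for g in groups.values() if len(g) > 1]


clicks = [
    ("p1", "203.0.113.5", "BotUA/1.0"),
    ("p2", "203.0.113.5", "BotUA/1.0"),
    ("p2", "198.51.100.7", "BotUA/2.0"),
    ("p3", "198.51.100.7", "BotUA/2.0"),
    ("p4", "192.0.2.9", "Mozilla/5.0"),
]
print(sorted(sorted(g) for g in shared_resource_groups(clicks)))  # [['p1', 'p2', 'p3']]
```

Here p1 and p3 never share a resource directly, but both share one with p2, so all three end up in the same group; this transitive linking is what a graph view adds over per-publisher statistics.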

    
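  43. Is machine learning a silver bullet? ✴ Just as rules are combined in a rule-based system, an ensemble of weak learners, each specialized in detecting one particular case, might be workable in production ✴ e.g., the winning model of FDMA 2012 ✴ Attackers are evolving too, and no decisive countermeasure against Click Fraud has been found ✴ Press releases saying "we detect fraud with Deep Learning" do appear, but the details never become public → everyone has to do their best on their own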
  44. References (1)
    1. A. Tuzhilin. The Lane's Gifts v. Google Report (2006)
    2. About invalid traffic - AdWords Help. https://support.google.com/adwords/answer/2549113?hl=en
    3. S. A. Alrwais, C. W. Dun, M. Gupta, A. Gerber, O. Spatscheck, E. Osterweil. Dissecting ghost clicks: Ad fraud via misdirected human clicks (2012)
    4. What is click fraud and how can you prevent it? http://memeburn.com/2015/06/what-is-click-fraud-and-how-can-you-prevent-it/
    5. India's secret army of online ad 'clickers'. http://timesofindia.indiatimes.com/business/india-business/Indias-secret-army-of-online-ad-clickers/articleshow/654822.cms
    6. https://web.archive.org/web/20090212140101/http://webpronews.com/topnews/2005/07/05/google-wins-clickfraud-case-vs-auction-experts
    7. N. Immorlica, K. Jain, M. Mahdian, K. Talwar. Click fraud resistant methods for learning click-through rates. In Internet and Network Economics, pages 34–45 (2005)
  45. References (2)
    8. The Truth About Online Ad Fraud. https://www.exchangewire.com/blog/2014/05/29/the-truth-about-online-ad-fraud/
    9. G. Cho, J. Cho, Y. Song, D. Choi, H. Kim. Combating online fraud attacks in mobile-based advertising (2016)
    10. B. Kitts, J. Y. Zhang, G. Wu, W. Brandi, J. Beasley, K. Morrill, J. Ettedgui, S. Siddhartha, H. Yuan, F. Gao, et al. Click fraud detection: adversarial pattern recognition over 5 years at Microsoft. Real World Data Min. Appl. 17(1), 181–201 (2015)
    11. R. Oentaryo, E.-P. Lim, M. Finegold, D. Lo, F. Zhu, C. Phua, E.-Y. Cheu, G.-E. Yap, K. Sim, M. N. Nguyen, et al. Detecting click fraud in online advertising: a data mining approach. J. Mach. Learn. Res. 15(1), 99–140 (2014)
    12. H. Haddadi. Fighting online click-fraud using bluff ads. In SIGCOMM Computer Communications Review (CCR) (2010)
    13. G. Sadowksi, P. Rathle. Fraud Detection: Discovering Connections with Graph Databases. White Paper (2015)
    14. https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series