
Invalid Click Detection in Online Advertising: Methods and History


2016-09-03
Presentation slides from データマイニング+WEB東京 (Data Mining + WEB Tokyo).

Takashi Nishibayashi

September 03, 2016

Transcript

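  1. Self-introduction
     ID: hagino3000
     Name: Takashi Nishibayashi
     Job: Software Engineer
     Currently works on optimizing ad delivery efficiency at an ad network operator (from design through implementation of bid price adjustment logic and ad selection logic).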
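  2. Contents of this talk
     ✴ What I will cover
        ✴ What click fraud is
        ✴ The history of click fraud detection methods
        ✴ Rule-based and anomaly detection approaches
     ✴ What I cannot cover
        ✴ My company's real data and concrete detection rules
     ✴ Why click fraud?
        ✴ Among the kinds of ad fraud, click fraud is the closest to home for the presenter, who works with display ads on a CPC model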
  3. 1. Background of the problem
     ✴ What an ad network is
     ✴ Pay-per-click advertising
     ✴ Terminology
     ✴ Invalid clicks and click fraud
     ✴ What makes click fraud a problem
  4. The players in this talk
     [Diagram. Labels: Ad Network, advertiser, publisher, advertising medium, audience. Flows: ad submission, ad delivery, ad spend, ad revenue, click.]
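  5. Terms and abbreviations
     ✴ IMP (Impression) / Click / Conversion
        ✴ An ad being displayed / an ad being clicked / the action (result) specified by the advertiser
     ✴ CTR (Click-through rate)
        ✴ Usually computed as clicks / impressions over a given period
     ✴ Slot (frame)
        ✴ An ad slot
     ✴ Advertising medium / Publisher
        ✴ Advertising medium: web media and mobile apps
        ✴ Publisher: operators of web media and developers of mobile apps
     ✴ Ad fraud
        ✴ Fraudulent acts against online advertising in general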
  6. Definition of invalid clicks
     ✴ Quoted from "Types of invalid traffic" in the Google AdWords help [2]
        ✴ Accidental clicks that provide no value, such as the second click of a double-click
        ✴ Manual clicks intended to increase someone's advertising costs
        ✴ Manual clicks intended to increase profits for website owners hosting your ads
        ✴ Clicks and impressions by automated tools, robots, or other deceptive software
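  7. Click fraud
     ✴ The intentional subset of invalid clicks
        ✴ By publishers, for their own profit
           ✴ Common
        ✴ By advertisers, to drain a competitor's ad budget
           ✴ An attack on search (listing) ads
     ✴ Clicks that have no value to the advertiser
     ✴ In Japan the term "one-click fraud" (ワンクリック詐欺), which refers to a different phenomenon, caught on first, so the two are easily confused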
  8. Early click fraud
     ✴ 2004: The Times of India publishes an interview article with people in India who click ads for pay [5]
        "It's boring, but it is extra money for a couple of hours of clicking weblinks every day,"
     ✴ 2004: Google sues a publisher that had been running click fraud as a 12-person operation, and wins [6]
  9. Click fraud techniques [3]
     ✴ Hand-made clicks by individuals, with a human touch
     ✴ Crowdsourcing
     ✴ Click farms
     ✴ Click bots
     ✴ Malware (botnets)
     ✴ Others
        ✴ → Search for "不正クリック 代行" (click fraud for hire) right now
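  10. What we want to achieve
     ✴ In a real system we want to invalidate every kind of invalid click together
        ✴ Unintended duplicate clicks (the second one is invalid)
        ✴ Induced accidental clicks, forced redirects
        ✴ Clicks repeated intentionally by website operators
     ✴ If we can judge per click, we can withhold payment for exactly that amount and return it to the advertiser
     ✴ If we judge per publisher → the response varies
        ✴ A human makes the final call, so we need a reason that explains what is wrong
        ✴ If they do no harm, we would rather they stay around as a living source of labeled data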
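As a minimal sketch of click-level invalidation, the snippet below marks the second and later clicks from the same user on the same ad within a short window as invalid, while keeping every record. The field names (`user_id`, `ad_id`, `ts`) and the 2-second window are assumptions made for illustration, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Click:
    user_id: str
    ad_id: str
    ts: float          # click time in seconds (hypothetical field)
    valid: bool = True


def mark_duplicates(clicks, window_sec=2.0):
    """Mark repeat clicks on the same (user, ad) within window_sec as invalid.

    Records are flagged, not dropped, so the flagged amount can be
    refunded to the advertiser and the records remain available later.
    """
    last_seen = {}
    for c in sorted(clicks, key=lambda c: c.ts):
        key = (c.user_id, c.ad_id)
        prev = last_seen.get(key)
        if prev is not None and c.ts - prev <= window_sec:
            c.valid = False
        # Track every click, valid or not, so a rapid chain stays invalid.
        last_seen[key] = c.ts
    return clicks
```

Only the first click of a rapid burst stays billable; a click that arrives after the window has passed is treated as a new, valid click.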
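  11. Difficulties
     ✴ If attackers notice the defender's countermeasures, it turns into a cat-and-mouse game
        ✴ When responding, we have to pretend we have not noticed
        ✴ Example: not serving ads to bots → no good, because it gives the attacker a chance for trial and error
     ✴ We can only judge from circumstantial evidence
        ✴ Labeled data is hard to build
     ✴ Invalidation has to run before the numbers in the reports shown to advertisers and publishers are finalized (the grace period is one hour)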
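  12. History of research on countering click fraud
     ✴ 2005: Microsoft Research publishes a research paper
        ✴ Click Fraud Resistant Methods for Learning Click-Through Rates [7]
        ✴ It states that identifying clicks made by Indian workers with a machine learning algorithm is practically impossible
        ✴ It predicts a CTR with invalid clicks excluded, assuming a Poisson distribution
     ✴ Machine learning was not used in the 2000s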
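  13. A. Tuzhilin. The Lane’s Gifts v. Google Report (2006)
     ✴ An evaluation report on Google's invalid click detection methods from 2002 to 2006
     ✴ Mainly rule-based and anomaly-detection-based
     ✴ Rich in operational detail, such as the flow for suspending a publisher's account
     ✴ Concrete detection rules are not described
     ✴ Google later acquired spider.io and others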
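  14. Click Fraud Detection: Adversarial Pattern Recognition over 5 Years at Microsoft (2015)
     ✴ The evolution of Microsoft's invalid click filtering system
     ✴ They knew that tuning would become impossible with machine learning, so they deliberately went rule-based
        ✴ As a model gains features, it gets harder to separate the problem from the weights when something goes wrong
     ✴ They judged that a rule-based approach has advantages for a large-scale fraud detection system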
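  15. ✴ Clicks are evaluated by a combination of multiple rules
        ✴ Decision trees are also used
     ✴ Advantages of the rule-based approach
        ✴ Every rule that fired can be recorded in the log
        ✴ This shows which rules were right and which were wrong
     ✴ In the early stage, records judged fraudulent were discarded
        ✴ This was a mistake
        ✴ The impact of updates to the detection system could not be measured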
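The rule-combination idea above can be sketched as follows: every click is checked against a set of named rules, the names of all rules that fired are stored with the record, and flagged records are kept, not dropped. The rule names, fields, and thresholds here are hypothetical placeholders, not rules from the Microsoft paper or from the presenter's system.

```python
from typing import Callable, Dict

Rule = Callable[[dict], bool]

# Hypothetical rules; names and thresholds are placeholders only.
RULES: Dict[str, Rule] = {
    "no_user_agent": lambda c: not c.get("user_agent"),
    "too_fast_after_impression": lambda c: c.get("ms_since_impression", 1e9) < 100,
    "ip_click_burst": lambda c: c.get("ip_clicks_last_hour", 0) > 50,
}


def evaluate(click: dict, rules: Dict[str, Rule] = RULES) -> dict:
    """Return the click annotated with every rule that fired.

    The record is kept even when it is judged invalid, so a later
    version of the rule set can be re-run on the same data and the
    effect of the update can be measured.
    """
    fired = [name for name, rule in rules.items() if rule(click)]
    return {**click, "fired_rules": fired, "is_invalid": bool(fired)}
```

Logging `fired_rules` per record is what makes it possible to audit afterwards which rules were correct and which produced false positives.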
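  16. Fraud Detection in Mobile Advertising (FDMA) 2012 Competition
     ✴ A write-up of the methods used by the top teams in a competition to identify, from click logs, publishers that commit click fraud
     ✴ Features used by the winning team
        ✴ Mainly summary statistics (counts, standard deviations, etc.)
        ✴ Repeated clicks evaluated with Chao-Shen entropy
        ✴ Click counts aggregated over various dimensions and periods
     ✴ Generalized boosted regression model (GBM)
        ✴ GBM outperformed Random Forest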
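For readers unfamiliar with the Chao-Shen entropy estimator, here is a minimal sketch. It follows the standard definition (Good-Turing coverage adjustment plus a Horvitz-Thompson correction for unseen categories). How the winning team applied it is not described in the slides; using per-IP click counts for one publisher as the input is an assumption.

```python
import math


def chao_shen_entropy(counts):
    """Chao-Shen entropy estimate (in nats) from category counts.

    counts: number of clicks observed per category, e.g. per IP address.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return 0.0
    singletons = sum(1 for c in counts if c == 1)
    if singletons == n:
        # Avoid zero coverage when every category was seen exactly once.
        singletons = n - 1
    coverage = 1.0 - singletons / n           # Good-Turing sample coverage
    h = 0.0
    for c in counts:
        p = coverage * c / n                  # coverage-adjusted probability
        inclusion = 1.0 - (1.0 - p) ** n      # chance the category was observed
        h -= p * math.log(p) / inclusion
    return h
```

Under this reading, a low value means that clicks are concentrated on a few sources, which is one way to quantify repeated clicking.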
  17. 

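  18. (continued) Model features
     ✴ The IP address's country is India
     ✴ The IP address's country is Singapore
     ✴ Click count per IP address
     ✴ Click count per combination of referrer - UA - IP address ..
     ✴ Night-time click count per referrer
     ✴ How small the standard deviation of hourly click counts is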
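A few of the features listed above can be computed per publisher with the standard library alone. This is a rough sketch: the record fields (`publisher`, `ip`, `country`, `hour`) and the feature names are assumptions, and the actual competition features were more numerous.

```python
from collections import Counter, defaultdict
from statistics import pstdev


def publisher_features(clicks):
    """clicks: iterable of dicts with 'publisher', 'ip', 'country', 'hour' (0-23)."""
    per_pub = defaultdict(list)
    for c in clicks:
        per_pub[c["publisher"]].append(c)

    features = {}
    for pub, rows in per_pub.items():
        ip_counts = Counter(r["ip"] for r in rows)
        hourly = Counter(r["hour"] for r in rows)
        # Include hours with zero clicks so the spread is not overstated.
        hourly_series = [hourly.get(h, 0) for h in range(24)]
        features[pub] = {
            "clicks": len(rows),
            "max_clicks_per_ip": max(ip_counts.values()),
            "share_country_IN": sum(r["country"] == "IN" for r in rows) / len(rows),
            # A very small value means clicks arrive at a flat rate all day.
            "hourly_click_std": pstdev(hourly_series),
        }
    return features
```

These per-publisher rows could then be fed to any classifier; the slides report that a GBM worked better than Random Forest in the competition.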
  19. 

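  20. What can be implemented right away
     ✴ Combinations of rules
        ✴ Easy to explain, and click-level detection is easy
        ✴ Can be written in SQL
     ✴ Anomaly-detection-based
        ✴ Can sometimes be written in SQL. With window functions, entropy, KL divergence, and Z-tests are all feasible.
     ✴ Model-based (machine learning)
        ✴ Accumulating labeled data takes time...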
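The slide suggests doing this in SQL with window functions; the Python sketch below only illustrates the KL divergence computation itself. Comparing one ad slot's hourly click histogram with a network-wide histogram is an assumption about how it might be applied, and the additive smoothing constant `eps` is a hypothetical choice to avoid division by zero.

```python
import math


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in nats between two count histograms over the same bins."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    k = len(p_counts)
    # Additive smoothing keeps empty bins from producing log(0) or x/0.
    p_total = sum(p_counts) + eps * k
    q_total = sum(q_counts) + eps * k
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl
```

A slot whose hourly click shape matches the reference gives a value near zero; a slot with a flat or shifted shape gives a larger value.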
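  21. Anomaly detection approach
     ✴ A model can be built from data collected under normal conditions
        ✴ Anomaly score = [formula missing from the transcript]
     ✴ Bursts in click counts or CTR
        ✴ Anomaly detection on time-series data
        ✴ Note that ad delivery logs are periodic
     ✴ Some click bots also generate roughly 100 times as many impressions as clicks at the same time, to evade CTR burst detection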
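One simple way to account for periodicity, shown here as a sketch and not as the formula from the slide (which did not survive in the transcript), is to compare the current count with counts from the same hour on previous days and express the gap as a z-score. The function name and the choice of baseline are assumptions.

```python
import math
from statistics import mean, stdev


def burst_score(same_hour_history, current):
    """z-score of `current` against counts seen at the same hour on earlier days.

    Comparing like with like (same hour of day) keeps the normal daily
    cycle from being reported as an anomaly.
    """
    if len(same_hour_history) < 2:
        raise ValueError("need at least two historical observations")
    mu = mean(same_hour_history)
    sigma = stdev(same_hour_history)
    if sigma == 0:
        return 0.0 if current == mu else math.inf
    return (current - mu) / sigma
```

Because some bots generate impressions along with clicks, it may be worth scoring raw click counts as well as CTR; a CTR-only check would miss that pattern.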
  22. Anomaly detection for periodic time-series data
     ✴ Introducing practical and robust anomaly detection in a time series [14]
     ✴ If there are only a few metrics, using an external service is also worth considering
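  23. Distance between two probability distributions that should be equal
     ✴ Example: impression counts and click counts are normally correlated. Impressions follow a daily cycle, so both are low in the early morning and both rise at lunchtime
     ✴ What about ad slots where this does not hold?
        ✴ A constant number of clicks occurs regardless of the time of day
        ✴ Clicks burst independently of the trend in impression counts
        ✴ The former is a bot; the latter is a human clicking whenever they get the chance
     ✴ Computation
        ✴ For one-dimensional data: coefficient of variation and Pearson correlation coefficient
        ✴ Two or more dimensions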
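The one-dimensional checks named above can be sketched with the standard library. The inputs are assumed to be hourly impression and click counts for one ad slot; returning 0.0 when a series is constant is a convention chosen for this sketch, since the correlation is undefined in that case.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(xs):
    """Standard deviation divided by the mean; near 0 means a flat series."""
    mu = mean(xs)
    return pstdev(xs) / mu if mu else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for a constant series; treated as "no correlation"
    return cov / math.sqrt(vx * vy)
```

A slot with a near-zero coefficient of variation for clicks matches the "constant clicks regardless of time of day" case, and a low correlation between impressions and clicks matches the "bursts unrelated to impressions" case.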
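  24. Is machine learning a silver bullet?
     ✴ Just as rules are combined, an ensemble of weak learners, each specialized in judging one kind of case, may be workable in production
        ✴ The FDMA 2012 winning model
     ✴ Attackers keep evolving too, and no decisive countermeasure against click fraud has been found
     ✴ Press releases announcing fraud detection with deep learning do appear, but the details never come out → everyone has to work it out on their own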
  25. References (1)
     1. A. Tuzhilin. The lane’s gifts v. Google report (2006)
     2. About invalid traffic - AdWords Help
        https://support.google.com/adwords/answer/2549113?hl=en
     3. Alrwais, S.A., Dun, C.W., Gupta, M., Gerber, A., Spatscheck, O., Osterweil, E.: Dissecting ghost clicks: Ad fraud via misdirected human clicks (2012)
     4. What is click fraud and how can you prevent it?
        http://memeburn.com/2015/06/what-is-click-fraud-and-how-can-you-prevent-it/
     5. India's secret army of online ad 'clickers'
        http://timesofindia.indiatimes.com/business/india-business/Indias-secret-army-of-online-ad-clickers/articleshow/654822.cms
     6. https://web.archive.org/web/20090212140101/http://webpronews.com/topnews/2005/07/05/google-wins-clickfraud-case-vs-auction-experts
     7. N. Immorlica, K. Jain, M. Mahdian, and K. Talwar. Click fraud resistant methods for learning click-through rates. In Internet and Network Economics, pages 34–45 (2005)
  26. References (2)
     8. The Truth About Online Ad Fraud
        https://www.exchangewire.com/blog/2014/05/29/the-truth-about-online-ad-fraud/
     9. Geumhwan Cho, Junsung Cho, Youngbae Song, Donghyun Choi and Hyoungshick Kim. Combating online fraud attacks in mobile-based advertising (2016)
     10. B Kitts, JY Zhang, G Wu, W Brandi, J Beasley, K Morrill, J Ettedgui, S Siddhartha, H Yuan, F Gao, et al., Click fraud detection: adversarial pattern recognition over 5 years at Microsoft. Real World Data Min. Appl. 17(1), 181–201 (2015)
     11. R Oentaryo, E-P Lim, M Finegold, D Lo, F Zhu, C Phua, E-Y Cheu, G-E Yap, K Sim, MN Nguyen, et al., Detecting click fraud in online advertising: a data mining approach. J. Mach. Learn. Res. 15(1), 99–140 (2014)
     12. Haddadi, H. Fighting online click-fraud using bluff ads. In SIGCOMM Computer Communications Review (CCR) (2010)
     13. White Paper, Fraud Detection: Discovering Connections with Graph Databases. Gorka Sadowski & Philip Rathle (2015)
     14. https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series