jujudubai
June 13, 2014

# Estimating Conversion Rate in Display Advertising from Past Performance Data

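This paper has many points that are useful in practice, such as estimation and correction methods for highly sparse data. It is useful for research as well, of course.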


## Transcript

JUJU
2. ### Agenda

    1. Introduction
    2. Issues
        1. Problem Setup & Formulation
        2. Data Hierarchies
    3. Conversion Rate Estimation
        1. Past Performance at Different Hierarchical Levels
        2. Combining Estimators using Logistic Regression
    4. Practical Issues (Propose)
        1. Data Imbalance
        2. Output Calibration
        3. Missing Features
        4. Feature Selection
    5. Results & Discussion
        1. Data Imbalance & Score Calibration
        2. Missing Value Imputation
        3. “Baseline Estimators” vs “Logistic Regression”
    6. Conclusion
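3. ### Introduction

    1. This paper is about DSPs, which bid on ad placements.
        - Goal: estimate the optimal bid price from the website showing the ad and from the user.
        - ⇒ This requires predicting CVR accurately.
    2. Challenges
        - [1] CVR is extremely small, so there is not enough data for analysis (CVR = 0.0001% to 0.1%).
          ⇒ Addressed with hierarchies and "implicit" clustering.
        - [2] In RTB, only 5 to 10 ms are available before a bid must be placed. Because this time constraint is tight, the estimation method has to be both cheap to compute and accurate.
          ⇒ Addressed with logistic regression.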
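4. ### Issues: §2.2 Problem Setup & Formulation

    1. Variable definitions: user $u_i$, page $p_j$, ad $a_k$.
    2. Selecting the optimal ad:

       $$ad^{*} = \arg\max_{k=1,\dots,n} p(Y = 1 \mid u_i, p_j, a_k)$$

    3. Predicting CVR with logistic regression
        - Conversions are treated probabilistically, assuming a Bernoulli distribution:

          $$Y_{ijk} \sim \mathrm{Bernoulli}(p_{ijk}), \qquad p_{ijk} = p(Y = 1 \mid u_i, p_j, a_k)$$

    4. Problem setup for selecting the optimal ad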
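
The ad-selection rule on this slide can be illustrated with a minimal Python sketch. The function `select_ad`, the `cvr` callback, and the toy lookup table are all hypothetical names introduced here for illustration; the slides do not show how the CVR model is called.

```python
from typing import Callable, Sequence


def select_ad(user: str, page: str, ads: Sequence[str],
              cvr: Callable[[str, str, str], float]) -> str:
    """Return the candidate ad a_k with the largest estimated p(Y=1 | u_i, p_j, a_k)."""
    if not ads:
        raise ValueError("need at least one candidate ad")
    return max(ads, key=lambda ad: cvr(user, page, ad))


# Hypothetical lookup table standing in for a real CVR estimator.
toy_cvr = {"ad_a": 0.0002, "ad_b": 0.0010, "ad_c": 0.0005}
best = select_ad("u1", "p1", list(toy_cvr), lambda u, p, a: toy_cvr[a])
print(best)
```

In a real bidder the callback would be the fast estimator discussed in the later slides, since the whole selection has to fit inside the RTB time budget.
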
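5. ### 5. Instead of estimating the CVR of each individual user ($u_i$), compute it via clustering and maximum likelihood estimation

    - Organizing {ad, user, page} into hierarchies addresses the sparsity of the data.
    - The CVR of an individual user is treated as equal to the CVR of the cluster the user belongs to:

      $$\tilde{Y}_{ijk} \sim \mathrm{Binomial}(T_{ijk}, \tilde{p}_{ijk}), \qquad \tilde{p}_{ijk} = p(Y = 1 \mid u \in \mathit{UsrClust}_i, p_j, a_k)$$

      $$\tilde{p}^{\mathrm{MLE}}_{ijk} = \begin{cases} S_{ijk} / T_{ijk}, & \text{if } T_{ijk} \neq 0 \\ \text{unknown}, & \text{if } T_{ijk} = 0 \end{cases}$$

      where $S_{ijk}$ is the number of conversions and $T_{ijk}$ is the number of impressions.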
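
The maximum likelihood estimate above is just a guarded ratio of counts. As a small sketch (the function name `mle_cvr` and the use of `None` for the "unknown" case are choices made here, not taken from the slides):

```python
from typing import Optional


def mle_cvr(conversions: int, impressions: int) -> Optional[float]:
    """MLE of the cluster-level CVR: S / T, or None ("unknown") when T == 0."""
    if conversions < 0 or impressions < 0 or conversions > impressions:
        raise ValueError("expected 0 <= conversions <= impressions")
    if impressions == 0:
        return None  # no past impressions: the estimate is undefined
    return conversions / impressions


print(mle_cvr(3, 1000))
print(mle_cvr(0, 0))
```

Returning `None` rather than 0 keeps "never observed" distinct from "observed with no conversions", which matters for the missing-feature handling discussed later.
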
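6. ### §2.3 Data Hierarchies

    1. Give the data a hierarchical structure and cluster "implicitly".
        - Taking the nodes of the ad, page, and user hierarchies, each event belongs to $l_a \times l_p \times l_u$ (= $M$) clusters.
    2. What is "implicit" user clustering?
        - ⇒ It is not "explicit" clustering (on factors such as demographic information or income) but clustering on factors such as the categories of the websites a user has visited.
        - {User × Publisher Type × Campaign}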
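7. ### 3. Assign users to appropriate clusters based on past count data (i.e., numbers of impressions and conversions)

    - Approximation linking the individual-level CVR to the cluster-level CVR, where $C_{u_i}$ is the cluster that user $u_i$ belongs to:

      $$p_{ijk} = p(Y = 1 \mid u_i, p_j, a_k) \approx \tilde{p}_{ijk} = p(\tilde{Y} = 1 \mid u \in C_{u_i}, p_j, a_k)$$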
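8. ### Conversion Rate Estimation: §3.1 Past Performance at Different Hierarchical Levels

    1. The first challenge: identifying users and groups of users (clusters) whose CVRs are identical or very close.
    2. The main challenge: how to handle conversion observations, which tend to be scarce.
        - Correcting for missing data.
        - e.g., a true conversion rate on the order of $10^{-5}$ (with impressions on the order of millions).
    3. Redefining CVR, with user, page, and ad each grouped into clusters:

       $$p_{ijk} \approx p(Y = 1 \mid u \in C_{u_i}, p \in C_{p_j}, a \in C_{a_k})$$

    4. Combine the maximum likelihood estimates computed at each level of the data hierarchy to define $M$ estimators of $p_{ijk}$.
        - ⇒ Each per-level CVR estimator is weak on its own, but combining them gives high accuracy.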
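
To make the idea of $M$ per-level estimators concrete, here is a rough sketch that counts conversions and impressions for every combination of hierarchy levels and returns one MLE per combination. The event layout (tuples of hierarchy nodes ordered coarse to fine), the function name `level_estimates`, and the toy data are assumptions made for illustration; the slides do not specify a data format.

```python
from collections import defaultdict
from itertools import product


def level_estimates(events, query):
    """events: iterable of (user_nodes, page_nodes, ad_nodes, converted), where each
    *_nodes is a tuple of hierarchy nodes ordered coarse to fine.
    Returns M = l_u * l_p * l_a MLE estimates for `query` (None where T == 0)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [conversions S, impressions T]
    for user_nodes, page_nodes, ad_nodes, converted in events:
        levels = product(enumerate(user_nodes), enumerate(page_nodes), enumerate(ad_nodes))
        for key in levels:
            counts[key][0] += int(converted)
            counts[key][1] += 1
    q_user, q_page, q_ad = query
    estimates = []
    for key in product(enumerate(q_user), enumerate(q_page), enumerate(q_ad)):
        s, t = counts.get(key, (0, 0))
        estimates.append(s / t if t else None)
    return estimates


# Toy log: one user level, two page levels, two ad levels, so M = 1 * 2 * 2 = 4.
events = [
    (("cluster1",), ("news", "news/sports"), ("camp1", "ad1"), True),
    (("cluster1",), ("news", "news/sports"), ("camp1", "ad2"), False),
    (("cluster1",), ("news", "news/tech"), ("camp1", "ad1"), False),
    (("cluster2",), ("news", "news/tech"), ("camp1", "ad1"), False),
]
query = (("cluster1",), ("news", "news/sports"), ("camp1", "ad1"))
estimates = level_estimates(events, query)
print(estimates)
```

Coarser combinations pool more impressions and are therefore more stable but less specific, which is why the slides describe each estimator as individually weak.
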
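9. ### §3.2 Combining Estimators using Logistic Regression

    1. Use logistic regression to assign an optimal weight to each estimator.
        - ⇒ The inputs are $M$ CVR estimates obtained by maximum likelihood estimation at different levels.
    2. Redefine the CVR prediction as a combination of the individual estimators:

       $$p_{ijk} = f(\hat{\tilde{p}}^{1}_{ijk}, \dots, \hat{\tilde{p}}^{M}_{ijk}; \beta)$$

        - ⇒ The optimal parameter vector $\beta$ must be determined.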
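
As a rough illustration of fitting $\beta$, the sketch below trains a plain logistic regression on per-level estimates with batch gradient descent. The optimizer, learning rate, lack of regularization, and the toy feature values (scaled into [0, 1] so the example converges quickly) are all assumptions made here; the slides only state that logistic regression is used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, row):
    """Combined estimate f(p^1, ..., p^M; beta); beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit beta by batch gradient descent on the mean log loss."""
    n, m = len(X), len(X[0])
    beta = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for row, label in zip(X, y):
            err = predict(beta, row) - label
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


# Toy training set: each row holds M = 2 per-level estimates (illustrative values only).
X = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
beta = fit_logistic(X, y)
print(predict(beta, [0.85, 0.8]), predict(beta, [0.15, 0.2]))
```

At serving time only `predict` runs, which is a single dot product and one exponential, so it fits comfortably within a few milliseconds.
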

11. ### Practical Issues (Propose): §4.1 Data Imbalance

    1. Two kinds of data imbalance
        1. The average conversion rate of an advertising campaign is inherently very low and …
            - CVR is low on average, so it is hard to secure a sufficient number of samples (e.g., $10^{-3}$ to $10^{-6}$).
        2. The ratio of the number of no-conversion events to that of conversion events is very large…
            - The (huge) number of impressions relative to conversions calls for adjusting the data.
    2. Countermeasures
        - Use all of the converted data.
        - Use only a sample of the non-converted data.
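
The countermeasure above (keep every conversion, subsample the rest) can be sketched as follows. The event representation as dictionaries with a `converted` flag, the `keep_rate` parameter, and the fixed seed are assumptions for illustration; the slides do not give the sampling rate that was used.

```python
import random


def downsample_negatives(events, keep_rate, seed=0):
    """Keep every converting event; keep each non-converting event with
    probability keep_rate."""
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [e for e in events if e["converted"] or rng.random() < keep_rate]


# Toy log: 5 conversions among 1005 impressions.
log = [{"id": i, "converted": i < 5} for i in range(1005)]
sample = downsample_negatives(log, keep_rate=0.1)
n_pos = sum(e["converted"] for e in sample)
n_neg = len(sample) - n_pos
print(n_pos, n_neg)
```

Because subsampling changes the share of conversions in the training set, scores from a model trained on the sample are shifted relative to true rates, which is one motivation for the calibration step on the next slide.
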
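12. ### §4.2 Output Calibration

    1. Split the scores into intervals and calibrate.
        - LR output → divided into equal bins.
        - Compute the (sample) CVR within each bin.
        - This brings the predicted scores closer to the true CVR.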
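
A minimal sketch of this binning step: sort the scores, cut them into bins with roughly equal counts, and replace each raw score by the empirical conversion rate of its bin. The equal-count binning, the function names, and the toy data are assumptions made here; the slides do not say how bin boundaries were chosen or whether any smoothing was applied afterwards.

```python
from bisect import bisect_left


def fit_bin_calibrator(scores, labels, n_bins):
    """Split the sorted scores into n_bins near-equal-count bins and record each
    bin's upper score edge and empirical conversion rate."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least one sample per bin")
    edges, rates = [], []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        edges.append(chunk[-1][0])
        rates.append(sum(label for _, label in chunk) / len(chunk))
    return edges, rates


def calibrate(score, edges, rates):
    """Map a raw score to the empirical CVR of the bin it falls into.
    Scores above the last edge are assigned to the last bin."""
    idx = min(bisect_left(edges, score), len(rates) - 1)
    return rates[idx]


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
edges, rates = fit_bin_calibrator(scores, labels, n_bins=2)
print(calibrate(0.35, edges, rates), calibrate(0.65, edges, rates))
```

With real CVRs in the $10^{-3}$ to $10^{-6}$ range, each bin needs a very large number of impressions before its empirical rate is meaningful, so the number of bins is a trade-off between resolution and noise.
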

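14. ### §4.3 Missing Features

    1. Imputation when data is missing
        - MAR (missing at random) pattern
            - → The probability that a variable is missing depends only on the observed data, not on the true value of the missing variable.
            - e.g., a userID that is not on the user profile server, or a web page that cannot be classified into a category.
        - Imputation using a conditional Gaussian distribution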
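
For jointly Gaussian variables, the conditional mean of a missing value $x_m$ given an observed value $x_o$ is $\mu_m + \frac{\sigma_{mo}}{\sigma_{oo}}(x_o - \mu_o)$. The sketch below applies this in the simplest bivariate case. Restricting to one observed and one missing feature, the function names, and the toy numbers are simplifications assumed here; the slides only name the technique.

```python
from statistics import mean


def fit_conditional_gaussian(observed, target):
    """Estimate what is needed for E[target | observed] under a bivariate Gaussian:
    both means and the slope cov(observed, target) / var(observed)."""
    mu_o, mu_t = mean(observed), mean(target)
    n = len(observed)
    var_o = sum((o - mu_o) ** 2 for o in observed) / n
    cov = sum((o - mu_o) * (t - mu_t) for o, t in zip(observed, target)) / n
    if var_o == 0:
        raise ValueError("observed feature has zero variance")
    return mu_o, mu_t, cov / var_o


def impute(x_observed, params):
    """Conditional mean of the missing feature given the observed one."""
    mu_o, mu_t, slope = params
    return mu_t + slope * (x_observed - mu_o)


# Toy complete cases in which the target is exactly twice the observed feature.
params = fit_conditional_gaussian([1, 2, 3, 4], [2, 4, 6, 8])
print(impute(5, params))
```

In the CVR setting the "features" would be the per-level estimates, so an estimate that is unknown because its cluster has no impressions could be filled in from the levels that were observed.
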

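16. ### Results & Discussion: §5.1 Data Imbalance & Score Calibration

    1. Experimental setup
        - Five ad slots (each with 1 million page views per day).
        - Two weeks of logs from January 2012 are used.
        - The first week is used for training and the second week for testing.
        - Task: predict whether a user converts upon seeing a given page and ad.
    2. Effect of the CV/non-CV ratio in the training data
        - No difference was observed across IR (imbalance ratio) values.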

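18. ### §5.3 “Baseline Estimators” vs “Logistic Regression”

    1. Effect of the proposed method
        1. Baseline 1: estimation from user clusters built on demographic attributes (age, gender, region of residence, etc.) and from the ad group (campaign).

           $$p_{ijk} \approx \hat{\tilde{p}}^{1} = p_{\mathrm{MLE}}(Y = 1 \mid u \in C^{G}_{u_i}, a_k \in \mathrm{Campaign}_{a_k})$$

           Here $C^{G}_{u_i}$ is the user group (cluster) that the user belongs to, identified from past browsing patterns, demographics, and so on.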
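19. ### 2. Baseline 2: estimation from the users who saw the ad group and from the specific ad

    $$p_{ijk} \approx \hat{\tilde{p}}^{2} = p_{\mathrm{MLE}}(Y = 1 \mid u \in C^{A}_{u_i}, a_k)$$

    2. Results
        - LR scores 28.2% higher than B1 and 5.92% higher than B2.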

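21. ### §6 Conclusion

    1. Conclusions
        1. The paper proposes a fast CVR estimation method for RTB.
           → Estimating the parameters with LR works well.
        2. Hierarchical structuring improves accuracy.
           → Hierarchies such as {user, page, ad} are preferable.
    2. Open issues and questions
        - The paper gave no detailed description of the method for clustering by CVR.
        - The concrete procedure for "implicit" clustering is not described.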