Paper walkthrough: 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com

Slide 1

Slide 1 text

Kazuki Motohashi - Konduit K.K. Deep Learning Study Group for Practitioners, 8th session - 18 November 2019
Kazuki Motohashi (@kmotohas), Konduit K.K.
Paper walkthrough: 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com
We are the company that makes Deeplearning4j and konduit-serving.

Slide 5

Slide 5 text

Concrete examples of model use

(a) Predict the context of a trip, for example whether the user is travelling with children.
(b) Gather content, starting with reviews, and decide which items to display; summarize why a property is popular.
(c) Trends in prices and options.

Figure 1: Examples of Application of Machine Learning. (a) Traveller Context Model (b) Content Curation Model (c) Content Augmentation Model

Paper excerpt shown on the slide (clipped at the column edges):

... likely is that a user is shopping for a family trip. Usually, Family ...

... noisy and vast, making it hard to be consumed by users. Content Curation is the process of making content accessible to humans. For example, we have collected over 171M reviews in more than 1.5M properties, which contain highly valuable information about the service a particular accommodation provides and a very rich source of selling points. A Machine Learning model "curates" reviews, constructing brief and representative summaries of the outstanding aspects of an accommodation (Figure 1(b)).

2.1.6 Content Augmentation. The whole process of users browsing, selecting, booking, and reviewing accommodations, puts to our disposal implicit signals that allow us to construct deeper understanding of the services and the quality a particular property or destination can offer. Models in this family derive attributes of a property, destination or even specific dates, augmenting the explicit service offer. Content Augmentation differs from Content Curation in that curation is about making already existing content easily accessible by users whereas augmentation is about enriching an existing entity using data from many others. To illustrate this idea, we give two examples:

• Great Value: Booking.com provides a wide selection of properties, offering different levels of value in the form of amenities, location, quality of the service and facilities, policies, and many other dimensions. Users need to assess how the price asked for a room relates to the value they would obtain.

Applied Data Science Track Paper KDD ’19, August 4–8, 2019, Anchorage, AK, USA

Slide 8

Slide 8 text

All model families can provide value

Presenter's annotations on the figures:
Figure 2: impact of introducing each model on the "core metrics", taking the benchmark model as the reference. Every family except Content Curation shows a positive effect.
Figure 3: improvement history of the recommender system over time, taking the impact of the initial algorithm as the reference. Improvement is incremental, and the gains look like they are starting to saturate.

Figure 2: Model Families Business Impact relative to median impact.

Figure 3: A sequence of experiments on a Recommendations Product. Each experiment tests a new version focusing on the indicated discipline or ML Problem Setup. The length of the bar is the observed impact relative to the first version (all statistically significant).

Paper excerpt shown on the slide (clipped at the column edges):

3 MODELING: OFFLINE MODEL PERFORMANCE IS JUST A HEALTH CHECK

A common approach to quantify the quality of a model is to estimate or predict the performance the model will have when exposed to data it has never seen before. Different flavors of cross-validation are used to estimate the value of a specific metric that depends on the task (classification, regression, ranking). In Booking.com we are very much concerned with the value a model brings to our customers and our business. Such value is estimated through Randomized Controlled Trials (RCTs) and specific business metrics like conversion, customer service tickets or cancellations. A very interesting finding is that increasing the performance of a model does not necessarily translate to a gain in value. Figure 4 illustrates this learning. Each point represents the comparison of a successful model that proved its value through a previous RCT, versus a new model. The horizontal coordinate is given by the relative difference between the new model and the current baseline according to an offline estimation of the performance of the models. This data is only about classifiers and rankers, evaluated by ROC AUC and Mean Reciprocal Rank respectively. The vertical coordinate is given by the relative difference in a business metric of interest as observed ...
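The horizontal coordinate described in the excerpt (relative difference in an offline metric between a new model and the current baseline) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the toy data are hypothetical, and neither the paper nor the slides show how Booking.com actually computes these values.

```python
from typing import Sequence


def mean_reciprocal_rank(ranked_relevance: Sequence[Sequence[int]]) -> float:
    """MRR over queries; each inner sequence marks relevance (1 or 0) in ranked order."""
    total = 0.0
    for flags in ranked_relevance:
        # Reciprocal rank of the first relevant item, or 0 if none is relevant.
        total += next((1.0 / (i + 1) for i, f in enumerate(flags) if f), 0.0)
    return total / len(ranked_relevance)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_difference(new: float, baseline: float) -> float:
    """Relative change of the new model's metric with respect to the baseline."""
    return (new - baseline) / baseline


# Toy rankings (made up for illustration): 1 marks the item the user chose.
baseline_rankings = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
new_rankings = [[1, 0, 0], [1, 0, 0], [0, 1, 0]]

offline_gain = relative_difference(
    mean_reciprocal_rank(new_rankings),
    mean_reciprocal_rank(baseline_rankings),
)
print(f"relative offline MRR difference: {offline_gain:.3f}")
```

The point made in the excerpt is that a positive value here is only a health check: the business impact still has to be measured separately in a randomized controlled trial.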
