• 13 numerical features
  – Mostly counters: number of times the user visited the advertiser website, …
• 26 categorical features
  – Publisher features: the domain of the URL where the ad was displayed, …
  – Advertiser features: advertiser id, type of products, …
  – User features: browser type, …
the label for the instance
• 3: the algorithm receives the true label of the instance
  – use this label feedback to update the hypothesis for future trials

Online Machine Learning
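The three-step trial protocol above can be sketched as a loop. This is a minimal sketch, not the slide's implementation: the names `online_learn`, `predict`, `update`, and the toy stream are assumptions introduced here for illustration.

```python
# Sketch of one online-learning run over a stream of (instance, label) pairs.
# predict(model, x) and update(model, x, p, y) are hypothetical callbacks.
def online_learn(stream, predict, update, model):
    losses = []
    for x, y in stream:             # 1. receive an instance
        p = predict(model, x)       # 2. predict its label
        losses.append(abs(y - p))   # 3. receive the true label...
        update(model, x, p, y)      # ...and use it to update the hypothesis
    return losses
```

The key property is that the prediction for each instance is made before its label is seen, so the reported losses measure generalization on unseen data.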
D = 2 ** 20  # number of weights used for learning

def get_x(csv_row, D):
    x = [0]  # 0 is the index of the bias term
    for key, value in csv_row.items():
        index = int(value + key[1:], 16) % D  # weakest hash ever ;)
        x.append(index)
    return x  # x contains indices of features that have a value of 1

Solution with 75 lines of Python Code
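As a usage sketch of `get_x`: Criteo's categorical values are hexadecimal strings, and a key like `C14` contributes its numeric suffix `14`, so concatenating value and suffix still parses as hex. The sample row below is hypothetical, not from the dataset.

```python
D = 2 ** 20  # number of weights used for learning

def get_x(csv_row, D):
    x = [0]  # 0 is the index of the bias term
    for key, value in csv_row.items():
        index = int(value + key[1:], 16) % D  # hex-parse, then fold into D buckets
        x.append(index)
    return x

# Hypothetical row with two categorical features (hex-string values)
row = {'C14': 'a8cd5504', 'C17': '68fd1e64'}
x = get_x(row, D)
# x[0] is the bias index 0; the remaining entries are hashed indices below D
```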
# initialize our model
D = 2 ** 20  # number of weights used for learning
w = [0.] * D  # weights

def get_p(x, w):
    wTx = 0.
    for i in x:  # do wTx
        wTx += w[i] * 1.  # w[i] * x[i], but if i is in x then x[i] = 1
    return 1. / (1. + exp(-max(min(wTx, 20.), -20.)))  # bounded sigmoid
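Why bound the sigmoid's input to [-20, 20]? A large raw dot product would make `exp()` overflow, and bounding also keeps predictions strictly between 0 and 1, which keeps the log of the prediction finite. A small sketch:

```python
from math import exp

def bounded_sigmoid(wTx):
    # clipping wTx to [-20, 20] avoids OverflowError in exp()
    # and keeps p strictly inside (0, 1)
    return 1. / (1. + exp(-max(min(wTx, 20.), -20.)))

p_hi = bounded_sigmoid(1000.)   # clipped: same result as bounded_sigmoid(20.)
p_lo = bounded_sigmoid(-1000.)  # clipped: same result as bounded_sigmoid(-20.)
```

Without the clipping, `exp(1000.)` raises `OverflowError` in CPython.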
# initialize our model
w = [0.] * D  # weights
n = [0.] * D  # number of times we've encountered a feature
alpha = .1  # learning rate for SGD optimization

def update_w(w, n, x, p, y):
    for i in x:
        # alpha / (sqrt(n) + 1) is the adaptive learning rate heuristic
        # (p - y) * x[i] is the current gradient
        # note that in our case, if i is in x then x[i] = 1
        w[i] -= (p - y) * alpha / (sqrt(n[i]) + 1.)
        n[i] += 1.
    return w, n
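Putting the pieces together, a self-contained sketch of one training pass: predict with `get_p`, then learn from the label with `update_w`. The two-instance stream below is hypothetical; the real solution derives the index lists from the Criteo CSV via `get_x`.

```python
from math import exp, sqrt

D = 2 ** 20
w = [0.] * D   # weights
n = [0.] * D   # per-feature update counts
alpha = .1     # learning rate for SGD

def get_p(x, w):
    wTx = sum(w[i] for i in x)  # x holds the indices where x[i] = 1
    return 1. / (1. + exp(-max(min(wTx, 20.), -20.)))  # bounded sigmoid

def update_w(w, n, x, p, y):
    for i in x:
        w[i] -= (p - y) * alpha / (sqrt(n[i]) + 1.)  # adaptive step size
        n[i] += 1.
    return w, n

# Hypothetical stream: (active feature indices, 0/1 label)
stream = [([0, 3, 7], 1), ([0, 3, 9], 0)]
for x, y in stream:
    p = get_p(x, w)                 # predict before seeing the label
    w, n = update_w(w, n, x, p, y)  # then update from the label
```

Note how features seen more often (`n[i]` large) get smaller steps, which is the adaptive learning-rate heuristic from the slide.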
• Using dictionaries to represent features has drawbacks
  – They take a large amount of memory
  – They grow in size as the training set grows
  – The model can be attacked (e.g. by using misspelled words that are not in the stored vocabulary)
-> D = 2 ** 29
• Hashing function:
  index = int(value + key[1:], 16) % D
  -> index = mmh3.hash(str(i) + value) % D
• MurmurHash performs well, producing a random distribution even for regular keys
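The hashing trick above can be sketched in a few lines. The slide uses `mmh3.hash(str(i) + value) % D`; the sketch below substitutes the standard library's MD5 for MurmurHash so it runs without the `mmh3` package (a stand-in, not the slide's exact hash), and the sample values are hypothetical.

```python
import hashlib

D = 2 ** 29  # enlarged weight space, as on the slide

def hash_index(i, value, D):
    # Stand-in for mmh3.hash(str(i) + value) % D: any string is folded
    # into one of D buckets, so memory stays fixed regardless of vocabulary.
    h = hashlib.md5((str(i) + value).encode()).hexdigest()
    return int(h, 16) % D

# Unseen or misspelled values still land in a valid bucket, so the
# feature space no longer grows with the training set.
idx_known = hash_index(3, 'a8cd5504', D)
idx_typo = hash_index(3, 'a8cd55O4', D)  # typo'd value still gets a slot
```

The index is deterministic for a given (column, value) pair, which is what lets training and prediction agree without storing a vocabulary.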