
Machine Learning (Intro)

Norman
February 02, 2018

Machine Learning (Intro)

An introduction to Machine Learning, prepared privately but aimed at the developers of the company.

Transcript

  1. OVERVIEW ➤ Disciplines within Artificial Intelligence ➤ Examples ➤ Machine Learning Basics ➤ Algorithms ➤ Neural Networks ➤ Loss Functions ➤ Data Engineering ➤ Data Acquisition ➤ Data Cleansing ➤ Feature Engineering ➤ Campaign Suggestions for dotbooks ➤ Knowing about Uncertainty ➤ What's Next? 3
  2. DISCIPLINES WITHIN AI (diagram labels: Artificial Intelligence, Machine Learning, Supervised Learning, Unsupervised Learning, Deep Learning) ➤ AI – Algorithm Driven ➤ Search, A* Search ➤ Heuristics (e.g. Simulated Annealing) ➤ ML – Data Driven ➤ Linear Regression ➤ Support Vector Machines ➤ Decision Trees / Random Forests ➤ Clustering ➤ Reinforcement Learning ➤ DL – Data Driven Neural Nets ➤ Image Classification / CV ➤ Voice Recognition ➤ Feature Extraction ➤ GANs 4
  3. EXAMPLES: MACHINE LEARNING ➤ Classification ➤ Regression ➤ Clustering ➤ grouping unknown data ➤ Anomaly Detection ➤ Credit Card Fraud ➤ Intrusion Detection ➤ Recommender Systems / Collaborative Filtering ➤ Dimensionality Reduction ➤ Reinforcement Learning / Decision Making 5
  4. EXAMPLES: DEEP LEARNING ➤ Convolutional Neural Networks ➤ Image Classification (Cats vs. Dogs, OCR, ...) ➤ Object Detection, Image Segmentation ➤ Translation ➤ Recurrent Neural Networks ➤ Time Series Prediction ➤ Question Answering, Translation ➤ Generative Adversarial Networks ➤ "Super Resolution" ➤ Face Generation ➤ Image Style Transfer 6
  5. DEEP LEARNING But... ➤ DL is hungry for data ➤ usually a five-digit number of labeled examples or more ➤ cats vs dogs: 25k ➤ MNIST digits: 70k ➤ ImageNet Object Detection: 57 GB ➤ Amazon Rain Forest Competition: 150k (22 GB) ➤ TSA 3D Passenger Screening Competition: 57 GB ➤ word2vec: all of Wikipedia ➤ DL is resource-hungry ➤ Training generally requires ➤ fast GPUs ➤ lots of time (15 mins to several days or more) ➤ Transfer Learning possible 7
  6. EXAMPLES: REINFORCEMENT LEARNING ➤ Massively hyped ➤ Often showcased in video games or robotics ➤ Often relies on simulations. More later... 8
  7. MACHINE LEARNING BASICS Basic Idea: ➤ Learning without explicit programming ➤ Trying to find an unknown function f(x) through observations of (noisy) data ➤ Optimization problem Taking the "magic" out of AI: ➤ "Statistical Learning" ➤ What is the most likely result, given the input? 9
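A minimal sketch of the core idea above (finding an unknown function from noisy observations), using scikit-learn, which shows up later in the tool list; the data and numbers are made up for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # noisy observations of an unknown function (here secretly f(x) = 3x + 2)
    rng = np.random.RandomState(0)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 3 * X[:, 0] + 2 + rng.normal(scale=1.0, size=100)

    # "learning" = adjusting parameters so the error on the observations is minimized
    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)   # close to 3 and 2
    print(model.predict([[5.0]]))          # the most likely result, given the input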
  8. A FEW ML ALGORITHMS ➤ Supervised ➤ Linear Regression ➤ Support Vector Machines ➤ Decision Trees ➤ k-Nearest Neighbor ➤ Unsupervised ➤ Clustering ➤ Collaborative Filtering ➤ Principal Component Analysis 11
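To contrast with the supervised example above, a small unsupervised sketch: k-means clustering groups points without any labels. Toy data and assumed parameters:

    import numpy as np
    from sklearn.cluster import KMeans

    # unlabeled 2-D points forming two blobs
    rng = np.random.RandomState(1)
    points = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
                        rng.normal(5, 0.5, size=(50, 2))])

    kmeans = KMeans(n_clusters=2, random_state=1).fit(points)
    print(kmeans.cluster_centers_)   # roughly (0, 0) and (5, 5)
    print(kmeans.labels_[:5])        # cluster assignment per point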
  9. ARTIFICIAL NEURAL NETWORKS ➤ Invented in the 1940s ➤ First called "perceptron" ➤ Inspired by human brain cells ➤ Matrix and vector multiplications ➤ Experienced several "winters" ➤ Only linear functions ➤ Limited computation power and data ➤ Non-linearity through activation function ➤ sigmoid, tanh, ReLU, etc. ➤ Backpropagation algorithm → learning ➤ Network topology 13
  10. DEEP LEARNING ➤ More hidden layers ➤ Ability to abstract complex relations ➤ Learns complex features ➤ Fueled by renowned competitions ➤ ImageNet ➤ PASCAL Visual Object Classes ➤ SQuAD (The Stanford Question Answering Dataset) 14
  11. ARTIFICIAL NEURAL NETWORKS ➤ Lots of parameters to tune ➤ Topology is a major design decision ➤ Trade-off: accuracy vs. generalization ➤ Activation functions ➤ Regularization ➤ Learning rate ➤ Cost function, more about that now... 18
  12. COMMON LOSS FUNCTIONS ➤ Regression ➤ Mean Squared Error (MSE) ➤ Mean Absolute Error (MAE) ➤ Classification ➤ Log Loss 20
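The three loss functions written out in NumPy; y_true / y_pred are just placeholder arrays:

    import numpy as np

    y_true = np.array([0.0, 0.5, 1.0])
    y_pred = np.array([0.1, 0.4, 0.8])

    mse = np.mean((y_true - y_pred) ** 2)     # Mean Squared Error (regression)
    mae = np.mean(np.abs(y_true - y_pred))    # Mean Absolute Error (regression)

    # Log Loss / binary cross-entropy (classification, labels in {0, 1})
    labels = np.array([0, 1, 1])
    probs = np.array([0.2, 0.7, 0.9])
    log_loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

    print(mse, mae, log_loss)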
  13. (STOCHASTIC) GRADIENT DESCENT ➤ step-wise minimization of the loss function ➤ the loss function must be ➤ continuous ➤ differentiable 21
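A bare-bones (non-stochastic) gradient descent sketch: repeatedly step against the gradient of the continuous, differentiable MSE loss to fit a slope and intercept. Data and learning rate are arbitrary:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x + 0.5                  # the "unknown" relation to recover

    w, b = 0.0, 0.0
    learning_rate = 0.01
    for step in range(1000):
        y_pred = w * x + b
        # gradients of MSE = mean((y_pred - y)^2) with respect to w and b
        grad_w = 2 * np.mean((y_pred - y) * x)
        grad_b = 2 * np.mean(y_pred - y)
        w -= learning_rate * grad_w    # step-wise minimization
        b -= learning_rate * grad_b

    print(w, b)                        # approaches 2.0 and 0.5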
  14. DATA ENGINEERING (slide diagram: Machine Learning / Engineering / Data Science; synonyms: Data Cleansing, Data Engineering, Data Wrangling, Data Munging; ranging from "clean" to "hands down 'n' dirty"; roughly 80% of the work) 24
  15. DATA ENGINEERING ➤ Getting data in the first place... ➤ Database, JSON, CSV, XML, ... ➤ API ➤ Crawling websites ➤ Raw text, images, video, etc. ➤ Generating data from simulators / games ➤ Data format is important ➤ input and output format ➤ determines convergence / learnability ➤ Feature Engineering ➤ what not to feed ➤ balancing data 25
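A small pandas sketch of the "getting data into a usable format" part; the file name and columns here are hypothetical, not from the actual project:

    import pandas as pd

    # hypothetical export, e.g. from a database dump or an API
    df = pd.read_csv('books.csv', parse_dates=['publication_date'])

    print(df.dtypes)            # check what actually arrived
    print(df.isnull().sum())    # how much is missing per column

    # input format matters: turn categorical columns into numbers
    df = pd.get_dummies(df, columns=['genre'])

    # balancing: check whether one class dominates the data
    print(df['is_campaign'].value_counts(normalize=True))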
  16. THE TOOLS: IN OUR CASE ➤ Python ➤ Jupyter ➤ Pandas ➤ NumPy ➤ Plotly, Seaborn ➤ scikit-learn ➤ TensorFlow + Keras 26
  17. WORK & DATA FLOW ➤ Acquire data via API (or

    ElasticSearch) ➤ Prepare data, bring into usable format ➤ Save preprocessed data ➤ Visualize data to get a first idea ➤ Extract features and labels ➤ Evaluate ML models ➤ Save best model ➤ Deploy and run 28
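The same flow compressed into a heavily simplified scikit-learn sketch; the random features/labels and the file name model.pkl are stand-ins for the real preprocessing output:

    import numpy as np
    import joblib
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error

    # stand-ins for the features / labels extracted in the steps above
    rng = np.random.RandomState(0)
    features = rng.rand(500, 6)
    labels = features[:, 0] * 0.5 + rng.normal(scale=0.05, size=500)

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)                                  # evaluate ML models
    print(mean_absolute_error(y_test, model.predict(X_test)))
    joblib.dump(model, 'model.pkl')                              # save the best model, then deploy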
  18. “How to test this?” - something Yulia could be thinking now. P("How to test this?" | Yulia Kozak) > 0.9 29
  19. VERIFYING MACHINE LEARNING MODELS ☑ Split data set (train / validation / test) ☑ Replay real-world data ☑ Plausibility checks ☑ Shadow mode ❓ Simulation ❓ "Expert" assessment 30
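The first checkbox (train / validation / test split) as a quick sketch; the 70/15/15 proportions are just an example:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.arange(1000).reshape(-1, 1)
    y = np.arange(1000)

    # 70% train, 15% validation, 15% test
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

    print(len(X_train), len(X_val), len(X_test))   # 700 150 150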
  20. OBJECTIVES AND IDEAS ➤ Try to emulate editors' campaign behavior ➤ Improve sales performance by placing good campaigns ➤ Detect popular topics ➤ Predict sales performance ➤ More ideas welcome! 32
  21. STEPS & CHALLENGES ➤ Getting the data (duh) ➤ Thankfully lots of help ➤ Handling / correcting inconsistent data ➤ Publication date ➤ Campaign prices ≥ base price ➤ Transforming / Normalizing data ➤ Evaluating which features to use ➤ Tuning hyper-parameters ➤ Balancing the data ➤ Stopping the model from "cheating" and being "lazy" 33
  22. CODE SAMPLES

    # (imports assumed elsewhere: import datetime, import pandas as pd)
    @classmethod
    def _create_campaign_matrix(cls, df_prices: pd.DataFrame) -> pd.DataFrame:
        df = df_prices.drop(['price'], axis=1)
        today = pd.Timestamp(datetime.date.today())
        future_days = pd.Timedelta(30, unit='d')
        # fill missing campaign start / end dates
        df.loc[df['from'].isnull(), 'from'] = df.loc[df['from'].isnull(), 'publication_date']
        df.loc[df['to'].isnull() & (df['from'] > today), 'to'] = \
            df.loc[df['to'].isnull() & (df['from'] > today), 'from'] + future_days
        df.loc[df['to'].isnull(), 'to'] = today + future_days
        df = cls._merge_non_campaigns(df)
        df = cls._merge_campaign_count(df)
        # date features derived from the publication date
        df['publication_month'] = df['publication_date'].dt.month
        df['publication_year'] = df['publication_date'].dt.year
        df['publication_dayofyear'] = df['publication_date'].dt.dayofyear
        df['publication_weekday'] = df['publication_date'].dt.weekday
        assert df[df['from'] > df['to']].empty
        # campaign start / end as days relative to publication
        df['from_day'] = (df['from'] - df['publication_date']).dt.days
        df['to_day'] = (df['to'] - df['publication_date']).dt.days
        # clamp positive ratios to zero and clear the ratio for non-campaign rows
        df.loc[(df.ratio > 0.0), 'ratio'] = 0.0
        df.loc[~df.campaign, 'ratio'] = 0.0
        return df

    37
  23. BALANCING TIME SERIES DATA ➤ Most of the time 0% price change ➤ Just sample the relevant data, i.e. days when the price changes (into & out of campaigns), plus a few more random picks ➤ "Augment" by sampling around such interesting dates

    d  0    1    2    3    4    5    6    7     8     9     10    11    12    13    14   15   16   17   18   19
    €  0%   0%   0%   0%   0%   0%   0%   -50%  -50%  -50%  -50%  -50%  -50%  -50%  0%   0%   0%   0%   0%   0%

    38
  24. CODE SAMPLES: HANDLING “CIRCULAR” DATA

    # (imports assumed elsewhere: import numpy as np, import pandas as pd)
    @classmethod
    def _transform_data(cls, df_: pd.DataFrame) -> pd.DataFrame:
        df = df_.copy()
        df['current_month'] = (df['publication_date'] + pd.to_timedelta(df['current_day'], unit='d')).dt.month
        df['current_dayofyear'] = (df['publication_date'] + pd.to_timedelta(df['current_day'], unit='d')).dt.dayofyear
        df['current_weekday'] = (df['publication_date'] + pd.to_timedelta(df['current_day'], unit='d')).dt.weekday
        df.loc[df['campaign'], 'campaign_count'] = df.loc[df['campaign'], 'campaign_count'] - 1
        # drop identifiers and columns that are not fed to the model
        df = df.drop(
            ['document_id', 'from', 'to', 'campaign', 'from_day', 'to_day', 'ratio',
             'publication_date', 'dist_kindle', 'base_price'],
            axis=1
        )
        # encode circular data (months, weekdays) as sin() + cos()
        for column, max_value in (
                ('publication_month', 12), ('publication_dayofyear', 365), ('publication_weekday', 7),
                ('current_month', 12), ('current_dayofyear', 365), ('current_weekday', 7)):
            df['{}_sin'.format(column)] = np.sin(2 * np.pi / max_value * df[column])
            df['{}_cos'.format(column)] = np.cos(2 * np.pi / max_value * df[column])
            df.drop([column], axis=1, inplace=True)
        return df

    39
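Why the sin/cos encoding helps (an illustration, not from the deck): without it, December (12) and January (1) look 11 apart to the model, even though they are neighbouring months.

    import numpy as np

    # on the sin/cos circle, months 12 and 1 are exactly one "month step" apart,
    # the same distance as months 1 and 2 - no artificial jump at the year boundary
    for month in (1, 2, 12):
        angle = 2 * np.pi / 12 * month
        print(month, round(np.sin(angle), 3), round(np.cos(angle), 3))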
  25. CODE SAMPLES: NN MODEL

    # (assumed imports for Keras 2: Sequential, Dense, Dropout, BatchNormalization,
    #  glorot_normal, l2, Adam; INPUT_DIM, OUTPUT_DIM and custom_loss defined elsewhere)
    model = Sequential()
    model.add(BatchNormalization(input_shape=(INPUT_DIM,)))
    model.add(Dense(64, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dropout(0.1))
    model.add(Dense(16, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dropout(0.1))
    model.add(Dense(16, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dense(OUTPUT_DIM, activation='sigmoid', kernel_initializer=glorot_normal()))
    model.compile(optimizer=Adam(lr=0.015, decay=0.0001), loss=custom_loss)

    40
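The training call that produces the output on the next slide would look roughly like this; the batch size is an assumption, only the 96 epochs are taken from the log:

    # X_train, y_train, X_val, y_val come from the preprocessing shown earlier
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=96,
        batch_size=256,
        verbose=1,
    )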
  26. TRAINING OUTPUT

    Train on 25140 samples, validate on 5005 samples
    Epoch 1/96
    25140/25140 [==============================] - 3s - loss: 167.0875 - val_loss: 128.8099
    Epoch 2/96
    25140/25140 [==============================] - 0s - loss: 130.5025 - val_loss: 120.0291
    Epoch 3/96
    25140/25140 [==============================] - 0s - loss: 123.4949 - val_loss: 112.4073
    Epoch 4/96
    25140/25140 [==============================] - 0s - loss: 107.4423 - val_loss: 128.2385
    Epoch 5/96
    25140/25140 [==============================] - 0s - loss: 99.7719 - val_loss: 134.3354
    .....
    Epoch 96/96
    25140/25140 [==============================] - 0s - loss: 76.0061 - val_loss: 93.3046

    In [241]: mean_absolute_error(y_test, y_pred), mean_squared_error(y_test, y_pred)
    Out[241]: (0.094952911, 0.042776108)
    In [242]: y_test.mean(), y_pred.mean(), y_test.std(), y_pred.std()
    Out[242]: (0.14201689, 0.13921063, 0.26143825, 0.24002805)
    In [243]: y_test.max(), y_pred.max(), y_test.min(), y_pred.min()
    Out[243]: (1.0, 0.8296386, 0.0, 1.4799974e-07)

    41
  27. SAMPLE SUGGESTIONS

    v321827: [ 0. 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.7 0.7 0.7 0.69 0.69 0.69 0.69 0.66 0.66]
    v371756: [ 0.47 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.49 0.49 0.49 0.49 0.49 0.49 0.49 0.52 0.52 0.52 0.52 0.52 0.52 0.52 0.62 0.61]
    v310734: [ 0. 0.32 0.32 0.32 0.32 0.32 0.32 0.32 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.33 0.33 0.33 0.33 0.33 0.33 0.33 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.39 0.4 ]
    v230513: [ 0. 0. 0. 0. 0. 0. 0. 0. 0.37 0.37 0.37 0.37 0.37 0.37 0.37 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.34 0.31 0.31 0.31 0.31 0.31 0.31 0.19 0.19]
    v369140: [ 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.48 0.48]
    v278229: [ 0. 0. 0. 0. 0. 0. 0. 0. 0.27 0.27 0.27 0.27 0.27 0.27 0.27 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
    v209205: [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.27 0.27 0.27 0.27 0.27 0.27 0.27 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
    v371770: [ 0.44 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.49 0.49 0.49 0.49 0.49 0.49 0.48 0.53 0.53]
    v342268: [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    v377658: [ 0.43 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.58 0.58 0.58 0.58 0.58 0.58 0.57 0.64 0.64]

    For the first 31 days ✅ Looking good… 43
  28. PROBLEM WITH ML / DL ➤ Black Box ➤ No insight into how it determines the result ➤ No clear confidence intervals ➤ Bayesian Neural Networks to the rescue ➤ allow for deeper investigation of uncertain areas ➤ or focused training / data acquisition for those areas ➤ human intervention possible in low-confidence situations 45
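One practical way to get at that uncertainty (not from the deck, and only an approximation of a full Bayesian Neural Network) is Monte Carlo dropout: keep dropout active at prediction time and look at the spread of repeated predictions. A Keras sketch, assuming a model with Dropout layers and an input matrix X like the ones above:

    import numpy as np
    from keras import backend as K

    # build a function that runs the model in training mode (learning_phase = 1),
    # so the Dropout layers stay active during prediction
    predict_with_dropout = K.function([model.input, K.learning_phase()], [model.output])

    samples = np.stack([predict_with_dropout([X, 1])[0] for _ in range(50)])
    mean_prediction = samples.mean(axis=0)
    uncertainty = samples.std(axis=0)   # large spread -> low confidence, let a human decide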
  29. MORE MINING ➤ External Data Sources ➤ Amazon: analyze top sellers ➤ learn to distinguish bestsellers from non-selling books ➤ Google: popular search terms ➤ News (Zeit, FAZ, etc.) ➤ Popular book lists (Spiegel, ...) ➤ Reviews (shops, newspapers, magazines, etc.) ➤ NLP / Topic Modeling / Interestingness ➤ Use abstract, preview, full-text ➤ Cover 47