
Machine Learning (Intro)

Norman
February 02, 2018

Machine Learning (Intro)

An introduction to Machine Learning, prepared privately but aimed at the developers of the company.

Transcript

  1. OVERVIEW ➤ Disciplines within Artificial Intelligence ➤ Examples ➤ Machine Learning Basics ➤ Algorithms ➤ Neural Networks ➤ Loss Functions ➤ Data Engineering ➤ Data Acquisition ➤ Data Cleansing ➤ Feature Engineering ➤ Campaign Suggestions for dotbooks ➤ Knowing about Uncertainty ➤ What's Next? 3
  2. DISCIPLINES WITHIN AI (diagram labels: Artificial Intelligence, Machine Learning, Supervised Learning, Unsupervised Learning, Deep Learning) ➤ AI – Algorithm Driven ➤ Search, A* Search ➤ Heuristics (e.g. Simulated Annealing) ➤ ML – Data Driven ➤ Linear Regression ➤ Support Vector Machines ➤ Decision Trees / Random Forests ➤ Clustering ➤ Reinforcement Learning ➤ DL – Data Driven Neural Nets ➤ Image Classification / CV ➤ Voice Recognition ➤ Feature Extraction ➤ GANs 4
  3. EXAMPLES: MACHINE LEARNING ➤ Classification ➤ Regression ➤ Clustering ➤ grouping unknown data ➤ Anomaly Detection ➤ Credit Card Fraud ➤ Intrusion Detection ➤ Recommender Systems / Collaborative Filtering ➤ Dimensionality Reduction ➤ Reinforcement Learning / Decision Making 5
  4. EXAMPLES: DEEP LEARNING ➤ Convolutional Neural Networks ➤ Image Classification (Cats vs. Dogs, OCR, ...) ➤ Object Detection, Image Segmentation ➤ Translation ➤ Recurrent Neural Networks ➤ Time Series Prediction ➤ Question Answering, Translation ➤ Generative Adversarial Networks ➤ "Super Resolution" ➤ Face Generation ➤ Image Style Transfer 6
  5. DEEP LEARNING But... ➤ DL is hungry for data ➤ usually a five-digit number of labeled examples or more ➤ cats vs dogs: 25k ➤ MNIST digits: 70k ➤ ImageNet Object Detection: 57 GB ➤ Amazon Rain Forest Competition: 150k (22 GB) ➤ TSA 3D Passenger Screening Competition: 57 GB ➤ word2vec: all of Wikipedia ➤ DL is resource-hungry ➤ Training generally requires ➤ fast GPUs ➤ lots of time (15 mins to several days or more) ➤ Transfer Learning possible 7
  6. EXAMPLES: REINFORCEMENT LEARNING ➤ Massively hyped ➤ Often showcased in video games or robotics ➤ Often relies on simulations. More later... 8
  7. MACHINE LEARNING BASICS Basic Idea: ➤ Learning without explicit programming ➤ Trying to find an unknown function f(x) through observations of (noisy) data ➤ Optimization problem Taking the "magic" out of AI: ➤ "Statistical Learning" ➤ What is the most likely result, given the input? 9
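A minimal sketch of the core idea above (finding an unknown function from noisy observations), using scikit-learn, which shows up later in the tool list; the data and numbers are made up for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # noisy observations of an unknown function (here secretly f(x) = 3x + 2)
    rng = np.random.RandomState(0)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 3 * X[:, 0] + 2 + rng.normal(scale=1.0, size=100)

    # "learning" = adjusting parameters so the error on the observations is minimized
    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)   # close to 3 and 2
    print(model.predict([[5.0]]))          # the most likely result, given the input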
  8. A FEW ML ALGORITHMS ➤ Supervised ➤ Linear Regression ➤ Support Vector Machines ➤ Decision Trees ➤ k-Nearest Neighbor ➤ Unsupervised ➤ Clustering ➤ Collaborative Filtering ➤ Principal Component Analysis 11
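To contrast with the supervised example above, a small unsupervised sketch: k-means clustering groups points without any labels. Toy data and assumed parameters:

    import numpy as np
    from sklearn.cluster import KMeans

    # unlabeled 2-D points forming two blobs
    rng = np.random.RandomState(1)
    points = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
                        rng.normal(5, 0.5, size=(50, 2))])

    kmeans = KMeans(n_clusters=2, random_state=1).fit(points)
    print(kmeans.cluster_centers_)   # roughly (0, 0) and (5, 5)
    print(kmeans.labels_[:5])        # cluster assignment per point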
  9. ARTIFICIAL NEURAL NETWORKS ➤ Invented in the 1940s ➤ First called "perceptron" ➤ Inspired by human brain cells ➤ Matrix and vector multiplications ➤ Experienced several "winters" ➤ Only linear functions ➤ Limited computation power and data ➤ Non-linearity through activation function ➤ sigmoid, tanh, ReLU, etc. ➤ Backpropagation algorithm → learning ➤ Network topology 13
  10. DEEP LEARNING ➤ More hidden layers ➤ Ability to abstract complex relations ➤ Learns complex features ➤ Fueled by renowned competitions ➤ ImageNet ➤ PASCAL Visual Object Classes ➤ SQuAD (The Stanford Question Answering Dataset) 14
  11. ARTIFICIAL NEURAL NETWORKS ➤ Lots of parameters to tune ➤ Topology is a major design decision ➤ Trade-off: accuracy vs. generalization ➤ Activation functions ➤ Regularization ➤ Learning rate ➤ Cost function, more about that now... 18
  12. COMMON LOSS FUNCTIONS ➤ Regression ➤ Mean Squared Error (MSE) ➤ Mean Absolute Error (MAE) ➤ Classification ➤ Log Loss 20
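The three loss functions written out in NumPy; y_true / y_pred are just placeholder arrays:

    import numpy as np

    y_true = np.array([0.0, 0.5, 1.0])
    y_pred = np.array([0.1, 0.4, 0.8])

    mse = np.mean((y_true - y_pred) ** 2)     # Mean Squared Error (regression)
    mae = np.mean(np.abs(y_true - y_pred))    # Mean Absolute Error (regression)

    # Log Loss / binary cross-entropy (classification, labels in {0, 1})
    labels = np.array([0, 1, 1])
    probs = np.array([0.2, 0.7, 0.9])
    log_loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

    print(mse, mae, log_loss)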
  13. (STOCHASTIC) GRADIENT DESCENT ➤ step-wise minimization of the loss function ➤ the loss function must be ➤ continuous ➤ differentiable 21
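A bare-bones (non-stochastic) gradient descent sketch: repeatedly step against the gradient of the continuous, differentiable MSE loss to fit a slope and intercept. Data and learning rate are arbitrary:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x + 0.5                  # the "unknown" relation to recover

    w, b = 0.0, 0.0
    learning_rate = 0.01
    for step in range(1000):
        y_pred = w * x + b
        # gradients of MSE = mean((y_pred - y)^2) with respect to w and b
        grad_w = 2 * np.mean((y_pred - y) * x)
        grad_b = 2 * np.mean(y_pred - y)
        w -= learning_rate * grad_w    # step-wise minimization
        b -= learning_rate * grad_b

    print(w, b)                        # approaches 2.0 and 0.5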
  14. DATA ENGINEERING (slide diagram: Machine Learning / Engineering / Data Science; synonyms: Data Cleansing, Data Engineering, Data Wrangling, Data Munging; ranging from "clean" to "hands down 'n' dirty"; roughly 80% of the work) 24
  15. DATA ENGINEERING ➤ Getting data in the first place... ➤ Database, JSON, CSV, XML, ... ➤ API ➤ Crawling websites ➤ Raw text, images, video, etc. ➤ Generating data from simulators / games ➤ Data format is important ➤ input and output format ➤ determines convergence / learnability ➤ Feature Engineering ➤ what not to feed ➤ balancing data 25
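A small pandas sketch of the "getting data into a usable format" part; the file name and columns here are hypothetical, not from the actual project:

    import pandas as pd

    # hypothetical export, e.g. from a database dump or an API
    df = pd.read_csv('books.csv', parse_dates=['publication_date'])

    print(df.dtypes)            # check what actually arrived
    print(df.isnull().sum())    # how much is missing per column

    # input format matters: turn categorical columns into numbers
    df = pd.get_dummies(df, columns=['genre'])

    # balancing: check whether one class dominates the data
    print(df['is_campaign'].value_counts(normalize=True))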
  16. THE TOOLS: IN OUR CASE ➤ Python ➤ Jupyter ➤ Pandas ➤ NumPy ➤ Plotly, Seaborn ➤ scikit-learn ➤ TensorFlow + Keras 26
  17. WORK & DATA FLOW ➤ Acquire data via API (or

    ElasticSearch) ➤ Prepare data, bring into usable format ➤ Save preprocessed data ➤ Visualize data to get a first idea ➤ Extract features and labels ➤ Evaluate ML models ➤ Save best model ➤ Deploy and run 28
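The same flow compressed into a heavily simplified scikit-learn sketch; the random features/labels and the file name model.pkl are stand-ins for the real preprocessing output:

    import numpy as np
    import joblib
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error

    # stand-ins for the features / labels extracted in the steps above
    rng = np.random.RandomState(0)
    features = rng.rand(500, 6)
    labels = features[:, 0] * 0.5 + rng.normal(scale=0.05, size=500)

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)                                  # evaluate ML models
    print(mean_absolute_error(y_test, model.predict(X_test)))
    joblib.dump(model, 'model.pkl')                              # save the best model, then deploy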
  18. “How to test this?” - something Yulia could be thinking now. P("How to test this?" | Yulia Kozak) > 0.9 29
  19. VERIFYING MACHINE LEARNING MODELS ☑ Split data set (train / validation / test) ☑ Replay real-world data ☑ Plausibility checks ☑ Shadow mode ❓ Simulation ❓ "Expert" assessment 30
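The first checkbox (train / validation / test split) as a quick sketch; the 70/15/15 proportions are just an example:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.arange(1000).reshape(-1, 1)
    y = np.arange(1000)

    # 70% train, 15% validation, 15% test
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

    print(len(X_train), len(X_val), len(X_test))   # 700 150 150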
  20. OBJECTIVES AND IDEAS ➤ Try to emulate editors' campaign behavior ➤ Improve sales performance by placing good campaigns ➤ Detect popular topics ➤ Predict sales performance ➤ More ideas welcome! 32
  21. STEPS & CHALLENGES ➤ Getting the data (duh) ➤ Thankfully lots of help ➤ Handling / correcting inconsistent data ➤ Publication date ➤ Campaign prices ≥ base price ➤ Transforming / Normalizing data ➤ Evaluating which features to use ➤ Tuning hyper-parameters ➤ Balancing the data ➤ Stopping the model from "cheating" and being "lazy" 33
  22. CODE SAMPLES

    # (imports assumed elsewhere: import datetime, import pandas as pd)
    @classmethod
    def _create_campaign_matrix(cls, df_prices: pd.DataFrame) -> pd.DataFrame:
        df = df_prices.drop(['price'], axis=1)
        today = pd.Timestamp(datetime.date.today())
        future_days = pd.Timedelta(30, unit='d')
        # fill missing campaign start / end dates
        df.loc[df['from'].isnull(), 'from'] = df.loc[df['from'].isnull(), 'publication_date']
        df.loc[df['to'].isnull() & (df['from'] > today), 'to'] = \
            df.loc[df['to'].isnull() & (df['from'] > today), 'from'] + future_days
        df.loc[df['to'].isnull(), 'to'] = today + future_days
        df = cls._merge_non_campaigns(df)
        df = cls._merge_campaign_count(df)
        # date features derived from the publication date
        df['publication_month'] = df['publication_date'].dt.month
        df['publication_year'] = df['publication_date'].dt.year
        df['publication_dayofyear'] = df['publication_date'].dt.dayofyear
        df['publication_weekday'] = df['publication_date'].dt.weekday
        assert df[df['from'] > df['to']].empty
        # campaign start / end as days relative to publication
        df['from_day'] = (df['from'] - df['publication_date']).dt.days
        df['to_day'] = (df['to'] - df['publication_date']).dt.days
        # clamp positive ratios to zero and clear the ratio for non-campaign rows
        df.loc[(df.ratio > 0.0), 'ratio'] = 0.0
        df.loc[~df.campaign, 'ratio'] = 0.0
        return df

    37
  23. BALANCING TIME SERIES DATA ➤ Most of the time 0% price change ➤ Just sample the relevant data, i.e. days when the price changes (into & out of campaigns), plus a few more random picks ➤ "Augment" by sampling around such interesting dates

    d  0    1    2    3    4    5    6    7     8     9     10    11    12    13    14   15   16   17   18   19
    €  0%   0%   0%   0%   0%   0%   0%   -50%  -50%  -50%  -50%  -50%  -50%  -50%  0%   0%   0%   0%   0%   0%

    38
  24. CODE SAMPLES: HANDLING “CIRCULAR” DATA

    # (imports assumed elsewhere: import numpy as np, import pandas as pd)
    @classmethod
    def _transform_data(cls, df_: pd.DataFrame) -> pd.DataFrame:
        df = df_.copy()
        df['current_month'] = (df['publication_date'] + pd.to_timedelta(df['current_day'], unit='d')).dt.month
        df['current_dayofyear'] = (df['publication_date'] + pd.to_timedelta(df['current_day'], unit='d')).dt.dayofyear
        df['current_weekday'] = (df['publication_date'] + pd.to_timedelta(df['current_day'], unit='d')).dt.weekday
        df.loc[df['campaign'], 'campaign_count'] = df.loc[df['campaign'], 'campaign_count'] - 1
        # drop identifiers and columns that are not fed to the model
        df = df.drop(
            ['document_id', 'from', 'to', 'campaign', 'from_day', 'to_day', 'ratio',
             'publication_date', 'dist_kindle', 'base_price'],
            axis=1
        )
        # encode circular data (months, weekdays) as sin() + cos()
        for column, max_value in (
                ('publication_month', 12), ('publication_dayofyear', 365), ('publication_weekday', 7),
                ('current_month', 12), ('current_dayofyear', 365), ('current_weekday', 7)):
            df['{}_sin'.format(column)] = np.sin(2 * np.pi / max_value * df[column])
            df['{}_cos'.format(column)] = np.cos(2 * np.pi / max_value * df[column])
            df.drop([column], axis=1, inplace=True)
        return df

    39
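Why the sin/cos encoding helps (an illustration, not from the deck): without it, December (12) and January (1) look 11 apart to the model, even though they are neighbouring months.

    import numpy as np

    # on the sin/cos circle, months 12 and 1 are exactly one "month step" apart,
    # the same distance as months 1 and 2 - no artificial jump at the year boundary
    for month in (1, 2, 12):
        angle = 2 * np.pi / 12 * month
        print(month, round(np.sin(angle), 3), round(np.cos(angle), 3))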
  25. CODE SAMPLES: NN MODEL

    # (assumed imports for Keras 2: Sequential, Dense, Dropout, BatchNormalization,
    #  glorot_normal, l2, Adam; INPUT_DIM, OUTPUT_DIM and custom_loss defined elsewhere)
    model = Sequential()
    model.add(BatchNormalization(input_shape=(INPUT_DIM,)))
    model.add(Dense(64, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dropout(0.1))
    model.add(Dense(16, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dropout(0.1))
    model.add(Dense(16, activation='elu', kernel_initializer=glorot_normal(), kernel_regularizer=l2(0.01)))
    model.add(BatchNormalization())
    model.add(Dense(OUTPUT_DIM, activation='sigmoid', kernel_initializer=glorot_normal()))
    model.compile(optimizer=Adam(lr=0.015, decay=0.0001), loss=custom_loss)

    40
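The training call that produces the output on the next slide would look roughly like this; the batch size is an assumption, only the 96 epochs are taken from the log:

    # X_train, y_train, X_val, y_val come from the preprocessing shown earlier
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=96,
        batch_size=256,
        verbose=1,
    )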
  26. TRAINING OUTPUT

    Train on 25140 samples, validate on 5005 samples
    Epoch 1/96
    25140/25140 [==============================] - 3s - loss: 167.0875 - val_loss: 128.8099
    Epoch 2/96
    25140/25140 [==============================] - 0s - loss: 130.5025 - val_loss: 120.0291
    Epoch 3/96
    25140/25140 [==============================] - 0s - loss: 123.4949 - val_loss: 112.4073
    Epoch 4/96
    25140/25140 [==============================] - 0s - loss: 107.4423 - val_loss: 128.2385
    Epoch 5/96
    25140/25140 [==============================] - 0s - loss: 99.7719 - val_loss: 134.3354
    .....
    Epoch 96/96
    25140/25140 [==============================] - 0s - loss: 76.0061 - val_loss: 93.3046

    In [241]: mean_absolute_error(y_test, y_pred), mean_squared_error(y_test, y_pred)
    Out[241]: (0.094952911, 0.042776108)
    In [242]: y_test.mean(), y_pred.mean(), y_test.std(), y_pred.std()
    Out[242]: (0.14201689, 0.13921063, 0.26143825, 0.24002805)
    In [243]: y_test.max(), y_pred.max(), y_test.min(), y_pred.min()
    Out[243]: (1.0, 0.8296386, 0.0, 1.4799974e-07)

    41
  27. SAMPLE SUGGESTIONS

    v321827: [ 0. 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.7 0.7 0.7 0.69 0.69 0.69 0.69 0.66 0.66]
    v371756: [ 0.47 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.49 0.49 0.49 0.49 0.49 0.49 0.49 0.52 0.52 0.52 0.52 0.52 0.52 0.52 0.62 0.61]
    v310734: [ 0. 0.32 0.32 0.32 0.32 0.32 0.32 0.32 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.33 0.33 0.33 0.33 0.33 0.33 0.33 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.39 0.4 ]
    v230513: [ 0. 0. 0. 0. 0. 0. 0. 0. 0.37 0.37 0.37 0.37 0.37 0.37 0.37 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.34 0.31 0.31 0.31 0.31 0.31 0.31 0.19 0.19]
    v369140: [ 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.48 0.48]
    v278229: [ 0. 0. 0. 0. 0. 0. 0. 0. 0.27 0.27 0.27 0.27 0.27 0.27 0.27 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
    v209205: [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.27 0.27 0.27 0.27 0.27 0.27 0.27 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
    v371770: [ 0.44 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.49 0.49 0.49 0.49 0.49 0.49 0.48 0.53 0.53]
    v342268: [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    v377658: [ 0.43 0.46 0.46 0.46 0.46 0.46 0.46 0.46 0.47 0.47 0.47 0.47 0.47 0.47 0.47 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.58 0.58 0.58 0.58 0.58 0.58 0.57 0.64 0.64]

    For the first 31 days ✅ Looking good… 43
  28. PROBLEM WITH ML / DL ➤ Black Box ➤ No insight into how it determines the result ➤ No clear confidence intervals ➤ Bayesian Neural Networks to the rescue ➤ allow for deeper investigation of uncertain areas ➤ or focused training / data acquisition for those areas ➤ human intervention possible in low-confidence situations 45
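One practical way to get at that uncertainty (not from the deck, and only an approximation of a full Bayesian Neural Network) is Monte Carlo dropout: keep dropout active at prediction time and look at the spread of repeated predictions. A Keras sketch, assuming a model with Dropout layers and an input matrix X like the ones above:

    import numpy as np
    from keras import backend as K

    # build a function that runs the model in training mode (learning_phase = 1),
    # so the Dropout layers stay active during prediction
    predict_with_dropout = K.function([model.input, K.learning_phase()], [model.output])

    samples = np.stack([predict_with_dropout([X, 1])[0] for _ in range(50)])
    mean_prediction = samples.mean(axis=0)
    uncertainty = samples.std(axis=0)   # large spread -> low confidence, let a human decide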
  29. MORE MINING ➤ External Data Sources ➤ Amazon: analyze top sellers ➤ learn to distinguish bestsellers from non-selling books ➤ Google: popular search terms ➤ News (Zeit, FAZ, etc.) ➤ Popular book lists (Spiegel, ...) ➤ Reviews (shops, newspapers, magazines, etc.) ➤ NLP / Topic Modeling / Interestingness ➤ Use abstract, preview, full-text ➤ Cover 47