And Then There Are Algorithms – Part 1

Machine Learning for the Enterprise Conference, Rome, October 28th, 2019

Machine Learning = Algorithms + Data + Tools

Part 1

Danilo Poccia

October 28, 2019

  1. And Then There Are Algorithms. Danilo Poccia, Principal Evangelist, AWS. @danilop, danilop.net. © 2019, Amazon Web Services, Inc. or its Affiliates.
  2. Machine Learning
    Supervised Learning: inferring a model from labeled training data
    Unsupervised Learning: inferring a model to describe hidden structure from unlabeled data
  3. Machine Learning
    Supervised Learning
    Unsupervised Learning
    Reinforcement Learning: achieve a given goal in a dynamic environment
  4. Unsupervised Learning: Topic Modeling
    Discovering abstract “topics” that occur in a collection of documents. For example, looking for “infrequent” words that are used more often in a document.
    Re:Tip: try topic modeling with your own emails ;-)
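The "infrequent words that are used more often in a document" idea can be illustrated with a TF-IDF score. This is only a minimal sketch of that intuition, not a full topic model; the function name `tf_idf` and the toy emails are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document (plain TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # Term frequency in this document times log inverse document frequency
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical toy "emails"
emails = [
    "the invoice for the server is attached",
    "the meeting about the server is moved",
    "the invoice invoice payment is late",
]
scores = tf_idf(emails)
# The highest-scoring word of each email is a rough hint at its topic
top_words = [max(s, key=s.get) for s in scores]
```

Words that appear in every email (such as "the") score zero, while words concentrated in a single email score highest.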
  5. Supervised Learning
    Regression: “How many bikes will be rented tomorrow?” (1, 0, 100K)
    Binary Classification: “Is this email spam?” (Yes / No, True / False, %)
    Multi-Class Classification: “What is the sentiment of this tweet, or of this social media comment?” (Happy, Sad, Angry, Confused, Disgusted, Surprised, Calm, Unknown)
  6. Letter from Ada Lovelace to Charles Babbage, 1843. In this letter, Lovelace suggests an example of a calculation which “may be worked out by the engine without having been worked out by human head and hands first”.
  7. Diagram of an algorithm for the Analytical Engine for the computation of Bernoulli numbers, from Sketch of The Analytical Engine Invented by Charles Babbage by Luigi Menabrea, with notes by Ada Lovelace.
  8. “You use code to tell a computer what to do. Before you write code you need an algorithm. An algorithm is a list of rules to follow in order to solve a problem.” (BBC Bitesize, What is an Algorithm?)
    Image: https://commons.wikimedia.org/wiki/File:Euclid_flowchart.svg By Somepics (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
  9. Linear Learner (Supervised: Classification, Regression)
    Regression: estimate a real-valued function
    Binary Classification: predict a 0/1 class
  10. Bike Sharing Prediction (Regression)

    Date        Time   Temperature (Celsius)  Relative Humidity  Rain (mm/h)  Rented Bikes
    2018-04-01  08:30  13                     64                 2            45
    2018-04-01  11:30  18                     57                 0            156
    2018-04-02  08:30  14                     69                 8            87
    2018-04-02  11:30  17                     73                 12           34
    …           …      …                      …                  …            …
  11. Bike Sharing Prediction (Regression)

    Date        Time   Temperature (Celsius)  Relative Humidity  Rain (mm/h)  Rented Bikes
    2018-04-01  08:30  13                     64                 2            45
    2018-04-01  11:30  18                     57                 0            156
    2018-04-02  08:30  14                     69                 8            87
    2018-04-02  11:30  17                     73                 12           34
    2018-06-14  16:30  23                     56                 0            ???

    Highlighted columns: Date & Time
  12. Bike Sharing Prediction (Regression): Date & Time (Feature Engineering)

    Day of the Year  Weekday  Public Holiday  Time (seconds)  Temperature (Celsius)  Relative Humidity  Rain (mm/h)  Rented Bikes
    91               7        1               30600           13                     64                 2            45
    91               7        1               41400           18                     57                 0            156
    92               1        1               30600           14                     69                 8            87
    92               1        1               41400           17                     73                 12           34
    104              6        0               59400           23                     56                 0            ???
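The date and time feature engineering on this slide can be sketched as follows. This is a minimal illustration, assuming ISO weekday numbering (1 = Monday, 7 = Sunday) and a caller-supplied set of public holidays; the function name `engineer_features` and the `PUBLIC_HOLIDAYS` set are hypothetical.

```python
from datetime import datetime

# Assumption: the holiday calendar is provided by the caller. The two dates
# below (Easter Sunday and Easter Monday 2018) are only an illustrative set.
PUBLIC_HOLIDAYS = {"2018-04-01", "2018-04-02"}

def engineer_features(date_str, time_str):
    """Turn a date and a time into the numeric features used on the slide."""
    moment = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    return {
        "day_of_year": moment.timetuple().tm_yday,
        "weekday": moment.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "public_holiday": int(date_str in PUBLIC_HOLIDAYS),
        "time_seconds": moment.hour * 3600 + moment.minute * 60,
    }

row = engineer_features("2018-04-01", "08:30")
```

Replacing the raw timestamp with these numbers lets a model pick up yearly, weekly, and daily patterns that a single date string would hide.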
  13. Minimizing the Error (Supervised)
    You know the expected values (use separate datasets for training and validation).
    The error term is always positive (convex function).
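One common error measure with these properties is the mean squared error. The sketch below assumes that measure (the slide text does not name one) and uses hypothetical predictions against the rented-bike counts from the earlier table, computing the error separately for a training set and a validation set.

```python
def mean_squared_error(expected, predicted):
    """Average of squared differences; never negative, zero only for a perfect fit."""
    return sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)

# Expected values come from the bike sharing table; predictions are made up.
train_expected, train_predicted = [45, 156, 87], [50, 150, 80]
validation_expected, validation_predicted = [34], [40]

train_error = mean_squared_error(train_expected, train_predicted)
validation_error = mean_squared_error(validation_expected, validation_predicted)
```

A validation error much larger than the training error would suggest the model has memorized the training data rather than learned a general pattern.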
  14. Objective Function (Supervised)
    Loss function: measures how predictive the model is on your data
    Regularization term: measures the complexity of the model
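A minimal sketch of such an objective for a linear model, assuming the two terms are added together, with a mean squared error loss and an L2 penalty on the weights. These specific choices, and the names `objective` and `reg_strength`, are illustrative assumptions rather than something the slide specifies.

```python
def objective(weights, features, targets, reg_strength=0.1):
    """Loss (mean squared error) plus an L2 regularization term."""
    predictions = [sum(w * x for w, x in zip(weights, row)) for row in features]
    # Loss: how far the predictions are from the expected values
    loss = sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
    # Regularization: larger weights count as a more complex model
    regularization = reg_strength * sum(w * w for w in weights)
    return loss + regularization

# Hypothetical toy data: the weights fit the targets exactly,
# so only the regularization term contributes.
value = objective([1.0, 2.0], [[1, 0], [0, 1]], [1, 2])
```

Raising `reg_strength` pushes training toward smaller weights, trading some fit on the training data for a simpler model.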
  15. Factorization Machines • It is an extension of a linear

  16. Factorization Machines (2010. Supervised: Classification, Regression)
    Annotations: not in a Linear Learner; features; Alternating least squares (ALS)
  17. Factorization Machines (k=4) (2010. Supervised: Classification, Regression)

    Movie         1 action  2 romantic  3 thriller  4 horror
    Blade Runner  0.4       0.3         0.5         0.2
    Notting Hill  0.2       0.8         0.1         0.01
    Arrival       0.2       0.4         0.6         0.1

    Intuitively, each “feature” describes a property of the “items”. But you cannot really control how features are used!
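A minimal sketch of how a second-order factorization machine combines such factor vectors, following the standard model equation (bias, plus linear terms, plus pairwise interactions weighted by the dot product of the two factor vectors). Treating each movie as one input feature, and the zero bias and linear weights, are assumptions made for illustration; `fm_predict` is a hypothetical name.

```python
from itertools import combinations

def fm_predict(x, bias, linear_weights, factors):
    """Second-order factorization machine prediction for one input vector x."""
    linear = sum(w * xi for w, xi in zip(linear_weights, x))
    # Each pair (i, j) interacts with strength <factors[i], factors[j]>
    pairwise = sum(
        sum(a * b for a, b in zip(factors[i], factors[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return bias + linear + pairwise

# Factor vectors (k=4) from the table: Blade Runner, Notting Hill, Arrival
factors = [
    [0.4, 0.3, 0.5, 0.2],
    [0.2, 0.8, 0.1, 0.01],
    [0.2, 0.4, 0.6, 0.1],
]
# Input with Blade Runner and Arrival active; bias and linear weights set to zero
score = fm_predict([1, 0, 1], 0.0, [0.0, 0.0, 0.0], factors)
```

With these numbers the Blade Runner and Arrival pair interacts more strongly than Blade Runner and Notting Hill, because their factor vectors point in more similar directions.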
  K-Nearest Neighbors (k-NN) (1991. Supervised: Classification, Regression)
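A minimal sketch of k-NN classification, assuming Euclidean distance and a simple majority vote among the k closest training points. The function name `knn_classify` and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
label_near_origin = knn_classify(points, labels, (2, 2))
label_far_away = knn_classify(points, labels, (7, 8))
```

For regression, the same neighbor search applies, with the vote replaced by an average of the neighbors' target values.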

  20. Principal Component Analysis (PCA) (1901. Unsupervised: Dimensionality Reduction)
    • PCA is an unsupervised learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible
    • This is done by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another
    • They are also constrained so that the first component accounts for the largest possible variability in the data, the second component the second most variability, and so on
    Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2:559-572. http://pbil.univ-lyon1.fr/R/pearson1901.pdf
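A minimal sketch of finding the first component, the direction with the largest variability. It uses power iteration on the sample covariance matrix, which is one simple way to get that direction and is chosen here only for illustration; the function name `first_principal_component` and the toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance and renormalize
    vec = [1.0] * d
    for _ in range(iterations):
        vec = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Variance of the data along the direction found
    variance = sum(vec[a] * cov[a][b] * vec[b] for a in range(d) for b in range(d))
    return vec, variance

# Hypothetical toy data lying exactly on the line y = 2x
direction, variance = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
```

Because the toy points lie on a single line, this one component carries all of the variance, so the two original features can be replaced by one without losing information.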
  22. Gradient Boosting
    “Gradient boosting: Distance to target”, Terence Parr and Jeremy Howard, https://explained.ai/gradient-boosting/L2-loss.html
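A minimal sketch of gradient boosting with squared (L2) loss: start from the mean, then repeatedly fit a weak model to the remaining distance to the target (the residuals) and add a fraction of it to the prediction. The single-feature regression stumps, the function names, and the toy data are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (needs 2+ distinct xs)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_stages=50, learning_rate=0.5):
    """Additive model: mean prediction plus scaled stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_stages):
        # With L2 loss, the residual is the direction that reduces the error
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical toy data with a jump between x = 3 and x = 4
xs, ys = [1, 2, 3, 4, 5, 6], [10, 12, 11, 30, 32, 31]
model = gradient_boost(xs, ys)
```

Each stage only has to correct what the previous stages got wrong, which is why many weak models can add up to a strong one.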
