
NMM Lunch & Learn: Machine Learning Techniques

Having a conversation with a machine is no longer a figment of our imagination or science fiction. We are having intelligent conversations with machines every day. AI (Artificial Intelligence) or Machine Learning techniques have made this reality possible.

The personal assistant in your smartphone, video games, and the marketing industry have all been using Machine Learning for a long time now to analyze and predict the actions of their clients. The ultimate goal of machine learning is to predict something based on pre-existing conditions. Sometimes this prediction is as easy as answering a question with yes or no, but sometimes it is as complicated as identifying the model and manufacturer of an airplane flying 36,000 feet above our heads.

New Media Manitoba, Red River College and The Ace Project Space are happy to present this informative Lunch+Learn that will try to explain some of the techniques used in machine learning.

Haider Al-Saidi presented Machine Learning Techniques on October 20th at Red River College for New Media Manitoba.

Transcript

  1. Machine learning Haider Al-Saidi “Learning is a change in cognitive

    structures that occurs as a result of experience.” Author Unknown
  2. Which one is written by a human? 1. “Kitty couldn’t fall
    asleep for a long time. Her nerves were strained as two tight strings, and even a glass of hot wine, that Vronsky made her drink, did not help her. Lying in bed she kept going over and over that monstrous scene at the meadow.” 2. “A shallow magnitude 4.7 earthquake was reported Monday morning five miles from Westwood, California, according to the U.S. Geological Survey. The temblor occurred at 6:25 a.m. Pacific time at a depth of 5.0 miles.” 3. “I was laid out sideways on a soft American van seat, several young men still plying me with vodkas that I dutifully drank, because for a Russian it is impolite to refuse.” NY Times quiz: which of the above passages was written by a human?
  3. Machine learning at Red River College  Coming up: a 15-
    hour introductory Machine Learning course  Various research projects in Machine Learning  Planning a Centre for Machine Learning Studies  Hardware with high computational capabilities to support ML activities  More academic courses and seminars to be offered in the field
  4. The Field of Artificial Intelligence Artificial Intelligence Machine Learning Deep

    Learning
  5. Charles Babbage (1791-1871)

  6. Lady Ada Lovelace - 1837 “It can do whatever we

    know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths.” Lady Lovelace describing the “Analytical Engine” by Charles Babbage
  7. Channel 4 Humanoid

  8. ML Applications  Natural Language Processing (NLP)  Virtual Agents
     Decision Management  Robotic Process Automation  Weather and Earthquake Prediction  Netflix movie suggestions  Spam message identification  Biometric recognition  Medical diagnosis and analysis  Machine vision  Understanding the human genome  “Hitchhiker’s Guide To the Galaxy”, a book by Douglas Adams
  9. Plato vs. Aristotle

  10. Scientific Modeling Observe a phenomenon 1 Construct a model for

    that phenomenon 2 Make predictions using this model 3 “Introduction to Statistical Learning Theory” by Bousquet, Boucheron, & Lugosi
  11. Crossing a street  Pedestrian crossing from the green side
    of the street to the red side.  Δd = v · Δt (no acceleration)  Δd = v₀ · Δt + ½ · a · Δt² (with acceleration)  (The slide’s diagram labels the distances d1 and d2, the speeds C1 and C2, and the pedestrian m.)
  12. Identifying the features space  Feature: a measurable property
    of an observable.  Examples: distance, speed, colour, weight, …  Feature space (from the slide):
      Distance  | Speed     | Δ    | m
      20 meters | -20 km/hr | +5 m | 2 km/hr
      30 meters | +20 km/hr | +5 m | 2 km/hr
      40 meters | +35 km/hr | -5 m | 2 km/hr
      50 meters | +30 km/hr | -5 m | 2 km/hr
  13. Collecting the training set  The training set pairs the features with labels/classes:
      Distance  | Speed     | Δ    | m       | Label
      20 meters | -20 km/hr | +5 m | 2 km/hr | Don’t cross
      30 meters | +20 km/hr | +5 m | 2 km/hr | Cross
      40 meters | +35 km/hr | -5 m | 2 km/hr | Don’t cross
      50 meters | +30 km/hr | -5 m | 2 km/hr | Cross
  14. Decision Boundary  The same training set, now with a decision boundary separating the Cross region of the feature space from the Don’t cross region.
  15. Machine Learning Approach (Classification)  Identify: identify the features space  Collect: collect a training set (features + labels)  Divide: use the training set to divide the feature space into regions separated by decision boundaries (Training)  Classify: the decision boundaries become the classification rules  Apply: apply the new rules to new unclassified data (Classification)
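The Identify/Collect/Divide/Classify/Apply steps can be sketched with the simplest possible learner: a single threshold on one feature. This is not from the talk; the crossing data below is invented for illustration, and a one-feature threshold stands in for a real decision boundary.

```python
# A toy "learner" for the crossing example: find the single distance
# threshold that best separates Cross from Don't cross.

def fit_stump(distances, labels):
    """Try every midpoint between neighbouring sorted samples; keep the best split."""
    best_threshold, best_correct = None, -1
    pts = sorted(zip(distances, labels))
    for (a, _), (b, _) in zip(pts, pts[1:]):
        t = (a + b) / 2
        # Count how many samples the rule "cross if distance >= t" gets right.
        correct = sum((d >= t) == (lbl == "cross") for d, lbl in pts)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold

# Training set: distance to the oncoming car and the recorded decision.
distances = [20, 30, 40, 50, 60, 70]
labels = ["don't", "don't", "don't", "cross", "cross", "cross"]
t = fit_stump(distances, labels)
print(t)                                 # 45.0 -- the learned decision boundary
print("cross" if 55 >= t else "don't")   # apply the rule to new data
```

The learned threshold is the one-dimensional analogue of the decision boundaries on the slide.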
  16. Training-Prediction system overview Training Dataset Training Algorithm Classification New Data

    Result
  17. Logistic Regression Features Space Cross Don’t cross

  18. Optimizing Logistic Regression A B C
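Logistic regression as shown on these slides can be sketched from scratch: a sigmoid over a weighted sum, with gradient descent on the log-loss. The single-feature crossing dataset below is invented for illustration, not taken from the talk.

```python
import math

def sigmoid(z):
    """Squash the weighted sum into (0, 1), read as P(cross)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, lr=0.5, epochs=2000):
    """Stochastic gradient descent on the log-loss for one feature plus bias."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # (p - y) is the gradient of the log-loss w.r.t. the weighted sum.
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Distance to an oncoming car (metres): label 1 = cross, 0 = don't cross.
xs = [20, 25, 30, 35, 40, 45, 50, 55]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit([x / 10 for x in xs], ys)  # scale features to keep the exponent tame
print([1 if sigmoid(w * x / 10 + b) > 0.5 else 0 for x in xs])
```

The point where the sigmoid crosses 0.5 is the decision boundary the optimization slide is tuning.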

  19. Support Vector Machine (SVM)  The idea of SVM is
    to find the line, plane, or hyperplane that maximizes the margin around the decision boundary.
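A linear SVM can be sketched by sub-gradient descent on the hinge loss (the approach behind solvers like Pegasos; the talk does not specify an algorithm, and the two point clusters below are invented).

```python
# Minimal linear SVM sketch: minimise lam*||w||^2 + mean hinge loss
# by per-sample sub-gradient steps. Labels are +1 / -1.

def train_svm(points, labels, lr=0.01, lam=0.01, epochs=500):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # Regularisation always shrinks w a little...
            w = [wi - lr * 2 * lam * wi for wi in w]
            # ...and any point inside the margin pushes the boundary away.
            if margin < 1:
                w = [w[0] + lr * y * x1, w[1] + lr * y * x2]
                b += lr * y
    return w, b

points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_svm(points, labels)
preds = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1 for x1, x2 in points]
print(preds)
```

The `margin < 1` test is what makes this an SVM rather than a perceptron: points are pushed on until they sit at least a full margin away from the boundary.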
  20. From Regression to Perceptron  Given inputs x1, x2, …, xn and
    weights w1, w2, …, wn, compute R = w1·x1 + w2·x2 + ⋯ + wn·xn = w · x. (The slide’s diagram feeds the inputs through the weights into a summation node, with an Output of R/L.)
  21. The Perceptron Model (Source: Wikipedia)  Diagram: inputs x1, x2,
    x3, …, xn with weights w1, w2, w3, …, wn feed a summation node; a training path feeds the output back to adjust the weights.
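The perceptron on these two slides can be sketched in a few lines: a step function over the weighted sum, with weights nudged toward each misclassified training example. The logical-AND dataset below is a standard illustration, not from the talk.

```python
# Minimal perceptron: weighted sum + step activation, trained by the
# classic perceptron learning rule.

def predict(weights, bias, x):
    """Step activation: fire (1) if the weighted sum clears zero."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(samples, epochs=20, lr=0.1):
    """samples: list of (input_vector, target) pairs with 0/1 targets."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Misclassified samples (error != 0) pull the weights toward them.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learn a linearly separable function (logical AND):
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
print([predict(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

The "training path" on the slide corresponds to the `error` feedback updating the weights.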
  22. Neural Network Source: “Using neural nets to recognize handwritten digits”

    Book by: Michael Nielsen
  23. Deep Learning Part of Machine Learning Multiple layers to determine

    what it needs to learn Unsupervised (adaptive) learning techniques
  24. Deep Learning  Deep learning is a type of machine

    learning in which a model learns to perform classification tasks directly from images, text, or sound. Deep learning is usually implemented using a neural network architecture. The term “deep” refers to the number of layers in the network—the more layers, the deeper the network. Traditional neural networks contain only 2 or 3 layers, while deep networks can have hundreds. “Introducing Deep Learning with Matlab” Mathworks  In Machine Learning we feed the features to the machine, in Deep Learning the machine will learn the features itself.
  25. K-Nearest Neighbors (KNN)  Based on the majority vote of the closest K
    neighbors  Let k = 5  The unknown point is u  Out of the 5 closest points to u, {a, b, d, e} belong to one class while c belongs to a different class  u must belong to the same class as a, b, d, and e
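KNN as described on the slide fits in a few lines: sort labelled points by distance to the unknown point and take the majority label among the k closest. The apple/orange points below are invented for illustration.

```python
import math
from collections import Counter

def knn_classify(training, u, k=5):
    """training: list of (point, label) pairs; u: the unknown point."""
    # Keep the k labelled points closest to u by Euclidean distance.
    nearest = sorted(training, key=lambda p: math.dist(p[0], u))[:k]
    # Majority vote among the k neighbours decides u's class.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [
    ((1, 1), "apple"), ((1, 2), "apple"), ((2, 1), "apple"), ((2, 2), "apple"),
    ((1.5, 2.5), "orange"), ((9, 9), "orange"),
]
print(knn_classify(training, (1.5, 1.5)))  # apple (4 of the 5 nearest vote apple)
```

As on the slide, one dissenting neighbour (here the nearby orange) is outvoted 4 to 1.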
  26. So, is the green thing an apple or an orange?
    The picture is from one of PMI’s ads signifying their uniqueness
  27. Invoking probability theory  How about problems where a Euclidean metric
    is not an option?  Examples: identifying spam messages, separating objects by colours  In these cases we rely on probability theory for classification.
  28. Decision Tree  Uses Shannon entropy to split the tree, choosing the
    splits that most reduce the information entropy
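The entropy criterion the slide mentions can be sketched directly: Shannon entropy over the class proportions, and information gain as the entropy drop produced by a split. The labels below are illustrative, not from the talk.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H = -sum p_i * log2(p_i) over the class proportions."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy drop from splitting the parent node into the given subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

# A 50/50 node is maximally uncertain; a pure node has zero entropy.
print(entropy(["cross", "stop", "cross", "stop"]))  # 1.0

# A split that cleanly separates the classes recovers all of the entropy.
parent = ["s", "s", "s", "h", "h"]
gain = information_gain(parent, [["s", "s", "s"], ["h", "h"]])
print(round(gain, 3))  # 0.971
```

A decision tree builder simply picks, at each node, the feature split with the highest such gain.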
  29. Identifying SPAM e-mails  Using a Naïve Bayes approach to identify spam
    messages. Training documents and classes:
      Please don’t forget the milk, red label is the best | h
      Secure your investment with our bank                | s
      Our bank offers the best mortgage rates             | s
      You need to pick up the kids                        | h
      No need to think, our bank is the best              | s
  30. Using Bayesian Probabilities  [Table on the slide: for each of the 26
    vocabulary words (please, don’t, forget, the, milk, red, label, …, no, think), the count of its occurrences in the spam (s) and ham (h) documents, and the corresponding probabilities written as counts over 26.]
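The Naïve Bayes classifier behind these two slides can be sketched from the five training documents; the 26-word vocabulary matches the slide’s denominators. One assumption here: add-one (Laplace) smoothing is used so unseen words don’t zero out a class score, which the slide’s raw counts would.

```python
import math
from collections import Counter

# The five training documents from the slide: h = ham, s = spam.
docs = [
    ("please don't forget the milk red label is the best", "h"),
    ("secure your investment with our bank", "s"),
    ("our bank offers the best mortgage rates", "s"),
    ("you need to pick up the kids", "h"),
    ("no need to think our bank is the best", "s"),
]

counts = {"h": Counter(), "s": Counter()}
class_docs = Counter()
for text, label in docs:
    class_docs[label] += 1
    counts[label].update(text.split())

vocab = set(counts["h"]) | set(counts["s"])  # 26 distinct words

def score(text, label):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total = sum(counts[label].values())
    s = math.log(class_docs[label] / len(docs))
    for word in text.split():
        s += math.log((counts[label][word] + 1) / (total + len(vocab)))
    return s

def classify(text):
    return max(("h", "s"), key=lambda lbl: score(text, lbl))

print(classify("our bank has the best rates"))  # s
print(classify("please pick up the kids"))      # h
```

"Naïve" refers to the assumption that words occur independently given the class, which is what lets the per-word probabilities simply multiply (add, in log space).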
  31. Eliza the Psychotherapist- 1966 http://www.masswerk.at/elizabot/eliza.html Weizenbaum, Joseph "ELIZA – A

    Computer Program For the Study of Natural Language Communication Between Man and Machine" Communications of the ACM; Volume 9 , Issue 1 (January 1966): p 36-45
  32. Tools • I like Python, but any other language will

    do • You may want to implement all the algorithms from scratch
  33. Differential Analyser - 1935

  34. Entscheidungsproblem (Decision Problem)  Posed by David Hilbert in 1928. The
    problem is stated as: is there an algorithm that takes logical statements as input and produces, based on this input, a “YES” or “NO” answer that is always correct?
  35. Alan Turing – Father of Computer Science (1935)  Worked on
    a solution to the Entscheidungsproblem (Decision Problem)  In 1935 he proposed what is now known as the “Turing Machine”  Turing test: “A computer would deserve to be called intelligent if it could deceive a human into believing that it was a human.”
  36. Turing Machine  “Unless there is something mystical or magical about
    human thinking, intelligence can be achieved by computer” [Our Final Invention, by James Barrat]  It is a hypothetical machine consisting of an infinite recording tape divided into cells. The symbol in each cell instructs the machine to move the tape back and forth and places the machine in one state of a pre-defined table of states.
  37. Turing Machine - Example  [Diagram: a recording head over a tape of 0s and
    1s, driven by a state table. State A: write 1; if 0 move R, if 1 move L. State B: if 0 write 1, if 1 write 0; if 0 move R, if 1 move L. State C: write 1; if 0 move R, if 1 move L; then HALT.]
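The table-driven machine on the slide can be sketched as a tiny simulator. The three-rule program below (flip every bit, then halt) is invented for illustration; it is not the state table from the slide.

```python
# Transition table: (state, read) -> (write, move, next_state).
# 'R'/'L' move the head; reaching state 'HALT' stops the machine.
program = {
    ("A", "0"): ("1", "R", "A"),    # flip 0 -> 1, keep scanning right
    ("A", "1"): ("0", "R", "A"),    # flip 1 -> 0
    ("A", "_"): ("_", "R", "HALT"), # blank cell: nothing left to flip
}

def run(tape, program, state="A"):
    tape = list(tape) + ["_"]  # '_' marks the blank end of the tape
    head = 0
    while state != "HALT":
        write, move, state = program[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape).rstrip("_")

print(run("01101", program))  # 10010
```

Everything the machine "knows" lives in that transition table, which is exactly the point of the model.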
  38. Elliott 401 - 1954

  39. Expert System 1970’s-1980’s  Emulates human experts  Contains a
    knowledge base holding the data that represents the rules and the facts  Never gained momentum  Lacked intelligence  Few applications in:  Medicine  Business  Law
  40. Tensorflow playground  Tensorflow Playground app  JavaScript app
     Free
  41. Data classification  Nominal data: {True, False}, {Red, Green,
    Blue}, {1, 0}, {Apple, Orange, Kiwi, Mango}  Continuous data:  Temperature readings  Humidity data  Probabilities
  42. Features, and Inputs  Feature: a measurable property of
    an observable.  Example: Colour is a feature of an apple (observable).  Input: the measurement observed for a certain feature that belongs to an observable.  Example: The input of the colour feature is Red.  The inputs vector is denoted by x = (x1, x2, …, xn), where n is the number of features.  We also define the set of input vectors x(j), where j represents the jth input vector.
  43. Adding a Sigmoid Function  [Diagram: inputs x0, x1, x2, …, xn with
    weights w0, w1, w2, …, wn feed a summation node, followed by a sigmoid that produces the output; a training algorithm adjusts the weights.]
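The change on this slide, replacing the perceptron’s hard step with a sigmoid, is a one-liner. A minimal sketch (the weights and inputs below are invented):

```python
import math

def sigmoid(z):
    """Squash the weighted sum into (0, 1) so the output reads as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, bias, x):
    """One sigmoid unit: weighted sum of inputs, then the sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

print(sigmoid(0))                                   # 0.5
print(round(neuron([2.0, -1.0], 0.5, [1.0, 1.0]), 3))
```

Unlike the step function, the sigmoid is differentiable, which is what makes gradient-based training algorithms possible.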
  44. Outputs, Targets, Labels, and Training Examples  Output y(j) is the
    output for the jth input vector.  If the output is a desired value for a known input vector then it is called a target or a label.  A training example is the set that contains an input vector and a target: TE(j) = {x(j), t(j)}
  45. [Diagram: input vectors x(1), x(2), x(3), … are fed into Training, which
    produces a hypothesis h(x) used for Classification.]
  46. Data sets  Dataset: a set or collection of value sets
     Datasets are presented as tables where each column contains the measurements for a particular feature.  One of the columns is reserved for the Label  The dataset is a training dataset if it contains a label column, or a testing dataset if it does not  Example columns: Colour | Shape | Weight | Label
  47. The MNIST Datasets  It’s a set of handwritten
    digits with the corresponding labels  MNIST set: http://yann.lecun.com/exdb/mnist/
  48. The Iris Dataset  Introduced by the British statistician and biologist
    Ronald Fisher in his 1936 paper  Source: Wikipedia
  49. Classification vs. Regression  If the target variable of the

    output () is continuous, then the learning problem is a regression problem.  If the target variable of the output () is nominal (discrete), then the learning problem is a classification problem.
  50. What is noise?  Any unwanted or undesirable outcome of
    an experiment  Noise is associated with measurements  Example:  Assume the desired outcome of an experiment to identify a motorized vehicle is “car”  If the outcome is “bicycle”, then that measurement is noise
  51. Training-Prediction system overview Supervised system Training Dataset Training Algorithm Classification

    New Data Prediction Action