Slide 1

Slide 1 text

Machine Learning
Haider Al-Saidi
"Learning is a change in cognitive structures that occurs as a result of experience." – Author Unknown

Slide 2

Slide 2 text

Which one was written by a human?
1. "Kitty couldn't fall asleep for a long time. Her nerves were strained as two tight strings, and even a glass of hot wine, that Vronsky made her drink, did not help her. Lying in bed she kept going over and over that monstrous scene at the meadow."
2. "A shallow magnitude 4.7 earthquake was reported Monday morning five miles from Westwood, California, according to the U.S. Geological Survey. The temblor occurred at 6:25 a.m. Pacific time at a depth of 5.0 miles."
3. "I was laid out sideways on a soft American van seat, several young men still plying me with vodkas that I dutifully drank, because for a Russian it is impolite to refuse."
NY Times quiz: which one of the above was written by a human?

Slide 3

Slide 3 text

Machine Learning at Red River College
• An upcoming 15-hour introductory Machine Learning course
• Various research projects in Machine Learning
• Planning a Centre for Machine Learning Studies
• Hardware with high computational capabilities to support ML activities
• More academic courses and seminars to be offered in the field

Slide 4

Slide 4 text

The Field of Artificial Intelligence
(diagram: Deep Learning as a subset of Machine Learning, which is a subset of Artificial Intelligence)

Slide 5

Slide 5 text

Charles Babbage (1791-1871)

Slide 6

Slide 6 text

Lady Ada Lovelace - 1837 “It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths.” Lady Lovelace describing the “Analytical Engine” by Charles Babbage

Slide 7

Slide 7 text

Channel 4 Humanoid

Slide 8

Slide 8 text

ML Applications
• Natural Language Processing (NLP)
• Virtual Agents
• Decision Management
• Robotic Process Automation
• Weather and Earthquake Prediction
• Netflix movie suggestions
• Spam message identification
• Biometric recognition
• Medical diagnosis and analysis
• Machine vision
• Understanding the human genome
(Image: "The Hitchhiker's Guide to the Galaxy", a book by Douglas Adams)

Slide 9

Slide 9 text

Plato vs. Aristotle

Slide 10

Slide 10 text

Scientific Modeling
1. Observe a phenomenon
2. Construct a model for that phenomenon
3. Make predictions using this model
"Introduction to Statistical Learning Theory" by Bousquet, Boucheron, & Lugosi

Slide 11

Slide 11 text

Crossing a street
• A pedestrian crossing from the green side of the street to the red side.
• No acceleration: Δd = v · Δt
• With acceleration: Δd = v₀ · Δt + ½ · a · Δt²
(diagram labels: d1, d2, C1, C2, m)
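A minimal sketch of the two displacement formulas, assuming the standard constant-velocity and constant-acceleration forms reconstructed above; the code and the numeric values are illustrative and not taken from the slides.

```python
# Minimal sketch: evaluate the two displacement formulas from the slide.
# The constant-velocity / constant-acceleration forms are assumed;
# the numeric values below are illustrative, not from the slide.

def displacement_no_accel(v, dt):
    """Delta d = v * dt (no acceleration)."""
    return v * dt

def displacement_with_accel(v0, a, dt):
    """Delta d = v0 * dt + 0.5 * a * dt**2 (constant acceleration)."""
    return v0 * dt + 0.5 * a * dt ** 2

# Example: a car moving at 20 km/h (about 5.56 m/s) observed for 3 seconds.
v = 20 / 3.6                                   # km/h -> m/s
print(displacement_no_accel(v, 3.0))           # ~16.7 m covered at constant speed
print(displacement_with_accel(v, 1.0, 3.0))    # ~21.2 m if it also accelerates at 1 m/s^2
```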

Slide 12

Slide 12 text

Identifying the feature space
• Feature: a measurable property of an observable.
• Examples: distance, speed, colour, weight, …
Feature Space (each row is one feature vector: distance, speed, distance, speed):
20 meters | -20 km/hr | +5 m | 2 km/hr
30 meters | +20 km/hr | +5 m | 2 km/hr
40 meters | +35 km/hr | -5 m | 2 km/hr
50 meters | +30 km/hr | -5 m | 2 km/hr

Slide 13

Slide 13 text

Collecting the training set
Training Set (features + labels/classes):
20 meters | -20 km/hr | +5 m | 2 km/hr | Don't cross
30 meters | +20 km/hr | +5 m | 2 km/hr | Cross
40 meters | +35 km/hr | -5 m | 2 km/hr | Don't cross
50 meters | +30 km/hr | -5 m | 2 km/hr | Cross

Slide 14

Slide 14 text

Decision Boundary
(plot: the feature space divided into a "Cross" region and a "Don't cross" region by a decision boundary, using the training-set table from the previous slide)

Slide 15

Slide 15 text

Machine Learning Approach (Classification)
Training:
1. Identify – identify the feature space.
2. Collect – collect a training set (features + labels).
3. Divide – use the training set to divide the feature space into regions separated by decision boundaries.
Classification:
4. Classify – the decision boundaries become the classification rules.
5. Apply – apply the new rules to new, unclassified data.

Slide 16

Slide 16 text

Training-Prediction system overview
Training Dataset → Training Algorithm → Classification; New Data → Classification → Result
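A minimal sketch of this training-then-classification flow in Python, assuming scikit-learn and the street-crossing training set from the earlier slides; the deck does not prescribe a library or a model, so the decision tree below is only a placeholder classifier.

```python
# Sketch of the training -> classification flow on the street-crossing data.
# scikit-learn and the numeric feature encoding are assumptions.
from sklearn.tree import DecisionTreeClassifier

# Training dataset: [distance (m), speed (km/h), distance (m), speed (km/h)] per row.
X_train = [[20, -20, +5, 2],
           [30, +20, +5, 2],
           [40, +35, -5, 2],
           [50, +30, -5, 2]]
y_train = ["Don't cross", "Cross", "Don't cross", "Cross"]

# Training: the algorithm learns decision boundaries from the labelled data.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Classification: apply the learned rules to new, unlabelled data.
new_data = [[35, 25, 5, 2]]
print(model.predict(new_data))   # -> the predicted label ("Cross" or "Don't cross")
```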

Slide 17

Slide 17 text

Logistic Regression
(plot: the feature space with "Cross" and "Don't cross" regions separated by a decision boundary)
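A hedged sketch of logistic regression on the same toy crossing data, assuming scikit-learn and a numeric encoding of the features and labels that is not specified on the slide.

```python
# Sketch: logistic regression on the crossing data (feature encoding assumed).
from sklearn.linear_model import LogisticRegression

X = [[20, -20, +5, 2], [30, +20, +5, 2], [40, +35, -5, 2], [50, +30, -5, 2]]
y = [0, 1, 0, 1]                      # 0 = "Don't cross", 1 = "Cross"

clf = LogisticRegression().fit(X, y)

# The learned weights define the decision boundary in the feature space;
# predict_proba gives the estimated probability of each class for a new point.
print(clf.coef_, clf.intercept_)
print(clf.predict_proba([[45, 30, -5, 2]]))
```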

Slide 18

Slide 18 text

Optimizing Logistic Regression
(plot: elements labelled A, B, and C)

Slide 19

Slide 19 text

Support Vector Machine (SVM)
• The idea of SVM is to find the line, plane, or hyperplane that maximizes the margin.
(diagram: the margin around the decision boundary)
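A short sketch of a linear SVM on the same toy data, assuming scikit-learn's SVC; the slide itself only states the maximum-margin idea.

```python
# Sketch: a linear SVM finds the separating hyperplane with the largest margin.
from sklearn.svm import SVC

X = [[20, -20, +5, 2], [30, +20, +5, 2], [40, +35, -5, 2], [50, +30, -5, 2]]
y = [0, 1, 0, 1]

svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(svm.support_vectors_)        # the training points that define the margin
print(svm.predict([[25, 10, 5, 2]]))
```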

Slide 20

Slide 20 text

From Regression to Perceptron
f(x₁, x₂, …, xₙ) = w₁x₁ + w₂x₂ + ⋯ + wₙxₙ = w · x
(diagram: inputs x₁ … xₙ, weights w₁ … wₙ, a summation node, and an output R/L)

Slide 21

Slide 21 text

The Perceptron Model
(diagram: inputs x₁ … xₙ, weights w₁ … wₙ, a summation node, the output, and a training path) Source: Wikipedia
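A from-scratch sketch of the classic perceptron learning rule, matching the weighted-sum formula on the previous slide; the learning rate, epoch count, and the AND-gate example are assumptions for illustration.

```python
# Sketch of the perceptron learning rule: output = 1 if w.x + b > 0, else 0.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - out                              # 0, +1 or -1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err                                   # nudge weights toward the target
    return w, b

# Toy example: learn the AND function.
w, b = train_perceptron([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 0, 1])
print(w, b)
```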

Slide 22

Slide 22 text

Neural Network Source: “Using neural nets to recognize handwritten digits” Book by: Michael Nielsen

Slide 23

Slide 23 text

Deep Learning
• Part of Machine Learning
• Multiple layers to determine what it needs to learn
• Unsupervised (adaptive) learning techniques

Slide 24

Slide 24 text

Deep Learning
• Deep learning is a type of machine learning in which a model learns to perform classification tasks directly from images, text, or sound. Deep learning is usually implemented using a neural network architecture. The term "deep" refers to the number of layers in the network—the more layers, the deeper the network. Traditional neural networks contain only 2 or 3 layers, while deep networks can have hundreds. — "Introducing Deep Learning with MATLAB", MathWorks
• In machine learning we feed the features to the machine; in deep learning the machine learns the features itself.

Slide 25

Slide 25 text

K-Nearest Neighbors (KNN)
• Classification based on a majority vote of the closest K neighbors
• Let k = 5, and let the unknown point be u
• Out of the 5 points closest to u, {a, b, d, e} belong to one class while c belongs to a different class
• Therefore u is assigned to the same class as a, b, d, and e
(diagram: distances d1 … d5 from u to points a, b, c, d, e)
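A small sketch of KNN majority voting as described above; the coordinates and class labels are made up for illustration, since the slide's distances d1 … d5 are not given.

```python
# Sketch: k-nearest-neighbours classification by majority vote (k = 5).
from collections import Counter
import math

def knn_predict(train_points, train_labels, u, k=5):
    # Sort training points by Euclidean distance to the unknown point u.
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pt: math.dist(pt[0], u))
    # Majority vote among the k closest neighbours.
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (4, 4), (2, 1), (2, 2)]   # a, b, c, d, e (illustrative)
labels = ["A", "A", "B", "A", "A"]
print(knn_predict(points, labels, u=(1.5, 1.5)))     # -> "A", the class of a, b, d, e
```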

Slide 26

Slide 26 text

So, is the green thing an apple or an orange? The picture is from one of PMI's ads, signifying their uniqueness.

Slide 27

Slide 27 text

Invoking probability theory
• What about problems where a Euclidean metric is not an option?
• Examples: identifying spam messages, separating objects by colour
• In these cases we rely on probability theory for classification.

Slide 28

Slide 28 text

Decision Tree
Uses Shannon entropy to split the tree, following the paths that minimize the information entropy.
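A short sketch of the Shannon entropy computation a decision tree evaluates when choosing splits; the slide's exact splitting procedure is not shown, so only the entropy formula H = Σ pᵢ · log₂(1/pᵢ) is illustrated.

```python
# Sketch: Shannon entropy of a set of labels, the quantity a decision tree
# tries to reduce when it picks a split.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    # H = sum over classes of p_i * log2(1 / p_i)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(entropy(["Cross", "Cross", "Don't cross", "Don't cross"]))  # 1.0 bit (50/50 split)
print(entropy(["spam", "spam", "spam"]))                          # 0.0 bits (pure node)
```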

Slide 29

Slide 29 text

Identifying SPAM e-mails — using a Naïve Bayes approach to identify spam messages
Document | Class
"Please don't forget the milk, red label is the best" | h
"Secure your investment with our bank" | s
"Our bank offers the best mortgage rates" | s
"You need to pick up the kids" | h
"No need to think, our bank is the best" | s

Slide 30

Slide 30 text

Using Bayesian Probabilities
Word counts per class over the 26-word vocabulary; on the slide each probability is P(word | class) = count / 26.
Word | count in s | count in h
please | 0 | 1
don't | 0 | 1
forget | 0 | 1
the | 2 | 2
milk | 0 | 1
red | 0 | 1
label | 0 | 1
is | 1 | 1
best | 2 | 1
secure | 1 | 0
your | 1 | 0
investment | 1 | 0
with | 1 | 0
our | 3 | 0
bank | 3 | 0
offers | 1 | 0
mortgage | 1 | 0
rates | 1 | 0
you | 0 | 1
need | 1 | 1
to | 1 | 1
pick | 0 | 1
up | 0 | 1
kids | 0 | 1
no | 1 | 0
think | 1 | 0
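A hedged sketch of a Naïve Bayes spam classifier built from the five documents above. Add-one (Laplace) smoothing is an assumption here; the slide normalizes raw counts by 26 (the vocabulary size), so its probabilities differ slightly, and the counts below are recomputed from the documents rather than copied from the table.

```python
# Sketch of a Naive Bayes spam classifier from word counts (add-one smoothing assumed).
import math
from collections import Counter

docs = [
    ("please don't forget the milk red label is the best", "h"),
    ("secure your investment with our bank", "s"),
    ("our bank offers the best mortgage rates", "s"),
    ("you need to pick up the kids", "h"),
    ("no need to think our bank is the best", "s"),
]

vocab = {w for text, _ in docs for w in text.split()}
counts = {"s": Counter(), "h": Counter()}
class_docs = Counter()
for text, label in docs:
    counts[label].update(text.split())
    class_docs[label] += 1

def predict(text):
    scores = {}
    for c in ("s", "h"):
        total = sum(counts[c].values())
        # log P(class) + sum of log P(word | class), with add-one smoothing
        score = math.log(class_docs[c] / len(docs))
        for w in text.split():
            score += math.log((counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("our bank has the best rates"))   # -> 's' (spam-like words dominate)
print(predict("please pick up the kids"))       # -> 'h' (ham-like words dominate)
```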

Slide 31

Slide 31 text

Eliza the Psychotherapist – 1966
http://www.masswerk.at/elizabot/eliza.html
Weizenbaum, Joseph, "ELIZA – A Computer Program For the Study of Natural Language Communication Between Man and Machine", Communications of the ACM, Volume 9, Issue 1 (January 1966): pp. 36–45.

Slide 32

Slide 32 text

Tools • I like Python, but any other language will do • You may want to implement all the algorithms from scratch

Slide 33

Slide 33 text

Differential Analyser - 1935

Slide 34

Slide 34 text

Entscheidungsproblem (Decision Problem)
Posed by David Hilbert in 1928. The problem is stated as:
• Is there an algorithm that takes logical statements as input and produces, based on this input, a "YES" or "NO" answer that is always correct?

Slide 35

Slide 35 text

Alan Turing – Father of Computer Science (1935)
Worked on a solution to the Entscheidungsproblem (Decision Problem). In 1935 he proposed what is now known as the "Turing Machine".
Turing test: "A computer would deserve to be called intelligent if it could deceive a human into believing that it was a human."

Slide 36

Slide 36 text

Turing Machine
"Unless there is something mystical or magical about human thinking, intelligence can be achieved by computer" [Our Final Invention, by James Barrat]
It is a hypothetical machine consisting of an infinite recording tape divided into cells. Each cell holds a symbol that instructs the machine to move the tape back and forth and places the machine in one of the states in a predefined table of states.

Slide 37

Slide 37 text

Turing Machine – Example
State A: write 1; if 0, move R; if 1, move L
State C: write 1; if 0, move R; if 1, move L
State B: if 0, write 1; if 1, write 0; if 0, move R; if 1, move L
HALT
(diagram: a recording head over a tape of 0s and 1s, e.g. 0 1 1 0 0 and 0 1 1 0 1 0 1)
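A small sketch of a Turing machine simulator. The slide's state table is only partially recoverable, so the transition table below (a machine that flips bits until it reads a blank) is an illustrative assumption rather than the slide's exact example.

```python
# Sketch of a Turing machine simulator with a simple, assumed transition table.

def run_turing_machine(tape, transitions, state="A", head=0, max_steps=100):
    tape = dict(enumerate(tape))                 # sparse tape; blank cells read "_"
    for _ in range(max_steps):
        if state == "HALT":
            break
        symbol = tape.get(head, "_")
        write, move, state = transitions[(state, symbol)]
        tape[head] = write                       # write the new symbol
        head += 1 if move == "R" else -1         # move the head along the tape
    return [tape[i] for i in sorted(tape)]

# State table: in state A, flip the bit and move right; halt on a blank cell.
transitions = {
    ("A", "0"): ("1", "R", "A"),
    ("A", "1"): ("0", "R", "A"),
    ("A", "_"): ("_", "R", "HALT"),
}
print(run_turing_machine(["0", "1", "1", "0", "0"], transitions))
# -> ['1', '0', '0', '1', '1', '_']
```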

Slide 38

Slide 38 text

Elliott 401 - 1954

Slide 39

Slide 39 text

Expert Systems 1970s–1980s
• Emulate human experts
• Contain a knowledge base holding data that represents the rules and the facts
• Never gained momentum
• Lack intelligence
• A few applications in:
  • Medicine
  • Business
  • Law

Slide 40

Slide 40 text

TensorFlow Playground
• TensorFlow Playground app
• JavaScript app
• Free

Slide 41

Slide 41 text

Data classification
• Nominal data: {True, False}, {Red, Green, Blue}, {1, 0}, {Apple, Orange, Kiwi, Mango}
• Continuous data:
  • Temperature readings
  • Humidity data
  • Probabilities

Slide 42

Slide 42 text

Features and Inputs
• Feature: a measurable property of an observable.
• Example: colour is a feature of an apple (the observable).
• Input: the measurement observed for a certain feature of an observable.
• Example: the input of the colour feature is Red.
• The input vector is denoted by x = (x₁, x₂, …, xₙ), where n is the number of features.
• We also define the set of input vectors x^(j), where j represents the jth input vector.
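A minimal sketch of this notation in code, assuming NumPy: one input vector x, and a set of input vectors stacked as rows.

```python
# Sketch: representing input vectors and a set of input vectors with NumPy.
import numpy as np

# One input vector x = (x_1, ..., x_n): measurements of n features of one observable.
x = np.array([20.0, -20.0, 5.0, 2.0])        # distance, speed, distance, speed

# A set of input vectors x^(j): one row per observation, one column per feature.
X = np.array([[20, -20,  5, 2],
              [30,  20,  5, 2],
              [40,  35, -5, 2],
              [50,  30, -5, 2]])
print(X.shape)   # (4, 4): 4 input vectors, n = 4 features each
```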

Slide 43

Slide 43 text

Adding a Sigmoid Function
(diagram: inputs x₀ … xₙ, weights w₀ … wₙ, a summation node, a sigmoid activation, the output, and a training algorithm)
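A short sketch of the unit pictured here: a weighted sum passed through a sigmoid. The weights and inputs are illustrative, and the slide's training algorithm is not implemented.

```python
# Sketch: a perceptron-style unit with a sigmoid activation on the weighted sum.
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, b=0.0):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum w.x + b
    return sigmoid(z)

print(neuron_output(x=[1.0, 2.0], w=[0.5, -0.25]))   # sigmoid(0.0) = 0.5
```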

Slide 44

Slide 44 text

Outputs, Targets, Labels, and Training Examples
• Output: y^(j) is the output for the jth input vector.
• If the output is a desired value for a known input vector, it is called a target or a label.
• A training example is the set containing an input vector and its target: TE^(j) = {x^(j), t^(j)}, where t^(j) is the target for x^(j).

Slide 45

Slide 45 text

(diagram: input vectors x^(1), x^(2), x^(3), …; Training produces a hypothesis h(x), which Classification then applies to produce the outputs)

Slide 46

Slide 46 text

Data sets
• Dataset: a set or collection of value sets
• Datasets are presented as tables where each column contains measurements for a particular feature
• One of the columns is reserved for the label
• A dataset is a training dataset if it contains a label column, or a testing dataset if it does not
(table sketch: columns Colour, Shape, Weight, Label)

Slide 47

Slide 47 text

The MNIST Dataset
• A set of handwritten digits with the corresponding labels
• MNIST set: http://yann.lecun.com/exdb/mnist/
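A hedged sketch of one way to load MNIST in Python; scikit-learn's fetch_openml mirror is an assumption, since the slide points to Yann LeCun's original distribution.

```python
# Sketch: loading MNIST through scikit-learn's OpenML mirror (downloads on first use).
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target      # 70,000 images of 28x28 = 784 pixels, with labels
print(X.shape, y[:10])               # (70000, 784) and the first ten digit labels
```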

Slide 48

Slide 48 text

The Iris Dataset
Introduced by the British statistician and biologist Ronald Fisher in his 1936 paper. Source: Wikipedia
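A small sketch showing that the Iris dataset also ships with scikit-learn, which is convenient for practice; using scikit-learn here is an assumption, as the slide only cites Fisher's 1936 paper via Wikipedia.

```python
# Sketch: the Iris dataset bundled with scikit-learn.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)        # sepal/petal length and width, in cm
print(iris.data.shape)           # (150, 4): 150 flowers, 4 features each
print(iris.target_names)         # setosa, versicolor, virginica
```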

Slide 49

Slide 49 text

Classification vs. Regression
• If the target variable of the output y^(j) is continuous, then the learning problem is a regression problem.
• If the target variable of the output y^(j) is nominal (discrete), then the learning problem is a classification problem.

Slide 50

Slide 50 text

What is noise?
• Any unwanted or undesirable outcome of an experiment
• Noise is associated with measurements
• Example:
  • Assume the desired outcome of an experiment to identify a motorized vehicle is "car"
  • If the outcome is "bicycle", then that outcome is noise

Slide 51

Slide 51 text

Training-Prediction system overview (supervised system)
Training Dataset → Training Algorithm → Classification; New Data → Classification → Prediction → Action