Machine learning
Haider Al-Saidi
“Learning is a change in cognitive structures that occurs as a result of experience.”
Author Unknown
Slide 2
Which one was written by a human?
1. “Kitty couldn’t fall asleep for a long time. Her nerves were strained as two
tight strings, and even a glass of hot wine, that Vronsky made her drink, did
not help her. Lying in bed she kept going over and over that monstrous
scene at the meadow.”
2. “A shallow magnitude 4.7 earthquake was reported Monday morning five
miles from Westwood, California, according to the U.S. Geological Survey.
The temblor occurred at 6:25 a.m. Pacific time at a depth of 5.0 miles.”
3. “I was laid out sideways on a soft American van seat, several young men
still plying me with vodkas that I dutifully drank, because for a Russian it is
impolite to refuse.”
NY Times quiz: which one of the passages above was written by a human?
Slide 3
Machine Learning at Red River College
An upcoming 15-hour introductory Machine Learning course
Various research projects in Machine Learning
Planning a Centre for Machine Learning Studies
Hardware with high computational capabilities to support ML activities
More academic courses and seminars to be offered in the field
Slide 4
The Field of Artificial Intelligence
Artificial Intelligence
Machine Learning
Deep Learning
Slide 5
Charles Babbage (1791-1871)
Slide 6
Lady Ada Lovelace - 1837
“It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths.”
Lady Lovelace describing the “Analytical Engine” by Charles Babbage
Slide 7
Channel 4
Humanoid
Slide 8
ML Applications
Natural Language Processing (NLP)
Virtual Agents
Decision Management
Robotic Process Automation
Weather and Earthquake Prediction
Netflix movie suggestions
Spam messages identifier
Biometric recognition
Medical diagnosis and analysis
Machine vision
Understanding the human genome
(“Hitchhiker’s Guide to the Galaxy”, a book by Douglas Adams)
Slide 9
Plato vs. Aristotle
Slide 10
Scientific Modeling
1. Observe a phenomenon
2. Construct a model for that phenomenon
3. Make predictions using this model
“Introduction to Statistical Learning Theory” by Bousquet, Boucheron, & Lugosi
Slide 11
Crossing a street
Pedestrian crossing from the green side of the street to the red side.
v = ∆d / ∆t (no acceleration: ∆v = 0)
d = v·∆t + ½·a·∆t²
[Figure: pedestrian facing cars C1 and C2 at distances d1 and d2]
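Before bringing in machine learning, this physics model can be turned into a hand-coded decision rule. A minimal Python sketch; the street width and walking speed are made-up values, since the slide gives no exact numbers:

```python
# Hand-coded physics rule for the street-crossing decision.
# The constants are illustrative assumptions, not from the slides.

STREET_WIDTH_M = 10.0   # distance the pedestrian must cover (assumed)
WALK_SPEED_KMH = 5.0    # pedestrian walking speed (assumed)

def safe_to_cross(car_distance_m: float, car_speed_kmh: float) -> bool:
    """Cross only if the pedestrian clears the street before the car arrives."""
    if car_speed_kmh <= 0:  # car is stationary or moving away
        return True
    time_to_cross_s = STREET_WIDTH_M / (WALK_SPEED_KMH / 3.6)
    time_car_arrives_s = car_distance_m / (car_speed_kmh / 3.6)
    return time_car_arrives_s > time_to_cross_s

print(safe_to_cross(car_distance_m=50, car_speed_kmh=30))  # -> False here
```

The machine-learning approach on the next slides replaces this hand-written rule with one learned from labeled examples.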
Slide 12
Identifying the feature space
Feature: a measurable property of an observable.
Examples: distance, speed, colour, weight, …

d1 | v1 | d2 | v2
20 meters | -20 km/hr | +5 m | 2 km/hr
30 meters | +20 km/hr | +5 m | 2 km/hr
40 meters | +35 km/hr | -5 m | 2 km/hr
50 meters | +30 km/hr | -5 m | 2 km/hr
(d1, v1: distance and speed of car C1; d2, v2: distance and speed of car C2)

Feature Space
Slide 13
Collecting the training set

d1 | v1 | d2 | v2 | Label
20 meters | -20 km/hr | +5 m | 2 km/hr | Don’t cross
30 meters | +20 km/hr | +5 m | 2 km/hr | Cross
40 meters | +35 km/hr | -5 m | 2 km/hr | Don’t cross
50 meters | +30 km/hr | -5 m | 2 km/hr | Cross

The first four columns are the features; the last column holds the labels/classes. Together they form the training set.

Machine Learning Approach (Classification)
1. Identify: identify the feature space.
2. Collect: collect a training set (features + labels).
3. Divide: use the training set to divide the feature space into regions separated by decision boundaries. (Training)
4. Classify: the decision boundaries become the classification rules. (Training)
5. Apply: apply the new rules to new, unclassified data. (Classification)
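To make the five steps concrete, here is a minimal end-to-end sketch in Python using scikit-learn; the library choice and the numeric encoding of the table are mine, not the slides':

```python
# Minimal train/predict sketch for the crossing example with scikit-learn.
from sklearn.linear_model import LogisticRegression

# Features: [d1 (m), v1 (km/hr), d2 (m), v2 (km/hr)], from the slide's table
X_train = [
    [20, -20,  5, 2],
    [30,  20,  5, 2],
    [40,  35, -5, 2],
    [50,  30, -5, 2],
]
y_train = ["Don't cross", "Cross", "Don't cross", "Cross"]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)            # Training: learn decision boundaries

new_observation = [[35, 25, 5, 2]]     # new, unclassified data (hypothetical)
print(model.predict(new_observation))  # Classification
```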
Slide 16
Training-Prediction system overview
Training Dataset → Training Algorithm → Classification
New Data → Classification → Result
Slide 17
Logistic Regression
Feature Space
[Figure: scatter plot of the feature space with “Cross” and “Don’t cross” classes separated by a decision boundary]
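A from-scratch sketch of how logistic regression could learn such a boundary, using NumPy gradient descent on the log-loss; the toy data, learning rate, and epoch count are illustrative assumptions:

```python
# Binary logistic regression trained with gradient descent (NumPy sketch).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """X: (m, n) feature matrix; y: (m,) labels in {0, 1}."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)       # predicted probability of class 1
        grad_w = X.T @ (p - y) / m   # gradient of the log-loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy 2-D "cross" (1) vs "don't cross" (0) data, made up for illustration
X = np.array([[30.0, 20.0], [50.0, 30.0], [20.0, -20.0], [40.0, 35.0]])
y = np.array([1, 1, 0, 0])
w, b = train_logistic((X - X.mean(0)) / X.std(0), y)  # standardize first
```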
Slide 18
Optimizing Logistic Regression
[Figure: three candidate decision boundaries, A, B, and C]
Slide 19
Support Vector Machine (SVM)
The idea of SVM is to find the line, plane, or hyperplane that maximizes the margin.
[Figure: decision boundary with the margin on either side]
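A brief sketch using scikit-learn's linear SVM on made-up, linearly separable data:

```python
# Linear SVM: finds the separating hyperplane with the maximum margin.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])  # two linearly separable classes (toy)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)   # the points that define the margin
print(clf.predict([[4, 4]]))  # classify a new point
```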
Slide 20
From Regression to Perceptron
The inputs x1, x2, …, xn are combined with weights w1, w2, …, wn:
y = w1·x1 + w2·x2 + ⋯ + wn·xn = w · x
[Figure: inputs x1 … xn feed weights w1 … wn into a summation node Σ; a threshold on the sum produces the output R/L]
Slide 21
The Perceptron Model
[Figure: perceptron with inputs x1 … xn, weights w1 … wn, a summation node Σ, an activation producing the output, and a feedback “training path” that adjusts the weights. Source: Wikipedia]
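A from-scratch perceptron sketch in Python following the classic learning rule; the learning rate and epoch count are arbitrary illustrative choices:

```python
# Classic perceptron learning rule: adjust the weights whenever a training
# point is misclassified, looping over the set for a fixed number of epochs.
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """X: (m, n) inputs; y: (m,) targets in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified point
                w += lr * yi * xi              # training path: update weights
                b += lr * yi
    return w, b

def predict(w, b, x):
    return 1 if np.dot(w, x) + b > 0 else -1   # output: R (+1) or L (-1)
```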
Slide 22
Neural Network
Source: “Using neural nets to recognize handwritten digits”, a book by Michael Nielsen
Slide 23
Deep Learning
Part of Machine Learning
Multiple layers to determine what it needs to learn
Unsupervised (adaptive) learning techniques
Slide 24
Deep Learning
Deep learning is a type of machine learning in which a model learns to
perform classification tasks directly from images, text, or sound. Deep
learning is usually implemented using a neural network architecture. The
term “deep” refers to the number of layers in the network—the more layers,
the deeper the network. Traditional neural networks contain only 2 or 3
layers, while deep networks can have hundreds.
“Introducing Deep Learning with Matlab” Mathworks
In Machine Learning we feed the features to the machine; in Deep Learning the machine learns the features itself.
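To illustrate “the machine learns the features itself”, here is a minimal convolutional-network sketch in Keras; the library choice, architecture, and input shape are assumptions of mine, not the slides'. The convolutional layers learn their own feature detectors from raw pixels rather than relying on hand-engineered features:

```python
# Minimal CNN sketch (Keras): convolutional layers learn feature detectors
# directly from raw pixels instead of hand-engineered features.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),           # e.g. 28x28 grayscale images
    layers.Conv2D(16, 3, activation="relu"),  # learned low-level features
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),  # learned higher-level features
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```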
Slide 25
K-Nearest Neighbors (KNN)
Classification is based on the majority vote of the K closest neighbours.
Let k = 5 and let the unknown point be u.
Out of the 5 closest points to u, {a, b, d, e} belong to one class while c belongs to a different class.
Therefore u must belong to the same class as a, b, d, and e.
[Figure: point u with distances d1 … d5 to its five nearest neighbours a, b, c, d, e]
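A from-scratch KNN sketch that mirrors the slide's majority vote with k = 5; the toy coordinates are made up:

```python
# K-nearest-neighbours classification by majority vote (from scratch).
import math
from collections import Counter

def knn_classify(train, query, k=5):
    """train: list of ((x, y), label) pairs; query: an (x, y) point."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    k_labels = [label for _, label in by_distance[:k]]
    return Counter(k_labels).most_common(1)[0][0]  # majority vote

# Toy data mirroring the slide: four neighbours of one class, one of another
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((2, 2), "A"), ((5, 5), "B"), ((9, 9), "B")]
print(knn_classify(train, (2, 3), k=5))  # -> "A"
```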
Slide 26
So, is the green thing an apple or an orange?
The picture is from one of PMI’s ads, signifying their uniqueness.
Slide 27
Invoking probability theory
What about problems where a Euclidean metric is not an option?
Examples: identifying spam messages, separating objects by colour
In these cases we rely on probability theory for classification.
Slide 28
Decision Tree
Using Shannon entropy to split the tree along the paths that minimize the information entropy
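A small sketch of the Shannon entropy computation that drives such splits; the label sequences are illustrative:

```python
# Shannon entropy of a label distribution; decision trees choose the split
# that most reduces this quantity (i.e. maximizes information gain).
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["s", "s", "s", "h", "h"]))  # mixed labels -> high entropy
print(entropy(["s", "s", "s", "s"]))       # pure node: entropy is zero
```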
Slide 29
Identifying SPAM e-mails
Document | Class
Please don’t forget the milk, red label is the best | h
Secure your investment with our bank | s
Our bank offers the best mortgage rates | s
You need to pick up the kids | h
No need to think, our bank is the best | s
Using the Naïve Bayes approach to identify spam messages (s = spam, h = ham)
Slide 30
Using Bayesian Probabilities

Word | Count (s) | Count (h) | P(word|s) | P(word|h)
please | 0 | 1 | 0 | 1/26
don’t | 0 | 1 | 0 | 1/26
forget | 0 | 1 | 0 | 1/26
the | 2 | 2 | 2/26 | 2/26
milk | 0 | 1 | 0 | 1/26
red | 0 | 1 | 0 | 1/26
label | 0 | 1 | 0 | 1/26
is | 1 | 1 | 1/26 | 1/26
best | 2 | 1 | 2/26 | 1/26
secure | 1 | 0 | 1/26 | 0
your | 1 | 0 | 1/26 | 0
investment | 1 | 0 | 1/26 | 0
with | 1 | 0 | 1/26 | 0
our | 3 | 0 | 3/26 | 0
bank | 3 | 0 | 3/26 | 0
offers | 1 | 0 | 1/26 | 0
mortgage | 1 | 0 | 1/26 | 0
rates | 1 | 0 | 1/26 | 0
you | 0 | 1 | 0 | 1/26
need | 1 | 1 | 1/26 | 1/26
to | 1 | 1 | 1/26 | 1/26
pick | 0 | 1 | 0 | 1/26
up | 0 | 1 | 0 | 1/26
kids | 0 | 1 | 0 | 1/26
no | 1 | 0 | 1/26 | 0
think | 1 | 0 | 1/26 | 0
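A from-scratch Naïve Bayes sketch over the five documents above; note that add-one (Laplace) smoothing is my addition to avoid the zero probabilities visible in the table, which the slide leaves as raw counts:

```python
# Naive Bayes spam classifier over the slide's five documents.
# Add-one (Laplace) smoothing is added here to avoid zero probabilities.
import math
from collections import Counter

docs = [
    ("please don't forget the milk red label is the best", "h"),
    ("secure your investment with our bank", "s"),
    ("our bank offers the best mortgage rates", "s"),
    ("you need to pick up the kids", "h"),
    ("no need to think our bank is the best", "s"),
]

word_counts = {"s": Counter(), "h": Counter()}
class_counts = Counter()
for text, cls in docs:
    class_counts[cls] += 1
    word_counts[cls].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    scores = {}
    for cls in ("s", "h"):
        total = sum(word_counts[cls].values())
        log_prob = math.log(class_counts[cls] / len(docs))  # prior P(class)
        for w in text.split():
            # P(word | class) with add-one smoothing
            p = (word_counts[cls][w] + 1) / (total + len(vocab))
            log_prob += math.log(p)
        scores[cls] = log_prob
    return max(scores, key=scores.get)

print(classify("our bank has the best rates"))  # -> "s" (spam)
```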
Slide 31
Eliza the Psychotherapist - 1966
http://www.masswerk.at/elizabot/eliza.html
Weizenbaum, Joseph. “ELIZA – A Computer Program For the Study of Natural Language Communication Between Man and Machine.” Communications of the ACM, Volume 9, Issue 1 (January 1966): pp. 36-45.
Slide 32
Tools
• I like Python, but any other language will do
• You may want to implement all the algorithms from scratch
Slide 33
Differential Analyser - 1935
Slide 34
Entscheidungsproblem (Decision Problem)
Posed by David Hilbert in 1928
The problem is stated as: is there an algorithm that takes logical statements as an input and produces, based on this input, a “YES” or “NO” answer that is always correct?
Slide 35
Alan Turing – Father of Computer Science (1935)
Worked on a solution to the Entscheidungsproblem (Decision Problem).
In 1935 he proposed what is now known as the “Turing Machine”.
Turing test: “A computer would deserve to be called intelligent if it could deceive a human into believing that it was a human.”
Slide 36
Turing Machine
“Unless there is something mystical or magical about human thinking, intelligence can be achieved by computer” [Our Final Invention, by James Barrat]
It is a hypothetical machine consisting of an infinite recording tape divided into cells. Each cell holds a symbol that instructs the machine to move the tape back and forth and places the machine in one state of a pre-defined table of states.
Slide 37
Turing Machine - Example
State A: write 1; if 0, move R; if 1, move L
State B: if 0, write 1; if 1, write 0; if 0, move R; if 1, move L
State C: write 1; if 0, move R; if 1, move L
HALT
[Figure: a tape with cells 0 1 1 0 and the recording head]
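A tiny Turing-machine simulator sketch in Python; the transition table below is a made-up illustration (the slide's exact machine is only partially recoverable), and this one simply fills the initial block with 1s:

```python
# Minimal Turing machine simulator: the tape is a dict from cell index to
# symbol, and the transition table maps (state, symbol) to
# (symbol_to_write, head_move, next_state).
from collections import defaultdict

# Hypothetical transition table (illustrative; not the slide's exact machine)
TRANSITIONS = {
    ("A", 0): (1, +1, "B"),    # on a blank: write 1, move right, go to B
    ("A", 1): (1, +1, "A"),    # skip over existing 1s
    ("B", 1): (1, +1, "B"),    # keep moving right over 1s
    ("B", 0): (1, -1, "HALT"), # write a final 1, then halt
}

def run(tape_symbols, max_steps=100):
    tape = defaultdict(int, enumerate(tape_symbols))
    head, state = 0, "A"
    for _ in range(max_steps):   # step cap guards against non-halting machines
        if state == "HALT":
            break
        symbol, move, state = TRANSITIONS[(state, tape[head])]
        tape[head] = symbol
        head += move
    return [tape[i] for i in sorted(tape)]

print(run([0, 1, 1, 0]))  # -> [1, 1, 1, 1]
```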
Slide 38
Elliott 401 - 1954
Slide 39
Expert Systems 1970’s-1980’s
Emulate human experts
Contain a knowledge base which holds the data representing the rules and the facts
Never gained momentum
Lacked intelligence
Found a few applications in:
Medicine
Business
Law
Data classification
Nominal data
{True, False}, {Red, Green, Blue}, {1, 0}, {Apple, Orange, Kiwi, Mango}
Continuous data
Temperature readings
Humidity data
Probabilities
Slide 42
Features and Inputs
Feature: a measurable property of an observable.
Example: colour is a feature of an apple (the observable).
Input: the measurement observed for a certain feature of an observable.
Example: the input for the colour feature is “Red”.
The input vector is denoted by x = (x1, x2, …, xn), where n is the number of features.
We also define the set of input vectors x^(j), where j represents the jth input vector.

Outputs, Targets, Labels, and Training Examples
Output y^(j) is the output for the jth input vector.
If the output is a desired value for a known input vector, then it is called a target or a label.
A training example is the set that contains an input vector and a target: TE = {x^(j), y^(j)}
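These definitions map naturally onto arrays; a small NumPy sketch where the values reuse the earlier crossing example and the 0/1 encoding of the labels is an assumption of mine:

```python
# Input vectors x^(j), targets y^(j), and training examples as arrays.
import numpy as np

# Each row is one input vector x^(j); the columns are the n features
X = np.array([
    [20, -20,  5, 2],   # x^(1)
    [30,  20,  5, 2],   # x^(2)
])
y = np.array([0, 1])    # targets y^(1), y^(2) (0 = don't cross, 1 = cross)

training_examples = list(zip(X, y))  # each item is one {x^(j), y^(j)} pair
```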
Slide 45
Training and Classification
[Diagram: input vectors x^(1), x^(2), x^(3), … together with their targets y^(j) feed the Training stage, which produces the hypothesis h(x); new inputs are then passed through h(x) for Classification]
Slide 46
Data sets
Dataset: a set or collection of value sets.
Datasets are presented as tables, where each column contains the measurements for a particular feature.
One of the columns is reserved for the label.
The dataset can be a training dataset, if it contains a labels column, or a testing dataset, if it does not.
[Table sketch: columns Colour, Shape, Weight, Label]
Slide 47
The MNIST Dataset
A set of handwritten digits with the corresponding labels.
MNIST set: http://yann.lecun.com/exdb/mnist/
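One convenient way to load it is through Keras (a library choice of mine, not the slide's):

```python
# Load MNIST via Keras (one of several common ways to fetch the dataset).
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28) grayscale images
print(y_train[:10])   # the corresponding digit labels
```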
Slide 48
The Iris Dataset
Introduced by the British statistician and biologist Ronald Fisher in his 1936 paper.
Source: Wikipedia
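The dataset also ships with scikit-learn, so inspecting it is one call away (library choice assumed):

```python
# Load the Iris dataset bundled with scikit-learn.
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)  # sepal/petal lengths and widths (cm)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)     # (150, 4): 150 samples, 4 features
```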
Slide 49
Classification vs. Regression
If the target variable of the output y^(j) is continuous, then the learning problem is a regression problem.
If the target variable of the output y^(j) is nominal (discrete), then the learning problem is a classification problem.
Slide 50
What is noise?
Any unwanted or undesirable outcome of an experiment.
Noise is associated with measurements.
Example:
Assume the desired outcome of an experiment to identify a motorized vehicle is “car”.
If the outcome is “bicycle”, then that data point is noise.
Slide 51
Training-Prediction system overview
Supervised system
Training Dataset → Training Algorithm → Classification
New Data → Classification → Prediction → Action