Machine Learning 101
Ali Akbar Septiandri
Universitas Al Azhar Indonesia
Slide 2
Previously...
Slide 3
Cross Industry Standard Process for Data Mining
(CRISP-DM)
Slide 4
Data Science Venn Diagram
Slide 5
What is the role of machine learning algorithms?
Slide 6
“Fundamentally, machine learning involves building mathematical models to help understand data.”
- Jake VanderPlas
Slide 7
Tasks in Machine Learning
1. Predicting stock price
2. Differentiating cat vs. dog pictures
3. Spam identification
4. Community detection
5. Mimicking famous painting style
6. Mastering the games of Go and chess
7. etc.
Slide 8
Task Categories
1. Supervised learning
a. Predicting stock price
b. Differentiating cat vs. dog pictures
c. Spam identification
2. Unsupervised learning
a. Community detection
b. Mimicking famous painting style
3. Reinforcement learning
a. Mastering the games of Go and chess
Slide 9
Let’s take an example dataset...
- Iris Dataset
- by R.A. Fisher (1936)
- 4 attributes: sepal length, sepal width, petal length, petal width
- 3 labels: Iris setosa, Iris versicolor, Iris virginica
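One way to picture a row of this dataset in code is as four numeric attributes plus a label. The sketch below is illustrative only: the `IrisSample` type, the `features` helper, and the measurement values are hypothetical, not taken from the slides or copied from the dataset.

```python
from typing import List, NamedTuple


class IrisSample(NamedTuple):
    """One flower: four measurements (in cm) and its species label."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    label: str


# Hypothetical values, for illustration only.
samples = [
    IrisSample(5.0, 3.4, 1.5, 0.2, "Iris setosa"),
    IrisSample(6.0, 2.8, 4.5, 1.4, "Iris versicolor"),
    IrisSample(6.5, 3.0, 5.5, 2.0, "Iris virginica"),
]


def features(sample: IrisSample) -> List[float]:
    """Return only the four numeric attributes, dropping the label."""
    return list(sample[:4])


for s in samples:
    print(features(s), "->", s.label)
```

Separating the attributes from the label this way matches how a learning algorithm sees the data: the attributes are the input and the label is what it tries to predict.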
Slide 15
Nearest Neighbour
- Finding the closest reference
- What does “closest” mean?
- Humans comprehend visualisations very well
- Can computers do the same?
Slide 16
At the lowest level, computers only understand 0s and 1s
Slide 17
Euclidean Distance
Slide 18
Euclidean Distance
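For two points a and b with n attributes each, the Euclidean distance is the square root of the sum of squared attribute differences: d(a, b) = sqrt((a_1 - b_1)^2 + ... + (a_n - b_n)^2). A minimal Python sketch follows; the function name `euclidean_distance` is an assumption made for illustration.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points with the same attributes."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of attributes")
    # Sum the squared difference of each attribute, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Python 3.8 and later also provide `math.dist`, which computes the same quantity.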
Slide 19
Are you sure?
Slide 20
k-Nearest Neighbours
1. Find some k closest references
2. Use majority vote
3. We need to compute pairwise distances
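The three points above can be sketched in plain Python: compute the distance from the query to every reference, keep the k closest, and take a majority vote over their labels. The function names, the toy reference points, and the default k = 3 are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(
    references: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Predict a label for `query` from labelled `references`."""
    # Compute the distance from the query to every reference point.
    distances = [
        (euclidean_distance(feats, query), label) for feats, label in references
    ]
    # Keep the k closest references.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels (ties go to the label seen first,
    # i.e. the one belonging to the closer neighbour).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
references = [
    ([1.0, 1.0], "a"), ([1.2, 0.8], "a"), ([0.9, 1.1], "a"),
    ([5.0, 5.0], "b"), ([5.2, 4.9], "b"), ([4.8, 5.1], "b"),
]

print(knn_predict(references, [1.1, 1.0]))  # a
print(knn_predict(references, [5.0, 5.1]))  # b
```

Every prediction scans all references, so the cost grows with the size of the reference set.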
Slide 22
Conventional statistics cannot do that
Slide 23
We need high computational power
Slide 24
What if we only want to see the subgroups in the data?
Slide 25
Clustering
- Finding subgroups in the data
- Analogy: your neighbours in the same housing complex, regardless of their class
- Unsupervised learning
Slide 27
k-Means Clustering
Slide 28
k-Means Clustering
1. Uses Euclidean distance as well
2. k = number of clusters
3. Centroids to represent clusters
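A minimal sketch of how these three ingredients fit together: assign each point to its nearest centroid by Euclidean distance, move each centroid to the mean of its members, and repeat until the assignments stop changing. Starting from the first k points as centroids is a simplifying assumption made here to keep the example deterministic (random initialisation is more common in practice), and the function names and toy points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points: List[Sequence[float]]) -> List[float]:
    """Attribute-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def k_means(
    points: List[Sequence[float]], k: int, max_iters: int = 100
) -> Tuple[List[List[float]], List[int]]:
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: use the first k points as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments: List[int] = []
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(k), key=lambda c: euclidean_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, assignments


# Hypothetical toy data: two well-separated groups, interleaved.
points = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.9], [0.9, 1.1], [4.8, 5.1]]
centroids, assignments = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

Unlike k-nearest neighbours, no labels are used anywhere: the cluster indices are arbitrary names for the subgroups the algorithm finds.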
Slide 32
Deep Learning
Slide 34
Digit Recognition
MNIST Dataset
Slide 35
Classifying objects from pictures [Krizhevsky, 2009]