Slide 1

Slide 1 text

Unsupervised Learning with Python
A presentation for Python Brasil [12]
by Breandan Considine
@breandan

Slide 2

Slide 2 text

Why machine learning?
• ML is just a bunch of tools, for example:
• Prediction
• Categorization
• Anomaly detection
• Personalization
• Adaptive control
• Playing games

Slide 3

Slide 3 text

Types of machine learning

Slide 4

Slide 4 text

Reinforcement Learning
• Agent has a context and set of choices
• Each choice has an (unknown) reward
• Goal: Maximize cumulative reward
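As a concrete (hypothetical) illustration of that reward-maximization loop, here is a minimal epsilon-greedy bandit sketch; it is not from the deck, and the function name and parameters are assumptions made for illustration:

import random

def epsilon_greedy(arms, n_rounds=1000, epsilon=0.1):
    # arms: list of zero-argument functions, each returning a sampled reward
    estimates = [0.0] * len(arms)   # running mean reward per arm
    counts = [0] * len(arms)
    total = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(len(arms))                        # explore
        else:
            arm = max(range(len(arms)), key=lambda a: estimates[a])  # exploit
        reward = arms[arm]()
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]    # update running mean
        total += reward
    return total, estimates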

Slide 5

Slide 5 text

A quick taste of supervised learning

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

[Figure: two classes, labeled 0 and 1]

Slide 11

Slide 11 text

Cool learning algorithm

def classify(datapoint, weights):

Slide 12

Slide 12 text

Cool learning algorithm

def classify(datapoint, weights):
    prediction = sum(x * y for x, y in zip([1] + datapoint, weights))

Slide 13

Slide 13 text

Cool learning algorithm

def classify(datapoint, weights):
    prediction = sum(x * y for x, y in zip([1] + datapoint, weights))
    if prediction < 0:
        return 0
    else:
        return 1

Slide 14

Slide 14 text

Cool learning algorithm

def classify(datapoint, weights):
    prediction = sum(x * y for x, y in zip([1] + datapoint, weights))
    if prediction < 0:
        return 0
    else:
        return 1
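A quick usage sketch (the weights and datapoints below are invented for illustration; weights[0] acts as the bias term matched to the prepended 1):

weights = [-1.5, 1.0, 1.0]            # roughly an AND gate
print(classify([1, 1], weights))      # -> 1
print(classify([0, 1], weights))      # -> 0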

Slide 15

Slide 15 text

Cool learning algorithm

def train(data_set):

Slide 16

Slide 16 text

Cool learning algorithm

def train(data_set):

Slide 17

Slide 17 text

Cool learning algorithm

def train(data_set):

class Datum:
    def __init__(self, features, label):
        self.features = [1] + features
        self.label = label

Slide 18

Slide 18 text

Cool learning algorithm

def train(data_set):
    weights = [0] * len(data_set[0].features)   # [0, 0, 0]

Slide 19

Slide 19 text

Cool learning algorithm

def train(data_set):
    weights = [0] * len(data_set[0].features)
    total_error = threshold + 1

Slide 20

Slide 20 text

Cool learning algorithm

def train(data_set):
    weights = [0] * len(data_set[0].features)
    total_error = threshold + 1
    while total_error > threshold:
        total_error = 0
        for item in data_set:
            # item.features[1:] drops the stored bias 1; classify() prepends its own
            error = item.label - classify(item.features[1:], weights)
            weights = [w + RATE * error * i for w, i in zip(weights, item.features)]
            total_error += abs(error)

Slide 21

Slide 21 text

Cool learning algorithm

def train(data_set):
    weights = [0] * len(data_set[0].features)
    total_error = threshold + 1
    while total_error > threshold:
        total_error = 0
        for item in data_set:
            # item.features[1:] drops the stored bias 1; classify() prepends its own
            error = item.label - classify(item.features[1:], weights)
            weights = [w + RATE * error * i for w, i in zip(weights, item.features)]
            total_error += abs(error)

Slide 22

Slide 22 text

Cool learning algorithm

def train(data_set):
    weights = [0] * len(data_set[0].features)
    total_error = threshold + 1
    while total_error > threshold:
        total_error = 0
        for item in data_set:
            # item.features[1:] drops the stored bias 1; classify() prepends its own
            error = item.label - classify(item.features[1:], weights)
            weights = [w + RATE * error * i for w, i in zip(weights, item.features)]
            total_error += abs(error)

Slide 23

Slide 23 text

Cool learning algorithm

weights = [w + RATE * error * i for w, i in zip(weights, item.features)]

[Diagram: perceptron structure: inputs 1, i1, i2, ..., in are multiplied by weights w0, w1, w2, ..., wn and summed (Σ)]

Slide 24

Slide 24 text

Cool learning algorithm

def train(data_set):
    weights = [0] * len(data_set[0].features)
    total_error = threshold + 1
    while total_error > threshold:
        total_error = 0
        for item in data_set:
            # item.features[1:] drops the stored bias 1; classify() prepends its own
            error = item.label - classify(item.features[1:], weights)
            weights = [w + RATE * error * i for w, i in zip(weights, item.features)]
            total_error += abs(error)

Slide 25

Slide 25 text

Cool learning algorithm

def train(data_set):
    weights = [0] * len(data_set[0].features)
    total_error = threshold + 1
    while total_error > threshold:
        total_error = 0
        for item in data_set:
            # item.features[1:] drops the stored bias 1; classify() prepends its own
            error = item.label - classify(item.features[1:], weights)
            weights = [w + RATE * error * i for w, i in zip(weights, item.features)]
            total_error += abs(error)
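To tie the pieces together, a minimal end-to-end sketch; RATE, threshold, and the toy data are assumed values (not shown on the slides), and it assumes train() finishes with "return weights", which the slides stop short of showing:

RATE = 0.1        # assumed learning rate (module-level constant)
threshold = 0.0   # assumed convergence threshold (module-level constant)

data_set = [Datum([0, 0], 0), Datum([0, 1], 0),
            Datum([1, 0], 0), Datum([1, 1], 1)]       # AND-gate toy data
weights = train(data_set)                             # assumes train() returns weights
# strip the stored bias 1; classify() prepends its own
print([classify(d.features[1:], weights) for d in data_set])   # expect [0, 0, 0, 1]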

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

A brief look at unsupervised learning
• Where did my labels go?
• Mostly clustering, separation, association
• Many different methods
  • Self-organizing map
  • Expectation-maximization
  • Association rule learning
  • Hierarchical clustering
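As one hedged example of the methods listed above, here is a minimal hierarchical-clustering sketch using SciPy; the toy points and the choice of two clusters are mine, not the deck's:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
tree = linkage(points, method='ward')                 # build the merge tree
labels = fcluster(tree, t=2, criterion='maxclust')    # cut it into 2 clusters
print(labels)                                         # e.g. [1 1 2 2]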

Slide 30

Slide 30 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):

Slide 31

Slide 31 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)

Slide 32

Slide 32 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)
    centers = labeled[random_pts]
    centers[:, -1] = range(1, k + 1)  # Assign labels

Slide 33

Slide 33 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)
    centers = labeled[random_pts]
    centers[:, -1] = range(1, k + 1)  # Assign labels
    it = 0
    old_centers = None
    while it < max_it and not np.array_equal(old_centers, centers):
        it += 1

Slide 34

Slide 34 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)
    centers = labeled[random_pts]
    centers[:, -1] = range(1, k + 1)  # Assign labels
    it = 0
    old_centers = None
    while it < max_it and not np.array_equal(old_centers, centers):
        it += 1
        old_centers = np.copy(centers)

Slide 35

Slide 35 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)
    centers = labeled[random_pts]
    centers[:, -1] = range(1, k + 1)  # Assign labels
    it = 0
    old_centers = None
    while it < max_it and not np.array_equal(old_centers, centers):
        it += 1
        old_centers = np.copy(centers)
        update_labels(labeled, centers)

Slide 36

Slide 36 text

Cool clustering algorithm

def update_labels(data, centers):
    for datum in data:
        datum[-1] = centers[0, -1]
        min = distance.euclidean(datum[:-1], centers[0, :-1])
        for center in centers:
            dist = distance.euclidean(datum[:-1], center[:-1])
            if dist < min:
                min = dist
                datum[-1] = center[-1]

Slide 37

Slide 37 text

Cool clustering algorithm

def update_labels(data, centers):
    for datum in data:
        datum[-1] = centers[0, -1]
        min = distance.euclidean(datum[:-1], centers[0, :-1])
        for center in centers:
            dist = distance.euclidean(datum[:-1], center[:-1])
            if dist < min:
                min = dist
                datum[-1] = center[-1]

Slide 38

Slide 38 text

Cool clustering algorithm

def update_labels(data, centers):
    for datum in data:
        datum[-1] = centers[0, -1]
        min = distance.euclidean(datum[:-1], centers[0, :-1])
        for center in centers:
            dist = distance.euclidean(datum[:-1], center[:-1])
            if dist < min:
                min = dist
                datum[-1] = center[-1]

Slide 39

Slide 39 text

Cool clustering algorithm

def update_labels(data, centers):
    for datum in data:
        datum[-1] = centers[0, -1]
        min = distance.euclidean(datum[:-1], centers[0, :-1])
        for center in centers:
            dist = distance.euclidean(datum[:-1], center[:-1])
            if dist < min:
                min = dist
                datum[-1] = center[-1]

Slide 40

Slide 40 text

Cool clustering algorithm

def update_labels(data, centers):
    for datum in data:
        datum[-1] = centers[0, -1]
        min = distance.euclidean(datum[:-1], centers[0, :-1])
        for center in centers:
            dist = distance.euclidean(datum[:-1], center[:-1])
            if dist < min:
                min = dist
                datum[-1] = center[-1]
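The per-point loop above can also be vectorized; the variant below is my own rewrite (not from the slides) and assumes numpy is imported as np:

from scipy.spatial.distance import cdist

def update_labels_vectorized(data, centers):
    # distance from every point to every center, then take the nearest center's label
    dists = cdist(data[:, :-1], centers[:, :-1])
    data[:, -1] = centers[np.argmin(dists, axis=1), -1]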

Slide 41

Slide 41 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)
    centers = labeled[random_pts]
    centers[:, -1] = range(1, k + 1)  # Assign labels
    it = 0
    old_centers = None
    while it < max_it and not np.array_equal(old_centers, centers):
        it += 1
        old_centers = np.copy(centers)
        update_labels(labeled, centers)
        update_centers(labeled, centers)

Slide 42

Slide 42 text

Cool clustering algorithm

def update_centers(data, centers):
    k = len(centers)
    for i in range(1, k + 1):
        cluster = data[data[:, -1] == i, :-1]
        centers[i - 1, :-1] = np.mean(cluster, axis=0)

Slide 43

Slide 43 text

Cool clustering algorithm

def update_centers(data, centers):
    k = len(centers)
    for i in range(1, k + 1):
        cluster = data[data[:, -1] == i, :-1]
        centers[i - 1, :-1] = np.mean(cluster, axis=0)

Slide 44

Slide 44 text

Cool clustering algorithm

def update_centers(data, centers):
    k = len(centers)
    for i in range(1, k + 1):
        cluster = data[data[:, -1] == i, :-1]
        centers[i - 1, :-1] = np.mean(cluster, axis=0)

Slide 45

Slide 45 text

Cool clustering algorithm

def update_centers(data, centers):
    k = len(centers)
    for i in range(1, k + 1):
        cluster = data[data[:, -1] == i, :-1]
        centers[i - 1, :-1] = np.mean(cluster, axis=0)

Slide 46

Slide 46 text

Cool clustering algorithm

def update_centers(data, centers):
    k = len(centers)
    for i in range(1, k + 1):
        cluster = data[data[:, -1] == i, :-1]
        centers[i - 1, :-1] = np.mean(cluster, axis=0)

Slide 47

Slide 47 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)
    centers = labeled[random_pts]
    centers[:, -1] = range(1, k + 1)  # Assign labels
    it = 0
    old_centers = None
    while it < max_it and not np.array_equal(old_centers, centers):
        it += 1
        old_centers = np.copy(centers)
        update_labels(labeled, centers)
        update_centers(labeled, centers)
    return labeled

Slide 48

Slide 48 text

Cool clustering algorithm

def cluster(data, k, max_it=1000):
    labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
    random_pts = np.random.choice(len(labeled), k, replace=False)
    centers = labeled[random_pts]
    centers[:, -1] = range(1, k + 1)  # Assign labels
    it = 0
    old_centers = None
    while it < max_it and not np.array_equal(old_centers, centers):
        it += 1
        old_centers = np.copy(centers)
        update_labels(labeled, centers)
        update_centers(labeled, centers)
    return labeled
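The slide code assumes "import numpy as np" and "from scipy.spatial import distance" at module level. A hypothetical usage sketch (the toy blobs are invented for illustration):

import numpy as np
from scipy.spatial import distance

np.random.seed(0)
blob_a = np.random.randn(50, 2)          # points around (0, 0)
blob_b = np.random.randn(50, 2) + 5      # points around (5, 5)
data = np.vstack([blob_a, blob_b])

result = cluster(data, k=2)
print(result[:5])                        # each row: x, y, assigned cluster label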

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Data pre-processing
• Data selection
• Data processing
  • Formatting & Cleaning
  • Sampling
• Data transformation
  • Feature scaling & Normalization
  • Decomposition & Aggregation
  • Dimensionality reduction
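As a small illustration of the feature-scaling step, a minimal standardization sketch (my own, not from the slides):

import numpy as np

def standardize(X):
    # scale each column to zero mean and unit variance
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0      # avoid division by zero for constant features
    return (X - mean) / std

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
print(standardize(X))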

Slide 56

Slide 56 text

Principal Component Analysis
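A minimal PCA sketch (my own, not the slide's code): center the data, then project it onto the top eigenvectors of the covariance matrix.

import numpy as np

def pca(X, n_components=2):
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)        # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return X_centered @ top                       # data projected onto top components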

Slide 57

Slide 57 text

Further resources
• Code for slides: github.com/breandan/ml-exercises
• Neural Networks Demystified, Stephen Welch
• Machine Learning, Andrew Ng: https://www.coursera.org/learn/machine-learning
• Awesome public data sets: github.com/caesar0301/awesome-public-datasets

Slide 58

Slide 58 text

Special Thanks
Hanneli Tavante
Ilma Rodriguez
Vilma Rodriguez