
Unsupervised Learning with Python

Unsupervised learning is a set of techniques that helps discover patterns in unlabeled data. For example, suppose you're looking at dogs: some dogs are tall, others may have long fur. Although you're not exactly sure what they're called, you may notice some repeating patterns and similarities. Without knowing what a "Greyhound" or "German Shepherd" is, you've already learned their important attributes! This is one type of unsupervised learning, called clustering. Unsupervised learning is a powerful tool for detecting anomalies, finding useful features, and visualizing the structure of unlabeled data. In this workshop we will give you an introduction to unsupervised learning techniques in machine learning. After a crash course on unsupervised learning, we will cover a few strategies used by popular algorithms, including k-means and PCA. During the workshop, we will be using the popular Python machine learning library, scikit-learn. http://scikit-learn.org/
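The two algorithms named above, k-means and PCA, can be sketched in a few lines of scikit-learn. The blob data, k = 2, and random seeds below are illustrative choices, not from the deck:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Two well-separated blobs of 2-D points (synthetic data for illustration)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),
                  rng.normal(5, 0.5, (50, 2))])

# K-means: partition the unlabeled points into k=2 clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:5])      # cluster index assigned to each point

# PCA: project onto the single direction of greatest variance
projected = PCA(n_components=1).fit_transform(data)
print(projected.shape)         # (100, 1)
```

With well-separated blobs like these, each blob ends up in its own cluster, whichever of the two cluster indices it happens to get.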

Breandan Considine

October 13, 2016

Transcript

  1. Why machine learning?
    • ML is just a bunch of tools, for example:
      • Prediction
      • Categorization
      • Anomaly detection
      • Personalization
      • Adaptive control
      • Playing games
  2. Reinforcement Learning
    • Agent has a context and set of choices
    • Each choice has an (unknown) reward
    • Goal: Maximize cumulative reward
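This framing (choices with unknown rewards, maximize cumulative reward) is the classic multi-armed bandit setting. A minimal epsilon-greedy sketch, with made-up payout probabilities and exploration rate, none of which come from the slides:

```python
import random

# Hypothetical two-armed bandit: each choice pays out with a hidden probability
true_rewards = [0.3, 0.7]          # unknown to the agent

def pull(arm):
    return 1 if random.random() < true_rewards[arm] else 0

# Epsilon-greedy agent: mostly exploit the best-known arm, sometimes explore
counts, totals = [0, 0], [0.0, 0.0]
random.seed(42)
for step in range(2000):
    if random.random() < 0.1:                       # explore at random
        arm = random.randrange(2)
    else:                                           # exploit current estimates
        estimates = [t / c if c else 0.0 for t, c in zip(totals, counts)]
        arm = estimates.index(max(estimates))
    reward = pull(arm)
    counts[arm] += 1
    totals[arm] += reward

print(counts)  # the better arm should be pulled far more often
```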
  3. 0 1

  4. Cool learning algorithm

    def classify(datapoint, weights):
        prediction = sum(x * y for x, y in zip([1] + datapoint, weights))
        if prediction < 0:
            return 0
        else:
            return 1
  6. Cool learning algorithm

    def train(data_set):
        class Datum:
            def __init__(self, features, label):
                self.features = [1] + features
                self.label = label
  7. Cool learning algorithm

    def train(data_set):
        weights = [0] * len(data_set[0].features)
        total_error = threshold + 1
        while total_error > threshold:
            total_error = 0
            for item in data_set:
                error = item.label - classify(item.features, weights)
                weights = [w + RATE * error * i
                           for w, i in zip(weights, item.features)]
                total_error += abs(error)
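The slides leave RATE and threshold undefined, and the bias input is prepended in two places at once (in classify and in Datum). A self-contained variant of the same perceptron rule, with assumed values (RATE = 0.1, stop after an error-free pass) and the bias handled once, trained on AND, which is linearly separable:

```python
RATE = 0.1        # learning rate (not defined on the slides; assumed)
threshold = 0.0   # stop when a full pass over the data makes no mistakes

def classify(datapoint, weights):
    # Weighted sum over the inputs, with an implicit bias input of 1
    prediction = sum(x * y for x, y in zip([1] + datapoint, weights))
    return 0 if prediction < 0 else 1

def train(data):
    weights = [0.0] * (len(data[0][0]) + 1)   # +1 for the bias weight
    total_error = threshold + 1
    while total_error > threshold:
        total_error = 0
        for features, label in data:
            error = label - classify(features, weights)
            weights = [w + RATE * error * i
                       for w, i in zip(weights, [1] + features)]
            total_error += abs(error)
    return weights

# Learn the AND function
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights = train(data)
print([classify(f, weights) for f, _ in data])  # → [0, 0, 0, 1]
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the loop terminates.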
  10. Cool learning algorithm

    weights = [w + RATE * error * i
               for w, i in zip(weights, item.features)]

    [diagram: weighted sum, with the bias input 1 and inputs i1, i2, …, in multiplied by weights w0, w1, w2, …, wn and summed (Σ)]
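The highlighted update runs elementwise over the weight and input vectors; with NumPy it collapses to a single broadcasted expression. The numbers below are arbitrary, chosen only to show the two forms agree:

```python
import numpy as np

weights = np.array([0.5, -0.2, 0.3])    # w0 (bias), w1, w2
features = np.array([1.0, 2.0, -1.0])   # bias input 1, i1, i2
RATE, error = 0.1, 1

# The slide's list comprehension, as one vectorized update
weights = weights + RATE * error * features
print(weights)  # ≈ [0.6, 0.0, 0.2]
```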
  13. A brief look at unsupervised learning
    • Where did my labels go?
    • Mostly clustering, separation, association
    • Many different methods
      • Self-organizing map
      • Expectation-maximization
      • Association rule learning
      • Hierarchical clustering
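Of the methods listed, hierarchical clustering is easy to sketch with SciPy. The six toy points, average linkage, and cut into two clusters below are illustrative choices:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 1-D points forming two obvious groups (toy data)
points = np.array([[1.0], [1.1], [1.2], [8.0], [8.1], [8.2]])

# Agglomerative (bottom-up) hierarchical clustering with average linkage
tree = linkage(points, method='average')

# Cut the hierarchy into at most 2 flat clusters
labels = fcluster(tree, t=2, criterion='maxclust')
print(labels)  # one cluster id per point; the two groups get distinct ids
```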
  14. Cool clustering algorithm

    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
  15. Cool clustering algorithm

    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
  16. Cool clustering algorithm

    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
        it = 0
        old_centers = None
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
  17. Cool clustering algorithm

    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
        it = 0
        old_centers = None
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
            old_centers = np.copy(centers)
  18. Cool clustering algorithm

    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
        it = 0
        old_centers = None
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
            old_centers = np.copy(centers)
            update_labels(labeled, centers)
  19. Cool clustering algorithm

    def update_labels(data, centers):
        for datum in data:
            datum[-1] = centers[0, -1]
            min_dist = distance.euclidean(datum[:-1], centers[0, :-1])
            for center in centers:
                dist = distance.euclidean(datum[:-1], center[:-1])
                if dist < min_dist:
                    min_dist = dist
                    datum[-1] = center[-1]
  24. Cool clustering algorithm

    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
        it = 0
        old_centers = None
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
            old_centers = np.copy(centers)
            update_labels(labeled, centers)
            update_centers(labeled, centers)
  25. Cool clustering algorithm

    def update_centers(data, centers):
        k = len(centers)
        for i in range(1, k + 1):
            cluster = data[data[:, -1] == i, :-1]
            centers[i - 1, :-1] = np.mean(cluster, axis=0)
  30. Cool clustering algorithm

    import numpy as np
    from scipy.spatial import distance  # used by update_labels

    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
        it = 0
        old_centers = None
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
            old_centers = np.copy(centers)
            update_labels(labeled, centers)
            update_centers(labeled, centers)
        return labeled
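The same k-means loop can be written vectorized with NumPy broadcasting instead of per-point distance calls. This sketch also simplifies the slide's random initialization to "first k points" so the toy run below is deterministic; that is an assumption for illustration, not the deck's code:

```python
import numpy as np

def kmeans(data, k, max_it=1000):
    # Simplified init: first k points as centers (the slide samples k random points)
    centers = data[:k].astype(float)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(max_it):
        # Distance from every point to every center, via broadcasting
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        new_centers = np.array([data[labels == i].mean(axis=0) for i in range(k)])
        if np.array_equal(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two tight groups; k-means recovers them
data = np.array([[0., 0.], [0., 1.], [1., 0.],
                 [8., 8.], [8., 9.], [9., 8.]])
labels, centers = kmeans(data, k=2)
print(labels)  # → [0 0 0 1 1 1]
```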
  32. Data pre-processing
    • Data selection
    • Data processing
      • Formatting & Cleaning
      • Sampling
    • Data transformation
      • Feature scaling & Normalization
      • Decomposition & Aggregation
      • Dimensionality reduction
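Two of the transformations listed, feature scaling and dimensionality reduction, are one-liners in scikit-learn. The synthetic two-column data below is an illustrative stand-in:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Two features on wildly different scales (e.g. metres vs. grams)
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])

# Feature scaling: zero mean, unit variance per column
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.std(axis=0))   # ≈ [1. 1.]

# Dimensionality reduction: keep only the top principal component
X_reduced = PCA(n_components=1).fit_transform(X_scaled)
print(X_reduced.shape)        # (100, 1)
```

Scaling first matters here: without it, the second column's huge variance would dominate the principal component.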
  33. Further resources
    • Code for slides: github.com/breandan/ml-exercises
    • Neural Networks Demystified, Stephen Welch
    • Machine Learning, Andrew Ng: https://www.coursera.org/learn/machine-learnin
    • Awesome public data sets: github.com/caesar0301/awesome-public-datasets