Deep Learning: An Introduction [Devoxx Belgium 2016]

Neural networks have seen renewed interest from data scientists and machine learning experts for their ability to accurately classify high-dimensional data. In this session we will discuss the fundamental algorithms behind neural networks and develop an intuition for how to train a deep neural network on large data sets. We will then use these algorithms to train a simple handwritten digit recognizer, and show how the technique generalizes to other images. In the second and final part, we will show you how to apply the same algorithms using DL4J, a Java deep learning library that integrates with Apache Spark. You will learn how to implement a neural network, monitor its training progress, and test its accuracy over time. Prior experience with Java and some basic algebra is a prerequisite.

Breandan Considine

November 09, 2016

Transcript

  1-2. Who am I?
    • Background in Computer Science, Machine Learning
    • Worked for a small ad-tech startup out of university
    • Spent two years as a Developer Advocate @JetBrains
    • Interested in machine learning and speech recognition
    • Enjoy writing code, traveling to conferences, reading
    • Say hello! @breandan | breandan.net | [email protected]
  3. Traditional ASR (automatic speech recognition)
    • Requires lots of hand-crafted feature engineering
    • Poor results: >25% word error rate (WER) for HMM-based architectures
  4. Why machine learning?
    • Prediction
    • Categorization
    • Anomaly detection
    • Personalization
    • Adaptive control
    • Playing games
  5. Traditional childhood education
    • One-size-fits-all curriculum
    • Teaching process is repetitive
    • Students are not fully engaged
    • Memorization over understanding
    • Encouragement can be inconsistent
    • Teaches to the test (not the real world)
  6-11. How can we disrupt early education?
    • Personalized learning
    • Teaching assistance
    • Adaptive feedback
    • Active engagement
    • Spaced repetition
    • Assistive technology
  12. Machine learning, for humans
    • Self-improvement
    • Language learning
    • Computer training
    • Special education
    • Reading comprehension
    • Content generation
  13. So what's a Tensor?
    • A "tensor" is just an n-dimensional array
    • Useful for working with complex data
    • We use (tiny) tensors every day!
  14-17. So what's a Tensor anyway?
    • A "tensor" is just an n-dimensional array
    • Useful for working with complex data
    • We use (tiny) tensors every day!
  18. So what's a Tensor anyway?
    • A "tensor" is just an n-dimensional array
    • Useful for working with complex data
    • We use tiny (and large) tensors every day!
  19. What are they good for?
    • Modeling complex systems and data sets (see the NumPy sketch below)
    • Capturing higher-order correlations
    • Representing dynamic relationships
    • Doing machine learning!
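    To make the "n-dimensional array" idea concrete, here is a small NumPy sketch; the shapes and variable names below are illustrative choices, not taken from the slides:

    import numpy as np

    scalar = np.array(5.0)              # rank-0 tensor: a single number
    vector = np.array([1.0, 2.0, 3.0])  # rank-1 tensor: e.g. a feature vector
    image  = np.zeros((28, 28))         # rank-2 tensor: a grayscale image
    rgb    = np.zeros((28, 28, 3))      # rank-3 tensor: an RGB image
    batch  = np.zeros((64, 28, 28, 3))  # rank-4 tensor: a mini-batch of RGB images

    for t in (scalar, vector, image, rgb, batch):
        print(t.ndim, t.shape)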
  20. 0 1

  21-22. Cool learning algorithm
    def classify(datapoint, weights):
        # Weighted sum of the inputs, with a constant 1 prepended so that
        # weights[0] acts as the bias term
        prediction = sum(x * y for x, y in zip([1] + datapoint, weights))
        if prediction < 0:
            return 0
        else:
            return 1
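    A quick sanity check of classify with made-up numbers (these weights are not from the deck); the prepended 1 pairs with weights[0], which therefore acts as the bias:

    print(classify([2, 3], [0.5, -1.0, 1.0]))   # 0.5*1 - 1.0*2 + 1.0*3 = 1.5 >= 0 -> 1
    print(classify([2, 3], [-4.0, -1.0, 1.0]))  # -4.0 - 2.0 + 3.0 = -3.0 < 0     -> 0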
  23. Cool learning algorithm
    def train(data_set):  # data_set is a list of Datum objects
        ...

    class Datum:
        def __init__(self, features, label):
            self.features = [1] + features  # prepend the constant bias input
            self.label = label
  24-26. Cool learning algorithm
    def train(data_set):
        weights = [0] * len(data_set[0].features)
        total_error = threshold + 1   # threshold: the total error at which to stop
        while total_error > threshold:
            total_error = 0
            for item in data_set:
                # classify prepends the bias itself, so pass the raw features
                error = item.label - classify(item.features[1:], weights)
                # RATE is the learning rate
                weights = [w + RATE * error * i
                           for w, i in zip(weights, item.features)]
                total_error += abs(error)
        return weights
  27. Cool learning algorithm
    weights = [w + RATE * error * i for w, i in zip(weights, item.features)]
    [Diagram: a perceptron forming the weighted sum Σ of inputs 1, i1, i2, ..., in with weights w0, w1, w2, ..., wn]
  28-29. Cool learning algorithm
    def train(data_set):
        weights = [0] * len(data_set[0].features)
        total_error = threshold + 1
        while total_error > threshold:
            total_error = 0
            for item in data_set:
                error = item.label - classify(item.features[1:], weights)
                weights = [w + RATE * error * i
                           for w, i in zip(weights, item.features)]
                total_error += abs(error)
        return weights
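    The training loop above leaves RATE and threshold as module-level constants. Here is a hedged usage sketch on a toy problem; the constants and the data set below are assumptions for illustration, not part of the deck:

    RATE = 0.1       # learning rate
    threshold = 0    # stop once an epoch makes no classification errors

    # Logical OR is linearly separable, so the perceptron is guaranteed to converge.
    data_set = [Datum([0, 0], 0), Datum([0, 1], 1), Datum([1, 0], 1), Datum([1, 1], 1)]
    weights = train(data_set)
    print([classify(d.features[1:], weights) for d in data_set])  # [0, 1, 1, 1]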
  30. Even Cooler Algorithm! (Backprop)
    train(trainingSet):
        initialize network weights randomly
        until average error stops decreasing (or you get tired):
            for each sample in trainingSet:
                prediction = network.output(sample)
                compute error (prediction - sample.output)
                compute error of (hidden -> output) layer weights
                compute error of (input -> hidden) layer weights
                update weights across the network
        save the weights
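    A minimal NumPy sketch of that loop, assuming one hidden layer, sigmoid activations, and squared error; every name and hyper-parameter here is an illustrative assumption rather than the deck's own code:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Tiny 2-4-1 network trained on XOR (made-up example data).
    np.random.seed(0)
    W1, b1 = np.random.randn(2, 4), np.zeros(4)   # input -> hidden weights
    W2, b2 = np.random.randn(4, 1), np.zeros(1)   # hidden -> output weights
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    rate = 1.0

    for epoch in range(5000):
        # Forward pass: the network's output for the whole batch at once
        hidden = sigmoid(X @ W1 + b1)
        prediction = sigmoid(hidden @ W2 + b2)
        # Backward pass: error at the output, then propagated back to the hidden layer
        d_out = (prediction - y) * prediction * (1 - prediction)
        d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
        # Update weights across the network
        W2 -= rate * hidden.T @ d_out
        b2 -= rate * d_out.sum(axis=0)
        W1 -= rate * X.T @ d_hidden
        b1 -= rate * d_hidden.sum(axis=0)

    print(prediction.round(2).ravel())  # should approach [0, 1, 1, 0]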
  31. A brief look at unsupervised learning
    • Where did my labels go?
    • Mostly clustering, separation, association
    • Many different methods:
      • Self-organizing maps
      • Expectation-maximization
      • Association rule learning
      • Recommender systems
  32-36. Cool clustering algorithm
    def cluster(data, k, max_it=1000):
        # Append a label column (initially 0) to every data point
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        # Pick k distinct points at random as the initial cluster centers
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels 1..k to the centers
        it = 0
        old_centers = None
        # Iterate until the centers stop moving (or we hit max_it)
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
            old_centers = np.copy(centers)
            update_labels(labeled, centers)
  37-41. Cool clustering algorithm
    def update_labels(data, centers):
        # Assign every point the label of its nearest center
        for datum in data:
            datum[-1] = centers[0, -1]
            min_dist = distance.euclidean(datum[:-1], centers[0, :-1])
            for center in centers:
                dist = distance.euclidean(datum[:-1], center[:-1])
                if dist < min_dist:
                    min_dist = dist
                    datum[-1] = center[-1]
  42. Cool clustering algorithm
    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
        it = 0
        old_centers = None
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
            old_centers = np.copy(centers)
            update_labels(labeled, centers)
            update_centers(labeled, centers)  # new step: recompute the centers each pass
  43-47. Cool clustering algorithm
    def update_centers(data, centers):
        # Move each center to the mean of the points currently assigned to it
        k = len(centers)
        for i in range(1, k + 1):
            cluster = data[data[:, -1] == i, :-1]
            centers[i - 1, :-1] = np.mean(cluster, axis=0)
  48-49. Cool clustering algorithm
    def cluster(data, k, max_it=1000):
        labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
        random_pts = np.random.choice(len(labeled), k, replace=False)
        centers = labeled[random_pts]
        centers[:, -1] = range(1, k + 1)  # Assign labels
        it = 0
        old_centers = None
        while it < max_it and not np.array_equal(old_centers, centers):
            it += 1
            old_centers = np.copy(centers)
            update_labels(labeled, centers)
            update_centers(labeled, centers)
        return labeled  # each row: the original point plus its cluster label
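    A hedged usage sketch showing cluster, update_labels, and update_centers working together; the imports, the synthetic blobs, and k=2 below are assumptions for illustration, not part of the deck:

    import numpy as np
    from scipy.spatial import distance  # provides distance.euclidean used above

    np.random.seed(42)
    blob_a = np.random.randn(50, 2)             # 50 points around (0, 0)
    blob_b = np.random.randn(50, 2) + [10, 10]  # 50 points around (10, 10)
    data = np.vstack([blob_a, blob_b])

    labeled = cluster(data, k=2)
    print(labeled.shape)              # (100, 3): x, y, and the assigned cluster label
    print(np.unique(labeled[:, -1]))  # the k cluster labels, here [1. 2.]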
  50. Data pre-processing
    • Data selection
    • Data processing
    • Formatting & cleaning
    • Sampling
    • Data transformation
    • Feature scaling & normalization (see the sketch below)
    • Decomposition & aggregation
    • Dimensionality reduction
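    As a small illustration of the feature scaling & normalization step (the values below are made up):

    import numpy as np

    X = np.array([[180.0, 75.0], [160.0, 55.0], [170.0, 65.0]])  # e.g. height (cm), weight (kg)

    # Min-max scaling squashes every column into [0, 1].
    X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

    # Standardization (z-scores) gives each column zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    print(X_minmax)
    print(X_std)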
  51. Reinforcement Learning
    • Agent has a context and a set of choices
    • Each choice has an (unknown) reward
    • Goal: maximize cumulative reward (see the bandit sketch below)
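    That setting can be sketched as a multi-armed bandit with an epsilon-greedy agent; the reward probabilities and constants below are invented for illustration and are not from the deck:

    import random

    true_reward = [0.2, 0.5, 0.8]         # unknown to the agent: P(reward) per choice
    estimates = [0.0] * len(true_reward)  # the agent's running reward estimate per choice
    counts = [0] * len(true_reward)
    epsilon, total = 0.1, 0

    for step in range(10000):
        # Explore with probability epsilon, otherwise exploit the best estimate so far
        if random.random() < epsilon:
            arm = random.randrange(len(true_reward))
        else:
            arm = max(range(len(true_reward)), key=lambda a: estimates[a])
        reward = 1 if random.random() < true_reward[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward

    print(counts, round(total / 10000, 2))  # most pulls should go to the 0.8 arm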
  52. Further resources
    • CS231n Course Notes
    • TensorFlow Models
    • Visualizing MNIST
    • Neural Networks and Deep Learning
    • Andrew Ng's Machine Learning class
    • Awesome Public Datasets
    • Amy Unruh & Eli Bixby's TensorFlow Workshop