
Tutorial: Deep Learning on Java (Jfokus 2017)


Machine learning has recently made enormous progress in a variety of real-world applications, such as computer vision, speech recognition and language processing. And now, Java has the libraries to help you apply these techniques to large data sets, with new Spark-based tools like Deeplearning4j (DL4J). In this workshop, you will learn the basic building blocks for deep learning: gradient descent, backpropagation, model training and evaluation. We'll cover how to build and train supervised machine learning models, give you an overview of deep learning and show you how to recognize handwritten digits. You will gain an intuition for how to develop custom models to discover new insights and untapped patterns in big data. No prior experience in machine learning is required.

Breandan Considine

February 06, 2017

Transcript

  1. Who am I?
     • Background in Computer Science, Machine Learning
     • Worked for a small ad-tech startup out of university
     • Spent two years as Developer Advocate @JetBrains
     • Interested in machine learning and speech recognition
     • Enjoy writing code, traveling to conferences, reading
     • Say hello! @breandan | breandan.net | [email protected]
  2. Early Speech Recognition
     • Requires lots of handmade feature engineering
     • Poor results: >25% WER for HMM architectures
  3. What is machine learning?
     • Prediction
     • Categorization
     • Anomaly detection
     • Personalization
     • Adaptive control
     • Playing games
  4. Traditional education
     • One-size-fits-all curriculum
     • Teaching process is repetitive
     • Students are not fully engaged
     • Memorization over understanding
     • Encouragement can be inconsistent
     • Teaches to the test (not the real world)
  5. How can we improve education?
     • Personalized learning
     • Teaching assistance
     • Adaptive feedback
     • Active engagement
     • Spaced repetition
     • Assistive technology
  11. Machine learning, for humans
      • Self improvement
      • Language learning
      • Computer training
      • Special education
      • Reading comprehension
      • Content generation
  12. What's a Tensor?
      • A "tensor" is just an n-dimensional array
      • Useful for working with complex data
      • We use (tiny) tensors every day!
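To make the slide concrete, here is a minimal NumPy sketch (my own addition, not from the deck): the "rank" of a tensor is simply the number of axes of the array.

      import numpy as np

      scalar = np.array(5.0)                       # rank-0 tensor: a single number
      vector = np.array([1.0, 2.0, 3.0])           # rank-1 tensor: a list of numbers
      matrix = np.array([[1.0, 2.0], [3.0, 4.0]])  # rank-2 tensor: a grid of numbers
      image = np.zeros((28, 28, 3))                # rank-3 tensor: e.g. a 28x28 RGB image

      for t in (scalar, vector, matrix, image):
          print(t.ndim, t.shape)                   # rank and shape of each tensor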
  18. 0 1

  19. Cool learning algorithm
      def classify(datapoint, weights):
          # Weighted sum of the inputs, with a constant 1 prepended for the bias weight.
          prediction = sum(x * y for x, y in zip([1] + datapoint, weights))
  20. Cool learning algorithm
      def classify(datapoint, weights):
          prediction = sum(x * y for x, y in zip([1] + datapoint, weights))
          if prediction < 0:
              return 0
          else:
              return 1
  22. Cool learning algorithm
      def train(data_set):
          ...

      class Datum:
          def __init__(self, features, label):
              self.features = [1] + features  # prepend a constant bias input
              self.label = label
  23. Cool learning algorithm
      def train(data_set):
          weights = [0] * len(data_set[0].features)
          total_error = threshold + 1
          while total_error > threshold:
              total_error = 0
              for item in data_set:
                  # item.features already includes the bias input, which classify() re-adds
                  error = item.label - classify(item.features[1:], weights)
                  weights = [w + RATE * error * i
                             for w, i in zip(weights, item.features)]
                  total_error += abs(error)
  26. Cool learning algorithm
      weights = [w + RATE * error * i for w, i in zip(weights, item.features)]
      [Diagram: inputs 1, i1, i2, ..., in are each multiplied by weights w0, w1, w2, ..., wn and summed (Σ)]
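The classify/train pair above is a single-layer perceptron. Below is a hedged sketch of how it might be driven end to end; the RATE and threshold values and the tiny AND dataset are my own illustrative choices, and it assumes train() is extended to finish with `return weights` (the slide version does not return them).

      RATE = 0.1        # learning rate; value chosen for illustration only
      threshold = 0.0   # stop once an epoch produces zero misclassifications

      # The logical AND function is linearly separable, so the perceptron can learn it.
      data = [Datum([0, 0], 0), Datum([0, 1], 0),
              Datum([1, 0], 0), Datum([1, 1], 1)]

      weights = train(data)             # assumes train() returns the learned weights
      print(classify([1, 1], weights))  # expected: 1
      print(classify([0, 1], weights))  # expected: 0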
  29. Backpropagation
      train(trainingSet):
          initialize network weights randomly
          until average error stops decreasing (or you get tired):
              for each sample in trainingSet:
                  prediction = network.output(sample)
                  compute error (prediction - sample.output)
                  compute error of (hidden -> output) layer weights
                  compute error of (input -> hidden) layer weights
                  update weights across the network
          save the weights
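As a concrete, hedged illustration of that pseudocode, here is a minimal NumPy sketch of backpropagation for a one-hidden-layer network trained on XOR. The layer sizes, learning rate, epoch count and the cross-entropy-style output error are my own choices, not taken from the deck.

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      # XOR inputs with a constant bias column appended (the "bias trick").
      X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
      y = np.array([[0], [1], [1], [0]], dtype=float)

      rng = np.random.default_rng(0)
      W1 = rng.normal(size=(3, 4))   # (input + bias) -> hidden weights
      W2 = rng.normal(size=(5, 1))   # (hidden + bias) -> output weights

      rate = 0.5
      for epoch in range(10000):
          # Forward pass.
          hidden = sigmoid(X @ W1)
          hidden_b = np.hstack([hidden, np.ones((len(X), 1))])  # add a bias unit
          output = sigmoid(hidden_b @ W2)

          # Backward pass: propagate the output error back toward the input layer.
          output_delta = output - y   # gradient of the cross-entropy loss w.r.t. the output pre-activation
          hidden_delta = (output_delta @ W2[:-1].T) * hidden * (1 - hidden)

          # Gradient-descent weight updates.
          W2 -= rate * hidden_b.T @ output_delta
          W1 -= rate * X.T @ hidden_delta

      print(np.round(output, 2))  # should approach [[0], [1], [1], [0]]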
  30. What is a kernel?
      • A kernel is just a matrix
      • Used for edge detection, blurs, filters
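A minimal NumPy sketch (mine, not from the deck) of what "a kernel is just a matrix" means in practice: slide a 3x3 kernel over a small grayscale image and take a weighted sum at each position. The Laplacian-style kernel below responds only where pixel values change, i.e. at edges.

      import numpy as np

      kernel = np.array([[ 0, -1,  0],
                         [-1,  4, -1],
                         [ 0, -1,  0]], dtype=float)   # a simple edge-detection kernel

      image = np.zeros((6, 6))
      image[:, 3:] = 1.0        # left half dark, right half bright: a vertical edge

      def apply_kernel(img, k):
          kh, kw = k.shape
          out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
          for r in range(out.shape[0]):
              for c in range(out.shape[1]):
                  out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)  # weighted sum under the kernel
          return out

      print(apply_kernel(image, kernel))   # non-zero responses appear only along the edge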
  31. Multi-layer Network Configuration
      MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
          .seed(12345)
          .optimizationAlgo(STOCHASTIC_GRADIENT_DESCENT)
          .iterations(1)
          .learningRate(0.006)
          .updater(NESTEROVS).momentum(0.9)
          .regularization(true).l2(1e-4)
          .list()
          .layer(0, new DenseLayer.Builder()
              .nIn(28 * 28)  // Number of input datapoints.
              .nOut(1000)    // Number of output datapoints.
              .activation(Activation.RELU)
              .weightInit(XAVIER)
              .build())
          .layer(1, new OutputLayer.Builder(NEGATIVELOGLIKELIHOOD)
              .nIn(1000)
              .nOut(10)
              .activation(SOFTMAX)
              .weightInit(XAVIER)
              .build())
          .pretrain(false).backprop(true)
          .build();
  32. Training the model
      DataSetIterator dataSetIterator = ...
      for (int i = 0; i < numEpochs; i++) {
          model.fit(dataSetIterator);
      }
  33. Evaluation
      log.info("Evaluating model....");
      Evaluation evaluator = new Evaluation(outputNum);
      while (dataSetIterator.hasNext()) {
          DataSet next = dataSetIterator.next();
          INDArray output = model.output(next.getFeatures());  // network predictions for this batch
          evaluator.eval(next.getLabels(), output);
      }
      log.info(evaluator.stats());
  34. Data Science/Engineering
      • Data selection
      • Data processing
      • Formatting & Cleaning
      • Sampling
      • Data transformation
      • Feature scaling & Normalization
      • Decomposition & Aggregation
      • Dimensionality reduction
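Feature scaling and normalization from the list above fit in a few lines; a minimal NumPy sketch (my own, not from the deck):

      import numpy as np

      X = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0]])   # two features on very different scales

      # Min-max scaling: squash each feature (column) into [0, 1].
      X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

      # Z-score standardization: zero mean, unit variance per feature.
      X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

      print(X_minmax)
      print(X_standard)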
  35. Common Mistakes
      • Training set: 70%/30% split
      • Test set: do not show this to your model!
      • Sensitivity vs. specificity
      • Overfitting
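A hedged sketch (not from the deck) of the 70%/30% split and the two metrics mentioned above; the data here is random and only illustrates the mechanics:

      import numpy as np

      rng = np.random.default_rng(42)
      X = rng.normal(size=(100, 5))        # toy features
      y = rng.integers(0, 2, size=100)     # toy binary labels

      # Shuffle once, then split 70% / 30%.
      idx = rng.permutation(len(X))
      split = int(0.7 * len(X))
      train_idx, test_idx = idx[:split], idx[split:]
      X_train, y_train = X[train_idx], y[train_idx]
      X_test, y_test = X[test_idx], y[test_idx]    # held out: never used for fitting or tuning

      # Sensitivity (true positive rate) and specificity (true negative rate)
      # for some predictions y_pred on the held-out test set.
      y_pred = rng.integers(0, 2, size=len(y_test))   # placeholder predictions
      tp = np.sum((y_pred == 1) & (y_test == 1))
      tn = np.sum((y_pred == 0) & (y_test == 0))
      fp = np.sum((y_pred == 1) & (y_test == 0))
      fn = np.sum((y_pred == 0) & (y_test == 1))
      print("sensitivity:", tp / (tp + fn))
      print("specificity:", tn / (tn + fp))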
  36. Training your own model
      • Requirements
      • Clean, labeled data set
      • Clear decision problem
      • Patience and/or GPUs
      • Before you start
  37. Preparing data for ML
      • Generating Labels
      • Dimensionality reduction
      • Determining salient features
      • Visualizing the shape of your data
      • Correcting statistical bias
      • Getting data in the right format
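Of the items above, dimensionality reduction is the easiest to show in a few lines; a minimal PCA sketch (my own, not from the deck) using NumPy's SVD:

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 10))   # 200 samples, 10 features

      # Principal component analysis via SVD: center, decompose, project.
      X_centered = X - X.mean(axis=0)
      U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

      k = 2                            # keep the top two principal components
      X_reduced = X_centered @ Vt[:k].T
      print(X_reduced.shape)           # (200, 2)

      explained = (S[:k] ** 2).sum() / (S ** 2).sum()
      print(f"variance explained by {k} components: {explained:.2%}")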
  38. Cool clustering algorithm
      def cluster(data, k, max_it=1000):
          # Append a label column (initially 0) to every data point.
          labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
          # Pick k distinct points at random as the initial cluster centers.
          random_pts = np.random.choice(len(labeled), k, replace=False)
  43. Cool clustering algorithm
      def update_labels(data, centers):
          # Relabel each point with the label of its nearest center.
          # (`distance` here is scipy.spatial.distance.)
          for datum in data:
              datum[-1] = centers[0, -1]
              min_dist = distance.euclidean(datum[:-1], centers[0, :-1])
              for center in centers:
                  dist = distance.euclidean(datum[:-1], center[:-1])
                  if dist < min_dist:
                      min_dist = dist
                      datum[-1] = center[-1]
  49. Cool clustering algorithm
      def update_centers(data, centers):
          # Move each center to the mean of the points currently assigned to it.
          k = len(centers)
          for i in range(1, k + 1):
              cluster = data[data[:, -1] == i, :-1]
              centers[i - 1, :-1] = np.mean(cluster, axis=0)
  54. Cool clustering algorithm
      def cluster(data, k, max_it=1000):
          labeled = np.append(data, np.zeros((len(data), 1)), axis=1)
          random_pts = np.random.choice(len(labeled), k, replace=False)
          centers = labeled[random_pts]
          centers[:, -1] = range(1, k + 1)  # Assign labels
          it = 0
          old_centers = None
          while it < max_it and not np.array_equal(old_centers, centers):
              it += 1
              old_centers = np.copy(centers)
              update_labels(labeled, centers)
              update_centers(labeled, centers)
          return labeled
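To tie the clustering slides together, here is a hedged sketch (mine, not from the deck) of running cluster() on a toy 2-D dataset; it supplies the imports the slide code relies on (numpy as np, and scipy.spatial.distance as the `distance` used in update_labels):

      import numpy as np
      from scipy.spatial import distance   # used by update_labels above

      # Two well-separated blobs of 2-D points.
      rng = np.random.default_rng(1)
      blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
      blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))
      data = np.vstack([blob_a, blob_b])

      labeled = cluster(data, k=2)
      # The last column now holds the cluster label (1 or 2) for each point.
      for label in (1, 2):
          members = labeled[labeled[:, -1] == label]
          print(f"cluster {label}: {len(members)} points, mean {members[:, :-1].mean(axis=0)}")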
  56. Further resources
      • CS231 Course Notes
      • Deeplearning4j Examples
      • Visualizing MNIST
      • Neural Networks and Deep Learning
      • Andrew Ng's Machine Learning class
      • Awesome Public Datasets