Hugo Larochelle: Neural Networks

Hugo Larochelle
Google Brain
Slides from CIFAR Deep Learning Summer School

ML Review
July 04, 2017

Transcript

  1. SUPERVISED LEARNING
     Topics: supervised learning
     • Training time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y)
     • Test time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y)
     • Example
       ‣ classification
       ‣ regression
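The train/test setup above can be sketched in a few lines of numpy; the data-generating p(x, y) below is a made-up toy regression, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy p(x, y): x ~ N(0, 1), y = 2x + 1 + small noise (assumed for illustration).
def sample(n):
    x = rng.normal(size=n)
    y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=n)
    return x, y

# Training time: pairs {x^(t), y^(t)} drawn from p(x, y).
x_train, y_train = sample(1000)

# Fit a linear regression by least squares.
X = np.stack([x_train, np.ones_like(x_train)], axis=1)
w, b = np.linalg.lstsq(X, y_train, rcond=None)[0]

# Test time: fresh pairs from the same p(x, y).
x_test, y_test = sample(1000)
test_mse = np.mean((w * x_test + b - y_test) ** 2)
print(f"w={w:.2f}, b={b:.2f}, test MSE={test_mse:.4f}")
```

Because train and test pairs come from the same distribution, the fitted parameters transfer to the test set; later slides change exactly this assumption.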
  2. UNSUPERVISED LEARNING
     Topics: unsupervised learning
     • Training time
       ‣ data: { x^(t) }
       ‣ setting: x^(t) ~ p(x)
     • Test time
       ‣ data: { x^(t) }
       ‣ setting: x^(t) ~ p(x)
     • Example
       ‣ distribution estimation
       ‣ dimensionality reduction
  3. SEMI-SUPERVISED LEARNING
     Topics: semi-supervised learning
     • Training time
       ‣ data: labeled pairs { x^(t), y^(t) } and unlabeled inputs { x^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y) for the labeled pairs, x^(t) ~ p(x) for the unlabeled inputs
     • Test time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y)
  4. MULTITASK LEARNING
     Topics: multitask learning
     • Training time
       ‣ data: { x^(t), y_1^(t), ..., y_M^(t) }
       ‣ setting: x^(t), y_1^(t), ..., y_M^(t) ~ p(x, y_1, ..., y_M)
     • Test time
       ‣ data: { x^(t), y_1^(t), ..., y_M^(t) }
       ‣ setting: x^(t), y_1^(t), ..., y_M^(t) ~ p(x, y_1, ..., y_M)
     • Example
       ‣ object recognition in images with multiple objects
  5. MULTITASK LEARNING
     Topics: multitask learning
     • Slide shows a feedforward neural network with shared hidden layers feeding a task-specific output y, built from the "Feedforward neural network" slide math:
       ‣ pre-activation: a(x) = b + Σ_i w_i x_i = b + w^T x
       ‣ hidden unit: h(x) = g(a(x)) = g(b + Σ_i w_i x_i), with inputs x_1 … x_d, bias b, weights w_1 … w_d
       ‣ activation functions: g(a) = a; g(a) = sigm(a) = 1 / (1 + exp(−a)); g(a) = tanh(a) = (exp(a) − exp(−a)) / (exp(a) + exp(−a)) = (exp(2a) − 1) / (exp(2a) + 1); g(a) = reclin(a) = max(0, a)
       ‣ output layer: o(a) = softmax(a) = [ exp(a_1)/Σ_c exp(a_c), …, exp(a_C)/Σ_c exp(a_c) ]^T, giving p(y = c | x)
       ‣ layer recursion: a^(k)(x) = b^(k) + W^(k) h^(k−1)(x), with h^(0)(x) = x; h^(k)(x) = g(a^(k)(x)); h^(L+1)(x) = o(a^(L+1)(x)) = f(x), with parameters W^(1), W^(2), W^(3), b^(1), b^(2), b^(3)
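The feedforward equations on this slide translate directly into numpy; the layer sizes below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(a):
    # reclin (ReLU) activation: g(a) = max(0, a)
    return np.maximum(0.0, a)

def softmax(a):
    # o(a) = softmax(a); subtracting the max is for numerical stability
    e = np.exp(a - a.max())
    return e / e.sum()

# Layer sizes: d inputs, two hidden layers, C output classes (arbitrary).
sizes = [5, 4, 3, 2]
W = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

def f(x):
    h = x  # h^(0)(x) = x
    for k in range(len(W) - 1):
        h = g(b[k] + W[k] @ h)         # h^(k)(x) = g(a^(k)(x))
    return softmax(b[-1] + W[-1] @ h)   # h^(L+1)(x) = o(a^(L+1)(x)) = f(x)

p = f(rng.normal(size=5))  # p(y = c | x) for each class c
print(p)
```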
  6. MULTITASK LEARNING
     Topics: multitask learning
     • Slide shows the same feedforward network equations as the previous slide, with the diagram extended so that shared hidden layers branch into task-specific outputs y_1, y_2, y_3.
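A minimal sketch of the multitask architecture in these two slides: one shared trunk h(x) with a separate output head per task. Sizes, activations, and the number of tasks M = 3 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk h(x) = g(b + W x), plus one output head (V_m, c_m) per task.
d, hdim, M = 5, 4, 3
W = rng.normal(scale=0.1, size=(hdim, d))
b = np.zeros(hdim)
heads = [(rng.normal(scale=0.1, size=(2, hdim)), np.zeros(2)) for _ in range(M)]

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict_all_tasks(x):
    h = np.maximum(0.0, b + W @ x)  # shared representation for every task
    # Each head m produces its own prediction y_m from the same h.
    return [sigm(c + V @ h) for V, c in heads]

ys = predict_all_tasks(rng.normal(size=d))
print(len(ys))  # one prediction vector per task
```

The shared trunk is what lets the tasks transfer information to each other: its weights receive gradients from every head.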
  7. TRANSFER LEARNING
     Topics: transfer learning
     • Training time
       ‣ data: { x^(t), y_1^(t), ..., y_M^(t) }
       ‣ setting: x^(t), y_1^(t), ..., y_M^(t) ~ p(x, y_1, ..., y_M)
     • Test time
       ‣ data: { x^(t), y_1^(t) }
       ‣ setting: x^(t), y_1^(t) ~ p(x, y_1)
  8. STRUCTURED OUTPUT PREDICTION
     Topics: structured output prediction
     • Training time
       ‣ data: { x^(t), y^(t) }, with y of arbitrary structure (vector, sequence, graph)
       ‣ setting: x^(t), y^(t) ~ p(x, y)
     • Test time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y)
     • Example
       ‣ image caption generation
       ‣ machine translation
  9. DOMAIN ADAPTATION
     Topics: domain adaptation, covariate shift
     • Training time
       ‣ data: labeled source pairs { x^(t), y^(t) } and unlabeled target inputs { x̄^(t') }
       ‣ setting: x^(t) ~ p(x), y^(t) ~ p(y | x^(t)), x̄^(t) ~ q(x), with q(x) ≉ p(x)
     • Test time
       ‣ data: { x̄^(t), y^(t) }
       ‣ setting: x̄^(t) ~ q(x), y^(t) ~ p(y | x̄^(t))
     • Example
       ‣ classify sentiment in reviews of different products
 10. DOMAIN ADAPTATION
     Topics: domain adaptation, covariate shift
     • Slide shows a network diagram: a hidden layer h(x) = sigm(b + Wx) with parameters W, b feeds both a class predictor o(h(x)) with parameters V, c and a domain classifier with parameters w, d.
     • Domain-adversarial networks (Ganin et al. 2015) train the hidden layer representation to be
       1. predictive of the target class
       2. indiscriminate of the domain
     • Trained by stochastic gradient descent
       ‣ for each random pair x^(t), x̄^(t'):
         1. update W, V, b, c in the opposite direction of the gradient
         2. update w, d in the direction of the gradient
 11. DOMAIN ADAPTATION
     Topics: domain adaptation, covariate shift
     • Same slide as the previous one, with one addition: the domain-adversarial approach may also be used to promote fair and unbiased models.
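A toy numpy sketch of the adversarial update on these slides, using one common formulation (gradient reversal): the domain classifier's parameters descend on the domain loss while the feature extractor's parameters ascend on it. The label-prediction branch is omitted for brevity, and all shapes and values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

# Linear feature map h = W x and a logistic domain classifier
# p = sigm(w . h + d); sizes are illustrative assumptions.
d_in, d_h, lr = 4, 3, 0.1
W = rng.normal(scale=0.5, size=(d_h, d_in))
w = rng.normal(scale=0.5, size=d_h)
d = 0.0

def domain_step(x, t):
    """One update on the domain loss; t = 1 for source, 0 for target."""
    global W, w, d
    h = W @ x
    p = sigm(w @ h + d)
    dz = p - t                          # gradient of cross-entropy wrt the logit
    grad_w, grad_d = dz * h, dz
    grad_W = np.outer(dz * w, x)        # gradient backpropagated to the features
    w, d = w - lr * grad_w, d - lr * grad_d  # domain classifier: descend
    W = W + lr * grad_W                       # feature extractor: ascend (reversal)
    return p

p0 = domain_step(rng.normal(size=d_in), 1.0)
print(p0)
```

The sign flip on the feature update is the whole trick: the representation is pushed toward fooling the domain classifier, i.e. toward being indiscriminate of the domain.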
 12. ONE-SHOT LEARNING
     Topics: one-shot learning
     • Training time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y), subject to y^(t) ∈ {1, ..., C}
     • Test time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y), subject to y^(t) ∈ {C+1, ..., C+M}
       ‣ side information: a single labeled example from each of the M new classes
     • Example
       ‣ recognizing a person based on a single picture of them
 13. ONE-SHOT LEARNING
     Topics: one-shot learning
     • Slide shows a Siamese architecture (figure and excerpt taken from Salakhutdinov and Hinton, 2007): two networks with shared weights W map inputs x_a and x_b to codes y_a and y_b, and a similarity metric D[y_a, y_b] is learned by maximizing the log probability of the pairs in the training set. The normalizing term scales with the number of training cases rather than the number of pixels, because only the pairings are modeled, not the structure of the individual images.
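The Siamese idea can be sketched as one shared encoder applied to both inputs, with similarity measured as a distance between the resulting codes. The encoder here is a random one-layer stand-in, not the deep network of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# One encoder with shared weights W, used for both inputs of a pair.
d_in, d_code = 8, 3
W = rng.normal(scale=0.3, size=(d_code, d_in))

def encode(x):
    return np.tanh(W @ x)  # code y = g(W x)

def distance(x_a, x_b):
    # D[y_a, y_b]: Euclidean distance between the two codes
    return np.linalg.norm(encode(x_a) - encode(x_b))

x = rng.normal(size=d_in)
x_other = rng.normal(size=d_in)
d_same = distance(x, x)
d_diff = distance(x, x_other)
print(d_same, d_diff)
```

Because the weights are shared, training the metric on pairs of known classes gives an encoder that can compare a test image against a single labeled example of a new class.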
 14. ZERO-SHOT LEARNING
     Topics: zero-shot learning, zero-data learning
     • Training time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y), subject to y^(t) ∈ {1, ..., C}
       ‣ side information: description vector z_c of each of the C classes
     • Test time
       ‣ data: { x^(t), y^(t) }
       ‣ setting: x^(t), y^(t) ~ p(x, y), subject to y^(t) ∈ {C+1, ..., C+M}
       ‣ side information: description vector z_c of each of the new M classes
     • Example
       ‣ recognizing an object based on a worded description of it
 15. ZERO-SHOT LEARNING
     Topics: zero-shot learning, zero-data learning
     • Slide shows the model of Ba, Swersky, Fidler, Salakhutdinov (arXiv 2015), trained end-to-end: a CNN f embeds the image, an MLP g embeds a TF-IDF vector of the class's Wikipedia article, and the class score is the dot product of the two embeddings (illustrated with a Wikipedia article on the Cardinals, a family of passerine birds found in North and South America).
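The dot-product scoring in this model can be sketched with random stand-ins for the two embedding networks (the real model uses a CNN for images and an MLP on TF-IDF text features; all sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random linear maps standing in for the embedding networks:
# F plays the CNN f (images), G plays the MLP g (class descriptions z_c).
d_img, d_txt, k, C = 10, 6, 4, 5
F = rng.normal(scale=0.3, size=(k, d_img))
G = rng.normal(scale=0.3, size=(k, d_txt))

def class_scores(x, Z):
    fx = F @ x                                   # image embedding f(x)
    return np.array([fx @ (G @ z) for z in Z])   # score(c) = f(x) . g(z_c)

Z = rng.normal(size=(C, d_txt))  # one description vector z_c per class
scores = class_scores(rng.normal(size=d_img), Z)
pred = int(np.argmax(scores))
print(scores.shape, pred)
```

Because classes enter only through their description vectors z_c, new classes can be scored at test time without any training images of them.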
 16. DESIGNING NEW ARCHITECTURES
     Topics: designing new architectures
     • Tackling a new learning problem often requires designing an adapted neural architecture
     • Approach 1: use our intuition for how a human would reason about the problem
     • Approach 2: take an existing algorithm/procedure and turn it into a neural network
 17. DESIGNING NEW ARCHITECTURES
     Topics: designing new architectures
     • Many other examples
       ‣ structured prediction by unrolling probabilistic inference in an MRF
       ‣ planning by unrolling the value iteration algorithm (Tamar et al., NIPS 2016)
       ‣ few-shot learning by unrolling gradient descent on a small training set (Ravi and Larochelle, ICLR 2017)
     • Slide shows Figure 1 from Ravi and Larochelle (ICLR 2017): the computational graph for the forward pass of the meta-learner, with the learning algorithm itself cast as a neural network.
 18. THEY CAN MAKE DUMB ERRORS
     Topics: adversarial examples
     • Intriguing Properties of Neural Networks
       Szegedy, Zaremba, Sutskever, Bruna, Erhan, Goodfellow, Fergus, ICLR 2014
     • Slide shows image triplets from the paper: a correctly classified image, a badly classified (adversarial) image, and their difference.
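The phenomenon can be illustrated on a toy linear classifier: in high dimensions, a small per-coordinate perturbation aligned against the weights accumulates into a large change of the score and flips the prediction. This sign-based construction follows later work on adversarial examples, not this paper, and all numbers are made up:

```python
import numpy as np

# A fixed logistic classifier: predict class 1 if w . x + b > 0 (toy, hand-set).
w = np.array([0.5, -0.4, 0.3, 0.2, -0.1] * 20)  # 100 input dimensions
b = 0.0
x = np.full(100, 0.02)  # correctly classified input with a modest margin
margin = w @ x + b      # 0.2 > 0, so x is classified as class 1

# Move every coordinate a tiny amount against the classifier's weights.
eps = 0.05
x_adv = x - eps * np.sign(w)

adv_score = w @ x_adv + b   # margin - eps * ||w||_1 = 0.2 - 1.5 = -1.3
flipped = (margin > 0) != (adv_score > 0)
print(margin, adv_score, flipped)
```

Each coordinate moved by only 0.05, yet the score fell by eps times the L1 norm of w, which is why high-dimensional models are so easy to fool this way.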
 19. THEY CAN MAKE DUMB ERRORS
     Topics: adversarial examples
     • Humans have adversarial examples too
     • However, they don't match those of neural networks
 20. THEY CAN MAKE DUMB ERRORS
     Topics: adversarial examples
     • Humans have adversarial examples too
     • However, they don't match those of neural networks
 21. THEY ARE STRANGELY NON-CONVEX
     Topics: non-convexity, saddle points
     • Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization
       Dauphin, Pascanu, Gulcehre, Cho, Ganguli, Bengio, NIPS 2014
     • Slide shows a plot of average loss against the parameter θ.
 22. THEY ARE STRANGELY NON-CONVEX
     Topics: non-convexity, saddle points
     • Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization
       Dauphin, Pascanu, Gulcehre, Cho, Ganguli, Bengio, NIPS 2014
     • Slide shows a plot of average loss against the parameter θ.
 23. THEY ARE STRANGELY NON-CONVEX
     Topics: non-convexity, saddle points
     • Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization
       Dauphin, Pascanu, Gulcehre, Cho, Ganguli, Bengio, NIPS 2014
 24. THEY ARE STRANGELY NON-CONVEX
     Topics: non-convexity, saddle points
     • Qualitatively Characterizing Neural Network Optimization Problems
       Goodfellow, Vinyals, Saxe, ICLR 2015
     • Slide shows a figure from the paper, with a caption fragment ending "…struggles to make progress."
 25. THEY ARE STRANGELY NON-CONVEX
     Topics: non-convexity, saddle points
     • If a dataset is created by labeling points using a neural network with N hidden units
       ‣ training another N-hidden-unit network is likely to fail
       ‣ but training a larger neural network is more likely to work!
         (saddle points seem to be a blessing)
 26. THEY WORK BEST WHEN BADLY TRAINED
     Topics: sharp vs. flat minima
     • Flat Minima
       Hochreiter, Schmidhuber, Neural Computation 1997
     • Excerpt shown on the slide: Hochreiter & Schmidhuber (1997) informally define a flat minimizer x̄ as one for which the function varies slowly in a relatively large neighborhood of x̄. In contrast, a sharp minimizer x̂ is such that the function increases rapidly in a small neighborhood of x̂. A flat minimum can be described with low precision, whereas a sharp minimum requires high precision. The large sensitivity of the training function at a sharp minimizer negatively impacts the ability of the trained model to generalize on new data. This can be explained through the lens of minimum description length (MDL) theory, which states that statistical models that require fewer bits to describe (i.e., are of low complexity) generalize better (Rissanen, 1983). Since flat minimizers can be specified with lower precision than sharp minimizers, they tend to have better generalization performance. Alternative explanations are proffered through the Bayesian view of learning (MacKay, 1992) and through the lens of free Gibbs energy; see e.g. Chaudhari et al. (2016).
     • Slide shows a sketch of a flat minimum and a sharp minimum of the training function f(x), with the testing function shifted so that the sharp minimum generalizes poorly.
 27. THEY WORK BEST WHEN BADLY TRAINED
     Topics: sharp vs. flat minima
     • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
       Keskar, Mudigere, Nocedal, Smelyanskiy, Tang, ICLR 2017
       ‣ found that using large batch sizes tends to find sharper minima and generalize worse
     • This means that we can't talk about generalization without taking the training algorithm into account
 28. THEY CAN EASILY MEMORIZE
     Topics: model capacity vs. training algorithm
     • Understanding Deep Learning Requires Rethinking Generalization
       Zhang, Bengio, Hardt, Recht, Vinyals, ICLR 2017
 29. THEY CAN BE COMPRESSED
     Topics: knowledge distillation
     • Distilling the Knowledge in a Neural Network
       Hinton, Vinyals, Dean, arXiv 2015
     • Slide shows a feedforward network diagram, built from the "Feedforward neural network" slide math.
 30. THEY CAN BE COMPRESSED
     Topics: knowledge distillation
     • Distilling the Knowledge in a Neural Network
       Hinton, Vinyals, Dean, arXiv 2015
     • Slide shows two feedforward network diagrams side by side, built from the "Feedforward neural network" slide math.
 31. THEY CAN BE COMPRESSED
     Topics: knowledge distillation
     • Distilling the Knowledge in a Neural Network
       Hinton, Vinyals, Dean, arXiv 2015
     • Slide shows the two feedforward network diagrams again, now connected through an output y.
 32. THEY CAN BE COMPRESSED
     Topics: knowledge distillation
     • Can successfully distill
       ‣ a large neural network
       ‣ an ensemble of neural networks
     • Works better than training the small network from scratch!
       ‣ Do Deep Nets Really Need to be Deep?
         Jimmy Ba, Rich Caruana, NIPS 2014
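The distillation recipe from the Hinton et al. paper softens the teacher's outputs with a temperature T and trains the student to match them. A sketch of the loss, with random logits standing in for both networks:

```python
import numpy as np

def softmax(z, T=1.0):
    # Softmax with temperature T; higher T gives softer probabilities.
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=5)  # stand-in for the big net's outputs
student_logits = rng.normal(size=5)  # stand-in for the small net's outputs

T = 4.0
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# Distillation loss: cross-entropy between softened teacher and student.
distill_loss = -np.sum(p_teacher * np.log(p_student))
print(p_teacher, distill_loss)
```

The softened targets carry information about class similarities that hard labels discard, which is one reason distilling can beat training the small network from scratch.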
 33. THEY ARE INFLUENCED BY INITIALIZATION
     Topics: impact of initialization
     • Why Does Unsupervised Pre-Training Help Deep Learning?
       Erhan, Bengio, Courville, Manzagol, Vincent, JMLR 2010
     • Slide shows a figure from the paper: 2D visualizations of training trajectories for 2-layer networks with and without pre-training, capturing local and global structure, with each point colored by training iteration to help follow the trajectory movement.
 34. THEY ARE INFLUENCED BY FIRST EXAMPLES
     Topics: impact of early examples
     • Why Does Unsupervised Pre-Training Help Deep Learning?
       Erhan, Bengio, Courville, Manzagol, Vincent, JMLR 2010
     • Excerpt shown on the slide: vary the first million examples (across 10 different random draws, sampling a different set of 1 million examples each time) and keep the other ones fixed. After training the 10 models, measure the variance (across the 10 draws) of the output of the networks on a fixed test set (i.e., the variance in function space). Then vary the next million examples in the same fashion, and so on, to see how much each part of the training set influenced the final function.
 35. YET THEY FORGET WHAT THEY LEARNED
     Topics: lifelong learning, continual learning
     • Overcoming Catastrophic Forgetting in Neural Networks
       Kirkpatrick et al., PNAS 2017
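One method from the Kirkpatrick et al. paper, elastic weight consolidation (EWC), counters forgetting with a quadratic penalty that anchors weights important to old tasks. A sketch with made-up numbers:

```python
import numpy as np

# EWC penalty: L(theta) = L_new(theta)
#              + (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2
# where F_i (diagonal Fisher information) measures how important weight i
# was for the old task. All values below are invented for illustration.
theta_old = np.array([1.0, -2.0, 0.5])
F = np.array([10.0, 0.1, 5.0])
lam = 1.0

def ewc_penalty(theta):
    return 0.5 * lam * np.sum(F * (theta - theta_old) ** 2)

# Moving an important weight (index 0) costs far more than moving an
# unimportant one (index 1); this is what slows down forgetting.
move_important = ewc_penalty(theta_old + np.array([0.5, 0.0, 0.0]))
move_unimportant = ewc_penalty(theta_old + np.array([0.0, 0.5, 0.0]))
print(move_important, move_unimportant)
```

During training on a new task, this penalty is simply added to the new task's loss, so gradient descent is free to reuse unimportant weights while protecting the important ones.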