Slide 1

Machine Learning with Python
Sebastian Raschka, Ph.D.
MSU Data Science Workshop • East Lansing, Michigan State University • Feb 21, 2018

Slide 2

Today’s focus; and, if we have time, a quick overview ...

Slide 3

Contact:
o E-mail: [email protected]
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt

Tutorial material on GitHub:
https://github.com/rasbt/msu-datascience-ml-tutorial-2018

Slide 4

Machine learning is used & useful (almost) anywhere

Slide 5


Slide 6

3 Types of Learning: Supervised, Unsupervised, Reinforcement

Slide 7

Working with Labeled Data: Supervised Learning (diagram: learn a mapping from inputs x to outputs y; predicting a continuous output y from an input x is regression, and predicting a class label from inputs x1, x2 is classification)

Slide 8

Working with Unlabeled Data: Unsupervised Learning (diagram: clustering and compression)

Slide 9

Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning

Slide 10

Simple Linear Regression (diagram): fit the line ŷ = w0 + w1x to the points (xi, yi), where x is the explanatory variable, y is the response variable, w0 is the intercept, and w1 = Δy/Δx is the slope; the vertical offset |ŷ − y| measures each point’s deviation from the line.
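
A minimal sketch of this fit with scikit-learn (the toy data and coefficient values are illustrative additions, not from the slides):

import numpy as np
from sklearn.linear_model import LinearRegression

# toy data: y = 2x + 1 plus some noise
rng = np.random.RandomState(123)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=1.0, size=50)

model = LinearRegression().fit(X, y)
print(model.intercept_)  # w0, the intercept (close to 1.0)
print(model.coef_[0])    # w1, the slope (close to 2.0)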

Slide 11

Data Representation
X is the matrix of feature values, with one row per training example and one column per feature (entries x_{i,j} for examples i = 0 … n and features j = 0 … m); y is the corresponding vector of target values y_0 … y_n.
Rows: training examples (observations, records, instances, samples)
Columns: features (explanatory variables, independent variables, covariates, predictors, variables, inputs, attributes)
Targets: (target variable, response variable, dependent variable, labels, ground truth)

Slide 12

“Basic” Supervised Learning Workflow (diagram): (1) split the data and labels into training data/labels and test data/labels; (2) a learning algorithm with chosen hyperparameter values fits a model on the training data and training labels; (3) the model makes predictions for the test data, and performance is measured against the test labels; (4) the learning algorithm is refit on the full dataset to produce the final model.

Slide 13

Jupyter Notebook

Slide 14

Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning

Slide 15

Scikit-learn API

class SupervisedEstimator(...):

    def __init__(self, hyperparam, ...):
        ...

    def fit(self, X, y):
        ...
        return self

    def predict(self, X):
        ...
        return y_pred

    def score(self, X, y):
        ...
        return score

    ...
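
For illustration, a hedged usage sketch of this interface with a concrete estimator (the choice of DecisionTreeClassifier and the Iris data are assumptions, not from the slide):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3)  # hyperparameters go into __init__
clf.fit(X, y)                              # fit returns self, so calls can be chained
y_pred = clf.predict(X)                    # predicted class labels
print(clf.score(X, y))                     # mean accuracy on the given data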

Slide 16

Iris Dataset: Iris-Virginica, Iris-Versicolor, Iris-Setosa

Slide 17

Iris Dataset
Rows: samples; columns: features (sepal and petal measurements); y: class labels.

      sepal length [cm]  sepal width [cm]  petal length [cm]  petal width [cm]    y
  1   5.1                3.5               1.4                0.2                 setosa
  2   4.9                3.0               1.4                0.2                 setosa
  ...
 50   6.4                3.5               4.5                1.2                 versicolor
  ...
150   5.9                3.0               5.0                1.8                 virginica

Slide 18

Note about Non-Stratified Splits: with a plain random split of the 150 Iris flowers (50 per class), the class proportions are not preserved, e.g.
§ training set → 38 x Setosa, 28 x Versicolor, 34 x Virginica
§ test set → 12 x Setosa, 22 x Versicolor, 16 x Virginica
A stratified alternative is sketched below.
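
A hedged sketch of how to avoid this with stratify in scikit-learn's train_test_split (the 60/40 split ratio is an illustrative assumption):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the 1/3-1/3-1/3 class proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123, stratify=y)

print(np.bincount(y_train))  # [30 30 30]
print(np.bincount(y_test))   # [20 20 20]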

Slide 19

Linear Regression Recap (diagram): input values x1 … xm and a bias unit (constant 1) are combined with the weight coefficients w0, w1 … wm by the net input function, z = w0 + Σj wj xj; an activation function a then maps the net input z to the predicted output y.

Slide 20

Linear Regression Recap (same diagram): for linear regression, the activation function is the identity function, so the predicted output is simply y = a = z.

Slide 21

Logistic Regression, a Generalized Linear Model (a Classifier) (diagram): the same structure, but the activation function (the logistic sigmoid) maps the net input z to a predicted probability a, and a unit step function thresholds that probability into a predicted class label.
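
A small illustrative sketch with scikit-learn's LogisticRegression (the dataset and calls are assumptions, not from the slide; scikit-learn handles the multiclass case internally):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))  # predicted class probabilities (the activation)
print(clf.predict(X[:3]))        # thresholded into predicted class labels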

Slide 22

A “Lazy Learner”: K-Nearest Neighbors Classifier (diagram): to classify a query point ? in the (x1, x2) feature space, find its k nearest neighbors and predict the majority class among them; here, with 3 neighbors of one class and 1 each of two others, the majority class is predicted.
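
A matching sketch with scikit-learn's KNeighborsClassifier, using k = 5 as in the diagram's vote (the dataset is an illustrative assumption):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)                 # "lazy": fit essentially just stores the data
print(knn.predict(X[50:53]))  # majority vote among the 5 nearest neighbors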

Slide 23

Jupyter Notebook

Slide 24

There are many, many more classification and regression algorithms ...
http://scikit-learn.org/stable/supervised_learning.html

Slide 25

Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning

Slide 26

Categorical Variables

color   size   price    class label
red     M      $10.49   0
blue    XL     $15.00   1
green   L      $12.99   1

Slide 27

Encoding Categorical Variables (Ordinal vs Nominal): size is ordinal, so it can be mapped to integers that preserve the order (M → 0, L → 1, XL → 2); color is nominal, so it is one-hot encoded (red → [1, 0, 0], blue → [0, 1, 0], green → [0, 0, 1]).
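
A minimal sketch of both encodings with pandas (column names and mapping values mirror the table above; the code itself is an illustrative addition):

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green'],
                   'size': ['M', 'XL', 'L'],
                   'price': [10.49, 15.00, 12.99],
                   'classlabel': [0, 1, 1]})

# ordinal feature: map sizes to integers that preserve the order
df['size'] = df['size'].map({'M': 0, 'L': 1, 'XL': 2})

# nominal feature: one-hot encode the color column
df = pd.get_dummies(df, columns=['color'])
print(df)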

Slide 28

Feature Normalization

feature   min-max   z-score
1.0       0.0       -1.46385
2.0       0.2       -0.87831
3.0       0.4       -0.29277
4.0       0.6        0.29277
5.0       0.8        0.87831
6.0       1.0        1.46385

Min-max scaling: x' = (x − x_min) / (x_max − x_min)
Z-score standardization: z = (x − μ) / σ
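
Both scalings are available in scikit-learn; a small sketch reproducing the table's columns (the code is an illustrative addition):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.arange(1.0, 7.0).reshape(-1, 1)           # the feature column 1.0 ... 6.0
print(MinMaxScaler().fit_transform(x).ravel())   # [0.  0.2 0.4 0.6 0.8 1. ]
print(StandardScaler().fit_transform(x).ravel()) # [-1.46385 ... 1.46385]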

Slide 29

Scikit-learn API

class UnsupervisedEstimator(...):

    def __init__(self, ...):
        ...

    def fit(self, X):
        ...
        return self

    def transform(self, X):
        ...
        return X_transf

    def predict(self, X):
        ...
        return pred
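
For illustration, KMeans is one estimator that exposes fit, transform, and predict (the dataset and hyperparameters are assumptions, not from the slide):

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
km = KMeans(n_clusters=3, random_state=123)
km.fit(X)                 # fit returns self
labels = km.predict(X)    # cluster index for each sample
X_dist = km.transform(X)  # distances to the three cluster centers
print(labels[:5], X_dist.shape)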

Slide 30

Scikit-learn Pipelines (diagram): a Pipeline chains preprocessing steps and a model behind a single estimator interface. Calling fit on the training data and class labels runs fit & transform through each step (scaling, dimensionality reduction) and then fits the learning algorithm; calling predict on the test data runs transform through the same fitted steps and then the model's predict to produce class labels.
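
A minimal pipeline sketch matching the diagram's three steps (the particular estimators and the split ratio are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

pipe = make_pipeline(StandardScaler(),      # scaling
                     PCA(n_components=2),   # dimensionality reduction
                     LogisticRegression())  # learning algorithm
pipe.fit(X_train, y_train)         # fit & transform each step, then fit the model
print(pipe.score(X_test, y_test))  # transform the test data, then score the model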

Slide 31

Jupyter Notebook

Slide 32

Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning

Slide 33

Dimensionality Reduction – why? (figure: a grid of feature scatterplots whose axes are all measured in cm)

Slide 34

Dimensionality Reduction – why?
• predictive performance
• storage & speed
• visualization & interpretability

Slide 35

Recursive Feature Elimination (diagram): starting from all available features [f1 f2 f3 f4] with weights [w1 w2 w3 w4], fit the model, remove the feature with the lowest weight, and repeat: [w1 w2 w4] → [w1 w4] → [w4].
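
A hedged sketch with scikit-learn's RFE (the estimator choice and feature count are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_)  # boolean mask of the selected features
print(rfe.ranking_)  # 1 = selected; larger numbers were eliminated earlier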

Slide 36

Sequential (Forward) Feature Selection (diagram): starting from the empty set, fit one model per single feature [f1] [f2] [f3] [f4] and pick the best (here f1); then fit one model per pair [f1 f2] [f1 f3] [f1 f4] and pick the best (here [f1 f3]); repeat ([f1 f3 f2], [f1 f3 f4], …) until the desired number of features is reached.
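
One way to run this is the SequentialFeatureSelector from mlxtend (the author's library); this sketch assumes mlxtend is installed, and the estimator and k_features are illustrative:

from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
sfs = SFS(KNeighborsClassifier(n_neighbors=3),
          k_features=2,   # stop once 2 features are selected
          forward=True,   # forward selection, as in the diagram
          cv=5)
sfs.fit(X, y)
print(sfs.k_feature_idx_)  # indices of the selected feature subset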

Slide 37

Principal Component Analysis (diagram): in the (x1, x2) feature space, PC1 points in the direction of maximum variance and PC2 is orthogonal to PC1.
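
A minimal PCA sketch (the dataset and component count are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)          # project onto PC1 and PC2
print(pca.explained_variance_ratio_)  # share of the variance along each PC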

Slide 38

Jupyter Notebook

Slide 39

Topics
1. Introduction to Machine Learning
2. Linear Regression
3. Introduction to Classification
4. Feature Preprocessing & scikit-learn Pipelines
5. Dimensionality Reduction: Feature Selection & Extraction
6. Model Evaluation & Hyperparameter Tuning

Slide 40

“Basic” Supervised Learning Workflow, revisited (diagram): (1) split the data and labels into training data/labels and test data/labels; (2) a learning algorithm with chosen hyperparameter values fits a model on the training data and training labels; (3) the model makes predictions for the test data, and performance is measured against the test labels; (4) the learning algorithm is refit on the full dataset to produce the final model.

Slide 41

Holdout Method and Hyperparameter Tuning, steps 1-3 (diagram): (1) split the data and labels into training, validation, and test sets; (2) for each candidate set of hyperparameter values, the learning algorithm fits a model on the training data and labels, and each model's predictions are scored on the validation data and labels; (3) the best-performing model determines the best hyperparameter values.

Slide 42

Holdout Method and Hyperparameter Tuning, steps 4-6 (diagram): (4) the best model predicts the test labels to estimate generalization performance; (5) the learning algorithm is refit with the best hyperparameter values on the combined training and validation data and labels; (6) the final model is fit with the best hyperparameter values on the full dataset (data and labels).
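
A hedged end-to-end sketch of this holdout tuning loop (the split ratios, the KNN estimator, and the candidate k values are illustrative assumptions):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# step 1: split into 60% training, 20% validation, 20% test
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=123, stratify=y_tmp)

# steps 2-3: one model per hyperparameter value, compared on the validation set
best_k, best_score = None, -1.0
for k in [1, 3, 5, 7]:
    score = KNeighborsClassifier(n_neighbors=k).fit(
        X_train, y_train).score(X_valid, y_valid)
    if score > best_score:
        best_k, best_score = k, score

# steps 4-5: refit on training + validation data, evaluate on the test set
model = KNeighborsClassifier(n_neighbors=best_k)
model.fit(np.vstack((X_train, X_valid)), np.hstack((y_train, y_valid)))
print(best_k, model.score(X_test, y_test))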

Slide 43

K-fold Cross-Validation (diagram): the training data are split into K folds (here, 5). In each of the K iterations, one fold serves as the validation fold and the remaining K − 1 folds as the training fold; a model with the given hyperparameter values is fit on the training-fold data and labels and evaluated on the validation-fold data and labels. The overall estimate is the average of the per-fold performances: Performance = (1/K) Σ_{i=1}^{K} Performance_i. (Figure by Sebastian Raschka.)
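
A minimal sketch with scikit-learn's cross_val_score (the estimator and K = 5 are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores)         # one performance value per validation fold
print(scores.mean())  # the averaged cross-validation performance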

Slide 44

K-fold Cross-Validation Workflow, steps 1-3 (diagram): (1) split the data and labels into training and test sets; (2) for each candidate set of hyperparameter values, run K-fold cross-validation on the training data and labels to compare models; (3) refit a model on the entire training set using the best hyperparameter values.

Slide 45

K-fold Cross-Validation Workflow, steps 4-5 (diagram): (4) the refit model predicts the test labels to estimate performance; (5) the final model is fit on the full dataset (data and labels) with the best hyperparameter values.
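
GridSearchCV bundles steps 2-4 of this workflow; a hedged sketch (the grid, estimator, and split ratio are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# step 1: hold out a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

# steps 2-3: K-fold CV over the grid; refit=True retrains on all training data
gs = GridSearchCV(KNeighborsClassifier(),
                  param_grid={'n_neighbors': [1, 3, 5, 7]},
                  cv=5, refit=True)
gs.fit(X_train, y_train)
print(gs.best_params_)

# step 4: evaluate the refit model on the test set
print(gs.score(X_test, y_test))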

Slide 46

More info about model evaluation (one of the most important topics in ML): https://sebastianraschka.com/blog/index.html
• Model evaluation, model selection, and algorithm selection in machine learning. Part I - The basics
• Model evaluation, model selection, and algorithm selection in machine learning. Part II - Bootstrapping and uncertainties
• Model evaluation, model selection, and algorithm selection in machine learning. Part III - Cross-validation and hyperparameter tuning

Slide 47

Jupyter Notebook

Slide 48

BONUS SLIDES

Slide 49

https://www.tensorflow.org

Slide 50

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (preliminary white paper, November 9, 2015), Martín Abadi et al., Google Research.
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf
(The slide reproduces the paper's first page, including the abstract and Figure 1, an example TensorFlow code fragment with its computation graph: W and x feed a MatMul node, b feeds an Add node, followed by a ReLU.)

Slide 51

Tensors?
From https://sebastianraschka.com/pdf/books/dlb/appendix_g_tensorflow.pdf: besides being good at performing highly parallelized numerical computations, TensorFlow also supports distributed systems as well as mobile computing platforms, including Android and Apple’s iOS. But what is a tensor? In simplified terms, we can think of tensors as multidimensional arrays of numbers, a generalization of scalars, vectors, and matrices:
1. Scalar: R
2. Vector: R^n
3. Matrix: R^n × R^m
4. 3-Tensor: R^n × R^m × R^p
5. …
When we describe tensors, we refer to their “dimensions” as the rank (or order) of the tensor, which is not to be confused with the dimensions of a matrix. For instance, an m × n matrix, where m is the number of rows and n is the number of columns, is a special case of a rank-2 tensor.
(Figure: a rank-0 tensor is a scalar, dimensions []; a rank-1 tensor is a vector, dimensions [5], indexed like [2]; a rank-2 tensor is a matrix, dimensions [5, 3], indexed like [0, 0]; a rank-3 tensor has dimensions [4, 4, 2], indexed like [0, 2, 1].)

Slide 52

GPUs

Slide 53

Vectorization

import numpy as np

# num_train_examples, num_features, and num_hidden are assumed defined elsewhere
X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))
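
A sketch of why this matters, comparing an explicit Python loop with the equivalent single matrix multiplication (the array sizes are illustrative assumptions):

import numpy as np

num_train_examples, num_features, num_hidden = 500, 100, 50
X = np.random.random((num_train_examples, num_features))
W = np.random.random((num_features, num_hidden))

# loop version: one dot product per (example, hidden unit) pair
Z_loop = np.empty((num_train_examples, num_hidden))
for i in range(num_train_examples):
    for j in range(num_hidden):
        Z_loop[i, j] = np.sum(X[i, :] * W[:, j])

# vectorized version: a single matrix multiplication
Z_vec = X.dot(W)
print(np.allclose(Z_loop, Z_vec))  # True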

Slide 54

Vectorization (continued; the slide shows the same computation written as a matrix product)

Slide 55

Computation Graphs: a(x, w, b) = relu(w·x + b) decomposes into the graph u = w·x, v = u + b, a = relu(v), with x, w, and b as inputs.

Slide 56

Computation Graphs

import tensorflow as tf

g = tf.Graph()

with g.as_default() as g:
    x = tf.placeholder(dtype=tf.float32, shape=None, name='x')
    w = tf.Variable(initial_value=2, dtype=tf.float32, name='w')
    b = tf.Variable(initial_value=1, dtype=tf.float32, name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)
    print(x, w, b, u, v, a)

Printed tensors (variables omitted):
Tensor("x:0", dtype=float32) Tensor("mul:0", dtype=float32) Tensor("add:0", dtype=float32) Tensor("Relu:0", dtype=float32)

Slide 57

Computation Graphs (diagram: the graph from the previous slide with w = 2 and b = 1)

with g.as_default():
    init_op = tf.global_variables_initializer()  # definition not shown on the slide

with tf.Session(graph=g) as sess:
    sess.run(init_op)
    b_res = sess.run('b:0')
    print(b_res)

Output: 1.0

Slide 58

Backpropagation on the graph (diagram): the forward pass with x = 3, w = 2, b = 1 gives u = w·x = 6, v = u + b = 7, a = relu(v) = 7. The chain rule then yields the partial derivatives:
∂a/∂v = 1, ∂v/∂u = 1, ∂v/∂b = 1, ∂u/∂w = x = 3
∂a/∂b = (∂a/∂v)(∂v/∂b) = 1 · 1 = 1
∂a/∂w = (∂a/∂v)(∂v/∂u)(∂u/∂w) = 1 · 1 · 3 = 3
https://github.com/rasbt/pydata-annarbor2017-dl-tutorial

Slide 59

g = tf.Graph()

with g.as_default() as g:
    x = tf.placeholder(dtype=tf.float32, shape=None, name='x')
    w = tf.Variable(initial_value=2, dtype=tf.float32, name='w')
    b = tf.Variable(initial_value=1, dtype=tf.float32, name='b')
    u = x * w
    v = u + b
    a = tf.nn.relu(v)
    d_a_w = tf.gradients(a, w)
    d_b_w = tf.gradients(a, b)

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run([d_a_w, d_b_w], feed_dict={'x:0': 3})
    print(res)  # print added for completeness; the slide shows the values below

Output:
[3.0]
[1.0]

Slide 60

http://pytorch.org

Slide 61

import torch
import torch.nn.functional as F
from torch.autograd import Variable
from torch.autograd import grad

x = Variable(torch.Tensor([3]))
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([1]), requires_grad=True)

u = x * w
v = u + b
a = F.relu(v)

partial_derivatives = grad(a, (w, b))

for name, grad in zip("wb", (partial_derivatives)):
    print('d_a_%s:' % name, grad)

Output:
d_a_w: Variable containing: 3 [torch.FloatTensor of size 1]
d_a_b: Variable containing: 1 [torch.FloatTensor of size 1]

Slide 62

Multilayer Perceptron (figure: https://github.com/rasbt/python-machine-learning-book-2nd-edition/blob/master/code/ch12/images/12_02.png)

Slide 63

Multilayer Perceptron: the same model in TensorFlow (first listing) and PyTorch (second listing).

TensorFlow:

g = tf.Graph()

with g.as_default():
    # Input data
    tf_x = tf.placeholder(tf.float32, [None, n_input], name='features')
    tf_y = tf.placeholder(tf.float32, [None, n_classes], name='targets')

    # Model parameters
    weights = {
        'h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1], stddev=0.1)),
        'out': tf.Variable(tf.truncated_normal([n_hidden_2, n_classes], stddev=0.1))
    }
    biases = {
        'b1': tf.Variable(tf.zeros([n_hidden_1])),
        'out': tf.Variable(tf.zeros([n_classes]))
    }

    # Multilayer perceptron
    layer_1 = tf.add(tf.matmul(tf_x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']

    # Loss and optimizer
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=tf_y)
    cost = tf.reduce_mean(loss, name='cost')
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    train = optimizer.minimize(cost, name='train')

    # Prediction
    correct_prediction = tf.equal(tf.argmax(tf_y, 1), tf.argmax(out_layer, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')

with tf.Session(graph=g) as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = mnist.train.num_examples // batch_size
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            _, c = sess.run(['train', 'cost:0'],
                            feed_dict={'features:0': batch_x,
                                       'targets:0': batch_y})

PyTorch:

class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()
        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        logits = self.linear_out(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas

model = MultilayerPerceptron(num_features=num_features,
                             num_classes=num_classes)
if torch.cuda.is_available():
    model.cuda()

for epoch in range(num_epochs):
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = Variable(features.view(-1, 28*28))
        targets = Variable(targets)
        if torch.cuda.is_available():
            features, targets = features.cuda(), targets.cuda()

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = cost_fn(logits, targets)
        optimizer.zero_grad()
        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

Slide 64

Further Resources (slide shows four recommended books, annotated: math-heavy; math-free; scikit-learn intro; a mix of code & math, ~60% scikit-learn)

Slide 65

Contact:
o E-mail: [email protected]
o Website: http://sebastianraschka.com
o Twitter: @rasbt
o GitHub: rasbt

Tutorial material on GitHub:
https://github.com/rasbt/msu-datascience-ml-tutorial-2018

Thanks for attending!