
2016 - Kendall Chuang, David Clark - TensorFlow on the Web

PyBay
August 21, 2016

Description
This talk walks through the steps to put a TensorFlow project into production on the web with Flask and Heroku. The goal is to introduce the project, show how TensorFlow can be used online for real data tasks, and discuss other considerations for deploying a TensorFlow project.

Abstract
TensorFlow is a deep learning library with Python and C++ bindings that was released in 2015. The talk starts with a brief intro to TensorFlow and then dives into the specific steps to set up a simple project that can be served online.

Bio
Kendall is a lead software engineer at YesGraph, where he uses machine learning and Flask to power better invite flows for mobile and web apps. Previously he worked as an independent software consultant for four years, and before that he was a hardware designer at Qualcomm in San Diego for three years. Kendall was an organizer of the San Diego Python Users Group, where he helped plan six one-day workshops on various Python topics.

Bio2
David Clark has a background in astrophysics, where he used Python extensively to analyze astronomical data. He recently transitioned careers to data science. Currently he is doing consulting for two startups. At Palo Alto Scientific, Inc., he uses the machine learning library TensorFlow to model sensor data from a wearable and infer a runner’s performance. He is also doing work for Quantea, Inc., making a dashboard using the Python libraries Bokeh and Pandas.

https://youtu.be/nZDAyugqXCQ


Transcript

  1. Flask • Web framework • Written in Python • Lightweight • Developed by Armin Ronacher
     TensorFlow • Machine learning framework • Written in C++ and includes a Python interface • Developed by Google
  2. Stack
     • Flask (running on Heroku with gunicorn + nginx)
     • Flask-WTF forms
     • Pandas
     • TensorFlow
     • Boto for AWS S3 storage access
  3. Machine Learning (workflow diagram)
     • Training: training features + training labels → trained TensorFlow model.
     • Prediction: trained TensorFlow model + test features → predicted labels.
  4. Training Form

     class TrainingDataForm(Form):
         training_data = FileField('Training Data CSV File')
         learning_rate = DecimalField('Learning Rate')
         batch_size = DecimalField('Batch Size')
         model_name = StringField('Model Name', validators=[DataRequired()])
  5. /train/

     @app.route('/train/', methods=('GET', 'POST'))
     def upload():
         form = TrainingDataForm()
         if form.validate_on_submit():
             model_name = form.model_name.data
             learning_rate = float(form.learning_rate.data)
             batch_size = int(form.batch_size.data)
             filename = secure_filename(form.training_data.data.filename)
             form.training_data.data.save('wine_quality/data/' + filename)
             dataframe = pd.read_csv('wine_quality/data/' + filename, sep=',')
             my_model.train(dataframe, learning_rate, batch_size, model_name)
         else:
             filename = None
         return render_template('test_data_upload.html', form=form, filename=filename)
  6. Prediction Form

     class TestParameterForm(Form):
         alcohol = DecimalField('Alcohol Percentage')
         volatile_acidity = DecimalField('Volatile Acidity')
         citric_acid = DecimalField('Citric Acid')
         residual_sugar = DecimalField('Residual Sugar')
         chlorides = DecimalField('Chlorides')
         free_sulfur_dioxide = DecimalField('Free Sulfur Dioxide')
         total_sulfur_dioxide = DecimalField('Total Sulfur Dioxide')
         density = DecimalField('Density')
         ph = DecimalField('pH')
         sulphates = DecimalField('Sulphates')
  7. /predict/

     @app.route('/predict/', methods=['GET', 'POST'])
     def test_parameters():
         form = TestParameterForm(request.form)
         if request.method == 'POST' and form.validate():
             alcohol = float(form.alcohol.data)
             volatile_acidity = float(form.volatile_acidity.data)
             …
             input_row = [alcohol, volatile_acidity, citric_acid, residual_sugar,
                          chlorides, free_sulfur_dioxide, total_sulfur_dioxide,
                          density, ph, sulphates]
             results = my_model.run([input_row])
             return render_template('test_parameters.html', form=form, result=results[0])
         return render_template('test_parameters.html', form=form)
  8. Save to AWS S3

     def save_to_s3(self, filename, model_name):
         try:
             AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
             AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
             c = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
             b = c.get_bucket('flasktensorflow')  # substitute your bucket name here
             k = b.new_key(model_name)
             f = open(filename, 'rb')
             k.set_contents_from_file(f, encrypt_key=True)
         except:
             return False
         return True
  9. Load from AWS S3

     def load_from_s3(self, filename, model_name):
         try:
             AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
             AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
             c = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
             b = c.get_bucket('flasktensorflow')  # substitute your bucket name here
             k = Key(b)  # boto.s3.key.Key
             k.key = model_name
             f = open(filename, 'wb')
             k.get_contents_to_file(f)
         except:
             return False
         return True
  10. Heroku deployment
      • Install the Heroku CLI: https://devcenter.heroku.com/articles/heroku-command
      • Procfile: web: gunicorn main:app --log-file -
      • Deployment commands: heroku create, then git push heroku HEAD:master
      • https://devcenter.heroku.com/articles/git
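      Heroku's Python buildpack also expects a requirements.txt alongside the Procfile. A
      sketch based on the stack listed earlier (no version pins; in 2016 TensorFlow was
      commonly installed from a release wheel URL rather than the plain package name):

          # requirements.txt (illustrative, not from the talk)
          Flask
          Flask-WTF
          gunicorn
          pandas
          tensorflow
          boto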
  11. What is TensorFlow?
      • Developed by the Google Brain team.
      • Graph based: nodes are operations, edges are multi-dimensional arrays called tensors.
      • All operations run outside of Python, in the C++ backend.
      • Inputs are stored in a placeholder() or a Variable().
      • placeholder(): data fed into the graph at run time (e.g. features and labels).
      • Variable(): values the graph can update during training (e.g. weights and biases).
      • Inputs are populated during the TensorFlow session, as in the sketch below.
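      A minimal sketch of the placeholder/Variable/session pattern, using the same
      2016-era TensorFlow API as the rest of the talk (the names x, w, y here are
      illustrative, not taken from the slides):

          import tensorflow as tf

          # Graph definition: a placeholder for run-time input, a Variable for a learned weight.
          x = tf.placeholder("float", [None, 1])       # fed at run time
          w = tf.Variable(tf.zeros([1, 1]), name="w")  # updated during training
          y = tf.matmul(x, w)                          # an operation node; edges carry tensors

          with tf.Session() as sess:
              sess.run(tf.initialize_all_variables())            # 2016-era initializer, as used later in the talk
              print(sess.run(y, feed_dict={x: [[1.0], [2.0]]}))  # inputs populated during the session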
  12. Why is TensorFlow better than other machine learning tools?
      • Portable to many types of hardware, from mobile devices to distributed GPUs.
      • Includes a visualization module (TensorBoard).
      • Events are logged and model training progress can be followed interactively using TensorBoard.
      • Model training is stored in checkpoints; training can be stopped, evaluated, and then restarted from a checkpoint.
      • Open source.
  13. What is a tensor?
      • Vector: one dimension.
      • Matrix: two dimensions.
      • Tensor: n dimensions. (Image from http://noaxiom.org/tensor.)
      • TensorFlow stores data in tensors.
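      A quick illustration of those ranks in TensorFlow (values are made up):

          import tensorflow as tf

          vector = tf.constant([1.0, 2.0, 3.0])           # rank 1, shape (3,)
          matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2, shape (2, 2)
          tensor = tf.zeros([2, 3, 4])                    # rank 3, shape (2, 3, 4)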
  14. The Wine Quality Data Set
      • Data set taken from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).
      • Includes chemical properties of wine (Portuguese "Vinho Verde" variants).
      • Properties include: volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates and alcohol. These are the features.
      • Does not include grape type, brand name, price, etc.
      • Each wine is assigned a subjective quality score from 0 to 10. These are the labels.
      • There are more average wines than excellent or bad ones.
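      A minimal sketch of loading the data with Pandas, as the /train/ view above does. The
      file name is an assumption, and note that the raw UCI download is semicolon-separated
      while the talk's upload flow reads comma-separated files:

          import pandas as pd

          # Hypothetical local copy of the UCI file; adjust the path and separator to your copy.
          df = pd.read_csv('wine_quality/data/winequality-red.csv', sep=';')
          print(df.columns.tolist())            # feature columns plus the 'quality' label
          print(df['quality'].value_counts())   # mostly mid-range scores, few extremes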
  15. Data Preparation
      • Outliers: features with extreme values, more than 5σ from the mean.
      • Outlier values are set to the distribution mean (see the helper below and the usage sketch after it).

     def _outliers(df, threshold, columns):
         for col in columns:
             # Flag values more than `threshold` standard deviations above the mean.
             mask = df[col] > float(threshold) * df[col].std() + df[col].mean()
             # Blank out the outliers, then recompute the column mean without them.
             df.loc[mask == True, col] = np.nan
             mean_property = df.loc[:, col].mean()
             # Replace the outliers with that mean.
             df.loc[mask == True, col] = mean_property
         return df
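     A small usage sketch of that helper on a toy DataFrame (column name and values are
     made up; the low threshold is only there to trigger a replacement on such a tiny sample):

         import numpy as np
         import pandas as pd

         toy = pd.DataFrame({'chlorides': [0.05, 0.06, 0.07, 0.05, 9.0]})  # 9.0 is an extreme value
         cleaned = _outliers(toy, threshold=1, columns=['chlorides'])
         print(cleaned['chlorides'].tolist())  # the 9.0 row is replaced by the mean of the other values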
  16. Data Preparation
      • Collinearity: a strong relationship between two or more features.
      • A problem for regression models.
      • Removed fixed acidity (one way to check for this is sketched below).
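      A hedged way to check for that kind of collinearity (not shown in the slides) is a
      Pandas correlation matrix; this sketch assumes the `df` loaded earlier and the column
      names from the raw UCI file:

          # Pairwise Pearson correlations between the numeric columns.
          corr = df.corr()
          # Features most correlated with fixed acidity, the column the talk drops.
          print(corr['fixed acidity'].sort_values(ascending=False))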
  17. Grouping Data by Category
      • Grouped the data by quality into two categories, Good and Bad (a sketch follows below).
      • Good: quality equals 7 and 8.
      • Bad: quality equals 4 and 5.
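      A hedged sketch of that grouping, assuming the `df` and 'quality' column from the
      UCI file (the exact filtering code is not shown on the slides):

          # Keep only the clearly Bad (4-5) and clearly Good (7-8) wines.
          df = df[df['quality'].isin([4, 5, 7, 8])].copy()
          # 0 = Bad, 1 = Good; these integer labels feed the one-hot encoder on the next slide.
          df['label'] = (df['quality'] >= 7).astype(int)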
  18. One-hot Vectors
      • A special way to express categorical labels: a vector of 0's and 1's, e.g.
      • Bad ⇒ [1, 0]
      • Good ⇒ [0, 1]
      • Required label format for TensorFlow.

     def _dense_to_one_hot(labels_dense, num_classes=2):
         # Convert class labels from scalars to one-hot vectors.
         # labels_dense is expected to be a NumPy array of integer class indices.
         num_labels = len(labels_dense)
         index_offset = np.arange(num_labels) * num_classes
         labels_one_hot = np.zeros((num_labels, num_classes))
         labels_one_hot.flat[index_offset + labels_dense] = 1
         return labels_one_hot
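     A quick usage check of that helper (label values are illustrative):

         import numpy as np

         labels = np.array([0, 1, 1, 0])   # 0 = Bad, 1 = Good
         print(_dense_to_one_hot(labels, num_classes=2))
         # [[1. 0.]
         #  [0. 1.]
         #  [0. 1.]
         #  [1. 0.]]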
  19. Softmax Regression
      • Regression algorithm for categorical data.
      • Evidence: a weighted sum of the chemical properties.
      • A negative weight is evidence the wine is not of that quality; a positive weight is evidence it is.

     def softmax_regression(x):
         W = tf.Variable(tf.zeros([10, 2]), name="W")  # 10 features, 2 quality classes
         b = tf.Variable(tf.zeros([2]), name="b")
         y = tf.nn.softmax(tf.matmul(x, W) + b)
         return y, [W, b]
  20. Cost Function
      • A function to measure model fit.
      • Cross-entropy: H_{y'}(y) = -Σ_i y'_i · log(y_i), where y'_i are the training labels and y_i are the model's predicted probabilities.
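      A small worked example (numbers made up): for a "Good" wine the one-hot label is
      y' = [0, 1]. If the model predicts y = [0.2, 0.8], the cross-entropy is
      -(0·log 0.2 + 1·log 0.8) = -log 0.8 ≈ 0.223; it approaches 0 as the predicted
      probability for the true class approaches 1, and grows as the prediction gets worse.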
  21. Optimization Function
      • A function that defines how the cost function is minimized.
      • Learning rate: how fast or how slow the cost is minimized.
      • Too large: overshoots the minimum. Too small: could take forever to find the minimum.
      • Gradient descent.

     y_ = tf.placeholder("float", [None, 2])
     cost = -tf.reduce_mean(y_ * tf.log(y))
     optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
     correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
     accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
  22. Batches and Epochs
      • Progressively train the model on portions of the training set, called batches.
      • Fit the model to the first batch, update the model, then use the updated model on the next batch.
      • Repeat this procedure to the end of the training data set.
      • Repeat the whole training pass many times; each repetition is called an epoch.
  23. Model Training
      • Define an object to save the model.
      • Initialize all variables.
      • Start a TensorFlow session.

     saver = tf.train.Saver(variables)
     init = tf.initialize_all_variables()
     with tf.Session() as sess:
  24. Model Training
      • Loop through each epoch.
      • Loop through each batch.

         sess.run(init)
         log_list = []  # List to store logging of model progress
         for i in range(100):
             average_cost = 0
             number_of_batches = int(len(X_train) / batch_size)
             for start, end in zip(range(0, len(X_train), batch_size),
                                   range(batch_size, len(X_train), batch_size)):
                 sess.run(optimizer, feed_dict={X: X_train[start:end], y_: y_train[start:end]})
                 # Compute average loss
                 average_cost += sess.run(cost, feed_dict={X: X_train[start:end],
                                                           y_: y_train[start:end]}) / number_of_batches
             if i % 10 == 0:
                 print("Epoch:", '%04d' % (i + 1), "cost=", "{:.9f}".format(average_cost))
                 log_cost = "Epoch {:d}: cost = {:.9f}".format(i + 1, average_cost)
                 log_list.append(log_cost)
  25. Model Training
      • Print the model accuracy and save it to a list.
      • Save the model to a checkpoint file.
      • Return the model log list for Flask.

         print("Accuracy: {0}".format(sess.run(accuracy, feed_dict={X: X_test, y_: y_test})))
         log_accuracy = "Accuracy: {0}".format(sess.run(accuracy, feed_dict={X: X_test, y_: y_test}))
         log_list.append(log_accuracy)

         path = saver.save(sess, os.path.join(os.path.dirname(__file__), "data/softmax_regression.ckpt"))
         print("Saved:", path)

         return log_list

     Restoring the trained model for prediction:

     x = tf.placeholder("float", [None, 10])
     sess = tf.Session()

     with tf.variable_scope("softmax_regression"):
         y1, variables = model.softmax_regression(x)
     saver = tf.train.Saver(variables)
     saver.restore(sess, "wine_quality/data/softmax_regression.ckpt")
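     The /predict/ view above calls my_model.run([input_row]); a hedged sketch of what that
     function could look like on top of this restored session (the name matches the view,
     but the body is an assumption rather than the talk's exact code):

         def run(input_rows):
             # input_rows: list of 10-feature rows, in the same column order used for training.
             predictions = sess.run(y1, feed_dict={x: input_rows})  # softmax probabilities, shape (n, 2)
             # Return the probability that each wine is "Good" (second column of the one-hot encoding).
             return [float(p[1]) for p in predictions]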