
2016 - Kendall Chuang, David Clark - TensorFlow on the Web

PyBay
August 21, 2016

Description
This talk walks through the steps to put a TensorFlow project into production on the web with Flask and Heroku. The goal is to introduce the project, show how TensorFlow can be used online for real data tasks, and discuss other considerations for deploying a TensorFlow project.

Abstract
TensorFlow is a deep learning library with Python and C++ bindings that was released in 2015. The talk starts with a brief intro to TensorFlow and then dives into the specific steps to set up a simple project that can be served online.

Bio
Kendall is a lead software engineer at YesGraph, where he uses machine learning and Flask to power better invite flows for mobile and web apps. Previously he worked as an independent software consultant for four years, and before that he was a hardware designer at Qualcomm in San Diego for three years. Kendall was an organizer of the San Diego Python Users Group, where he helped plan six one-day workshops on various Python topics.

Bio2
David Clark has a background in astrophysics, where he used Python extensively to analyze astronomical data. He recently transitioned careers to data science. Currently he is doing consulting for two startups. At Palo Alto Scientific, Inc., he uses the machine learning library TensorFlow to model sensor data from a wearable and infer a runner’s performance. He is also doing work for Quantea, Inc., making a dashboard using the Python libraries Bokeh and Pandas.

https://youtu.be/nZDAyugqXCQ


Transcript

  1. Flask • Web framework • Written in Python • Lightweight • Developed by Armin Ronacher
     TensorFlow • Machine learning framework • Written in C++ and includes a Python interface • Developed by Google
  2. Stack
     • Flask (running on Heroku with gunicorn + nginx)
     • Flask-WTF forms
     • Pandas
     • TensorFlow
     • Boto for AWS S3 storage access
  3. Machine Learning (workflow diagram)
     • Training: training features + training labels → trained TensorFlow model.
     • Prediction: trained TensorFlow model + test features → predicted labels.
  4. Training Form

     class TrainingDataForm(Form):
         training_data = FileField('Training Data CSV File')
         learning_rate = DecimalField('Learning Rate')
         batch_size = DecimalField('Batch Size')
         model_name = StringField('Model Name', validators=[DataRequired()])
  5. /train/

     @app.route('/train/', methods=('GET', 'POST'))
     def upload():
         form = TrainingDataForm()
         if form.validate_on_submit():
             model_name = form.model_name.data
             learning_rate = float(form.learning_rate.data)
             batch_size = int(form.batch_size.data)
             filename = secure_filename(form.training_data.data.filename)
             form.training_data.data.save('wine_quality/data/' + filename)
             dataframe = pd.read_csv('wine_quality/data/' + filename, sep=',')
             my_model.train(dataframe, learning_rate, batch_size, model_name)
         else:
             filename = None
         return render_template('test_data_upload.html', form=form, filename=filename)
  6. Prediction Form

     class TestParameterForm(Form):
         alcohol = DecimalField('Alcohol Percentage')
         volatile_acidity = DecimalField('Volatile Acidity')
         citric_acid = DecimalField('Citric Acid')
         residual_sugar = DecimalField('Residual Sugar')
         chlorides = DecimalField('Chlorides')
         free_sulfur_dioxide = DecimalField('Free Sulfur Dioxide')
         total_sulfur_dioxide = DecimalField('Total Sulfur Dioxide')
         density = DecimalField('Density')
         ph = DecimalField('pH')
         sulphates = DecimalField('Sulphates')
  7. /predict/

     @app.route('/predict/', methods=['GET', 'POST'])
     def test_parameters():
         form = TestParameterForm(request.form)
         if request.method == 'POST' and form.validate():
             alcohol = float(form.alcohol.data)
             volatile_acidity = float(form.volatile_acidity.data)
             …
             input_row = [alcohol, volatile_acidity, citric_acid, residual_sugar,
                          chlorides, free_sulfur_dioxide, total_sulfur_dioxide,
                          density, ph, sulphates]
             results = my_model.run([input_row])
             return render_template('test_parameters.html', form=form, result=results[0])
         return render_template('test_parameters.html', form=form)
  8. Save to AWS S3

     def save_to_s3(self, filename, model_name):
         try:
             AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
             AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
             c = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
             b = c.get_bucket('flasktensorflow')  # substitute your bucket name here
             k = b.new_key(model_name)
             f = open(filename, 'rb')
             k.set_contents_from_file(f, encrypt_key=True)
         except:
             return False
         return True
  9. Load from AWS S3

     def load_from_s3(self, filename, model_name):
         try:
             AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
             AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
             c = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
             b = c.get_bucket('flasktensorflow')  # substitute your bucket name here
             k = Key(b)  # boto.s3.key.Key
             k.key = model_name
             f = open(filename, 'wb')
             k.get_contents_to_file(f)
         except:
             return False
         return True
  10. Heroku deployment
      • Install the Heroku CLI: https://devcenter.heroku.com/articles/heroku-command
      • Procfile: web: gunicorn main:app --log-file -
      • Deployment commands: heroku create, then git push heroku HEAD:master
      • https://devcenter.heroku.com/articles/git
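      Heroku's Python buildpack also expects a requirements.txt alongside the Procfile. A
      sketch based on the stack listed earlier (no version pins; in 2016 TensorFlow was
      commonly installed from a release wheel URL rather than the plain package name):

          # requirements.txt (illustrative, not from the talk)
          Flask
          Flask-WTF
          gunicorn
          pandas
          tensorflow
          boto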
  11. What is TensorFlow?
      • Developed by the Google Brain team.
      • Graph based: nodes are operations, edges are multi-dimensional arrays called tensors.
      • All operations run outside of Python, in the C++ backend.
      • Inputs are stored in a placeholder() or a Variable().
      • placeholder(): data fed into the graph at run time (e.g. features and labels).
      • Variable(): values the graph can update during training (e.g. weights and biases).
      • Inputs are populated during the TensorFlow session, as in the sketch below.
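      A minimal sketch of the placeholder/Variable/session pattern, using the same
      2016-era TensorFlow API as the rest of the talk (the names x, w, y here are
      illustrative, not taken from the slides):

          import tensorflow as tf

          # Graph definition: a placeholder for run-time input, a Variable for a learned weight.
          x = tf.placeholder("float", [None, 1])       # fed at run time
          w = tf.Variable(tf.zeros([1, 1]), name="w")  # updated during training
          y = tf.matmul(x, w)                          # an operation node; edges carry tensors

          with tf.Session() as sess:
              sess.run(tf.initialize_all_variables())            # 2016-era initializer, as used later in the talk
              print(sess.run(y, feed_dict={x: [[1.0], [2.0]]}))  # inputs populated during the session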
  12. Why is TensorFlow better than other machine learning tools?
      • Portable to many types of hardware, from mobile devices to distributed GPUs.
      • Includes a visualization module (TensorBoard).
      • Events are logged and model training progress can be followed interactively using TensorBoard.
      • Model training is stored in checkpoints; training can be stopped, evaluated, and then restarted from a checkpoint.
      • Open source.
  13. What is a tensor?
      • Vector: one dimension.
      • Matrix: two dimensions.
      • Tensor: n dimensions. (Image from http://noaxiom.org/tensor.)
      • TensorFlow stores data in tensors.
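      A quick illustration of those ranks in TensorFlow (values are made up):

          import tensorflow as tf

          vector = tf.constant([1.0, 2.0, 3.0])           # rank 1, shape (3,)
          matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2, shape (2, 2)
          tensor = tf.zeros([2, 3, 4])                    # rank 3, shape (2, 3, 4)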
  14. The Wine Quality Data Set
      • Data set taken from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml).
      • Includes chemical properties of wine (Portuguese "Vinho Verde" variants).
      • Properties include: volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates and alcohol. These are the features.
      • Does not include grape type, brand name, price, etc.
      • Each wine is assigned a subjective quality score from 0 to 10. These are the labels.
      • There are more average wines than excellent or bad ones.
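      A minimal sketch of loading the data with Pandas, as the /train/ view above does. The
      file name is an assumption, and note that the raw UCI download is semicolon-separated
      while the talk's upload flow reads comma-separated files:

          import pandas as pd

          # Hypothetical local copy of the UCI file; adjust the path and separator to your copy.
          df = pd.read_csv('wine_quality/data/winequality-red.csv', sep=';')
          print(df.columns.tolist())            # feature columns plus the 'quality' label
          print(df['quality'].value_counts())   # mostly mid-range scores, few extremes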
  15. Data Preparation
      • Outliers: features with extreme values, more than 5σ from the mean.
      • Outlier values are set to the distribution mean (see the helper below and the usage sketch after it).

     def _outliers(df, threshold, columns):
         for col in columns:
             # Flag values more than `threshold` standard deviations above the mean.
             mask = df[col] > float(threshold) * df[col].std() + df[col].mean()
             # Blank out the outliers, then recompute the column mean without them.
             df.loc[mask == True, col] = np.nan
             mean_property = df.loc[:, col].mean()
             # Replace the outliers with that mean.
             df.loc[mask == True, col] = mean_property
         return df
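     A small usage sketch of that helper on a toy DataFrame (column name and values are
     made up; the low threshold is only there to trigger a replacement on such a tiny sample):

         import numpy as np
         import pandas as pd

         toy = pd.DataFrame({'chlorides': [0.05, 0.06, 0.07, 0.05, 9.0]})  # 9.0 is an extreme value
         cleaned = _outliers(toy, threshold=1, columns=['chlorides'])
         print(cleaned['chlorides'].tolist())  # the 9.0 row is replaced by the mean of the other values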
  16. Data Preparation
      • Collinearity: a strong relationship between two or more features.
      • A problem for regression models.
      • Removed fixed acidity (one way to check for this is sketched below).
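      A hedged way to check for that kind of collinearity (not shown in the slides) is a
      Pandas correlation matrix; this sketch assumes the `df` loaded earlier and the column
      names from the raw UCI file:

          # Pairwise Pearson correlations between the numeric columns.
          corr = df.corr()
          # Features most correlated with fixed acidity, the column the talk drops.
          print(corr['fixed acidity'].sort_values(ascending=False))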
  17. Grouping Data by Category
      • Grouped the data by quality into two categories, Good and Bad (a sketch follows below).
      • Good: quality equals 7 and 8.
      • Bad: quality equals 4 and 5.
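      A hedged sketch of that grouping, assuming the `df` and 'quality' column from the
      UCI file (the exact filtering code is not shown on the slides):

          # Keep only the clearly Bad (4-5) and clearly Good (7-8) wines.
          df = df[df['quality'].isin([4, 5, 7, 8])].copy()
          # 0 = Bad, 1 = Good; these integer labels feed the one-hot encoder on the next slide.
          df['label'] = (df['quality'] >= 7).astype(int)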
  18. One-hot Vectors
      • A special way to express categorical labels: a vector of 0's and 1's, e.g.
      • Bad ⇒ [1, 0]
      • Good ⇒ [0, 1]
      • Required label format for TensorFlow.

     def _dense_to_one_hot(labels_dense, num_classes=2):
         # Convert class labels from scalars to one-hot vectors.
         # labels_dense is expected to be a NumPy array of integer class indices.
         num_labels = len(labels_dense)
         index_offset = np.arange(num_labels) * num_classes
         labels_one_hot = np.zeros((num_labels, num_classes))
         labels_one_hot.flat[index_offset + labels_dense] = 1
         return labels_one_hot
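     A quick usage check of that helper (label values are illustrative):

         import numpy as np

         labels = np.array([0, 1, 1, 0])   # 0 = Bad, 1 = Good
         print(_dense_to_one_hot(labels, num_classes=2))
         # [[1. 0.]
         #  [0. 1.]
         #  [0. 1.]
         #  [1. 0.]]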
  19. Softmax Regression
      • Regression algorithm for categorical data.
      • Evidence: a weighted sum of the chemical properties.
      • A negative weight is evidence the wine is not of that quality; a positive weight is evidence it is.

     def softmax_regression(x):
         W = tf.Variable(tf.zeros([10, 2]), name="W")  # 10 features, 2 quality classes
         b = tf.Variable(tf.zeros([2]), name="b")
         y = tf.nn.softmax(tf.matmul(x, W) + b)
         return y, [W, b]
  20. Cost Function
      • A function to measure model fit.
      • Cross-entropy: H_{y'}(y) = -Σ_i y'_i · log(y_i), where y'_i are the training labels and y_i are the model's predicted probabilities.
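      A small worked example (numbers made up): for a "Good" wine the one-hot label is
      y' = [0, 1]. If the model predicts y = [0.2, 0.8], the cross-entropy is
      -(0·log 0.2 + 1·log 0.8) = -log 0.8 ≈ 0.223; it approaches 0 as the predicted
      probability for the true class approaches 1, and grows as the prediction gets worse.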
  21. Optimization Function
      • A function that defines how the cost function is minimized.
      • Learning rate: how fast or how slow the cost is minimized.
      • Too large: overshoots the minimum. Too small: could take forever to find the minimum.
      • Gradient descent.

     y_ = tf.placeholder("float", [None, 2])
     cost = -tf.reduce_mean(y_ * tf.log(y))
     optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
     correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
     accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
  22. Batches and Epochs
      • Progressively train the model on portions of the training set, called batches.
      • Fit the model to the first batch, update the model, then use the updated model on the next batch.
      • Repeat this procedure to the end of the training data set.
      • Repeat the whole training pass many times; each repetition is called an epoch.
  23. Model Training
      • Define an object to save the model.
      • Initialize all variables.
      • Start a TensorFlow session.

     saver = tf.train.Saver(variables)
     init = tf.initialize_all_variables()
     with tf.Session() as sess:
  24. Model Training
      • Loop through each epoch.
      • Loop through each batch.

         sess.run(init)
         log_list = []  # List to store logging of model progress
         for i in range(100):
             average_cost = 0
             number_of_batches = int(len(X_train) / batch_size)
             for start, end in zip(range(0, len(X_train), batch_size),
                                   range(batch_size, len(X_train), batch_size)):
                 sess.run(optimizer, feed_dict={X: X_train[start:end], y_: y_train[start:end]})
                 # Compute average loss
                 average_cost += sess.run(cost, feed_dict={X: X_train[start:end],
                                                           y_: y_train[start:end]}) / number_of_batches
             if i % 10 == 0:
                 print("Epoch:", '%04d' % (i + 1), "cost=", "{:.9f}".format(average_cost))
                 log_cost = "Epoch {:d}: cost = {:.9f}".format(i + 1, average_cost)
                 log_list.append(log_cost)
  25. Model Training
      • Print the model accuracy and save it to a list.
      • Save the model to a checkpoint file.
      • Return the model log list for Flask.

         print("Accuracy: {0}".format(sess.run(accuracy, feed_dict={X: X_test, y_: y_test})))
         log_accuracy = "Accuracy: {0}".format(sess.run(accuracy, feed_dict={X: X_test, y_: y_test}))
         log_list.append(log_accuracy)

         path = saver.save(sess, os.path.join(os.path.dirname(__file__), "data/softmax_regression.ckpt"))
         print("Saved:", path)

         return log_list

     Restoring the trained model for prediction:

     x = tf.placeholder("float", [None, 10])
     sess = tf.Session()

     with tf.variable_scope("softmax_regression"):
         y1, variables = model.softmax_regression(x)
     saver = tf.train.Saver(variables)
     saver.restore(sess, "wine_quality/data/softmax_regression.ckpt")
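     The /predict/ view above calls my_model.run([input_row]); a hedged sketch of what that
     function could look like on top of this restored session (the name matches the view,
     but the body is an assumption rather than the talk's exact code):

         def run(input_rows):
             # input_rows: list of 10-feature rows, in the same column order used for training.
             predictions = sess.run(y1, feed_dict={x: input_rows})  # softmax probabilities, shape (n, 2)
             # Return the probability that each wine is "Good" (second column of the one-hot encoding).
             return [float(p[1]) for p in predictions]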