Practical machine learning for everyday web apps

Making TensorFlow the brain of your Django app.

A new wave of machine learning is in full swing. This talk gives an overview of the modern Python machine learning stack based on TensorFlow and my practical experiences from using it in a typical Django web app.

You've probably heard of the recent buzz surrounding deep learning, self-driving cars, Amazon Echo's speech interface, Google DeepMind's AlphaGo program beating a human Go master… The modern applications of machine learning are exploding! But can you benefit from the advances in AI in your everyday web apps? How much data do you need to be able to solve some concrete classification problems? What sort of machinery do you need to run it?

This talk will give an overview of Google's recently open-sourced TensorFlow library and how it can be used for machine learning. To keep things grounded in realistic problems, we will go through an example Django web app for finding events, where we want to classify images based on their content. We'll show how to train the desired image classifier using TensorFlow and use it to classify unknown images. We'll also cover the sort of infrastructure you need to use TensorFlow and give an overview of the available cloud services with specialised hardware support for high-performance use cases.

PyDays, 6.5.2017.
https://cfp.linuxwochen.at/de/LWW17/public/events/594

Dražen Lučanin

May 06, 2017

Transcript

  1. PRACTICAL MACHINE LEARNING FOR EVERYDAY WEB APPS
    Making TensorFlow the brain of your Django app
    Dražen Lučanin @metakermit
  2. THE INTERNET IS FOR CAT PHOTOS!

  3. IS THAT MIFFLES IN YOUR PHOTO?

  4. AI: THE NEXT BIG THING™
    • Communication was the Internet’s killer app
      – web apps storing data to DBs ruled
      – Social networks, online stores, messaging apps, productivity apps, online courses, …
    • My view… AI will be the next killer app
      – Faster & cheaper GPU hardware
      – Lots of R&D around machine learning
      – Cool applications
        • Self-driving cars
        • Good speech recognition
        • Automating repetitive manual tasks
      – “The secret sauce”
  5. GETTING INTO AI
    • …probably a good idea
    • Applying AI != researching AI
    • Modern AI frameworks
      – Torch (Facebook)
      – Theano (academia)
      – TensorFlow (Google)
  6. TENSORFLOW (TF)
    • Great AI framework built at Google
      – Easy for developers and researchers
      – Production-ready
    • MapReduce
      – White paper only
      – Hadoop became the standard
    • TF open sourced to become the standard
    • Model marketplace
  7. TF OVERVIEW
    • DataFlow programming language
    • describe a graph of interacting operations that run entirely outside Python
      – Graph
      – Session
    • Abstraction levels
      – Low-level API (for researchers)
      – High-level API (GTD)
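The graph/session split on this slide can be illustrated without TensorFlow. The sketch below is a toy, pure-Python stand-in and not how TensorFlow is implemented; the `Node`, `constant`, `placeholder` and `run` names are hypothetical. Building the expression only records operations, and nothing is computed until `run` is called with concrete inputs, which is roughly the role a session plays.

```python
import operator


class Node:
    """One operation in a toy dataflow graph (hypothetical, not a TF class)."""

    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op          # "const", "placeholder", or a binary function
        self.inputs = inputs  # upstream nodes this operation depends on
        self.value = value    # only used by constants
        self.name = name      # only used by placeholders

    def __add__(self, other):
        return Node(operator.add, (self, other))

    def __mul__(self, other):
        return Node(operator.mul, (self, other))


def constant(value):
    return Node("const", value=value)


def placeholder(name):
    return Node("placeholder", name=name)


def run(node, feed):
    """Evaluate a node by recursively evaluating its inputs (the 'session' step)."""
    if node.op == "const":
        return node.value
    if node.op == "placeholder":
        return feed[node.name]
    left, right = (run(n, feed) for n in node.inputs)
    return node.op(left, right)


# Building the graph computes nothing yet.
W = constant(0.5)
b = constant(-1.0)
x = placeholder("x")
linear_model = W * x + b

# Only running the graph with a fed value produces a number.
result = run(linear_model, {"x": 4.0})
print(result)
```

The same `linear_model` object can be evaluated repeatedly with different feeds, which is the property that lets a real framework hand the whole graph to an optimised runtime.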
  8. LOW-LEVEL API

    import numpy as np
    import tensorflow as tf

    # Model parameters (dtype passed by keyword: the second positional
    # argument of tf.Variable is `trainable`, not the dtype)
    W = tf.Variable([.3], dtype=tf.float32)
    b = tf.Variable([-.3], dtype=tf.float32)
    # Inputs and expected outputs are fed in at run time
    x = tf.placeholder(tf.float32); y = tf.placeholder(tf.float32)
    linear_model = W * x + b
    # Sum of squared errors, minimised by gradient descent
    loss = tf.reduce_sum(tf.square(linear_model - y))
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train = optimizer.minimize(loss)

    x_train = [1,2,3,4]
    y_train = [0,-1,-2,-3]

    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    for i in range(1000):
        sess.run(train, {x:x_train, y:y_train})

    curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x:x_train, y:y_train})
    print("W: %s b: %s loss: %s" % (curr_W, curr_b, curr_loss))
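To make clear what the training loop on this slide asks TensorFlow to do, here is a minimal pure-Python sketch of the same optimisation: plain gradient descent on the summed squared error, with the slide's starting values, learning rate, step count and training data. The `fit_line` function is a hypothetical helper, not part of TensorFlow, and the gradients are written out by hand instead of being derived automatically.

```python
def fit_line(xs, ys, w=0.3, b=-0.3, learning_rate=0.01, steps=1000):
    """Plain gradient descent on sum((w*x + b - y)^2), mirroring the slide's setup."""
    for _ in range(steps):
        # Residuals of the current linear model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the summed squared error with respect to w and b.
        grad_w = sum(2 * e * x for e, x in zip(errors, xs))
        grad_b = sum(2 * e for e in errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))
    return w, b, loss


w, b, loss = fit_line([1, 2, 3, 4], [0, -1, -2, -3])
print(f"W: {w:.4f} b: {b:.4f} loss: {loss:.2e}")
```

The training data lies exactly on the line y = -x + 1, so the fitted values should end up close to W = -1 and b = 1 with a loss near zero.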
  9. EVENTS EXAMPLE
    • AI for event discovery
      – Django web app ( https://www.posterbat.com )
      – Side-project I’m working on
    • Baby steps
      – Scraped some event cover images from the web
      – Does an image have text in it?
    • AI can be used on everyday problems
    • Not only for cutting edge research problems
      – e.g. speech recognition
  10. GETTING DATA FROM DJANGO

    $ ./manage.py shell -c 'from export_images import run; run()'

    # export_images.py
    from events.models import Event

    def run():
        # Export a normalised image for every event in the database
        for event in Event.objects.all():
            prepare_event_image(event)
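  11. NORMALISING IMAGES USING PIL

    from PIL import Image

    def prepare_event_image(event):
        size = (256, 256)
        # Open the source first so a failure does not leave an empty output file
        try:
            image = Image.open(event.img)
        except ValueError:
            return
        # Shrink to fit within 256x256, keeping the aspect ratio
        # (Image.LANCZOS is the current name of the Image.ANTIALIAS filter)
        image.thumbnail(size, Image.LANCZOS)
        # Crop from the top-left corner to exactly 256x256
        region = image.crop((0, 0, *size))
        with open(f'./images/{event.id}.png', 'wb') as f:
            region.save(f, 'png')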
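One consequence of the thumbnail-then-crop approach is worth spelling out: because the thumbnail step keeps the aspect ratio, a non-square image ends up smaller than 256x256 along one side, and the rest of the fixed-size crop is then padding instead of image content. The sketch below only does the size arithmetic in pure Python. `fit_within` and `uncovered_area` are hypothetical helpers, and the rounding is an approximation of what PIL does, not its exact algorithm.

```python
def fit_within(width, height, max_width=256, max_height=256):
    """Approximate the size an image has after shrinking to fit a bounding box.

    Keeps the aspect ratio and never enlarges, which is the documented
    behaviour of PIL's Image.thumbnail; the exact rounding may differ.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def uncovered_area(width, height, box=256):
    """Pixels of the box x box crop that the shrunken image does not cover."""
    w, h = fit_within(width, height, box, box)
    return box * box - min(w, box) * min(h, box)


landscape = fit_within(1024, 512)   # wide poster
square = fit_within(800, 800)       # square poster
small = fit_within(100, 50)         # already smaller than the box
print(landscape, square, small)
print(uncovered_area(1024, 512))
```

If the padded area turns out to matter for the classifier, a centre crop or a resize that ignores the aspect ratio would be alternative normalisations to try.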
  12. LABELING CLASSES

  13. IMPORTING DATA INTO TF

    import os

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from scipy import misc

    labels = pd.read_csv('./labels.csv', index_col=0)

    def read_data(folder):
        path = './data/images/' + folder + '/'
        x = []; y = []
        for filename in os.listdir(path):
            # file names are '<image id>.<extension>'
            image_id = int(filename.split('.')[0])
            # convert input 256x256 image to grayscale
            # flatten to a 1-d array of floats (0-255)
            im = misc.imread(path + filename, flatten=True).flatten()
            x.append(im)
            y.append(int(labels.ix[image_id]))
        x = np.array(x)
        return tf.constant(x), tf.constant(y)

    train, train_labels = read_data('train')
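The link between image files and labels in the slide above is the numeric image id in the file name, looked up in `labels.csv`. A minimal standard-library sketch of that lookup follows. The two-column layout with a header row, the `has_text` column name and the sample ids are assumptions made for illustration, since the slides do not show the file; `load_labels` and `label_for` are hypothetical helpers.

```python
import csv
import io


def load_labels(csv_file):
    """Map image id -> integer label from a two-column CSV with a header row."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    return {int(image_id): int(label) for image_id, label in reader}


def label_for(filename, labels):
    """Look up the label for a file named '<image id>.<extension>'."""
    image_id = int(filename.split('.')[0])
    return labels[image_id]


# Hypothetical contents of labels.csv: 1 = image contains text, 0 = it does not.
sample = io.StringIO("id,has_text\n17,1\n42,0\n")
labels = load_labels(sample)
print(label_for("17.png", labels), label_for("42.png", labels))
```

A missing id raises `KeyError` here, which is a cheap way to notice images that were exported but never labelled.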
  14. TRAINING THE MODEL

    # each 256x256 grayscale image is flattened to 65536 features
    feature_columns = [tf.contrib.layers.real_valued_column("", dimension=65536)]
    classifier = tf.contrib.learn.DNNClassifier(
        feature_columns=feature_columns,
        hidden_units=[1024, 512, 256],
        n_classes=2,
        model_dir="/tmp/model_dir",
    )
    classifier.fit(
        x=train,
        y=train_labels,
        steps=20000
    )
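For a sense of how large this network is, the layer sizes on the slide can be turned into a rough parameter count. The sketch assumes plain fully connected layers with one bias per unit and one output unit per class; the actual estimator may organise its output layer differently (for example a single logit for the two-class case), so the total is an estimate. `count_dense_parameters` is a hypothetical helper.

```python
def count_dense_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


# 256 * 256 grayscale pixels -> hidden layers from the slide -> 2 classes
sizes = [256 * 256, 1024, 512, 256, 2]
total = count_dense_parameters(sizes)
first_layer = sizes[0] * sizes[1] + sizes[1]
print(total, first_layer / total)
```

Under these assumptions almost all of the parameters sit in the first layer, so shrinking the input images is the most direct way to make the model smaller.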
  15. LOAD THE MODEL IN DJANGO
    • Save
    • Add the model directory to your code repository
    • Load in Django

    # Save
    saver = tf.train.Saver()
    saver.save(session, 'my-model')

    # Load in Django
    new_saver = tf.train.import_meta_graph('events/my-model.meta')
    new_saver.restore(session, 'events/my-model')
  16. APPLY THE MODEL

    def get_new_img():
        x = []
        img_path = 'image.png'
        # same preprocessing as for the training data: grayscale, flattened
        im = misc.imread(img_path, flatten=True).flatten()
        x.append(im)
        x = np.array(x)
        return x

    classifier.predict(input_fn=get_new_img, as_iterable=False)
  17. TENSORBOARD – MONITORING
    • Open http://localhost:6006
    • Monitor training at runtime

    $ tensorboard --logdir /tmp/model_dir
  18. TENSORBOARD – GRAPHS

  19. TENSORBOARD – CLASSES

  20. TENSORBOARD – CLASSES

  21. PERFORMANCE
    • CPU (C++ implementation – pretty efficient)
    • GPU – even faster!
    • JIT compiler
      – Speed things up by adding a single line of code
      – Experimental
    • XLA compiler
      – Ahead-of-time compilation
      – Run on embedded devices (phones, IoT)
  22. DEPLOYMENT
    • Google Cloud & AWS offer VMs with GPUs
    • FloydHub
      – Heroku for AI
      – https://www.floydhub.com/

    Provider            AWS    Google   Floyd
    Cost per hour ($)   0.99   0.795    0.432
  23. LEARNING
    • Easy riding
      – https://changelog.com/podcast/219
      – TF Dev Summit ’17 videos – https://events.withgoogle.com/tensorflow-dev-summit/
    • Docs & tutorial
      – https://www.tensorflow.org/get_started/get_started
      – https://medium.freecodecamp.com/big-picture-machine-learning-classifying-text-with-neural-networks-and-tensorflow-d94036ac2274
    • Good free books
      – ESL – http://statweb.stanford.edu/~tibs/ElemStatLearn/
      – Michael Nielsen – http://neuralnetworksanddeeplearning.com/
    • Research
      – http://distill.pub/
  24. THANKS!
    • Dražen Lučanin
    • @metakermit
    • Building apps with a kick! https://punkrockdev.com/