
ML on Mobile DroidKaigi 2018

rejasupotaro
February 09, 2018


  1. Ready for AI?
    ML on Mobile
    » Machine Learning & TensorFlow
    » Build your own model and optimize it
    DroidKaigi 2018 @rejasupotaro 1


  2. » Kentaro Takiguchi / @rejasupotaro
    » Software Engineer
    » Android (since 2010)
    » → Web
    » → Search
    » → ML
    » Discovery team at Cookpad UK
    2


  3. Discovery
    = Search + Recommendation
    Provide users with the right content at the right time
    3


  4. Deep Learning Achievements Over The
    Past Year
    https://blog.statsbot.co/deep-learning-achievements-4c563e034257 4


  5. Artificial Intelligence, Blockchain, IoT, VR,
    Serverless architecture, ...
    5


  6. Best Apps in 2017
    Socratic, Pinterest, Faceapp 6


  7. Google Flights will now predict airline
    delays before the airlines do
    7


  8. One Day
    » When I send a resume to a company, AI predicts my
    future performance, then they reject without
    having an interview
    8


  9. Another Day
    » When I send a resume to a company, AI predicts my
    future performance, then they reject without
    having an interview
    » When I attempt to buy an insurance, AI detects a
    risk of my future illness, then they reject my
    application
    9


  10. Another Day
    » When I send a resume to a company, AI predicts my
    future performance, then they reject without
    having an interview
    » When I attempt to buy an insurance, AI detects a
    risk of my future illness, then they reject my
    application
    » When I pass in front of a surveillance camera, AI
    detects that I will commit a crime, then I'm
    arrested
    10


  11. Sounds fun!
    11


  12. This is the talk for you
    » ✅ "I'm just interested in AI"
    » Welcome!
    » ✅ "I've investigated the feasibility. I want to
    release"
    » This talk focuses on what's happening inside
    » You will be able to release your app with
    confidence
    12


  13. API development
    is mature
    Server: "Call this API with the param"
    GET /users?id=1
    Client: "Okay"
    » We can tell how to use it just by looking at
    this one line
    13


  14. Productionizing ML
    has just started
    ML: "I trained a model. Use this"
    awesome_model.pb
    Client: ???
    » Machine learning researchers need to know the real
    environment
    » Product developers need to learn machine learning
    14


  15. My Motivation
    It's like Android development around 2012 (Everything
    was difficult...)
    I feel we need to share knowledge.
    I want to share what I learned.
    15


  16. Build
    This
    AI
    App
    Together
    https://github.com/rejasupotaro/logic-gate 16


  17. Why TensorFlow?
    » There are many libraries such as PyTorch, Caffe2,
    Theano, Chainer, ...
    17


  18. Runtime
    We would ❤ the official way
    Android: TensorFlow, Deeplearning4j
    iOS: Core ML, TensorFlow
    Let's ask whether the library they love supports these platforms
    18


  19. Train → (IR) → Inference
    19


  20. What is TensorFlow?
    “TensorFlow is an interface for expressing machine
    learning algorithms, and an implementation for
    executing such algorithms, focusing on a wide variety
    of heterogeneous systems, ranging from mobile devices
    up to large-scale distributed systems of hundreds of
    machines and thousands of computational devices”
    https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf 20


  21. What is TensorFlow?
    https://www.tensorflow.org/programmers_guide/graphs 21


  22. What is TensorFlow?
    import tensorflow as tf

    a = tf.constant(2)
    b = tf.constant(3)

    with tf.Session() as session:
        print(session.run(a + b))  # => 5
    22


  23. Why TensorFlow?
    » There are many libraries such as PyTorch, Caffe2,
    Theano, Chainer, ...
    » Most of these libraries are designed for research,
    but TensorFlow is designed to run models in
    production, for various environments including
    mobile.
    23


  24. Architecture of TensorFlow
    24


  25. TensorFlow Mobile & Lite
    25


  26. Architecture of TensorFlow Lite
    https://developer.android.com/ndk/guides/neuralnetworks/index.html 26


  27. Arm support for Android NNAPI
    gives >4x performance boost
    https://arxiv.org/pdf/1801.06274.pdf 27


  28. TensorFlow Mobile (17.2 MB)
    TensorFlow Lite (1.5 MB)
    28


  29. TensorFlow Mobile & Lite
    TensorFlow Mobile 1.4.0 / TensorFlow Lite 0.1.1
    » TensorFlow Mobile
    » = TensorFlow Java + Android
    » TensorFlow Lite
    » = A minimal implementation optimized for mobile
    and embedded devices
    as of 2018/02/01 29


  30. “Google will make
    Android the best
    platform for
    machine learning”
    The Wonderful World of NDK / Building High Performance Android Apps with NDK (2018/02/08 15:40-16:30) 30


  31. Let's see how to run a trained
    model
    31


  32. Run a trained model
    » Add the dependency.
    implementation "org.tensorflow:tensorflow-android:1.4.0"
    » Put a model in src/main/assets
    » Run the computation graph
    inferenceInterface.feed(inputName, input, shape.first, shape.second)
    inferenceInterface.run(arrayOf(outputName))
    inferenceInterface.fetch(outputName, output)
    » Give [0, 1] → Get [0] (full sketch below)
    32
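    Putting those calls together, here is a minimal Kotlin sketch of the whole flow for the AND-gate model built later in this deck. The asset name (optimized_and.pb) and the node names "x" / "y_pred" are assumptions taken from the training and freeze steps; adjust them to your own graph.

    import android.content.res.AssetManager
    import org.tensorflow.contrib.android.TensorFlowInferenceInterface

    class AndGateClassifier(assets: AssetManager) {
        // Load the frozen graph bundled in src/main/assets
        private val inferenceInterface =
            TensorFlowInferenceInterface(assets, "file:///android_asset/optimized_and.pb")

        fun predict(x1: Float, x2: Float): Float {
            val input = floatArrayOf(x1, x2)
            val output = FloatArray(1)
            inferenceInterface.feed("x", input, 1L, 2L)   // placeholder "x", shape [1, 2]
            inferenceInterface.run(arrayOf("y_pred"))     // run the graph up to "y_pred"
            inferenceInterface.fetch("y_pred", output)    // copy the result into our buffer
            return output[0]                              // close to 1.0 only for (1, 1)
        }
    }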


  33. That's it!
    » It can be done within 5 minutes if you have a
    trained model
    » You don't need to write C++. You don't need to use
    CMake. Just use the library.
    33


  34. Logic AND Gate
    Input → Output
    0, 0 → 0
    0, 1 → 0
    1, 0 → 0
    1, 1 → 1
    34


    » x1, x2: input
    » y: output
    » w, b: variables
    » Formula: y_pred = sigmoid(x · w + b) (written out below)
    35
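    Written out, the model these bullets describe (and the code on the next slide implements) is a single sigmoid unit; w is a 2×1 weight matrix and b a scalar bias:

    $$\hat{y} = \sigma(x W + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad L = \sum_i \left(\hat{y}_i - y_i\right)^2$$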


  36. Let's define a graph
    36


  37. import tensorflow as tf

    # Placeholders for the inputs and the expected outputs
    x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
    y = tf.placeholder(tf.float32, shape=[None, 1], name='y')

    # Trainable variables: weights and bias
    w = tf.Variable(tf.zeros([2, 1]), name='weight')
    b = tf.Variable(tf.zeros([1]), name='bias')

    # Prediction: a single sigmoid unit
    y_pred = tf.nn.sigmoid(tf.matmul(x, w) + b, name='y_pred')

    with tf.name_scope("loss"):
        loss = tf.reduce_sum(tf.square(y_pred - y), name='loss')

    with tf.name_scope("train"):
        optimizer = tf.train.AdamOptimizer(learning_rate=0.1, name='optimizer')
        train_step = optimizer.minimize(loss, name='train_step')

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        for epoch in range(self.args.epochs):  # epochs comes from the sample project's CLI args
            _, summary, l = session.run(
                [train_step, merged, loss],  # merged: TensorBoard summaries defined elsewhere in the sample
                feed_dict={x: input, y: output}
            )
    37


  40. Training
    » Define the problem (inputs and expected outputs)
    » Build the computation graph
    » x, y: tf.placeholder
    » w, b: tf.Variable
    » y_pred: tf.nn.sigmoid(tf.matmul(x, w) + b)
    » Find the appropriate values by iterating training
    steps
    40


  41. Visualizing a Model
    41


  42. Congrats!
    » Once you understand how a model is built, you can
    build other models (by modifying the graph a little)
    » You don't need to hardcode the logic. The machine
    learns how to do it for you.
    42


  43. Hardcode Logic
    fun predict(x1: Int, x2: Int): Int {
        return if (x1 == 0 && x2 == 0) {
            0
        } else if (x1 == 1 && x2 == 0) {
            0
        } else if (x1 == 0 && x2 == 1) {
            0
        } else if (x1 == 1 && x2 == 1) {
            1
        } else {
            throw RuntimeException("I hope it won't happen")
        }
    }
    List all possible cases
    43


  44. Machine Learns
    // [[0, 0], [0, 1], [1, 0], [1, 1]]
    fun predict(x1: Int, x2: Int): Int {
        ??? // Machine learns what value to return
    }
    // => [[0], [0], [0], [1]]
    You don't need to write the logic
    44


  45. XOR Gate
    // [[0, 0], [0, 1], [1, 0], [1, 1]]
    fun predict(x1: Int, x2: Int): Int {
        ??? // Machine learns what value to return
    }
    // => [[0], [1], [1], [0]]
    Nest more layers when we work on a complex problem like XOR, which is not linearly separable (see the sketch below) 45
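    A minimal sketch of what "nest more layers" means, in the same notation as the AND model; the hidden-layer width of 2 is an illustrative choice, not something stated in the deck:

    $$h = \sigma(x W_1 + b_1), \qquad \hat{y} = \sigma(h W_2 + b_2), \qquad W_1 \in \mathbb{R}^{2 \times 2}, \; W_2 \in \mathbb{R}^{2 \times 1}$$

    The hidden layer h is what lets the network represent XOR, which a single sigmoid unit cannot.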


  46. Estimate Weight
    // [170cm, Man, ...]
    fun predict(...): Int {
        ??? // Machine learns what value to return
    }
    // => 70kg
    46


  47. Translation
    » "I am learning machine learning"
    » ! ["i", "am", "learning", "machine", "learning"]
    » ! [0.53177921, 0.7965138, 0.66959208, ...]
    » ! ["ࢲ", "͸", "ػցֶश", "Λ", "ษڧ", "͠", "ͯ",
    "͍", "·͢"]
    » ! "ࢲ͸ػցֶशΛษڧ͍ͯ͠·͢"
    47


  48. Image Classification
    » [#000000, #00001A, ...]
    » → [[0, 0, 0], [0, 0, 0.1], ...]
    » → "Cat" or "Burrito"
    48


  49. Deploy to Production
    » Optimize Model
    » Deploy Model
    49


  50. In practice, we don't run such a
    graph in production
    50


  51. Unnecessary nodes +
    values that no longer need to be variables
    51


  52. Training
    » Define a problem (inputs and expected outputs)
    » Build a computation graph
    » Find the appropriate values by iterating training
    steps
    Inference
    » Get the predicted value
    52


  53. Why we need to transform the graph
    1. To enable the model to run on mobile
    » Supported operations are limited
    2. For better performance
    » Compute resources are limited
    » Shouldn't drain users' phone battery
    53


  54. 1. To enable the model to run on
    mobile
    $ find ./tensorflow/python/ -type f | grep '.py$' | wc -l
    968
    $ find ./tensorflow/java/ -type f | grep '.java$' | wc -l
    42
    Feature                  Python                                               Java
    Run a predefined Graph   tf.import_graph_def, tf.Session                      TensorFlowInferenceInterface
    Graph construction       Yes                                                  -
    Gradients                tf.gradients                                         -
    Functions                tf.python.framework.function.Defun                   -
    Control Flow             tf.cond, tf.while_loop                               -
    Neural Network library   tf.train, tf.nn, tf.contrib.layers, tf.contrib.slim  -
    54


  55. 2. For better performance
              Pixel 2 XL (my phone)                          Oppo A37f (most used)
    Chipset   Qualcomm MSM8996 Snapdragon 821                Qualcomm MSM8916 Snapdragon 410
    CPU       Quad-core (2x2.15 GHz Kryo & 2x1.6 GHz Kryo)   Quad-core 1.2 GHz Cortex-A53
    GPU       Adreno 530                                     Adreno 306
    RAM       4 GB                                           2 GB
    There is a big gap between developers and users 55


  56. Graph Transformation
    Model Size ↓, Performance ↑ (Accuracy ↓)
    56


  57. Freeze: Variable to Constant
    node {
      name: "x"
      op: "Placeholder"
      ...
    }
    node {
      name: "bias"
      op: "VariableV2"
      attr {
        key: "container"
        value {
          s: ""
        }
      }
      attr {
        key: "shape"
        value {
          shape {
            dim {
              size: 1
            }
          }
        }
      }
      ...
    57


  58. Freeze: Variable to Constant
    » Train
    » → Checkpoint file is created
    » → Freeze
    $ bazel build tensorflow/python/tools:optimize_for_inference
    $ bazel-bin/tensorflow/python/tools/optimize_for_inference \
    --input=../logic-gate/logic-gate-python/models/and.pb \
    --output=../logic-gate/logic-gate-python/models/optimized_and.pb \
    --frozen_graph=True \
    --input_names=x \
    --output_names=y_pred
    58


  59. Freeze: Variable to Constant
    node {
      name: "x"
      op: "Placeholder"
      ...
    }
    node {
      name: "bias"
      op: "Const"
      attr {
        key: "value"
        value {
          tensor {
            dtype: DT_FLOAT
            tensor_shape {
              dim {
                size: 1
              }
            }
            float_val: -15.988138198852539
          }
        }
      }
      ...
    bias became Const with a fixed float value 59


  60. Before (Training)
    60


  61. After (Inference)
    61


  62. Performance
    » Computational Cost
    » Model Size
    62


  63. Pre-trained Models for Image
    Classification
    » Inception, AlexNet, MobileNet, ...
    63


  64. Model Comparison
    » Retrained Inception-v3, Retrained MobileNet,
    PatchNet (Modified Inception-v3)
    » Specialized for food
    64


  65. https://speakerdeck.com/lunardog/cooking-with-food-photos 65


  66. MobileNet vs PatchNet
    66


  67. Performance Comparison
                     Retrained Inception-v3   Retrained MobileNet   PatchNet (Modified Inception-v3)
    Model Size       77.4 MB                  11.6 MB               85.9 MB
    Execution Time   05m 44s 166ms            01m 19s 535ms         18m 34s 624ms
    CPU Usage        80%                      50%                   80%
    Memory Usage     500 MB                   150 MB                500 MB
    67


  68. 68


  69. Graph Complexity
    » ≈ Computational Cost
    » ≈ Model Size
    69


  70. Inception-v3, MobileNet, PatchNet 70


  71. Benchmark
    $ bazel build tensorflow/tools/benchmark/benchmark_model
    $ bazel-bin/tensorflow/tools/benchmark/benchmark_model \
    --graph=... \
    --input_layer=... \
    --input_layer_shape=... \
    --input_layer_type=... \
    --output_layer=... \
    ...
    71


  72. Benchmark
                                    Retrained Inception-v3   Retrained MobileNet   PatchNet
    Number of nodes executed        507                      264                   704
    AVG timings (microseconds)      368284                   35551.5               1.28228e+06
    Actual execution time           05m 44s 166ms            01m 19s 535ms         18m 34s 624ms
    How much faster than PatchNet   3x faster                18x faster            -
    72


  73. Graph Transformation
    $ bazel build tensorflow/tools/graph_transforms:transform_graph
    $ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=../models/mobilenet.pb \
    --out_graph=../models/optimized-mobilenet.pb \
    --inputs='input_1:0' \
    --outputs='final:0' \
    --transforms='
    strip_unused_nodes(type=float, shape="1,224,224,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms
    quantize_weights
    sort_by_execution_order'
    73


  74. Graph Transformation
                               Original        quantize_weights   More Options
    Model Size                 12 MB           3.3 MB             3.1 MB
    Accuracy                   0.97            0.97               0.9633
    Benchmark                  01m 19s 535ms   01m 17s 716ms      01m 13s 816ms
    CPU Usage                  40-60%          40-60%             40-60%
    Memory Usage               120 MB          90 MB              90 MB
    Number of nodes executed   264             264                236
    (Model: MobileNet) 74


  75. Accuracy
    Basically, accuracy should be the same if we run the
    same computation graph with the same inputs but...
    » Computation Graph
    » Transformed for mobile
    » Data preprocessing
    » Vectorization
    » Feature Scaling
    75


  76. Feature Scaling
    Make sure that features are on the same scale.
    Otherwise, one particular feature with a broad range of
    values dominates the result
    76


  77. Normalization (Number)
    Rescaling
    Mean normalization
    Standardization (formulas below)
    77
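    The usual definitions of the three normalizations named above, where x is a feature value, μ its mean, and σ its standard deviation:

    $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \;\text{(rescaling)}, \qquad x' = \frac{x - \mu}{x_{\max} - x_{\min}} \;\text{(mean normalization)}, \qquad x' = \frac{x - \mu}{\sigma} \;\text{(standardization)}$$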


  78. Normalization (Color)
    val pixels: IntArray → How to process this data? (one option sketched below)
    78
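    One common answer, sketched in Kotlin: unpack each ARGB pixel into its channels and rescale each channel to [0, 1]. The exact preprocessing must match what the model was trained with, so treat this as an assumption rather than the deck's own code.

    fun pixelsToFloats(pixels: IntArray): FloatArray {
        val floats = FloatArray(pixels.size * 3)
        pixels.forEachIndexed { i, pixel ->
            floats[i * 3]     = ((pixel shr 16) and 0xFF) / 255f  // R
            floats[i * 3 + 1] = ((pixel shr 8) and 0xFF) / 255f   // G
            floats[i * 3 + 2] = (pixel and 0xFF) / 255f           // B
        }
        return floats
    }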


  79. Load from Asset Manager / File
    // Bundled model: load directly from src/main/assets (modelName is the asset path)
    constructor(assets: AssetManager) {
        this.inferenceInterface = TensorFlowInferenceInterface(assets, modelName)
    }

    // Downloaded model: load from a file on disk
    constructor(file: File) {
        this.inferenceInterface = file.inputStream().use {
            TensorFlowInferenceInterface(it)
        }
    }
    79


  80. Still worry about app size?
    » Put the model in assets (bundling)
    » Pros: No need to manage versions
    » Cons: Increased app size
    » Download a model into a file (sketched below)
    » Pros: Reduced app size
    » Cons: Need to manage versions
    80
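    For the "download a model into a file" option, a rough Kotlin sketch; the URL and file name are hypothetical, the call must run off the main thread, and the resulting File is passed to the File-based constructor shown on the previous slide.

    import java.io.File
    import java.net.URL

    fun downloadModel(url: String, destination: File): File {
        URL(url).openStream().use { input ->
            destination.outputStream().use { output ->
                input.copyTo(output)  // stream the graph to local storage
            }
        }
        return destination
    }

    // e.g. (inside a Context): downloadModel("https://example.com/models/mobilenet.pb", File(filesDir, "mobilenet.pb"))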


  81. Still worry about library size?
    » Splitting APK
    android {
        ...
        splits {
            abi {
                enable true
                reset()
                include 'x86_64', 'x86', 'arm64-v8a', 'armeabi-v7a'
            }
        }
        project.ext.abiCodes = ['x86_64': 1, 'x86': 2, 'arm64-v8a': 3, 'armeabi-v7a': 4].withDefault { 0 }
    }

    android.applicationVariants.all { variant ->
        variant.outputs.each { output ->
            // Give each ABI-specific APK a distinct versionCode
            def baseAbiVersionCode = project.ext.abiCodes.get(output.getFilter(OutputFile.ABI))
            if (baseAbiVersionCode != null) {
                output.versionCodeOverride = baseAbiVersionCode * 1000 + variant.versionCode
            }
        }
    }
    81


  82. Asynchronous download + Splitting APK
    Get the power of AI by just adding 345 MB
    82


  83. [Summary]
    » How to train a model
    » Difference between TensorFlow Mobile and Lite
    » How to transform a graph → Freeze / Optimize / Benchmark
    » How to implement the client side
    » Model is too big? → Consider asynchronous download
    » APK is still too big? → Split the APK
    83


  84. Thank you for listening!
    Any questions?
    @rejasupotaro 84
