Fundamentals of Image Recognition with Deep Learning and How CNNs Work

kmotohas
October 21, 2019

Tableau Data Science Study Group #4: Image Recognition Technology and BI
https://techplay.jp/event/750555
2019-10-21

Transcript

  1. Convolutional Neural Networks: Fundamentals and Applications
    Tableau Data Science Study Group #4: Image Recognition and AI
    https://techplay.jp/event/750555
    2019-10-21 (Mon) @ TECH PLAY SHIBUYA

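  2. Kazuki Motohashi - Skymind K.K.
    Self-introduction
    ‣ Kazuki Motohashi (本橋 和貴) @kmotohas
    - Skymind K.K. (スカイマインド株式会社)
    • Deep Learning Engineer (worked on DL + ROS at my previous job)
    - Background in experimental particle physics (LHC-ATLAS experiment)
    • Ph.D. (Science)
    - Favorite book: 詳説 Deep Learning: 実務者のためのアプローチ (the Japanese edition of "Deep Learning: A Practitioner's Approach")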

  3. ‣ The original English edition, "Deep Learning: A Practitioner's Approach", was published in August 2017
    ‣ A guide built around Deeplearning4j (DL4J), a deep learning framework for JVM languages
    - Co-authored by Adam Gibson, a DL4J developer who founded Skymind Inc.
    - Aimed mainly at software, application, and systems engineers
    - Covers everything from deep learning fundamentals to integration with big-data analytics platforms such as Hadoop and Spark

  4. Agenda
    Introduction to machine learning
    Coding samples with Keras
    Fundamentals of convolutional neural networks
    Understanding CNN filters
    Overview of CNN architectures

  5. [Figure]

  6. Neural network fundamentals

  7. [Figure]
    Source: https://www.coursera.org/learn/introduction-tensorflow

  8. Activity Recognition
    Source: https://www.coursera.org/learn/introduction-tensorflow

  9. Activity Recognition
    Source: https://www.coursera.org/learn/introduction-tensorflow

  10. Activity Recognition
    Source: https://www.coursera.org/learn/introduction-tensorflow

  11. Activity Recognition
    Source: https://www.coursera.org/learn/introduction-tensorflow

  12. [Figure]
    Source: https://www.coursera.org/learn/introduction-tensorflow

  13. Activity Recognition
    Source: https://www.coursera.org/learn/introduction-tensorflow

  14. The "Hello World" of machine learning
    x = -2, -1, 0, 1, 2, 3, 4
    y = -3, -1, 1, 3, 5, 7, 9
    y = f(x)

  15. The "Hello World" of machine learning
    x = -2, -1, 0, 1, 2, 3, 4
    y = -3, -1, 1, 3, 5, 7, 9
    y = f(x) = 2x + 1

  16. The machine learning approach
    • Initialize the model with arbitrary parameters (y_ = ax + b)

  17. The machine learning approach
    • Initialize the model with arbitrary parameters (y_ = ax + b)
    • Compute the error (loss): L = 1/N Σ(y - y_)²

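  18. The machine learning approach
    • Initialize the model with arbitrary parameters (y_ = ax + b)
    • Compute the error (loss): L = 1/N Σ(y - y_)²
    • Update the parameters (a, b) slightly so that the error decreases (a ← a - η ∂L/∂a, and likewise for b)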
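
For the model y_ = ax + b and the loss L = 1/N Σ(y - y_)², the derivatives used in the update step follow from the chain rule. A short derivation added for reference, where n indexes the N training points:

```latex
\begin{aligned}
L(a, b) &= \frac{1}{N}\sum_{n=1}^{N}\bigl(y_n - (a x_n + b)\bigr)^2 \\
\frac{\partial L}{\partial a} &= -\frac{2}{N}\sum_{n=1}^{N} x_n\bigl(y_n - (a x_n + b)\bigr) \\
\frac{\partial L}{\partial b} &= -\frac{2}{N}\sum_{n=1}^{N}\bigl(y_n - (a x_n + b)\bigr) \\
a &\leftarrow a - \eta\,\frac{\partial L}{\partial a}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
\end{aligned}
```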

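  19. The machine learning approach
    • Initialize the model with arbitrary parameters (y_ = ax + b)
    • Compute the error (loss): L = 1/N Σ(y - y_)²
    • Update the parameters (a, b) slightly so that the error decreases (a ← a - η ∂L/∂a, and likewise for b)
    • Compute the error (loss) again: L = 1/N Σ(y - y_)²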

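  20. The machine learning approach
    • Initialize the model with arbitrary parameters (y_ = ax + b)
    • Compute the error (loss): L = 1/N Σ(y - y_)²
    • Update the parameters (a, b) slightly so that the error decreases (a ← a - η ∂L/∂a, and likewise for b)
    • Compute the error (loss) again: L = 1/N Σ(y - y_)²
    • Update the parameters (a, b) again so that the error decreases (a ← a - η ∂L/∂a, and likewise for b)
    • …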
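
The repeat-until-small loop above can be sketched in plain Python on the slide's "Hello World" data. This is a minimal sketch, not the code shown in the talk: the zero initialization, the learning rate η = 0.01, and the 2,000 steps are assumptions chosen so that it converges.

```python
# Training data from the "Hello World" slide: the hidden rule is y = 2x + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0]


def mse(a, b):
    """Mean squared error L = 1/N * sum((y - y_)^2) for the model y_ = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(lr=0.01, steps=2000):
    """Plain gradient descent on (a, b); lr and steps are assumed values."""
    a, b = 0.0, 0.0  # arbitrary starting model (assumption: both parameters start at 0)
    n = len(xs)
    for _ in range(steps):
        # Residuals y - y_ under the current parameters.
        errors = [y - (a * x + b) for x, y in zip(xs, ys)]
        # dL/da and dL/db for the mean squared error.
        grad_a = -2.0 / n * sum(x * e for x, e in zip(xs, errors))
        grad_b = -2.0 / n * sum(errors)
        # Move each parameter a small step against its gradient.
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


a, b = fit()
print(f"a = {a:.3f}, b = {b:.3f}, loss = {mse(a, b):.6f}")
```

After training, a and b end up very close to 2 and 1, recovering y = 2x + 1 from the data alone.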

  21. [Figure: code]
    import numpy as np

  22. [Figure: the model y_ = ax + b, with input x and parameters a, b]
    import numpy as np

  23. [Figure: code]
    import numpy as np

  24. [Figure]

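  25. Gradient descent
    "It's like heading for the mountaintop blindfolded, using only the slope under your feet." (When minimizing a loss, the goal is the valley floor rather than the summit, but the idea is the same.)
    Think of the learning rate η as the stride length (small η = shuffling steps, large η = a giant's stride)
    Sources: https://twitter.com/momiji_fullmoon/status/… , https://www.yamakei-online.com/journal/detail.php?id=…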
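
The stride-length intuition can be seen on the simplest possible loss, f(w) = w² (gradient 2w, minimum at w = 0). The function, starting point, and learning rates below are illustrative assumptions. Each update multiplies w by (1 - 2η), so a small η crawls toward 0, a moderate η converges quickly, and an η above 1 overshoots further on every step and diverges.

```python
def descend(lr, steps=20, w0=5.0):
    """Gradient descent on f(w) = w**2 (gradient 2w); returns every w visited."""
    w = w0
    path = [w]
    for _ in range(steps):
        w -= lr * 2 * w  # w <- w - lr * df/dw
        path.append(w)
    return path


for lr in (0.01, 0.1, 1.1):
    path = descend(lr)
    print(f"lr={lr}: w after {len(path) - 1} steps = {path[-1]:.4g}")
```

With η = 0.01 the parameter has barely moved after 20 steps, η = 0.1 lands near the minimum, and η = 1.1 jumps back and forth with growing amplitude.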

  26. Fashion MNIST Dataset
    • 70,000 images
    • 10 categories
    • 28×28 pixels
    • A dataset for experimentation
    Source: https://github.com/zalandoresearch/fashion-mnist

  27. [Figure]

  28. [Figure]

  29. [Figure: Flatten]
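
A Flatten layer (in Keras, `tf.keras.layers.Flatten`) unrolls the 28×28 image into a single vector of 784 values so that it can be fed to a Dense layer. A pure-Python sketch of the same operation, with a hypothetical all-zero image standing in for a Fashion MNIST sample:

```python
def flatten(image):
    """Flatten a 2-D image (a list of rows) into a 1-D list, row by row."""
    return [pixel for row in image for pixel in row]


# Hypothetical 28x28 grayscale image, the same shape as one Fashion MNIST sample.
image = [[0] * 28 for _ in range(28)]
vector = flatten(image)
print(len(vector))  # 784 = 28 * 28

print(flatten([[1, 2], [3, 4]]))  # [1, 2, 3, 4]
```

Note that flattening discards the 2-D neighbourhood structure: pixels that were vertically adjacent end up 28 positions apart in the vector.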

  30. [Figure]

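  31. Drawbacks of the Dense layer
    • It looks at correlations among all elements of the input vector
    > That is still fine for problems like house-price prediction,
    > where the inputs are already engineered features and the layer learns combinations of them
    - e.g.) Sumida-ku, Tokyo & floor area 30 m² & 1K & separate bath and toilet & newly built => rent ¥100,000/month
    • It is more efficient to extract "image features" first and then pass them to a Dense layer
    → Convolutional Neural Network (CNN)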
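
Counting trainable parameters makes the inefficiency concrete. The layer sizes below (a Dense layer with 128 units, a convolution with 32 filters of size 3×3) are illustrative assumptions, not values from the slides; the formulas are the standard weights-plus-biases counts.

```python
def dense_params(n_in, n_out):
    """Dense layer: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


def conv2d_params(kh, kw, c_in, c_out):
    """2-D convolution: one kh x kw x c_in filter plus one bias per output channel."""
    return kh * kw * c_in * c_out + c_out


flat = 28 * 28  # a flattened 28x28 grayscale image has 784 inputs
print("Dense(128) on 784 inputs:", dense_params(flat, 128))  # 100480
print("32 filters of 3x3 on 1 channel:", conv2d_params(3, 3, 1, 32))  # 320
```

The convolution's count does not depend on the image size at all, because the same small filter is reused at every position, while the Dense layer's count grows with the number of pixels.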

  32. Fundamentals of convolutional neural networks

  33. Convolution
    CURRENT_PIXEL_VALUE = 82
    NEW_PIXEL_VALUE =
    (-1 * 144) + (0 * 60) + (-2 * 19)
    + (0.5 * 188) + (4.5 * 82) + (-1.5 * 32)
    + (1.5 * 156) + (2 * 55) + (-3 * 27)
    = 496

    In general, for output channel m at position (i, j) of layer l:
    $$u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{W-1} \sum_{q=0}^{H-1} z^{(l-1)}_{i+p,\,j+q,\,k}\, h_{pqkm} + b_{ijm}$$
    where z^(l-1) is the K-channel input from layer l-1, h_{pqkm} are the filter weights, and b_{ijm} is the bias.
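
The worked example on this slide can be checked in a few lines. The 3×3 arrangement below is an assumption: it reads each line of the slide's sum as one row of the neighbourhood around the pixel valued 82, with the filter weights in the matching positions.

```python
# 3x3 neighbourhood around the current pixel (value 82, in the centre).
patch = [
    [144, 60, 19],
    [188, 82, 32],
    [156, 55, 27],
]
# Filter weights in the same positions as the pixels they multiply.
kernel = [
    [-1.0, 0.0, -2.0],
    [0.5, 4.5, -1.5],
    [1.5, 2.0, -3.0],
]


def apply_filter(patch, kernel):
    """Multiply each pixel by the weight in the same position and sum the products."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


print(apply_filter(patch, kernel))  # 496.0
```

Sliding this multiply-and-sum over every pixel position produces the filtered image.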

  34. [Figure]
    Source: https://www.bbkong.net/fs/alleyoop/molten_BGL7

  35. stride=(1, 1), padding='valid'

  36. stride=(1, 1), padding='valid'

  37. stride=(1, 1), padding='valid'

  38. stride=(1, 1), padding='valid'

  39. stride=(1, 1), padding='valid'

  40. stride=(1, 1), padding='valid'

  41. stride=(1, 1), padding='valid'

  42. stride=(2, 2), padding='valid'

  43. stride=(2, 2), padding='valid'

  44. stride=(2, 2), padding='valid'
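
The stride slides show how far the filter moves per step. A minimal pure-Python sketch of a 'valid' (no padding) convolution with a configurable stride; the 5×5 input and the all-ones 3×3 filter are hypothetical. Like CNN libraries, it computes cross-correlation (the filter is not flipped), and without padding the output size per dimension is floor((n - f) / s) + 1.

```python
def conv2d_valid(image, kernel, stride=1):
    """2-D 'valid' convolution (no padding) with the given stride.

    Only positions where the kernel fits entirely inside the image produce output.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # window origin moves `stride` pixels per step
            row.append(sum(
                image[top + p][left + q] * kernel[p][q]
                for p in range(kh)
                for q in range(kw)
            ))
        out.append(row)
    return out


# Hypothetical 5x5 input holding 0..24 and a 3x3 filter that sums its window.
image = [[5 * r + c for c in range(5)] for r in range(5)]
kernel = [[1] * 3 for _ in range(3)]

for s in (1, 2):
    result = conv2d_valid(image, kernel, stride=s)
    print(f"stride={s}: output {len(result)}x{len(result[0])}")
```

With a 5×5 input and a 3×3 filter, stride 1 gives a 3×3 output and stride 2 gives 2×2, so a larger stride shrinks the feature map.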

  45. stride=(1, 1), padding='same'

  46. stride=(1, 1), padding='same'

  47. stride=(1, 1), padding='same'

  48. stride=(1, 1), padding='same'
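
With padding='same', the input is surrounded by zeros so that, at stride 1, the output keeps the input's height and width. The sketch below assumes an odd, square kernel, for which (f - 1) / 2 zeros on each side suffice; in that case it should agree with Keras's padding='same', but the helper names and the sample data are illustrative.

```python
def zero_pad(image, pad):
    """Surround a 2-D image with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def conv2d_same(image, kernel):
    """Stride-1 convolution whose output has the same height and width as the input.

    Assumes an odd, square kernel (e.g. 3x3), padded by (k - 1) // 2 on each side.
    """
    k = len(kernel)
    padded = zero_pad(image, (k - 1) // 2)
    h, w = len(image), len(image[0])
    return [
        [
            sum(padded[i + p][j + q] * kernel[p][q] for p in range(k) for q in range(k))
            for j in range(w)
        ]
        for i in range(h)
    ]


image = [[1] * 5 for _ in range(5)]   # hypothetical 5x5 image of ones
kernel = [[1] * 3 for _ in range(3)]  # 3x3 filter that sums its window
out = conv2d_same(image, kernel)
print(len(out), len(out[0]))  # 5 5
print(out[0][0], out[2][2])   # 4 9
```

The corner output is only 4 because five of the nine cells under the filter are padding zeros, which is why border pixels contribute less under 'same' padding.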

  49. Max Pooling
    Source: https://www.coursera.org/learn/introduction-tensorflow
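
Max pooling keeps only the largest value in each window, shrinking the feature map while preserving the strongest responses. A sketch assuming the common 2×2 window with stride 2 (the defaults of Keras's MaxPooling2D); the 4×4 feature map is made up.

```python
def max_pool2d(image, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[i * stride + p][j * stride + q]
                for p in range(size)
                for q in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [3, 1, 4, 8],
]
print(max_pool2d(feature_map))  # [[6, 2], [3, 9]]
```

The 4×4 map becomes 2×2: a quarter of the values, and the layer has no weights to learn.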

  50. AlexNet

  51. [Figure] Slide from the keynote speech by Laurence Moroney at Deep Learning Day

  52. [Figure]
    Source: https://medium.com/@lmoroney_40129/codelabs-from-googlemlsummit-f9d53cac8d24

  53. Advanced CNN architectures

  54. Winning models of the ImageNet competition
    Source: https://www.slideshare.net/ren4yu/ss-84282514

  55. ResNet
    • Winner of the 2015 ImageNet competition (ILSVRC)
    • Introduced the residual module (a shortcut mechanism)
    http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
    [Figure: Revolution of Depth, ImageNet Classification top-5 error (%)]
    Competition   Model       Depth        Top-5 error (%)
    ILSVRC'15     ResNet      152 layers   3.57
    ILSVRC'14     GoogleNet   22 layers    6.7
    ILSVRC'14     VGG         19 layers    7.3
    ILSVRC'13                 8 layers     11.7
    ILSVRC'12     AlexNet     8 layers     16.4
    ILSVRC'11                 shallow      25.8
    ILSVRC'10                 shallow      28.2
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". arXiv.

  56. Revolution of Depth: ResNet, 152 layers
    [Figure: the ResNet-152 layer stack, animated over slides 56 to 59; "/2" marks stride-2 downsampling]
    7x7 conv, 64, /2, pool/2
    [1x1 conv, 64 → 3x3 conv, 64 → 1x1 conv, 256] × 3
    [1x1 conv, 128 → 3x3 conv, 128 → 1x1 conv, 512] × 8 (the first 1x1 conv has /2)
    1x1 conv, 256, /2 → 3x3 conv, 256 → …

  57. Revolution of Depth: ResNet, 152 layers (continued)
    … [1x1 conv, 128 → 3x3 conv, 128 → 1x1 conv, 512]
    [1x1 conv, 256, /2 → 3x3 conv, 256 → 1x1 conv, 1024]
    [1x1 conv, 256 → 3x3 conv, 256 → 1x1 conv, 1024], repeated …

  58. Revolution of Depth: ResNet, 152 layers (continued)
    … [1x1 conv, 256 → 3x3 conv, 256 → 1x1 conv, 1024], repeated …

  59. Revolution of Depth: ResNet, 152 layers (continued)
    … [1x1 conv, 256 → 3x3 conv, 256 → 1x1 conv, 1024], repeated
    [1x1 conv, 512 → 3x3 conv, 512 → 1x1 conv, 2048] × 3 (the first 1x1 conv has /2)
    ave pool, fc 1000
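
Where the "152" comes from can be checked by counting weighted layers: one 7×7 conv, three layers per bottleneck block, and the final fc layer (pooling layers have no weights and are not counted). The per-stage block counts 3, 8, 36, 3 follow the ResNet-152 configuration in the ResNet paper; the 36 for the 1024-channel stage is taken from the paper, since the transcript only shows it as a long repeated run.

```python
# Bottleneck blocks per stage in ResNet-152 (64/256, 128/512, 256/1024, 512/2048 channels).
blocks_per_stage = [3, 8, 36, 3]
layers_per_block = 3  # 1x1 conv -> 3x3 conv -> 1x1 conv

# First 7x7 conv + all bottleneck convs + final fully connected layer.
depth = 1 + layers_per_block * sum(blocks_per_stage) + 1
print(depth)  # 152
```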

  60. ResNet
    Deep Residual Learning
    • Plain net: any two stacked layers
      x → weight layer → relu → weight layer → relu → H(x)
      H(x) is any desired mapping; hope the 2 weight layers fit H(x)
    • Residual net
      x → weight layer → relu → weight layer → F(x); F(x) + x (identity shortcut) → relu → H(x)
      hope the 2 weight layers fit F(x); let H(x) = F(x) + x
    http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
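
The residual idea fits in a few lines. This sketch replaces the convolutional weight layers with small fully connected ones (an assumption for brevity; `matvec`, `relu`, and `residual_block` are hypothetical helper names) and shows why the shortcut helps: with all-zero weights F(x) = 0, so the block simply passes its input through the final relu, whereas a plain two-layer stack with zero weights would output only zeros.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Matrix-vector product; stands in for a weight layer (no bias, for brevity)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """H(x) = relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x)."""
    f = matvec(W2, relu(matvec(W1, x)))  # weight layer -> relu -> weight layer
    return relu([fi + xi for fi, xi in zip(f, x)])  # identity shortcut, then relu


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights, F(x) = 0 and the block reduces to relu(x).
print(residual_block(x, zeros, zeros))  # [1.0, 0.0, 3.0]
```

Because doing nothing is the easy default, the two weight layers only need to learn the correction F(x) = H(x) - x, which is what makes very deep stacks trainable in practice.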

  61. CIFAR-10 experiments
    [Figure, two panels: "CIFAR-10 plain nets" (plain-20/32/44/56) and "CIFAR-10 ResNets" (ResNet-20/32/44/56/110); error (%) vs. iter. (1e4); solid: test, dashed: train]
    • Deep ResNets can be trained without difficulties
    • Deeper ResNets have lower training error, and also lower test error

  62. Further Reading
    • Published in December 2017, so not the latest material, but an excellent deck
    • "畳み込みニューラルネットワークの研究動向" (Research trends in convolutional neural networks) by Mr. Uchida of DeNA
    • https://www.slideshare.net/ren4yu/ss-84282514

  63. YOLO (You Only Look Once)
    Source: https://www.youtube.com/watch?v=MPU2HistivI

  64. YOLO architecture

  65. Labeling / Annotation
    https://github.com/Microsoft/VoTT

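  66. Summary
    • In machine learning, rules are learned probabilistically from data
    • Optimizing fully connected weights directly on image pixels is inefficient
    • CNNs go further and learn how to extract image features automatically
    • Models such as ResNet, which do this even more efficiently, have been proposed
    • Honestly, what works best depends on the data, so ideally you would prepare only combinations of architectures that tend to work empirically and let the system choose among them
    → AutoML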

  67. • https://dl4-practitioners.connpass.com/event/149810/
    • Held monthly; at next month's session (11/18), a speaker from Google is scheduled to talk about AutoML
    • Free