
Path-Level Network Transformation for Efficient Architecture Search (ICML2018読み会)

S.Shota
July 28, 2018


Slides presented at the ICML2018 reading group (https://connpass.com/event/92705/).

H. Cai, J. Yang, W. Zhang, S. Han, and Y. Yu, “Path-Level Network Transformation for Efficient Architecture Search,” in Proceedings of the 35th International Conference on Machine Learning, 2018, pp. 677–686.

http://proceedings.mlr.press/v80/cai18a.html



Transcript

  1. Shota Saito (Yokohama National University)
    July 28, 2018 | ICML2018 reading group
    Path-Level Network Transformation
    for Efficient Architecture Search
    H. Cai, J. Yang, W. Zhang, S. Han, Y. Yu | ICML 2018


  2. Self-introduction / Research Interests
    2
    • Name: Shota Saito
    • Yokohama National University, Graduate School of Environment and Information
    Sciences, Shirakawa Laboratory, M2 (second-year master's student)
    • Machine Learning (Deep Learning, Feature Selection)
    • Evolutionary Computation (Evolution Strategy)
    • ML × EC = Evolutionary Machine Learning
    [Figure: concept image of PEFS — each feature of the input vector is masked by a binary vector sampled from a Bernoulli distribution and fed to the model G(W, θ); the expected loss is used to update both the distribution and the model]
    Shota Saito, Shinichi Shirakawa, Youhei Akimoto: “Embedded Feature Selection Using Probabilistic Model-Based Optimization”,
    Student Workshop in Genetic and Evolutionary Computation Conference 2018 (GECCO 2018) , Kyoto, Japan, 15th-19th July (2018).
    Probabilistic model-based EC × Feature Selection

  3. Paper information / Why I chose this paper
    3
    • H. Cai, J. Yang, W. Zhang, S. Han, and Y. Yu,
    “Path-Level Network Transformation for
    Efficient Architecture Search,” in Proceedings of
    the 35th International Conference on Machine
    Learning, 2018, pp. 677–686.
    • Because it is research on neural network architecture search
    (especially reinforcement-learning-based methods)
    • It also serves as a survey of low-cost architecture search
    • e.g., ENAS [Pham et al., 2018] and DARTS [Liu et al., 2018]

  4. Overview
    4
    • Net2Net: a method that enlarges the architecture of an already-trained
    neural network (NN) and then retrains it
    • Architecture search with weight inheritance (parameter sharing)
    • NAS: a method that uses an RNN to output NN architectures and learns the
    architecture generator via reinforcement learning (REINFORCE)
    • In addition, the way Net2Net enlarges the architecture is refined
    Net2Net + Neural Architecture Search
    Building blocks of the proposed method
    [Chen et al., 2016] [Zoph & Le, 2017]

  5. Related Work and Background
    5

  6. Related Work and Background
    6
    1990–2000: Neuro-Evolution
    Optimize architectures and weights with evolutionary computation
    2010–2016: Bayesian Optimization
    Estimate the objective function with a Gaussian process and search according
    to an acquisition criterion
    2016–present: Reinforcement-learning-based methods
    Consider an agent that searches over architectures and update its policy with
    reinforcement learning
    Besides NAS, there is also a Q-learning-based method [Baker et al., 2017]

  7. • The NAS framework: Recurrent NN + REINFORCE
    Related Work and Background
    7
    Child Network 1
    (CNN)
    … Child Network N
    (CNN)
    Meta-Controller
    (RNN)
    (1) Sample multiple architectures according to the generation probability P
    learned by the RNN
    Reward 1 (Accuracy) … Reward N (Accuracy)
    (2) Train each child NN and compute its reward (validation accuracy)
    (3) Compute gradients based on the rewards
    (4) Update the controller's parameters using the gradients
    m: number of child networks, T: number of hyperparameters (architecture decisions), b: baseline
    $$\nabla_\theta J(\theta) \approx \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T} \bigl(R_k - b\bigr)\, \nabla_\theta \log P\bigl(a_t \mid a_{(t-1):1}; \theta\bigr)$$

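A minimal sketch of this REINFORCE estimate (my own illustration, not the authors' code); `log_prob_grads` and `rewards` are hypothetical containers holding, for each sampled child network, the per-decision score-function gradients and its validation-accuracy reward:

```python
# Sketch of the REINFORCE gradient estimate for the meta-controller.
# log_prob_grads[k][t] stands for grad_theta log P(a_t | a_(t-1):1; theta) of the
# k-th sampled child network; rewards[k] is its validation accuracy; baseline is b.
def reinforce_gradient(log_prob_grads, rewards, baseline):
    m = len(rewards)                          # number of sampled child networks
    grad = 0.0
    for k in range(m):
        advantage = rewards[k] - baseline     # (R_k - b): variance-reduced reward
        for g_t in log_prob_grads[k]:         # one term per decision t = 1..T
            grad = grad + advantage * g_t
    return grad / m                           # Monte-Carlo average over child networks
```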

  8. Net2Net that changes a layer's width: Net2WiderNet
    8
    • The Net2Net framework: architecture transformations that preserve the output
    [Figure: Net2WiderNet operator — original network x → y → z (weights a, b) and the widened network in which unit y is duplicated to y']
    (1) Randomly select the unit to copy
    (2) Copy its weights
    (3) Halve the outgoing weights (of the original unit and its copy)
    The output z is unchanged after the transformation
    (function-preserving transformation)
    (4) Train the transformed network

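A minimal numeric check of Net2WiderNet on two fully connected layers (a simplification of the convolutional case; my own illustration rather than the authors' code): duplicating a hidden unit and halving the outgoing weights leaves the output unchanged.

```python
import numpy as np

# z = b @ relu(a @ x); widen by duplicating hidden unit `unit`.
def net2wider(a, b, unit):
    a_new = np.vstack([a, a[unit:unit + 1, :]])   # copy the unit's incoming weights
    b_new = np.hstack([b, b[:, unit:unit + 1]])   # copy its outgoing weights
    b_new[:, unit] *= 0.5                         # halve the original outgoing weights
    b_new[:, -1] *= 0.5                           # halve the copy's outgoing weights
    return a_new, b_new

rng = np.random.default_rng(0)
a, b, x = rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=3)
relu = lambda v: np.maximum(v, 0.0)
a2, b2 = net2wider(a, b, unit=1)
assert np.allclose(b @ relu(a @ x), b2 @ relu(a2 @ x))   # function preserved
```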

  9. Net2Net that changes the depth: Net2DeeperNet
    9
    [Figure: Net2DeeperNet operator — the original network x → y → z (weights a, b) and the deeper transformed network, in which a new identity-initialized layer is inserted]
    (1) Randomly select the layer to copy (where the new layer is inserted)
    (2) Initialize the new weights with an identity matrix
    The output z is unchanged after the transformation
    (3) Train the transformed network

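A minimal numeric check of Net2DeeperNet under the same simplification (fully connected layers with ReLU); `net2deeper` and the toy network are my own illustration, not the authors' code:

```python
import numpy as np

# Inserting a layer whose weights are the identity keeps the output unchanged,
# because relu(h) == h when h is already non-negative.
def net2deeper(hidden_dim):
    return np.eye(hidden_dim)      # identity-initialized weights of the new layer

rng = np.random.default_rng(0)
a, b, x = rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=3)
relu = lambda v: np.maximum(v, 0.0)
h = relu(a @ x)
w_new = net2deeper(4)
assert np.allclose(b @ h, b @ relu(w_new @ h))   # function preserved
```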

  10. Efficient Architecture Search (EAS)
    10
    • An architecture search method proposed by the same authors at AAAI-18
    • Network transformations: layer insertion and increasing the number of units
    • In that paper, the search spaces are based on a CNN without skip connections
    and on DenseNet [Huang et al., 2017]
    • Reinforcement learning decides which network transformation operations to take
    Network transformation (Net2Net) + exploring the space with reinforcement learning
    Efficient Architecture Search [Cai et al., AAAI-2018]


  11. Efficient Architecture Search (EAS)
    11
    • Overview of EAS:
    [Figure from the EAS paper: "Overview of the RL based meta-controller in EAS, which consists of an encoder network for encoding the architecture and multiple separate actor networks (Net2Wider, Net2Deeper) for taking network transformation actions."]
    (1) Convert each layer's information into a low-dimensional feature with a fully
    connected NN (layer embedding)
    (2) Encode the whole architecture with a bi-directional LSTM
    (3) Apply a sigmoid to the outputs to decide whether to widen each layer
    (4) Use a decoder network to decide the structure of the layer to be inserted


  12. Method | Proposed Method
    12


  13. Path-Level Network Transformation
    13
    • Consider a multi-branch module that is functionally equivalent to a given layer
    (i.e., a structure that satisfies function preservation)
    • An ordinary convolution layer can be represented as follows:
    [Figure 1 of the paper: "Convolution layer and its equivalent multi-branch motifs." A convolution layer C(x) is equivalent to (i) a replication–add motif, in which x is replicated into two branches that share C and the outputs are added with weights 0.5 and 0.5, and (ii) a replication–concat motif, in which the branch outputs are concatenated]
    Equivalent representation via addition (copy the input, multiply each branch
    output by 0.5) and equivalent representation via concat

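A small numeric sketch of these equivalences (my own illustration, with a linear layer standing in for the convolution; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(8, 5))       # stand-in for a conv layer: 5 input, 8 output channels
x = rng.normal(size=5)

y = C @ x                                         # original layer output
y_add = 0.5 * (C @ x) + 0.5 * (C @ x)             # replication + add motif
y_cat = np.concatenate([C[:4] @ x, C[4:] @ x])    # replication + concat motif
                                                  # (output filters split across branches)
assert np.allclose(y, y_add) and np.allclose(y, y_cat)
```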

  14. Path-Level Network Transformation
    14
    • The identity function (i.e., a layer that does nothing) can likewise be
    represented by a branched structure
    [Figure 2 of the paper: "Identity layer and its equivalent multi-branch motifs." The identity is equivalent to replicating x into two identity branches whose outputs are added with weights 0.5 and 0.5, and to splitting x channel-wise into x1, x2, passing each through an identity branch, and concatenating back to x = [x1, x2]]


  15. Path-Level Network Transformation
    15
    • Using the equivalent representations of the convolution layer and the identity
    function, the layer is branched repeatedly
    • Replacing an identity function on a branch with a convolution layer realizes
    Net2DeeperNet
    [Figure 3 of the paper: "An illustration of transforming a single layer to a tree-structured motif via path-level transformation operations, where we apply Net2DeeperNet operation to replace an identity mapping with a 3 × 3 depthwise-separable convolution in (c)." Panels (a)–(d) show a single layer C(x) progressively turned into a tree of Replication/Split, Add/Concat, Identity and Sep 3x3 edges]
    (An identity function is changed into a 3x3 depthwise-separable convolution.)


  16. Tree-Structured Architecture Space
    16
    • The architectures obtained by the transformations are represented as trees
    • The output N(x) of a node for an input feature map x, with m child nodes
    {N_i^c(·)} and m corresponding edges {E_i(·)}, is defined recursively from the
    outputs of its child nodes:
    $$z_i = \mathrm{allocation}(x, i), \qquad y_i = N_i^c(E_i(z_i)), \quad 1 \le i \le m, \qquad N(x) = \mathrm{merge}(y_1, \dots, y_m) \tag{1}$$
    allocation(x, i): the feature map allocated to the i-th child node
    E_i: the edge (primitive operation) along which the data is passed to the i-th
    child node, giving the output y_i
    merge: all child outputs are merged to form this node's output

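A rough sketch of Eq. (1) (my own reading, not the authors' code); `edges` and `children` are hypothetical callables standing in for the primitive operations and the child nodes:

```python
import numpy as np

def node_output(x, edges, children, allocation="replication", merge="add"):
    m = len(edges)
    if allocation == "replication":
        zs = [x for _ in range(m)]               # every child sees the full input
    else:                                        # "split": channel-wise split
        zs = np.array_split(x, m, axis=0)
    ys = [child(edge(z)) for edge, child, z in zip(edges, children, zs)]
    if merge == "add":
        return np.sum(ys, axis=0)                # element-wise addition
    return np.concatenate(ys, axis=0)            # channel-wise concatenation

# Example: leaf children are identities, edges are simple channel-wise scalings.
leaf = lambda y: y
out = node_output(np.ones(8), [lambda z: 2 * z, lambda z: 3 * z], [leaf, leaf],
                  allocation="split", merge="concat")   # -> [2,2,2,2,3,3,3,3]
```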

  17. Tree-Structured Architecture Space
    17
    • The operations used for allocation and merge are also selected
    • allocation: replication or channel-wise split
    • merge: addition (add) or concatenation
    • Existing layers and identity functions are replaced by a layer chosen from:
    • 1x1 convolution
    • Identity
    • 3x3, 5x5, or 7x7 depthwise-separable convolution
    • 3x3 average pooling
    • 3x3 max pooling
    Pooling layers have no parameters, so function preservation cannot be satisfied;
    the weights are adjusted via knowledge distillation (the extra cost is negligible)


  18. Architecture Search with Path-Level
    Operations
    18
    • Since the architecture is represented as a tree, a Tree-LSTM is adopted as the
    controller that generates architectures
    • The Tree-LSTM passes states from the child nodes up to the parent node
    (bottom-up)
    • It then passes states back down to each child node, taking the parent's and the
    other children's hidden states as input (top-down)
    [Figure 4 of the paper: calculation procedure of the bottom-up hidden states (children → parent) and the top-down hidden states (parent + siblings → child)]

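A rough, heavily simplified sketch of the two passes (my own illustration, not the paper's Tree-LSTM; `make_cell` replaces the LSTM cell with a single tanh layer so only the information flow is visible):

```python
import numpy as np

def make_cell(dim, rng):
    w = rng.normal(scale=0.1, size=(dim, dim))
    return lambda h: np.tanh(h @ w)              # stand-in for an LSTM cell

def bottom_up(node, cell):
    """Bottom-up state of a node = cell(sum of its children's bottom-up states)."""
    if not node["children"]:                     # leaf: start from its own embedding
        node["h_up"] = node["embedding"]
    else:
        node["h_up"] = cell(sum(bottom_up(c, cell) for c in node["children"]))
    return node["h_up"]

def top_down(node, cell, h_parent):
    """Propagate downward: each child sees the parent state and its siblings' states."""
    node["h_down"] = h_parent
    for child in node["children"]:
        siblings = sum((c["h_up"] for c in node["children"] if c is not child),
                       np.zeros_like(h_parent))
        top_down(child, cell, cell(h_parent + siblings))

rng = np.random.default_rng(0)
cell = make_cell(4, rng)
leaf = lambda: {"children": [], "embedding": rng.normal(size=4)}
root = {"children": [leaf(), leaf()], "embedding": None}
bottom_up(root, cell)
top_down(root, cell, np.zeros(4))    # the root has no parent: start from a zero state
```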

  19. Architecture Search with Path-Level
    Operations
    19
    • By applying a softmax to the Tree-LSTM's outputs, the types of nodes and
    operations are selected:
    1. For a parent node that has only a single (leaf) child, select a merge operation
    • merge: add, concatenation, or none (do nothing)
    2. For each leaf node, decide whether to expand it, i.e., insert a new leaf child
    connected via an identity mapping (which increases the depth)
    3. Replace identity edges with a convolution or pooling layer chosen from the set
    of primitive operations
    [Figure 4 of the paper: "Calculation procedure of bottom-up and top-down hidden states." Figure 5: illustration of the transformation decisions — (a) a node with a single leaf child is given multiple child nodes (merge scheme and branch number are predicted); (b) a new leaf node is added as a child, connected with an identity mapping; (c) an identity mapping is replaced by an operation from the set of possible primitives]


  20. Meta-Controller Training Procedure
    20
    Algorithm 1 Path-Level Efficient Architecture Search
    Input: base network baseNet, training set trainSet, validation set valSet,
           batch size B, maximum number of networks M
    1:  trained = 0   // Number of trained networks
    2:  Pnets = []    // Store results of trained networks
    3:  randomly initialize the meta-controller C
    4:  Gc = []       // Store gradients to be applied to C
    5:  while trained < M do
    6:      meta-controller C samples a tree-structured cell
    7:      if cell in Pnets then
    8:          get the validation accuracy accv of cell from Pnets
    9:      else
    10:         model = train(trans(baseNet, cell), trainSet)
    11:         accv = eval(model, valSet)
    12:         add (cell, accv) to Pnets
    13:         trained = trained + 1
    14:     end if
    15:     compute gradients according to (cell, accv) and add to Gc
    16:     if len(Gc) == B then
    17:         update C according to Gc
    18:         Gc = []
    19:     end if
    20: end while
    Slide annotations:
    • Provide a base architecture with a repeated structure, such as DenseNet
    • Sample a cell from the Tree-LSTM
    • Replace the base architecture's cells with the sampled cell, then train and evaluate
    • Estimate the gradients and update the Tree-LSTM's parameters

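A compact Python rendering of Algorithm 1 (a sketch: `controller`, `trans`, `train`, and `evaluate` are placeholders for the Tree-LSTM meta-controller, the cell substitution, child-network training, and validation, respectively):

```python
def path_level_eas(base_net, train_set, val_set, controller, trans, train, evaluate, B, M):
    trained, p_nets, grads = 0, {}, []        # memo of evaluated cells, gradient buffer
    while trained < M:
        cell = controller.sample()            # sample a tree-structured cell
        if cell in p_nets:                    # reuse the stored validation accuracy
            acc_v = p_nets[cell]
        else:                                 # transform the base net, train, evaluate
            model = train(trans(base_net, cell), train_set)
            acc_v = evaluate(model, val_set)
            p_nets[cell] = acc_v
            trained += 1
        grads.append(controller.gradient(cell, acc_v))
        if len(grads) == B:                   # update the controller every B samples
            controller.update(grads)
            grads = []
    return p_nets
```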

  21. Meta-Controller Training Procedure
    21
    (Same content as slide 20.)

  22. Meta-Controller Training Procedure
    22
    (Same content as slide 20.)

  23. Meta-Controller Training Procedure
    23
    (Same content as slide 20.)

  24. Experimental Details
    24
    • Evaluated on CIFAR-10 and ImageNet
    • For ImageNet, the cells found on CIFAR-10 are reused
    • CIFAR-10 uses data augmentation and standardization as preprocessing
    • Mirroring and shifting, plus per-channel standardization
    Settings for the LSTM (meta-controller)
    • Hidden units: 100
    • Optimizer: Adam
    • Gradient estimation: REINFORCE
    • Batch size: 10
    • Baseline: exponential moving average
    • Entropy regularization: 0.01
    • Architecture reward: tan(0.5π × accuracy)
    Settings for the CNN (child networks)
    • Epochs (weights are inherited): 20
    • Optimizer: SGD with Nesterov momentum
    • Batch size: 64
    • Weight decay: 0.0001

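A small sketch of the reward handling described above (the tan(0.5π × accuracy) transform and, per the paper, an exponential-moving-average baseline with decay 0.95); the function names are my own:

```python
import math

def reward(acc_v):
    # Map validation accuracy through tan(acc_v * pi / 2) before feeding it to REINFORCE.
    return math.tan(acc_v * math.pi / 2)

def update_baseline(baseline, r, decay=0.95):
    # Exponential moving average of previous rewards (decay 0.95, as in Cai et al. 2018).
    return decay * baseline + (1 - decay) * r
```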

  25. Experimental Details
    25
    • DenseNet-BC is used as the base architecture
    • 16 blocks, two 3x3 conv layers per block,
    growth rate (output channels of the 3x3 conv) 16
    • Group convolution is used
    • During architecture search, the 3x3 convs inside the dense blocks are replaced
    with the sampled cell and the model is trained
    • After the search, the convolution layers of DenseNet / PyramidNet are replaced
    with the discovered cell, then trained and evaluated
    • The training settings are unchanged from those used during the search
    • The final score is the test error after training for 300 or 600 epochs


  26. Results on CIFAR-10
    26
    Table 1. Test error rate (%) results of our best discovered architectures as well as state-of-the-art human-designed and automatically
    designed architectures on CIFAR-10. If "Reg" is checked, additional regularization techniques (e.g., Shake-Shake (Gastaldi, 2017),
    DropPath (Zoph et al., 2017) and Cutout (DeVries & Taylor, 2017)), along with a longer training schedule (600 epochs or 1800 epochs)
    are utilized when training the networks.
    Model | Reg | Params | Test error
    Human designed
    ResNeXt-29 (16 x 64d) (Xie et al., 2017) | | 68.1M | 3.58
    DenseNet-BC (N = 31, k = 40) (Huang et al., 2017b) | | 25.6M | 3.46
    PyramidNet-Bottleneck (N = 18, α = 270) (Han et al., 2017) | | 27.0M | 3.48
    PyramidNet-Bottleneck (N = 30, α = 200) (Han et al., 2017) | | 26.0M | 3.31
    ResNeXt + Shake-Shake (1800 epochs) (Gastaldi, 2017) | ✓ | 26.2M | 2.86
    ResNeXt + Shake-Shake + Cutout (1800 epochs) (DeVries & Taylor, 2017) | ✓ | 26.2M | 2.56
    Auto designed
    EAS (plain CNN) (Cai et al., 2018) | | 23.4M | 4.23
    Hierarchical (c0 = 128) (Liu et al., 2018) | | - | 3.63
    Block-QNN-A (N = 4) (Zhong et al., 2017) | | - | 3.60
    NAS v3 (Zoph & Le, 2017) | | 37.4M | 3.65
    NASNet-A (6, 32) + DropPath (600 epochs) (Zoph et al., 2017) | ✓ | 3.3M | 3.41
    NASNet-A (6, 32) + DropPath + Cutout (600 epochs) (Zoph et al., 2017) | ✓ | 3.3M | 2.65
    NASNet-A (7, 96) + DropPath + Cutout (600 epochs) (Zoph et al., 2017) | ✓ | 27.6M | 2.40
    Ours
    TreeCell-B with DenseNet (N = 6, k = 48, G = 2) | | 3.2M | 3.71
    TreeCell-A with DenseNet (N = 6, k = 48, G = 2) | | 3.2M | 3.64
    TreeCell-A with DenseNet (N = 16, k = 48, G = 2) | | 13.1M | 3.35
    TreeCell-B with PyramidNet (N = 18, α = 84, G = 2) | | 5.6M | 3.40
    TreeCell-A with PyramidNet (N = 18, α = 84, G = 2) | | 5.7M | 3.14
    TreeCell-A with PyramidNet (N = 18, α = 84, G = 2) + DropPath (600 epochs) | ✓ | 5.7M | 2.99
    TreeCell-A with PyramidNet (N = 18, α = 84, G = 2) + DropPath + Cutout (600 epochs) | ✓ | 5.7M | 2.49
    TreeCell-A with PyramidNet (N = 18, α = 150, G = 2) + DropPath + Cutout (600 epochs) | ✓ | 14.3M | 2.30


  27. Results on CIFAR-10
    27
    (Same Table 1 as the previous slide.)
    With only 200 GPU-hours of search, the method achieves a lower test error (2.30%)
    than NASNet (2.40%), which required 48,000 GPU-hours.


  28. Results on CIFAR-10
    28
    • The structure of the finally discovered cell:
    [Figure 6 of the paper: "Detailed structure of the best discovered cell on CIFAR-10 (TreeCell-A). 'GroupConv' denotes the group convolution; 'Conv' denotes the normal convolution; 'Sep' denotes the depthwise-separable convolution; 'Max' denotes the max pooling; 'Avg' denotes the average pooling."]


  29. Results on CIFAR-10
    29
    • To show that the controller actually learns good architectures, the search is
    compared with random search
    [Figure 7 of the paper: "Progress of the architecture search process and comparison between RL and random search (RS) on CIFAR-10."]


  30. Results on ImageNet
    30
    • For ImageNet, the cells found on CIFAR-10 are reused
    • Top-1 and Top-5 error are compared in the Mobile setting
    (at most 600M multiply-add operations)
    • Achieves better Top-1 and Top-5 accuracy than NASNet
    Table 2. Top-1 (%) and Top-5 (%) classification error rate results on ImageNet in the Mobile setting (≤ 600M multiply-add
    operations). "×+" denotes the number of multiply-add operations.
    Model | ×+ | Top-1 | Top-5
    1.0 MobileNet-224 (Howard et al., 2017) | 569M | 29.4 | 10.5
    ShuffleNet 2x (Zhang et al., 2017) | 524M | 29.1 | 10.2
    CondenseNet (G1 = G3 = 8) (Huang et al., 2017a) | 274M | 29.0 | 10.0
    CondenseNet (G1 = G3 = 4) (Huang et al., 2017a) | 529M | 26.2 | 8.3
    NASNet-A (N = 4) (Zoph et al., 2017) | 564M | 26.0 | 8.4
    NASNet-B (N = 4) (Zoph et al., 2017) | 448M | 27.2 | 8.7
    NASNet-C (N = 3) (Zoph et al., 2017) | 558M | 27.5 | 9.0
    TreeCell-A with CondenseNet (G1 = 4, G3 = 8) | 588M | 25.5 | 8.0
    TreeCell-B with CondenseNet (G1 = 4, G3 = 8) | 594M | 25.4 | 8.1


  31. Conclusion
    31
    • Proposes architecture transformations that satisfy function preservation
    for a given layer
    • Represents architectures as trees and learns good architectures with
    reinforcement learning and a Tree-LSTM
    • By applying the learned cell to PyramidNet, architectures that outperform
    NASNet were obtained at a small computational cost (200 GPU-hours)
    • Future work: combine with model compression techniques to search for
    more compact models


  32. Impressions
    32
    • A global structure such as DenseNet or PyramidNet must be provided as the
    base architecture
    • It probably cannot generate new architectures that depart from
    DenseNet / PyramidNet
    • It feels like a "local" architecture optimization that finds good structures
    among similar ones
    • The computational cost is still high compared with ENAS
    • With this tree-based encoding scheme, architecture optimization via
    Genetic Programming also seems feasible
    • CNN architecture optimization with Cartesian GP, which places nodes on a grid,
    already has prior examples
    [Suganuma et al., 2017] [Suganuma et al., 2018]


  33. References
    33
    • [Pham et al., 2018] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean,
    “Efficient Neural Architecture Search via Parameter Sharing,” in
    Proceedings of the 35th International Conference on Machine
    Learning, pp. 4092-4101, 2018.
    • [Liu et al., 2018] H. Liu, K. Simonyan, and Y. Yang, “DARTS:
    Differentiable Architecture Search,” in preprint
    arXiv:1806.09055v1, 2018.
    • [Chen et al. 2016] T. Chen, I. Goodfellow, and J. Shlens, “Net2Net:
    Accelerating Learning via Knowledge Transfer,” in Proceedings of
    4th International Conference on Learning Representations (ICLR’16),
    2016.
    • [Zoph & Le, 2017] B. Zoph and Q. V Le, “Neural Architecture Search
    with Reinforcement Learning,” in Proceedings of 5th International
    Conference on Learning Representations (ICLR'17), 2017.
    • [Real et al., 2017] E. Real, S. Moore, A. Selle, S. Saxena, Y. L.
    Suematsu, J. Tan, Q. Le, and A. Kurakin, “Large-Scale Evolution of
    Image Classifiers,” in Proceedings of the 34th International
    Conference on Machine Learning, vol. PMLR 70, pp. 2902–2911, Mar.
    2017.


  34. References
    34
    • [Huang et al., 2017] G. Huang, Z. Liu, L. v. d. Maaten, and K. Q.
    Weinberger, “Densely Connected Convolutional Networks,” in 2017 IEEE
    Conference on Computer Vision and Pattern Recognition (CVPR), 2017,
    pp. 2261–2269.
    • [Baker et al., 2017] B. Baker, O. Gupta, N. Naik, and R. Raskar, “Designing
    Neural Network Architectures Using Reinforcement Learning,” in
    Proceedings of 5th International Conference on Learning
    Representations (ICLR’17), 2017.
    • [Cai et al. AAAI-18] H. Cai, T. Chen, W. Zhang, Y. Yu, and J. Wan,
    “Efficient Architecture Search by Network Transformation,” in Thirty-
    Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.
    • [Suganuma et al., 2017] M. Suganuma, S. Shirakawa, and T. Nagao, “A
    genetic programming approach to designing convolutional neural
    network architectures,” in Proceedings of the Genetic and Evolutionary
    Computation Conference on - GECCO ’17, 2017, pp. 497–504.
    • [Suganuma et al., 2018] M. Suganuma, M. Ozay, and T. Okatani,
    “Exploiting the Potential of Standard Convolutional Autoencoders for
    Image Restoration by Evolutionary Search,” in Proceedings of the 35th
    International Conference on Machine Learning, PMLR 80:4778-4787,
    2018.
