Slide 1

DARTS: Differentiable Architecture Search
Hanxiao Liu, Karen Simonyan, Yiming Yang (ICLR 2019)
Presenter: Daichi Horita

Slide 2

Contents
• Introduction of Neural Architecture Search
• Proposed Method
• Experiments
• Conclusion

Slide 3

Introduction of Neural Architecture Search (1/2)
What is Neural Architecture Search?
• Neural network architectures and hyper-parameters are still hard to design by hand
• Related work:
  • Hyper-parameter optimization [Saxena+, NIPS16], [Bergstra+, JMLR12]
    • Does not operate at the architecture level
  • Evolutionary algorithms [Stanley+, Artificial Life 09], [Floreano+, EI08]
    • Less practical at large scale
→ We need a general framework that constructs architectures automatically!

Slide 4

Introduction of Neural Architecture Search (2/2)
Neural Architecture Search with Reinforcement Learning [Zoph+, ICLR17]
• Proposed a reinforcement-learning-based method with an RNN controller
• Not differentiable!
• Very heavy computational costs!
[Figure: the RNN controller sampling an architecture]

Slide 5

Proposed method (1/4): Defining Differentiable NAS
• $\mathcal{O}$: a set of candidate operations; $o(x)$: the result of applying operation $o$ to $x$
• $\alpha^{(i,j)}$: the operation-mixing weights on the edge between nodes $i$ and $j$
• Continuous relaxation (mixed operation):
  $\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha^{(i,j)}_o)}{\sum_{o' \in \mathcal{O}} \exp(\alpha^{(i,j)}_{o'})}\, o(x)$
• Objective (bilevel optimization):
  $\min_{\alpha} \mathcal{L}_{val}(w^*(\alpha), \alpha)$  s.t.  $w^*(\alpha) = \operatorname{argmin}_w \mathcal{L}_{train}(w, \alpha)$
• But $w^*(\alpha)$ cannot be computed exactly: training $w$ to convergence for every $\alpha$ is intractable
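A minimal PyTorch sketch of the mixed operation above, assuming a toy candidate set of three operations (the names `OPS` and `MixedOp` are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate operation set O; the paper uses conv/pool/skip/zero ops.
OPS = {
    "skip_connect": lambda C: nn.Identity(),
    "conv_3x3": lambda C: nn.Conv2d(C, C, 3, padding=1, bias=False),
    "avg_pool_3x3": lambda C: nn.AvgPool2d(3, stride=1, padding=1),
}

class MixedOp(nn.Module):
    """Mixed operation on one edge (i, j): a softmax-weighted sum over O."""

    def __init__(self, C):
        super().__init__()
        self.ops = nn.ModuleList(op(C) for op in OPS.values())

    def forward(self, x, alpha):
        # alpha: architecture weights alpha^{(i,j)}, one scalar per operation.
        weights = F.softmax(alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Usage: one edge with three candidate ops on a 16-channel feature map.
alpha = torch.randn(len(OPS), requires_grad=True)  # alpha^{(i,j)}
edge = MixedOp(C=16)
y = edge(torch.randn(2, 16, 8, 8), alpha)          # differentiable w.r.t. alpha
```

Because the output is a smooth function of alpha, the architecture itself can be optimized by gradient descent.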

Slide 6

Proposed method (2/4): Defining Differentiable NAS
• Approximate $w^*(\alpha)$ with a single training step:
  $\nabla_\alpha \mathcal{L}_{val}(w^*(\alpha), \alpha) \approx \nabla_\alpha \mathcal{L}_{val}(w - \xi \nabla_w \mathcal{L}_{train}(w, \alpha), \alpha)$
• Applying the chain rule gives
  $\nabla_\alpha \mathcal{L}_{val}(w', \alpha) - \xi \nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha)\, \nabla_{w'} \mathcal{L}_{val}(w', \alpha)$,  where $w' = w - \xi \nabla_w \mathcal{L}_{train}(w, \alpha)$
• The second term contains a Hessian matrix, so its computation is heavy
• If the hyper-parameter $\xi = 0$, it does not need to be computed at all! (the effect on classification accuracy is discussed later)
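A sketch of this architecture gradient, assuming toy scalar losses (`arch_grad` is a hypothetical helper, not the paper's implementation); with ξ = 0 it reduces to the cheap first-order variant:

```python
import torch

def arch_grad(L_train, L_val, w, alpha, xi):
    if xi == 0:
        # First-order approximation: treat w as fixed, no Hessian term.
        return torch.autograd.grad(L_val(w, alpha), alpha)[0]
    # One-step unrolled weights w' = w - xi * grad_w L_train(w, alpha);
    # create_graph=True keeps the graph so that backprop through w'
    # produces the second-order (Hessian) term automatically.
    g_w = torch.autograd.grad(L_train(w, alpha), w, create_graph=True)[0]
    w_unrolled = w - xi * g_w
    return torch.autograd.grad(L_val(w_unrolled, alpha), alpha)[0]

# Toy usage with a quadratic loss (purely illustrative):
w = torch.randn(3, requires_grad=True)
alpha = torch.randn(3, requires_grad=True)
L = lambda w, a: ((w - a) ** 2).sum()
g_first = arch_grad(L, L, w, alpha, xi=0.0)    # first order
g_second = arch_grad(L, L, w, alpha, xi=0.01)  # second order
```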

Slide 7

Proposed method (3/4): Defining Differentiable NAS
• Approximate the Hessian-vector product by a central finite difference:
  $\nabla^2_{\alpha, w} \mathcal{L}_{train}(w, \alpha)\, \nabla_{w'} \mathcal{L}_{val}(w', \alpha) \approx \frac{\nabla_\alpha \mathcal{L}_{train}(w^+, \alpha) - \nabla_\alpha \mathcal{L}_{train}(w^-, \alpha)}{2\epsilon}$,
  where $w^\pm = w \pm \epsilon \nabla_{w'} \mathcal{L}_{val}(w', \alpha)$
• This reduces the complexity from $O(|\alpha||w|)$ to $O(|\alpha| + |w|)$
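A sketch of the finite-difference trick, again with a hypothetical helper name and toy losses; it needs only two extra gradient evaluations with respect to α:

```python
import torch

def hessian_vector_product(L_train, w, alpha, v, eps=1e-2):
    # eps is a small scalar; the paper suggests scaling it by 1 / ||v||.
    with torch.no_grad():
        w += eps * v                  # w+ = w + eps * v
    g_plus = torch.autograd.grad(L_train(w, alpha), alpha)[0]
    with torch.no_grad():
        w -= 2 * eps * v              # w- = w - eps * v
    g_minus = torch.autograd.grad(L_train(w, alpha), alpha)[0]
    with torch.no_grad():
        w += eps * v                  # restore the original w
    return (g_plus - g_minus) / (2 * eps)

# Toy usage: v plays the role of grad_{w'} L_val(w', alpha).
w = torch.randn(3, requires_grad=True)
alpha = torch.randn(3, requires_grad=True)
L = lambda w, a: ((w * a) ** 2).sum()
v = torch.randn(3)
hv = hessian_vector_product(L, w, alpha, v)
```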

Slide 8

Proposed method (4/4): Algorithm
• Alternately update the architecture parameters $\alpha$ by descending $\nabla_\alpha \mathcal{L}_{val}$ and the weights $w$ by descending $\nabla_w \mathcal{L}_{train}$
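A minimal sketch of the alternating loop, assuming the first-order variant (ξ = 0) and illustrative toy losses in place of the mini-batch losses:

```python
import torch

w = torch.randn(4, requires_grad=True)      # network weights w
alpha = torch.randn(4, requires_grad=True)  # architecture parameters alpha
opt_w = torch.optim.SGD([w], lr=0.1)
opt_alpha = torch.optim.Adam([alpha], lr=3e-4)

# Toy stand-ins for the mini-batch losses L_train and L_val.
L_train = lambda w, a: ((w - a) ** 2).sum()
L_val = lambda w, a: ((w + a) ** 2).sum()

for step in range(100):
    # 1. Update alpha by descending grad_alpha L_val (validation data).
    opt_alpha.zero_grad()
    L_val(w.detach(), alpha).backward()
    opt_alpha.step()
    # 2. Update w by descending grad_w L_train (training data).
    opt_w.zero_grad()
    L_train(w, alpha.detach()).backward()
    opt_w.step()
```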

Slide 9

Experiments (1/2): CIFAR-10 classification
• "First order" means the hyper-parameter $\xi = 0$

Slide 10

Experiments (2/2): ImageNet classification
• Fastest search time among the compared methods!

Slide 11

Conclusion
• DARTS achieves differentiable NAS.
• It achieves the fastest search and training time.
• My implementation is available at https://github.com/UdonDa/DARTS_pytorch (CIFAR-10 only).