

Yukio Okuda
January 31, 2018

Basic behaviors of TensorFlow, Eager and Session.run

A study of the basic functionality of TensorFlow (TF) for general numerical-analysis applications, as opposed to DML applications. Applying model-based data analysis reveals TF's basic behaviors and the effectiveness of multiple cores in Eager mode. This suggests that Google's and Intel's work on Phi and AVX2 may succeed.

Transcript

  1. Fortran, PL/I, RPG 360/IBM; Bliss, C, X11 VAX/DEC; (ADA, Objective-C) C++, Octave, Tcl Sparc/Sun; CERN-Root PC/Intel; JavaScript, Python /Nvidia?; DL (NN/AI)

    TFUG-Tokyo#7-1 2018/1/29 Y. Okuda
  2. ❶ DL ❷ Python
    ▪ Python GIL ➡
      • CPU ➡ GHz ➡
      • GPU ➡ GP-GPU ➡ CUDA
      • Spark✈, TPU, Phi✈, Nervana✈, Tegra✈, ARM, QUALCOMM
    ▪ Python ➡
      • PyCUDA, OpenCL, PySpark
      • Pandas ➡ Dask, Intel-Python
      • List: Numba
      • NumPy: TensorFlow (TF), PyTorch, CuPy
    ▼ TF ➡
  3. SG: Static Graph, executed with session.run
    EG: Eager, not Graph
  4. H/W, S/W
    ▪ S/W: env-1, env-2, ... env-n; Python 3.5, Conda 4.3.30, Mint Linux (Ubuntu 16.04); CPU + GPU; SSH, NFS
    ▪ CPU: i7-2630QM (Sandy Bridge '12), 2.4 GHz, 4 cores / 8 threads, L1=256K, L2=1M, L3=6M; PCIe II 5GT/s; DDR3 16G, 21.3G/s; QM77, NF9G (Jetway)
    ▪ GPU: GTX-1060 (Pascal GP-106), 1.5 GHz, 1280 cores, L2=1.5M (192-bit I/F); PCIe II 5GT/s; GDDR5 6G, 8G/s; CUDA-8, CC-6.1
  5. Env
        $ conda create -n XXX python=3.5
        $ source activate XXX
        $ pip install YYYYY.whl
    Environments (CPU and GPU):
      1.1 TensorFlow-SG 1.4
      1.2 TensorFlow-SG 1.4
      1.3 TensorFlow-EG 1.5-dev20171127
      1.4 TensorFlow-EG 1.5-dev20171127
      1.5 TensorFlow-EG/SG 1.5 AVX, Python-2.7
      2.1 PyTorch 0.2.0 4
      3.1 CuPy 2.2.0
      4.1 Numba 0.36.1
      5.1 Intel Python
  6. ▪ Monte Carlo estimate of π: draw shot points (x, y) uniformly in [0, 1), count a hit when r ≤ 1, then π = 4 · hit/shot
    ▪ NumPy: vectorized (SIMD) operations
    ▪ np. ➡ tf., torch., cp. (CuPy)
        import numpy as np

        def get(size, shot, hit):
            x = np.random.rand(size)
            y = np.random.rand(size)
            r = np.sqrt(np.add(np.multiply(x, x), np.multiply(y, y)))
            one = np.array([1.] * size)
            lss = np.less_equal(r, one)        # True where the point is inside the quarter circle
            hit_one = np.count_nonzero(lss)
            hit += hit_one
            shot += size
            return shot, hit
  7. Measurement loops:
        for size in [10**2, ..., 10**6]:
            shot, hit = 0, 0
            shot, hit = get(size, shot, hit)
            pi = 4 * hit / shot
            rec(pi, shot, size)
    [Figure: elapsed time (s) and absolute error (log scale) vs. Size, NumPy (NP)]
        for size in [10**2, ..., 10**6]:
            for _ in range(1000):
                shot, hit = 0, 0
                shot, hit = get(size, shot, hit)
                pi = 4 * hit / shot
                rec(pi, shot, size)
    [Figure: processing time (s) vs. Size, NP and TF]
  8. TF: CPU 4 6 !! CPU GPU 24 76 While
    [Figure: processing time (s); series Np, EgCpu, SgCpu, SgGpu, SgWhileGpu]
  9. NumPy ▪ np. ➡ tf., torch.
    NumPy: np.xxx()
        x = np.random.rand(size).astype(np.float32)
        y = np.random.rand(size).astype(np.float32)
        rs = np.sqrt(np.add(np.multiply(x, x), np.multiply(y, y)))
        ones = np.array([1.] * size, dtype=np.float32)
        lss = np.less_equal(rs, ones)
        hit_one = np.count_nonzero(lss)
    ➡ 1
    TF-Eg: tf.xxx()
        x = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
        y = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
        rs = tf.sqrt(tf.add(tf.multiply(x, x), tf.multiply(y, y)))
        ones = tf.ones([size], dtype=tf.float32)
        lss = tf.less_equal(rs, ones)
        hit_one = tf.count_nonzero(lss, dtype=tf.int64)
    ➡ 8!
    PyTorch: torch.xxx()
    ➡ 3+
    EG
  10. Np, EG, Pt (CPU)
    ▪ PyTorch (Pt)
    ▪ EG
    [Figure: processing time (s); series Np, PtCpu, EgCpu]
    ▪ EG
      • 0.5M
      • float32 = 4 bytes
      • 2
      • 4 × 3 × 0.5M = 6M = L3
    ▪ AVX
    [Figure: processing time (s); series EgCpu, EgAvx]
    EG
  11. CPU
    ▪ N3150 4.3 (Braswell '15, Celeron/Atom): 1.6-2 GHz, 4 cores, L1=224K, L2=2M (1M×2)
    [Figure: processing time (s); series Np, EgCpu]
    ▪ AVX 4.2 3.7 4.6 2: 1.6-2.4 GHz, 4 cores / 8 threads, L1=256K, L2=1M, L3=6M
    [Figure: processing time (s); series Np, EgCpu, EgAvx]
    EG
  12. EG, SG
    Eager:
        def gen(a, loop):
            for _ in range(loop):
                a = tf.add(a, 1)
            return a

        a = tf.zeros(··)
        a = gen(a, 3)
    Static Graph:
        N = 3
        def gen(a):
            for _ in range(N):
                a = tf.add(a, 1)
            return a

        ap = tf.placeholder(··)
        ag = gen(ap)
        sess = tf.Session()
        sess.run(tf.··initializer())
        a = np.zeros(··)
        a = sess.run(ag, feed_dict={ap: a})
    tf.Tensor, tf.Variable, tf.TensorArray ▼ tf.Variable ▼ tfe.Variable, tf.xxx
  13. EG, SG
        for _ in range(N):
            a = tf.add(a, 1)
    [Figure: EG: CPU, a, +, +, a1, with the data path DDR ➡ M-Con. ➡ CPU ➡ PCIe ➡ GPU-M-Bus ➡ DDR. SG: CPU, a, +1, +2, ..., +N producing a1, a2, ..., aN; N]
  14. EG, SG on GPU
    ▪ EG, GPU
      • =9, (8 + 8) =..
    [Figure: processing time (s); series EgCpu, EgGpu]
    ▪ SG, GPU 3.8
      • = , 0, (2, 2)
    [Figure: processing time (s); series SgCpu, SgGpu]
    EG
  15. SG2
    ▪ Run
        ap = tf.placeholder(...)
        aa = tf.add(ap, 1)
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = np.zeros([10])
        for _ in range(N):
            a = sess.run(aa, feed_dict={ap: a})
    [Figure: CPU, a, ap, +, +, aa]
    ▪ While loop
        ap = tf.placeholder(...)
        i, a = tf.while_loop(
            lambda i, a: tf.less(i, N),
            lambda i, a: (tf.add(i, 1), tf.add(a, 1)),
            [0, ap])
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = sess.run(a, feed_dict={ap: ..})
    [Figure: CPU, a0, ap, + WL, + WL, a, i, a]
  16. SG While loop
    ▪ GPU 3.2
    [Figure: processing time (s); series SgGpu, SgWhileGpu]
    ▪ CPU 1.
    [Figure: processing time (s); series SgCpu, SgWhileCpu]
    SG
  17. EG While loop
    ▪ SG (SG)
      • GPU 0.9 1.3
      [Figure: processing time (s); series EgGpu, EgWhileGpu]
      • CPU 0.9 1.2
      [Figure: processing time (s); series EgCpu, EgWhileCpu]
    SG
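  18. (1/3)
    ▪ EG: $t_{eg} \approx t_{sgw} + N(t_{ed} + t_{eu})$
      • tf.xxx(..), EG/SG
      • EG, N steps: $t_{eg} = N(t_{ec} + t_{ed} + t_{eu}) = N t_{ec} + N(t_{ed} + t_{eu})$
        [Figure: CPU, a, ap, +, +, aa, with $t_{ed}$, $t_{ec}$, $t_{eu}$]
      • SG while_loop: $t_{sgw} = t_{wc} + t_{wd} + t_{wu} \approx t_{wc}$, where $t_{wc} = N t_{ec}$
        [Figure: CPU, a0, ap, + WL, + WL, a, i, a, with $t_{wd}$, $t_{wc}$, $t_{wu}$]
      • Substituting $N t_{ec} = t_{wc} \approx t_{sgw}$ into the EG expression gives the first line.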
  19. (2/3)
    ▪ 1000 calls: $t_{DU} = 1000(t_{ed} + t_{eu})$, CPU
        def func(ain):
            return tf.add(ain, 1.)

        a = tf.placeholder(...)
        ans_get = func(a)
        sess = tf.Session()
        for _ in range(1000):
            buf = np.empty(size)   # np.empty allocates without initializing
            ans = sess.run(ans_get, feed_dict={a: buf})
    ▪ ➡
    [Figure: processing time (s), GPU average; $t_{ed}$, $t_{eu}$]
    [Figure: processing time (s), CPU average; $t_{ed}$, $t_{eu}$]
  20. (3/3)
    ▪ $t_{eg} = t_{sgw} + 8\,t_{DU}$ ➡ GPU, AVX ➡
      • EG, SG
    [Figure: estimated processing time (s), GPU; series $t_{eg}$, $t_{sgw}$]
    [Figure: estimated processing time (s), CPU; series $t_{eg,avx}$, $t_{sgw}$]
  21. EG
    ▪ GPU (AVX) ➡ Phi/Intel (Wiki)
    [Figure: processing time (s); series Eg on GPU, CPU, AVX]
    ☞ GPU ☞ CPU ☞ N, NumPy, N
    ▪ Xeon: 4-22 cores (L3 = 20-60MB) (Wiki)
    ▪ Atom: 2-16 cores (L3 = 2-16MB) (Wiki)
    ▪ ARM/DynamIQ: 1-8 cores
    GPU ▼ + ➡ GPU ▼ / + ➡ GPU
    ▪ Google+Intel for Phi✈, Xeon (22 cores), AVX2
    ▪ PyTorch, Chainer
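  22. SG
    ▪ feed_dict={x: xs, y: ys... }
    ▪ GPU 25.8 ➡ 1.2, AVX 1.6
    [Figure: processing time (s), random-number transfer; series Np, SgGpu, EgAvx, Sg without transfer]
    [Figure: processing time (s), random-number transfer components; series Np, Sg random-number transfer, Sg random-number transfer $t_{DU}$, random-number generation, Sg]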
  23. (1/2)
    ▪ n1 = fc1(a) ➡ n2 = fc2(a)
    ▪ SIMD: n = fc(a) ➡ n[0:10] = fc(a[0:10]), n[10:20] = fc(a[10:20])
    [Figure: a ➡ fc1, fc2 ➡ n1, n2]
    ▪ a[i] = fu(m)
    ▪ tf.while_loop, tf.cond, Eager, Session.run, tf.Variable, tfe.Variable, tf.TensorArray, tf.assign_xxx(), tf.scatter_xxx(), tf.control_dependencies, · ·
    TF
  24. (2/2)
    TF: H/W, OS, .. ➡ , ▼ ▼ DL
    [Figure: Help: HW, OS; Linux, Windows, Esxi, Android, iOS, Web; H/W: CUDA, Phi/Intel, AVX, Intel-Python, GFX(AMD), ?/ARM; TF: Eager, Run-SIMD, I/O; DL; Model Based Data Analysis]
  25. ▪ TF
      • EG, SG ▼ GPU ▼ tf.xxx() ▼ CPU (EG)
      • EG ▼ (Lib) ▼ Python ▼ EG SG
      • SG ▼ ❙ ❙ ❙ ▼ ▼
    ▪ Variable, DL ..
  26. EG for ...
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import tensorflow.contrib.eager as tfe
        tfe.enable_eager_execution()

        def gen(a, loop):
            for _ in range(loop):
                a = tf.add(a, 1)   # Eager: each tf.add executes immediately
            return a

        a = tf.zeros([10])
        a = gen(a, 3)
        print(a)
    SG for ...
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import numpy as np

        N = 3
        def gen(a):
            for _ in range(N):
                a = tf.add(a, 1)   # Graph: only builds N add ops
            return a

        ap = tf.placeholder(tf.int32, None, name='a')
        ag = gen(ap)
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = np.zeros([10])
        a = sess.run(ag, feed_dict={ap: a})   # the N adds run here
        print(a)
    Run
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import numpy as np

        ap = tf.placeholder(tf.int32, None, name='ap')
        aa = tf.add(ap, 1)
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = np.zeros([10])
        for _ in range(3):
            a = sess.run(aa, feed_dict={ap: a})   # a is fed and fetched every step
        print(a)
    While loop
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import numpy as np

        ap = tf.placeholder(tf.int32, None, name='a')
        i, a = tf.while_loop(
            lambda i, a: tf.less(i, 3),
            lambda i, a: (tf.add(i, 1), tf.add(a, 1)),
            [0, ap])   # the loop runs inside the graph: one feed, one fetch
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = sess.run(a, feed_dict={ap: np.zeros([10])})
        print(a)
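Item 6's get() counts points that fall inside the unit quarter circle. Below is a minimal pure-Python sketch of the same estimator, so the logic can be read without NumPy. It assumes Python's standard random module in place of np.random, and the optional rng parameter is an addition for reproducibility. It is not the author's code and loses all of NumPy's SIMD speed.

```python
import random


def get(size, shot, hit, rng=random):
    """Pure-Python analogue of the NumPy get() in item 6.

    Draws `size` points uniformly in [0, 1) x [0, 1), counts those with
    x*x + y*y <= 1 (inside the unit quarter circle), and returns the
    updated running totals (shot, hit).
    """
    hit_one = 0
    for _ in range(size):
        x = rng.random()
        y = rng.random()
        # Comparing r^2 with 1 avoids the sqrt; mathematically it is the same test.
        if x * x + y * y <= 1.0:
            hit_one += 1
    return shot + size, hit + hit_one


rng = random.Random(0)  # seeded so runs are reproducible
shot, hit = get(100_000, 0, 0, rng)
print(shot, 4 * hit / shot)  # sample count and the pi estimate
```

Keeping (shot, hit) as running totals mirrors the slide's signature, so repeated calls keep refining one estimate.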
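Item 7 plots the absolute error of the π estimate on a log scale against the sample size. If sampling noise dominates, standard binomial statistics give a reference curve. Each point is a hit with probability p = π/4, so the estimate's standard error is 4·sqrt(p(1 − p)/n), roughly 1.64/sqrt(n). This is a textbook result added for orientation, not the author's analysis. The measured curve may also reflect float32 rounding (item 9).

```python
import math


def pi_standard_error(n):
    """Standard error of pi_hat = 4 * hit / n for n independent points.

    Each point is a hit with probability p = pi / 4, so hit ~ Binomial(n, p)
    and Var(pi_hat) = 16 * p * (1 - p) / n.
    """
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / n)


for n in (10**2, 10**4, 10**6):
    print(n, pi_standard_error(n))  # falls by 10x for every 100x more points
```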
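Items 6 and 9 note that the same hit-count code works after swapping the np. prefix for tf., torch. or cp. One way to exploit this is to pass the array module in as a parameter. The sketch below assumes a hypothetical ListBackend class built on plain Python lists, so it runs anywhere. A real backend would need a thin adapter, for example mapping rand to np.random.rand or tf.random_uniform. Those adapters are not shown and are assumptions.

```python
import math
import random


def count_hits(xp, size):
    """Backend-agnostic hit count: `xp` must provide rand, sqrt, add,
    multiply, ones, less_equal and count_nonzero (hypothetical interface)."""
    x = xp.rand(size)
    y = xp.rand(size)
    rs = xp.sqrt(xp.add(xp.multiply(x, x), xp.multiply(y, y)))
    ones = xp.ones(size)
    lss = xp.less_equal(rs, ones)
    return xp.count_nonzero(lss)


class ListBackend:
    """Hypothetical pure-Python stand-in for np/tf/torch/cp, using lists."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def rand(self, n):
        return [self._rng.random() for _ in range(n)]

    def sqrt(self, a):
        return [math.sqrt(v) for v in a]

    def add(self, a, b):
        return [u + v for u, v in zip(a, b)]

    def multiply(self, a, b):
        return [u * v for u, v in zip(a, b)]

    def ones(self, n):
        return [1.0] * n

    def less_equal(self, a, b):
        return [u <= v for u, v in zip(a, b)]

    def count_nonzero(self, a):
        return sum(1 for v in a if v)


hits = count_hits(ListBackend(seed=0), 10_000)
print(4 * hits / 10_000)  # pi estimate from the list backend
```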
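Item 10 checks when the EG working set fills the L3 cache: 4 bytes (float32) × 3 arrays × 0.5M elements = 6M. The arithmetic can be sketched as below. It assumes, as the slide's "× 3" suggests, that three equal-length float32 arrays are live at once. The slide does not say which three. The slide uses decimal M, while a 6 MiB L3 holds slightly more elements.

```python
def working_set_bytes(n_elements, n_arrays=3, bytes_per_element=4):
    """Bytes touched when n_arrays float32 arrays of n_elements are live."""
    return n_elements * n_arrays * bytes_per_element


def max_elements_in_cache(cache_bytes, n_arrays=3, bytes_per_element=4):
    """Largest per-array element count whose working set still fits in the cache."""
    return cache_bytes // (n_arrays * bytes_per_element)


print(working_set_bytes(500_000))        # 6000000, the slide's "6M"
print(max_elements_in_cache(6 * 2**20))  # 524288 for a binary 6 MiB L3
```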
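Item 12 contrasts Eager execution, where gen(a, 3) computes at once, with a static graph, where gen(ap) only records operations until sess.run feeds a value. The toy below models that difference in plain Python. It is an illustration only: Node, Placeholder, add and run are hypothetical names and do not reflect how TensorFlow is implemented.

```python
class Node:
    """Hypothetical deferred op: stores a function and its inputs; computes nothing yet."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


class Placeholder(Node):
    """Hypothetical stand-in for tf.placeholder: a value supplied at run time."""

    def __init__(self, name):
        super().__init__(None)
        self.name = name


def add(a, b):
    """Computes immediately for plain numbers (eager); records a Node otherwise."""
    if isinstance(a, Node) or isinstance(b, Node):
        return Node(lambda x, y: x + y, a, b)
    return a + b


def run(node, feed):
    """Evaluates a recorded graph, loosely like sess.run(node, feed_dict=feed)."""
    if isinstance(node, Placeholder):
        return feed[node]
    if isinstance(node, Node):
        return node.fn(*(run(i, feed) for i in node.inputs))
    return node  # a plain constant


def gen(a, loop=3):
    for _ in range(loop):
        a = add(a, 1)
    return a


eager = gen(0)             # "Eager": each add runs at once
ap = Placeholder("a")
ag = gen(ap)               # "Static graph": only three Nodes are recorded
static = run(ag, {ap: 0})  # values flow only when the graph is run
print(eager, static)       # 3 3
```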
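Item 15 compares calling sess.run once per step (Run) with moving the loop into the graph via tf.while_loop. Both give the same values. The difference is how often data crosses between host and device. The sketch below counts those round trips with a hypothetical Device class. It assumes each execute call costs one upload and one download, which is a simplification of the CPU ➡ PCIe ➡ GPU path in item 13.

```python
class Device:
    """Hypothetical accelerator that counts host<->device round trips."""

    def __init__(self):
        self.round_trips = 0

    def execute(self, program, value):
        self.round_trips += 1  # model: one upload and one download per call
        return program(value)


def add_one(v):
    return [x + 1 for x in v]


def run_per_step(dev, a, n):
    """Like the Run pattern: one sess.run(aa, feed_dict={ap: a}) per step."""
    for _ in range(n):
        a = dev.execute(add_one, a)
    return a


def run_while_loop(dev, a, n):
    """Like tf.while_loop: the whole loop is one program, sent once."""
    def loop(v):
        for _ in range(n):
            v = add_one(v)
        return v
    return dev.execute(loop, a)


d1, d2 = Device(), Device()
print(run_per_step(d1, [0] * 10, 3), d1.round_trips)    # all 3s after 3 round trips
print(run_while_loop(d2, [0] * 10, 3), d2.round_trips)  # same values after 1 round trip
```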
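Item 18's timing model can be checked numerically. Eager pays compute plus the per-step transfer terms $t_{ed}$ and $t_{eu}$ on every one of the N steps. The SG while_loop pays $t_{wc} = N t_{ec}$ plus one transfer each way. The per-step costs below are hypothetical values in arbitrary units, chosen only to show that the gap equals $N(t_{ed} + t_{eu})$.

```python
def t_eager(n, t_ec, t_ed, t_eu):
    """EG: each of the n steps pays compute (t_ec) plus the transfer terms t_ed, t_eu."""
    return n * (t_ec + t_ed + t_eu)


def t_sg_while(n, t_ec, t_wd=0, t_wu=0):
    """SG while_loop: t_wc = n * t_ec of in-graph compute, plus one transfer each way."""
    t_wc = n * t_ec
    return t_wc + t_wd + t_wu


# Hypothetical per-step costs in arbitrary time units.
n, t_ec, t_ed, t_eu = 10, 2, 3, 5
print(t_eager(n, t_ec, t_ed, t_eu))                        # 100
print(t_sg_while(n, t_ec))                                 # 20
print(t_eager(n, t_ec, t_ed, t_eu) - t_sg_while(n, t_ec))  # 80 = n * (t_ed + t_eu)
```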
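Item 19 estimates $t_{DU}$ by timing 1000 sess.run calls on a trivial graph (add 1). The assumption is that compute is negligible, so the time is mostly feed and fetch overhead. The averaging step can be sketched generically as below. average_call_time is a hypothetical helper, and bytearray allocation is only a stand-in workload. In the slide's setting the callable would wrap sess.run(ans_get, feed_dict={a: buf}).

```python
import time


def average_call_time(fn, repeats=1000):
    """Mean wall-clock seconds per call of fn() over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Stand-in workload: allocate a 100 kB buffer per call.
avg = average_call_time(lambda: bytearray(100_000), repeats=1000)
t_du = 1000 * avg  # total over 1000 calls, analogous to t_DU = 1000(t_ed + t_eu)
print(avg, t_du)
```

Python's timeit module offers a similar measurement. A hand-written loop keeps the 1000-call structure of the slide visible.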
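Item 23 shows two ways to split work. The first runs independent functions on the same input (n1 = fc1(a), n2 = fc2(a)). The second is SIMD-style, splitting one function's input (n[0:10] = fc(a[0:10]), n[10:20] = fc(a[10:20])). The sketch below shows both decompositions with the standard concurrent.futures module. fc1, fc2 and fc are hypothetical placeholder functions. Item 2 mentions the Python GIL: under CPython, threads running pure-Python code like this do not speed up. Real gains need processes or native vectorized kernels such as those in NumPy or TF.

```python
from concurrent.futures import ThreadPoolExecutor


def fc1(a):
    return [x + 1 for x in a]      # hypothetical function 1


def fc2(a):
    return [x * 2 for x in a]      # hypothetical function 2


def fc(chunk):
    return [x * x for x in chunk]  # hypothetical element-wise kernel


def independent_functions(a):
    """n1 = fc1(a) and n2 = fc2(a) do not depend on each other, so submit both."""
    with ThreadPoolExecutor(max_workers=2) as ex:
        f1 = ex.submit(fc1, a)
        f2 = ex.submit(fc2, a)
        return f1.result(), f2.result()


def split_data(a, chunk=10):
    """n[0:10] = fc(a[0:10]), n[10:20] = fc(a[10:20]), ...: one function, split input."""
    chunks = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor() as ex:
        parts = list(ex.map(fc, chunks))  # map keeps the chunk order
    return [x for part in parts for x in part]


a = list(range(20))
n1, n2 = independent_functions(a)
n = split_data(a)
print(n1[:3], n2[:3], n[:3])
```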