Basic behaviors of TensorFlow, Eager and Session.run

A study of the basic functionality of TensorFlow (TF) for general numerical analysis applications rather than DML applications. Applying model-based data analysis reveals the basic behaviors of TF and the effectiveness of multiple cores in Eager mode. This suggests that Google's and Intel's work on Phi and AVX2 could succeed.

Yukio Okuda

January 31, 2018

Transcript

  1. tfug.skiyuki@gmail.com, TFUG-Tokyo#7-1, 2018/1/29, Y. Okuda

  2. Fortran, PL/I, RPG: 360/IBM
    Bliss, C, X11: VAX/DEC (ADA, Objective-C)
    C++, Octave, Tcl: Sparc/Sun
    CERN-Root: PC/Intel
    JavaScript, Python
    /Nvidia? DL (NN/AI)
  3. ❶ DL ❷ Python
    ▪ Python GIL ➡
      • CPU ➡ GHz ➡
      • GPU ➡ GP-GPU ➡ CUDA
      • Spark✈, TPU, Phi✈, Nervana✈, Tegra✈, ARM, QUALCOMM
    ▪ Python: ➡
    ▪
      • PyCUDA, OpenCL, PySpark
      • Pandas ➡ Dask, Intel-Python
      • List: Numba
      • NumPy: TensorFlow (TF), PyTorch, CuPy
    ▼ TF: ➡
  4. ❶ ❷ ❸ ❹ ❺
    SG: Static Graph, executed via Session.run
    EG: Eager, not Graph
  5. H/W, S/W
    env-1, env-2, ..., env-n: Python 3.5, Conda 4.3.30, Mint Linux (Ubuntu 16.04); CPU + GPU; SSH, NFS
    CPU: i7-2630QM (Sandy Bridge '12), 2.4 GHz, 4 cores / 8 threads, L1 = 256K, L2 = 1M, L3 = 6M; PCIe II 5 GT/s; DDR3 16G, 21.3 GB/s; QM77, NF9G (Jetway)
    GPU: GTX-1060 (Pascal GP-106), 1.5 GHz, 1280 cores, L2 = 1.5M (192-bit I/F); PCIe II 5 GT/s; GDDR5 6G, 8G/s; CUDA 8, CC 6.1
  6. Env
        $ conda create -n XXX python=3.5
        $ source activate XXX
        $ pip install YYYYY.whl
    Environments (table columns: CPU, GPU):
      1.1 TensorFlow-SG 1.4
      1.2 TensorFlow-SG 1.4
      1.3 TensorFlow-EG 1.5-dev20171127
      1.4 TensorFlow-EG 1.5-dev20171127
      1.5 TensorFlow-EG/SG 1.5 AVX, Python-2.7
      2.1 PyTorch 0.2.0 4
      3.1 CuPy 2.2.0
      4.1 Numba 0.36.1
      5.1 Intel Python
  7. ▪ Points (x, y) are drawn uniformly from [0, 1); shot counts all points, hit counts those with r ≤ 1.
    ▪ NumPy • SIMD
    ▪ np. ➡ tf., torch., cp. (CuPy)
    π = 4 · hit / shot
        import numpy as np

        def get(size, shot, hit):
            x = np.random.rand(size)
            y = np.random.rand(size)
            r = np.sqrt(np.add(np.multiply(x, x), np.multiply(y, y)))
            one = np.array([1.] * size)
            lss = np.less_equal(r, one)      # True where the point lies inside the quarter circle
            hit_one = np.count_nonzero(lss)
            hit += hit_one
            shot += size
            return shot, hit
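The hit/shot estimator can also be sketched in pure Python for readers without NumPy. This is an illustrative port, not the author's benchmark code. The name `get_pure`, the extra `rng` argument and the fixed seed are assumptions added for reproducibility. The version loops element by element, which is exactly the scalar work that the vectorized `np.less_equal` / `np.count_nonzero` calls avoid.

```python
import math
import random

def get_pure(size, shot, hit, rng):
    """Hypothetical pure-Python version of get(): draw `size` points in [0, 1)^2."""
    for _ in range(size):
        x = rng.random()          # uniform in [0, 1)
        y = rng.random()
        r = math.sqrt(x * x + y * y)
        if r <= 1.0:              # inside the unit quarter circle
            hit += 1
    shot += size
    return shot, hit

rng = random.Random(2018)         # fixed seed (assumption) so runs are repeatable
shot, hit = get_pure(100_000, 0, 0, rng)
pi = 4 * hit / shot
print(f"shot={shot} hit={hit} pi~{pi:.4f}")
```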
  8.
        for size in [10**2, ..., 10**6]:
            shot, hit = 0, 0
            shot, hit = get(size, shot, hit)
            pi = 4 * hit / shot
            rec(pi, shot, size)
    [Chart: elapsed time (s) and absolute error (log scale) vs. Size; series: NP time, NP error]
        for size in [10**2, ..., 10**6]:
            for _ in range(1000):
                shot, hit = 0, 0
                shot, hit = get(size, shot, hit)
                pi = 4 * hit / shot
                rec(pi, shot, size)
    [Chart: processing time (s) vs. Size; series: NP, TF]
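The slide does not show `rec`, so the driver loop cannot run as printed. The sketch below fills it in with a hypothetical recorder that stores each estimate and its absolute error against `math.pi`. It also uses a hypothetical pure-Python `get` with an explicit random generator. The largest size is cut to 10**5 so the example finishes quickly. For a Monte Carlo estimate like this, the absolute error is expected to shrink roughly as 1/sqrt(shot).

```python
import math
import random

records = []  # hypothetical storage for rec(); not shown in the slides

def rec(pi, shot, size):
    """Hypothetical recorder: keep the estimate and its absolute error."""
    records.append((size, shot, pi, abs(pi - math.pi)))

def get(size, shot, hit, rng):
    """Hypothetical pure-Python get(); the squared test avoids the sqrt."""
    for _ in range(size):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hit += 1
    return shot + size, hit

rng = random.Random(0)
for size in [10**2, 10**3, 10**4, 10**5]:  # smaller top size than the slide
    shot, hit = 0, 0
    shot, hit = get(size, shot, hit, rng)
    pi = 4 * hit / shot
    rec(pi, shot, size)

for size, shot, pi, err in records:
    print(f"size={size:>6}  pi~{pi:.5f}  abs err={err:.5f}")
```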
  9. TF: CPU 4 6 !!, CPU GPU 24 76, While
    [Chart: processing time (s); series: Np, EgCpu, SgCpu, SgGpu, SgWhileGpu]
  10. NumPy
    ▪ np. ➡ tf., torch.
    NumPy: np.xxx()
        x = np.random.rand(size).astype(np.float32)
        y = np.random.rand(size).astype(np.float32)
        rs = np.sqrt(np.add(np.multiply(x, x), np.multiply(y, y)))
        ones = np.array([1.] * size, dtype=np.float32)
        lss = np.less_equal(rs, ones)
        hit_one = np.count_nonzero(lss)
    1
    TF-EG: tf.xxx()
        x = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
        y = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
        rs = tf.sqrt(tf.add(tf.multiply(x, x), tf.multiply(y, y)))
        ones = tf.ones([size], dtype=tf.float32)
        lss = tf.less_equal(rs, ones)
        hit_one = tf.count_nonzero(lss, dtype=tf.int64)
    8!
    PyTorch: torch.xxx()
    3+ EG
  11. Np, EG, Pt, CPU
    ▪ PyTorch (Pt)
    ▪ EG
    [Chart: processing time (s); series: Np, PtCpu, EgCpu]
    ▪ EG
      • 0.5M
      • float32 = 4 bytes
      • 2
      • 4 × 3 × 0.5M = 6M = L3
    ▪ AVX
    [Chart: processing time (s); series: EgCpu, EgAvx]
    EG
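The bullet 4 × 3 × 0.5M = 6M = L3 is a working-set estimate. It takes 0.5M float32 elements at 4 bytes each and multiplies by 3 live arrays, which gives 6 MB, the size of this i7's L3 cache. That suggests arrays of this size are near the point where they stop fitting in L3.

The sketch below redoes the arithmetic. Two readings in it are assumptions, not statements from the slides:
- the factor 3 is taken to mean three simultaneously live arrays;
- the "6M" L3 is taken as 6 MiB.

```python
FLOAT32_BYTES = 4
elements = 500_000           # "0.5M" elements per array
live_arrays = 3              # assumption: three arrays resident at once
l3_bytes = 6 * 1024 * 1024   # i7-2630QM L3 = 6M, read here as 6 MiB

working_set = FLOAT32_BYTES * live_arrays * elements
print(f"working set = {working_set:,} bytes ({working_set / 1e6:.1f} MB)")
print(f"fits in L3: {working_set <= l3_bytes}")
```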
  12. CPU
    ▪ N3150: 4.3 (Braswell '15, Celeron/Atom), 1.6-2 GHz, 4 cores, L1 = 224K, L2 = 2M (1M × 2)
    [Chart: processing time (s); series: Np, EgCpu]
    AVX: 4.2, 3.7, 4.6
    2: 1.6-2.4 GHz, 4 cores / 8 threads, L1 = 256K, L2 = 1M, L3 = 6M
    [Chart: processing time (s); series: Np, EgCpu, EgAvx]
    EG
  13. EG, SG
    Eager:
        def gen(a, loop):
            for _ in range(loop):
                a = tf.add(a, 1)
            return a

        a = tf.zeros(··)
        a = gen(a, 3)
    Static Graph:
        N = 3

        def gen(a):
            for _ in range(N):
                a = tf.add(a, 1)
            return a

        ap = tf.placeholder(··)
        ag = gen(ap)
        sess = tf.Session()
        sess.run(tf.··initializer())
        a = np.zeros(··)
        a = sess.run(ag, feed_dict={ap: a})
    tf.Tensor, tf.Variable, tf.TensorArray ▼ tf.Variable ▼ tfe.Variable, tf.xxx
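The difference between the two columns can be mimicked without TensorFlow. The toy `Graph` class and `eager_add` below are hypothetical stand-ins, not TF APIs.
- In eager style, each add runs as soon as it is called.
- In graph style, `gen` only records ops, and a single `run` call executes them all, as `sess.run(ag, feed_dict=...)` does.

```python
class Graph:
    """Toy deferred-execution graph (illustration only, not TensorFlow's API)."""

    def __init__(self):
        self.ops = []

    def add(self, value):
        self.ops.append(value)      # record the op; nothing is computed yet
        return self

    def run(self, feed):
        a = feed
        for v in self.ops:          # execute every recorded op in one call
            a = [x + v for x in a]
        return a


def eager_add(a, value):
    return [x + value for x in a]   # computed immediately


# Eager: each call computes right away.
a = [0] * 10
for _ in range(3):
    a = eager_add(a, 1)

# Static graph: build first (like gen(ap)), then run once (like sess.run).
g = Graph()
for _ in range(3):
    g.add(1)
b = g.run([0] * 10)
print(a, b)
```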
  14. EG, SG
        for _ in range(N):
            a = tf.add(a, 1)
    [Diagram, EG: CPU, a ➡ + ➡ a, result a1; data path DDR ➡ M-Con. ➡ CPU ➡ PCIe ➡ GPU-M-Bus ➡ DDR]
    [Diagram, SG: CPU, a ➡ +1 ➡ +2 ➡ ... ➡ +N, results a1, a2, ..., aN]
  15. EG, SG, GPU
    ▪ EG GPU
      • : = 9, (8 + 8) = ..
    [Chart: processing time (s); series: EgCpu, EgGpu]
    ▪ SG GPU 3.8
      • : = , 0, (2, 2)
    [Chart: processing time (s); series: SgCpu, SgGpu]
    EG
  16. SG2
    ▪ Run
        ap = tf.placeholder(...)
        aa = tf.add(ap, 1)
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = np.zeros([10])
        for _ in range(N):
            a = sess.run(aa, feed_dict={ap: a})   # one sess.run per iteration
    [Diagram: CPU, a ➡ ap ➡ + ➡ aa]
    ▪ While loop
        ap = tf.placeholder(...)
        i, a = tf.while_loop(
            lambda i, a: tf.less(i, N),                  # loop condition
            lambda i, a: (tf.add(i, 1), tf.add(a, 1)),   # loop body
            [0, ap])
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = sess.run(a, feed_dict={ap: ..})   # the whole loop runs in one sess.run
    [Diagram: CPU, a0 ➡ ap ➡ + (WL) ➡ a, loop variables i, a]
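The practical difference between the Run loop and the While loop is how many times control crosses into the session. The sketch below counts those crossings with a hypothetical `CountingSession`, which stands in for `tf.Session` and is not a TF class. Each `run` call stands in for the per-call feed/fetch overhead. With N iterations, the Run loop makes N calls, while the While loop makes one.

```python
class CountingSession:
    """Hypothetical stand-in for tf.Session that counts run() calls."""

    def __init__(self):
        self.calls = 0

    def run(self, fn, feed):
        self.calls += 1             # one call = one feed/fetch round trip
        return fn(feed)


N = 5


def add_one(a):
    return [x + 1 for x in a]


def while_loop(a):
    i = 0
    while i < N:                    # the loop runs inside a single run() call
        i, a = i + 1, add_one(a)
    return a


# Run loop: Python drives the loop, one run() per iteration.
s1 = CountingSession()
a = [0] * 10
for _ in range(N):
    a = s1.run(add_one, a)

# While loop: the loop lives inside the "graph", one run() in total.
s2 = CountingSession()
b = s2.run(while_loop, [0] * 10)
print(s1.calls, s2.calls, a == b)
```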
  17. SG While loop
    ▪ GPU 3.2
    [Chart: processing time (s); series: SgGpu, SgWhileGpu]
    ▪ CPU 1.
    [Chart: processing time (s); series: SgCpu, SgWhileCpu]
    SG
  18. EG While loop
    ▪ SG (SG)
      • GPU 0.9, 1.3
    [Chart: processing time (s); series: EgGpu, EgWhileGpu]
      • CPU 0.9, 1.2
    [Chart: processing time (s); series: EgCpu, EgWhileCpu]
    SG
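  20. (1/3)
    ▪ EG: $t_{eg} \approx t_{sgw} + N(t_{ed} + t_{eu})$
      • tf.xxx(..), EG/SG
      • EG, N iterations: $t_{eg} = N(t_{ec} + t_{ed} + t_{eu}) = N t_{ec} + N(t_{ed} + t_{eu})$
        [Diagram: CPU, a ➡ ap ➡ + ➡ aa, annotated t_ed, t_ec, t_eu]
      • SG while_loop: $t_{sgw} = t_{wc} + t_{wd} + t_{wu} \approx t_{wc}$, with $t_{wc} = N t_{ec}$
        [Diagram: CPU, a0 ➡ ap ➡ + (WL) ➡ a, loop variables i, a; annotated t_wd, t_wc, t_wu]
      • Substituting $N t_{ec} = t_{wc} \approx t_{sgw}$ into the EG expression gives $t_{eg} \approx t_{sgw} + N(t_{ed} + t_{eu})$.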
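The model can be checked numerically with the sketch below. The function names `t_eager` and `t_sg_while` are hypothetical, and the per-iteration costs are made-up numbers, not measurements from the slides. Two readings are our interpretation: `t_ec` as per-iteration compute, and `t_ed`, `t_eu` as the per-call overhead terms. The defaults `t_wd = t_wu = 0` encode the approximation t_sgw ≈ t_wc.

```python
import math


def t_eager(n, t_ec, t_ed, t_eu):
    """EG model: each of the n calls pays compute plus both per-call terms."""
    return n * (t_ec + t_ed + t_eu)


def t_sg_while(n, t_ec, t_wd=0.0, t_wu=0.0):
    """SG while_loop model: t_sgw = t_wc + t_wd + t_wu, with t_wc = n * t_ec."""
    return n * t_ec + t_wd + t_wu


# Made-up costs in seconds, for illustration only (not measured values).
n, t_ec, t_ed, t_eu = 1000, 1e-5, 2e-5, 3e-5
teg = t_eager(n, t_ec, t_ed, t_eu)
tsgw = t_sg_while(n, t_ec)
predicted = tsgw + n * (t_ed + t_eu)   # t_eg ~ t_sgw + N(t_ed + t_eu)
print(f"t_eg={teg:.4f}s  t_sgw={tsgw:.4f}s  predicted={predicted:.4f}s")
print(math.isclose(teg, predicted))
```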
  21. (2/3)
    ▪ 1000 runs: $t_{DU} = 1000(t_{ed} + t_{eu})$ (CPU)
        a = tf.placeholder(...)

        def func(ain):
            return tf.add(ain, 1.)

        ans_get = func(a)
        for _ in range(1000):
            buf = np.empty(size)   # np.empty: uninitialized buffer of the given size
            ans = sess.run(ans_get, feed_dict={a: buf})
    ▪ ➡
    [Charts: processing time (s), GPU average and CPU average; components t_ed, t_eu]
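The measurement idea on this slide can be sketched as a generic timing harness. It times 1000 calls of a nearly free operation on a fresh buffer and averages them, so per-call overhead dominates the result. This stdlib version is only an analogy. The hypothetical `func` is plain Python and no device is involved, so the number it reports is Python call overhead, not t_ed + t_eu. In the slide's setup, `sess.run(ans_get, feed_dict={a: buf})` would take the place of `fn(buf)`.

```python
import time


def func(ain):
    """Hypothetical trivial op standing in for tf.add(ain, 1.)."""
    return [x + 1.0 for x in ain]


def per_call_overhead(fn, size, repeats=1000):
    """Average seconds per call of fn on a fresh buffer of `size` floats."""
    start = time.perf_counter()
    for _ in range(repeats):
        buf = [0.0] * size          # fresh buffer each iteration, as with np.empty(size)
        fn(buf)
    return (time.perf_counter() - start) / repeats


t_du_per_call = per_call_overhead(func, size=100)
print(f"average per call: {t_du_per_call * 1e6:.1f} us")
```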
  22. (3/3)
    ▪ $t_{eg} = t_{sgw} + 8 t_{DU}$ ➡ GPU, AVX ➡
      • EG, SG
    [Chart: processing time (s), GPU estimate; series: t_eg, t_sgw]
    [Chart: processing time (s), CPU estimate; series: t_eg (AVX), t_sgw]
  23. EG
    ▪ GPU (AVX) ➡ Phi/Intel, Wiki
    [Chart: processing time (s); series (Eg): Gpu, Cpu, Avx]
    ☞ GPU ☞ CPU ☞ N, NumPy N
    Xeon: 4-22 (L3 = 20-60 MB), Wiki
    Atom: 2-16 (L3 = 2-16 MB), Wiki
    ARM/DynamIQ: 1-8, GPU
    ▼ + ➡ GPU
    ▼ / + ➡ GPU
    Google+Intel for Phi✈, Xeon (22), AVX2
    PyTorch, Chainer
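  24. SG
    ▪ feed_dict={x: xs, y: ys, ...}
    GPU 25.8 ➡ 1.2, AVX 1.6
    [Chart: processing time (s), random-number transfer; series: Np, SgGpu, EgAvx, Sg without transfer]
    [Chart: processing time (s), random-number transfer components; series: Np, Sg random-number transfer, Sg random-number transfer t_DU, random-number generation, Sg]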
  25. (1/2)
    n1 = fc1(a) ➡ n2 = fc2(a)
    SIMD: n = fc(a) as n[0:10] = fc(a[0:10]), n[10:20] = fc(a[10:20])
    [Diagram: a ➡ fc1 ➡ n1; a ➡ fc2 ➡ n2]
    a[i] = fu(m)
    tf.while_loop, tf.cond, Eager, Session.run, tf.Variable, tfe.Variable, tf.TensorArray, tf.assign_xxx(), tf.scatter_xxx(), tf.control_dependencies, · · TF
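The two patterns on this slide can be shown with toy functions. The first pair, n1 = fc1(a) and n2 = fc2(a), is read here as two independent functions of the same input; that reading is our interpretation. The SIMD line splits one element-wise function over slices of the array. `fc`, `fc1` and `fc2` below are hypothetical examples. Because of the Python GIL, the thread pool only shows that the two tasks are independent; it does not demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def fc(chunk):
    """Hypothetical element-wise function; any per-element map works."""
    return [2 * v + 1 for v in chunk]


def fc1(a):
    return sum(a)   # hypothetical independent task 1


def fc2(a):
    return max(a)   # hypothetical independent task 2


a = list(range(20))

# Data parallel (SIMD-style): n = fc(a) equals fc applied slice by slice.
n_whole = fc(a)
n_chunked = fc(a[0:10]) + fc(a[10:20])

# Task parallel: fc1 and fc2 only read a, so they may run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    f1, f2 = pool.submit(fc1, a), pool.submit(fc2, a)
    n1, n2 = f1.result(), f2.result()

print(n_whole == n_chunked, n1, n2)
```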
  26. (2/2)
    TF: H/W, OS, .. ➡
    ▼ ▼ DL
    Help:
      OS: Linux, Windows, ESXi, Android, iOS, Web
      H/W: CUDA, Phi/Intel, AVX, Intel-Python, GFX (AMD), ?/ARM
      TF: Eager, Run-SIMD, I/O
      DL, Model Based Data Analysis
  27. ▪ TF
      • EG, SG ▼ GPU ▼ tf.xxx() ▼ CPU EG
      • EG ▼ (Lib) ▼ Python ▼ EG SG
      • SG ▼ ❙ ❙ ❙ ▼ ▼
    ▪ Variable, DL ..
  28. Q/A

  30. EG for ...
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import tensorflow.contrib.eager as tfe

        tfe.enable_eager_execution()   # TF 1.5: call once, at program start

        def gen(a, loop):
            for _ in range(loop):
                a = tf.add(a, 1)   # runs immediately in eager mode
            return a

        a = tf.zeros([10])
        a = gen(a, 3)
        print(a)
    SG for ...
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import numpy as np

        N = 3

        def gen(a):
            for _ in range(N):
                a = tf.add(a, 1)   # adds a graph node; nothing runs yet
            return a

        ap = tf.placeholder(tf.int32, None, name='a')
        ag = gen(ap)
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = np.zeros([10], dtype=np.int32)     # match the int32 placeholder
        a = sess.run(ag, feed_dict={ap: a})    # executes all N adds in one call
        print(a)
    Run
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import numpy as np

        ap = tf.placeholder(tf.int32, None, name='ap')
        aa = tf.add(ap, 1)
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = np.zeros([10], dtype=np.int32)
        for _ in range(3):
            a = sess.run(aa, feed_dict={ap: a})   # one sess.run per iteration
        print(a)
    While loop
        # -*- coding: utf-8 -*-
        import tensorflow as tf
        import numpy as np

        ap = tf.placeholder(tf.int32, None, name='a')
        i, a = tf.while_loop(
            lambda i, a: tf.less(i, 3),                  # loop condition
            lambda i, a: (tf.add(i, 1), tf.add(a, 1)),   # loop body
            [0, ap])
        sess = tf.Session()
        sess.run(tf.global_variables_initializer())
        a = sess.run(a, feed_dict={ap: np.zeros([10], dtype=np.int32)})
        print(a)