Basic behaviors of TensorFlow, Eager and Session.run

TFUG-Tokyo#7-1 2018/1/29 Y. Okuda

2
▪ Fortran, PL/I, RPG: 360/IBM
▪ Bliss, C, X11: VAX/DEC
▪ (ADA, Objective-C), C++, Octave, Tcl: Sparc/Sun
▪ CERN-Root: PC/Intel
▪ JavaScript, Python: /Nvidia?
▪ DL (NN/AI)

3 ❶ DL ❷ Python
▪ Python GIL ➡
  • CPU ➡ GHz ➡
  • GPU ➡ GP-GPU ➡ CUDA
  • Spark✈, TPU, Phi✈, Nervana✈, Tegra✈, ARM, QUALCOMM
▪ Python: ➡
▪ :
  • PyCUDA, OpenCL, PySpark
  • Pandas ➡ Dask, Intel-Python
  • List: Numba
  • NumPy: TensorFlow (TF), PyTorch, CuPy
    ▼ TF: ➡

4
▪ SG: Static Graph @session.run
▪ EG: Eager, not Graph

5 H/W, S/W
▪ S/W: env-1, env-2, ... env-n; Python 3.5; Conda 4.3.30; Mint Linux (Ubuntu 16.04)
▪ H/W: CPU + GPU; SSH, NFS
▪ CPU: i7-2630QM (Sandy Bridge '12), 2.4 GHz, 4 cores / 8 threads, L1=256K, L2=1M, L3=6M, PCIe II 5GT/s, DDR3 16G 21.3G/s, QM77, NF9G (Jetway)
▪ GPU: GTX-1060 (Pascal GP-106), 1.5 GHz, 1280 cores, L2=1.5M (192b I/F), PCIe II 5GT/s, DDR5 6G 8G/s, CUDA-8, CC-6.1

6 Env
    $ conda create -n XXX python=3.5
    $ source activate XXX
    $ pip install YYYYY.whl

Env   Package             Version
1.1   TensorFlow-SG       1.4
1.2   TensorFlow-SG       1.4
1.3   TensorFlow-EG       1.5-dev20171127
1.4   TensorFlow-EG       1.5-dev20171127
1.5   TensorFlow-EG/SG    1.5, AVX, Python-2.7
2.1   PyTorch             0.2.0_4
3.1   CuPy                2.2.0
4.1   Numba               0.36.1
5.1   Intel Python

7
▪ Random points in [0, 1): shot = points thrown, hit = points with r ≤ 1
▪ NumPy
  • SIMD
▪ np. ➡ tf., torch., cp. (CuPy)

π = 4 · hit/shot

    import numpy as np

    def get(size, shot, hit):
        # Draw `size` random points in the unit square.
        x = np.random.rand(size)
        y = np.random.rand(size)
        # Distance of each point from the origin.
        r = np.sqrt(np.add(np.multiply(x, x), np.multiply(y, y)))
        one = np.array([1.] * size)
        # Count the points that fall inside the quarter circle.
        lss = np.less_equal(r, one)
        hit_one = np.count_nonzero(lss)
        hit += hit_one
        shot += size
        return shot, hit

8
    for size in [10**2, ..., 10**6]:
        shot, hit = 0, 0
        shot, hit = get(size, shot, hit)
        pi = 4 * hit / shot
        rec(pi, shot, size)

[Figure: elapsed time [s] and absolute error (log) vs. Size; series: NP time, NP error]

    for size in [10**2, ..., 10**6]:
        for _ in range(1000):
            shot, hit = 0, 0
            shot, hit = get(size, shot, hit)
            pi = 4 * hit / shot
            rec(pi, shot, size)

[Figure: processing time [s] vs. Size; series: NP, TF]
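The loops above call a helper `rec` whose definition does not appear on the slide. A minimal sketch of the measurement, assuming `rec` only needs to record the estimate, its absolute error, and the elapsed time per size. The name `run_sizes`, the record layout, and the fixed seed are assumptions for illustration, not the author's code:

```python
import math
import time

import numpy as np


def get(size, shot, hit, rng):
    """Throw `size` random points into the unit square and count the hits."""
    x = rng.random(size)
    y = rng.random(size)
    r = np.sqrt(x * x + y * y)
    hit += int(np.count_nonzero(r <= 1.0))
    shot += size
    return shot, hit


def run_sizes(sizes, seed=0):
    """Return one record per size: (size, pi estimate, abs error, seconds)."""
    rng = np.random.default_rng(seed)  # fixed seed is an assumption, for repeatability
    records = []
    for size in sizes:
        start = time.perf_counter()
        shot, hit = get(size, 0, 0, rng)
        pi = 4 * hit / shot
        elapsed = time.perf_counter() - start
        records.append((size, pi, abs(pi - math.pi), elapsed))
    return records


if __name__ == "__main__":
    for size, pi, err, sec in run_sizes([10**2, 10**4, 10**6]):
        print(f"size={size:>8} pi={pi:.5f} err={err:.2e} time={sec:.4f}s")
```

Timing a single call per size is noisy for small sizes, which is presumably why the second loop on the slide repeats each size 1000 times.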

9 TF
▪ CPU 4 6 !!
▪ CPU GPU 24 76
▪ While
[Figure: processing time [s]; series: Np, EgCpu, SgCpu, SgGpu, SgWhileGpu]

10 NumPy
▪ np. ➡ tf., torch.

NumPy: np.xxx()

    x = np.random.rand(size).astype(np.float32)
    y = np.random.rand(size).astype(np.float32)
    rs = np.sqrt(np.add(np.multiply(x, x), np.multiply(y, y)))
    ones = np.array([1.] * size, dtype=np.float32)
    lss = np.less_equal(rs, ones)
    hit_one = np.count_nonzero(lss)

TF-Eg: tf.xxx()

    x = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
    y = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
    rs = tf.sqrt(tf.add(tf.multiply(x, x), tf.multiply(y, y)))
    ones = tf.ones([size], dtype=tf.float32)
    lss = tf.less_equal(rs, ones)
    hit_one = tf.count_nonzero(lss, dtype=tf.int64)

PyTorch: torch.xxx()

    x = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
    y = tf.random_uniform(shape=[size], minval=0., maxval=1., dtype=tf.float32)
    rs = tf.sqrt(tf.add(tf.multiply(x, x), tf.multiply(y, y)))
    ones = tf.ones([size], dtype=tf.float32)
    lss = tf.less_equal(rs, ones)
    hit_one = tf.count_nonzero(lss, dtype=tf.int64)

11 Np, EG, Pt on CPU
▪ PyTorch (Pt)
▪ EG
[Figure: processing time [s]; series: Np, PtCpu, EgCpu]
▪ EG
  • 0.5M
  • float32 = 4 bytes
  • 2
  • 4 x 3 x 0.5 = 6M = L3
▪ AVX
[Figure: processing time [s]; series: EgCpu, EgAvx]
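The bullet "4 x 3 x 0.5 = 6M = L3" can be read as bytes per float32 times number of arrays times element count. A small sketch of that arithmetic, assuming the factor 3 stands for three live float32 arrays (an interpretation, since the surviving text does not say) and taking L3=6M from the hardware slide. `working_set_bytes` and `fits_in_l3` are hypothetical helper names:

```python
FLOAT32_BYTES = 4
L3_BYTES = 6 * 1024 * 1024  # L3=6M, as listed on the hardware slide


def working_set_bytes(size, n_arrays=3, itemsize=FLOAT32_BYTES):
    """Bytes touched when n_arrays arrays of `size` elements are live at once."""
    return itemsize * n_arrays * size


def fits_in_l3(size, n_arrays=3):
    """True if the assumed working set is no larger than the L3 cache."""
    return working_set_bytes(size, n_arrays) <= L3_BYTES


if __name__ == "__main__":
    for size in (10**5, 5 * 10**5, 10**6):
        print(size, working_set_bytes(size), fits_in_l3(size))
```

Under these assumptions the working set reaches the cache size at about 0.5M elements, which matches the size the slide singles out.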

12 CPU
▪ N3150 4.3 (Braswell '15, Celeron/Atom): 1.6-2 GHz, 4 cores, L1=224K, L2=2M (1M x 2)
[Figure: processing time [s]; series: Np, EgCpu]
▪ AVX 4.2 3.7 4.6 2: 1.6-2.4 GHz, 4 cores / 8 threads, L1=256K, L2=1M, L3=6M
[Figure: processing time [s]; series: Np, EgCpu, EgAvx]

13 EG, SG

Eager

    def gen(a, loop):
        for _ in range(loop):
            a = tf.add(a, 1)
        return a

    a = tf.zeros(··)
    a = gen(a, 3)

Static Graph

    N = 3

    def gen(a):
        for _ in range(N):
            a = tf.add(a, 1)
        return a

    ap = tf.placeholder(··)
    ag = gen(ap)
    sess = tf.Session()
    sess.run(tf.··initializer())
    a = np.zeros(··)
    a = sess.run(ag, feed_dict={ap: a})

▪ tf.Tensor, tf.Variable, tf.TensorArray
  ▼ tf.Variable
  ▼ tfe.Variable
▪ tf.xxx

14 EG, SG
    for _ in range(N):
        a = tf.add(a, 1)
▪ EG
[Diagram: CPU, a, a, +, +; path DDR ➡ M-Con. ➡ CPU ➡ PCIe ➡ GPU-M-Bus ➡ DDR; a1]
▪ SG
[Diagram: CPU, a, a; nodes +1, +2, ..., +N; values a1, a2, ..., aN; N, N]

15 EG, SG on GPU
▪ EG GPU
  • : =9, (8 + 8) =..
[Figure: processing time [s]; series: EgCpu, EgGpu]
▪ SG GPU 3.8
  • : = , 0, (2, 2)
[Figure: processing time [s]; series: SgCpu, SgGpu]

16 SG2
▪ Run

    ap = tf.placeholder(...)
    aa = tf.add(ap, 1)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    a = np.zeros([10])
    for _ in range(N):
        a = sess.run(aa, feed_dict={ap: a})

[Diagram: CPU, a, ap, +, +, aa]

▪ While loop

    ap = tf.placeholder(...)
    i, a = tf.while_loop(
        lambda i, a: tf.less(i, N),
        lambda i, a: (tf.add(i, 1), tf.add(a, 1)),
        [0, ap])
    sess = tf.Session(); sess.run(tf.global_variables_initializer())
    a = sess.run(a, feed_dict={ap: ..})

[Diagram: CPU, a0, ap, + WL, + WL, a, i, a]
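The difference between the two listings is how often data crosses between the Python side and the session: the "Run" version calls `sess.run` once per step, the "While loop" version once in total. A toy sketch of that bookkeeping, using a pure-Python stand-in rather than TensorFlow. `FakeDevice`, `run_pattern`, and `while_pattern` are hypothetical names invented for this illustration:

```python
class FakeDevice:
    """Stand-in for a session that counts feeds and fetches (not TensorFlow)."""

    def __init__(self):
        self.feeds = 0    # values handed to a run call
        self.fetches = 0  # results handed back from a run call

    def run(self, steps, a):
        """Feed `a`, apply `steps` add-one operations, fetch the result."""
        self.feeds += 1
        result = [v + steps for v in a]
        self.fetches += 1
        return result


def run_pattern(n, a):
    """One run call per step, like the 'Run' listing: n feeds and n fetches."""
    dev = FakeDevice()
    for _ in range(n):
        a = dev.run(1, a)
    return a, dev.feeds + dev.fetches


def while_pattern(n, a):
    """All n steps inside one run call, like the 'While loop' listing."""
    dev = FakeDevice()
    a = dev.run(n, a)
    return a, dev.feeds + dev.fetches


if __name__ == "__main__":
    print(run_pattern(3, [0] * 10))
    print(while_pattern(3, [0] * 10))
```

Both patterns return the same values; only the transfer count differs (2N against 2 in this toy model).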

17 SG While loop
▪ GPU 3.2
[Figure: processing time [s]; series: SgGpu, SgWhileGpu]
▪ CPU 1.
[Figure: processing time [s]; series: SgCpu, SgWhileCpu]

18 EG While loop
▪ SG (SG )
  • GPU 0.9 1.3
[Figure: processing time [s]; series: EgGpu, EgWhileGpu]
  • CPU 0.9 1.2
[Figure: processing time [s]; series: EgCpu, EgWhileCpu]


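20 (1/3)
▪ EG: $t_{eg} \approx t_{sgw} + N(t_{ed} + t_{eu})$
  • tf.xxx(..), EG/SG
  • EG, N: $t_{eg}$
    [Diagram: CPU, a, ap, +, +, aa; timings $t_{ed}$, $t_{ec}$, $t_{eu}$]
    $t_{eg} = N(t_{ec} + t_{ed} + t_{eu}) = N t_{ec} + N(t_{ed} + t_{eu})$
  • SG-While loop: $t_{sgw}$
    [Diagram: CPU, a0, ap, + WL, + WL, a, i, a; timings $t_{wd}$, $t_{wc} = N t_{ec}$, $t_{wu}$]
    $t_{sgw} = t_{wc} + t_{wd} + t_{wu} \approx t_{wc}$
  Substituting $N t_{ec} = t_{wc} \approx t_{sgw}$ into the expression for $t_{eg}$ gives the relation in the first bullet.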
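The two expressions on this slide can be checked numerically. A minimal sketch, assuming made-up per-step timings (they are not measurements from the talk) and assuming a single transfer costs the same in both modes. The function names `eager_time` and `sg_while_time` are hypothetical:

```python
def eager_time(n, t_ec, t_ed, t_eu):
    """t_eg = N * (t_ec + t_ed + t_eu): every op pays compute plus both transfers."""
    return n * (t_ec + t_ed + t_eu)


def sg_while_time(n, t_ec, t_wd, t_wu):
    """t_sgw = t_wc + t_wd + t_wu with t_wc = N * t_ec: transfers are paid once."""
    return n * t_ec + t_wd + t_wu


# Made-up timings in seconds, chosen only to illustrate the algebra.
N = 1000
T_EC, T_ED, T_EU = 1e-5, 4e-5, 4e-5
T_WD, T_WU = T_ED, T_EU  # assumption: one transfer costs the same in both modes

t_eg = eager_time(N, T_EC, T_ED, T_EU)
t_sgw = sg_while_time(N, T_EC, T_WD, T_WU)
t_eg_approx = t_sgw + N * (T_ED + T_EU)  # the slide's approximation

if __name__ == "__main__":
    print(f"t_eg={t_eg:.5f}s  t_sgw={t_sgw:.5f}s  approx={t_eg_approx:.5f}s")
```

The approximation differs from the exact value only by the single pair of transfers in the while-loop run, so the gap shrinks relative to the total as N grows.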

21 (2/3)
▪ 1000: $t_{DU} = 1000(t_{ed} + t_{eu})$, CPU

    a = tf.placeholder(...)

    def func(ain):
        return tf.add(ain, 1.)

    ans_get = func(a)

    for _ in range(1000):
        buf = np.empty(size)  # np.empty
        ans = sess.run(ans_get, feed_dict={a: buf})

▪ ➡
[Figure: processing time [s], GPU average]
[Figure: processing time [s], CPU average; components $t_{ed}$, $t_{eu}$]

22 (3/3)
▪ $t_{eg} = t_{sgw} + 8 t_{DU}$ ➡ GPU, AVX ➡
  • EG, SG
[Figure: processing time [s], GPU estimate; series: $t_{eg}$, $t_{sgw}$]
[Figure: processing time [s], CPU estimate; series: $t_{egavx}$, $t_{sgw}$]

23 EG
▪ GPU (AVX) ➡ Phi/Intel
[Figure: processing time [s]; Eg series: Gpu, Cpu, Avx]
☞ GPU
☞ CPU
☞ N, NumPy, N
▪ Xeon: 4-22 (L3 = 20-60MB)
▪ Atom: 2-16 (L3 = 2-16MB)
▪ ARM/DynamIQ: 1-8
▪ GPU
  ▼ + ➡ GPU
  ▼ / + ➡ GPU
▪ : Google + Intel for Phi✈, Xeon (22), AVX2
▪ : PyTorch, Chainer:

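24 SG
▪ : feed_dict={x: xs, y: ys... }
▪ GPU 25.8 ➡ 1.2, AVX 1.6
[Figure: processing time [s], random-number transfer; series: Np, SgGpu, EgAvx, Sg without transfer]
[Figure: processing time [s], random-number transfer components; series: Np, Sg random-number transfer, $t_{DU}$, random-number generation, Sg]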

25 (1/2):
▪ a: n1 = fc1(a) ➡ n2 = fc2(a)
[Diagram: a, fc1, fc2, n1, n2]
▪ SIMD: n = fc(a)
    n[0:10] = fc(a[0:10])
    n[10:20] = fc(a[10:20])
▪ a[i] = fu(m)
▪ tf.while_loop, tf.cond
▪ Eager, Session.run
▪ tf.Variable, tfe.Variable, tf.TensorArray
▪ tf.assign_xxx(), tf.scatter_xxx(), tf.control_dependencies
▪ TF
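The SIMD item above says that applying an elementwise function to the whole array and applying it slice by slice give the same result. A short NumPy sketch of that property, where `fc` is a hypothetical elementwise function standing in for the slide's fc:

```python
import numpy as np


def fc(a):
    """Hypothetical elementwise function; each output depends on one input only."""
    return a * a + 1.0


a = np.arange(20, dtype=np.float32)

# Whole-array application: n = fc(a)
n_whole = fc(a)

# Slice-by-slice application, as written on the slide.
n_chunked = np.empty_like(a)
n_chunked[0:10] = fc(a[0:10])
n_chunked[10:20] = fc(a[10:20])

if __name__ == "__main__":
    print(np.array_equal(n_whole, n_chunked))
```

Because no element depends on another, the slices can be processed in any order or in parallel. The item a[i] = fu(m) on the same slide does not have that form.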

26 (2/2):
▪ TF: H/W, OS, .. ➡
  ▼ ▼ DL
▪ Help: HW, OS
▪ OS: Linux, Windows, Esxi, Android, iOS, Web
▪ H/W: CUDA, Phi/Intel, AVX, Intel-Python, GFX (AMD), ?/ARM
▪ TF: Eager, Run-SIMD, I/O
▪ DL, Model Based Data Analysis

27
▪ TF
  • EG, SG
    ▼ GPU
    ▼ tf.xxx()
    ▼ CPU, EG
  • EG
    ▼ (Lib)
    ▼ Python
    ▼ EG, SG
  • SG
▪ Variable, DL, ..

Q/A


30

EG for ...

    # -*- coding: utf-8 -*-
    import tensorflow as tf
    import tensorflow.contrib.eager as tfe
    tfe.enable_eager_execution()

    def gen(a, loop):
        for _ in range(loop):
            a = tf.add(a, 1)
        return a

    a = tf.zeros([10])
    a = gen(a, 3)
    print(a)

SG for ...

    # -*- coding: utf-8 -*-
    import tensorflow as tf
    import numpy as np

    N = 3

    def gen(a):
        for _ in range(N):
            a = tf.add(a, 1)
        return a

    ap = tf.placeholder(tf.int32, None, name='a')
    ag = gen(ap)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    a = np.zeros([10])
    a = sess.run(ag, feed_dict={ap: a})
    print(a)

Run

    # -*- coding: utf-8 -*-
    import tensorflow as tf
    import numpy as np

    ap = tf.placeholder(tf.int32, None, name='ap')
    aa = tf.add(ap, 1)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    a = np.zeros([10])
    for _ in range(3):
        a = sess.run(aa, feed_dict={ap: a})
    print(a)

While loop

    # -*- coding: utf-8 -*-
    import tensorflow as tf
    import numpy as np

    ap = tf.placeholder(tf.int32, None, name='a')
    i, a = tf.while_loop(
        lambda i, a: tf.less(i, 3),
        lambda i, a: (tf.add(i, 1), tf.add(a, 1)),
        [0, ap])
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    a = sess.run(a, feed_dict={ap: np.zeros([10])})
    print(a)


Yukio Okuda
