
Spying on Hadoop with strace

Julia Evans
December 10, 2014



Transcript

  1. SPYING ON HADOOP WITH STRACE by Julia Evans

    twitter.com/b0rk github.com/jvns jvns.ca
  2. LET'S USE HADOOP!

    $ hadoop fs -ls /wikipedia.csv
  3. HOW TO STRACE

    $ strace hadoop fs -ls /penguin
  4. OPEN

    strace -e open hadoop fs -ls /panda
    open("/etc/hadoop/mapred-site.xml", O_RDONLY) = 274
    open("/etc/hadoop/yarn-site.xml", O_RDONLY) = 274
    open("/etc/hadoop/hdfs-site.xml", O_RDONLY) = 274
  5. SENDTO

    $ strace snakebite ls /unicorn
    connect(8, {sa_family=AF_INET, sin_port=htons(8200), sin_addr=inet_addr("10.147.177.170")}, 16) = 0
    sendto(8, "\nB\n5\n3\n(BP-1019336183-10.165.43.39-1400088409498\20\211\200\200\200\4\30\361\7\22 75, 0, NULL, 0) = 75
  6. RECVFROM

    strace hadoop fs -get /wikipedia.csv
    recvfrom(8, "ot, it's a painting. Thomas Graeme apparently lived in the mid-18th century, according to the [[Graeme Park]] article. The rationale also says that this image is "used on the biography page about him by USHistory.org of Graeme Park." I cannot quite figure out what this means, but I am guessing that it means the uploader took this image from a page hosted on USHistory.org. A painting of a man who lived in the mid-18th century is likely to be the public domain, as claimed, but we have no good source", 512, 0, NULL, NULL) = 512
  7. TOY HADOOP CLUSTER

    1 master machine (namenode): hadoop-m-0
    2 worker machines (datanode): hadoop-w-0, hadoop-w-1
    20 GB of disk space
    one 14 GB file: /wikipedia.csv

    $ snakebite ls -h /
    14.1G /wikipedia.csv
  8. HOW HDFS WORKS

    The namenode knows where all the files are: hadoop-m-0
    The datanodes store the files: hadoop-w-0, hadoop-w-1
  9. HOW HDFS WORKS

    Files are split into blocks

    Blocks for /wikipedia.csv

    Bytes     | BlockID    | # Locations | Hostnames
    134217728 | 1073742025 | 1           | hadoop-w-1
    134217728 | 1073742026 | 1           | hadoop-w-1
    134217728 | 1073742027 | 1           | hadoop-w-0
    134217728 | 1073742028 | 1           | hadoop-w-1
    134217728 | 1073742029 | 1           | hadoop-w-0
    134217728 | 1073742030 | 1           | hadoop-w-1
    ....
    134217728 | 1073742136 | 1           | hadoop-w-0
    66783720  | 1073742137 | 1           | hadoop-w-1
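The block table can be reproduced with a little arithmetic. This is only a sketch: `split_into_blocks` is a made-up helper, and the file size assumes that the rows hidden behind `....` are all full 134217728-byte blocks with consecutive IDs (112 full blocks plus the 66783720-byte tail).

```python
BLOCK_SIZE = 134217728  # 128 MiB, the size of every full block in the table


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# Assumption: 112 full blocks (IDs 1073742025..1073742136) plus the short tail.
file_size = 112 * BLOCK_SIZE + 66783720
blocks = split_into_blocks(file_size)

print(len(blocks))                   # how many blocks the file needs
print(blocks[-1])                    # offset and length of the last block
print(round(file_size / 2**30, 1))   # size in GiB, comparable to `ls -h`
```

Under that assumption the file needs 113 blocks and comes out at about 14.1 GiB, which lines up with the `14.1G` that `snakebite ls -h /` reported.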
  10. HOW TO READ A FILE

    Ask the namenode where the blocks for that file are
    Ask the datanodes for each block
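The two steps can be sketched with plain dictionaries standing in for the machines. Everything here is hypothetical (the path, block names, host names, and helper functions); it is a toy model of the flow, not the HDFS client API.

```python
# Hypothetical stand-in for the namenode: path -> ordered (block, host) pairs.
NAMENODE = {
    "/example.txt": [("blk_1", "worker-a"), ("blk_2", "worker-b")],
}

# Hypothetical stand-ins for the datanodes: host -> block -> bytes.
DATANODES = {
    "worker-a": {"blk_1": b"hello, "},
    "worker-b": {"blk_2": b"hdfs"},
}


def get_block_locations(path):
    """Step 1: ask the namenode which blocks make up the file and where they are."""
    return NAMENODE[path]


def read_block(host, block_id):
    """Step 2: ask one datanode for the bytes of one block."""
    return DATANODES[host][block_id]


def read_file(path):
    """Fetch every block in order and stitch the bytes back together."""
    return b"".join(
        read_block(host, block_id)
        for block_id, host in get_block_locations(path)
    )


print(read_file("/example.txt"))
```

The point of the split is that the namenode only hands out metadata; the file bytes come straight from the datanodes.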
  11. LET'S STRACE IT!

    snakebite cat /wikipedia.csv | head

    strace -e recvfrom,sendto,connect \
      snakebite cat /wikipedia.csv | head
  12. 1: WHERE ARE THE BLOCKS? (the namenode is at 10.240.98.73)

    connect(4, {sa_family=AF_INET, sin_port=htons(8020), sin_addr=inet_addr("10.240.98.73")}, 16)
    sendto(4, "\n\vgetFileInfo\22.org.apache.hadoop.hdfs.protocol.ClientProtocol\30\1", 63, 0, NULL, 0) = 63
    sendto(4, "\n\16/wikipedia.csv", 16, 0, NULL, 0) = 1
    sendto(4, "\n\21getBlockLocations\22.org.apache.hadoop.hdfs.protocol.ClientProtocol\30\1", 69, 0, NULL, 0) = 69
    sendto(4, "\n\16/wikipedia.csv\20\0\30\350\223\354\2378", 24, 0, NULL, 0) = 24
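strace prints non-printable bytes as C-style escapes (`\n`, `\v`, octal like `\350`). A rough sketch of turning one of those quoted strings back into raw bytes follows; `strace_unescape` is a made-up helper, it handles only the common escapes, and the two payloads are copied from the `sendto` calls on this slide.

```python
import re

# Single-character escapes mapped to their byte values.
SIMPLE = {"n": 10, "t": 9, "r": 13, "v": 11, "f": 12, '"': 34, "\\": 92}

# One token is an octal escape (1-3 digits), a simple escape, or a literal char.
TOKEN = re.compile(r'\\([0-7]{1,3})|\\(.)|([^\\])', re.DOTALL)


def strace_unescape(text):
    """Convert the inside of a strace string literal back into bytes."""
    out = bytearray()
    for octal, simple, literal in TOKEN.findall(text):
        if octal:
            out.append(int(octal, 8))
        elif simple:
            out.append(SIMPLE[simple])
        else:
            out.extend(literal.encode("ascii"))
    return bytes(out)


header = strace_unescape(
    r'\n\21getBlockLocations\22.org.apache.hadoop.hdfs.protocol.ClientProtocol\30\1'
)
request = strace_unescape(r'\n\16/wikipedia.csv\20\0\30\350\223\354\2378')

print(len(header), len(request))  # compare with the lengths strace reported
print(request[2:16])              # the path embedded in the request
```

The decoded lengths come out as 69 and 24, the same numbers strace shows as the third argument of those two `sendto` calls.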
  13. 1: WHERE ARE THE BLOCKS?

    recvfrom(4, "\255\202\2\n\251\202\2\10\350\223\354\2378\22\233\2\n7\n'BP-572418726-10.240.98.73-1417975119036\20\311\201\200\200\4\30\261\t \200\200\200@\20\0\32\243\1\nk\n\01610.240.146.168\22%hadoop-w-1.c.stracing-hadoop.internal\32$358043f6-051d-4030-ba9b-3cd0ec283f6b \332\206\3(\233\207\0030\344\206\0038\0\20\200\300\323\356&\30\200\240\354\372\32 \200\200\216\344\4(\200\240\354\372\0320\233\327\304\346\242)8\1B\r/default-rackP\0X\0`\0 \0*\10\n\0\22\0\32\0\"\0002\1\0008\1B'DS-3fa133e4-2b17-4ed1-adca-fed4767a6e6f\22\236\2\n7\n'BP-572418726-10.240.98.73-1417975119036\20\312\201\200\200\4\30\262\t \200\200\200@\20\200\200\200@\32\243\1\nk\n\01610.240.146.168\22%hadoop-w-1.c.stracing-hadoop.internal\32$358043f6-051d-4030-ba9b-3cd0ec283f6b \332\206\3(\233\207\0030\344\206\0038\0\20\200\300\323\356&\30\200\240\354\372\32 \200\200\216\344\4(\200\240\354\372\0320\233\327\304\346\242)8\1B\r/default-rackP\0X\0`\0 \0*\10\n\0\22\0\32\0\"\0002\1\0008\1B'DS-3fa133e4-2b17-4ed1-adca-fed4767a6e6f\22\237\2\n7\n'BP-572418726-10.240.98.73-1417975119036\20\313\201\200\200\4\30\263\t \200\200\200@\20\200\200\200\200\1\32\243\1\nk\n\01610.240.109.224\22%hadoop-w-0.c.stracing-hadoop.internal\32$bd6125d3-60ea-4c22-9634-4f6f352cfa3e \332\206\3(\233\207\0030\344\206\0038\0\20\200\300\323\356&\30\200\200\342\335\35 \200\240\202\201\2(\200\200\342\335\0350\271\353\304\346\242)8\1B\r/default-rackP\0X\0`\0 \0*\10\n\0\22\0\32\0\"\0002\1\0008\1B'DS-c5ef58ca-95c4-454d-adf4-7ceaf632c035\22\237\2\n7\n'BP-572418726-10.240.98.73-1417975119036\20\314\201\200\200\4\30\264\t \200\200\200@\20\20
  14. 1: WHERE ARE THE BLOCKS?

    Blocks for /wikipedia.csv

    Bytes     | BlockID    | # Locations | Hostnames
    134217728 | 1073742025 | 1           | hadoop-w-1
    134217728 | 1073742026 | 1           | hadoop-w-1
    134217728 | 1073742027 | 1           | hadoop-w-0
    134217728 | 1073742028 | 1           | hadoop-w-1
    134217728 | 1073742029 | 1           | hadoop-w-0
    134217728 | 1073742030 | 1           | hadoop-w-1
    ....
    134217728 | 1073742136 | 1           | hadoop-w-0
    66783720  | 1073742137 | 1           | hadoop-w-1
  15. 2: GETTING A BLOCK (10.240.146.168 is hadoop-w-1)

    connect(5, {sa_family=AF_INET, sin_port=htons(50010), sin_addr=inet_addr("10.240.146.168")}, 16) = 0
    sendto(5, "\nK\n>\n2\n'BP-572418726-10.240.98.73-1417975119036\20\311\201\200\200\4\30\261\t\22\10\n\0\22\0\32\0\"\0\22\tsnakebite\20\0\30\200\200\200@", 84, 0, NULL, 0) = 84
  16. 2: GETTING A BLOCK

    OpTransferBlockProto
    header {
      baseHeader {
        block {
          poolId: "BP-572418726-10.240.98.73-1417975119036"
          blockId: 1073742025
          generationStamp: 1201
        }
      }
      clientName: "snakebite"
    }
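The `blockId` and `generationStamp` values are visible in the raw `sendto` bytes as protobuf varints: the five bytes `\311\201\200\200\4` that follow the pool ID, and the two bytes `\261\t` after them. A minimal sketch of the base-128 decoding, with `decode_varint` as a made-up helper name:

```python
def decode_varint(data, pos=0):
    """Decode one protobuf base-128 varint; return (value, next_position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte < 0x80:                  # high bit clear means last byte
            return value, pos
        shift += 7


block_id, _ = decode_varint(b"\311\201\200\200\4")
generation_stamp, _ = decode_varint(b"\261\t")
print(block_id, generation_stamp)
```

Those two byte strings decode to 1073742025 and 1201, the same values the decoded message shows.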
  17. 2: GETTING A BLOCK

    Blocks for /wikipedia.csv

    Bytes     | BlockID    | # Locations | Hostnames
    134217728 | 1073742025 | 1           | hadoop-w-1
    134217728 | 1073742026 | 1           | hadoop-w-1
    134217728 | 1073742027 | 1           | hadoop-w-0
    134217728 | 1073742028 | 1           | hadoop-w-1
    134217728 | 1073742029 | 1           | hadoop-w-0
    134217728 | 1073742030 | 1           | hadoop-w-1
    ....
    134217728 | 1073742136 | 1           | hadoop-w-0
    66783720  | 1073742137 | 1           | hadoop-w-1
  18. 2: GETTING A BLOCK

    recvfrom(5, "title,id,language,wp_namespace,is_redirect,revision_id,contributor_ip,contributor_id,contributor_username,timestamp,is_minor,is_bot,reversion_id,comment,num_characters\nIvan Tyrrell,6126919,,0,true,264190184,,37486,Oddharmonic,1231992299,,,,\"Added defaultsort tag, categories.\",2989\nInazuma Raigor\305\215,9124432,,0,,224477516,,2995750,ACSE,1215564370,,,,/* Top division record */ rm jawp reference,5557\nJeb Bush,189322,,0,,299771363,66.119.31.10,,,1246484846,,,,/* See also */,43680\nTalk:Goranboy (city),18941870,,1,,", 512, 0, NULL, NULL) = 512
  19. 3: WHERE IS THE BLOCK?

    $ cd /hadoop/dfs/data/current/BP-572418726-10.240.98.73-1417975119036/current/finalized
    $ ls -l blk_1073742025
    -rw-r--r-- 1 hadoop hadoop 134217728 Dec 8 02:08 blk_1073742025
  20. 3: WHERE IS THE BLOCK?

    $ head blk_1073742025
    title,id,language,wp_namespace,is_redirect,revision_id,contributor_ip,contributor_id,contributor_username,timestamp,is_minor,is_bot,reversion_id,comment,num_characters
    Ivan Tyrrell,6126919,,0,true,264190184,,37486,Oddharmonic,1231992299,,,,"Added defaultsort tag, categories.",2989
    Inazuma Raigorō,9124432,,0,,224477516,,2995750,ACSE,1215564370,,,,/* Top division record */ rm jawp reference,5557
    Jeb Bush,189322,,0,,299771363,66.119.31.10,,,1246484846,,,,/* See also */,43680
    Talk:Goranboy (city),18941870,,1,,233033452,,627032,OOODDD,1219200113,,,,talk page tag using [[Project:AutoWikiBrowser|AWB]],52
  21. INTERNALS let you manage smarter systems

    Now you know why you shouldn't set your block size to 4KB!
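A back-of-the-envelope way to see the problem is to count how many blocks the namenode would have to track, and the client would have to request, for the same file at each block size. The `block_count` helper is made up, and the file size is an assumption derived from the block table (112 full 134217728-byte blocks plus a 66783720-byte tail).

```python
FILE_SIZE = 15099169256  # bytes; assumed size of /wikipedia.csv (see above)


def block_count(file_size, block_size):
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


for label, size in [("128 MiB", 128 * 1024 * 1024), ("4 KiB", 4 * 1024)]:
    print(label, block_count(FILE_SIZE, size))
```

With 128 MiB blocks the file is 113 entries; with 4 KiB blocks it would be 3686321, each needing its own location record and its own datanode request.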
  22. FURTHER DIRECTIONS:

    strace the scheduler!
    see how map/reduce jobs get submitted!
    strace a map job while it's running!
  23. QUESTIONS?

    Julia Evans
    twitter.com/b0rk github.com/jvns jvns.ca