Introduction to Pandas

A brief introduction to the Pandas Python module.

Jason Myers

July 24, 2014

Transcript

  1. INSTALLATION

    pip install pandas
    import pandas as pd
  2. PANDAS DATA STRUCTURES

    Series - basically an ordered dict that can be named
    DataFrame - a labeled two-dimensional data type
  3. SERIES

    import pandas as pd
    cookies = pd.Series(['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
                         'Oatmeal Raisin', 'Sugar', 'Oreo'])
  4. WHAT DOES IT LOOK LIKE?

    0     Chocolate Chip
    1      Peanut Butter
    2    Ginger Molasses
    3     Oatmeal Raisin
    4              Sugar
    5               Oreo
    dtype: object
  5. PROPERTIES

    >>> cookies.values
    array(['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
           'Oatmeal Raisin', 'Sugar', 'Oreo'], dtype=object)
    >>> cookies.index
    Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')
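The two properties above can be checked with a short sketch. It assumes a recent pandas release, where the default index prints as a `RangeIndex` rather than the `Int64Index` shown on the slide; the variable names `values` and `labels` are only illustrative.

```python
import pandas as pd

# Build the Series from the slides; pandas assigns a default integer index.
cookies = pd.Series(['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
                     'Oatmeal Raisin', 'Sugar', 'Oreo'])

values = cookies.to_numpy()    # the underlying array (same data as .values)
labels = list(cookies.index)   # the default labels, one per element

print(values)
print(labels)
```

`to_numpy()` is used here as the explicit spelling of `.values`; either should return the same data for this Series.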
  6. SPECIFYING THE INDEX

    cookies = pd.Series([12, 10, 8, 6, 4, 2],
                        index=['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
                               'Oatmeal Raisin', 'Sugar', 'Powder Sugar'])
  7. INDEXED SERIES

    Chocolate Chip     12
    Peanut Butter      10
    Ginger Molasses     8
    Oatmeal Raisin      6
    Sugar               4
    Powder Sugar        2
    dtype: int64
  8. NAMING THE VALUES AND INDEXES

    >>> cookies.name = 'counts'
    >>> cookies.index.name = 'type'
    type
    Chocolate Chip     12
    Peanut Butter      10
    Ginger Molasses     8
    Oatmeal Raisin      6
    Sugar               4
    Powder Sugar        2
    Name: counts, dtype: int64
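As a sketch of the same idea, both names can also be supplied when the Series is built instead of being assigned afterwards. The helper variables `counts` and `types` are illustrative and not from the slides.

```python
import pandas as pd

counts = [12, 10, 8, 6, 4, 2]
types = ['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
         'Oatmeal Raisin', 'Sugar', 'Powder Sugar']

# name= labels the values; the index carries its own name via pd.Index.
cookies = pd.Series(counts, index=pd.Index(types, name='type'), name='counts')

print(cookies['Peanut Butter'])            # label-based lookup
print(cookies.name, cookies.index.name)
```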
  9. ACCESSING ELEMENTS

    >>> cookies[[name.endswith('Sugar') for name in cookies.index]]
    Sugar           4
    Powder Sugar    2
    dtype: int64
    >>> cookies[cookies > 10]
    Chocolate Chip    12
    Name: counts, dtype: int64
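Both selections above are boolean masks: one True/False per element, keeping the True rows. A minimal sketch follows, using the vectorized `Index.str.endswith` accessor as an alternative to the list comprehension on the slide; `sugar_mask`, `sugary` and `popular` are illustrative names.

```python
import pandas as pd

cookies = pd.Series([12, 10, 8, 6, 4, 2],
                    index=['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
                           'Oatmeal Raisin', 'Sugar', 'Powder Sugar'],
                    name='counts')

# Mask built from the labels: True where the label ends with 'Sugar'.
sugar_mask = cookies.index.str.endswith('Sugar')
sugary = cookies[sugar_mask]

# Mask built from the values: the comparison yields a boolean Series.
popular = cookies[cookies > 10]

print(sugary)
print(popular)
```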
  10. DATAFRAMES

    df = pd.DataFrame({'count': [12, 10, 8, 6, 2, 2, 2],
                       'type': ['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
                                'Oatmeal Raisin', 'Sug
                       'owner': ['Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Marvin']})
  11.

       count   owner             type
    0     12   Jason   Chocolate Chip
    1     10   Jason    Peanut Butter
    2      8   Jason  Ginger Molasses
    3      6   Jason   Oatmeal Raisin
    4      2   Jason            Sugar
    5      2   Jason     Powder Sugar
    6      2  Marvin            Sugar
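The constructor call in the transcript is cut off partway through the `type` list. A runnable sketch follows, assuming the missing values are the ones shown in the printed table above (that reconstruction is an assumption, not the slide's code).

```python
import pandas as pd

# Column values copied from the printed table; the tail of the 'type'
# list is reconstructed from that output because the slide code is cut off.
df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'owner': ['Jason'] * 6 + ['Marvin'],
    'type': ['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
             'Oatmeal Raisin', 'Sugar', 'Powder Sugar', 'Sugar'],
})

print(df.shape)   # (rows, columns)
print(df)
```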
  12. ACCESSING COLUMNS

    >>> df['type']
    0     Chocolate Chip
    1      Peanut Butter
    2    Ginger Molasses
    3     Oatmeal Raisin
    4              Sugar
    5       Powder Sugar
    6              Sugar
    Name: type, dtype: object
  13. ACCESSING ROWS

    >>> df.loc[2]
    count                  8
    owner              Jason
    type     Ginger Molasses
    Name: 2, dtype: object
  14. SLICING ROWS

    >>> df.loc[2:5]
       count  owner             type
    2      8  Jason  Ginger Molasses
    3      6  Jason   Oatmeal Raisin
    4      2  Jason            Sugar
    5      2  Jason     Powder Sugar
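Note that `df.loc[2:5]` returned four rows: `.loc` slices by label and includes both endpoints. The sketch below contrasts it with positional `.iloc`, which follows normal Python slicing; the DataFrame is rebuilt from the printed table, and `by_label` / `by_position` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'owner': ['Jason'] * 6 + ['Marvin'],
    'type': ['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
             'Oatmeal Raisin', 'Sugar', 'Powder Sugar', 'Sugar'],
})

type_col = df['type']         # one column -> Series
row = df.loc[2]               # one row by label -> Series
by_label = df.loc[2:5]        # label slice, both ends included: rows 2, 3, 4, 5
by_position = df.iloc[2:5]    # positional slice, end excluded: rows 2, 3, 4

print(len(by_label), len(by_position))
```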
  15. PIVOTING

    >>> df.loc[3:4].T
                        3      4
    count               6      2
    owner           Jason  Jason
    type   Oatmeal Raisin  Sugar
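The `.T` attribute used above is a transpose: row labels become column labels and vice versa. A small sketch, with the DataFrame rebuilt from the printed table and `wide` as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'owner': ['Jason'] * 6 + ['Marvin'],
    'type': ['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
             'Oatmeal Raisin', 'Sugar', 'Powder Sugar', 'Sugar'],
})

# Rows 3 and 4 become columns 3 and 4; the old column names become the index.
wide = df.loc[3:4].T

print(wide)
```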
  16. GROUPING

    >>> df.groupby('owner').sum()
            count
    owner
    Jason      40
    Marvin      2
  17.

    >>> df.groupby(['type', 'owner']).sum()
                            count
    type            owner
    Chocolate Chip  Jason      12
    Ginger Molasses Jason       8
    Oatmeal Raisin  Jason       6
    Peanut Butter   Jason      10
    Powder Sugar    Jason       2
    Sugar           Jason       2
                    Marvin      2
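A hedged sketch of the two groupings follows. It selects the `count` column explicitly because, as far as I know, pandas 2.x no longer silently drops non-numeric columns during `sum()` the way the version on the slides did. The names `per_owner` and `per_type_owner` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'owner': ['Jason'] * 6 + ['Marvin'],
    'type': ['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
             'Oatmeal Raisin', 'Sugar', 'Powder Sugar', 'Sugar'],
})

# One group per owner, summing only the numeric column.
per_owner = df.groupby('owner')['count'].sum()

# Grouping by two keys produces a two-level (MultiIndex) result.
per_type_owner = df.groupby(['type', 'owner'])['count'].sum()

print(per_owner)
print(per_type_owner)
```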
  18. RENAMING COLUMNS

    >>> g_sum = df.groupby(['type']).sum()
    >>> g_sum.columns = ['Total']
                     Total
    sum
    Chocolate Chip      12
    Ginger Molasses      8
    Oatmeal Raisin       6
    Peanut Butter       10
    Powder Sugar         2
    Sugar                4
  19. PIVOT TABLES

    >>> pd.pivot_table(df, values='count', index=['type'], columns=['owner'])
    owner            Jason  Marvin
    type
    Chocolate Chip      12     NaN
    Ginger Molasses      8     NaN
    Oatmeal Raisin       6     NaN
    Peanut Butter       10     NaN
    Powder Sugar         2     NaN
    Sugar                2       2
  20. JOINING

    >>> df = pivot_t.join(g_sum)
    >>> df.fillna(0, inplace=True)
                     Jason  Marvin  Total
    type
    Chocolate Chip      12       0     12
    Ginger Molasses      8       0      8
    Oatmeal Raisin       6       0      6
    Peanut Butter       10       0     10
    Powder Sugar         2       0      2
    Sugar                2       2      4
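The last three slides chain together: a per-type total, a pivot table with one column per owner, and a join on the shared `type` index. The sketch below runs them end to end under two assumptions: `pivot_t` holds the result of the `pd.pivot_table` call (the slides never show the assignment), and `aggfunc='sum'` is passed explicitly, whereas the slide relies on the default aggregation. For this data each type/owner pair has a single row, so both give the same numbers.

```python
import pandas as pd

df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'owner': ['Jason'] * 6 + ['Marvin'],
    'type': ['Chocolate Chip', 'Peanut Butter', 'Ginger Molasses',
             'Oatmeal Raisin', 'Sugar', 'Powder Sugar', 'Sugar'],
})

# Total per cookie type, kept as a one-column DataFrame and renamed.
g_sum = df.groupby('type')[['count']].sum()
g_sum.columns = ['Total']

# One column per owner; combinations that never occur become NaN.
pivot_t = pd.pivot_table(df, values='count', index='type',
                         columns='owner', aggfunc='sum')

# join aligns rows on the shared 'type' index; fillna replaces the NaN gaps.
combined = pivot_t.join(g_sum).fillna(0)

print(combined)
```

Returning a new object from `fillna(0)` instead of using `inplace=True` is a stylistic choice here; both forms fill the gaps.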
  21. OUR DATASOURCE

    2014-06-24 17:20:23.014642,0,34,102,0,0,0,60
    2014-06-24 17:25:01.176772,0,32,174,0,0,0,133
    2014-06-24 17:30:01.370235,0,28,57,0,0,0,75
    2014-07-21 14:35:01.797838,0,39,74,0,0,0,30,0,262,2,3,3,0
    2014-07-21 14:40:02.000434,0,54,143,0,0,0,44,0,499,3,9,9,0
  22. READING FROM A CSV

    df = pd.read_csv('results.csv', header=0, quotechar='\'')
  23.

                         datetime  abuse_passthrough  any_abuse_handled  ...
    0  2014-06-24 17:20:23.014642                  0                 34  ...
  24. SETTING THE DATETIME AS THE INDEX

    >>> df['datetime'] = pd.to_datetime(df.datetime)
    >>> df.index = df.datetime
    >>> del df['datetime']
                                abuse_passthrough  any_abuse_handled  ...
    datetime                                                          ...
    2014-06-24 17:20:23.014642                  0                 34  ...
    2014-06-24 17:25:01.176772                  0                 32  ...
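The read-then-reindex steps can also be folded into the `read_csv` call. This is a sketch only: it reads from an in-memory string instead of `results.csv`, and the three-column sample (first three fields of the data rows shown earlier, with the header names visible in the output) is a cut-down stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical three-column sample; the real file has more columns.
csv_text = """datetime,abuse_passthrough,any_abuse_handled
2014-06-24 17:20:23.014642,0,34
2014-06-24 17:25:01.176772,0,32
2014-06-24 17:30:01.370235,0,28
"""

# parse_dates converts the column; index_col makes it the index in one step.
df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=['datetime'], index_col='datetime')

print(df.index)
print(df)
```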
  25. TIME SLICING

    >>> df['2014-07-21 13:55:00':'2014-07-21 14:10:00']
                                abuse_passthrough  any_abuse_handled  ...
    datetime                                                          ...
    2014-07-21 13:55:01.153706                  0                 24  ...
    2014-07-21 14:00:01.372624                  0                 24  ...
    2014-07-21 14:05:01.910827                  0                 32  ...
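Slicing with timestamp strings works because the index is a sorted `DatetimeIndex`. Below is a minimal sketch using `.loc`, which makes the row selection explicit. The three middle values come from the slide; the first and last rows (20 and 40) and the rounded timestamps are made up so that the slice has something to exclude.

```python
import pandas as pd

index = pd.to_datetime(['2014-07-21 13:50:01', '2014-07-21 13:55:01',
                        '2014-07-21 14:00:01', '2014-07-21 14:05:01',
                        '2014-07-21 14:15:01'])
df = pd.DataFrame({'any_abuse_handled': [20, 24, 24, 32, 40]}, index=index)

# String bounds are parsed as timestamps; rows outside the window are dropped.
window = df.loc['2014-07-21 13:55:00':'2014-07-21 14:10:00']

print(window)
```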
  26. HANDLING MISSING DATA POINTS

    >>> df.fillna(0, inplace=True)
  27. FUNCTIONS

    >>> df.sum()
    abuse_passthrough                         39
    any_abuse_handled                      81537
    handle_bp_message_handled             271689
    handle_bp_message_corrupt_handled          0
    error                                      0
    forward_all_unhandled                      0
    original_message_handled              136116
    list_unsubscribe_optout                   71
    default_handler_dropped              1342285
    default_unhandled                       2978
    default_opt_out_bounce                 22044
    default_opt_out                        23132
    default_handler_pattern_dropped            0
    dtype: float64
  28.

    >>> df.sum().sum()
    1879891.0
  29.

    >>> df.mean()
    abuse_passthrough                      0.009673
    any_abuse_handled                     20.222470
    handle_bp_message_handled             67.383185
    handle_bp_message_corrupt_handled      0.000000
    error                                  0.000000
    forward_all_unhandled                  0.000000
    original_message_handled              33.758929
    list_unsubscribe_optout                0.017609
    default_handler_dropped              332.907986
    default_unhandled                      0.738591
    default_opt_out_bounce                 5.467262
    default_opt_out                        5.737103
    default_handler_pattern_dropped        0.000000
    dtype: float64
  30.

    >>> df['2014-07-21 13:55:00':'2014-07-21 14:10:00'].apply(np.cumsum)
                                abuse_passthrough  any_abuse_handled  ...
    datetime                                                          ...
    2014-07-21 13:55:01.153706                  0                 24  ...
    2014-07-21 14:00:01.372624                  0                 48  ...
    2014-07-21 14:05:01.910827                  0                 80  ...
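The aggregation calls on the last four slides can be tried on a tiny frame. This sketch uses the three rows from the time-slicing output and swaps `apply(np.cumsum)` for the built-in `DataFrame.cumsum()`, which should give the same running totals without importing NumPy.

```python
import pandas as pd

df = pd.DataFrame({'abuse_passthrough': [0, 0, 0],
                   'any_abuse_handled': [24, 24, 32]})

column_totals = df.sum()        # one total per column
grand_total = df.sum().sum()    # a single number across all columns
column_means = df.mean()        # one mean per column
running = df.cumsum()           # running total down each column

print(column_totals)
print(grand_total)
print(running)
```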
  31. RESAMPLING

    >>> d_df = df.resample('1D', how='sum')
                abuse_passthrough  any_abuse_handled  ...
    datetime                                          ...
    2014-07-07                  0               3178  ...
    2014-07-08                  1               6536  ...
    2014-07-09                  2               6857  ...
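The `how=` keyword shown above belongs to the pandas version current in 2014; to my knowledge it was later removed in favor of calling the aggregation on the resampler object. A sketch of that newer spelling, on made-up sample values:

```python
import pandas as pd

# Hypothetical readings: two on the 7th, one each on the 8th and 9th.
index = pd.to_datetime(['2014-07-07 10:00', '2014-07-07 18:00',
                        '2014-07-08 09:00', '2014-07-09 12:00'])
df = pd.DataFrame({'any_abuse_handled': [10, 5, 7, 3]}, index=index)

# Bucket the rows into one-day bins, then sum within each bin.
d_df = df.resample('1D').sum()

print(d_df)
```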
  32. SORTING

    >>> d_df.sort('any_abuse_handled', ascending=False)
                abuse_passthrough  any_abuse_handled  ...
    datetime                                          ...
    2014-07-15                 21               7664  ...
    2014-07-17                  5               7548  ...
    2014-07-10                  0               7106  ...
    2014-07-11                 10               6942  ...
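`DataFrame.sort` is likewise the 2014 spelling; in the pandas releases I am aware of, the equivalent call is `sort_values`. A sketch using the four rows shown on the slide, starting in date order (`ranked` is an illustrative name):

```python
import pandas as pd

d_df = pd.DataFrame(
    {'abuse_passthrough': [0, 10, 21, 5],
     'any_abuse_handled': [7106, 6942, 7664, 7548]},
    index=pd.to_datetime(['2014-07-10', '2014-07-11',
                          '2014-07-15', '2014-07-17']))

# Largest any_abuse_handled first; sort_values returns a new DataFrame.
ranked = d_df.sort_values('any_abuse_handled', ascending=False)

print(ranked)
```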
  33. DESCRIBE

    >>> d_df.describe()
           abuse_passthrough  any_abuse_handled  ...
    count           15.00000          15.000000  ...
    mean             2.60000        5435.800000  ...
    std              5.79162        1848.716358  ...
    min              0.00000        2174.000000  ...
    25%              0.00000        3810.000000  ...
    50%              0.00000        6191.000000  ...
    75%              1.50000        6899.500000  ...
    max             21.00000        7664.000000  ...
  34. OUTPUT TO CSV

    >>> d_df.to_csv(path_or_buf='output.csv')
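To see what `to_csv` writes without touching the disk, the sketch below targets an in-memory buffer and reads the text back. The three daily values are taken from the resampling slide; the buffer and the `restored` name are illustrative.

```python
import io
import pandas as pd

d_df = pd.DataFrame({'any_abuse_handled': [3178, 6536, 6857]},
                    index=pd.to_datetime(['2014-07-07', '2014-07-08',
                                          '2014-07-09']))
d_df.index.name = 'datetime'

# to_csv accepts a file path or any writable text buffer.
buffer = io.StringIO()
d_df.to_csv(buffer)
csv_text = buffer.getvalue()

# Round trip: the index column is parsed back into timestamps.
restored = pd.read_csv(io.StringIO(csv_text),
                       index_col='datetime', parse_dates=True)

print(csv_text)
```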
  35. CHARTS

    chart = vincent.StackedArea(d_df)
    chart.legend(title='Legend')
    chart.colors(brew='Set3')