Slide 1

Slide 1 text

PANDAS A POWERFUL DATA MANIPULATION TOOL / / Emma Jason A Myers @jasonamyers

Slide 2

Slide 2 text

WHAT'S SO SPECIAL ABOUT PANDAS? 1. Tabular/Matrix 2. Data Flexibility 3. Data Manipulation 4. Time Series

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

INSTALLATION p i p i n s t a l l p a n d a s p i p i n s t a l l p a n d a s a s p d

Slide 5

Slide 5 text

PANDAS DATA STRUCTURES Series - basically an ordered dict that can be named Dataframe - A labeled two dimensional datatype

Slide 6

Slide 6 text

SERIES i m p o r t p a n d a s a s p d c o o k i e s = p d . S e r i e s ( [ ' C h o c o l a t e C h i p , ' ' P e a n u t B u t t e r , ' ' G i n g e r M o l a s s e s , ' ' O a t m e a l R a i s i n , ' ' S u g a r ' , ' O r e o ' , ] )

Slide 7

Slide 7 text

WHAT DOES IT LOOK LIKE? 0 C h o c o l a t e C h i p 1 P e a n u t B u t t e r 2 G i n g e r M o l a s s e s 3 O a t m e a l R a i s i n 4 S u g a r 5 O r e o d t y p e : o b j e c t

Slide 8

Slide 8 text

PROPERTIES > > > c o o k i e s . v a l u e s a r r a y ( [ ' C h o c o l a t e C h i p ' , ' P e a n u t B u t t e r ' , ' G i n g e r M o l a s s e s ' , ' O a t m e a l R a i s i n ' , ' S u g a r ' , ' O r e o ' ] , d t y p e = o b j e c t ) > > > c o o k i e s . i n d e x I n t 6 4 I n d e x ( [ 0 , 1 , 2 , 3 , 4 , 5 ] , d t y p e = ' i n t 6 4 ' )

Slide 9

Slide 9 text

SPECIFYING THE INDEX c o o k i e s = p d . S e r i e s ( [ 1 2 , 1 0 , 8 , 6 , 4 , 2 ] , i n d e x = [ ' C h o c o l a t e C h i p ' , ' P e a n u t B u t t e r ' , ' G i n g e r M o l a s s e s ' , ' O a t m e a l R a i s i n ' , ' S u g a r ' , ' P o w d e r S u g a r ' ] )

Slide 10

Slide 10 text

INDEXED SERIES C h o c o l a t e C h i p 1 2 P e a n u t B u t t e r 1 0 G i n g e r M o l a s s e s 8 O a t m e a l R a i s i n 6 S u g a r 4 P o w d e r S u g a r 2 d t y p e : i n t 6 4

Slide 11

Slide 11 text

NAMING THE VALUES AND INDEXES > > > c o o k i e s . n a m e = ' c o u n t s ' > > > c o o k i e s . i n d e x . n a m e = ' t y p e ' t y p e C h o c o l a t e C h i p 1 2 P e a n u t B u t t e r 1 0 G i n g e r M o l a s s e s 8 O a t m e a l R a i s i n 6 S u g a r 4 P o w d e r S u g a r 2 N a m e : c o u n t s , d t y p e : i n t 6 4

Slide 12

Slide 12 text

ACCESSING ELEMENTS > > > c o o k i e s [ [ n a m e . e n d s w i t h ( ' S u g a r ' ) f o r n a m e i n c o o k i e s . i n d e x ] ] S u g a r 4 P o w d e r S u g a r 2 d t y p e : i n t 6 4 > > > c o o k i e s [ c o o k i e s > 1 0 ] C h o c o l a t e C h i p 1 2 N a m e : c o u n t s , d t y p e : i n t 6 4

Slide 13

Slide 13 text

DATAFRAMES d f = p d . D a t a F r a m e ( { ' c o u n t ' : [ 1 2 , 1 0 , 8 , 6 , 2 , 2 , 2 ] , ' t y p e ' : [ ' C h o c o l a t e C h i p ' , ' P e a n u t B u t t e r ' , ' G i n g e r M o l a s s e s ' , ' O a t m e a l R a i s i n ' , ' S u g ' o w n e r ' : [ ' J a s o n ' , ' J a s o n ' , ' J a s o n ' , ' J a s o n ' , ' J a s o n ' , ' J a s o n ' , ' M a r v i n ' ] } )

Slide 14

Slide 14 text

c o u n t o w n e r t y p e 0 1 2 J a s o n C h o c o l a t e C h i p 1 1 0 J a s o n P e a n u t B u t t e r 2 8 J a s o n G i n g e r M o l a s s e s 3 6 J a s o n O a t m e a l R a i s i n 4 2 J a s o n S u g a r 5 2 J a s o n P o w d e r S u g a r 6 2 M a r v i n S u g a r

Slide 15

Slide 15 text

ACCESSING COLUMNS > > > d f [ ' t y p e ' ] 0 C h o c o l a t e C h i p 1 P e a n u t B u t t e r 2 G i n g e r M o l a s s e s 3 O a t m e a l R a i s i n 4 S u g a r 5 P o w d e r S u g a r 6 S u g a r N a m e : t y p e , d t y p e : o b j e c t

Slide 16

Slide 16 text

ACCESSING ROWS > > > d f . l o c [ 2 ] c o u n t 8 o w n e r J a s o n t y p e G i n g e r M o l a s s e s N a m e : 2 , d t y p e : o b j e c t

Slide 17

Slide 17 text

SLICING ROWS > > > d f . l o c [ 2 : 5 ] c o u n t o w n e r t y p e 2 8 J a s o n G i n g e r M o l a s s e s 3 6 J a s o n O a t m e a l R a i s i n 4 2 J a s o n S u g a r 5 2 J a s o n P o w d e r S u g a r

Slide 18

Slide 18 text

PIVOTING > > > d f . l o c [ 3 : 4 ] . T 3 4 c o u n t 6 2 o w n e r J a s o n J a s o n t y p e O a t m e a l R a i s i n S u g a r

Slide 19

Slide 19 text

GROUPING > > > d f . g r o u p b y ( ' o w n e r ' ) . s u m ( ) c o u n t o w n e r J a s o n 4 0 M a r v i n 2

Slide 20

Slide 20 text

> > > d f . g r o u p b y ( [ ' t y p e ' , ' o w n e r ' ] ) . s u m ( ) c o u n t t y p e o w n e r C h o c o l a t e C h i p J a s o n 1 2 G i n g e r M o l a s s e s J a s o n 8 O a t m e a l R a i s i n J a s o n 6 P e a n u t B u t t e r J a s o n 1 0 P o w d e r S u g a r J a s o n 2 S u g a r J a s o n 2 M a r v i n 2

Slide 21

Slide 21 text

RENAMING COLUMNS > > > g _ s u m = d f . g r o u p b y ( [ ' t y p e ' ] ) . s u m ( ) > > > g _ s u m . c o l u m n s = [ ' T o t a l ' ] T o t a l s u m C h o c o l a t e C h i p 1 2 G i n g e r M o l a s s e s 8 O a t m e a l R a i s i n 6 P e a n u t B u t t e r 1 0 P o w d e r S u g a r 2 S u g a r 4

Slide 22

Slide 22 text

PIVOT TABLES > > > p d . p i v o t _ t a b l e ( d f , v a l u e s = ' c o u n t ' , i n d e x = [ ' t y p e ' ] , c o l u m n s = [ ' o w n e r ' ] ) O w n e r J a s o n M a r v i n t y p e C h o c o l a t e C h i p 1 2 N a N G i n g e r M o l a s s e s 8 N a N O a t m e a l R a i s i n 6 N a N P e a n u t B u t t e r 1 0 N a N P o w d e r S u g a r 2 N a N S u g a r 2 2

Slide 23

Slide 23 text

JOINING > > > d f = p i v o t _ t . j o i n ( g _ s u m ) > > > d f . f i l l n a ( 0 , i n p l a c e = T r u e ) J a s o n M a r v i n T o t a l t y p e C h o c o l a t e C h i p 1 2 0 1 2 G i n g e r M o l a s s e s 8 0 8 O a t m e a l R a i s i n 6 0 6 P e a n u t B u t t e r 1 0 0 1 0 P o w d e r S u g a r 2 0 2 S u g a r 2 2 4

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

REAL WORLD PROBLEM

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

OUR DATASOURCE 2 0 1 4 - 0 6 - 2 4 1 7 : 2 0 : 2 3 . 0 1 4 6 4 2 , 0 , 3 4 , 1 0 2 , 0 , 0 , 0 , 6 0 2 0 1 4 - 0 6 - 2 4 1 7 : 2 5 : 0 1 . 1 7 6 7 7 2 , 0 , 3 2 , 1 7 4 , 0 , 0 , 0 , 1 3 3 2 0 1 4 - 0 6 - 2 4 1 7 : 3 0 : 0 1 . 3 7 0 2 3 5 , 0 , 2 8 , 5 7 , 0 , 0 , 0 , 7 5 2 0 1 4 - 0 7 - 2 1 1 4 : 3 5 : 0 1 . 7 9 7 8 3 8 , 0 , 3 9 , 7 4 , 0 , 0 , 0 , 3 0 , 0 , 2 6 2 , 2 , 3 , 3 , 0 2 0 1 4 - 0 7 - 2 1 1 4 : 4 0 : 0 2 . 0 0 0 4 3 4 , 0 , 5 4 , 1 4 3 , 0 , 0 , 0 , 4 4 , 0 , 4 9 9 , 3 , 9 , 9 , 0

Slide 28

Slide 28 text

READING FROM A CSV d f = p d . r e a d _ c s v ( ' r e s u l t s . c s v ' , h e a d e r = 0 , q u o t e c h a r = ' \ ' ' )

Slide 29

Slide 29 text

d a t e t i m e a b u s e _ p a s s t h r o u g h a n y _ a b u s e _ h a n d l e d . . . 0 2 0 1 4 - 0 6 - 2 4 1 7 : 2 0 : 2 3 . 0 1 4 6 4 2 0 3 4 . . .

Slide 30

Slide 30 text

SETTING THE DATETIME AS THE INDEX > > > d f [ ' d a t e t i m e ' ] = p a n d a s . t o _ d a t e t i m e ( d f . d a t e t i m e ) > > > d f . i n d e x = d f . d a t e t i m e > > > d e l d f [ ' d a t e t i m e ' ] a b u s e _ p a s s t h r o u g h a n y _ a b u s e _ h a n d l e d . . . d a t e t i m e . . . 2 0 1 4 - 0 6 - 2 4 1 7 : 2 0 : 2 3 . 0 1 4 6 4 2 0 3 4 . . . 2 0 1 4 - 0 6 - 2 4 1 7 : 2 5 : 0 1 . 1 7 6 7 7 2 0 3 2 . . .

Slide 31

Slide 31 text

TIME SLICING > > > d f [ ' 2 0 1 4 - 0 7 - 2 1 1 3 : 5 5 : 0 0 ' : ' 2 0 1 4 - 0 7 - 2 1 1 4 : 1 0 : 0 0 ' ] a b u s e _ p a s s t h r o u g h a n y _ a b u s e _ h a n d l e d . . . d a t e t i m e . . . 2 0 1 4 - 0 7 - 2 1 1 3 : 5 5 : 0 1 . 1 5 3 7 0 6 0 2 4 . . . 2 0 1 4 - 0 7 - 2 1 1 4 : 0 0 : 0 1 . 3 7 2 6 2 4 0 2 4 . . . 2 0 1 4 - 0 7 - 2 1 1 4 : 0 5 : 0 1 . 9 1 0 8 2 7 0 3 2 . . .

Slide 32

Slide 32 text

Handling Missing Data Points > > > d f . f i l l n a ( 0 , i n p l a c e = T r u e )

Slide 33

Slide 33 text

FUNCTIONS > > > d f . s u m ( ) a b u s e _ p a s s t h r o u g h 3 9 a n y _ a b u s e _ h a n d l e d 8 1 5 3 7 h a n d l e _ b p _ m e s s a g e _ h a n d l e d 2 7 1 6 8 9 h a n d l e _ b p _ m e s s a g e _ c o r r u p t _ h a n d l e d 0 e r r o r 0 f o r w a r d _ a l l _ u n h a n d l e d 0 o r i g i n a l _ m e s s a g e _ h a n d l e d 1 3 6 1 1 6 l i s t _ u n s u b s c r i b e _ o p t o u t 7 1 d e f a u l t _ h a n d l e r _ d r o p p e d 1 3 4 2 2 8 5 d e f a u l t _ u n h a n d l e d 2 9 7 8 d e f a u l t _ o p t _ o u t _ b o u n c e 2 2 0 4 4 d e f a u l t _ o p t _ o u t 2 3 1 3 2 d e f a u l t _ h a n d l e r _ p a t t e r n _ d r o p p e d 0 d t y p e : f l o a t 6 4

Slide 34

Slide 34 text

> > > d f . s u m ( ) . s u m ( ) 1 8 7 9 8 9 1 . 0

Slide 35

Slide 35 text

> > > d f . m e a n ( ) a b u s e _ p a s s t h r o u g h 0 . 0 0 9 6 7 3 a n y _ a b u s e _ h a n d l e d 2 0 . 2 2 2 4 7 0 h a n d l e _ b p _ m e s s a g e _ h a n d l e d 6 7 . 3 8 3 1 8 5 h a n d l e _ b p _ m e s s a g e _ c o r r u p t _ h a n d l e d 0 . 0 0 0 0 0 0 e r r o r 0 . 0 0 0 0 0 0 f o r w a r d _ a l l _ u n h a n d l e d 0 . 0 0 0 0 0 0 o r i g i n a l _ m e s s a g e _ h a n d l e d 3 3 . 7 5 8 9 2 9 l i s t _ u n s u b s c r i b e _ o p t o u t 0 . 0 1 7 6 0 9 d e f a u l t _ h a n d l e r _ d r o p p e d 3 3 2 . 9 0 7 9 8 6 d e f a u l t _ u n h a n d l e d 0 . 7 3 8 5 9 1 d e f a u l t _ o p t _ o u t _ b o u n c e 5 . 4 6 7 2 6 2 d e f a u l t _ o p t _ o u t 5 . 7 3 7 1 0 3 d e f a u l t _ h a n d l e r _ p a t t e r n _ d r o p p e d 0 . 0 0 0 0 0 0 d t y p e : f l o a t 6 4

Slide 36

Slide 36 text

> > > d f [ ' 2 0 1 4 - 0 7 - 2 1 1 3 : 5 5 : 0 0 ' : ' 2 0 1 4 - 0 7 - 2 1 1 4 : 1 0 : 0 0 ' ] . a p p l y ( n p . c u m s u m ) a b u s e _ p a s s t h r o u g h a n y _ a b u s e _ h a n d l e d . . . d a t e t i m e . . . 2 0 1 4 - 0 7 - 2 1 1 3 : 5 5 : 0 1 . 1 5 3 7 0 6 0 2 4 . . . 2 0 1 4 - 0 7 - 2 1 1 4 : 0 0 : 0 1 . 3 7 2 6 2 4 0 4 8 . . . 2 0 1 4 - 0 7 - 2 1 1 4 : 0 5 : 0 1 . 9 1 0 8 2 7 0 8 0 . . .

Slide 37

Slide 37 text

RESAMPLING > > > d _ d f = d f . r e s a m p l e ( ' 1 D ' , h o w = ' s u m ' ) a b u s e _ p a s s t h r o u g h a n y _ a b u s e _ h a n d l e d . . . d a t e t i m e . . . 2 0 1 4 - 0 7 - 0 7 0 3 1 7 8 . . . 2 0 1 4 - 0 7 - 0 8 1 6 5 3 6 . . . 2 0 1 4 - 0 7 - 0 9 2 6 8 5 7 . . .

Slide 38

Slide 38 text

SORTING > > > d _ d f . s o r t ( ' a n y _ a b u s e _ h a n d l e d ' , a s c e n d i n g = F a l s e ) a b u s e _ p a s s t h r o u g h a n y _ a b u s e _ h a n d l e d . . . d a t e t i m e . . . 2 0 1 4 - 0 7 - 1 5 2 1 7 6 6 4 . . . 2 0 1 4 - 0 7 - 1 7 5 7 5 4 8 . . . 2 0 1 4 - 0 7 - 1 0 0 7 1 0 6 . . . 2 0 1 4 - 0 7 - 1 1 1 0 6 9 4 2 . . .

Slide 39

Slide 39 text

DESCRIBE > > > d _ d f . d e s c r i b e ( ) a b u s e _ p a s s t h r o u g h a n y _ a b u s e _ h a n d l e d . . . c o u n t 1 5 . 0 0 0 0 0 1 5 . 0 0 0 0 0 0 . . . m e a n 2 . 6 0 0 0 0 5 4 3 5 . 8 0 0 0 0 0 . . . s t d 5 . 7 9 1 6 2 1 8 4 8 . 7 1 6 3 5 8 . . . m i n 0 . 0 0 0 0 0 2 1 7 4 . 0 0 0 0 0 0 . . . 2 5 % 0 . 0 0 0 0 0 3 8 1 0 . 0 0 0 0 0 0 . . . 5 0 % 0 . 0 0 0 0 0 6 1 9 1 . 0 0 0 0 0 0 . . . 7 5 % 1 . 5 0 0 0 0 6 8 9 9 . 5 0 0 0 0 0 . . . m a x 2 1 . 0 0 0 0 0 7 6 6 4 . 0 0 0 0 0 0 . . .

Slide 40

Slide 40 text

OUTPUT TO CSV > > > d _ d f . t o _ c s v ( p a t h _ o r _ b u f = ' o u t p u t . c s v ' )

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

ONE MORE THING... Vincent i m p o r t v i n c e n t

Slide 43

Slide 43 text

CHARTS c h a r t = v i n c e n t . S t a c k e d A r e a ( d _ d f ) c h a r t . l e g e n d ( t i t l e = ' L e g e n d ' ) c h a r t . c o l o r s ( b r e w = ' S e t 3 ' )

Slide 44

Slide 44 text

EXAMPLE STACKED AREA

Slide 45

Slide 45 text

EXAMPLE LINE

Slide 46

Slide 46 text

EXAMPLE PIE

Slide 47

Slide 47 text

QUESTIONS JASON A MYERS / @JASONAMYERS