John Fries - Pandas - PyDSLA meetup - Nov 2014

Slide 1

Slide 1 text

Data Munging with Pandas v0.01

Slide 37

Slide 37 text

Random Points
ZEN: pandas is really an extension of the Python language, or at least of its core data structures: list and dict.
ZEN: OrderedDict kind of sucks.
ZEN: Try using pandas as a SQL replacement if your dataset can fit into memory.
ZEN: Fundamentally, pandas likes to talk lists. If you can understand how pandas extends Python’s indexing methods to use lists, you are on your way to experiencing the zen of pandas.
ZEN: When we say pandas is built on numpy, consider that numpy primarily supports integer indexing, like a Python list does. pandas supports much broader datatypes for indexing (strings, datetimes, tuples, etc.).
START: IPython Notebook is awesome. IPython Notebook is viral.
DataFrame, Series: You can think of a DataFrame as a dict of Series, which all share the same index. However, in practice, I visualize a DataFrame as a table, and a Series as a list.
PLOT: Often the goal is to model and predict/explain. More often, for me, the goal is to visualize. I would even say that if you can’t visualize it, your chances of explaining it are pretty poor.
CLEAN: Sometimes you want to fix broken rows, but more often than you might think you should just drop the NaNs and outliers. Just check how many there are first!
CONFORM: Reindexing is confusing because you have to understand this notion of *index*, which took me a while to grok. It is not like a SQL index! You could think of reindexing and resampling as examples of conforming.
ROTATE: pivot() doesn’t like NaNs, so often you want to dropna() first.
SELECT: I’m not saying that .loc is always right or elegant, but if you are getting started, it is always there and it always works.
BIN: You could think of binning as a special case of grouping if you really wanted to.
JOIN: Use df.merge(). Don’t worry about df.join() until you understand df.merge(). merge probably should have been called join. df.merge() can be pretty confusing compared to SQL syntax, but it provides equivalent functionality.
MAP: map, apply, applymap (I don’t use applymap all that much because column types are different). Could map and apply have been called the same thing?
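To make the "pandas likes to talk lists" and SELECT points concrete, here is a minimal sketch of label-based selection with .loc. The fruit data and the column names are made up for illustration and are not from the talk.

```python
import pandas as pd

# Hypothetical data for illustration only.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "kiwi"])

# A dict takes one key at a time; a Series also accepts a list of labels
# and returns the matching values in the order requested.
subset = prices.loc[["kiwi", "apple"]]

df = pd.DataFrame(
    {"price": [1.5, 2.0, 0.75], "qty": [10, 0, 4]},
    index=["apple", "pear", "kiwi"],
)

# .loc takes a list of row labels and a list of column labels.
picked = df.loc[["apple", "kiwi"], ["qty"]]

# .loc also takes a boolean mask, which plays the role of a SQL WHERE clause.
in_stock = df.loc[df["qty"] > 0]
```

The index here holds strings rather than integer positions, which is the kind of broader index datatype the numpy comparison above refers to.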
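The CLEAN and ROTATE notes suggest a simple routine: count the missing values, drop them, then pivot. A rough sketch of that routine follows, using an invented table of sensor readings (the column names are assumptions).

```python
import numpy as np
import pandas as pd

# Hypothetical long-format readings; one value is missing.
readings = pd.DataFrame({
    "day": ["mon", "mon", "tue", "tue", "wed"],
    "sensor": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, np.nan, 4.0, 5.0],
})

# Check how many NaNs there are before dropping anything.
n_missing = int(readings["value"].isna().sum())

# Drop only the rows whose "value" is missing.
clean = readings.dropna(subset=["value"])

# Rotate from long to wide: one row per day, one column per sensor.
wide = clean.pivot(index="day", columns="sensor", values="value")
```

Note that the wide table can still contain NaN for day/sensor pairs that have no row in the cleaned data; dropping rows first does not fill those cells.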
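For the JOIN note, it may help to see df.merge() next to the SQL it corresponds to. This is a small sketch with made-up users and orders tables; the SQL in the comments is an approximate equivalent, not something shown in the talk.

```python
import pandas as pd

# Hypothetical tables for illustration only.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["ann", "bob", "cy"]})
orders = pd.DataFrame({"user_id": [1, 1, 3, 4], "total": [10, 20, 5, 7]})

# Roughly: SELECT * FROM users INNER JOIN orders USING (user_id)
inner = users.merge(orders, on="user_id", how="inner")

# Roughly: SELECT * FROM users LEFT JOIN orders USING (user_id)
# Users without orders are kept, with NaN in the order columns.
left = users.merge(orders, on="user_id", how="left")
```

The `how` argument selects the join type, and `on` names the key column, which covers the most common SQL joins.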
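The BIN note says binning can be seen as a special case of grouping. One way to read that, sketched here with invented ages and bin edges, is to let pd.cut() assign each row a bin label and then group by that label.

```python
import pandas as pd

# Hypothetical data and bin edges for illustration only.
people = pd.DataFrame({"age": [5, 17, 30, 42, 70], "spend": [1, 2, 3, 4, 5]})

# pd.cut maps each age to a labelled interval: (0, 18], (18, 65], (65, 120].
age_group = pd.cut(
    people["age"],
    bins=[0, 18, 65, 120],
    labels=["child", "adult", "senior"],
)

# Grouping by the bin labels is then an ordinary groupby.
totals = people.groupby(age_group, observed=False)["spend"].sum()
```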
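For the MAP note, the sketch below contrasts Series.map (element-wise over one column) with DataFrame.apply (a function over whole rows or columns). The data is made up. applymap is left out of the code because recent pandas releases deprecate that name in favor of DataFrame.map.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"name": ["ann", "bob"], "score": [0.5, 0.25]})

# Series.map: the function sees one element of one column at a time.
upper = df["name"].map(str.upper)

# DataFrame.apply with axis=1: the function sees one whole row at a time,
# so it can combine columns of different types.
label = df.apply(lambda row: f"{row['name']}:{row['score']:.0%}", axis=1)
```

Because the columns hold different types (strings and floats), a single element-wise function over the whole frame is rarely useful here, which matches the remark about applymap.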
