HOWTO Make Your Future Data Science Team Love You

Sasha Laundy
November 24, 2014

My talk from PyData NYC 2014. For more info on the talk, go to http://blog.sashalaundy.com/talks/data-audit/

Much of the talk is not in these slides, so the video will make much more sense.



Transcript

  1. $ cat logs.csv
     uid, name, gender, join_date
     1, John Smith, M, 01/01/2001
     2, Sarah Stein, F, 02/02/2002
     3, Lee Jones, '', 03/03/2003
  2. $ cat logs.csv | csvcut -c 3 | sort | uniq -c
     598 F
     6254 M
     1 gender
     321 ''
  3. $ cat logs.csv | csvcut -c 3 | sort | uniq -c
     598 F
     6254 M
     1 gender
     321 ''
     90% male :(
  4. Census Income Data
     11201  $95,000
     10038  $35,000
     11456  $65,000

     3-1-1 Calls
     Noise    10014
     Rats     11218
     No heat  11201
  5. Census Income Data
     11201  $95,000
     10038  $35,000
     11456  $65,000

     3-1-1 Calls
     Noise    10014
     Rats     11218
     No heat  11201
  6. Joined table
     11201  $95,000  No heat
     11219  $75,000  Bagel theft
     10095  $35,000  Pigeon attack
  7. $ cat /tmp/data | histogram.py
     # NumSamples = 29; Max = 10.00; Min = 1.00
     # Mean = 4.379310; Variance = 5.131986; SD = 2.265389
     # each * represents a count of 1
     1.0000 - 1.9000 [ 1]: *
     1.9000 - 2.8000 [ 5]: *****
     2.8000 - 3.7000 [ 8]: ********
     3.7000 - 4.6000 [ 3]: ***
     4.6000 - 5.5000 [ 4]: ****
     5.5000 - 6.4000 [ 2]: **
     6.4000 - 7.3000 [ 3]: ***
     7.3000 - 8.2000 [ 1]: *
     8.2000 - 9.1000 [ 1]: *
     9.1000 - 10.0000 [ 1]: *
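
The column tally from slides 2 and 3 and the ZIP-code join from slides 4 to 6 can also be sketched in plain Python. This is a minimal illustration, not the code used in the talk: the function names `count_values` and `join_on_zip` are hypothetical, the inner join is an assumption about how the tables were combined, and the data is limited to the sample rows visible on the slides because the full `logs.csv` is not part of this transcript.

```python
import csv
import io
from collections import Counter

# Sample rows copied from slide 1; the full logs.csv is not available here.
LOGS_CSV = """uid, name, gender, join_date
1, John Smith, M, 01/01/2001
2, Sarah Stein, F, 02/02/2002
3, Lee Jones, '', 03/03/2003
"""

# Rows copied from slides 4 and 5.
CENSUS_INCOME = {"11201": "$95,000", "10038": "$35,000", "11456": "$65,000"}
CALLS_311 = [("Noise", "10014"), ("Rats", "11218"), ("No heat", "11201")]


def count_values(csv_text, column):
    """Tally each distinct value in one named column (hypothetical helper)."""
    # skipinitialspace drops the blank that follows each comma in the slide data.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return Counter(row[column] for row in reader)


def join_on_zip(income_by_zip, calls):
    """Inner-join calls to income on ZIP code; calls with no match are dropped."""
    return [
        (zip_code, income_by_zip[zip_code], complaint)
        for complaint, zip_code in calls
        if zip_code in income_by_zip
    ]


gender_counts = count_values(LOGS_CSV, "gender")
joined = join_on_zip(CENSUS_INCOME, CALLS_311)

print(gender_counts)  # the '' placeholder is tallied as its own value
print(joined)         # only ZIP 11201 appears in both sample tables
```

Unlike the shell pipeline, `csv.DictReader` consumes the header row, so there is no stray `1 gender` entry in the tally. With only the rows shown on slides 4 and 5, a single call survives the join, since 11201 is the only ZIP code present in both tables.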