HOWTO Make Your Future Data Science Team Love You

Sasha Laundy
February 19, 2015

My talk from Strata San Jose 2015.

You'll have a better time if you watch the video, which will be on my site sasha.io.

Transcript

  1. “It’s worth noting the obvious: without a reliable and complete data flow, a Hadoop cluster is little more than a very expensive and difficult-to-assemble space heater.” - Jay Kreps, I Heart Logs
  2. $_

  3. $ cat logs.csv
     uid, name, gender, join_date
     1, John Smith, M, 01/01/2001
     2, Sarah Stein, F, 02/02/2002
     3, Lee Jones, '', 03/03/2003
  4. $ cat logs.csv | csvcut -c 3 | sort | uniq -c
      598 F
     6254 M
        1 gender
      321 ''
  5. $ cat logs.csv | csvcut -c 3 | sort | uniq -c
      598 F
     6254 M
        1 gender
      321 ''
     90% male :(
  6. $ cat /tmp/data | histogram.py
     # NumSamples = 29; Max = 10.00; Min = 1.00
     # Mean = 4.379310; Variance = 5.131986; SD = 2.265389
     # each * represents a count of 1
     1.0000 -  1.9000 [ 1]: *
     1.9000 -  2.8000 [ 5]: *****
     2.8000 -  3.7000 [ 8]: ********
     3.7000 -  4.6000 [ 3]: ***
     4.6000 -  5.5000 [ 4]: ****
     5.5000 -  6.4000 [ 2]: **
     6.4000 -  7.3000 [ 3]: ***
     7.3000 -  8.2000 [ 1]: *
     8.2000 -  9.1000 [ 1]: *
     9.1000 - 10.0000 [ 1]: *
  7. Census Income Data
     11201  $95,000
     10038  $35,000
     11456  $65,000

     3-1-1 Calls
     Noise    10014
     Rats     11218
     No heat  11201
  8. Census Income Data
     11201  $95,000
     10038  $35,000
     11456  $65,000

     3-1-1 Calls
     Noise    10014
     Rats     11218
     No heat  11201
  9. Joined table
     11201  $95,000  No heat
     11219  $75,000  Bagel theft
     10095  $35,000  Pigeon attack
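
The join on the last slides can be sketched at the command line too. The snippet below is only an illustration, not the method used in the talk: the `join_on_zip` helper, the pipe-delimited file layout, and the file names are all assumptions made for the example. It uses only the census and 3-1-1 rows from slide 7, where ZIP code 11201 is the only value present in both tables, so an inner join yields a single row (the first row of the joined table).

```shell
#!/bin/sh
# Hypothetical helper (not from the talk): inner-join two pipe-delimited
# tables on ZIP code.
#   $1 = census file, rows of "zip|income"
#   $2 = calls file,  rows of "complaint|zip"
join_on_zip() {
    awk -F'|' '
        # First file: remember the income for each ZIP code.
        NR == FNR { income[$1] = $2; next }
        # Second file: print only calls whose ZIP code has census data.
        $2 in income { print $2 "|" income[$2] "|" $1 }
    ' "$1" "$2"
}

# Assumed layout: the slide 7 rows written to temporary files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' '11201|$95,000' '10038|$35,000' '11456|$65,000' > "$workdir/census.txt"
printf '%s\n' 'Noise|10014' 'Rats|11218' 'No heat|11201' > "$workdir/calls.txt"

join_on_zip "$workdir/census.txt" "$workdir/calls.txt"
# prints: 11201|$95,000|No heat
```

An inner join silently drops every call whose ZIP code has no census row, so it is worth comparing row counts before and after the join.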