Human Cloning: The Data Scientist Bottleneck Resolved

Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12

Data Science London

July 03, 2012

Transcript

  1. [Chart: exabytes of data by year, 2008 to 2017; axis from 0 to 20,000 (IDC/EMC report 2008)] Friday, 24 February 2012
  2. By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
  3. MAYBE WE CAN JUST.... • 1 statistician + 1 developer ≈ 1 data scientist?
  4. HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4 data scientists?
  5. WHAT CAN WE DO? • Train more new data scientists (not fast enough) • Cross-train people • Cobble together different skills in teams (see above)
  6. DOING MORE • simplify (fob the work off) • automate (fob even more work off) • choose/build the right tools • parallelise • iterate
  7. AUTOMATE / PARALLELISE [Diagram: lots of jobs at once (Job 1, Job 2, Job 3, Job 4); Hadoop magic]
  8. TOOLS • something that allows fast iteration, i.e. not Java • R, Ruby, Python
  9. ITERATE • try different things • improve what works • dump what doesn’t • constant improvement & learning → get faster
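
The "lots of jobs at once" idea from slide 7 can be sketched on a single machine, as a rough illustration only. The sketch assumes Python (one of the languages named on slide 8) and its standard-library `concurrent.futures` module. `run_job` and `run_all` are hypothetical names, and the arithmetic inside `run_job` is a placeholder rather than anything shown in the talk. It does not use Hadoop; it only shows the shape of submitting independent jobs together and collecting their results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    """Hypothetical stand-in for one independent analysis job."""
    # Placeholder work: a real job would load data and fit or score something.
    return job_id, sum(i * i for i in range(job_id * 1000))


def run_all(job_ids, max_workers=4):
    """Run every job concurrently and return the results keyed by job id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as job_ids.
        return dict(pool.map(run_job, job_ids))


if __name__ == "__main__":
    results = run_all([1, 2, 3, 4])
    for job_id, value in sorted(results.items()):
        print(f"Job {job_id}: {value}")
```

Threads keep the sketch short; for CPU-bound work a process pool or a cluster framework would likely be the better fit, since the jobs are independent either way.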