Building a Data Science Team

Christophe Bourguignat
November 12, 2014

Paris Machine Learning Meetup - 12/11/2014

Transcript

  1. Building a Data Science Team: What NLP and Machine Learning can teach us
    #MLParis - @chris_bour
    Paris Machine Learning Applications Group – November 12, 2014 – Paris, France
    Christophe Bourguignat – AXA Data Innovation Lab
  2. « A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review
  3. « A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review
    • The machine learning god
  4. « A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review
    • The software development hero
    • The machine learning god
  5. « A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review
    • The software development hero
    • The machine learning god
    • The IT infrastructure guru
  6. « A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review
    • The software development hero
    • The machine learning god
    • The data business rock star
    • The IT infrastructure guru
  7. Idea
    • Collect several profiles on LinkedIn
    • Manually label each profile with one of 4 (unordered!) classes:
      • The machine learning god
      • The software development hero
      • The IT infrastructure guru
      • The data business rock star
    • Do NLP (Natural Language Processing) and ML (Machine Learning)
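
The idea on this slide (labeled profile text, then NLP and ML) can be sketched as a bag-of-words classifier. The slides do not say which model or library was used, so the multinomial Naive Bayes below is only one plausible choice, and `PROFILES`, the label names, `tokenize`, `train` and `predict` are all hypothetical names invented for illustration. In practice a library such as scikit-learn would usually replace this hand-written code.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data: (profile text, manually assigned label).
# Real LinkedIn profiles would be much longer and far more numerous.
PROFILES = [
    ("machine learning statistical algorithm python", "ml_god"),
    ("python algorithm data science statistical", "ml_god"),
    ("software development java application", "dev_hero"),
    ("java software application development integration", "dev_hero"),
    ("architecture big data integration infrastructure", "it_guru"),
    ("infrastructure architecture big data solution", "it_guru"),
    ("management customer value project business", "business_star"),
    ("customer business management analytics value", "business_star"),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(samples, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing alpha."""
    word_counts = defaultdict(Counter)   # label -> token counts
    doc_counts = Counter()               # label -> number of profiles
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "docs": doc_counts, "vocab": vocab, "alpha": alpha}

def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n_docs in model["docs"].items():
        counts = model["words"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok in model["vocab"]:    # ignore tokens never seen in training
                score += math.log((counts[tok] + model["alpha"]) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(PROFILES)
prediction = predict(model, "statistical machine learning in python")
```

Because the four classes are unordered, the model treats them as plain categories and simply picks the most probable one for each new profile.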
  8. Machine Learning God, Software Development Hero, IT Infrastructure Guru, Data Business Rock Star

    Python Algorithm Software Data Analytics Data Science Management Customer Solution Project Development Architecture Machine Learning Statistical Big Data Value User Integration Java Application
  9. Lessons learned
    • With quite a simple model, it is possible to cluster the 4 profiles
    • Typical terms for each profile are aligned with our intuition
    • “IT architecture” only or “Business” only profiles are uncommon. They are often mixed with another profile
    • Some profiles are difficult to discriminate, as they are in-between (also in line with intuition):
      • IT architecture and software development
      • IT architecture and business
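
One way to surface "typical terms" per profile is to rank each class's words by how much more often they appear inside the class than outside it. The slides do not describe the scoring actually used, so the smoothed log-odds below is an assumption, and `PROFILES`, the label names and `typical_terms` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labeled snippets standing in for real profile text.
PROFILES = [
    ("python machine learning algorithm python statistical", "ml_god"),
    ("java software development java application", "dev_hero"),
    ("architecture infrastructure big data architecture", "it_guru"),
    ("business customer value management business", "business_star"),
]

def typical_terms(samples, top_k=2):
    """Rank each class's terms by smoothed log-odds against all other classes."""
    per_class = {}
    for text, label in samples:
        tokens = re.findall(r"[a-z]+", text.lower())
        per_class.setdefault(label, Counter()).update(tokens)
    overall = sum(per_class.values(), Counter())   # counts over all classes
    vocab_size = len(overall)
    total_all = sum(overall.values())
    result = {}
    for label, counts in per_class.items():
        total_in = sum(counts.values())
        total_out = total_all - total_in

        def log_odds(term):
            # Add-one smoothing keeps the ratio finite for class-only terms.
            p_in = (counts[term] + 1) / (total_in + vocab_size)
            p_out = (overall[term] - counts[term] + 1) / (total_out + vocab_size)
            return math.log(p_in / p_out)

        # Highest score first; ties are broken alphabetically.
        ranked = sorted(counts, key=lambda t: (-log_odds(t), t))
        result[label] = ranked[:top_k]
    return result

terms = typical_terms(PROFILES)
```

A term that also occurs in other classes gets a lower score than an equally frequent class-specific term, which fits the observation that in-between profiles are harder to tell apart.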
  10. Improvement suggestions
    • Use many more LinkedIn profiles to train the model!
    • Cross-validate to find the best model and its meta-parameters (hyperparameters)
    • Make an API to allow anybody to try it with their own profile
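
The cross-validation suggestion can be sketched as a k-fold loop wrapped in a small grid search. Everything here is an assumption made for illustration: the model (multinomial Naive Bayes), the tuned meta-parameter (the smoothing strength `alpha`), the toy `PROFILES` data, and the helper names `fit_naive_bayes`, `cross_val_accuracy` and `grid_search`.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data, ordered so that every fold contains each class once.
PROFILES = [
    ("machine learning statistical algorithm python", "ml_god"),
    ("python algorithm data science statistical", "ml_god"),
    ("software development java application", "dev_hero"),
    ("java software application development integration", "dev_hero"),
    ("architecture big data integration infrastructure", "it_guru"),
    ("infrastructure architecture big data solution", "it_guru"),
    ("management customer value project business", "business_star"),
    ("customer business management analytics value", "business_star"),
]

def fit_naive_bayes(samples, alpha):
    """Train a multinomial Naive Bayes model; return a predict(text) function."""
    words, docs = defaultdict(Counter), Counter()
    for text, label in samples:
        docs[label] += 1
        words[label].update(re.findall(r"[a-z]+", text.lower()))
    vocab = {w for counts in words.values() for w in counts}

    def predict(text):
        def score(label):
            denom = sum(words[label].values()) + alpha * len(vocab)
            s = math.log(docs[label] / len(samples))
            for tok in re.findall(r"[a-z]+", text.lower()):
                if tok in vocab:         # skip tokens unseen in training
                    s += math.log((words[label][tok] + alpha) / denom)
            return s
        return max(docs, key=score)
    return predict

def cross_val_accuracy(samples, alpha, k):
    """Mean accuracy over k folds; fold i holds out samples i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        held_out = samples[i::k]
        training = [s for j, s in enumerate(samples) if j % k != i]
        predict = fit_naive_bayes(training, alpha)
        hits = sum(predict(text) == label for text, label in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k

def grid_search(samples, alphas, k=2):
    """Return (best_alpha, scores); scores maps each alpha to its CV accuracy."""
    scores = {a: cross_val_accuracy(samples, a, k) for a in alphas}
    return max(scores, key=scores.get), scores

# alpha must be strictly positive, otherwise unseen words give log(0).
best_alpha, scores = grid_search(PROFILES, alphas=[0.1, 1.0, 10.0])
```

With only a handful of toy profiles the scores say little, which is in line with the first suggestion on this slide: the comparison between settings only becomes meaningful with many more labeled profiles.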