Slide 1

Slide 1 text

Building a Data Science Team
What NLP and Machine Learning can teach us
#MLParis - @chris_bour
Paris Machine Learning Applications Group
November 12, 2014, Paris, France
Christophe Bourguignat, AXA Data Innovation Lab

Slide 2

Slide 2 text

Building a Data Science Team
Option 1: hire an army of Super Heroes

Slide 3

Slide 3 text

« A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review

Slide 4

Slide 4 text

« A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review The machine learning god

Slide 5

Slide 5 text

« A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review The software development hero The machine learning god

Slide 6

Slide 6 text

« A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review The software development hero The machine learning god The IT infrastructure guru

Slide 7

Slide 7 text

« A Great Data Science Team Is Like A Jazz Quartet » Harvard Business Review The software development hero The machine learning god The data business rock star The IT infrastructure guru

Slide 8

Slide 8 text

Idea
• Collect several profiles on LinkedIn
• Manually label each profile with one of 4 (unordered!) classes:
  • The machine learning god
  • The software development hero
  • The IT infrastructure guru
  • The data business rock star
• Do NLP (Natural Language Processing) and ML (Machine Learning)
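The idea above can be sketched as a small text-classification pipeline. This is only an illustration, not the notebook's actual code: the snippets in `TRAIN` are invented stand-ins for hand-labeled profile text, the short label names (`ml_god`, `dev_hero`, `it_guru`, `biz_star`) are hypothetical, and multinomial naive Bayes is an assumed choice of model. It uses only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy snippets standing in for hand-labeled profile text.
TRAIN = [
    ("machine learning python statistical algorithm kaggle", "ml_god"),
    ("deep learning algorithm statistical models python", "ml_god"),
    ("java software development application agile", "dev_hero"),
    ("software engineer java application testing", "dev_hero"),
    ("infrastructure architecture hadoop cluster integration", "it_guru"),
    ("big data architecture servers integration network", "it_guru"),
    ("management customer value project business", "biz_star"),
    ("business analytics customer solution management", "biz_star"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(samples, alpha=1.0):
    """Fit a multinomial naive Bayes model with additive smoothing alpha."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_docs.values())
    model = {}
    for label in class_docs:
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(class_docs[label] / total_docs),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / total)
                for w in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    tokens = tokenize(text)
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for token in tokens:
            # Words never seen during training contribute nothing.
            score += params["log_likelihood"].get(token, 0.0)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
prediction = predict(
    model, "python developer passionate about machine learning algorithm"
)
print(prediction)
```

With real data, each `TRAIN` entry would hold the text of one collected profile together with its manually assigned class.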

Slide 9

Slide 9 text

IPython Notebook
Download: https://github.com/christophebourguignat

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Machine Learning God Software Development Hero IT Infrastructure Guru Data Business Rock Star Python Algorithm Software Data Analytics Data Science Management Customer Solution Project Development Architecture Machine Learning Statistical Big Data Value User Integration Java Application

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Lessons learned
• With a fairly simple model, it’s possible to cluster the 4 profiles
• Typical terms for each profile are aligned with our intuition
• “IT architecture” only or “Business” only profiles are uncommon. They are often mixed with another profile
• Some profiles are difficult to discriminate, as they are in-between (also in line with intuition):
  • IT architecture and software development
  • IT architecture and business
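One way to surface the "typical terms" of each profile is to rank words by how much more frequent they are inside a class than outside it. The sketch below is an assumption about how this could be done, not the method used in the talk: the `PROFILES` snippets are invented, the label names are hypothetical, and the smoothed log-ratio score is an illustrative choice.

```python
import math
import re
from collections import Counter

# Invented toy snippets; real input would be hand-labeled profile text.
PROFILES = {
    "ml_god": [
        "machine learning python algorithm data",
        "statistical learning python data science",
    ],
    "dev_hero": [
        "java software development data application",
        "software development java testing",
    ],
    "it_guru": [
        "architecture integration big data cluster",
        "infrastructure architecture integration servers",
    ],
    "biz_star": [
        "management customer value data project",
        "customer management business solution",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def class_term_counts(profiles):
    """Count term occurrences per class over all of its documents."""
    return {
        label: Counter(tok for doc in docs for tok in tokenize(doc))
        for label, docs in profiles.items()
    }


def distinctive_terms(counts, label, top_n=3, alpha=1.0):
    """Rank terms by the smoothed log-ratio of in-class to out-of-class frequency."""
    vocab = set().union(*counts.values())
    inside = counts[label]
    outside = Counter()
    for other, other_counts in counts.items():
        if other != label:
            outside.update(other_counts)
    n_in = sum(inside.values()) + alpha * len(vocab)
    n_out = sum(outside.values()) + alpha * len(vocab)
    scores = {
        term: math.log((inside[term] + alpha) / n_in)
        - math.log((outside[term] + alpha) / n_out)
        for term in vocab
    }
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


counts = class_term_counts(PROFILES)
for label in PROFILES:
    print(label, distinctive_terms(counts, label))
```

A word such as "data", which appears in every class of this toy set, scores low everywhere, which mirrors the observation that shared vocabulary makes in-between profiles harder to discriminate.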

Slide 16

Slide 16 text

Improvement suggestions
• Use many more LinkedIn profiles to train the model!
• Cross-validate to find the best model and its hyperparameters
• Make an API to allow anybody to try with their own profile
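The cross-validation suggestion could look roughly like the following. It is a minimal sketch under stated assumptions: the `DATA` samples are invented, the classifier is a compact naive Bayes chosen for illustration, and the only parameter tuned is its smoothing value `alpha`. The samples are ordered in groups of three per class so that each of the three folds holds out one profile of every class.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy data, three samples per class, grouped by class.
DATA = [
    ("machine learning python algorithm", "ml_god"),
    ("statistical learning python models", "ml_god"),
    ("machine learning statistical algorithm", "ml_god"),
    ("java software development application", "dev_hero"),
    ("software development java testing", "dev_hero"),
    ("software application java agile", "dev_hero"),
    ("architecture integration cluster servers", "it_guru"),
    ("infrastructure architecture integration network", "it_guru"),
    ("architecture servers integration hadoop", "it_guru"),
    ("management customer value project", "biz_star"),
    ("customer management business solution", "biz_star"),
    ("business customer management value", "biz_star"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit(samples, alpha):
    """Collect per-class word counts; alpha must be > 0 to avoid log(0)."""
    counts = defaultdict(Counter)
    docs = Counter()
    for text, label in samples:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, docs, vocab, alpha


def predict(model, text):
    """Return the class with the highest naive Bayes log-score."""
    counts, docs, vocab, alpha = model
    n_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(counts):
        total = sum(counts[label].values()) + alpha * len(vocab)
        score = math.log(docs[label] / n_docs)
        for tok in tokenize(text):
            if tok in vocab:  # skip words unseen in the training folds
                score += math.log((counts[label][tok] + alpha) / total)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(data, alpha, k=3):
    """Mean accuracy over k folds; fold i holds out samples i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test = data[i::k]
        train = [s for j, s in enumerate(data) if j % k != i]
        model = fit(train, alpha)
        correct = sum(predict(model, text) == label for text, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


def grid_search(data, alphas, k=3):
    """Evaluate each candidate alpha and return the best one with all scores."""
    scores = {a: cross_val_accuracy(data, a, k) for a in alphas}
    best = max(scores, key=scores.get)
    return best, scores


best_alpha, scores = grid_search(DATA, alphas=[0.1, 1.0, 10.0])
print(best_alpha, scores)
```

On a dataset this small the scores say little; the point is the structure, where every candidate setting is judged only on profiles it was not trained on.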

Slide 17

Slide 17 text

The Sexiest Job Of The 21st Century

Slide 18

Slide 18 text

Twitter: @chris_bour
Blog: https://medium.com/@chris_bour
Notebooks: https://github.com/christophebourguignat
Decks: https://speakerdeck.com/kriss
THANK YOU