Slide 1

Slide 1 text

SO YOU WANT TO WORK IN DATA SCIENCE Giovanni Lanzani @gglanzani

Slide 2

Slide 2 text

WHO AM I • Imported from Italy ca. 2006 • Master & PhD in Theoretical Physics in Leiden (2006-2012) • Consultant Software Quality @ KPMG (2012-2013) • Data Whisperer @ GoDataDriven (2013-2016) • Chief Science Officer @ GoDataDriven (2016-…) • Father of 5

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

WHAT DO YOU WANT • Stimulating environment • Great team • Space to experiment and to grow • Develop yourself • Learn new things • Salary (?)

Slide 5

Slide 5 text

WHAT COMPANIES (THOUGHT THEY) WANTED • All the things big data • Predictive modeling & Advanced Analytics • Make moar money • Do all the cool things the others are doing

Slide 6

Slide 6 text

THREE KINDS OF COMPANIES • Heavy R&D department • Tech company, software driven, internet first (only) • All the others • This talk is mostly about the last group (the majority)

Slide 7

Slide 7 text

WHAT COMPANIES GOT • A lot of POCs • A lot of screenshots/presentations/dashboards on a laptop • Extra mouths to feed with no returns • Nice stories to tell to their network, about those screenshots and especially those dashboards • Headaches with data and infra even more scattered

Slide 8

Slide 8 text

BUT… • We got a data scientist working on trees, and forests • Neural networks • Deeply convoluted neural networks • Deep learning! • All the above, and more, is taught in popular MOOCs

Slide 9

Slide 9 text

WHAT DO COMPANIES ACTUALLY NEED • Put things into production • They don’t teach that in any data science MOOC (that I know of)

Slide 10

Slide 10 text

JOB MARKET 2016: US • Ask HN: What's the state of the job market in data science and machine learning? • https://news.ycombinator.com/item?id=13232883 • The supply-demand dynamics have changed a lot in the last couple years. • Two groups: people with work experience + strong software development skills, and those without • The first group is in higher demand than ever • The second group has gotten extremely crowded [from people] […] who have completed MOOCs or bootcamps • Supply keeps growing while demand is flat or shrinking • especially as executives get burned by “data scientists” who don't know how to help them build things of value

Slide 11

Slide 11 text

JOB MARKET 2016: US • The biggest differentiator I've seen is to be able to participate in actually building production quality systems vs being proficient enough in R or python to hack together a prototype on a very small dataset

Slide 12

Slide 12 text

JOB MARKET 2017: NL • I am seeing the same things happening • We (GoDataDriven) are definitely only interested in profiles from the first group (people who are already there, or who are getting there) • Many of our clients are in the same position

Slide 13

Slide 13 text

WHICH COMPANIES CAN REALLY DO APPLIED DATA SCIENCE These are the companies you should be aiming to work for! • A business case for true positives (TP) and true negatives (TN), and a known cost for false positives (FP) and false negatives (FN) • Data {insert something here} should be pro grade
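One way to make the business case for TP, TN, FP and FN concrete is to attach a monetary value to each outcome and sum over the confusion matrix. The sketch below is only an illustration: the class names, the counts and the per-outcome values are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Number of predictions falling in each cell of the confusion matrix."""
    tp: int
    tn: int
    fp: int
    fn: int


@dataclass(frozen=True)
class UnitValues:
    """Value of a single outcome in some currency; costs are negative."""
    tp: float
    tn: float
    fp: float
    fn: float


def net_value(counts: ConfusionCounts, values: UnitValues) -> float:
    """Total value of a model: each outcome count times its unit value."""
    return (
        counts.tp * values.tp
        + counts.tn * values.tn
        + counts.fp * values.fp
        + counts.fn * values.fn
    )


# Hypothetical numbers, chosen only to show the arithmetic.
counts = ConfusionCounts(tp=80, tn=900, fp=15, fn=5)
values = UnitValues(tp=50.0, tn=0.0, fp=-10.0, fn=-200.0)

print(net_value(counts, values))  # 2850.0
```

With a calculation like this, two models can be compared on what they are worth to the business rather than on accuracy alone.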

Slide 14

Slide 14 text

WHAT THESE COMPANIES EXPECT FROM YOU • Good software (code + non-functional requirements) • Monitor your models
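As a rough illustration of what monitoring a model can mean in practice, the sketch below tracks accuracy over a rolling window of recent predictions and flags when it drops below a baseline. The class name, the thresholds and the sample data are assumptions made for this example; real monitoring setups usually track more than accuracy.

```python
from collections import deque
from typing import Optional


class AccuracyMonitor:
    """Hypothetical rolling-accuracy check for a deployed classifier."""

    def __init__(self, baseline: float, tolerance: float, window: int) -> None:
        self.baseline = baseline    # accuracy measured at deployment time
        self.tolerance = tolerance  # how far accuracy may drop before alerting
        self._hits = deque(maxlen=window)  # keeps only the latest outcomes

    def record(self, predicted: int, actual: int) -> None:
        """Store whether one prediction matched the observed label."""
        self._hits.append(predicted == actual)

    def rolling_accuracy(self) -> Optional[float]:
        """Accuracy over the window, or None if nothing was recorded yet."""
        if not self._hits:
            return None
        return sum(self._hits) / len(self._hits)

    def is_degraded(self) -> bool:
        """True when rolling accuracy fell below baseline minus tolerance."""
        accuracy = self.rolling_accuracy()
        return accuracy is not None and accuracy < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline=0.9, tolerance=0.05, window=4)
for predicted, actual in [(1, 1), (0, 1), (1, 0), (1, 1)]:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy())  # 0.5
print(monitor.is_degraded())       # True
```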

Slide 15

Slide 15 text

GOOD SOFTWARE? • Testable (and tested) • Modular (otherwise you cannot test it) • DRY • Efficient • Performant • Maintainable (clear code!)
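A minimal sketch of "testable and modular": a small, pure function with one job, plus a test that pins down its behaviour, including the edge cases. The function and its test are hypothetical examples, not the code shown in the talk's demo.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1].

    Constant input maps to all zeros; empty input raises ValueError.
    """
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Avoid dividing by zero when every value is the same.
        return [0.0 for _ in values]
    span = highest - lowest
    return [(value - lowest) / span for value in values]


def test_min_max_scale() -> None:
    """Covers the normal case and both edge cases."""
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]
    assert min_max_scale([3, 3]) == [0.0, 0.0]
    try:
        min_max_scale([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


test_min_max_scale()
```

Because the function has no hidden state and no I/O, the test needs no setup, which is the practical payoff of keeping code modular.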

Slide 16

Slide 16 text

INTERMEZZO: BEST SOFTWARE • Demo here! Some real Python code!

Slide 17

Slide 17 text

I’VE TOLD YOU SO • https://blog.godatadriven.com/production-ready-ds • Many data scientists approach the problem at hand with a Kaggle-like mentality: delivering the best model in absolute terms, no matter what the practical implications are. • In reality it's not the best model that we implement, but the one that combines quality and practicality. • Netflix competition

Slide 18

Slide 18 text

BUT WAIT, I DON’T WANT TO DO THAT • There is a simple solution to this: companies should hire Machine Learning Engineers, who help data scientists productionize ML/DS • The role currently doesn’t (really) exist • That means (almost) nobody has them!

Slide 19

Slide 19 text

BUT WAIT, I DON’T WANT TO DO THAT • Intermezzo • What’s your experience? • You can tell me how (much) I’m wrong • Are you hiring/are you searching?

Slide 20

Slide 20 text

OK, HOW CAN I LEARN THAT • Code, code, code • If already in industry, start a project or start working with developers • Open source contributions • Mini sales pitch, you can leave for beer already! • GoDataDriven offers the data science accelerator program • 12 + 12 (or 5 + 5) modules, with lecture + hands-on day • The hands-on day helps you put into practice what you learn • Also productionizing your code! • /salespitch

Slide 21

Slide 21 text

OK, BUT I REALLY CAN’T LEARN THAT • Don’t despair • Find your niche (finance, biology, marine, energy, etc.) • Find a voice • Exploratory data analysis with storytelling • Convincing stakeholders is still one of the most important skills (not only in DS)

Slide 22

Slide 22 text

TECH COMPANY, SOFTWARE DRIVEN, INTERNET FIRST • Where there’s a mature, professional software engineering culture, data scientists don’t have to worry about all of the above (though still about a lot) • But they still have to code to be understood • No software engineer will save you from being lazy about good code • Booking.com, Bol.com, Marktplaats/eBay • Probably many others as well

Slide 23

Slide 23 text

HEAVY R&D • There the focus is much more on research • Productionizing is in the (distant) future • Domain expertise is (often) more important than ML/DS • People working there have the chops to learn ML/DS • (Hint: They don’t always do it properly)

Slide 24

Slide 24 text

FINAL NOTE • If you have no previous experience, you likely won’t be called every week with job offers • You might land a data science job, but instead of doing ML, you might end up doing ETL, writing glue code, writing SQL, etc. • Hang in there: learn the skills to get where you want • After the first 1-2 years of experience, it usually gets easier

Slide 25

Slide 25 text

QUESTIONS? • We’re hiring • Data scientists & Machine Learning Engineers! • [email protected]