Am I a data scientist?

Am I a data scientist? Alyssa Frazee, Stripe @acfrazee

Where I’m coming from
• Math undergrad
• Biostatistics PhD
• “Machine Learning Engineer” today
• Recurse Center (née Hacker School)
2010

Am I a data scientist? What do I really mean by this question?

• Could I get a job offer with a title of “data scientist”?
  → sometimes implicitly industry
  → and sometimes specifically tech
• Am I preparing my students to be able to get job offers with a title of “data scientist”?

What’s “data science”?

data skills spectrum
• theoretical statistics: understanding quantitative data; output: numerical results
• software engineering: building a product; output: usable software
• data science (placed on the same spectrum)

Am I a statistician?

points for:
• Am in a grad program called [bio]statistics
• Know things about martingales and the delta method
• Can explain what a p-value is and interpret linear regression coefficients

points against:
• Haven’t proved a theorem since 2011
• Spend more time writing bash scripts than inventing estimators
• No publications in statistics journals

Or am I a data scientist?

points for:
• Can program in more than one language
• Actively use git & GitHub
• Have written R packages and reproducible reports
• Once made a web app and also a D3.js graph

points against:
• Not working in industry
• Have never written a SQL query more complicated than select * from table
• Understanding of Hadoop, Spark, and AWS is vague at best
• Have never written production code

Idea! I will listen to what experts in our field say!
• Camp #1: Data science is just a rebranding of applied statistics.
• Camp #2: Statistics and data science are overlapping. Neither is a subset of the other.
• Camp #3: Statistics is irrelevant to data science.

First: do I want to be a data scientist?

Second: Does it matter?

Am I on the job market? Am I hiring?

If you decide it matters: some distinguishing features

Intentionality about programming

Spending time thinking primarily about:
• code efficiency
• version control
• code quality (cleanliness, modularity)
• documentation / usability
• unit testing
• systematic debugging
• giving and receiving code review
• and other principles of software engineering

Interest in schleppy-but-practical projects
• figuring out how to get the data you need
• combining existing tools/methods in new ways
• finding the simplest solution that works in practice

Focus on concrete decision-making
• less about inference and parameter estimation, more about what action should be taken

• Camp #1: Data science is just a rebranding of applied statistics.
• Camp #2: Statistics and data science are overlapping. Neither is a subset of the other.
• Camp #3: Statistics is irrelevant to data science.

Perspective from the other side
Camp #1: Data science is just a rebranding of applied statistics.
• Intentionality about programming
• The day-to-day work is different!

Perspective from the other side

Last month I:
• wrote Ruby, Scala, CoffeeScript, and Python
• fought with Maven
• backfilled some busted tables in our databases
• investigated the mystery of why some of our cluster boxes are overworked
• learned how to be on call (so I can fix some of Stripe if it breaks at 3am)
• helped teach a SQL class
• and did some statistics

Perspective from the other side
Camp #3: Statistics is irrelevant to data science.

Perspective from the other side
Camp #2: Statistics and data science are overlapping. Neither is a subset of the other.

About that identity crisis:
• Program intentionally and be a data scientist, if you want!
• Or don’t! Statistics is hugely important and relevant in its own right!

Further reading:
• http://andrewgelman.com/2013/11/14/statistics-least-important-part-data-science/
• http://bulletin.imstat.org/2014/09/data-science-how-is-it-different-to-statistics%E2%80%89/
• https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
• http://datascopeanalytics.com/blog/what-is-a-data-scientist/
