2015, Seattle, WA
Jennifer (Jenny) Bryan
Dept. of Statistics & Michael Smith Laboratories, UBC
[email protected]
@JennyBryan
http://stat545-ubc.github.io
@STAT545
http://www.stat.ubc.ca/~jenny/
@jennybc
http://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/
"in a wide array of academic fields, the ability to effectively process data is superseding other more classical modes of research"
least)
Statistics for High Dimensional Biology grad course at UBC since 2001, w/ R. Gottardo, P. Pavlidis, G. Cohen-Freue, S. Mostafavi
Software Carpentry, Data Carpentry, Reproducible Science since 2012
• 9 courses (free or $49 signature track)
• 1 capstone project course w/ industry partnership
• Total signature track cost (modular): $490
• Each course is four weeks
• Every course runs every month
• Quizzes, in-video quizzes, programming assignments and peer assessment projects
• All content open source with permissive license on GitHub
Johns Hopkins DSS via Roger Peng
Science: The End of Statistics? by Larry Wasserman
https://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/
Data science: how is it different to statistics? by Hadley Wickham
http://bulletin.imstat.org/2014/09/data-science-how-is-it-different-to-statistics%E2%80%89/
Data Science, Big Data and Statistics: can we all live together? by Terry Speed
http://www.chalmers.se/en/areas-of-advance/ict/calendar/Pages/Terry-Speed.aspx
data analysis supported by mathematical statistics
- a broader viewpoint – based on an inclusive concept of learning from data
The latter course presents severe challenges as well as exciting opportunities. The former risks seeing statistics become increasingly marginal.
Chambers, 1993
Representation and Transformation
- Computing with Data
- Data Modeling
- Data Visualization and Presentation
- Science about Data Science
Donoho, 2015
Full recognition of the scope of GDS would require … major shifts in teaching.
to …
… scrape data off the web
… get data from an API
… parse JSON or XML
utter defeat by date times
text encoding fiascos
ineptitude with regular expressions
R scripts that consume infinite time and RAM
software installation gong shows
to develop proficiency
“simple” descriptive stats
exploration through visualization
tame data from the wild, including the web + APIs
readiness for open science and automation
create an R package
alpha to omega: raw data to a web page or app
STAT 545 now
about all this stuff living outside the regular academic envelope
Do we signal it isn’t that important?
What are the career implications for those who embrace it?
Are we in denial about the need to make room for this in our regular programs?