Slide 1

Slide 1 text

BIDS: Berkeley Institute for Data Science “a place for people like us” Fernando Perez @fperez_org

Slide 2

Slide 2 text

“People like us”? (a careers problem in academia) • Folks at the intersection of domain science, methods and software. • We actually care about our code – Not just about our results/publication count. • We often belong to the Department of Connective Tissue – Good luck getting tenure in that one! • The Big Data Brain Drain: Why Science is in Trouble. – Great blog post by Jake VanderPlas.

Slide 3

Slide 3 text

A ‘home’ for people like us? Credit: @ananelson of dexy fame.

Slide 4

Slide 4 text

Raise your hand if… Your dept. chair/dean/provost • Loves that you write lots of code and therefore fewer papers. • Loves that your papers have many authors from different departments and you’re lost in the middle. • Loves that the journals you’ve published in seem like a random collection of unrelated topics. • Encourages you to spend more time on mailing lists/GitHub/HipChat/IRC helping random strangers. • …

Slide 5

Slide 5 text

Catalyst: Moore/Sloan Initiative • 3 chosen out of 15 LOIs invited • 5 Yrs, $38MM • Announced at White House OSTP event Nov 2013

Slide 6

Slide 6 text

U Washington eScience: SciPy’14 Sponsors! • Sarah Stone • Bill Howe • Jake VanderPlas

Slide 7

Slide 7 text

NYU Center for Data Science: cds.nyu.edu

Slide 8

Slide 8 text

Moore/Sloan Initiative: Core goals
• Support meaningful collaborations between
– Methodology fields: Comp Sci, Stats, Applied Math
– Science domains
• Establish sustainable Data Science career paths
– A new generation of multi-disciplinary scientists
– A new generation of data scientists focused on tool development
• Build an ecosystem of tools and research practices
– Sustainable, reusable, extensible
– Effective as scientific research tools

Slide 9

Slide 9 text

Initial Data Science Faculty Group
• Joshua Bloom, Professor, Astronomy; Director, Center for Time Domain Informatics
• Henry Brady, Dean, Goldman School of Public Policy
• Cathryn Carson, Associate Dean, Social Sciences; Acting Director of Social Sciences Data Laboratory “D-Lab”
• David Culler, Chair, EECS
• Michael Franklin, Professor, EECS; Co-Director, AMP Lab
• Erik Mitchell, Associate University Librarian
• Faculty Lead/PI: Saul Perlmutter, Physics, Berkeley Center for Cosmological Physics
• Fernando Perez, Researcher, Henry H. Wheeler Jr. Brain Imaging Center
• Jasjeet Sekhon, Professor, Political Science and Statistics; Center for Causal Inference and Program Evaluation
• Jamie Sethian, Professor, Mathematics
• Kimmen Sjölander, Professor, Bioengineering, Plant and Microbial Biology
• Philip Stark, Chair, Statistics
• Ion Stoica, Professor, EECS; Co-Director, AMP Lab

Slide 10

Slide 10 text

Berkeley Institute for Data Science (BIDS) Relevance across the campus suggests need for central location that will serve as home for data science efforts Doe Library Enhancing strengths of • Simons Institute for the Theory of Computing • D-Lab (Barrows) • AMP Lab (EECS) • CITRIS • SDAV Institute (LBL) • Urban Analytics Lab • etc.

Slide 11

Slide 11 text

Doe Memorial Library Doe Memorial Library, at the heart of the UC Berkeley campus, will be the new home of the Berkeley Institute for Data Science (BIDS). The campus has set aside 5,000 sq ft on the ground floor, directly accessible from the building’s north entrance and opposite the historic Morrison Reading Room.

Slide 12

Slide 12 text

Exec. Director and Data Science Fellows Exec. Director: Kevin Koy • Berkeley Geospatial Innovation Facility (GIF) • NumPy, pandas, Django tools and REST APIs for GIS in research. Fellows affiliated with this community (15 total): • Katy Huff (SciPy) • Dav Clark (presented Tuesday) • Kyle Barbary (AstroPy) • Karthik Ram (Software Carpentry, rOpenSci) • Justin Kitzes (Software Carpentry)

Slide 13

Slide 13 text

Applied Math / Working Groups as Bridges The collaboration model across NYU-Berkeley-UW • Software Tools and Environments • Reproducible Research and Open Science • Education and Training • Ethnography and Evaluation • Career Paths • Space and Culture

Slide 14

Slide 14 text

Software Tools and Environments • Tools for open, reproducible data science: – Python, Julia, R, Scala, etc. • Our three universities: – Deep expertise we don’t always engage. – Space and resources. – Complement GitHub/OSS models.

Slide 15

Slide 15 text

Reproducible Research and Open Science • Incentive models – funding, publication • Build tools and practices. • Ask the right epistemological questions (cf. Lorena)

Slide 16

Slide 16 text

Education • What are the conceptual foundations of data science? • How do we teach them? • Are there new programs (undergrad/grad) or just courses?

Slide 17

Slide 17 text

Career Paths • Stable, rewarding, competitive. • New criteria for faculty tenure/promotion & scholarship? • New kinds of faculty FTEs? • New kinds of Research Software Engineer positions?

Slide 18

Slide 18 text

Ethnography and Evaluation • Study the process itself of doing data science. • This project seeks institutional change. Talk to Dav Clark and Seb Benthall.

Slide 19

Slide 19 text

Industry Outreach • Problems that cross the industry/academia lines • Data/resources are often in industry – Though see Facebook’s recent experience… • Not enough academic jobs: – Many of our students will go to industry – Good collaboration and recruitment opportunities.

Slide 20

Slide 20 text

Questions for discussion • Is “Data Science” actually “a thing” in science? • Is something like BIDS the right intellectual & institutional space for many folks from the SciPy community? • How can we (BIDS) better engage and support the SciPy community?

Slide 21

Slide 21 text

For more context, see this blog post: “An ambitious experiment in Data Science takes off: a biased, Open Source view from Berkeley”