The Berkeley Institute for Data Science: a place for people like us

Video of presentation: https://www.youtube.com/watch?v=q5yAy4WWTyU

BIDS was created as a novel environment for Data Science, in collaboration with the University of Washington in Seattle and NYU, with funding from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation.

Fernando Perez

July 09, 2014

Transcript

  1. BIDS: Berkeley Institute for
    Data Science
    “a place for people like us”
    Fernando Perez
    @fperez_org

  2. “People like us”?
    (a careers problem in academia)
    • Folks at the intersection of domain science, methods and
    software.
    • We actually care about our code
    – Not just about our results/publication count.
    • We often belong to the Department of Connective Tissue
    – Good luck getting tenure in that one!
    • The Big Data Brain Drain: Why Science is in Trouble.
    – Great blog post by Jake VanderPlas.

  3. A ‘home’ for people like us?
    Credit: @ananelson of dexy fame.

  4. Raise your hand if…
    Your dept. chair/dean/provost
    • Loves that you write lots of code and therefore fewer papers.
    • Loves that your papers have many authors from different departments and you’re lost in the middle.
    • Loves that the journals you’ve published in seem like a random collection of unrelated topics.
    • Encourages you to spend more time on mailing
    lists/github/hipchat/IRC helping random strangers.
    • …

  5. Catalyst: Moore/Sloan Initiative
    • 3 chosen out of 15 LOIs invited
    • 5 Yrs, $38MM
    • Announced at White House OSTP event, Nov 2013

  6. U Washington eScience: SciPy’14 Sponsors!
    • Sarah Stone
    • Bill Howe
    • Jake VanderPlas

  7. NYU Center for Data Science: cds.nyu.edu

  8. Moore/Sloan Initiative: Core goals
    • Support meaningful collaborations between
    – Methodology fields: Comp Sci, Stats, Applied Math
    – Science domains
    • Establish sustainable Data Science career paths
    – A new generation of multi-disciplinary scientists
    – A new generation of data scientists focused on tool development
    • Build an ecosystem of tools and research practices
    – Sustainable, reusable, extensible
    – Effective as scientific research tools

  9. Initial Data Science Faculty Group
    • Joshua Bloom, Professor, Astronomy; Director, Center for Time Domain Informatics
    • Henry Brady, Dean, Goldman School of Public Policy
    • Cathryn Carson, Associate Dean, Social Sciences; Acting Director of Social Sciences Data Laboratory “D-Lab”
    • David Culler, Chair, EECS
    • Michael Franklin, Professor, EECS; Co-Director, AMP Lab
    • Erik Mitchell, Associate University Librarian
    • Faculty Lead/PI: Saul Perlmutter, Physics, Berkeley Center for Cosmological Physics
    • Fernando Perez, Researcher, Henry H. Wheeler Jr. Brain Imaging Center
    • Jasjeet Sekhon, Professor, Political Science and Statistics; Center for Causal Inference and Program Evaluation
    • Jamie Sethian, Professor, Mathematics
    • Kimmen Sjölander, Professor, Bioengineering, Plant and Microbial Biology
    • Philip Stark, Chair, Statistics
    • Ion Stoica, Professor, EECS; Co-Director, AMP Lab

  10. Berkeley Institute for Data
    Science (BIDS)
    Relevance across the campus
    suggests need for central
    location that will serve as home
    for data science efforts
    Doe Library
    Enhancing strengths of
    • Simons Institute for the
    Theory of Computing
    • D-Lab (Barrows)
    • AMP Lab (EECS)
    • CITRIS
    • SDAV Institute (LBL)
    • Urban Analytics Lab
    • etc.

  11. Doe Memorial Library
    Doe Memorial Library at the heart of the UC Berkeley campus will be the new home of the Berkeley Institute for Data Science (BIDS).
    The campus has set aside 5,000 sq ft on the ground floor directly accessible from the building’s north entrance and opposite to the historical Morrison Reading Room.

  12. Exec. Director and Data Science Fellows
    Exec. Director: Kevin Koy
    • Berkeley Geospatial Innovation Facility (GIF)
    • NumPy, pandas, Django tools and REST APIs for GIS in research.
    Fellows affiliated with this community (15 total):
    • Katy Huff (SciPy)
    • Dav Clark (presented Tuesday)
    • Kyle Barbary (AstroPy)
    • Karthik Ram (Software Carpentry, ROpenSci)
    • Justin Kitzes (Software Carpentry)

  13. Working Groups as Bridges
    The collaboration model across NYU-Berkeley-UW
    • Software Tools and Environments
    • Reproducible Research and Open Science
    • Education and Training
    • Ethnography and Evaluation
    • Career Paths
    • Space and Culture

  14. Software Tools and Environments
    • Tools for open, reproducible data science:
    – Python, Julia, R, Scala, etc.
    • Our three universities:
    – Deep expertise we don’t always engage.
    – Space and resources.
    – Complement github/OSS models.

  15. Reproducible Research and Open
    Science
    • Incentive models
    – funding, publication
    • Build tools and practices.
    • Ask the right epistemological questions
    (cf. Lorena)

  16. Education
    • What are the conceptual foundations of
    data science?
    • How do we teach them?
    • Are there new programs
    (undergrad/grad) or just courses?

  17. Career Paths
    • Stable, rewarding, competitive.
    • New criteria for faculty
    tenure/promotion & scholarship?
    • New kinds of faculty FTEs?
    • New kinds of Research Software
    Engineer positions?

  18. Ethnography and Evaluation
    • Study the process itself of doing data
    science.
    • This project seeks institutional change.
    Talk to Dav Clark and Seb Benthall.

  19. Industry Outreach
    • Problems that cross the industry/academia lines
    • Data/resources are often in industry
    – Though see Facebook’s recent experience…
    • Not enough academic jobs:
    – Many of our students will go to industry
    – Good collaboration and recruitment opportunities.

  20. Questions for discussion
    • Is “Data Science” actually “a thing” in science?
    • Is something like BIDS the right intellectual &
    institutional space for many folks from the
    SciPy community?
    • How can we (BIDS) better engage and support
    the SciPy community?

  21. For more context, see this blog post
    An ambitious experiment in Data
    Science takes off: a biased, Open
    Source view from Berkeley
