
Privacy-Enhancing Data Science (SSI Fellowship 2022)

Presentation for my Software Sustainability Institute (SSI) Fellowship Application to foster the adoption of Privacy-Preserving Data Science tools and methods.

Valerio Maggio

January 13, 2022

Transcript

  1. Software Sustainability Institute
    Fellowship Programme 2022
    [email protected]
    @leriomaggio
    Valerio Maggio


  2. A little bit about myself
    Hi, I am Valerio
    My pronouns are he/him
    And I am very pleased to meet you! ☺
    I am 🇮🇹
    Now in 🇬🇧
    Credits: @Nathan Riley, “Clifton Suspension Bridge, Bristol, United Kingdom”
    Published on August 14, 2017 - Source: https://unsplash.com/photos/iOMkcADNoq8


  3. A bit about my professional career
    Undergraduate to PhD
    101011100110
    ML4SE: Machine Learning for Software Engineering
    Code Identifiers Processing
    drawXORRect => draw|XOR|Rectangle
    Identification of Code Siblings
    Research Associate - Fondazione Bruno Kessler (FBK) Trento, Italy
    Reproducible genomics: DNA-Seq to enhance research in precision medicine
    DAP (Data Analysis Pipelines)
    [Figure: data analysis pipeline diagram. Labels: Data splitting; Internal training set; Internal validation set; Training data; Classifier tuning; Ranked biomarkers; Classification model; Selected biomarkers; Best model; Validation data; Prediction; Predicted labels; Performance evaluation; Repeat 10 times; 5-fold CV; Random labels sanity check]
    gitlab.fbk.eu/mpba
    /phylogenetic-cnn
    /dap
    /dapper
    AI for Healthcare Grant
    graphics
    Genomics
    Histology
    Cloud RSE
    kubeflow-kale.github.io
    KubeCon 2021 - Keynote


  4. Teaching Experience
    github.com/leriomaggio/
    bristol.ac.uk/golding/get-involved/data-week-online-2020/
    Data Week Online 2020


  5. Open Source Community
    Community built on principles of diversity & inclusion
    https://speakerdeck.com/leriomaggio/


  6. Research Activity
    Senior Research Associate, Population Health Science, Univ. Bristol
    Source: “UK Birth Cohorts as a Platform for Ground Truth in Mental Health Data Science”, O. Davis / C. Haworth, ATI Fellowship
    Platform to enable ML algorithms for Mental-Health Data Science in UK birth cohort studies
    Privacy-Preserving Machine Learning
    Awarded by JGI Seed-Corn Funding 2021
    jeangoldinginstitute.blogs.bristol.ac.uk/2021/01/07/seed-corn-funding-winner-announcement/
    Member of the Writing/Doc Team & Technical Mentor @ Private AI Series
    bristol.ac.uk/alspac/
    PPML


  7. SSI Fellowship Plans


  8. SSI Fellowship Plans
    Why I am applying to SSI Programme ’22
    1. Shared interest in Sustainable Research Software Principles and Reproducible Science Practice
    • Being an SSI Fellow will definitely support me in disseminating these principles among researchers at the University
    2. Join a community of peers with whom I wish to collaborate and exchange ideas, and from whom I wish to learn.


  9. SSI Fellowship Plans
    What I would like to do
    • Privacy-Preserving Machine Learning (PPML) technologies have huge potential to be the Data Science paradigm of the future
    • Joint effort of Open Source & ML & Security Communities
    • I wish to disseminate knowledge about these new methods and technologies among researchers
    • Focus on Reproducibility of PPML workflows


  10. SSI Fellowship Plans
    • Develop a workshop on Reproducible PPML
    • Increase visibility by writing blog posts and short tutorials
    • Eventually, submit the material as a proposal for a new Data Carpentry curriculum
    • Run at least two Data Carpentry-style workshops on PPML
    • Pay for hosting and cloud computing to host and run teaching materials
    • (Ideally) Have funds for some travel costs & catering for attendees
    • (More realistically) Purchase professional recording equipment (e.g. webcam) and host the bootcamp remotely (e.g. gather.town)


  11. Thank you very much for your kind attention
    Valerio Maggio
    [email protected]
    @leriomaggio
