Deep Learning for Health and Life Sciences with PyTorch


Machine Learning (ML) continues to grow in importance for biological and medical research. Across these diverse fields, advances in dataset size and availability, ML algorithm capability, and computational power are transforming modern science. Deep Learning (DL) has been instrumental to this progress, promising to radically transform human wellness and healthcare. One of the striking advantages of DL over classical ML is its natural ability to integrate heterogeneous datasets and multiple sources of information, and to model arbitrarily complex relationships.

In this workshop, some of the most recent state-of-the-art DL solutions for Biomedicine and Computational Biology will be presented.
The PyTorch Deep Learning framework will be used, along with the fully fledged Python data science ecosystem (e.g. pandas, numpy, scikit-learn).

The tutorial is intended for researchers interested in exploring the latest ML/DL solutions for the Health and Life Sciences, and for practitioners who want to learn more about the PyTorch framework.

Proficiency with the main structures of the Python language is required; basic knowledge of statistical learning and computational biology is ideal, but not compulsory for the tutorial.

The supplementary code material is available at:
https://github.com/leriomaggio/deep-learning-health-life-sciences


Valerio Maggio

June 19, 2020

Transcript

  1. None
  2. The Jean Golding Institute • A central hub for data science and data-intensive research • One of 5 University of Bristol research institutes • Connects multidisciplinary experts across the University and beyond • Events, training, funding, Ask JGI, The Alan Turing Institute. Our priorities: 1. Societal challenges 2. Data visualisation 3. Reproducibility & data governance 4. Fundamental research
  3. Date | Event | Speaker
    Monday 15 June | Data science and COVID-19 & Data Week Introduction | Kate Robson Brown, JGI Director
    Monday 15 June | Intermediate Python | Advanced Computing Research Centre
    Tuesday 16 June | Talk: Working at and with The Turing Institute: experiences as a Fellow | Jon Crowcroft, Turing Fellow & University of Cambridge
    Tuesday 16 June | Talk: Increasing engagement with data | Michael Green, Luna 9
    Tuesday 16 June | Introduction to data analysis in Python | Advanced Computing Research Centre
    Wednesday 17 June | Do you want to be a data Rockstar? | Luke Stoughton, The Information Lab
    Wednesday 17 June | Applied data analysis in Python | Advanced Computing Research Centre
    Thursday 18 June | Talk: New data on COVID-19 is undermined by old statistical problems | Gibran Hemani, University of Bristol
    Thursday 18 June | Managing sensitive research data: from planning to sharing | Library Research Services
    Thursday 18 June | Introduction to deep learning | Advanced Computing Research Centre
    Friday 19 June | Deep Learning for Health and Life Sciences | Valerio Maggio, University of Bristol
    Friday 19 June | Tour of the Tidyverse | Max Kronborg, Mango Solutions
    Friday 19 June | Best practices in software engineering | Advanced Computing Research Centre
  4. Share your participation #Dataweekonline2020 Keep in touch @JGIBristol jgi-admin@bristol.ac.uk bristol.ac.uk/golding

  5. Deep Learning for Health and Life Sciences with PyTorch

  6. Me: Who? Background in CS • PhD in Machine Learning for Software Engineering • Research: ML/DL for BioMedicine
  7. Machine Learning “Machine learning is the science (and art) of programming computers so they can learn from data” (Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow; source: bit.ly/ml-simple-definition). “(ML) focuses on teaching computers how to learn without the need to be programmed for specific tasks” (S. Pal & A. Gulli, Deep Learning with Keras)
  8. Machine Learning “Machine learning teaches machines how to carry out tasks by themselves. It is that simple. The complexity comes with the details.” (Luis Pedro Coelho, Building Machine Learning Systems with Python) — and that’s probably one of the reasons why you’re here :)
  9. The (Machine) Learning is about DATA. Data are one of the most important parts of an ML solution. Importance: Data >> Model? Learning by examples. Data Preparation is crucial!
  10. BioMedicine: another data case? Contemporary Life Science is about data: recent advances in sequencing techs and instruments (e.g. “bio-images”); huge datasets generated at an incredible pace; from human observation to data analysis; cheminformatics (drug discovery). Research Impact → Social and Human Impact
  11. Why Deep Learning, btw? A subset of ML with a very specific model: (Deep?) Neural Networks. State of the art. Theory from the ’50s/’80s; hw acceleration to train; (~new) learning structure + composability (2018/20)
  12. What about Deep Learning

  13. Deep Learning A multi-layer feed-forward neural network that starts with a fully connected input layer, followed by multiple hidden layers of non-linear transformations
  14. More details…

  15. More details… ReLU | sigmoid | tanh

  16. More details… Repeat for each layer…

  17. More details… Image Classification Task

  18. More details… Summary: A Neural Network is built from layers, each of which is:
    • a matrix multiplication,
    • then add a bias,
    • then apply a non-linearity.
    Learn values for the parameters W and b (for each layer) using Back-Propagation.
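The summary above can be written out directly in PyTorch; this is a small sketch, with purely illustrative sizes (4 input features, 3 hidden units) that are not taken from the slides:

```python
import torch

# One "layer" of a feed-forward network, written out explicitly
x = torch.randn(1, 4)          # a single input observation
W = torch.randn(4, 3)          # weight matrix (learned via back-propagation)
b = torch.zeros(3)             # bias vector (also learned)

h = torch.relu(x @ W + b)      # matrix multiply, add bias, apply non-linearity

# The same layer using PyTorch's built-in module
layer = torch.nn.Linear(4, 3)
h2 = torch.relu(layer(x))
```

Both `h` and `h2` have shape `(1, 3)`; `nn.Linear` simply packages the `W` and `b` parameters for you.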
  19. Machine Learning for dummies (a.k.a. ML explained to computer scientists)

    Note: I *am* a computer scientist + Matrix Multiplication Random Number Generation Machine Learning = ( ) t≅2k Deep
  20. Ml / Dl basics in a NutShell

  21. Supervised learning [diagram: (raw) data → features + labels → ML/DL Model Training (with supervision) → Trained Model; (unseen) data → Test → Predictions]
  22. UnSupervised learning [diagram: (raw) data → features → ML/DL Model Training → Trained Model; (unseen) data → Test → Similarities/likelihood]
  23. Deep Supervised learning [diagram: (raw) data + labels → DL Model Training (with supervision; features are learned by the model) → Trained Model; (unseen) data → Test → Predictions]
  24. Supervised Training Loop [diagram: (raw) data + labels → Model (parameters) → predictions → loss, computed by the Loss function]
  25. Supervised Training Loop breakdown…
    • (raw) Data, a.k.a. Observations / Input: items about which we want to predict something. We usually denote an observation by x.
    • Labels, a.k.a. Targets (i.e. Ground Truth): labels corresponding to observations; these are usually the things being predicted. Following standard ML/DL notation, we use y to refer to these.
    • Model f(x) = ŷ: a mathematical expression or function that takes an observation x and predicts the value of its target label.
    • Predictions, a.k.a. Estimates: values of the targets generated by the model, usually referred to as ŷ.
    • Parameters, a.k.a. Weights (in DL terminology): the parameters of the model. We refer to them using w.
    • Loss Function L(y, ŷ): a function that compares how far off a prediction is from its target for observations in the training data, assigning a scalar real value called the loss. The lower the loss, the better the model is predicting.
    Source: D. Rao et al., Natural Language Processing with PyTorch, O’Reilly 2019
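The notation above maps onto a minimal supervised loop. This is a hedged sketch on toy data (learning y = 2x with plain gradient descent), not the workshop's actual code:

```python
import torch

# x = observations, y = targets, w = parameters, y_hat = f(x), L = loss
x = torch.linspace(0, 1, 8).unsqueeze(1)   # observations (8 examples)
y = 2.0 * x                                # ground-truth labels: y = 2x

w = torch.zeros(1, requires_grad=True)     # the model's single parameter

for step in range(200):
    y_hat = x * w                          # predictions f(x)
    loss = torch.mean((y - y_hat) ** 2)    # scalar loss L(y, y_hat)
    loss.backward()                        # gradients of L w.r.t. w
    with torch.no_grad():
        w -= 0.5 * w.grad                  # gradient-descent update
        w.grad.zero_()                     # reset gradient for the next step

# after training, w has converged to (approximately) 2.0
```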
  26. Python has its say: Machine Learning, Deep Learning. “There should be one-- and preferably only one --obvious way to do it” The Zen of Python
  27. Multiple Frameworks?

  28. If someone tells you …

  29. Deep Learning Frameworks: Static Graph vs Dynamic Graph [diagram: the computational graph of a Linear (or Dense) layer, σ(xᵀW + b), built from *, + and σ nodes; with a dynamic graph the network fc1…fc5 is rebuilt on the fly for epoch 1, batch 1]
  30. Deep Learning Frameworks: Static Graph vs Dynamic Graph [diagram: the same network rebuilt for epoch 1, batch 2]
  31. Deep Learning Frameworks: Static Graph vs Dynamic Graph [diagram: backwards and gradients calculation; a static graph compiles Backprop ahead of time, a dynamic graph records operations as they run (Autograd Record)]
  32. Deep Learning Frameworks: Static Graph vs Dynamic Graph [diagram: Autograd records the forward operations and replays them backwards (Record & Replay Backprop)]
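The record-and-replay behaviour of a dynamic graph can be seen directly with PyTorch's autograd, in a few lines:

```python
import torch

# Autograd records operations as they run (the forward pass is ordinary
# Python), then replays the record backwards to compute gradients --
# no static graph is compiled ahead of time.
x = torch.tensor(3.0, requires_grad=True)

y = x ** 2 + 2 * x        # forward pass: the graph is recorded on the fly
y.backward()              # replay backwards (back-propagation)

print(x.grad)             # dy/dx = 2x + 2 = 8 at x = 3
```

Because the graph is rebuilt on every forward pass, ordinary Python control flow (if/for) can change the network's structure from one batch to the next.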
  33. A quick rundown / review / summary of what you should already know

  34. Tensors, NumPy, Devices NumPy-like API; tensor → ndarray; ndarray → tensor; CUDA support
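A few lines illustrating the NumPy-like API, the tensor/ndarray round trip, and device handling (the GPU move is guarded, since CUDA may not be available):

```python
import numpy as np
import torch

# ndarray -> tensor -> ndarray round trip
a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(a)        # shares memory with the ndarray (on CPU)
b = t.numpy()                  # back to a NumPy ndarray

# Device handling: use the GPU only if CUDA is actually available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t = t.to(device)               # move the tensor to the chosen device
```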
  35. nn.Module subclassing: definition of the layers (i.e. tensors); definition of the graph (i.e. the network)
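A minimal nn.Module subclass: the layers (parameter tensors) are defined in `__init__`, the network graph in `forward()`. The `MLP` name and layer sizes here are illustrative, not from the slides:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_features=10, hidden=32, n_classes=2):
        super().__init__()
        # layers: each nn.Linear holds its own W and b parameters
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # the graph: how data flows through the layers
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = MLP()
out = model(torch.randn(4, 10))   # a batch of 4 observations
```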
  36. Loss and Gradients: optimiser; criterion & loss; backprop & update
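The criterion/optimiser/backprop/update cycle can be sketched with `nn.MSELoss` and `torch.optim.SGD` on random toy data:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                     # a trivial one-layer model
criterion = nn.MSELoss()                    # the criterion computes the loss
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(16, 3), torch.randn(16, 1)   # toy batch

optimiser.zero_grad()                       # reset gradients from last step
loss = criterion(model(x), y)               # forward pass + loss
loss.backward()                             # backprop: fills p.grad for each parameter
optimiser.step()                            # gradient update of W and b
```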

  37. Dataset and DataLoader: transforms, Dataset, DataLoader
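A minimal Dataset subclass with an optional transform, wrapped in a DataLoader. `ToyDataset` and its sizes are hypothetical, for illustration only:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self, n=100, transform=None):
        self.x = torch.randn(n, 5)              # 100 random 5-feature samples
        self.y = torch.randint(0, 2, (n,))      # binary labels
        self.transform = transform              # optional callable, e.g. from torchvision.transforms

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        x = self.x[idx]
        if self.transform:
            x = self.transform(x)               # applied per-sample, on the fly
        return x, self.y[idx]

# the DataLoader handles batching and shuffling
loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
xb, yb = next(iter(loader))                     # first batch: 16 samples
```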

  38. So….

  39. Deep Learning with PyTorch Repository: https://github.com/leriomaggio/deep-learning-health-life-sciences mybinder | Colaboratory (more on this in a few minutes)
  40. Deep Learning Course Approach: Data Scientist: always prefer dev/practical aspects (tools & sw); work on the full pipeline (e.g. data preparation); emphasis on the implementation. Perspective: Researcher: no off-the-shelf (so no “black-box”) solutions; references and further readings to know more
  41. Deep Learning for BioMedicine Outline at a glance: Introduction to ML and DL for the Health and Life Sciences; a short introduction to PyTorch; BioImages: Diabetic Retinopathy from fundus images; Histopathological Images and Transfer Learning; a few notes on Model Interpretability. Follow-up: more materials will be added in the coming days :)
  42. Share your participation #Dataweekonline2020 Keep in touch @JGIBristol jgi-admin@bristol.ac.uk bristol.ac.uk/golding