Co-evolution of algorithms and data in biology

While every cell in your body has (nearly) the same DNA, different cells do radically different things! Neurons in your brain and muscle cells in your heart use different genes to perform their specialized tasks. New technologies have emerged to measure which genes are expressed in individual cells, and the throughput of these technologies has created data analysis and storage problems that make the Human Genome Project look, like, so 90s. Unlike then, biologists have leveraged the open-source data science community and quickly adopted common machine learning techniques. Many open questions remain: How can we recover the sparse signal that results from the limited detection sensitivity of the technology? What clustering algorithm should we use to determine how many cell types there are? What classifier should we use to predict cell identity? What dimensionality reduction algorithm should we use to find coherent groups of genes acting in concert? The field is undergoing a period of rapid expansion and iteration. In this talk I will review the strides already made by the biological community in defining data and analysis standards, the challenges that remain, and the implications of the new wave of “big data” biology for human health.


Olga Botvinnik

November 03, 2017


  1. @ODSC Co-evolution of algorithms and data in biology. Olga Botvinnik, PhD, Chan Zuckerberg Biohub. San Francisco | November 2nd - 4th 2017. twitter, github: @olgabot www: olgabotvinnik.com
  29. Macosko et al, Cell (2015)

  41. Don’t have enough labeled data to create a robust classifier

  50. Birth of an RNA Maturation

  51. Lopez Camarillo et al, WIREs RNA (2014)

  57. twitter, github: @olgabot www: olgabotvinnik.com email: olga@olgabotvinnik.com

  59. Viable lengths: 300-1000 nucleotides
