
Janet Matsen - Programming microbes using Python

The genome of a typical microbe contains roughly 5 million base pairs of DNA, including more than 4,000 genes, which provide the instructions for cellular replication, energy metabolism, and other biological processes. At Zymergen, we edit DNA to design microbes with improved ability to produce valuable materials and molecules. Microbes with these edits are built and tested in high throughput by our fleet of robots. Genomes are far too large for exhaustive search, so identifying which edits to make requires machine learning on non-standard features. Our task is to extract information from trees, networks, and graphs of independently representable knowledge bases (metabolism, genomics, regulation) in ways that respect the strongly causal relationships between systems. In this talk, I will describe how we use Python’s biological packages (e.g. BioPython, CobraPy, Escher, goatools) and other packages (NetworkX, TensorFlow, PyStan, AirFlow) to extract machine learning features and predict which genetic edits will produce high-performance microbes.
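To give a rough sense of why exhaustive search is out of reach, the sketch below counts how many genomes differ from a starting genome by a given number of single-base substitutions. It uses the 5-million-base-pair figure from the abstract; the function name and the restriction to substitutions (no insertions, deletions, or larger edits) are simplifying assumptions for illustration, not a description of the edits Zymergen actually makes.

```python
from math import comb

GENOME_LENGTH = 5_000_000   # base pairs, the rough figure quoted in the abstract
ALTERNATIVES_PER_SITE = 3   # each of A, C, G, T can change to 3 other bases


def substitution_variants(genome_length, n_edits):
    """Count genomes that differ from the original at exactly n_edits positions.

    Hypothetical helper: choose which positions change, then choose one of
    the three alternative bases at each chosen position.
    """
    return comb(genome_length, n_edits) * ALTERNATIVES_PER_SITE ** n_edits


single = substitution_variants(GENOME_LENGTH, 1)
double = substitution_variants(GENOME_LENGTH, 2)

print(f"single substitutions: {single:,}")
print(f"double substitutions: {double:,}")
```

Even under this narrow assumption, two simultaneous substitutions already give on the order of 10^14 candidates, so choosing which edits to build has to rely on prediction rather than enumeration.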

https://us.pycon.org/2018/schedule/presentation/242/

PyCon 2018

May 11, 2018

Transcript

  1. Zymergen’s high-throughput testing: 10^5 liters, 10^-3 liters, 10^1 liters

    Design large numbers of variants; automate data collection; identify statistically significant improvements; predict tank performance from plates.
  2. The DNA search space is huge. Genome optimization is hard.

    Half of genes have unknown function; thousands of genes; millions of characters.
  3. Let’s make MSG! Initially MSG was extracted (1909–1962).

    Fermentation established in 1957 (glutamate). Source: History of glutamate production. The American Journal of Clinical Nutrition, 2009.
  4. CobraPy: a subset of reactions is modeled. We’ve only observed one corner of the board game.

    Models only include a fraction of the genes.
  5. CobraPy: linear steady-state modeling

    ∴ Not for simulations over time, or varying external conditions.
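The steady-state assumption mentioned on the last slide can be illustrated without any modeling library. In constraint-based models, reaction fluxes v must satisfy S·v = 0, where S is the stoichiometric matrix, so that no internal metabolite accumulates or drains. The three-reaction network, the flux values, and the helper names below are made up for illustration; this is a minimal sketch of the constraint itself and does not use the CobraPy API or any model from the talk.

```python
# Hypothetical toy network (not from the talk):
#   R1:   -> A     uptake of metabolite A
#   R2: A -> B     conversion
#   R3: B ->       secretion of metabolite B
# Rows are metabolites, columns are reactions R1..R3.
S = [
    [1, -1, 0],   # A: produced by R1, consumed by R2
    [0, 1, -1],   # B: produced by R2, consumed by R3
]


def net_production(stoichiometry, fluxes):
    """Return S·v: the net rate of change of each metabolite."""
    return [
        sum(coeff * flux for coeff, flux in zip(row, fluxes))
        for row in stoichiometry
    ]


def is_steady_state(stoichiometry, fluxes, tol=1e-9):
    """True if no metabolite accumulates or depletes (S·v = 0 within tol)."""
    return all(abs(rate) <= tol for rate in net_production(stoichiometry, fluxes))


balanced = [2.0, 2.0, 2.0]     # every reaction carries the same flux
unbalanced = [2.0, 1.0, 1.0]   # uptake exceeds conversion, so A accumulates

print(is_steady_state(S, balanced))
print(net_production(S, unbalanced))
```

Because the constraint has no time variable, a flux vector either satisfies it or does not. That is why this style of model suits steady-state questions and not trajectories over time.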