Interpreting DNA with Python

These are the slides from a 10-minute lightning talk I did on interpreting DNA with Python. I talked about how I built my own 23andMe parser, which is very fast, and then how one can infer simple phenotypes from genotypes using Python.

Link to MeetUp:
http://www.meetup.com/Stavanger-Software-Developers-Meetup/events/220440594/

Link to GitHub code:
https://github.com/cslarsen/dna-traits

Illustration credits:
Kelvin Song, https://commons.wikimedia.org/wiki/User:Kelvinsong

Photo credits:
Dartmouth College, http://remf.dartmouth.edu/Blood_cells_SEM/

Christian Stigen Larsen

March 17, 2015

Transcript

  1. RSID        Chromosome  Position   Genotype
     rs4477212   1           82154      AA
     rs3094315   1           752566     AA
     i4000827    MT          14478      C
     rs12116415  Y           9938087    T
     rs6526110   X           151603258  AG
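For comparison with the fast parser described in the talk, here is a minimal pure-Python sketch of reading rows like the ones on this slide. It assumes the raw file is tab-separated and that header lines start with `#`; the function name `parse_23andme` is made up for illustration and is not part of dna-traits.

```python
import io


def parse_23andme(lines):
    """Return {rsid: (chromosome, position, genotype)} from raw-data lines.

    Assumes tab-separated columns and '#' comment lines (an assumption,
    not something stated on the slide).
    """
    snps = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        rsid, chromosome, position, genotype = line.split("\t")
        snps[rsid] = (chromosome, int(position), genotype)
    return snps


# The five rows shown on the slide, with a hypothetical header line.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAA\n"
    "i4000827\tMT\t14478\tC\n"
    "rs12116415\tY\t9938087\tT\n"
    "rs6526110\tX\t151603258\tAG\n"
)

snps = parse_23andme(io.StringIO(SAMPLE))
print(snps["rs6526110"])
```

A plain dict like this is easy to work with but stores every field as a full Python object, which is the overhead the packed C++ structures on the next slide avoid.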
  2. Blazing fast parsing
     • C++
     • Packed structs w/bit-fields
     • Memory maps
     • Forward-only reads
     • Dense hash map
     • Exposed through Python
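Two of these ideas, memory maps with forward-only reads and packing a genotype into a few bits, can be sketched in Python. This is only an illustration of the techniques, not the C++ layout used in dna-traits: the 4-bits-per-allele encoding, the `NUCLEOTIDES` table and the function names are all assumptions.

```python
import mmap
import os
import tempfile

# Index in this string = numeric code. '-' marks a missing allele; D and I
# are included for deletion/insertion calls. (Hypothetical encoding.)
NUCLEOTIDES = "-ACGTDI"


def pack(genotype):
    """Pack a one- or two-letter genotype into one byte, 4 bits per allele."""
    first, second = (genotype + "-")[:2]  # pad single-letter genotypes
    return NUCLEOTIDES.index(first) << 4 | NUCLEOTIDES.index(second)


def unpack(packed):
    """Reverse pack(): one byte back to a genotype string."""
    text = NUCLEOTIDES[packed >> 4] + NUCLEOTIDES[packed & 0x0F]
    return text.rstrip("-") or "--"


def load_packed(path):
    """Read the file once, front to back, through a read-only memory map."""
    table = {}
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for line in iter(view.readline, b""):
                if line.startswith(b"#") or not line.strip():
                    continue
                rsid, _chrom, _pos, genotype = line.decode("ascii").split()
                table[rsid] = pack(genotype)
    return table


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "genome.txt")
    with open(path, "w") as out:
        out.write("# rsid\tchromosome\tposition\tgenotype\n"
                  "rs4477212\t1\t82154\tAA\n"
                  "i4000827\tMT\t14478\tC\n"
                  "rs6526110\tX\t151603258\tAG\n")
    table = load_packed(path)

print({rsid: unpack(byte) for rsid, byte in table.items()})
```

In Python the dict overhead dwarfs the one-byte saving, so the sketch shows the idea rather than the speed-up; the benefit comes when the packed values live in a native struct.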
  3. Parser      Time   File size  Throughput
     csv         2.16s  24 Mb      11 Mb/s
     dna-traits  0.22s  24 Mb      112 Mb/s
     csv         22.5s  496 Mb     23 Mb/s
     dna-traits  3.1s   496 Mb     168 Mb/s
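The relative speed-up is easier to read as a ratio of the timings. The short calculation below uses only the times and file sizes from the slide; the names `RUNS` and `speedups` are illustrative.

```python
# (parser, seconds, file size in Mb) rows copied from the benchmark slide.
RUNS = [
    ("csv", 2.16, 24),
    ("dna-traits", 0.22, 24),
    ("csv", 22.5, 496),
    ("dna-traits", 3.1, 496),
]


def speedups(runs):
    """Map file size -> how many times faster dna-traits was than csv."""
    times = {}
    for parser, seconds, size in runs:
        times.setdefault(size, {})[parser] = seconds
    return {size: t["csv"] / t["dna-traits"] for size, t in times.items()}


for size, factor in speedups(RUNS).items():
    print(f"{size} Mb file: {factor:.1f}x faster")
```

On these numbers dna-traits is roughly ten times faster on the small file and roughly seven times faster on the large one.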
  4. from dna_traits.match import unphased_match
     import dna_traits as dna
     import sys

     genome = dna.parse(sys.argv[1])
     print(unphased_match(genome.rs713598, {
         "CC": "Probably can't taste certain bitter flavours",
         "CG": "Can taste bitter flavours that others can't",
         "GG": "Can taste bitter flavours that others can't"}))
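To make the idea of an unphased lookup concrete, here is a small stand-in that works on plain genotype strings. It is a guess at the behaviour, not the dna-traits source: in the library `genome.rs713598` may well be an SNP object rather than a string, and the name `unphased_match_sketch` is hypothetical. The sketch treats "CG" and "GC" as the same genotype and falls back to a `None` entry when nothing matches.

```python
def unphased_match_sketch(genotype, table):
    """Look up a genotype while ignoring allele order.

    Hypothetical stand-in for illustration only. Falls back to the
    table's None entry (if any) when the genotype is missing or unknown.
    """
    def normalize(key):
        # Sort the letters so "GC" and "CG" map to the same key.
        return "".join(sorted(key)) if key else None

    normalized = {normalize(key): value for key, value in table.items()}
    return normalized.get(normalize(genotype), normalized.get(None))


BITTER_TASTE = {
    "CC": "Probably can't taste certain bitter flavours",
    "CG": "Can taste bitter flavours that others can't",
    "GG": "Can taste bitter flavours that others can't",
    None: "Unable to determine",
}

print(unphased_match_sketch("GC", BITTER_TASTE))
print(unphased_match_sketch(None, BITTER_TASTE))
```

Sorting the letters is one simple way to ignore order; it works here because an unphased genotype carries no information about which allele came from which parent.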
  5. unphased_match(genome.rs713598, {
         "CC": "Probably can't taste certain bitter flavours",
         "CG": "Can taste bitter flavours that others can't",
         "GG": "Can taste bitter flavours that others can't"})
  6. def earwax_type(genome):
         "Earwax type."
         return unphased_match(genome.rs17822931, {
             "CC": "Wet earwax (sticky, honey-colored)",
             "CT": "Wet earwax (sticky, honey-colored)",
             "TT": "Dry earwax (flaky, pale)",
             None: "Unable to determine"})
  7. def caffeine_metabolism(genome):
         """Caffeine metabolism."""
         assert_european(genome)
         return unphased_match(genome.rs762551, {
             "AA": "Fast metabolizer",
             "AC": "Slow metabolizer",
             "CC": "Slow metabolizer",
             None: "Unable to determine"})
  8. def pain_sensitivity(genome):
         """Pain sensitivity."""
         return unphased_match(genome.rs6269, {
             "AA": "Increased sensitivity to pain",
             "AG": "Typical sensitivity to pain",
             "GG": "Less sensitive to pain",
             None: "Unable to determine"})
  9. def blood_glucose(genome):
         """Blood glucose."""
         assert_european(genome)
         return unphased_match(genome.rs560887, {
             "CC": "Avg fasting plasma glucose levels of 5.18mmol/L",
             "CT": "Avg fasting plasma glucose levels of 5.12mmol/L",
             "TT": "Avg fasting plasma glucose levels of 5.06mmol/L",
             None: "Unable to determine"})
  10. def heroin_addiction(genome):
          """Heroin addiction."""
          assert_european(genome)
          return unphased_match(genome.rs1799971, {
              "AA": "Typical odds of addiction",
              "AG": "Higher odds of addiction",
              "GG": "Higher odds of addiction",
              None: "Unable to determine"})
  11. variants = {
          "CCCC": "e4/e4",
          "CCCT": "e1/e4",
          "CCTT": "e1/e1",
          "CTCC": "e3/e4",
          "CTCT": "e1/e3 or e2/e4",  # ambiguous
          "CTTT": "e1/e2",
          "TTCC": "e3/e3",
          "TTCT": "e2/e3",
          "TTTT": "e2/e2",
      }
      PATENTED
  12. apoe = {"CT": "e1", "TT": "e2", "TC": "e3", "CC": "e4"}
      return '/'.join(sorted(
          [apoe[x+y] for x, y in zip(rs429358, rs7412)]))
  13. $ profile.py genome.txt
      Gender: Male
      Blue eyes: Likely, if European
      Skin color: Likely light-skinned of European descent
  14. $ health-risks.py genome.txt
      Health report
      =============
      APOE-variants (Alzheimer’s): e3/e4
      Chronic kidney disease (CKD): 1.20 relative risk, 1.21 odds ratio (2 markers)
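The APOE line in this report comes from the lookup on slides 11 and 12. The two slides can be checked against each other with a short, self-contained script. It assumes that the first two letters of each key in the slide 11 table are the rs429358 genotype and the last two the rs7412 genotype, and that genotypes are plain two-letter strings; the function name `apoe_variants` is made up for the example.

```python
# Lookup table from slide 11: rs429358 genotype + rs7412 genotype (assumed order).
VARIANTS = {
    "CCCC": "e4/e4",
    "CCCT": "e1/e4",
    "CCTT": "e1/e1",
    "CTCC": "e3/e4",
    "CTCT": "e1/e3 or e2/e4",  # ambiguous
    "CTTT": "e1/e2",
    "TTCC": "e3/e3",
    "TTCT": "e2/e3",
    "TTTT": "e2/e2",
}

# Per-allele table from slide 12: rs429358 allele + rs7412 allele.
APOE = {"CT": "e1", "TT": "e2", "TC": "e3", "CC": "e4"}


def apoe_variants(rs429358, rs7412):
    """Pair up the alleles position by position, as the slide 12 snippet does."""
    return "/".join(sorted(APOE[x + y] for x, y in zip(rs429358, rs7412)))


# Every entry in the big table should agree with the compact version.
for key, expected in VARIANTS.items():
    result = apoe_variants(key[:2], key[2:])
    assert result in expected.split(" or "), (key, result, expected)

print(apoe_variants("CT", "CC"))
```

Pairing alleles by position picks only one of the two possibilities for the "CTCT" case, which is why the table marks it as ambiguous: the raw data is unphased, so the letter order says nothing about which alleles sit on the same chromosome.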
  15. Hypothyroidism
          About 42.9% higher risk than baseline (1.5 vs -1.5 of 3.5 points, test is unweighted)
      Migraines
          Slightly higher/typical/slightly lower odds of migraines
  16. Scleroderma (limited cutaneous type)
          Higher odds
      Stroke
          Slightly increased risk of having a stroke
      Inherited conditions
      Alpha-1 Antitrypsin Deficiency: MM: Normal form of the SERPINA1 gene
  17. Adiponectin levels
          Slightly lower, which may be bad
      Alcohol flush reaction
          Little to no reaction (two copies of the ALDH2 gene)
      Asparagus metabolite detection
          Typical odds of smelling asparagus in urine
  18. Biological aging (telomere lengths)
          From 2.8 years younger to 0.0 years younger than actual age
          The sum is 2.8 years younger, compared to actual age
      Birth weight
          From -60.0g to 0.0g (sum: -60.0g) compared to typical weight
  19. Bitter taste perception
          Probably can't taste certain bitter flavours
      Blood glucose
          Average fasting plasma glucose levels of 5.18mmol/L
      Caffeine metabolism
          Slow metabolizer
      Earwax type
          Wet earwax (sticky, honey-colored)
  20. Eye color
          Most likely blue, but 30% have green and 1% brown
      Hair color; blond versus brown
          Typical odds of having blond hair vs. brown hair
      Hair color; odds for red hair
          Typical odds for red hair
  21. Hair curl
          Slightly curlier hair on average
      Heroin addiction
          Higher odds of addiction
      Lactose intolerance
          Likely lactose tolerant
      Malaria resistance (Duffy antigen)
          Likely not resistant to P. vivax
  22. Muscle performance
          Likely sprinter, perhaps endurance athlete (one copy)
      Norovirus resistance (most common strain)
          Likely not resistant to most common strain
      Pain sensitivity
          Increased sensitivity to pain
      Smoking behaviour
          Likely to smoke a little bit more than average
  23. rs17822931 CC: Wet earwax, normal body odour
      rs1815739 CT: Mix of muscle types, likely sprinter
      rs4680 AA: Advantage in memory and attention tasks, lower pain threshold, enhanced vulnerability to stress, more efficient at processing information
      rs53576 GG: Optimistic and empathetic, handles stress well
  24. https://github.com/cslarsen/dna-traits/
      See also: Promethease, SNPedia, OpenSNP, 23andMe, ncbi.nlm.nih.gov, etc.
      Learn: http://rosalind.info
      Image credits: Google images (sorry, too many to mention)