Slide 1

Slide 1 text

INTERPRETING DNA WITH PYTHON
Christian Stigen Larsen
2015-03-17, Stavanger Software Developers

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

AGGTCGGTCGAGAGACTCAGGCAGCCGCC
TCCAGCCAGCTCTCTGAGTCCGTCGGCGG
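The second strand on this slide is the base-by-base complement of the first (A pairs with T, C pairs with G). A minimal sketch that checks this; the name `complement` is illustrative and not taken from dna-traits.

```python
# Watson-Crick base pairing: A<->T, C<->G.
PAIRS = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Return the complementary strand, base by base (not reversed)."""
    return strand.translate(PAIRS)

top = "AGGTCGGTCGAGAGACTCAGGCAGCCGCC"
bottom = "TCCAGCCAGCTCTCTGAGTCCGTCGGCGG"

# Compare the computed complement with the second strand from the slide.
print(complement(top) == bottom)
```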

Slide 10

Slide 10 text

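Person 1    AGGTCGGTCGAGAGACTCAGGCAGCCGCC
Person 2    AGGTCGGTCGAGAGACTCAGGCATCCGCC
Person n-1  AGGTCGGTCGAGATACTCAGGCATCCGCC
Person n    AGGTCGGTCGAGAGACTCAGGCAGCCGCC

SNP1 and SNP2 are the two positions where the sequences differ (the 14th and 24th base). The alleles at those positions form each person's haplotype.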
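A small sketch of how the SNP positions and haplotypes can be read off the four aligned sequences on this slide. The helper name `variant_positions` is hypothetical, and the approach (compare columns of equal-length strings) is a simplification for illustration only.

```python
def variant_positions(sequences):
    """Return 0-based columns where the aligned sequences are not all equal."""
    return [i for i, column in enumerate(zip(*sequences))
            if len(set(column)) > 1]

# The four sequences from the slide, in the order shown.
people = [
    "AGGTCGGTCGAGAGACTCAGGCAGCCGCC",
    "AGGTCGGTCGAGAGACTCAGGCATCCGCC",
    "AGGTCGGTCGAGATACTCAGGCATCCGCC",
    "AGGTCGGTCGAGAGACTCAGGCAGCCGCC",
]

snps = variant_positions(people)

# A haplotype here is just the alleles found at the variable positions.
haplotypes = ["".join(seq[i] for i in snps) for seq in people]

print(snps)        # [13, 23]
print(haplotypes)  # ['GG', 'GT', 'TT', 'GG']
```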

Slide 11

Slide 11 text

SNP1  SNP2
G     G
G     T
T     T
G     G

Slide 12

Slide 12 text

RSID        Chromosome  Position   Genotype
rs4477212   1           82154      AA
rs3094315   1           752566     AA
i4000827    MT          14478      C
rs12116415  Y           9938087    T
rs6526110   X           151603258  AG
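A minimal sketch of reading rows like these with Python's standard `csv` module. It assumes the file is tab-separated with "#" comment lines, which the slide does not state; `parse_genotypes` and `SAMPLE` are names made up for this example.

```python
import csv
import io

# The rows from the slide as tab-separated text (assumed layout).
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAA\n"
    "i4000827\tMT\t14478\tC\n"
    "rs12116415\tY\t9938087\tT\n"
    "rs6526110\tX\t151603258\tAG\n"
)

def parse_genotypes(lines):
    """Map each RSID to a (chromosome, position, genotype) tuple."""
    data_lines = (line for line in lines if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    return {rsid: (chrom, int(pos), geno) for rsid, chrom, pos, geno in rows}

genome = parse_genotypes(io.StringIO(SAMPLE))
print(genome["rs6526110"])  # ('X', 151603258, 'AG')
```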

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Blazing fast parsing
• C++
• Packed structs w/bit-fields
• Memory maps
• Forward-only reads
• Dense hash map
• Exposed through Python
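To give a feel for the "packed" idea in the list above, here is a Python sketch that squeezes a genotype into a few bits and stores it under an integer key. The 3-bit codes, the function names and the table layout are all assumptions for illustration; the actual C++ structs in dna-traits are not shown on the slides.

```python
# Hypothetical 3-bit codes per allele; 0 marks a missing second allele
# (for example the single-letter genotypes on MT and Y).
CODES = {"": 0, "A": 1, "C": 2, "G": 3, "T": 4}
LETTERS = {code: base for base, code in CODES.items()}

def pack(genotype):
    """Pack a one- or two-letter genotype into one small integer."""
    first = genotype[0]
    second = genotype[1] if len(genotype) > 1 else ""
    return (CODES[first] << 3) | CODES[second]

def unpack(value):
    """Inverse of pack(): recover the genotype string."""
    return LETTERS[value >> 3] + LETTERS[value & 0b111]

# Integer keys (the digits of an "rs" ID) mapped to values that fit in a byte.
table = {4477212: pack("AA"), 6526110: pack("AG"), 12116415: pack("T")}

print(unpack(table[6526110]))   # AG
print(unpack(table[12116415]))  # T
```

Storing small integers instead of strings is what lets a compiled implementation keep each entry in a few bytes; in pure Python the saving is mostly conceptual.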

Slide 15

Slide 15 text

Parser      Time    File size  Throughput
csv         2.16s   24 Mb      11 Mb/s
dna-traits  0.22s   24 Mb      112 Mb/s
csv         22.5s   496 Mb     23 Mb/s
dna-traits  3.1s    496 Mb     168 Mb/s
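The timings on this slide translate into a speedup factor per file size. A short sketch of that arithmetic, using only the numbers shown; `speedup` is a name chosen for this example.

```python
# (parser, seconds, file size in Mb) as listed on the slide.
runs = [
    ("csv", 2.16, 24),
    ("dna-traits", 0.22, 24),
    ("csv", 22.5, 496),
    ("dna-traits", 3.1, 496),
]

def speedup(slow_seconds, fast_seconds):
    """How many times faster the second run is than the first."""
    return slow_seconds / fast_seconds

small = speedup(runs[0][1], runs[1][1])  # 24 Mb file
large = speedup(runs[2][1], runs[3][1])  # 496 Mb file

print(round(small, 1), round(large, 1))  # 9.8 7.3
```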

Slide 16

Slide 16 text

GWAS (genome-wide association studies)

Slide 17

Slide 17 text

from dna_traits.match import unphased_match
import dna_traits as dna
import sys

genome = dna.parse(sys.argv[1])

print(unphased_match(genome.rs713598, {
    "CC": "Probably can't taste certain bitter flavours",
    "CG": "Can taste bitter flavours that others can't",
    "GG": "Can taste bitter flavours that others can't"}))

Slide 18

Slide 18 text

unphased_match(genome.rs713598, {
    "CC": "Probably can't taste certain bitter flavours",
    "CG": "Can taste bitter flavours that others can't",
    "GG": "Can taste bitter flavours that others can't"})
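"Unphased" means the order of the two alleles carries no information, so "CG" and "GC" should hit the same entry. The following is a guess at how such a lookup could work on plain strings; it is a hypothetical stand-in, not the real `dna_traits.match.unphased_match`, which takes SNP objects such as `genome.rs713598`.

```python
def unphased_match_sketch(genotype, table):
    """Look up a genotype string in table, ignoring allele order.

    Falls back to the table's None entry (if any) when the genotype
    is missing or not listed.
    """
    if genotype is None:
        return table.get(None)
    # Sort the letters of both the query and the keys so order is irrelevant.
    key = "".join(sorted(genotype))
    normalized = {"".join(sorted(k)): v
                  for k, v in table.items() if k is not None}
    return normalized.get(key, table.get(None))

bitter = {
    "CC": "Probably can't taste certain bitter flavours",
    "CG": "Can taste bitter flavours that others can't",
    "GG": "Can taste bitter flavours that others can't",
}

print(unphased_match_sketch("GC", bitter))
```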

Slide 19

Slide 19 text

def earwax_type(genome):
    """Earwax type."""
    return unphased_match(genome.rs17822931, {
        "CC": "Wet earwax (sticky, honey-colored)",
        "CT": "Wet earwax (sticky, honey-colored)",
        "TT": "Dry earwax (flaky, pale)",
        None: "Unable to determine"})

Slide 20

Slide 20 text

def caffeine_metabolism(genome):
    """Caffeine metabolism."""
    assert_european(genome)
    return unphased_match(genome.rs762551, {
        "AA": "Fast metabolizer",
        "AC": "Slow metabolizer",
        "CC": "Slow metabolizer",
        None: "Unable to determine"})

Slide 21

Slide 21 text

def pain_sensitivity(genome):
    """Pain sensitivity."""
    return unphased_match(genome.rs6269, {
        "AA": "Increased sensitivity to pain",
        "AG": "Typical sensitivity to pain",
        "GG": "Less sensitive to pain",
        None: "Unable to determine"})

Slide 22

Slide 22 text

def blood_glucose(genome):
    """Blood glucose."""
    assert_european(genome)
    return unphased_match(genome.rs560887, {
        "CC": "Avg fasting plasma glucose levels of 5.18mmol/L",
        "CT": "Avg fasting plasma glucose levels of 5.12mmol/L",
        "TT": "Avg fasting plasma glucose levels of 5.06mmol/L",
        None: "Unable to determine"})

Slide 23

Slide 23 text

def heroin_addiction(genome):
    """Heroin addiction."""
    assert_european(genome)
    return unphased_match(genome.rs1799971, {
        "AA": "Typical odds of addiction",
        "AG": "Higher odds of addiction",
        "GG": "Higher odds of addiction",
        None: "Unable to determine"})
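Each trait function carries its title in the docstring, which suggests one way a report could be assembled: call every function and use the docstring as the heading. The sketch below is an assumption about that pattern, not the project's actual report code. It uses a plain dict as the genome and a simplified `match` helper in place of `unphased_match`, and `report` is a hypothetical name.

```python
def match(genotype, table):
    """Simplified lookup on plain strings (stand-in for unphased_match)."""
    return table.get(genotype, table.get(None))

def earwax_type(genome):
    """Earwax type."""
    return match(genome.get("rs17822931"), {
        "CC": "Wet earwax (sticky, honey-colored)",
        "CT": "Wet earwax (sticky, honey-colored)",
        "TT": "Dry earwax (flaky, pale)",
        None: "Unable to determine"})

def caffeine_metabolism(genome):
    """Caffeine metabolism."""
    return match(genome.get("rs762551"), {
        "AA": "Fast metabolizer",
        "AC": "Slow metabolizer",
        "CC": "Slow metabolizer",
        None: "Unable to determine"})

def report(genome, traits):
    """Return (title, result) pairs; each title comes from the docstring."""
    return [(trait.__doc__.strip().rstrip("."), trait(genome))
            for trait in traits]

# Hypothetical genome: a dict from RSID to genotype string.
genome = {"rs17822931": "CC", "rs762551": "AC"}

for title, result in report(genome, [earwax_type, caffeine_metabolism]):
    print(title)
    print("  " + result)
```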

Slide 24

Slide 24 text

variants = {
    "CCCC": "e4/e4",
    "CCCT": "e1/e4",
    "CCTT": "e1/e1",
    "CTCC": "e3/e4",
    "CTCT": "e1/e3 or e2/e4",  # ambiguous
    "CTTT": "e1/e2",
    "TTCC": "e3/e3",
    "TTCT": "e2/e3",
    "TTTT": "e2/e2",
}

PATENTED

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

apoe = {
    "CT": "e1",
    "TT": "e2",
    "TC": "e3",
    "CC": "e4"}

return '/'.join(sorted(
    [apoe[x+y] for x, y in zip(rs429358, rs7412)]))
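The fragment above can be wrapped into a runnable function to see what it yields for specific genotypes. The function name `apoe_variants` and the use of plain two-letter strings for `rs429358` and `rs7412` are assumptions made for this sketch.

```python
APOE = {"CT": "e1", "TT": "e2", "TC": "e3", "CC": "e4"}

def apoe_variants(rs429358, rs7412):
    """Combine two unphased genotypes into an APOE variant pair.

    Pairs the first alleles and the second alleles of the two SNPs,
    as in the slide's one-liner.
    """
    return "/".join(sorted(APOE[x + y] for x, y in zip(rs429358, rs7412)))

print(apoe_variants("CT", "CC"))  # e3/e4
print(apoe_variants("CT", "CT"))  # e2/e4
```

For CT/CT this returns only "e2/e4", while the lookup table on slide 24 lists "e1/e3 or e2/e4": with unphased data there is no way to tell which alleles sit on the same chromosome.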

Slide 27

Slide 27 text

$ blue-eyes.py genome.txt
genome.txt: Blue eyes (gs237): Likely

Slide 28

Slide 28 text

$ profile.py genome.txt
Gender: Male
Blue eyes: Likely, if European
Skin color: Likely light-skinned of European descent

Slide 29

Slide 29 text

$ health-risks.py genome.txt
Health report
=============
APOE-variants (Alzheimer’s): e3/e4
Chronic kidney disease (CKD): 1.20 relative risk, 1.21 odds ratio (2 markers)

Slide 30

Slide 30 text

Hypothyroidism
  About 42.9% higher risk than baseline (1.5 vs -1.5 of 3.5 points, test is unweighted)
Migraines
  Slightly higher/typical/slightly lower odds of migraines

Slide 31

Slide 31 text

Scleroderma (limited cutaneous type)
  Higher odds
Stroke
  Slightly increased risk of having a stroke
Inherited conditions
  Alpha-1 Antitrypsin Deficiency: MM: Normal form of the SERPINA1 gene

Slide 32

Slide 32 text

Adiponectin levels
  Slightly lower, which may be bad
Alcohol flush reaction
  Little to no reaction (two copies of the ALDH2 gene)
Asparagus metabolite detection
  Typical odds of smelling asparagus in urine

Slide 33

Slide 33 text

Biological aging (telomere lengths)
  From 2.8 years younger to 0.0 years younger than actual age
  The sum is 2.8 years younger, compared to actual age
Birth weight
  From -60.0g to 0.0g (sum: -60.0g) compared to typical weight

Slide 34

Slide 34 text

Bitter taste perception
  Probably can't taste certain bitter flavours
Blood glucose
  Average fasting plasma glucose levels of 5.18mmol/L
Caffeine metabolism
  Slow metabolizer
Earwax type
  Wet earwax (sticky, honey-colored)

Slide 35

Slide 35 text

Eye color
  Most likely blue, but 30% have green and 1% brown
Hair color; blond versus brown
  Typical odds of having blond hair vs. brown hair
Hair color; odds for red hair
  Typical odds for red hair

Slide 36

Slide 36 text

Hair curl
  Slightly curlier hair on average
Heroin addiction
  Higher odds of addiction
Lactose intolerance
  Likely lactose tolerant
Malaria resistance (Duffy antigen)
  Likely not resistant to P. vivax

Slide 37

Slide 37 text

Muscle performance
  Likely sprinter, perhaps endurance athlete (one copy)
Norovirus resistance (most common strain)
  Likely not resistant to most common strain
Pain sensitivity
  Increased sensitivity to pain
Smoking behaviour
  Likely to smoke a little bit more than average

Slide 38

Slide 38 text

rs17822931 CC: Wet earwax, normal body odour
rs1815739 CT: Mix of muscle types, likely sprinter
rs4680 AA: Advantage in memory and attention tasks, lower pain threshold, enhanced vulnerability to stress, more efficient at processing information
rs53576 GG: Optimistic and empathetic, handles stress well

Slide 39

Slide 39 text

https://github.com/cslarsen/dna-traits/
See also: Promethease, SNPedia, OpenSNP, 23andMe, ncbi.nlm.nih.gov, etc.
Learn: http://rosalind.info
Image credits: Google images (sorry, too many to mention)

Slide 40

Slide 40 text

No content