Slide 81
K-MERS FROM REDUCED AMINO ACID ALPHABETS RETAIN
EVOLUTIONARILY CONSERVED BIOCHEMICAL PROPERTIES
Amino acid       Property                 Dayhoff   Hydrophobic-polar (hp)
C                Sulfur polymerization    a         p
A, G, P, S, T    Small                    b         A, G, P: h; S, T: p
D, E, N, Q       Acid and amide           c         p
H, K, R          Basic                    d         p
I, L, M, V       Hydrophobic              e         h
F, W, Y          Aromatic                 f         h
Protein: FLAWLESS
Dayhoff: febfecbb
HP: hhhhhppp
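The re-encoding shown above can be sketched in a few lines of Python. This is a minimal illustration built only from the groupings in the table; the names `DAYHOFF_GROUPS`, `HP_GROUPS`, `build_table`, and `encode` are hypothetical and not taken from any particular library.

```python
# Reduced-alphabet groupings copied from the table above.
DAYHOFF_GROUPS = {
    "a": "C",      # sulfur polymerization
    "b": "AGPST",  # small
    "c": "DENQ",   # acid and amide
    "d": "HKR",    # basic
    "e": "ILMV",   # hydrophobic
    "f": "FWY",    # aromatic
}
HP_GROUPS = {
    "h": "AGPILMVFWY",  # hydrophobic
    "p": "CSTDENQHKR",  # polar
}


def build_table(groups):
    """Invert {code: residues} into a {residue: code} lookup."""
    return {aa: code for code, residues in groups.items() for aa in residues}


DAYHOFF = build_table(DAYHOFF_GROUPS)
HP = build_table(HP_GROUPS)


def encode(protein, table):
    """Re-encode a protein string; residues outside the 20 standard
    amino acids raise KeyError in this sketch."""
    return "".join(table[aa] for aa in protein.upper())


print(encode("FLAWLESS", DAYHOFF))  # febfecbb
print(encode("FLAWLESS", HP))       # hhhhhppp
```

Both lookups cover all 20 standard amino acids; how ambiguous codes such as X or B should be handled is left open here.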
Dayhoff, M. O. (1972). Atlas of Protein Sequence and Structure.
Phillips, R., Kondev, J., & Theriot, J. (2012). Physical Biology of the Cell.
Peris, P., López, D., & Campos, M. (2008). IgTM: An algorithm to predict transmembrane domains and topology in proteins. BMC Bioinformatics, 9(1), 367. http://doi.org/10.1186/1471-2105-9-367
Reduced alphabet k-mers are resilient to amino acid changes
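As a quick check of this claim, the sketch below counts how many k-mers differ between FLAWLESS and a variant with one Leucine replaced by Valine (FLAWVESS), first in the protein alphabet and then after Dayhoff encoding. It assumes the Dayhoff grouping from the table above; the helper names and the choice of k = 3 are hypothetical and picked only to keep the example small.

```python
# Dayhoff lookup built from the table above (residue -> group letter).
DAYHOFF = {
    aa: code
    for code, residues in {
        "a": "C", "b": "AGPST", "c": "DENQ",
        "d": "HKR", "e": "ILMV", "f": "FWY",
    }.items()
    for aa in residues
}


def encode(protein):
    """Re-encode a protein sequence into the Dayhoff alphabet."""
    return "".join(DAYHOFF[aa] for aa in protein)


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def changed_kmers(seq_a, seq_b, k):
    """Count positions where two equal-length sequences yield different k-mers."""
    return sum(x != y for x, y in zip(kmers(seq_a, k), kmers(seq_b, k)))


original, mutated = "FLAWLESS", "FLAWVESS"  # single change: L -> V
k = 3  # illustrative k-mer size, not a recommended value

print(changed_kmers(original, mutated, k))                  # 3 (equals k)
print(changed_kmers(encode(original), encode(mutated), k))  # 0
```

Because L and V fall in the same Dayhoff group (e), the encoded sequences are identical and no k-mer changes, whereas in the protein alphabet every k-mer overlapping the substituted position differs.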
Single amino acid change: Leucine (L) → Valine (V)
Protein: FLAWLESS → FLAWVESS
Dayhoff: febfecbb → febfecbb
Protein alphabet: k k-mers affected (every k-mer that overlaps the changed residue)
Dayhoff alphabet: no change in k-mers!

Reduced alphabets require longer k-mers to encode equivalent information content.
For example, if k = 7 is the minimum for the 20-letter protein alphabet, because it corresponds to a DNA k-mer size of 21, then the minimum informative k-mer size can be derived for any alphabet size |Σ|:
\[
\begin{aligned}
20^{7} &= |\Sigma|^{k} \\
7 \log 20 &= k \log |\Sigma| \\
k &= \left\lceil \frac{7 \log 20}{\log |\Sigma|} \right\rceil
\end{aligned}
\]
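The derivation can be evaluated for the two reduced alphabets on this slide. The sketch below is a minimal illustration; the function name `min_ksize` and its parameters are hypothetical. It searches with integer arithmetic for the smallest k satisfying |Σ|^k ≥ 20^7, which is equivalent to the ceiling formula but avoids floating-point rounding at exact boundaries.

```python
def min_ksize(alphabet_size, protein_ksize=7, protein_alphabet_size=20):
    """Smallest k such that alphabet_size**k >= protein_alphabet_size**protein_ksize.

    Equivalent to ceil(protein_ksize * log(20) / log(alphabet_size)),
    computed with exact integers.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    target = protein_alphabet_size ** protein_ksize
    k = 1
    while alphabet_size ** k < target:
        k += 1
    return k


print(min_ksize(20))  # 7  (protein alphabet, by construction)
print(min_ksize(6))   # 12 (Dayhoff alphabet, 6 letters)
print(min_ksize(2))   # 31 (hydrophobic-polar alphabet, 2 letters)
```

Under the slide's assumption that k = 7 is the minimum for proteins, this gives k = 12 for Dayhoff and k = 31 for hydrophobic-polar encodings.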