taxonomic units or OTUs. OTUs are families of related organisms.
Internal nodes: hypothetical ancestors; we postulate their existence but often don’t have direct evidence.
was a member of the group (e.g., multicellular organisms)
Polyphyletic group: the last common ancestor was not a member of the group (e.g., thermophiles)
[Figure: phylogenetic trees whose tips are labeled A:, B:, or E: followed by a gi sequence identifier (e.g., A: gi|5821183); scale bar 0.2]
Rooted trees: include an assumption about the last common ancestor of all sequences
Unrooted trees: no assumption about the last common ancestor of all sequences
Trees are often built from gene sequences, and thus represent gene trees. If the genes are orthologous, the gene tree can also represent a species tree.
at a subset of the possible trees, and are not guaranteed to find the best tree.
• Designed to scale to trees for many OTUs (how well they scale depends on the method, and there is a lot of variability)
• Often provide a single tree, so do not include information on how likely other tree topologies are (we’ll talk about methods, such as bootstrapping, to address this).
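To see why methods that examine only a subset of trees are attractive, it helps to count the possible trees. The sketch below uses the standard double-factorial counts for strictly bifurcating trees with labeled tips; the function names are made up for this illustration and do not come from the slides.

```python
def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n_tips):
    """Number of unrooted bifurcating topologies for n_tips labeled tips: (2n - 5)!!"""
    if n_tips < 3:
        raise ValueError("need at least 3 tips")
    return double_factorial(2 * n_tips - 5)


def num_rooted_trees(n_tips):
    """Number of rooted bifurcating topologies for n_tips labeled tips: (2n - 3)!!"""
    if n_tips < 2:
        raise ValueError("need at least 2 tips")
    return double_factorial(2 * n_tips - 3)


for n in (3, 4, 5, 10):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
# 10 tips already give 2027025 unrooted and 34459425 rooted topologies
```

The count grows so quickly with the number of OTUs that scoring every topology stops being practical after a few dozen tips.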
(non-negativity)
– d(x,y) = 0 if and only if x = y (identity of indiscernibles)
– d(x,y) = d(y,x) (symmetry)
– d(x,z) <= d(x,y) + d(y,z) (triangle inequality)
http://www-history.mcs.st-and.ac.uk/~john/MT4522/Lectures/L5.html
http://en.wikipedia.org/wiki/Metric_(mathematics)
Y (20.0)
     x    y    z
x    0   14    4
y   14    0   18
z    4   18    0
Distance:
– d(x,y) >= 0 (non-negativity)
– d(x,y) = 0 if and only if x = y (identity of indiscernibles)
– d(x,y) = d(y,x) (symmetry)
– d(x,z) <= d(x,y) + d(y,z) (triangle inequality)
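The four properties can be checked mechanically. A minimal sketch, assuming the matrix is stored as a dict of dicts keyed by label and that each label names a distinct object; the name `is_metric` is chosen for this example only.

```python
from itertools import product


def is_metric(labels, d):
    """Return True if d[a][b] satisfies the four distance properties listed above."""
    for a, b in product(labels, repeat=2):
        if d[a][b] < 0:                    # non-negativity
            return False
        if (d[a][b] == 0) != (a == b):     # identity of indiscernibles
            return False
        if d[a][b] != d[b][a]:             # symmetry
            return False
    for a, b, c in product(labels, repeat=3):
        if d[a][c] > d[a][b] + d[b][c]:    # triangle inequality
            return False
    return True


# The x, y, z matrix from the slide.
labels = ["x", "y", "z"]
d = {
    "x": {"x": 0, "y": 14, "z": 4},
    "y": {"x": 14, "y": 0, "z": 18},
    "z": {"x": 4, "y": 18, "z": 0},
}
print(is_metric(labels, d))  # True

# Raising d(y,z) to 30 breaks the triangle inequality: 30 > 14 + 4.
broken = {a: dict(row) for a, row in d.items()}
broken["y"]["z"] = broken["z"]["y"] = 30
print(is_metric(labels, broken))  # False
```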
• Most often derived from a multiple sequence alignment. These differ from the pairwise alignments that we’ve looked at thus far, but use the same underlying algorithms.
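One simple way to turn an alignment into distances is the fraction of aligned positions that differ (often called the p-distance). The slides do not say which distance was used, so treat this as an assumption for illustration; the toy alignment and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing positions, skipping columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in columns) / len(columns)


def distance_matrix(alignment):
    """Build a dict-of-dicts distance matrix from {name: aligned_sequence}."""
    names = list(alignment)
    return {
        a: {b: 0.0 if a == b else p_distance(alignment[a], alignment[b]) for b in names}
        for a in names
    }


# Hypothetical three-sequence alignment (not from the slides).
alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "AC-TTCGA",
}
matrix = distance_matrix(alignment)
print(matrix["s1"]["s2"])  # 0.125 (1 difference in 8 columns)
```

Model-corrected distances are another common choice; the p-distance is used here only because it is the shortest to write down.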
Unweighted: all tip-to-tip distances contribute equally
• Pair-group: all branch points lead to exactly two clades
• Arithmetic mean: distances to each clade are the mean of distances to all members of that clade
http://www.southampton.ac.uk/~re1u06/teaching/upgma/
the matrix and create a new group containing only those members.
Step 2: Create a new distance matrix with an entry representing the clade created in step 1. Calculate the mean distance from each of the tips of the new clade to all other tips in the distance matrix.
Step 3: If there is only one distance in the distance matrix, stop. Otherwise repeat step 1.
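The steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the start of step 1 is cut off above, so the sketch assumes it selects the pair with the smallest distance, as UPGMA is usually described. The function name `upgma` and the nested-tuple output are choices made for this example, and branch lengths are not computed.

```python
from itertools import combinations


def upgma(labels, d):
    """Cluster tips by UPGMA and return the topology as a nested tuple.

    d[a][b] is the tip-to-tip distance between tips a and b.
    """
    # Map each current clade (a label or nested tuple) to the tips it contains.
    clusters = {label: [label] for label in labels}

    def mean_distance(c1, c2):
        # Unweighted: every tip-to-tip distance between the clades counts equally.
        tips1, tips2 = clusters[c1], clusters[c2]
        total = sum(d[a][b] for a in tips1 for b in tips2)
        return total / (len(tips1) * len(tips2))

    while len(clusters) > 1:
        # Step 1 (assumed): pick the pair of clades with the smallest distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: mean_distance(*pair))
        # Step 2: replace the pair with one clade holding the tips of both;
        # distances to it are recomputed as means over its tips.
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    # Step 3: the loop also joins the final two clades, which is equivalent to
    # stopping when a single distance remains.
    (tree,) = clusters
    return tree


# The x, y, z matrix from the distance slide.
labels = ["x", "y", "z"]
d = {
    "x": {"x": 0, "y": 14, "z": 4},
    "y": {"x": 14, "y": 0, "z": 18},
    "z": {"x": 4, "y": 18, "z": 0},
}
print(upgma(labels, d))  # ('y', ('x', 'z'))
```

Here x and z join first (distance 4), and the new clade's distance to y is the mean of 14 and 18, which is 16. Recomputing means from the original tip-to-tip distances on every pass is slow but keeps the code close to the wording of step 2.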
compiled while reviewing the following sources:
• The Phylogenetic Handbook (Lemey, Salemi, Vandamme)
• Inferring Phylogenies (Felsenstein)
• Richard Edwards’s teaching website: http://www.southampton.ac.uk/~re1u06/teaching/upgma/
United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.