every possible alternative to find a solution.
• Greedy algorithms
  • Choose the ‘most attractive’ alternative at each iteration.
• Divide-and-Conquer algorithms
  • Break problem into non-overlapping subproblems.
  • Stitch solutions of subproblems together to solve larger problem.
• Dynamic programming
  • Break problem into overlapping subproblems.
  • Remember solutions of subproblems, and use them to construct solutions to larger problems.
• Machine-learning / Statistical learning theory
  • Learn the solution from observed data.
  • Typically models problems probabilistically.
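To make "overlapping subproblems" concrete, here is a small illustrative sketch that does not come from the lecture. It uses Fibonacci numbers, a standard textbook example: plain recursion recomputes the same subproblems many times, while memoization (Python's `functools.lru_cache`) stores each subproblem's solution and reuses it. The function names and the `calls` counter are hypothetical, added only to show the difference in work.

```python
from functools import lru_cache

# Hypothetical counters used only to show how much work each version does
calls = {"naive": 0, "memo": 0}

def fib_naive(n: int) -> int:
    """Plain recursion: overlapping subproblems are recomputed many times."""
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Memoized recursion: each subproblem is solved once and then reused."""
    calls["memo"] += 1
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

if __name__ == "__main__":
    print(fib_naive(20), calls["naive"])  # prints: 6765 21891
    print(fib_memo(20), calls["memo"])    # prints: 6765 21
```

Same answer, but the naive version makes 21,891 calls while the memoized one makes 21 (one per subproblem `n = 0 … 20`).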
sequences?
• Identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships.
• Sequence alignment:
  • Modeling the evolutionary events that occurred in a pair of homologous sequences since their last common ancestor.
  • Events: substitutions, insertions, deletions.
Y = TTGTTCTTACTGTCTAAG
Given two sequence strings X and Y, an alignment inserts gaps into the sequences such that each letter in X lines up with either a letter or a gap in Y, and vice versa. The best alignment places the gaps so that the similarity between the resulting strings is maximized.
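The definition above can be checked mechanically. This is a minimal sketch, assuming gaps are written as `-`; `is_valid_alignment` is a hypothetical helper name. A pair of gapped strings is an alignment of X and Y if both rows have the same length, deleting the gaps recovers the original sequences, and no column pairs a gap with a gap (every letter lines up with a letter or a gap).

```python
def is_valid_alignment(x: str, y: str, ax: str, ay: str, gap: str = "-") -> bool:
    """Check that gapped strings ax and ay form an alignment of x and y."""
    if len(ax) != len(ay):
        return False  # both aligned rows must have the same number of columns
    if ax.replace(gap, "") != x or ay.replace(gap, "") != y:
        return False  # removing the gaps must give back the original sequences
    # every column must contain at least one letter (no gap-gap columns)
    return all(a != gap or b != gap for a, b in zip(ax, ay))

print(is_valid_alignment("ACGT", "AGT", "ACGT", "A-GT"))    # True
print(is_valid_alignment("ACGT", "AGT", "A-CGT", "A--GT"))  # False: gap-gap column
```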
[Figure: two example alignments, scored as "… mismatch, 1 gap" and "6 matches, 1 mismatch, 3 gaps"]
Scoring function s(X_i, Y_j):
• X_i = Y_j (match): +m
• X_i ≠ Y_j (mismatch): -k
• Gap penalty: -d
Alignment score:
$$F = m \cdot (\#\text{matches}) - k \cdot (\#\text{mismatches}) - d \cdot (\#\text{gaps})$$
Example: m = 3, k = 1, d = 2 (i.e., a match scores +3, a mismatch -1, and a gap -2).
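The scoring function translates directly into code. This sketch reads the example values as match +3, mismatch -1, gap -2 (m = 3, k = 1, d = 2 in the formula) and assumes `-` as the gap symbol; `alignment_score` is a hypothetical name.

```python
def alignment_score(ax: str, ay: str, m: int = 3, k: int = 1, d: int = 2,
                    gap: str = "-") -> int:
    """F = m*(#matches) - k*(#mismatches) - d*(#gaps), summed column by column."""
    if len(ax) != len(ay):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(ax, ay):
        if a == gap or b == gap:
            score -= d  # a letter aligned to a gap
        elif a == b:
            score += m  # match
        else:
            score -= k  # mismatch
    return score

print(alignment_score("ACGT", "A-GT"))  # 3 matches, 1 gap: 9 - 2 = 7
```

With these values, an alignment with 6 matches, 1 mismatch and 3 gaps scores 6·3 - 1·1 - 3·2 = 11.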
a diagonal)
3. Gap in one sequence (move vertically!)
4. Gap in the other sequence (move horizontally!)
Figure: Four possible outcomes in aligning two sequences (B&FG 3e, Fig. 3-20, page 97).
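… $Y_j$, and $X_{i+1} \ldots X_M$ aligns to $Y_{j+1} \ldots Y_N$. Then:
$$F(X_{1 \ldots M}, Y_{1 \ldots N}) = F(X_{1 \ldots i}, Y_{1 \ldots j}) + F(X_{i+1 \ldots M}, Y_{j+1 \ldots N})$$
So the original problem, aligning $X_1 \ldots X_M$ to $Y_1 \ldots Y_N$, can be decomposed into smaller subproblems such as aligning $X_1 \ldots X_i$ to $Y_1 \ldots Y_j$, and we can apply dynamic programming to solve it.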
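The decomposition works because the score is a sum over alignment columns, so cutting an alignment after any column splits F into the scores of the two pieces. This sketch checks that property for every cut point, assuming the example scoring scheme (match +3, mismatch -1, gap -2) and `-` as the gap symbol; all function names are hypothetical.

```python
def column_score(a: str, b: str, m: int = 3, k: int = 1, d: int = 2) -> int:
    """Score of a single alignment column ('-' marks a gap)."""
    if a == "-" or b == "-":
        return -d
    return m if a == b else -k

def score(ax: str, ay: str) -> int:
    """The alignment score F is the sum of its column scores."""
    return sum(column_score(a, b) for a, b in zip(ax, ay))

def check_split(ax: str, ay: str) -> bool:
    """F(whole) == F(left part) + F(right part) for every cut point c."""
    return all(
        score(ax, ay) == score(ax[:c], ay[:c]) + score(ax[c:], ay[c:])
        for c in range(len(ax) + 1)
    )

print(score("ACG-TT", "A-GCTA"), check_split("ACG-TT", "A-GCTA"))  # 4 True
```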
the algorithm does not search all possible alignments.
• It is an example of a dynamic programming algorithm: an optimal path (alignment) is identified by incrementally extending optimal subpaths. At each cell of the matrix, the algorithm decides which move (aligning a pair of residues, or aligning a residue to a gap) gives the best score. These stored partial solutions are then reused by later steps of the algorithm.
Needleman-Wunsch: dynamic programming
Needleman-Wunsch
Three possibilities:
• X_i aligns to Y_j: $F_{i,j} = F_{i-1,j-1} + s(X_i, Y_j)$
• X_i aligns to a gap: $F_{i,j} = F_{i-1,j} - d$
• Y_j aligns to a gap: $F_{i,j} = F_{i,j-1} - d$
involves +1 (proceed diagonally from the upper-left square to the gray, lower-right square). If we instead select an alignment involving a gap, the score would be worse (-4).
Global pairwise alignment using Needleman-Wunsch
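Initialization:
$F_{0,0} = 0$; $F_{0,j} = -j \cdot d$ for $j = 1 \ldots N$; $F_{i,0} = -i \cdot d$ for $i = 1 \ldots M$
Iteration:
for each i = 1…M
  for each j = 1…N
$$F_{i,j} = \max \begin{cases} F_{i-1,j-1} + s(X_i, Y_j) & [X_i \text{ aligned to } Y_j] \\ F_{i-1,j} - d & [X_i \text{ aligned to a gap}] \\ F_{i,j-1} - d & [Y_j \text{ aligned to a gap}] \end{cases}$$
$$\mathrm{Ptr}_{i,j} = \begin{cases} \text{DIAG} & \text{if } [X_i \text{ aligned to } Y_j] \\ \text{LEFT} & \text{if } [X_i \text{ aligned to a gap}] \\ \text{UP} & \text{if } [Y_j \text{ aligned to a gap}] \end{cases}$$
(LEFT and UP assume X is written along the top of the matrix, so i indexes columns.)
Termination:
$F_{M,N}$ is the score of the optimal alignment. The alignment path can be traced back from $\mathrm{Ptr}_{M,N}$.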
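Here is a minimal Python sketch of the initialization, iteration and termination steps above. It assumes the example scores (m = 3, k = 1, d = 2) and `-` as the gap symbol. The function name `needleman_wunsch` and the tie-breaking order (DIAG, then LEFT, then UP) are illustrative choices, not taken from the lecture. Pointer labels follow the slide: LEFT means the value came from $F_{i-1,j}$ and UP from $F_{i,j-1}$.

```python
def needleman_wunsch(x: str, y: str, m: int = 3, k: int = 1, d: int = 2):
    """Global alignment by dynamic programming.

    Returns (optimal score F[M][N], aligned x, aligned y)."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr = [[None] * (N + 1) for _ in range(M + 1)]
    # Initialization: a prefix aligned entirely to gaps
    for i in range(1, M + 1):
        F[i][0], ptr[i][0] = -i * d, "LEFT"
    for j in range(1, N + 1):
        F[0][j], ptr[0][j] = -j * d, "UP"
    # Iteration: X_i is x[i - 1] because Python strings are 0-indexed
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = m if x[i - 1] == y[j - 1] else -k
            options = [
                (F[i - 1][j - 1] + s, "DIAG"),  # X_i aligned to Y_j
                (F[i - 1][j] - d, "LEFT"),      # X_i aligned to a gap
                (F[i][j - 1] - d, "UP"),        # Y_j aligned to a gap
            ]
            # max() keeps the first of equal maxima, so ties prefer DIAG
            F[i][j], ptr[i][j] = max(options, key=lambda t: t[0])
    # Termination: trace back from (M, N) to (0, 0)
    ax, ay = [], []
    i, j = M, N
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "LEFT":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:  # "UP"
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[M][N], "".join(reversed(ax)), "".join(reversed(ay))

print(needleman_wunsch("ACGT", "AGT"))  # (7, 'ACGT', 'A-GT')
```

The two nested loops fill (M+1) × (N+1) cells with a constant amount of work each, so time and memory are both O(MN).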
Brute-force alignment:
• Number of possible pairwise alignments of two length-L sequences: $\approx \dfrac{2^{2L}}{\sqrt{\pi L}}$
Needleman-Wunsch alignment:
• 3 additions and a max operation per matrix entry
• L × L matrix entries to compute
• → $O(L^2)$
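To see the gap between the two approaches numerically, this sketch assumes that the exact count behind the approximation is the binomial coefficient $\binom{2L}{L}$, whose Stirling approximation is $2^{2L}/\sqrt{\pi L}$. It compares that count with the L × L cells that dynamic programming fills. The function names are hypothetical.

```python
import math

def num_alignments(L: int) -> int:
    """Assumed exact count C(2L, L) of alignments of two length-L sequences."""
    return math.comb(2 * L, L)

def approx_alignments(L: int) -> float:
    """Stirling-based approximation 2^(2L) / sqrt(pi * L)."""
    return 2 ** (2 * L) / math.sqrt(math.pi * L)

for L in (10, 100, 300):
    exact = num_alignments(L)
    ratio = exact / approx_alignments(L)  # approaches 1 as L grows
    # Dynamic programming instead fills L * L cells (3 additions + 1 max each)
    print(L, f"{exact:.3e}", f"{ratio:.4f}", L * L)
```

Already at L = 100 the brute-force count is around 10^59, while the DP table has only 10,000 cells.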
X and Y.
Given $X_1 \ldots X_M$ and $Y_1 \ldots Y_N$, find i, j, k, l such that the score of the alignment between $X_i \ldots X_j$ and $Y_k \ldots Y_l$ is maximal.
Idea: if the alignment score becomes negative, it is better to start a new alignment, i.e., reset the score to 0.
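The reset-to-zero idea is easiest to see without gaps. This sketch deliberately simplifies local alignment. It fixes one diagonal (pairing `x[p]` with `y[p + offset]`), allows no gaps, and scans the column scores with a running total that restarts at 0 whenever it drops below 0. That is Kadane's maximum-subarray scan, swapped in here for illustration; it is not the full local alignment algorithm. It assumes match +3, mismatch -1, and the function name is hypothetical.

```python
def best_ungapped_segment(x: str, y: str, offset: int = 0, m: int = 3, k: int = 1):
    """Best gap-free local segment on one diagonal.

    Returns (score, start, end) with x[start:end] paired to y[start+offset:end+offset]."""
    lo, hi = max(0, -offset), min(len(x), len(y) - offset)
    best = (0, lo, lo)      # the empty segment scores 0
    run, start = 0, lo
    for p in range(lo, hi):
        run += m if x[p] == y[p + offset] else -k
        if run < 0:
            run, start = 0, p + 1  # negative score: start a new alignment
        elif run > best[0]:
            best = (run, start, p + 1)
    return best

print(best_ungapped_segment("TTACGTAA", "GGACGTCC"))  # (12, 2, 6): "ACGT"
```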
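Initialization:
$F_{0,j} = 0$ for $j = 0 \ldots N$; $F_{i,0} = 0$ for $i = 1 \ldots M$
Iteration:
for each i = 1…M
  for each j = 1…N
$$F_{i,j} = \max \begin{cases} F_{i-1,j-1} + s(X_i, Y_j) & [X_i \text{ aligned to } Y_j] \\ F_{i-1,j} - d & [X_i \text{ aligned to a gap}] \\ F_{i,j-1} - d & [Y_j \text{ aligned to a gap}] \\ 0 & \text{[start a new alignment]} \end{cases}$$
$$\mathrm{Ptr}_{i,j} = \begin{cases} \text{DIAG} & \text{if } [X_i \text{ aligned to } Y_j] \\ \text{LEFT} & \text{if } [X_i \text{ aligned to a gap}] \\ \text{UP} & \text{if } [Y_j \text{ aligned to a gap}] \end{cases}$$
(same pointer convention as for Needleman-Wunsch; no pointer is stored when the maximum is 0)
Termination:
The best local alignment score is the $F_{i,j}$ with the maximum value. The best local alignment path is traced back from the $\mathrm{Ptr}_{i,j}$ of that cell until a cell with $F_{i,j} = 0$ is reached.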
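A minimal Python sketch of the local-alignment pseudocode above (the Smith-Waterman recurrence). It assumes the example scores (m = 3, k = 1, d = 2) and `-` as the gap symbol; the function name `smith_waterman` and the tie-breaking order are illustrative choices. Compared with the global version, there are three changes: the first row and column stay 0, the extra 0 option restarts the alignment, and traceback begins at the highest-scoring cell and stops at a cell scoring 0.

```python
def smith_waterman(x: str, y: str, m: int = 3, k: int = 1, d: int = 2):
    """Local alignment. Returns (best score, aligned x segment, aligned y segment)."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]      # row 0 and column 0 stay 0
    ptr = [[None] * (N + 1) for _ in range(M + 1)]  # None = alignment starts here
    best, bi, bj = 0, 0, 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = m if x[i - 1] == y[j - 1] else -k
            # (0, None) is listed first, so a tie with 0 also ends the traceback
            F[i][j], ptr[i][j] = max(
                [(0, None),
                 (F[i - 1][j - 1] + s, "DIAG"),  # X_i aligned to Y_j
                 (F[i - 1][j] - d, "LEFT"),      # X_i aligned to a gap
                 (F[i][j - 1] - d, "UP")],       # Y_j aligned to a gap
                key=lambda t: t[0],
            )
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Trace back from the best cell until a cell without a pointer (score 0)
    ax, ay = [], []
    i, j = bi, bj
    while ptr[i][j] is not None:
        move = ptr[i][j]
        if move == "DIAG":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "LEFT":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:  # "UP"
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))

print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (12, 'ACGT', 'ACGT')
```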
occurred in a pair of homologous sequences since their last common ancestor.
• Placing gaps in sequences such that similarity is maximized.
• Dynamic programming: strategy to solve a complex problem by breaking it into simpler sub-problems.
• The Needleman-Wunsch algorithm uses a dynamic programming strategy to compute the optimal global alignment of two sequences.
Next up…
• Lecture 11: Sequence alignment continued