Lecture 10: Sequence alignment 1

shaunmahony
February 14, 2022

BMMB 554: Lecture 10

Transcript

  1. Today’s learning objectives
    1. Understand the goal of sequence alignments.
    2. Understand how dynamic programming can be applied to sequence alignment problems.
    • Relevant reading: Bioinformatics & Functional Genomics (Pevsner), Chapter 3
  2. Algorithm design techniques
    • Exhaustive search ("brute force"): examine every possible alternative to find a solution.
    • Greedy algorithms: choose the "most attractive" alternative at each iteration.
    • Divide-and-conquer algorithms: break the problem into non-overlapping subproblems, then stitch the solutions of the subproblems together to solve the larger problem.
    • Dynamic programming: break the problem into overlapping subproblems; remember the solutions of the subproblems and use them to construct solutions to larger problems.
    • Machine learning / statistical learning theory: learn the solution from observed data; typically models problems probabilistically.
  3. Dynamic programming
    • Break a problem into overlapping subproblems.
    • Remember solutions of subproblems, and use them to construct solutions to larger problems.
  4. Sequence alignment
    • Why would we want to align two sequences? To identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships.
    • Sequence alignment: modeling the evolutionary events that occurred in a pair of homologous sequences since their last common ancestor.
    • Events: substitutions, insertions, deletions.
  5. Sequence alignment: problem definition
    Given two sequence strings (X and Y), an alignment assigns gaps to the sequences such that each letter in X lines up with a letter or a gap in Y, and vice versa. The best alignment assigns gaps such that the similarity between the resulting strings is maximized.
    Input:
      X = TGAACTCCTACTGTAAG
      Y = TTGTTCTTACTGTCTAAG
    Aligned:
      X: -TGAACTCCTACTGT--AAG
      Y: TTGTTCT--TACTGTCTAAG
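The definition above can be checked mechanically. The following is a minimal sketch; the function name `is_valid_alignment` and the choice of "-" as the gap character are illustrative assumptions, not part of the lecture.

```python
def is_valid_alignment(x, y, x_aligned, y_aligned, gap="-"):
    """Return True if (x_aligned, y_aligned) is a gapped alignment of x and y."""
    # Both rows must have the same number of columns.
    if len(x_aligned) != len(y_aligned):
        return False
    # Removing the gaps must give back the original sequences.
    if x_aligned.replace(gap, "") != x or y_aligned.replace(gap, "") != y:
        return False
    # Every column pairs a letter with a letter or a gap, never a gap with a gap.
    return all(not (a == gap and b == gap) for a, b in zip(x_aligned, y_aligned))


x = "TGAACTCCTACTGTAAG"
y = "TTGTTCTTACTGTCTAAG"
print(is_valid_alignment(x, y, "-TGAACTCCTACTGT--AAG", "TTGTTCT--TACTGTCTAAG"))  # True
```

The check says nothing about how good an alignment is; that is the job of the scoring function.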
  6. Scoring alignments
    Two candidate alignments of the same pair of sequences:
    Alignment A (6 matches, 3 mismatches, 1 gap):
      AGGCTAGT-T
      AGCGAAGTAT
    Alignment B (7 matches, 1 mismatch, 3 gaps):
      AGGCTA-GT-T
      AG-CGAAGTAT
    Scoring function $s(X_i, Y_j)$:
    • $X_i = Y_j$ (match): $+m$
    • $X_i \neq Y_j$ (mismatch): $-k$
    • Gap penalty: $-d$
    Alignment score: $F = m \cdot (\text{number of matches}) - k \cdot (\text{number of mismatches}) - d \cdot (\text{number of gaps})$
    Example: m = 3, k = 1, d = 2 (match = +3, mismatch = -1, gap = -2)
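As a small illustration of the scoring function, the sketch below counts the columns of an alignment and scores it. The helper names are hypothetical, and the penalties are assumed to be given as positive magnitudes (m = 3, k = 1, d = 2), so that a match scores +3, a mismatch -1, and a gap -2.

```python
def count_columns(x_aligned, y_aligned, gap="-"):
    """Count (matches, mismatches, gaps) over the columns of an alignment."""
    matches = mismatches = gaps = 0
    for a, b in zip(x_aligned, y_aligned):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def alignment_score(x_aligned, y_aligned, m=3, k=1, d=2):
    """F = m * matches - k * mismatches - d * gaps."""
    matches, mismatches, gaps = count_columns(x_aligned, y_aligned)
    return matches * m - mismatches * k - gaps * d


print(count_columns("AGGCTAGT-T", "AGCGAAGTAT"))       # (6, 3, 1)
print(alignment_score("AGGCTAGT-T", "AGCGAAGTAT"))     # 13
print(count_columns("AGGCTA-GT-T", "AG-CGAAGTAT"))     # (7, 1, 3)
print(alignment_score("AGGCTA-GT-T", "AG-CGAAGTAT"))   # 14
```

Under this particular scheme the second alignment scores slightly higher; a harsher gap penalty would reverse the ranking.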
  7. [Figure: alignment matrix with sequences X and Y along its axes]
    X: TGAACTCCTACTGTAAG
    Y: TTGTTCTTACTGTCTAAG
  8. [Figure: alignment matrix for the aligned sequences]
    X: -TGAACTCCTACTGT--AAG
    Y: TTGTTCT--TACTGTCTAAG
  9. [Figure: alignment matrix for the aligned sequences]
    X: -TGAACTCCTACTGT--AAG
    Y: TTGTTCT--TACTGTCTAAG
  10. Four possible outcomes in aligning two sequences (B&FG 3e, Fig. 3-20, p. 97)
    1. Identity (stay along a diagonal)
    2. Mismatch (stay along a diagonal)
    3. Gap in one sequence (move vertically)
    4. Gap in the other sequence (move horizontally)
  11. Four possible outcomes in aligning two sequences (B&FG 3e, Fig. 3-20, p. 97)
    • Match (diagonal)
    • Mismatch (diagonal)
    • Gap in seq1 (vertical)
    • Gap in seq2 (horizontal)
  12. Alignment is additive
    If $X_1 \ldots X_i$ aligns to $Y_1 \ldots Y_j$ and $X_{i+1} \ldots X_M$ aligns to $Y_{j+1} \ldots Y_N$, then:
      $F(X_{1 \ldots M}, Y_{1 \ldots N}) = F(X_{1 \ldots i}, Y_{1 \ldots j}) + F(X_{i+1 \ldots M}, Y_{j+1 \ldots N})$
    So the original problem, aligning $X_1 \ldots X_M$ to $Y_1 \ldots Y_N$, can be decomposed into smaller subproblems of the form: align $X_1 \ldots X_i$ to $Y_1 \ldots Y_j$. We can therefore apply dynamic programming to solve it.
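Additivity holds because each column of the alignment is scored independently of the others. The sketch below checks this on the example alignment from slide 5, cutting it at every possible column; the helper name and the scoring values (m = 3, k = 1, d = 2) are assumptions chosen for illustration.

```python
def alignment_score(x_aligned, y_aligned, m=3, k=1, d=2, gap="-"):
    """Sum per-column scores: +m for a match, -k for a mismatch, -d for a gap."""
    score = 0
    for a, b in zip(x_aligned, y_aligned):
        if a == gap or b == gap:
            score -= d
        elif a == b:
            score += m
        else:
            score -= k
    return score


x_aln = "-TGAACTCCTACTGT--AAG"
y_aln = "TTGTTCT--TACTGTCTAAG"
whole = alignment_score(x_aln, y_aln)

# Cut the alignment after every column: the two halves always sum to the whole.
for cut in range(len(x_aln) + 1):
    left = alignment_score(x_aln[:cut], y_aln[:cut])
    right = alignment_score(x_aln[cut:], y_aln[cut:])
    assert left + right == whole

print(whole)  # 27
```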
  13. Needleman-Wunsch: dynamic programming
    • Needleman-Wunsch is guaranteed to find optimal alignments, even though the algorithm does not search all possible alignments.
    • It is an example of a dynamic programming algorithm: an optimal path (alignment) is identified by incrementally extending optimal subpaths. Thus, a decision is made at each step of the alignment to find the pair of residues with the best score. These solutions are then used by future steps of the algorithm.
  14. Global pairwise alignment using Needleman-Wunsch (B&FG 3e, Fig. 3-21, p. 98)
    Three possibilities:
    • $X_i$ aligns to $Y_j$: $F_{i,j} = F_{i-1,j-1} + s(X_i, Y_j)$
    • $X_i$ aligns to a gap: $F_{i,j} = F_{i-1,j} - d$
    • $Y_j$ aligns to a gap: $F_{i,j} = F_{i,j-1} - d$
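The three possibilities translate almost directly into a recursive function. The sketch below is one way to illustrate "remembering solutions of subproblems": it memoizes the recursion with `functools.lru_cache`. The function name `optimal_score`, the default scores (m = 3, k = 1, d = 2), and the top-down style are assumptions for illustration; the base cases treat a prefix aligned against nothing as all gaps.

```python
from functools import lru_cache


def optimal_score(x, y, m=3, k=1, d=2):
    """Best global alignment score of x and y via a memoized recurrence."""

    def s(a, b):
        # Substitution score: +m for a match, -k for a mismatch.
        return m if a == b else -k

    @lru_cache(maxsize=None)
    def F(i, j):
        # Base cases: a prefix aligned against an empty prefix is all gaps.
        if i == 0:
            return -j * d
        if j == 0:
            return -i * d
        return max(
            F(i - 1, j - 1) + s(x[i - 1], y[j - 1]),  # X_i aligns to Y_j
            F(i - 1, j) - d,                          # X_i aligns to a gap
            F(i, j - 1) - d,                          # Y_j aligns to a gap
        )

    return F(len(x), len(y))


print(optimal_score("ACGT", "ACT"))  # 7
```

Without the cache, the same `F(i, j)` would be recomputed many times; with it, each of the (M+1) x (N+1) subproblems is solved once. Deep recursion makes this form unsuitable for long sequences, which is one reason the table is usually filled iteratively.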
  15. Global pairwise alignment using Needleman-Wunsch (B&FG 3e, Fig. 3-21, p. 98)
    Here the best score involves +1 (proceed from the upper-left square to the gray, lower-right square). If we instead selected an alignment involving a gap, the score would be worse (-4).
  16. Global pairwise alignment using Needleman-Wunsch (B&FG 3e, Fig. 3-21, p. 98)
    Proceed to calculate the optimal score for the next position.
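  17. Needleman-Wunsch algorithm
    Initialization:
      $F_{0,0} = 0$
      $F_{0,j} = -j \cdot d$ for $j = 1 \ldots N$
      $F_{i,0} = -i \cdot d$ for $i = 1 \ldots M$
    Iteration: for each $i = 1 \ldots M$, for each $j = 1 \ldots N$:
      $F_{i,j} = \max \begin{cases} F_{i-1,j-1} + s(X_i, Y_j) & [X_i \text{ aligns to } Y_j] \\ F_{i-1,j} - d & [X_i \text{ aligns to a gap}] \\ F_{i,j-1} - d & [Y_j \text{ aligns to a gap}] \end{cases}$
      $\mathrm{Ptr}_{i,j} = \begin{cases} \text{DIAG} & \text{if } [X_i \text{ aligns to } Y_j] \\ \text{LEFT} & \text{if } [X_i \text{ aligns to a gap}] \\ \text{UP} & \text{if } [Y_j \text{ aligns to a gap}] \end{cases}$
    Termination: $F_{M,N}$ is the score of the optimal alignment. The alignment path can be traced back from $\mathrm{Ptr}_{M,N}$.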
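The initialization, iteration, and termination steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name, the default scores (m = 3, k = 1, d = 2), and the tie-breaking rule (prefer DIAG, then LEFT, then UP) are assumptions. X indexes the first dimension of the table, so LEFT means "came from F[i-1][j]", matching a matrix drawn with X across the columns.

```python
def needleman_wunsch(x, y, m=3, k=1, d=2):
    """Return (score, x_aligned, y_aligned) for an optimal global alignment."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr = [[None] * (N + 1) for _ in range(M + 1)]

    # Initialization: prefixes aligned entirely against gaps.
    for i in range(1, M + 1):
        F[i][0], ptr[i][0] = -i * d, "LEFT"
    for j in range(1, N + 1):
        F[0][j], ptr[0][j] = -j * d, "UP"

    # Iteration: fill each cell from its three neighbours.
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = m if x[i - 1] == y[j - 1] else -k
            candidates = [
                (F[i - 1][j - 1] + s, "DIAG"),  # X_i aligns to Y_j
                (F[i - 1][j] - d, "LEFT"),      # X_i aligns to a gap
                (F[i][j - 1] - d, "UP"),        # Y_j aligns to a gap
            ]
            # max() keeps the first candidate on ties, so DIAG is preferred.
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])

    # Termination: trace the pointers back from the bottom-right cell.
    xa, ya = [], []
    i, j = M, N
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":
            xa.append(x[i - 1]); ya.append(y[j - 1]); i -= 1; j -= 1
        elif move == "LEFT":
            xa.append(x[i - 1]); ya.append("-"); i -= 1
        else:  # "UP"
            xa.append("-"); ya.append(y[j - 1]); j -= 1
    return F[M][N], "".join(reversed(xa)), "".join(reversed(ya))


print(needleman_wunsch("ACGT", "ACT"))  # (7, 'ACGT', 'AC-T')
```

Several alignments can share the optimal score; a different tie-breaking order would return a different but equally good path.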
  18. Global alignment
    [Figure: alignment matrix for the aligned sequences]
    X: -TGAACTCCTACTGT--AAG
    Y: TTGTTCT--TACTGTCTAAG
  19. Problem: align ACGT vs ACT
    Scoring scheme: match = +3, mismatch = -1, gap = -2
    [Figure: empty alignment matrix with ACGT and ACT along its axes]
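  20. Example global alignment
    Problem: align ACGT vs ACT
    Scoring scheme: match = +3, mismatch = -1, gap = -2
    Completed score matrix (ACGT across the columns, ACT down the rows):
               A    C    G    T
          0   -2   -4   -6   -8
      A  -2   +3   +1   -1   -3
      C  -4   +1   +6   +4   +2
      T  -6   -1   +4   +5   +7
    The optimal global alignment score is the bottom-right entry, +7.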
  21. Algorithmic complexity
    Given two sequences of length L:
    • Brute-force alignment: the number of possible pairwise alignments is approximately $2^{2L} / \sqrt{2 \pi L}$.
    • Needleman-Wunsch alignment: 3 summations and a max operation per matrix entry, with L × L matrix entries to compute → $O(L^2)$.
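To get a feel for the gap between the two growth rates, the sketch below tabulates both for a few lengths. It assumes the brute-force count on the slide reads $2^{2L} / \sqrt{2 \pi L}$ (the expression is garbled in the transcript, so this reading is an assumption), and the function names are hypothetical.

```python
import math


def brute_force_estimate(L):
    """Approximate number of pairwise alignments, assuming 2^(2L) / sqrt(2*pi*L)."""
    return 2 ** (2 * L) / math.sqrt(2 * math.pi * L)


def needleman_wunsch_cells(L):
    """Number of matrix entries Needleman-Wunsch fills for two length-L sequences."""
    return L * L


# Exponential growth versus quadratic growth for a few small lengths.
for L in (5, 10, 20, 50):
    print(f"L={L:3d}  brute force ~ {brute_force_estimate(L):.3g}  "
          f"NW cells = {needleman_wunsch_cells(L)}")
```

Even at L = 50 the brute-force estimate is astronomically larger than the 2,500 cells the dynamic programming table needs.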
  22. Global alignment
    [Figure: alignment matrix for the aligned sequences]
    X: -TGAACTCCTACTGT--AAG
    Y: TTGTTCT--TACTGTCTAAG
  23. Local alignment
    [Figure: alignment matrix with sequences X and Y along its axes]
    X: ACCGATGTACTGTAGGT
    Y: GAGTCTACTGTTTAATC
  24. Local alignment (Smith-Waterman)
    Problem: find optimal alignments between subsequences of X and Y. Given $X_1 \ldots X_M$ and $Y_1 \ldots Y_N$, find $i, j, k, l$ such that the score of the alignment between $X_i \ldots X_j$ and $Y_k \ldots Y_l$ is maximal.
    Idea: if the alignment score becomes negative, it is better to start a new alignment, i.e. set the score to 0.
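  25. Smith-Waterman algorithm
    Initialization:
      $F_{0,0} = 0$
      $F_{0,j} = 0$ for $j = 1 \ldots N$
      $F_{i,0} = 0$ for $i = 1 \ldots M$
    Iteration: for each $i = 1 \ldots M$, for each $j = 1 \ldots N$:
      $F_{i,j} = \max \begin{cases} F_{i-1,j-1} + s(X_i, Y_j) & [X_i \text{ aligns to } Y_j] \\ F_{i-1,j} - d & [X_i \text{ aligns to a gap}] \\ F_{i,j-1} - d & [Y_j \text{ aligns to a gap}] \\ 0 & [\text{start a new alignment}] \end{cases}$
      $\mathrm{Ptr}_{i,j} = \begin{cases} \text{DIAG} & \text{if } [X_i \text{ aligns to } Y_j] \\ \text{LEFT} & \text{if } [X_i \text{ aligns to a gap}] \\ \text{UP} & \text{if } [Y_j \text{ aligns to a gap}] \end{cases}$
    Termination: the best local alignment score is the $F_{i,j}$ with maximum value. The best local alignment path can be traced back from the $\mathrm{Ptr}_{i,j}$ corresponding to the maximum $F_{i,j}$, stopping at the first cell whose score is 0.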
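A minimal Python sketch of these steps follows. The function name, the default scores (match +3, mismatch -3, gap -4, given as magnitudes m, k, d), and the tie-breaking order are assumptions for illustration. A cell whose best option is 0 gets no pointer, which is what stops the traceback.

```python
def smith_waterman(x, y, m=3, k=3, d=4):
    """Return (score, x_aligned, y_aligned) for one best local alignment."""
    M, N = len(x), len(y)
    # Initialization: the first row and column stay at 0 with no pointer.
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr = [[None] * (N + 1) for _ in range(M + 1)]
    best, best_i, best_j = 0, 0, 0

    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = m if x[i - 1] == y[j - 1] else -k
            candidates = [
                (0, None),                      # start a new alignment here
                (F[i - 1][j - 1] + s, "DIAG"),  # X_i aligns to Y_j
                (F[i - 1][j] - d, "LEFT"),      # X_i aligns to a gap
                (F[i][j - 1] - d, "UP"),        # Y_j aligns to a gap
            ]
            # On ties max() keeps the first candidate, so a score of 0 never
            # carries a pointer.
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])
            if F[i][j] > best:
                best, best_i, best_j = F[i][j], i, j

    # Termination: trace back from the maximum cell until a cell with no pointer.
    xa, ya = [], []
    i, j = best_i, best_j
    while ptr[i][j] is not None:
        move = ptr[i][j]
        if move == "DIAG":
            xa.append(x[i - 1]); ya.append(y[j - 1]); i -= 1; j -= 1
        elif move == "LEFT":
            xa.append(x[i - 1]); ya.append("-"); i -= 1
        else:  # "UP"
            xa.append("-"); ya.append(y[j - 1]); j -= 1
    return best, "".join(reversed(xa)), "".join(reversed(ya))


print(smith_waterman("TACGT", "ACT"))  # (6, 'AC', 'AC')
```

Compared with the global version, only three things change: the zero initialization, the extra 0 option in the max, and where the traceback starts and stops.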
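  26. Example local alignment
    Problem: align TACGT vs ACT
    Scoring scheme: match = +3, mismatch = -3, gap = -4
    Completed score matrix (TACGT across the columns, ACT down the rows):
               T    A    C    G    T
          0    0    0    0    0    0
      A   0    0   +3    0    0    0
      C   0    0    0   +6   +2    0
      T   0   +3    0   +2   +3   +5
    The best local alignment score is the maximum entry, +6.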
  27. Summary
    • Sequence alignment: modeling the evolutionary events that occurred in a pair of homologous sequences since their last common ancestor; placing gaps in sequences such that similarity is maximized.
    • Dynamic programming: a strategy to solve a complex problem by breaking it into simpler subproblems.
    • The Needleman-Wunsch algorithm uses a dynamic programming strategy to compute the optimal global alignment of two sequences.
    Next up: Lecture 11: Sequence alignment continued