
Differentiable sequence alignment


Michiel Stock

May 11, 2021



Transcript

  1. DIFFERENTIABLE SEQUENCE ALIGNMENT

     Michiel Stock, Dimitri Boeckaerts, Steff Taelman & Wim Van Criekinge
     @michielstock, [email protected], KERMIT
     Photo by Andrew Schultz on Unsplash
  2. DIFFERENTIABLE COMPUTING

     The deep learning revolution was largely made possible by automatic differentiation, which makes computing gradients easy, accurate and performant. Differentiable computing is a paradigm in which gradients are used in general computer programs (e.g. the neural Turing machine).

     Our goal: given a sequence alignment algorithm with parameters θ that aligns two sequences s and t and yields an alignment score v, how does this score change under an infinitesimal change of the parameters?
  3. DIFFERENTIATING A MAXIMUM

     Sequence alignment is just dynamic programming, i.e. computing a bunch of maxima over subproblems. Why can't we differentiate that?

     Example: the maximum of 0.12, 0.90, 0.22, 0.85, 0.43, 0.77 is 0.90. The partial derivatives ∂/∂xᵢ of the maximum correspond to the argmax, i.e. 0, 1, 0, 0, 0, 0: the gradient only uses the identity of the largest element, a great loss of information!

     Solution: create a smoothed maximum operator with better partial derivatives, e.g. 0, 0.69, 0, 0.26, 0, 0.05.
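     To make the loss of information concrete, here is a minimal Julia sketch (not from the slides) of the ordinary maximum and its one-hot gradient on the numbers above:

     x = [0.12, 0.90, 0.22, 0.85, 0.43, 0.77]

     # (sub)gradient of the ordinary maximum: a one-hot vector at the argmax
     hardmax_grad(x) = (g = zeros(length(x)); g[argmax(x)] = 1.0; g)

     maximum(x)       # 0.9
     hardmax_grad(x)  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]: only the identity of the largest element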
  4. SMOOTH MAXIMUM OPERATORS

     max_Ω(x) = max_{q ∈ Δⁿ⁻¹} ⟨q, x⟩ − Ω(q),   with Ω a convex regularizer

     regular maximum:    Ω(q) = 0
     negative entropy:   Ω(q) = γ Σᵢ qᵢ log(qᵢ)
     ℓ₂ regularization:  Ω(q) = γ Σᵢ qᵢ²
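     The alignment code below takes a max_argmaxᵧ operator as its first argument. As a minimal sketch (my assumption, not necessarily the implementation used in the deck or in DiffDynProg.jl), the entropic regularizer admits a closed form: the smooth maximum is γ log Σᵢ exp(xᵢ/γ) and its gradient is the softmax of x/γ.

     # entropic smooth maximum and its gradient (soft argmax); a sketch assuming
     # Ω(q) = γ Σᵢ qᵢ log(qᵢ), for which maxΩ(x) = γ log Σᵢ exp(xᵢ/γ)
     function max_argmaxᵧ(x; γ=1.0)
         m = maximum(x)               # shift for numerical stability
         e = exp.((x .- m) ./ γ)
         v = m + γ * log(sum(e))      # smooth maximum
         q = e ./ sum(e)              # soft argmax = gradient of the smooth maximum
         return v, q
     end

     max_argmaxᵧ((0.12, 0.90, 0.22, 0.85, 0.43, 0.77), γ=0.05)
     # ≈ (0.92, (0.0, 0.69, 0.0, 0.26, 0.0, 0.05)), close to the smoothed gradient on
     # the previous slide (the value γ = 0.05 is a guess)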
  5. DIFFERENTIATING NEEDLEMAN-WUNSCH (GLOBAL ALIGNMENT)

     θ: substitution scores. In general, θ is obtained from the substitution matrix S, i.e. θᵢⱼ = S[sᵢ, tⱼ].

     D: dynamic programming matrix, filled with the smoothed recursion
        Dᵢ₊₁,ⱼ₊₁ = max_Ω(Dᵢ₊₁,ⱼ − cˢᵢ, Dᵢⱼ + θᵢⱼ, Dᵢ,ⱼ₊₁ − cᵗⱼ)

     E: gradient matrix, Eᵢⱼ = ∂Dₙ₊₁,ₘ₊₁ / ∂Dᵢⱼ.
  6. DIFFERENTIATING NEEDLEMAN-WUNSCH

     function ∇needleman_wunsch(max_argmaxᵧ, θ, (cˢ, cᵗ))
         n, m = size(θ)
         # initialize arrays for dynamic programming (D), backtracking (Q) and the gradient (E)
         D = zeros(n+1, m+1)            # dynamic programming matrix
         D[2:n+1, 1] .= -cumsum(cˢ)     # cost of starting with gaps in s
         D[1, 2:m+1] .= -cumsum(cᵗ)     # cost of starting with gaps in t
         E = zeros(n+2, m+2)            # matrix for the gradient
         E[n+2, m+2] = 1.0
         Q = zeros(n+2, m+2, 3)         # matrix for backtracking
         Q[n+2, m+2, 2] = 1.0
         # forward pass, performing dynamic programming:
         # compute the optimal local choice and store the soft maximum and its gradient
         for i in 1:n, j in 1:m
             v, q = max_argmaxᵧ((D[i+1, j] - cˢ[i],   # gap in first sequence
                                 D[i, j] + θ[i, j],    # extending the alignment
                                 D[i, j+1] - cᵗ[j]))   # gap in second sequence
             D[i+1, j+1] = v           # store smooth max
             Q[i+1, j+1, :] .= q       # store directions
         end
         # backtracking through the directions to compute the gradient
         for i in n:-1:1, j in m:-1:1
             E[i+1, j+1] = Q[i+1, j+2, 1] * E[i+1, j+2] +
                           Q[i+2, j+2, 2] * E[i+2, j+2] +
                           Q[i+2, j+1, 3] * E[i+2, j+1]
         end
         return D[n+1, m+1], E[2:n+1, 2:m+1]   # value and gradient
     end
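     A usage sketch (everything here is illustrative: the sequences, the +2/−1 substitution scores, the unit gap costs and the hypothetical max_argmaxᵧ from above are my choices, not the deck's):

     s, t = "GATTACA", "GCATGCA"

     # toy substitution scores: +2 for a match, -1 for a mismatch
     θ = [a == b ? 2.0 : -1.0 for a in s, b in t]

     cˢ = fill(1.0, length(s))    # gap costs along s
     cᵗ = fill(1.0, length(t))    # gap costs along t

     v, E = ∇needleman_wunsch(x -> max_argmaxᵧ(x; γ=0.1), θ, (cˢ, cᵗ))
     # v: smoothed global alignment score
     # E: ∂v/∂Dᵢⱼ, ready to be propagated further (e.g. towards θ, see slide 10)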
  7. PLAYING WITH THE MAXIMUM OPERATOR

     Regular maximum, Ω(q) = 0 (or γ → 0): recovers the vanilla alignment algorithm.
     Squared maximum, Ω(q) = γ Σᵢ qᵢ²: yields sparser smoothing.
     Entropic maximum, Ω(q) = γ Σᵢ qᵢ log(qᵢ), with large γ: more regularization encourages random-walk behaviour.
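     Using the hypothetical entropic operator sketched earlier, the effect of γ is easy to see on the toy numbers:

     x = (0.12, 0.90, 0.22, 0.85, 0.43, 0.77)

     max_argmaxᵧ(x, γ=1e-3)[2]   # ≈ one-hot: recovers the vanilla (arg)max
     max_argmaxᵧ(x, γ=0.05)[2]   # mass concentrated on the few largest entries
     max_argmaxᵧ(x, γ=10.0)[2]   # ≈ uniform: strong regularization spreads the gradient

     This only illustrates the entropic case; the squared regularizer requires a Euclidean projection onto the simplex and is not sketched here.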
  8. DIFFERENTIATING SMITH-WATERMAN (LOCAL ALIGNMENT)

     θ: substitution scores, as before.

     D: dynamic programming matrix, now with a fourth option to restart the alignment:
        Dᵢ₊₁,ⱼ₊₁ = max_Ω(Dᵢ₊₁,ⱼ − cˢᵢ, Dᵢⱼ + θᵢⱼ, Dᵢ,ⱼ₊₁ − cᵗⱼ, 0)

     M: soft argmax over the whole matrix: v = max_Ω(D) and M = ∇_D max_Ω(D).

     E: gradient matrix, Eᵢⱼ = ∂v / ∂Dᵢⱼ.
  9. DIFFERENTIATING SMITH-WATERMAN

     function ∇smith_waterman(max_argmaxᵧ, θ, (cˢ, cᵗ))
         n, m = size(θ)
         # initialize arrays for dynamic programming (D), backtracking (Q) and the gradient (E)
         D = zeros(n+1, m+1)         # dynamic programming matrix
         E = zeros(n+2, m+2)         # matrix for the gradient
         Q = zeros(n+2, m+2, 3)      # matrix for backtracking
         # forward pass: compute the optimal local choice and store the soft maximum and its gradient
         for i in 1:n, j in 1:m
             v, q = max_argmaxᵧ((D[i+1, j] - cˢ[i],   # gap in first sequence
                                 D[i, j] + θ[i, j],    # extending the alignment
                                 D[i, j+1] - cᵗ[j],    # gap in second sequence
                                 0.0))                 # restarting the alignment
             D[i+1, j+1] = v                        # store smooth max
             Q[i+1, j+1, :] .= q[1], q[2], q[3]     # store directions
         end
         # take the smooth maximum of D and its gradient
         v, M = max_argmaxᵧ(D[2:n+1, 2:m+1])
         # backtracking through the directions to compute the gradient
         for i in n:-1:1, j in m:-1:1
             E[i+1, j+1] = M[i, j] +                      # contribution to v
                           Q[i+1, j+2, 1] * E[i+1, j+2] +
                           Q[i+2, j+2, 2] * E[i+2, j+2] +
                           Q[i+2, j+1, 3] * E[i+2, j+1]
         end
         return v, E[2:n+1, 2:m+1]   # value and gradient
     end
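     With the same illustrative inputs as in the global sketch above, the local variant is called identically:

     v_local, E_local = ∇smith_waterman(x -> max_argmaxᵧ(x; γ=0.1), θ, (cˢ, cᵗ))
     # v_local: smoothed local alignment score (the extra 0.0 branch allows restarting
     #          the alignment, as in vanilla Smith-Waterman)
     # E_local: ∂v_local/∂Dᵢⱼ, now accumulating the contributions M[i,j] of every cell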
  10. PROPAGATING GRADIENTS OF ALIGNMENT SCORES

     Up to now, we computed the gradient of the alignment score w.r.t. the DP matrix, ∂v/∂Dᵢⱼ.

     By applying the chain rule to the DP update rules, we can easily obtain the derivatives w.r.t. the parameters. For example, θᵢⱼ only enters the update of Dᵢ₊₁,ⱼ₊₁, so ∂v/∂θᵢⱼ = (∂v/∂Dᵢ₊₁,ⱼ₊₁)(∂Dᵢ₊₁,ⱼ₊₁/∂θᵢⱼ). Each entry of E sums the contributions of the three directions: a gap in s, a substitution and a gap in t.

     Autodiff can propagate these gradients further, e.g. towards the substitution matrix or as part of a larger artificial neural network!
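     As a concrete instance (a sketch, assuming the full gradient matrix E and the soft argmax weights Q from inside the forward-backward pass are kept around rather than discarded): the substitution branch of cell (i+1, j+1) is the only place θᵢⱼ enters, so its soft-argmax weight is exactly ∂Dᵢ₊₁,ⱼ₊₁/∂θᵢⱼ.

     # gradient of the alignment score w.r.t. the substitution scores:
     # ∂v/∂θ[i,j] = ∂v/∂D[i+1,j+1] * ∂D[i+1,j+1]/∂θ[i,j] = E[i+1,j+1] * Q[i+1,j+1,2]
     ∇θ = [E[i+1, j+1] * Q[i+1, j+1, 2] for i in 1:n, j in 1:m]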
  11. COMPUTATION TIME

     Running time in seconds, 32-bit precision (excluding compilation and array initialization):

     operator   length   NW        NW + grad   SW        SW + grad
     max          10     0.00001   0.00002     0.00001   0.00002
     max         100     0.00004   0.00064     0.00019   0.00062
     max         500     0.00109   0.01930     0.00673   0.02339
     max        1000     0.00370   0.06548     0.01511   0.06109
     entropy      10     0.00002   0.00002     0.00002   0.00002
     entropy     100     0.00077   0.00113     0.00104   0.00145
     entropy     500     0.01925   0.02630     0.02884   0.04001
     entropy    1000     0.06677   0.09468     0.08752   0.12903
     squared      10     0.00002   0.00001     0.00002   0.00002
     squared     100     0.00046   0.00053     0.00079   0.00092
     squared     500     0.01149   0.01384     0.01856   0.02207
     squared    1000     0.04039   0.04867     0.07319   0.08620
  12. GRADIENTS AVAILABLE VIA CHAINRULES.JL

     Repo: https://github.com/MichielStock/DiffDynProg.jl

     Custom adjoints are provided via ChainRulesCore.jl (and are thus interoperable with various automatic differentiation libraries).
     Compute derivatives of arbitrary pieces of Julia code.
     Interoperable with, e.g., bioinformatics libraries.
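     As an illustration of the mechanism (hypothetical names; the actual adjoints in DiffDynProg.jl wrap the full alignment routines and may look different), here is the ChainRulesCore.jl pattern applied to the entropic smooth maximum: the pullback reuses the closed-form gradient instead of differentiating through the computation.

     using ChainRulesCore

     # hypothetical standalone helper: entropic smooth maximum
     smoothmaxᵧ(x, γ) = (m = maximum(x); m + γ * log(sum(exp.((x .- m) ./ γ))))

     function ChainRulesCore.rrule(::typeof(smoothmaxᵧ), x, γ)
         m = maximum(x)
         e = exp.((x .- m) ./ γ)
         v = m + γ * log(sum(e))
         q = e ./ sum(e)            # closed-form gradient: the soft argmax
         # pullback: scale the precomputed gradient by the incoming cotangent;
         # γ is treated as a non-differentiated hyperparameter here
         smoothmax_pullback(Δv) = (NoTangent(), Δv .* q, NoTangent())
         return v, smoothmax_pullback
     end

     With such a rule in place, autodiff libraries that consume ChainRules call the pullback instead of tracing through the loops, which is how hand-computed gradients like E above can be exposed to larger models.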
  13. CONCLUSION

     Our work builds upon the framework of Mensch and Blondel:

     Mensch, A., & Blondel, M. (2018). Differentiable dynamic programming for structured prediction and attention. https://arxiv.org/pdf/1802.03676.pdf

     Differentiable sequence alignment is a natural generalization of vanilla alignment.