
Differentiable sequence alignment


Michiel Stock

May 11, 2021



Transcript

  1. DIFFERENTIABLE SEQUENCE ALIGNMENT

     Michiel Stock, Dimitri Boeckaerts, Steff Taelman & Wim Van Criekinge
     @michielstock, [email protected], KERMIT
     Photo by Andrew Schultz on Unsplash
  2. DIFFERENTIABLE COMPUTING

     The deep learning revolution was largely made possible by automatic differentiation, which makes computing gradients easy, accurate and performant. Differentiable computing is a paradigm in which gradients are used in general computer programs (e.g. the neural Turing machine).

     Our goal: given a sequence alignment algorithm with parameters θ that aligns two sequences s and t and yields an alignment score v, how does this score change under an infinitesimal change of the parameters?
  3. DIFFERENTIATING A MAXIMUM

     Sequence alignment is just dynamic programming, i.e. computing a bunch of maxima over subproblems. Why can't we differentiate that?

     Example: the maximum of 0.12, 0.90, 0.22, 0.85, 0.43, 0.77 is 0.90. The partial derivatives ∂/∂xᵢ of the maximum correspond to the argmax, i.e. 0, 1, 0, 0, 0, 0: the gradient only uses the identity of the largest element, a great loss of information!

     Solution: create a smoothed maximum operator with better partial derivatives, e.g. 0, 0.69, 0, 0.26, 0, 0.05.
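     To make the loss of information concrete, here is a minimal Julia sketch (not from the slides) of the ordinary maximum and its one-hot gradient on the numbers above:

     x = [0.12, 0.90, 0.22, 0.85, 0.43, 0.77]

     # (sub)gradient of the ordinary maximum: a one-hot vector at the argmax
     hardmax_grad(x) = (g = zeros(length(x)); g[argmax(x)] = 1.0; g)

     maximum(x)       # 0.9
     hardmax_grad(x)  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]: only the identity of the largest element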
  4. SMOOTH MAXIMUM OPERATORS

     max_Ω(x) = max_{q ∈ Δⁿ⁻¹} ⟨q, x⟩ − Ω(q),   with Ω a convex regularizer

     regular maximum:    Ω(q) = 0
     negative entropy:   Ω(q) = γ Σᵢ qᵢ log(qᵢ)
     ℓ₂ regularization:  Ω(q) = γ Σᵢ qᵢ²
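     The alignment code below takes a max_argmaxᵧ operator as its first argument. As a minimal sketch (my assumption, not necessarily the implementation used in the deck or in DiffDynProg.jl), the entropic regularizer admits a closed form: the smooth maximum is γ log Σᵢ exp(xᵢ/γ) and its gradient is the softmax of x/γ.

     # entropic smooth maximum and its gradient (soft argmax); a sketch assuming
     # Ω(q) = γ Σᵢ qᵢ log(qᵢ), for which maxΩ(x) = γ log Σᵢ exp(xᵢ/γ)
     function max_argmaxᵧ(x; γ=1.0)
         m = maximum(x)               # shift for numerical stability
         e = exp.((x .- m) ./ γ)
         v = m + γ * log(sum(e))      # smooth maximum
         q = e ./ sum(e)              # soft argmax = gradient of the smooth maximum
         return v, q
     end

     max_argmaxᵧ((0.12, 0.90, 0.22, 0.85, 0.43, 0.77), γ=0.05)
     # ≈ (0.92, (0.0, 0.69, 0.0, 0.26, 0.0, 0.05)), close to the smoothed gradient on
     # the previous slide (the value γ = 0.05 is a guess)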
  5. DIFFERENTIATING NEEDLEMAN-WUNSCH (GLOBAL ALIGNMENT)

     θ: substitution scores. In general, θ is obtained from the substitution matrix S, i.e. θᵢⱼ = S[sᵢ, tⱼ].

     D: dynamic programming matrix, filled with the smoothed recursion
        Dᵢ₊₁,ⱼ₊₁ = max_Ω(Dᵢ₊₁,ⱼ − cˢᵢ, Dᵢⱼ + θᵢⱼ, Dᵢ,ⱼ₊₁ − cᵗⱼ)

     E: gradient matrix, Eᵢⱼ = ∂Dₙ₊₁,ₘ₊₁ / ∂Dᵢⱼ.
  6. DIFFERENTIATING NEEDLEMAN-WUNSCH

     function ∇needleman_wunsch(max_argmaxᵧ, θ, (cˢ, cᵗ))
         n, m = size(θ)
         # initialize arrays for dynamic programming (D), backtracking (Q) and the gradient (E)
         D = zeros(n+1, m+1)            # dynamic programming matrix
         D[2:n+1, 1] .= -cumsum(cˢ)     # cost of starting with gaps in s
         D[1, 2:m+1] .= -cumsum(cᵗ)     # cost of starting with gaps in t
         E = zeros(n+2, m+2)            # matrix for the gradient
         E[n+2, m+2] = 1.0
         Q = zeros(n+2, m+2, 3)         # matrix for backtracking
         Q[n+2, m+2, 2] = 1.0
         # forward pass, performing dynamic programming:
         # compute the optimal local choice and store the soft maximum and its gradient
         for i in 1:n, j in 1:m
             v, q = max_argmaxᵧ((D[i+1, j] - cˢ[i],   # gap in first sequence
                                 D[i, j] + θ[i, j],    # extending the alignment
                                 D[i, j+1] - cᵗ[j]))   # gap in second sequence
             D[i+1, j+1] = v           # store smooth max
             Q[i+1, j+1, :] .= q       # store directions
         end
         # backtracking through the directions to compute the gradient
         for i in n:-1:1, j in m:-1:1
             E[i+1, j+1] = Q[i+1, j+2, 1] * E[i+1, j+2] +
                           Q[i+2, j+2, 2] * E[i+2, j+2] +
                           Q[i+2, j+1, 3] * E[i+2, j+1]
         end
         return D[n+1, m+1], E[2:n+1, 2:m+1]   # value and gradient
     end
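     A usage sketch (everything here is illustrative: the sequences, the +2/−1 substitution scores, the unit gap costs and the hypothetical max_argmaxᵧ from above are my choices, not the deck's):

     s, t = "GATTACA", "GCATGCA"

     # toy substitution scores: +2 for a match, -1 for a mismatch
     θ = [a == b ? 2.0 : -1.0 for a in s, b in t]

     cˢ = fill(1.0, length(s))    # gap costs along s
     cᵗ = fill(1.0, length(t))    # gap costs along t

     v, E = ∇needleman_wunsch(x -> max_argmaxᵧ(x; γ=0.1), θ, (cˢ, cᵗ))
     # v: smoothed global alignment score
     # E: ∂v/∂Dᵢⱼ, ready to be propagated further (e.g. towards θ, see slide 10)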
  7. PLAYING WITH THE MAXIMUM OPERATOR

     Regular maximum, Ω(q) = 0 (or γ → 0): recovers the vanilla alignment algorithm.
     Squared maximum, Ω(q) = γ Σᵢ qᵢ²: yields sparser smoothing.
     Entropic maximum, Ω(q) = γ Σᵢ qᵢ log(qᵢ), with large γ: more regularization encourages random-walk behaviour.
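     Using the hypothetical entropic operator sketched earlier, the effect of γ is easy to see on the toy numbers:

     x = (0.12, 0.90, 0.22, 0.85, 0.43, 0.77)

     max_argmaxᵧ(x, γ=1e-3)[2]   # ≈ one-hot: recovers the vanilla (arg)max
     max_argmaxᵧ(x, γ=0.05)[2]   # mass concentrated on the few largest entries
     max_argmaxᵧ(x, γ=10.0)[2]   # ≈ uniform: strong regularization spreads the gradient

     This only illustrates the entropic case; the squared regularizer requires a Euclidean projection onto the simplex and is not sketched here.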
  8. DIFFERENTIATING SMITH-WATERMAN (LOCAL ALIGNMENT)

     θ: substitution scores, as before.

     D: dynamic programming matrix, now with a fourth option to restart the alignment:
        Dᵢ₊₁,ⱼ₊₁ = max_Ω(Dᵢ₊₁,ⱼ − cˢᵢ, Dᵢⱼ + θᵢⱼ, Dᵢ,ⱼ₊₁ − cᵗⱼ, 0)

     M: soft argmax over the whole matrix: v = max_Ω(D) and M = ∇_D max_Ω(D).

     E: gradient matrix, Eᵢⱼ = ∂v / ∂Dᵢⱼ.
  9. DIFFERENTIATING SMITH-WATERMAN

     function ∇smith_waterman(max_argmaxᵧ, θ, (cˢ, cᵗ))
         n, m = size(θ)
         # initialize arrays for dynamic programming (D), backtracking (Q) and the gradient (E)
         D = zeros(n+1, m+1)         # dynamic programming matrix
         E = zeros(n+2, m+2)         # matrix for the gradient
         Q = zeros(n+2, m+2, 3)      # matrix for backtracking
         # forward pass: compute the optimal local choice and store the soft maximum and its gradient
         for i in 1:n, j in 1:m
             v, q = max_argmaxᵧ((D[i+1, j] - cˢ[i],   # gap in first sequence
                                 D[i, j] + θ[i, j],    # extending the alignment
                                 D[i, j+1] - cᵗ[j],    # gap in second sequence
                                 0.0))                 # restarting the alignment
             D[i+1, j+1] = v                        # store smooth max
             Q[i+1, j+1, :] .= q[1], q[2], q[3]     # store directions
         end
         # take the smooth maximum of D and its gradient
         v, M = max_argmaxᵧ(D[2:n+1, 2:m+1])
         # backtracking through the directions to compute the gradient
         for i in n:-1:1, j in m:-1:1
             E[i+1, j+1] = M[i, j] +                      # contribution to v
                           Q[i+1, j+2, 1] * E[i+1, j+2] +
                           Q[i+2, j+2, 2] * E[i+2, j+2] +
                           Q[i+2, j+1, 3] * E[i+2, j+1]
         end
         return v, E[2:n+1, 2:m+1]   # value and gradient
     end
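     With the same illustrative inputs as in the global sketch above, the local variant is called identically:

     v_local, E_local = ∇smith_waterman(x -> max_argmaxᵧ(x; γ=0.1), θ, (cˢ, cᵗ))
     # v_local: smoothed local alignment score (the extra 0.0 branch allows restarting
     #          the alignment, as in vanilla Smith-Waterman)
     # E_local: ∂v_local/∂Dᵢⱼ, now accumulating the contributions M[i,j] of every cell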
  10. PROPAGATING GRADIENTS OF ALIGNMENT SCORES

     Up to now, we computed the gradient of the alignment score w.r.t. the DP matrix, ∂v/∂Dᵢⱼ.

     By applying the chain rule to the DP update rules, we can easily obtain the derivatives w.r.t. the parameters. For example, θᵢⱼ only enters the update of Dᵢ₊₁,ⱼ₊₁, so ∂v/∂θᵢⱼ = (∂v/∂Dᵢ₊₁,ⱼ₊₁)(∂Dᵢ₊₁,ⱼ₊₁/∂θᵢⱼ). Each entry of E sums the contributions of the three directions: a gap in s, a substitution and a gap in t.

     Autodiff can propagate these gradients further, e.g. towards the substitution matrix or as part of a larger artificial neural network!
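     As a concrete instance (a sketch, assuming the full gradient matrix E and the soft argmax weights Q from inside the forward-backward pass are kept around rather than discarded): the substitution branch of cell (i+1, j+1) is the only place θᵢⱼ enters, so its soft-argmax weight is exactly ∂Dᵢ₊₁,ⱼ₊₁/∂θᵢⱼ.

     # gradient of the alignment score w.r.t. the substitution scores:
     # ∂v/∂θ[i,j] = ∂v/∂D[i+1,j+1] * ∂D[i+1,j+1]/∂θ[i,j] = E[i+1,j+1] * Q[i+1,j+1,2]
     ∇θ = [E[i+1, j+1] * Q[i+1, j+1, 2] for i in 1:n, j in 1:m]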
  11. COMPUTATION TIME

     Running time in seconds, 32-bit precision (excluding compilation and array initialization):

     operator   length   NW        NW + grad   SW        SW + grad
     max          10     0.00001   0.00002     0.00001   0.00002
     max         100     0.00004   0.00064     0.00019   0.00062
     max         500     0.00109   0.01930     0.00673   0.02339
     max        1000     0.00370   0.06548     0.01511   0.06109
     entropy      10     0.00002   0.00002     0.00002   0.00002
     entropy     100     0.00077   0.00113     0.00104   0.00145
     entropy     500     0.01925   0.02630     0.02884   0.04001
     entropy    1000     0.06677   0.09468     0.08752   0.12903
     squared      10     0.00002   0.00001     0.00002   0.00002
     squared     100     0.00046   0.00053     0.00079   0.00092
     squared     500     0.01149   0.01384     0.01856   0.02207
     squared    1000     0.04039   0.04867     0.07319   0.08620
  12. GRADIENTS AVAILABLE VIA CHAINRULES.JL

     Repo: https://github.com/MichielStock/DiffDynProg.jl

     Custom adjoints are provided via ChainRulesCore.jl (and are thus interoperable with various automatic differentiation libraries).
     Compute derivatives of arbitrary pieces of Julia code.
     Interoperable with, e.g., bioinformatics libraries.
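     As an illustration of the mechanism (hypothetical names; the actual adjoints in DiffDynProg.jl wrap the full alignment routines and may look different), here is the ChainRulesCore.jl pattern applied to the entropic smooth maximum: the pullback reuses the closed-form gradient instead of differentiating through the computation.

     using ChainRulesCore

     # hypothetical standalone helper: entropic smooth maximum
     smoothmaxᵧ(x, γ) = (m = maximum(x); m + γ * log(sum(exp.((x .- m) ./ γ))))

     function ChainRulesCore.rrule(::typeof(smoothmaxᵧ), x, γ)
         m = maximum(x)
         e = exp.((x .- m) ./ γ)
         v = m + γ * log(sum(e))
         q = e ./ sum(e)            # closed-form gradient: the soft argmax
         # pullback: scale the precomputed gradient by the incoming cotangent;
         # γ is treated as a non-differentiated hyperparameter here
         smoothmax_pullback(Δv) = (NoTangent(), Δv .* q, NoTangent())
         return v, smoothmax_pullback
     end

     With such a rule in place, autodiff libraries that consume ChainRules call the pullback instead of tracing through the loops, which is how hand-computed gradients like E above can be exposed to larger models.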
  13. CONCLUSION

     Our work builds upon the framework of Mensch and Blondel:

     Mensch, A., & Blondel, M. (2018). Differentiable dynamic programming for structured prediction and attention. https://arxiv.org/pdf/1802.03676.pdf

     Differentiable sequence alignment is a natural generalization of vanilla alignment.