Dynamic Programming as Path Finding

Based on http://thume.ca/2017/06/17/tree-diffing/, presented at the UWaterloo CS Club Alt+Tab short talks event

Tristan Hume

November 07, 2017

Transcript

2. Overview
  1. The standard memoization viewpoint
  2. Viewing (some) dynamic programming as path finding
  3. Using this new view to solve harder problems!
  4. Using this new view to write faster algorithms!
3. Diffing
• Take an old thing and a new thing, find what changed
• But, there are many ways to describe the differences
• Diff algorithms try to find the shortest description of the changes
• The number of different diffs is exponential, so we need to be smart
• Dynamic Programming is how we make diffing feasible
4. Levenshtein Distance (ish)
• Find the minimum number of insertions and deletions to get from old to new
• Consider changing abcd -> bcad (each line below shows the deleted and inserted characters merged into one string with the unchanged ones)
abcdbcad - 4 deletions + 4 insertions = 8 changes
bcabdcd - 3 deletions + 3 insertions = 6 changes
abcad - 1 deletion + 1 insertion = 2 changes
5. First solution
• Let's just try to find the minimum change count between A and B
• We define best_cost(A,B) as the min change count between A and B
• Now there are three recursive possibilities:
◦ Same: best_cost(A,B) = best_cost(A[1..],B[1..]) if A[0] == B[0]
◦ Insert: best_cost(A,B) = 1+best_cost(A,B[1..])
◦ Delete: best_cost(A,B) = 1+best_cost(A[1..], B)
◦ Base case: best_cost("","") = 0
• Problem: this is exponential, but best_cost gets called with the same parameters a lot!
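The repeated calls are what memoization removes. A minimal sketch of the recurrence in Python, assuming index pairs (i, j) stand in for the suffixes A[i..] and B[j..]; the helper name `go` is hypothetical and not from the talk:

```python
from functools import lru_cache


def best_cost(a: str, b: str) -> int:
    """Minimum number of single-character insertions and deletions turning a into b."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        # i and j index the first unconsumed character of a and b.
        if i == len(a) and j == len(b):
            return 0  # base case: both strings are used up
        options = []
        if i < len(a) and j < len(b) and a[i] == b[j]:
            options.append(go(i + 1, j + 1))  # Same: free move
        if j < len(b):
            options.append(1 + go(i, j + 1))  # Insert b[j]
        if i < len(a):
            options.append(1 + go(i + 1, j))  # Delete a[i]
        return min(options)

    return go(0, 0)


print(best_cost("abcd", "bcad"))  # 2
```

Each (i, j) pair is computed once, so the work is bounded by the number of pairs rather than the number of possible diffs. The recursion depth grows with the input length, so long inputs would need an iterative version.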

9. Consequences
• There’s a one-to-one correspondence between dynamic programming problems on grids and path finding on a grid.
• … At least for a large class of diff-like problems.
• If we want to come up with a new diff-like algorithm, we can think of it as a path finding problem.
• Then translate to a memoized recursive algorithm when we want to implement it.
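One way to see the correspondence is to solve the insert/delete problem directly as a shortest-path search. The sketch below assumes grid nodes (i, j) meaning "i characters of A and j characters of B consumed", with a free diagonal move on a match and cost-1 moves for a deletion or insertion. It uses Dijkstra's algorithm via `heapq`; the function name `grid_path_cost` is hypothetical:

```python
import heapq


def grid_path_cost(a: str, b: str) -> int:
    """Cheapest path from (0, 0) to (len(a), len(b)) on the diff grid."""
    goal = (len(a), len(b))
    dist = {(0, 0): 0}
    frontier = [(0, 0, 0)]  # entries are (cost so far, i, j)
    while frontier:
        cost, i, j = heapq.heappop(frontier)
        if (i, j) == goal:
            return cost
        if cost > dist.get((i, j), float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        moves = []
        if i < len(a) and j < len(b) and a[i] == b[j]:
            moves.append((i + 1, j + 1, 0))  # diagonal: characters match
        if i < len(a):
            moves.append((i + 1, j, 1))  # delete a[i]
        if j < len(b):
            moves.append((i, j + 1, 1))  # insert b[j]
        for ni, nj, step in moves:
            new_cost = cost + step
            if new_cost < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = new_cost
                heapq.heappush(frontier, (new_cost, ni, nj))
    raise AssertionError("the goal is always reachable")


print(grid_path_cost("abcd", "bcad"))  # 2
```

The moves here are the same three cases as the recursive definition, so adding a new kind of edit means adding a new kind of move.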

11. Scoping config file differences
(thing-processor-config
  (:date-switch
    (case 2017-04-07 (speed 5) (power 7))
    (else (speed 3) (power 9))))
12. Automatic Scoping = Tree Diffing
(thing-config
  (:date-switch
    (case 2017-04-07 (speed 5) (power 7))
    (else (speed 3) (power 9))))
(thing-config (speed 5) (power 7))
(thing-config (speed 3) (power 9))
13. Let’s write an optimal tree diffing algorithm!
• Not so fast! This is really tricky.
• I read a bunch of tree diffing papers, but none were relevant.
• Insertions and deletions can happen at multiple levels
• It’s nicer (fewer characters) to group adjacent changes in one :date-switch
14. The Path Finding Viewpoint to the rescue!
• Thinking about it as a path finding problem let me fit the problem in my head.
• It allowed me to easily sketch ideas and edge cases in my notebook.
• I came up with an algorithm by thinking of the different costs in my problem and the situations where they applied, then coming up with a corresponding set of moves.

16. It worked!
• Finally my algorithm gave great results on all cases we could think of.
• But it was too slow, empirically something like O(n^3) in the file size.
• Jane Street had files that were 10,000+ lines long, so this was a problem.
• I spent a bunch of time profiling, caching and optimizing and got it down to O(n^2 log(n)) with a low constant factor.
• But it still took ~6 minutes to run on our largest file :-(
17. The path finding view pays off again!
• I needed a big algorithmic improvement.
• Real life diff algorithms use specialized insights to improve their time complexity to something like O(n + d^2) where d is the amount changed.
• But these special algorithms were hard to understand and generalize.
18. A better search
• When we “converted” our path finding problem back to a memoized recursive search to implement it, we were just coding a depth first search path finding algorithm!
• What if we used a better path finding algorithm instead?
• Early on when I started thinking about the problem as path finding, I had a simple idea that I scrawled in my notes for later: A*
• The A* algorithm is a path finding algorithm frequently used in games because it uses a heuristic to work much, much faster.
19. Implementing A*
• I looked up how it worked on Wikipedia and learned that all I needed was a heuristic that never overestimated the remaining distance to the destination.
• It turned out that the maximum of the remaining size on either side worked.
• Now it mostly searched near the diagonal of the grid.
• Made my program run instantly on the largest files, being linear-ish in file size but still ~O(n + d^2 log(d)) where d is the amount of changes.
• Used a hash table for memoization so space use was also linear-ish.
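To make the idea concrete, here is a minimal A* sketch for the simplified insert/delete problem from slide 5, not for the tree diff itself. The heuristic is an assumption of this sketch and differs from the one in the talk, which belonged to the tree diff's own cost model: with only cost-1 insertions and deletions, the difference between the remaining lengths is a lower bound on the remaining cost. The function name `astar_diff_cost` and the returned expansion counter are hypothetical, added only to show how little of the grid gets visited:

```python
import heapq
from typing import Tuple


def astar_diff_cost(a: str, b: str) -> Tuple[int, int]:
    """Return (minimum insert/delete cost, number of grid nodes expanded)."""
    goal = (len(a), len(b))

    def heuristic(i: int, j: int) -> int:
        # Assumed lower bound: surplus characters on one side must each be
        # inserted or deleted, so this never overestimates the true cost.
        return abs((len(a) - i) - (len(b) - j))

    best = {(0, 0): 0}  # hash table of cheapest known cost per node
    frontier = [(heuristic(0, 0), 0, 0, 0)]  # (cost + heuristic, cost, i, j)
    expanded = 0
    while frontier:
        _, cost, i, j = heapq.heappop(frontier)
        if (i, j) == goal:
            return cost, expanded
        if cost > best.get((i, j), float("inf")):
            continue  # stale queue entry
        expanded += 1
        moves = []
        if i < len(a) and j < len(b) and a[i] == b[j]:
            moves.append((i + 1, j + 1, 0))  # match: free diagonal move
        if i < len(a):
            moves.append((i + 1, j, 1))  # delete a[i]
        if j < len(b):
            moves.append((i, j + 1, 1))  # insert b[j]
        for ni, nj, step in moves:
            new_cost = cost + step
            if new_cost < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = new_cost
                priority = new_cost + heuristic(ni, nj)
                heapq.heappush(frontier, (priority, new_cost, ni, nj))
    raise AssertionError("the goal is always reachable")


cost, expanded = astar_diff_cost("abcd", "bcad")
print(cost)  # 2
```

For two identical inputs this search only walks the diagonal, so the number of expanded nodes grows with the input length rather than with the area of the grid.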
20. Conclusion
• The Path Finding viewpoint is great for diff-like problems:
• It can make it easier to think of solutions to new problems.
• It’s easy to adapt to new problems by just adding/changing the moves.
• You can easily accelerate it with A* instead of having to think of fancy algorithm-specific tricks.
21. Interested in more?
Way more detail and example code on my blog: http://thume.ca/2017/06/17/tree-diffing/