a simple one-step lookahead relationship amongst optimal utility values: Optimal rewards = maximize over the first action and then follow the optimal policy. Formally:
  V*(s) = max_a Q*(s, a)
  Q*(s, a) = sum_{s'} T(s, a, s') [ R(s, a, s') + gamma * V*(s') ]
[Diagram: one-step lookahead tree with state s, q-state (s, a), and successor states s'.]
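As a rough illustration of this one-step lookahead, here is a minimal value-iteration sketch in Python; the tiny MDP (states, actions, T, R, gamma) is an invented placeholder, not part of the original notes.

    # Minimal value-iteration sketch: repeatedly apply the one-step lookahead
    #   V(s) <- max_a sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V(s'))
    # The toy MDP below is made up purely for illustration.
    gamma = 0.9
    states = ["s0", "s1"]
    actions = ["stay", "go"]

    # T[(s, a)] = list of (s_next, probability); R[(s, a, s_next)] = reward
    T = {
        ("s0", "stay"): [("s0", 1.0)],
        ("s0", "go"):   [("s1", 0.8), ("s0", 0.2)],
        ("s1", "stay"): [("s1", 1.0)],
        ("s1", "go"):   [("s0", 1.0)],
    }
    R = {
        ("s0", "stay", "s0"): 0.0,
        ("s0", "go", "s1"):   1.0,
        ("s0", "go", "s0"):   0.0,
        ("s1", "stay", "s1"): 0.5,
        ("s1", "go", "s0"):   0.0,
    }

    V = {s: 0.0 for s in states}
    for _ in range(100):  # iterate the Bellman update until (approximately) converged
        V = {
            s: max(
                sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in T[(s, a)])
                for a in actions
            )
            for s in states
        }

    print(V)  # approximate optimal values V*(s)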
Assign an estimated cost h(n) to get from node n to the goal node. This is called a heuristic. Let f(n) = p(n) + h(n), where p(n) is the path cost from the start node to n, so f(n) estimates the total cost of a path to the goal that passes through n. Perform a search by expanding the frontier node n with the smallest f(n) at each step. Return when we hit the goal.
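A minimal Python sketch of this procedure (the tree-search version, with no visited set); the graph representation, heuristic dictionary, and function name are illustrative assumptions, not from the notes.

    import heapq

    def a_star(graph, h, start, goal):
        """A* tree-search sketch: graph maps node -> [(neighbor, edge_cost)],
        h maps node -> estimated cost from that node to the goal.
        No visited set, so this assumes a finite acyclic graph."""
        # Frontier entries are (f, p, node, path), where f = p + h(node)
        # and p is the path cost accumulated so far.
        frontier = [(h[start], 0, start, [start])]
        while frontier:
            f, p, node, path = heapq.heappop(frontier)  # smallest f first
            if node == goal:
                return path, p
            for neighbor, cost in graph.get(node, []):
                p2 = p + cost
                heapq.heappush(frontier, (p2 + h[neighbor], p2, neighbor, path + [neighbor]))
        return None, float("inf")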
Consistency: h(x) <= distance(x, y) + h(y) for every edge (x, y). Warning: the 3e book has a more complex, but also correct, variant.
A* Graph Search Gone Wrong?
[Figure: state space graph with nodes S, A, B, C, G; edge costs S-A = 1, S-B = 1, A-C = 1, B-C = 2, C-G = 3; heuristics h(S)=2, h(A)=4, h(B)=1, h(C)=1, h(G)=0. Search tree: S (0+2) expands to A (1+4) and B (1+1); the A branch gives C (2+1) then G (5+0); the B branch gives C (3+1) then G (6+0).]
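To see the failure concretely, here is a small sketch (assuming the edge costs read off the figure above) of graph-search A* that never re-expands closed nodes; on this graph it returns the cost-6 path through B instead of the optimal cost-5 path through A, because h(A) = 4 violates consistency on edge A-C.

    import heapq

    def a_star_graph_search(graph, h, start, goal):
        """Graph-search A* that closes a node the first time it is expanded
        (the naive variant, which can go wrong with an inconsistent heuristic)."""
        frontier = [(h[start], 0, start, [start])]
        closed = set()
        while frontier:
            f, p, node, path = heapq.heappop(frontier)
            if node == goal:
                return path, p
            if node in closed:
                continue
            closed.add(node)
            for neighbor, cost in graph[node]:
                if neighbor not in closed:
                    p2 = p + cost
                    heapq.heappush(frontier, (p2 + h[neighbor], p2, neighbor, path + [neighbor]))
        return None, float("inf")

    # The "A* Graph Search Gone Wrong?" example, as reconstructed above.
    graph = {
        "S": [("A", 1), ("B", 1)],
        "A": [("C", 1)],
        "B": [("C", 2)],
        "C": [("G", 3)],
        "G": [],
    }
    h = {"S": 2, "A": 4, "B": 1, "C": 1, "G": 0}

    print(a_star_graph_search(graph, h, "S", "G"))
    # -> (['S', 'B', 'C', 'G'], 6), even though S-A-C-G costs only 5:
    # C is closed when first reached via B, so the cheaper path via A is never found.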
Likelihood: L(Y) = P(F_1, F_2, ..., F_n | Y). Give it a held-out data set and tune parameters until we satisfy the success metric. One method: find the probabilities by counting.
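A minimal sketch of the counting approach (Python). The toy data set and feature names are invented for illustration, and the classifier uses the usual naive Bayes independence assumption P(F_1, ..., F_n | Y) = prod_i P(F_i | Y), which is not stated explicitly in the line above.

    from collections import Counter, defaultdict

    # Toy labeled data: (label, feature tuple). Made up purely for illustration.
    data = [
        ("spam", ("free", "money")),
        ("spam", ("free", "offer")),
        ("ham",  ("meeting", "money")),
        ("ham",  ("meeting", "notes")),
    ]

    # Estimate P(Y) and P(F | Y) by counting.
    label_counts = Counter(label for label, _ in data)
    feature_counts = defaultdict(Counter)
    for label, features in data:
        feature_counts[label].update(features)

    def p_label(y):
        return label_counts[y] / len(data)

    def p_feature_given_label(f, y):
        total = sum(feature_counts[y].values())
        return feature_counts[y][f] / total  # zero if f never co-occurred with y

    def classify(features):
        # Pick argmax_y P(y) * prod_i P(f_i | y)
        def score(y):
            s = p_label(y)
            for f in features:
                s *= p_feature_given_label(f, y)
            return s
        return max(label_counts, key=score)

    print(classify(("free", "offer")))     # -> "spam"
    print(classify(("meeting", "notes")))  # -> "ham"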