
The Shapley value of a phylogenetic tree... and beyond

First session of a seminar on applications of the Shapley value in computational biology

Francesc Rossello

February 25, 2015



  1. The Shapley value of a phylogenetic tree... and beyond

    Francesc Rosselló, Computational Biology and Bioinformatics Research Group, UIB
  2. The problem

    In biodiversity, the value of a species relies on rarity, distribution, ecology, charisma and phylogeny. For instance, if a species has far fewer close relatives than others, it is expected to contribute more unique features.

    Given a phylogenetic tree of a set of species, we aim to determine the importance of each species for overall biodiversity from this phylogenetic point of view.
  4. Phylogenetic diversity

    Given a weighted phylogenetic tree T on X and a subset S ⊆ X:

    • (If T is rooted or unrooted) The (unrooted) Phylogenetic Diversity of S, PD(S), is the sum of the weights of the edges of the spanning subtree defined by S.
    • (If T is rooted) The (rooted) Phylogenetic Diversity of S, rPD(S), is the sum of the weights of the edges of the subtree defined by S ∪ {root}.

    [Figure: weighted tree with leaves 1 to 5 and root r; edge weights 5, 1, 4, 5, 6, 7, 8]

    PD({3,4}) = 13, rPD({3,4}) = 14; PD({1,3}) = rPD({1,3}) = 16
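    The two definitions can be sketched in a few lines of Python. The tree below is a hypothetical four-leaf example, not the tree of the lost figure. The representation is also an assumption: each unrooted edge is stored as one side of its split, and each rooted edge as its cluster of descendant leaves.

```python
def pd(splits, S):
    """Unrooted PD: an edge lies on the spanning subtree of S
    exactly when both sides of its split contain a member of S."""
    S = set(S)
    return sum(w for side, w in splits if (S & side) and (S - side))


def rpd(clusters, S):
    """Rooted PD: an edge lies on a root-to-S path
    exactly when its cluster contains a member of S."""
    S = set(S)
    return sum(w for cluster, w in clusters if S & cluster)


# Hypothetical rooted tree ((A,B),(C,D)) with made-up edge weights.
clusters = [({"A"}, 1), ({"B"}, 2), ({"C"}, 3), ({"D"}, 4),
            ({"A", "B"}, 2), ({"C", "D"}, 3)]
# Its unrooted version: the two edges at the root merge into one internal edge.
splits = [({"A"}, 1), ({"B"}, 2), ({"C"}, 3), ({"D"}, 4), ({"A", "B"}, 5)]

pd_ab, rpd_ab = pd(splits, {"A", "B"}), rpd(clusters, {"A", "B"})
pd_ac, rpd_ac = pd(splits, {"A", "C"}), rpd(clusters, {"A", "C"})
```

    In this toy tree PD and rPD differ for {A,B}, whose spanning subtree does not reach the root, and coincide for {A,C}, whose spanning subtree passes through it.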
  5. A little game theory

    A cooperative game is a pair (X, v) where X = {1,...,n} is a set of players and v : P(X) → R is a coalition worth mapping such that v(∅) = 0.

    Problem: If players cooperate, how should the worth (profit) be fairly distributed among them?

    Lloyd Shapley, 1953: The Shapley value of (X, v) is the vector $\phi_{X,v} = (\phi_{X,v}(i))_{i\in X}$, where

    $$\phi_{X,v}(i) = \sum_{i\in S\subseteq X} \frac{(|S|-1)!\,(n-|S|)!}{n!}\,\bigl(v(S)-v(S\setminus\{i\})\bigr) = \sum_{i\in S\subseteq X} \frac{1}{n\binom{n-1}{s-1}}\,\bigl(v(S)-v(S\setminus\{i\})\bigr), \qquad s = |S|.$$

    A sort of “average marginal contribution” made by i.
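    As a minimal sketch, the defining sum can be evaluated by brute force with exact fractions. The function name `shapley_value` and the toy game `majority` are illustrative assumptions, not part of the talk. The enumeration is exponential in n, so it only suits small games.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_value(players, v):
    """Exact Shapley value of the game (players, v).

    v takes a frozenset of players and returns its worth, with v(empty) = 0.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for k in range(n):  # k = |S| - 1, the size of S without i
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for rest in combinations(others, k):
                without_i = frozenset(rest)
                total += weight * (v(without_i | {i}) - v(without_i))
        phi[i] = total
    return phi


# Hypothetical toy game: a coalition is worth 1 once it has at least 2 players.
def majority(S):
    return 1 if len(S) >= 2 else 0


phi = shapley_value([1, 2, 3], majority)
```

    The three players are interchangeable here, so each receives 1/3 of the total worth v(X) = 1.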
  6. Properties of the Shapley value

    The Shapley value is:

    • Pareto efficient: $\sum_{i\in X}\phi_{X,v}(i) = v(X)$
    • Symmetric: For every permutation π of X, $\phi_{X,v}(\pi(i)) = \phi_{X,v\circ\pi}(i)$
    • Additive: $\phi_{X,v+w}(i) = \phi_{X,v}(i)+\phi_{X,w}(i)$ for every pair v, w : P(X) → R
    • Strict: If v(S) = v(S \ {i}) for every S containing i, then $\phi_{X,v}(i) = 0$
    • Characterized by the four conditions above
  7. Phylogenetic tree games

    Given a (binary, unrooted, weighted) phylogenetic tree T on X, consider the game $(X, v_T)$ where $v_T(S) := PD(S)$.

    Definition (Haake et al, 2007): The Shapley value of T is $\phi_T := \phi_{X,v_T}$. It measures the average diversity each species adds to a group of species that it joins.

    $$\phi_{X,v}(i) = \sum_{i\in S\subseteq X} \frac{1}{n\binom{n-1}{s-1}}\,\bigl(v(S)-v(S\setminus\{i\})\bigr)$$
  8. Properties of the Shapley value of a tree

    The Shapley value is:

    • Pareto efficient: It gives each leaf's share of $v_T(X)$.
    • Symmetric: The worth of a leaf does not depend on its name; and when two leaves add exactly the same worth to each S (two species play the same role in a tree), they must have the same Shapley value (they have the same share of biological diversity).
    • Additive: If we reconstruct a tree from the sum of two additive metrics, its Shapley values are the sum of the Shapley values of the trees corresponding to each metric.
    • Strict: Meaningless if all weights are non-zero.
  9. Computation of the Shapley value of a tree

    Notations: Let T be a phylogenetic tree on X = {1,...,n}, with associated leaf weights $\omega_1,\ldots,\omega_n$, and internal edges $I_1,\ldots,I_{n-3}$ with associated internal edge weights $\omega_{I_1},\ldots,\omega_{I_{n-3}}$. Let

    $$E = (\omega_1,\ldots,\omega_n,\omega_{I_1},\ldots,\omega_{I_{n-3}})^t \in \mathbb{R}^{2n-3}, \qquad \phi_T = (\phi_T(1),\ldots,\phi_T(n))^t \in \mathbb{R}^n.$$

    We denote by $M_T$ the matrix of order n × (2n − 3) such that $\phi_T = M_T\cdot E$.
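  11. Example: n = 4

    v(A) = v(B) = v(C) = v(D) = 0
    v(A,B) = ω+β, v(A,C) = ω+µ+γ, v(A,D) = ω+µ+δ,
    v(B,C) = β+µ+γ, v(B,D) = β+µ+δ, v(C,D) = γ+δ
    v(A,B,C) = ω+β+µ+γ, v(A,B,D) = ω+β+µ+δ
    v(A,C,D) = ω+µ+γ+δ, v(B,C,D) = β+µ+γ+δ
    v(A,B,C,D) = ω+β+µ+γ+δ

    $$\begin{aligned}
    \phi(A) &= \tfrac{3!}{4!}\bigl(v(A)-v(\emptyset)\bigr) \\
    &\quad + \tfrac{2!}{4!}\bigl((v(A,B)-v(B))+(v(A,C)-v(C))+(v(A,D)-v(D))\bigr) \\
    &\quad + \tfrac{2!}{4!}\bigl((v(A,B,C)-v(B,C))+(v(A,B,D)-v(B,D))+(v(A,C,D)-v(C,D))\bigr) \\
    &\quad + \tfrac{3!}{4!}\bigl(v(A,B,C,D)-v(B,C,D)\bigr) \\
    &= \tfrac{1}{4}\cdot 0 + \tfrac{1}{12}(3\omega+2\mu+\beta+\gamma+\delta) + \tfrac{1}{12}(3\omega+\mu) + \tfrac{1}{4}\omega \\
    &= \tfrac{9}{12}\omega + \tfrac{1}{12}\beta + \tfrac{1}{12}\gamma + \tfrac{1}{12}\delta + \tfrac{3}{12}\mu
    \end{aligned}$$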
  12. Computation of the Shapley value of a tree: n = 4

    $$\begin{pmatrix}\phi(A)\\ \phi(B)\\ \phi(C)\\ \phi(D)\end{pmatrix}
    = \underbrace{\frac{1}{12}\begin{pmatrix} 9&1&1&1&3\\ 1&9&1&1&3\\ 1&1&9&1&3\\ 1&1&1&9&3 \end{pmatrix}}_{M_T}
    \cdot \begin{pmatrix}\omega\\ \beta\\ \gamma\\ \delta\\ \mu\end{pmatrix},
    \qquad \operatorname{rank}(M_T) = 4$$

    Given a Shapley value, there need not exist a phylogenetic tree defining it (it may need negative weights), and when it exists, it need not be unique.
  13. Computation of M_T

    For every i ∈ X and e ∈ E (edges), let

    • C(i,e) be the component of the split σ(e) of X that contains i (close leaves w.r.t. e)
    • F(i,e) be the component of the split σ(e) of X that does not contain i (far-off leaves w.r.t. e)
    • c(i,e) = |C(i,e)| and f(i,e) = |F(i,e)|

    Theorem (Haake et al, 2007): For every i ∈ X and e ∈ E,

    $$M_T(i,e) = \frac{f(i,e)}{n\cdot c(i,e)}$$
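    A short sketch of the theorem in code, assuming each edge is given as one side of its split. The four-leaf tree with cherries {A,B} and {C,D} and the numeric weights are illustrative choices. Columns are ordered as the leaf edges of A, B, C, D followed by the internal edge.

```python
from fractions import Fraction


def shapley_matrix(leaves, splits):
    """M[i][e] = f(i,e) / (n * c(i,e)), one row per leaf, one column per edge."""
    n = len(leaves)
    matrix = []
    for i in leaves:
        row = []
        for side in splits:
            # C(i,e): the side of the split containing i; F(i,e): the other side.
            close = side if i in side else set(leaves) - side
            c = len(close)
            f = n - c
            row.append(Fraction(f, n * c))
        matrix.append(row)
    return matrix


leaves = ["A", "B", "C", "D"]
splits = [{"A"}, {"B"}, {"C"}, {"D"}, {"A", "B"}]  # last one is the internal edge
M = shapley_matrix(leaves, splits)

# Hypothetical weights (omega, beta, gamma, delta, mu) = (1, 2, 3, 4, 5).
weights = [1, 2, 3, 4, 5]
phi = [sum(m * w for m, w in zip(row, weights)) for row in M]
```

    The entries of `phi` add up to 15, the total weight of the tree, as Pareto efficiency requires.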
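  14. Computation of M_T

    Proof:

    $$\begin{aligned}
    \phi_i &= \sum_{i\in S\subseteq X} \frac{(s-1)!\,(n-s)!}{n!}\,\bigl(v(S)-v(S-\{i\})\bigr) \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \sum_{i\in S,\ |S|=s}\ \sum_{e\in T_S - T_{S-i}} \omega_e \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \sum_{i\in S,\ |S|=s}\ \sum_{e \text{ s.t. } S-i\subseteq F(i,e)} \omega_e \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \sum_{e\in E}\ \sum_{S'\subseteq F(i,e),\ |S'|=s-1} \omega_e \\
    &= \sum_{e\in E} \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \binom{f(i,e)}{s-1}\,\omega_e
    \end{aligned}$$

    Therefore

    $$M(i,e) = \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \binom{f}{s-1}$$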
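  15. Computation of M_T

    With f = n − c:

    $$\begin{aligned}
    M(i,e) &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \binom{f}{s-1} \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!\,(n-c)!}{n!\,(s-1)!\,(n-c-s+1)!} \\
    &= \sum_{s=2}^{n} \frac{(n-s)!\,(n-c)!\,(c-1)!}{n!\,(n-c-s+1)!\,(c-1)!} \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \sum_{s=2}^{n} \binom{n-s}{c-1} \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \sum_{j=1}^{n-1} \binom{j-1}{c-1} \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \left( \sum_{j=1}^{n} \binom{j-1}{c-1} - \binom{n-1}{c-1} \right) \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \left( \binom{n}{c} - \binom{n-1}{c-1} \right) \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \binom{n-1}{c-1} \left( \frac{n}{c} - 1 \right) \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \cdot \frac{(n-1)!}{(c-1)!\,(n-c)!} \cdot \frac{f}{c} = \frac{f}{nc}
    \end{aligned}$$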
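    As a sanity check rather than a proof, the identity between the sum and the closed form f/(nc) can be tested numerically with exact fractions. The range n ≤ 12 is an arbitrary choice, and the helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def m_sum(n, c):
    """Left-hand side: the sum over coalition sizes s, with f = n - c."""
    f = n - c
    return sum(
        Fraction(factorial(s - 1) * factorial(n - s), factorial(n)) * comb(f, s - 1)
        for s in range(2, n + 1)
    )


def m_closed(n, c):
    """Right-hand side: f / (n * c)."""
    return Fraction(n - c, n * c)


# c ranges over the possible sizes of the close side, 1 <= c <= n - 1.
identity_holds = all(
    m_sum(n, c) == m_closed(n, c) for n in range(2, 13) for c in range(1, n)
)
```

    `math.comb` returns 0 when s − 1 exceeds f, which matches the convention that those binomial coefficients vanish.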
  16. Properties of M_T

    Theorem (Haake et al, 2007): $\dim \operatorname{Null}(M_T) = n-3$ (i.e., $\operatorname{rank}(M_T) = n$) and a (canonical) basis is $w_{I_1},\ldots,w_{I_{n-3}}$, where

    $$(w_{I_k})_i = \begin{cases} -\dfrac{f(i,I_k)-1}{(n-2)\,c(i,I_k)} & \text{if } 1\le i\le n \\ 1 & \text{if } i = n+k \\ 0 & \text{otherwise} \end{cases}$$

    Theorem (Haake et al, 2007): If $T_1 \cong T_2$ up to labels, then $M_{T_1}$ and $M_{T_2}$ are permutation-equivalent and their kernels are permutation-equivalent subspaces of $\mathbb{R}^{2n-3}$.

    Open problem: Is the converse implication true? (Probably not)
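    The kernel basis can be checked on a small case. The sketch below uses a hypothetical five-leaf tree with cherries {1,2} and {4,5}, hence two internal edges, and verifies that M_T sends both basis vectors to zero. It assumes the columns list the leaf edges first, in leaf order, and then the internal edges.

```python
from fractions import Fraction


def close_far(leaves, side, i):
    """Return (c, f): sizes of the split sides containing / not containing i."""
    close = side if i in side else set(leaves) - side
    return len(close), len(leaves) - len(close)


def shapley_matrix(leaves, splits):
    n = len(leaves)
    rows = []
    for i in leaves:
        row = []
        for side in splits:
            c, f = close_far(leaves, side, i)
            row.append(Fraction(f, n * c))
        rows.append(row)
    return rows


def kernel_vector(leaves, splits, k):
    """Basis vector for the k-th internal edge (k counted from 0)."""
    n = len(leaves)
    internal = splits[n + k]
    w = [Fraction(0)] * len(splits)
    for pos, i in enumerate(leaves):
        c, f = close_far(leaves, internal, i)
        w[pos] = -Fraction(f - 1, (n - 2) * c)
    w[n + k] = Fraction(1)
    return w


leaves = [1, 2, 3, 4, 5]
splits = [{1}, {2}, {3}, {4}, {5}, {1, 2}, {4, 5}]
M = shapley_matrix(leaves, splits)
products = [
    [sum(m * x for m, x in zip(row, kernel_vector(leaves, splits, k))) for row in M]
    for k in range(len(splits) - len(leaves))
]
```

    Every entry of `products` is zero, so changing the weights along either basis vector leaves the Shapley value unchanged.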
  17. Properties of ϕ

    Theorem (Haake et al, 2007): Let $V_T$ be the set of all tree games that can be defined on a tree T = (X, E) (with weights in R). Then the Shapley value $\phi : V_T \to \mathbb{R}^n$ is the only map that satisfies the following properties:

    • Pareto efficient: $\sum_{i\in X}\phi_v(i) = v(X)$
    • Symmetric: For every permutation π of X, $\phi_v(\pi(i)) = \phi_{v\circ\pi}(i)$
    • Additive: $\phi_{v+w} = \phi_v+\phi_w$
    • Group proportional: There exists a constant d > 0 such that, for every i ∈ X and e ∈ E, $\sum_{j\in C(i,e)} \phi_{v_e}(j) = d\cdot f(i,e)$ (where $v_e$ is defined by weighting e with 1 and all other edges by 0).

    If the distribution of diversity among species should satisfy these axioms, the Shapley value is the only possible choice.
  18. Fair proportion index

    Definition: Given a rooted phylogenetic tree T over X,

    $$FP_T(i) = \sum_{e:\ i\in C_T(e)} \frac{\omega_T(e)}{\kappa_T(e)}$$

    where $\omega_T(e)$ is e’s weight, $C_T(e)$ is e’s cluster, and $\kappa_T(e) = |C_T(e)|$.

    The weight of each edge is equally distributed among all its descendant leaves, and FP(i) is the sum of i’s shares of the weights of all its ancestor edges.

    Applied in the Zoological Society of London’s EDGE project: http://www.edgeofexistence.org
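    A minimal sketch of the index, assuming each edge of the rooted tree is stored as its cluster (the set of descendant leaves) together with its weight. The tree ((A,B),(C,D)) and its weights are hypothetical.

```python
from fractions import Fraction


def fair_proportion(clusters, i):
    """FP(i): each ancestor edge of i contributes its weight divided by its cluster size."""
    return sum(Fraction(w, len(cluster)) for cluster, w in clusters if i in cluster)


# Hypothetical rooted tree ((A,B),(C,D)) with made-up edge weights.
clusters = [({"A"}, 1), ({"B"}, 2), ({"C"}, 3), ({"D"}, 4),
            ({"A", "B"}, 2), ({"C", "D"}, 3)]

fp = {leaf: fair_proportion(clusters, leaf) for leaf in "ABCD"}
```

    Since every edge weight is split among the leaves below it, the FP values of all leaves add up to the total weight of the tree (15 in this example).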
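  19. FP vs Shapley

    Set:

    • $\theta_{FP}(i,e)$: the contribution of e to FP(i):

    $$\theta_{FP}(i,e) = \begin{cases} \omega(e)/\kappa(e) & \text{if } i\in C_T(e) \\ 0 & \text{otherwise} \end{cases}$$

    • $\theta_{SV}(i,e)$: the contribution of e to $\phi_i$:

    $$\theta_{SV}(i,e) = \omega(e)\sum_{s=2}^{n}\frac{(s-1)!\,(n-s)!}{n!}\binom{n-\kappa(e)}{s-1} = \omega(e)\sum_{s=2}^{n}\frac{(n-\kappa(e))!\,(n-s)!}{n!\,(n-\kappa(e)-s+1)!} \quad \text{if } i\in C_T(e)$$

    $$\theta_{SV}(i,e) = \omega(e)\sum_{s=2}^{n}\frac{\kappa(e)!\,(n-s)!}{n!\,(\kappa(e)-s+1)!} \quad \text{if } i\notin C_T(e)$$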
  20. FP vs Shapley

    Theorem? (Hartmann, 2013): When n → ∞, $\theta_{FP}(i,e) = \theta_{SV}(i,e)$ for every e ∈ E and i ∈ X, and therefore, when n → ∞, FP(i) = ϕ(i).

    The FP index is a good approximation for the value of a taxon in a large phylogenetic tree.

    “The Shapley value should only be used if minor gains in the quality of the index are more important than transparency and simplicity”
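    A small numerical illustration of the limit, for a leaf i inside the cluster of e. The unit edge weight, the cluster size κ = 3 and the tree sizes are arbitrary choices made for this sketch. θ_SV is evaluated from the (unrooted) sum formula with exact fractions.

```python
from fractions import Fraction
from math import comb, factorial


def theta_fp(weight, kappa):
    """Contribution of an edge to FP(i) when i lies in its cluster."""
    return Fraction(weight, kappa)


def theta_sv(weight, kappa, n):
    """Contribution of the same edge to the unrooted Shapley value of i."""
    total = sum(
        Fraction(factorial(s - 1) * factorial(n - s), factorial(n))
        * comb(n - kappa, s - 1)
        for s in range(2, n + 1)
    )
    return weight * total


# Gap between the two contributions as the number of leaves grows.
gaps = {n: theta_fp(1, 3) - theta_sv(1, 3, n) for n in (10, 50, 200)}
```

    In this setting the gap equals ω(e)/n, because the sum simplifies to ω(e)(n − κ)/(nκ); it vanishes as n grows while κ stays fixed.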
  22. FP vs Shapley

    But... this is the wrong question! FP takes into account the paths from the root to the leaves. On a rooted tree, we can use rPD, which involves the root.

    Theorem (Fuchs, Jin, 2015): If we use rPD to define the Shapley value on a rooted tree, then FP = ϕ.

    Main ingredient:

    $$FP_T(a) = \frac{\omega(e)}{\kappa(e)} + FP_{T_l}(a), \qquad \phi_T(a) = \frac{\omega(e)}{\kappa(e)} + \phi_{T_l}(a)$$
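    The statement can be checked by brute force on a small example. The sketch below is not the authors' argument. It computes the Shapley value of the rPD game directly from the definition and compares it with FP on a hypothetical rooted tree ((A,B),(C,D)), with each edge stored as (cluster, weight).

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def rpd(clusters, S):
    """Rooted PD: total weight of the edges whose cluster meets S."""
    return sum(w for cluster, w in clusters if cluster & S)


def fair_proportion(clusters, i):
    return sum(Fraction(w, len(cluster)) for cluster, w in clusters if i in cluster)


def shapley(leaves, v):
    """Shapley value from the defining sum over coalitions (exponential time)."""
    n = len(leaves)
    phi = {}
    for i in leaves:
        others = [x for x in leaves if x != i]
        phi[i] = sum(
            Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            * (v(frozenset(rest) | {i}) - v(frozenset(rest)))
            for k in range(n)
            for rest in combinations(others, k)
        )
    return phi


leaves = ["A", "B", "C", "D"]
clusters = [({"A"}, 1), ({"B"}, 2), ({"C"}, 3), ({"D"}, 4),
            ({"A", "B"}, 2), ({"C", "D"}, 3)]

phi = shapley(leaves, lambda S: rpd(clusters, S))
fp = {i: fair_proportion(clusters, i) for i in leaves}
```

    On this tree the two dictionaries coincide, as the theorem predicts.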
  23. FP vs Shapley

    This makes it possible, using the usual techniques, to prove the following result:

    Theorem (Fuchs, Jin, 2015):

    Under the Yule model,

    $$E(\phi(n)) = 3 - \frac{2}{n}, \qquad \sigma^2(\phi(n)) \sim 10 - \pi^2$$

    Under the uniform model,

    $$E(\phi(n)) = 3 - \frac{2}{n}, \qquad \sigma^2(\phi(n)) \sim 12\ln(2) - 8$$
  24. Some extra problems

    • Trees on S can be understood as the linear subspace $A \subseteq \mathbb{R}^{\binom{n}{2}}$ of additive metrics, and therefore Shapley values define a map $\phi : A \to \mathbb{R}^n$. Is there a “natural” extension of this map to $\mathbb{R}^{\binom{n}{2}}$? Or at least to the space of metrics on S? “Graph” metrics are treated in the next session.
    • If, in a tree T, we take the k species with the highest Shapley values, is there a bound or other relation (in terms of k and n only) between the phylogenetic diversity of these species and the total phylogenetic diversity?
    • What properties do the Shapley values on (different types of) phylogenetic networks have?
  25. Next session

    Shapley value of (social, metabolic, gene, but not PPI, so far) network games (To be continued...)