The Shapley value of a phylogenetic tree... and beyond

First session of a seminar on applications of the Shapley value in computational biology

Francesc Rossello

February 25, 2015

Transcript

  1. The Shapley value of a phylogenetic tree... and beyond

    Francesc Rosselló, Computational Biology and Bioinformatics Research Group, UIB
  2. The problem

    In biodiversity, the value of a species relies on rarity, distribution, ecology, charisma and phylogeny. For instance, if a species has far fewer close relatives than others, it is expected to contribute more unique features. Given a phylogenetic tree of a set of species, we aim to determine the importance of each species for overall biodiversity from this phylogenetic point of view.
  3. Phylogenetic diversity

    Given a weighted phylogenetic tree $T$ on $X$ and a subset $S \subseteq X$:
    • (If $T$ is rooted or unrooted) The (unrooted) Phylogenetic Diversity of $S$, $PD(S)$, is the sum of the weights of the edges of the spanning subtree defined by $S$.
    • (If $T$ is rooted) The (rooted) Phylogenetic Diversity of $S$, $rPD(S)$, is the sum of the weights of the edges of the subtree defined by $S \cup \{\text{root}\}$.
    [Figure: example weighted tree on the leaves 1, ..., 5]
    $PD(\{3,4\}) = 13$, $PD(\{1,3\}) = 16$
  4. Phylogenetic diversity

    [Figure: the example weighted tree on the leaves 1, ..., 5, now with a root $r$]
    $PD(\{3,4\}) = 13$, $rPD(\{3,4\}) = 14$; $PD(\{1,3\}) = rPD(\{1,3\}) = 16$
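
The two definitions of phylogenetic diversity can be sketched in a few lines of Python. The tree below is a hypothetical four-leaf rooted tree chosen for illustration (it is not the tree of the slide figure), and the encoding of each edge by the cluster of leaves below it is an assumption of this sketch.

```python
# Hypothetical rooted tree ((1:2, 2:3):1, (3:4, 4:5):6); not the tree on the slide.
# Each edge is stored as (cluster of leaves below the edge, weight).
EDGES = [
    (frozenset({1}), 2),
    (frozenset({2}), 3),
    (frozenset({1, 2}), 1),
    (frozenset({3}), 4),
    (frozenset({4}), 5),
    (frozenset({3, 4}), 6),
]

def pd(s):
    """Unrooted PD: an edge counts when S has leaves on both sides of it."""
    s = frozenset(s)
    return sum(w for cluster, w in EDGES if s & cluster and s - cluster)

def rpd(s):
    """Rooted PD: an edge counts when S has at least one leaf below it."""
    s = frozenset(s)
    return sum(w for cluster, w in EDGES if s & cluster)
```

On this tree, `pd({1, 2})` omits the edge above the cherry {1, 2} while `rpd({1, 2})` includes it, and the two agree on `{1, 3}` because the path between leaves 1 and 3 already passes through the root.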
  5. A little game theory

    A cooperative game is a pair $(X,v)$ where $X = \{1,\dots,n\}$ is a set of players and $v : \mathcal{P}(X) \to \mathbb{R}$ is a coalition worth mapping such that $v(\emptyset) = 0$.
    Problem: If players cooperate, how should the worth (profit) be fairly distributed among them?
    Lloyd Shapley, 1953: The Shapley value of $(X,v)$ is the vector $\phi_{X,v} = (\phi_{X,v}(i))_{i\in X}$, where
    $$\phi_{X,v}(i) = \sum_{i\in S\subseteq X} \frac{(|S|-1)!\,(n-|S|)!}{n!}\,\bigl(v(S)-v(S\setminus\{i\})\bigr) = \sum_{i\in S\subseteq X} \frac{1}{n\binom{n-1}{s-1}}\,\bigl(v(S)-v(S\setminus\{i\})\bigr), \qquad s = |S|.$$
    A sort of “average marginal contribution” made by $i$.
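
A minimal brute-force sketch of the Shapley formula follows. The function name `shapley` and the three-player toy game are hypothetical, introduced only to illustrate the sum over coalitions. It is exponential in the number of players, so it is only meant for tiny examples.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Exact Shapley value: sum over all coalitions S that contain i."""
    players = list(players)
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for k in range(n):  # k = |S| - 1
            weight = Fraction(factorial(k) * factorial(n - 1 - k), factorial(n))
            for rest in combinations(others, k):
                without_i = frozenset(rest)
                total += weight * (v(without_i | {i}) - v(without_i))
        phi[i] = total
    return phi

# Hypothetical toy game: a coalition is worth 1 when it contains
# player 1 together with at least one other player, and 0 otherwise.
def v(s):
    return 1 if 1 in s and len(s) >= 2 else 0

phi = shapley([1, 2, 3], v)
```

Here player 1 receives 2/3 and players 2 and 3 receive 1/6 each, which adds up to the worth of the grand coalition.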
  6. Properties of the Shapley value

    The Shapley value is:
    • Pareto efficient: $\sum_{i\in X} \phi_{X,v}(i) = v(X)$
    • Symmetric: For every permutation $\pi$ of $X$, $\phi_{X,v}(\pi(i)) = \phi_{X,v\circ\pi}(i)$
    • Additive: $\phi_{X,v+w}(i) = \phi_{X,v}(i)+\phi_{X,w}(i)$ for every pair $v,w : \mathcal{P}(X) \to \mathbb{R}$
    • Strict: If $v(S) = v(S\setminus\{i\})$ for every $S$ containing $i$, then $\phi_{X,v}(i) = 0$
    • Characterized by the four conditions above
  7. Phylogenetic tree games

    Given a (binary, unrooted, weighted) phylogenetic tree $T$ on $X$, consider the game $(X,v_T)$ where $v_T(S) := PD(S)$.
    Definition (Haake et al, 2007): The Shapley value of $T$ is $\phi_T := \phi_{X,v_T}$. It measures the average diversity each species adds to a group of species that it joins.
    $$\phi_{X,v}(i) = \sum_{i\in S\subseteq X} \frac{1}{n\binom{n-1}{s-1}}\,\bigl(v(S)-v(S\setminus\{i\})\bigr)$$
  8. Properties of the Shapley value of a tree

    The Shapley value is:
    • Pareto efficient: It gives each leaf's share of $v_T(X)$.
    • Symmetric: The worth of a leaf does not depend on its name; and, when two leaves add exactly the same worth to each $S$ (two species play the same role in a tree), they must have the same Shapley value (they have the same share of biological diversity).
    • Additive: If we reconstruct a tree from the sum of two additive metrics, its Shapley values are the sum of the Shapley values of the trees corresponding to each metric.
    • Strict: Meaningless if all weights are non-zero.
  9. Computation of the Shapley value of a tree

    Notations: Let $T$ be a phylogenetic tree on $X = \{1,\dots,n\}$, with associated leaf weights $\omega_1,\dots,\omega_n$ and internal edges $I_1,\dots,I_{n-3}$ with associated internal edge weights $\omega_{I_1},\dots,\omega_{I_{n-3}}$. Let
    $$E = (\omega_1,\dots,\omega_n,\omega_{I_1},\dots,\omega_{I_{n-3}})^t \in \mathbb{R}^{2n-3}, \qquad \phi_T = (\phi_T(1),\dots,\phi_T(n))^t \in \mathbb{R}^n.$$
    We denote by $M_T$ the matrix of order $n\times(2n-3)$ such that $\phi_T = M_T\cdot E$.
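  10. Example: n = 4

    $v(A) = v(B) = v(C) = v(D) = 0$
    $v(A,B) = \omega+\beta$, $v(A,C) = \omega+\mu+\gamma$, $v(A,D) = \omega+\mu+\delta$,
    $v(B,C) = \beta+\mu+\gamma$, $v(B,D) = \beta+\mu+\delta$, $v(C,D) = \gamma+\delta$
    $v(A,B,C) = \omega+\beta+\mu+\gamma$, $v(A,B,D) = \omega+\beta+\mu+\delta$
    $v(A,C,D) = \omega+\mu+\gamma+\delta$, $v(B,C,D) = \beta+\mu+\gamma+\delta$
    $v(A,B,C,D) = \omega+\beta+\mu+\gamma+\delta$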
  11. Example: n = 4

    With the coalition worths $v(\cdot)$ of the quartet listed on the previous slide:
    $$\begin{aligned}
    \phi(A) &= \tfrac{3!}{4!}\,\bigl(v(A)-v(\emptyset)\bigr) \\
    &\quad+ \tfrac{2!}{4!}\,\bigl((v(A,B)-v(B))+(v(A,C)-v(C))+(v(A,D)-v(D))\bigr) \\
    &\quad+ \tfrac{2!}{4!}\,\bigl((v(A,B,C)-v(B,C))+(v(A,B,D)-v(B,D))+(v(A,C,D)-v(C,D))\bigr) \\
    &\quad+ \tfrac{3!}{4!}\,\bigl(v(A,B,C,D)-v(B,C,D)\bigr) \\
    &= \tfrac{1}{4}\cdot 0 + \tfrac{1}{12}(3\omega+2\mu+\beta+\gamma+\delta) + \tfrac{1}{12}(3\omega+\mu) + \tfrac{1}{4}\,\omega \\
    &= \tfrac{9}{12}\,\omega + \tfrac{1}{12}\,\beta + \tfrac{1}{12}\,\gamma + \tfrac{1}{12}\,\delta + \tfrac{3}{12}\,\mu
    \end{aligned}$$
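
The hand computation of $\phi(A)$ can be checked numerically. The sketch below uses the equivalent "average over all orderings" form of the Shapley value rather than the subset formula, and the numeric edge weights are hypothetical values chosen only for the check.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical numeric weights for the quartet tree AB|CD of the example.
W = {"omega": 5, "beta": 7, "gamma": 11, "delta": 13, "mu": 17}
# One side of the split induced by each edge.
SPLITS = {"omega": {"A"}, "beta": {"B"}, "gamma": {"C"}, "delta": {"D"}, "mu": {"A", "B"}}

def v(s):
    """PD(S): total weight of the edges that separate two leaves of S."""
    return sum(W[e] for e, side in SPLITS.items() if s & side and s - side)

def shapley_by_orderings(leaves):
    """Average marginal contribution of each leaf over all orderings."""
    phi = {x: Fraction(0) for x in leaves}
    orders = list(permutations(leaves))
    for order in orders:
        seen = set()
        for x in order:
            phi[x] += Fraction(v(seen | {x}) - v(seen), len(orders))
            seen.add(x)
    return phi

phi = shapley_by_orderings("ABCD")
expected_A = Fraction(9 * W["omega"] + W["beta"] + W["gamma"] + W["delta"] + 3 * W["mu"], 12)
```

With these weights `phi["A"]` equals `expected_A`, and the four values add up to the total weight of the tree, as Pareto efficiency requires.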
  12. Computation of the Shapley value of a tree: n = 4

    $$\begin{pmatrix}\phi(A)\\ \phi(B)\\ \phi(C)\\ \phi(D)\end{pmatrix} = \underbrace{\frac{1}{12}\begin{pmatrix} 9&1&1&1&3\\ 1&9&1&1&3\\ 1&1&9&1&3\\ 1&1&1&9&3 \end{pmatrix}}_{M_T} \cdot \begin{pmatrix}\omega\\ \beta\\ \gamma\\ \delta\\ \mu\end{pmatrix}$$
    $\operatorname{rank}(M_T) = 4$
    Given a Shapley value, there need not exist a phylogenetic tree defining it (it may need negative weights), and when it exists, it need not be unique.
  13. Computation of $M_T$

    For every $i \in X$ and $e \in E$ (edges), let:
    • $C(i,e)$ be the component of the split $\sigma(e)$ of $X$ that contains $i$ (close leaves w.r.t. $e$)
    • $F(i,e)$ be the component of the split $\sigma(e)$ of $X$ that does not contain $i$ (far off leaves w.r.t. $e$)
    • $c(i,e) = |C(i,e)|$ and $f(i,e) = |F(i,e)|$
    Theorem (Haake et al, 2007): For every $i \in X$ and $e \in E$,
    $$M_T(i,e) = \frac{f(i,e)}{n\cdot c(i,e)}$$
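
As an illustration of the theorem, the matrix $M_T$ can be built directly from the splits of a tree. The sketch below does this for the quartet AB|CD of the $n = 4$ example. The function name `shapley_matrix` and the encoding of each edge by one side of its split are assumptions of this sketch.

```python
from fractions import Fraction

LEAVES = ["A", "B", "C", "D"]
# One side of the split of each edge of the quartet AB|CD,
# in the order omega, beta, gamma, delta, mu.
SPLITS = [{"A"}, {"B"}, {"C"}, {"D"}, {"A", "B"}]

def shapley_matrix(leaves, splits):
    """M_T(i, e) = f(i, e) / (n * c(i, e)) for every leaf i and edge e."""
    n = len(leaves)
    rows = []
    for i in leaves:
        row = []
        for side in splits:
            c = len(side) if i in side else n - len(side)  # leaves on i's side
            f = n - c                                      # leaves on the far side
            row.append(Fraction(f, n * c))
        rows.append(row)
    return rows

M = shapley_matrix(LEAVES, SPLITS)
M12 = [[int(12 * x) for x in row] for row in M]  # entries in twelfths
```

The first row of `M12` is `[9, 1, 1, 1, 3]`, matching the coefficients of $\omega, \beta, \gamma, \delta, \mu$ in the hand computation of $\phi(A)$, and every column adds up to 12 twelfths, so each edge weight is fully distributed among the leaves.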
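  14. Computation of $M_T$

    Proof:
    $$\begin{aligned}
    \phi_i &= \sum_{i\in S\subseteq X} \frac{(s-1)!\,(n-s)!}{n!}\,\bigl(v(S)-v(S-\{i\})\bigr) \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \sum_{i\in S,\ |S|=s}\ \sum_{e\in T_S - T_{S-i}} \omega_e \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \sum_{i\in S,\ |S|=s}\ \sum_{e \text{ s.t. } S-i\subseteq F(i,e)} \omega_e \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \sum_{e\in E}\ \sum_{S'\subseteq F(i,e),\ |S'|=s-1} \omega_e \\
    &= \sum_{e\in E}\ \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \binom{f(i,e)}{s-1}\, \omega_e
    \end{aligned}$$
    Therefore
    $$M(i,e) = \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \binom{f}{s-1}$$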
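  15. Computation of $M_T$

    Writing $f = f(i,e)$ and $c = c(i,e)$, so that $f = n - c$:
    $$\begin{aligned}
    M(i,e) &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \binom{f}{s-1} \\
    &= \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!\,(n-c)!}{n!\,(s-1)!\,(n-c-s+1)!} \\
    &= \sum_{s=2}^{n} \frac{(n-s)!\,(n-c)!\,(c-1)!}{n!\,(n-c-s+1)!\,(c-1)!} \\
    &= \sum_{s=2}^{n} \frac{(n-c)!\,(c-1)!}{n!} \binom{n-s}{c-1} \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \sum_{j=1}^{n-1} \binom{j-1}{c-1} \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \left( \sum_{j=1}^{n} \binom{j-1}{c-1} - \binom{n-1}{c-1} \right) \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \left( \binom{n}{c} - \binom{n-1}{c-1} \right) \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \binom{n-1}{c-1} \left( \frac{n}{c} - 1 \right) \\
    &= \frac{(n-c)!\,(c-1)!}{n!} \cdot \frac{(n-1)!}{(c-1)!\,(n-c)!} \cdot \frac{f}{c} = \frac{f}{nc}
    \end{aligned}$$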
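
The closed form can also be checked numerically against the original sum. The following sketch compares both expressions with exact rational arithmetic for small $n$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def m_sum(n, c):
    """The sum over s from the proof, with f = n - c leaves on the far side."""
    f = n - c
    return sum(
        Fraction(factorial(s - 1) * factorial(n - s), factorial(n)) * comb(f, s - 1)
        for s in range(2, n + 1)
    )

def m_closed(n, c):
    """The closed form f / (n * c)."""
    return Fraction(n - c, n * c)

# Compare both expressions for every admissible pair (n, c) with n <= 12.
checked = all(
    m_sum(n, c) == m_closed(n, c)
    for n in range(2, 13)
    for c in range(1, n)
)
```

`math.comb(f, s - 1)` returns 0 when `s - 1 > f`, which matches the convention that those terms of the sum vanish.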
  16. Properties of $M_T$

    Theorem (Haake et al, 2007): $\dim \operatorname{Null}(M_T) = n-3$ (i.e., $\operatorname{rank}(M_T) = n$) and a (canonical) basis is $w_{I_1},\dots,w_{I_{n-3}}$, where
    $$(w_{I_k})_i = \begin{cases} -\dfrac{f(i,I_k)-1}{(n-2)\,c(i,I_k)} & \text{if } 1 \le i \le n \\ 1 & \text{if } i = n+k \\ 0 & \text{otherwise} \end{cases}$$
    Theorem (Haake et al, 2007): If $T_1 \cong T_2$ up to labels, then $M_{T_1}$ and $M_{T_2}$ are permutation-equivalent and their kernels are permutation-equivalent subspaces of $\mathbb{R}^{2n-3}$.
    Open problem: Is the converse implication true? (Probably not)
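
A small sanity check of the kernel basis: the sketch below builds $M_T$ and the vectors $w_{I_k}$ for a hypothetical five-leaf caterpillar tree (an assumption made for illustration, not a tree from the slides) and multiplies them with exact rational arithmetic.

```python
from fractions import Fraction

N = 5
LEAVES = range(1, N + 1)
# Hypothetical caterpillar tree: the five leaf edges first, then the
# internal edges I1 = 12|345 and I2 = 123|45 (one side of each split).
SPLITS = [{1}, {2}, {3}, {4}, {5}, {1, 2}, {1, 2, 3}]

def c_f(i, side):
    """Sizes of the close and far components of the split, seen from leaf i."""
    c = len(side) if i in side else N - len(side)
    return c, N - c

def matrix():
    """M_T(i, e) = f / (n * c), one row per leaf and one column per edge."""
    return [
        [Fraction(c_f(i, side)[1], N * c_f(i, side)[0]) for side in SPLITS]
        for i in LEAVES
    ]

def kernel_vector(k):
    """The vector w_{I_k} for the k-th internal edge, k = 1, ..., N - 3."""
    side = SPLITS[N + k - 1]
    w = [Fraction(0)] * len(SPLITS)
    for i in LEAVES:
        c, f = c_f(i, side)
        w[i - 1] = -Fraction(f - 1, (N - 2) * c)
    w[N + k - 1] = Fraction(1)
    return w

M = matrix()
products = [
    [sum(m * x for m, x in zip(row, kernel_vector(k))) for row in M]
    for k in (1, 2)
]
```

Both products are zero vectors on this tree, consistent with the statement that the $w_{I_k}$ lie in the null space of $M_T$.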
  17. Properties of $\phi$

    Theorem (Haake et al, 2007): Let $V_T$ be the set of all tree games that can be defined on a tree $T = (X,E)$ (with weights in $\mathbb{R}$). Then the Shapley value $\phi : V_T \to \mathbb{R}^n$ is the only map that satisfies the following properties:
    • Pareto efficient: $\sum_{i\in X} \phi_v(i) = v(X)$
    • Symmetric: For every permutation $\pi$ of $X$, $\phi_v(\pi(i)) = \phi_{v\circ\pi}(i)$
    • Additive: $\phi_{v+w} = \phi_v + \phi_w$
    • Group proportional: There exists a constant $d > 0$ such that, for every $i \in X$ and $e \in E$, $\sum_{j\in C(i,e)} \phi_{v_e}(j) = d\cdot f(i,e)$ (where $v_e$ is defined by weighting $e$ with 1 and all other edges by 0).
    If the distribution of diversity among species should satisfy these axioms, the Shapley value is the only possible choice.
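
For the Shapley value itself, the constant in the group-proportionality axiom can be read off from the formula $M_T(i,e) = f(i,e)/(n\cdot c(i,e))$. Every leaf $j$ on $i$'s side of $e$ has the same $c$ and $f$ as $i$, so a short worked computation gives:

```latex
\sum_{j\in C(i,e)} \phi_{v_e}(j)
  = \sum_{j\in C(i,e)} \frac{f(j,e)}{n\,c(j,e)}
  = c(i,e)\cdot\frac{f(i,e)}{n\,c(i,e)}
  = \frac{1}{n}\,f(i,e),
\qquad\text{so } d = \frac{1}{n}.
```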
  18. Fair proportion index

    Definition: Given a rooted phylogenetic tree $T$ over $X$,
    $$FP_T(i) = \sum_{e\,:\,i\in C_T(e)} \frac{\omega_T(e)}{\kappa_T(e)}$$
    where $\omega_T(e)$ is $e$'s weight, $C_T(e)$ is $e$'s cluster, and $\kappa_T(e) = |C_T(e)|$.
    The weight of each edge is equally distributed among all its descendant leaves, and $FP(i)$ is the sum of the shares of $i$ in the weights of all its ancestor edges.
    Applied in the Zoological Society of London's EDGE project http://www.edgeofexistence.org
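
The definition translates almost literally into code. The sketch below computes the fair proportion index on a hypothetical four-leaf rooted tree, with each edge encoded by its cluster and weight (both the tree and the encoding are assumptions for illustration).

```python
from fractions import Fraction

# Hypothetical rooted tree ((1:2, 2:3):1, (3:4, 4:5):6).
# Each edge is stored as (cluster of leaves below the edge, weight).
EDGES = [
    (frozenset({1}), 2),
    (frozenset({2}), 3),
    (frozenset({1, 2}), 1),
    (frozenset({3}), 4),
    (frozenset({4}), 5),
    (frozenset({3, 4}), 6),
]

def fair_proportion(i):
    """Sum, over the ancestor edges of leaf i, of weight / cluster size."""
    return sum(Fraction(w, len(cluster)) for cluster, w in EDGES if i in cluster)

fp = {i: fair_proportion(i) for i in (1, 2, 3, 4)}
```

On this tree leaf 1 gets 2 + 1/2 = 5/2 and leaf 4 gets 5 + 6/2 = 8, and the four indices add up to the total edge weight 21.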
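  19. FP vs Shapley

    Set:
    • $\theta_{FP}(i,e)$: the contribution of $e$ to $FP(i)$:
    $$\theta_{FP}(i,e) = \begin{cases} \omega(e)/\kappa(e) & \text{if } i \in C_T(e) \\ 0 & \text{otherwise} \end{cases}$$
    • $\theta_{SV}(i,e)$: the contribution of $e$ to $\phi_i$:
    $$\theta_{SV}(i,e) = \omega(e) \sum_{s=2}^{n} \frac{(s-1)!\,(n-s)!}{n!} \binom{n-\kappa(e)}{s-1} = \omega(e) \sum_{s=2}^{n} \frac{(n-\kappa(e))!\,(n-s)!}{n!\,(n-\kappa(e)-s+1)!} \quad \text{if } i \in C_T(e)$$
    $$\theta_{SV}(i,e) = \omega(e) \sum_{s=2}^{n} \frac{\kappa(e)!\,(n-s)!}{n!\,(\kappa(e)-s+1)!} \quad \text{if } i \notin C_T(e)$$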
  20. FP vs Shapley

    Theorem? (Hartmann, 2013): When $n \to \infty$, $\theta_{FP}(i,e) = \theta_{SV}(i,e)$ for every $e \in E$ and $i \in X$, and therefore, when $n \to \infty$, $FP(i) = \phi(i)$.
    The FP index is a good approximation for the value of a taxon in a large phylogenetic tree.
    “The Shapley value should only be used if minor gains in the quality of the index are more important than transparency and simplicity”
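
One way to see where the limit comes from is to plug the closed form $M_T(i,e) = f(i,e)/(n\cdot c(i,e))$ of Haake et al. into $\theta_{SV}$. The following worked step assumes a fixed edge $e$ whose weight $\omega(e)$ and cluster size $\kappa(e)$ stay fixed while $n$ grows:

```latex
% i in C_T(e): c = \kappa(e), f = n - \kappa(e)
\theta_{SV}(i,e) = \omega(e)\,\frac{n-\kappa(e)}{n\,\kappa(e)}
                 = \frac{\omega(e)}{\kappa(e)} - \frac{\omega(e)}{n}
                 = \theta_{FP}(i,e) - \frac{\omega(e)}{n}

% i not in C_T(e): c = n - \kappa(e), f = \kappa(e)
\theta_{SV}(i,e) = \omega(e)\,\frac{\kappa(e)}{n\,(n-\kappa(e))}
                 \;\longrightarrow\; 0 = \theta_{FP}(i,e)
                 \quad (n \to \infty)
```

Under that assumption, each per-edge difference between the two indices is of order $1/n$.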
  21. FP vs Shapley

    But... This is the wrong question!
    FP takes into account the paths from the root to the leaves.
    On a rooted tree, we can use $rPD$, which involves the root.
  22. FP vs Shapley

    Theorem (Fuchs, Jin, 2015): If we use $rPD$ to define the Shapley value on a rooted tree, then $FP = \phi$.
    Main ingredient:
    $$FP_T(a) = \frac{\omega(e)}{\kappa(e)} + FP_{T_l}(a), \qquad \phi_T(a) = \frac{\omega(e)}{\kappa(e)} + \phi_{T_l}(a)$$
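
The equality of the fair proportion index and the rPD-based Shapley value can be observed on a small example. The sketch below brute-forces the Shapley value of the game $v(S) = rPD(S)$ over all orderings of the leaves of a hypothetical four-leaf rooted tree and compares it with FP. It is a numerical check on one assumed tree, not a proof.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical rooted tree ((1:2, 2:3):1, (3:4, 4:5):6), edges as (cluster, weight).
EDGES = [
    (frozenset({1}), 2), (frozenset({2}), 3), (frozenset({1, 2}), 1),
    (frozenset({3}), 4), (frozenset({4}), 5), (frozenset({3, 4}), 6),
]
LEAVES = (1, 2, 3, 4)

def rpd(s):
    """Rooted PD: total weight of the edges with at least one leaf of S below."""
    return sum(w for cluster, w in EDGES if cluster & s)

def shapley_rpd():
    """Shapley value of the game v = rPD, averaged over all leaf orderings."""
    phi = {i: Fraction(0) for i in LEAVES}
    orders = list(permutations(LEAVES))
    for order in orders:
        seen = set()
        for i in order:
            phi[i] += Fraction(rpd(seen | {i}) - rpd(seen), len(orders))
            seen.add(i)
    return phi

def fair_proportion(i):
    """Each ancestor edge of i contributes its weight divided by its cluster size."""
    return sum(Fraction(w, len(cluster)) for cluster, w in EDGES if i in cluster)

phi = shapley_rpd()
fp = {i: fair_proportion(i) for i in LEAVES}
```

On this tree `phi` and `fp` coincide leaf by leaf.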
  23. FP vs Shapley

    Using the usual techniques, this makes it possible to prove the following result:
    Theorem (Fuchs, Jin, 2015):
    Under the Yule model, $E(\phi(n)) = 3 - \frac{2}{n}$, $\sigma^2(\phi(n)) \sim 10 - \pi^2$
    Under the uniform model, $E(\phi(n)) = 3 - \frac{2}{n}$, $\sigma^2(\phi(n)) \sim 12\ln(2) - 8$
  24. Some extra problems

    • Trees on $S$ can be understood as the linear subspace $A \subseteq \mathbb{R}^{\binom{n}{2}}$ of additive metrics, and therefore Shapley values define a map $\phi : A \to \mathbb{R}^n$. Is there a “natural” extension of this map to $\mathbb{R}^{\binom{n}{2}}$? Or at least to the space of metrics on $S$? “Graph” metrics are treated in the next session.
    • If, in a tree $T$, we take the $k$ species with the highest Shapley values, is there some bound or relation (in terms only of $k$ and $n$) between the phylogenetic diversity of these species and the total phylogenetic diversity?
    • What properties do the Shapley values on (different types of) phylogenetic networks have?
  25. Next session

    Shapley value of (social, metabolic, gene, but not PPI, so far) network games
    (To be continued...)