Time-sensitive Network Inference in Diffusion Networks

Final project presentation for the CS229 (Machine Learning) course at KAUST, Spring 2014.

Emaad Manzoor

May 08, 2014

Transcript

  1. Time-sensitive Network Inference in Continuous-Time Diffusion Networks, Emaad Ahmed Manzoor

    CS229: Final Presentation
  2. Networks

  3. Social Networks

  4. Epidemic Networks (individual stick figures from xkcd.com)

  5. Information Networks

  6. Diffusion

  7–12. [Figures only; the sole text is a time axis starting at 0, with the end point T added on slide 12]

  13. Diffusion Formalization

  14. Cascade [figure; slides 14–19 share a time axis from 0 to T]

  15. Parents & Children

  16. Infection Times $t_1, t_2, t_3, t_4$

  17. Observation Limit $T$ (infection times $t_1, t_2, t_3, t_4$)

  18. Underlying Network (infection times $t_1, t_2, t_3, t_4$)

  19. Transmission Times, shown as edge labels: 0.1, 0.7, 0.1, 0.4, 0.7, 0.3, 0.5, 0.8, 0.6, 0.9
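Slides 16–19 imply that each node's infection time is produced by the transmission times on the edges of the underlying network. A minimal sketch of that generative process, assuming independent exponential transmission times per edge (the rates, node ids and the `simulate_cascade` helper are illustrative, not from the slides): a node's infection time is its shortest-path distance from the root, and nodes reached only after the horizon $T$ stay uninfected.

```python
import heapq
import random


def simulate_cascade(edges, root, horizon, rng):
    """Sketch: simulate one continuous-time cascade up to `horizon`.

    edges: {node: [(child, alpha), ...]}, alpha = rate of an exponential
           transmission time on that edge (an assumption of this sketch).
    Returns {node: infection_time} for nodes infected before `horizon`.
    """
    times = {root: 0.0}          # the root is infected at time 0
    heap = [(0.0, root)]
    done = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in done:            # stale heap entry; u is already settled
            continue
        done.add(u)
        for v, alpha in edges.get(u, []):
            # One transmission time per edge, drawn when u is settled.
            tv = t + rng.expovariate(alpha)
            # Keep the earliest arrival; arrivals after the horizon are censored.
            if tv < horizon and tv < times.get(v, float("inf")):
                times[v] = tv
                heapq.heappush(heap, (tv, v))
    return times


rng = random.Random(0)
print(simulate_cascade({0: [(1, 1.0), (2, 0.5)], 1: [(3, 2.0)]}, 0, 10.0, rng))
```

With fixed transmission times like those on slide 19 in place of the random draws, the same Dijkstra loop gives the infection times deterministically.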
  20–22. $f(t_i)$: probability density of node $i$ being infected at time $t_i$.
    $f(t_i \mid t_j)$: probability density of node $i$ being infected at time $t_i$, given that node $j$ was infected at time $t_j$.
    $f(t_i \mid t_j) = f_{ij}(t_i - t_j)$
  23. Exponential Pairwise Transmission Function
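Slide 23 names the exponential pairwise transmission function without writing it out. In Rodriguez et al. (ICML '11) it is $f_{ij}(t_i - t_j) = \alpha_{ij} e^{-\alpha_{ij}(t_i - t_j)}$ for $t_j < t_i$ and $0$ otherwise, where $\alpha_{ij} \ge 0$ is the transmission rate (indexed here to match $f_{ij}$). The sketch below also computes the survival function and the hazard, from which the cascade likelihood is built; the function names are hypothetical.

```python
import math


def exp_pdf(delta, alpha):
    """Exponential transmission density f_ij(delta) = alpha * exp(-alpha * delta)."""
    if delta <= 0:
        return 0.0  # i cannot be infected by j before j itself is infected
    return alpha * math.exp(-alpha * delta)


def exp_survival(delta, alpha):
    """Probability that j has not yet infected i after a delay of delta."""
    return math.exp(-alpha * max(delta, 0.0))


def exp_hazard(delta, alpha):
    """Instantaneous infection rate f / S; constant (= alpha) for delta > 0."""
    return alpha if delta > 0 else 0.0


print(exp_pdf(1.0, 2.0), exp_survival(1.0, 2.0), exp_hazard(1.0, 2.0))
```

Because the hazard is constant, $1/\alpha_{ij}$ is the expected transmission time, and $\alpha_{ij} = 0$ means there is no edge from $j$ to $i$.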

  24. Network Inference

  25–30. Given a set C of cascades, where each cascade is a set of observations and is observed until a horizon time;
    the nodes not infected before this horizon time; and the pairwise transmission functions:
    find the transmission rates that maximise the likelihood of the observed cascades.
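The inference problem on slides 25–30 needs the likelihood of one cascade. A sketch under the exponential model, following our reading of the likelihood in Rodriguez et al. (ICML '11): each infected non-root node contributes its survival against every earlier-infected node plus the log of the total hazard at its infection time, and each node not infected before the horizon $T$ contributes its survival up to $T$. Rates are a dict keyed by `(j, i)` for the edge $j \to i$, with missing keys meaning rate 0; all names are illustrative.

```python
import math


def cascade_log_likelihood(times, nodes, alpha, horizon):
    """Sketch of the log-likelihood of one cascade under the exponential model.

    times:   {node: infection_time} for nodes infected before `horizon`
    nodes:   iterable of all nodes in the network
    alpha:   {(j, i): rate} for edge j -> i; missing keys mean rate 0
    Assumes distinct infection times.
    """
    infected = sorted(times, key=times.get)
    ll = 0.0
    for i in infected[1:]:                       # the root's infection is given
        parents = [j for j in infected if times[j] < times[i]]
        rates = {j: alpha.get((j, i), 0.0) for j in parents}
        # Survival: no earlier-infected node infected i before t_i.
        ll -= sum(rates[j] * (times[i] - times[j]) for j in parents)
        # Hazard: some earlier-infected node infected i exactly at t_i.
        total = sum(rates.values())
        if total <= 0.0:
            return float("-inf")                 # i could not have been infected
        ll += math.log(total)
    for m in nodes:
        if m not in times:
            # Uninfected node: it survived every infected node until the horizon.
            ll -= sum(alpha.get((j, m), 0.0) * (horizon - times[j]) for j in infected)
    return ll


# log(2) - 2*1 for node 1, minus 0.5*3 + 0.5*2 for uninfected node 2
print(cascade_log_likelihood({0: 0.0, 1: 1.0}, [0, 1, 2],
                             {(0, 1): 2.0, (0, 2): 0.5, (1, 2): 0.5}, 3.0))
```

Summing this over all cascades in C gives the objective to maximise over the rates.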
  31. (no text)
  32. Metrics: precision and recall
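Slide 32 evaluates the inferred network by precision and recall. A minimal sketch, assuming an edge counts as inferred when its estimated rate exceeds a threshold (the threshold and the helper names are our assumptions): precision is the fraction of inferred edges that are true edges, and recall is the fraction of true edges that were inferred.

```python
def edges_above(alpha, threshold):
    """Edges whose estimated rate exceeds `threshold` (hypothetical cut-off)."""
    return {edge for edge, rate in alpha.items() if rate > threshold}


def precision_recall(inferred, true):
    """Precision and recall of an inferred edge set against the true edge set."""
    inferred, true = set(inferred), set(true)
    hits = len(inferred & true)
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


estimated = {(0, 1): 0.9, (2, 0): 0.4, (1, 2): 0.01}
true_edges = {(0, 1), (1, 2), (0, 2)}
print(precision_recall(edges_above(estimated, 0.1), true_edges))  # (0.5, 0.3333333333333333)
```

Sweeping the threshold traces out a precision-recall curve rather than a single point.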

  33. State of the Art

  34. Rodriguez et al., ICML '11: Uncovering the temporal dynamics of diffusion networks
    Rodriguez et al., ICML '12: Influence maximisation in continuous-time diffusion networks
    Du et al., NIPS '13: Scalable influence estimation in continuous-time diffusion networks
  35. Rodriguez et al., ICML '11: Uncovering the temporal dynamics of diffusion networks
    1. Define the cascade likelihood as the objective function.
    2. Since the log-likelihood is concave in the transmission rates, maximising it subject to non-negativity constraints on the rates is a convex optimisation problem.
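Point 2 on slide 35 is what makes the inference tractable: under the exponential model the log-likelihood is concave in the rates and splits into one independent problem per receiving node. The sketch below fits the incoming rates of a single node by projected gradient ascent, a simple solver chosen for illustration and not necessarily the one used by Rodriguez et al.; the step size, iteration count and the small positive floor (which keeps the log term finite) are assumptions.

```python
def fit_incoming_rates(node, cascades, candidates, steps=2000, lr=0.01, floor=1e-6):
    """Sketch: projected gradient ascent on the rates alpha[j] for edges j -> node.

    cascades:   list of (times, horizon), times = {node: infection_time}
    candidates: possible parents of `node` (e.g. all other nodes)
    """
    alpha = {j: 1.0 for j in candidates}
    for _ in range(steps):
        grad = {j: 0.0 for j in candidates}
        for times, horizon in cascades:
            if node in times:
                t_i = times[node]
                parents = [j for j in candidates if j in times and times[j] < t_i]
                if not parents:
                    continue  # node was the root here: no term for its rates
                total = sum(alpha[j] for j in parents)
                for j in parents:
                    # d/d alpha_j of [log(sum alpha) - sum alpha_k (t_i - t_k)]
                    grad[j] += 1.0 / total - (t_i - times[j])
            else:
                for j in candidates:
                    if j in times:
                        # d/d alpha_j of the survival term -alpha_j (T - t_j)
                        grad[j] -= horizon - times[j]
        for j in candidates:
            # Gradient step, then project back onto alpha >= floor.
            alpha[j] = max(alpha[j] + lr * grad[j], floor)
    return alpha


cascades = [({0: 0.0, 1: 0.25}, 1.0), ({0: 0.0, 1: 0.25}, 1.0),
            ({0: 0.0, 1: 0.5}, 1.0), ({0: 0.0}, 1.0)]
print(fit_incoming_rates(1, cascades, candidates=[0]))  # close to {0: 1.5}
```

With a single candidate parent the optimum has a closed form, $\alpha = n / \sum \text{exposure}$: three infections after delays 0.25, 0.25 and 0.5, plus one cascade where the node survived 1.0 time unit, give $\alpha = 3/2$, which the loop recovers.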
  36. Rodriguez et al., ICML '11: Uncovering the temporal dynamics of diffusion networks
    "Our formulation thus does not depend on the absolute time of infection of the root node"
    "Transmission functions are shift invariant, and do not depend on the absolute times of infection of the pair of nodes"
  37. Contributions

  38–40. Independent transmission rates. Bayes network inference? [figure marked "?"]

  41. States: Asleep (S), Awake (A)

  42–44. Contribution 1: Formulate a time-dependent transmission function as a discrete mixture of distributions.
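One way to read Contribution 1, sketched under loudly stated assumptions: the transmission density becomes a weighted sum of per-state exponentials, $\sum_s \pi_s(t_j)\, \alpha^{(s)} e^{-\alpha^{(s)}(t_i - t_j)}$, where the state weights $\pi_s$ depend on the absolute time $t_j$. Conditioning on the source's infection time, using the two states A and S from slide 41, and the day/night weights in `awake_weights` are all hypothetical; the slides do not give the exact form.

```python
import math


def awake_weights(t_hours):
    """Hypothetical state weights: mostly Awake between 08:00 and 23:00."""
    hour = t_hours % 24.0
    if 8.0 <= hour < 23.0:
        return {"A": 0.9, "S": 0.1}
    return {"A": 0.1, "S": 0.9}


def mixture_transmission_pdf(delta, t_source, weights_fn, rates):
    """Sketch: sum_s w_s(t_source) * rate_s * exp(-rate_s * delta).

    delta:      t_i - t_j, the transmission delay
    t_source:   absolute infection time t_j of the source (in hours here)
    weights_fn: maps t_source to {state: weight}, weights summing to 1
    rates:      {state: exponential rate}
    """
    if delta <= 0:
        return 0.0
    weights = weights_fn(t_source)
    return sum(weights[s] * rates[s] * math.exp(-rates[s] * delta) for s in rates)


rates = {"A": 2.0, "S": 0.1}      # hypothetical: fast when awake, slow when asleep
print(mixture_transmission_pdf(1.0, 10.0, awake_weights, rates))  # daytime source
print(mixture_transmission_pdf(1.0, 2.0, awake_weights, rates))   # night-time source
```

Unlike $f_{ij}(t_i - t_j)$, shifting both infection times by a few hours now changes the density, which relaxes the shift invariance quoted on slide 36.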
  45. Contribution 2: Model the time-dependent priors with circular normal distributions
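The circular normal distribution is also known as the von Mises distribution, with density $e^{\kappa \cos(\theta - \mu)} / (2\pi I_0(\kappa))$ on an angle $\theta$. A sketch assuming time of day is mapped onto a 24-hour circle: the mean direction $\mu$ has a closed-form maximum-likelihood estimate, but $\kappa$ does not, so the code uses the common approximation $\hat\kappa \approx \bar R(2 - \bar R^2)/(1 - \bar R^2)$, where $\bar R$ is the mean resultant length. The 24-hour mapping and the helper names are assumptions.

```python
import math


def hour_to_angle(hour):
    """Map a time of day in hours onto [0, 2*pi) (assumed 24-hour period)."""
    return 2.0 * math.pi * (hour % 24.0) / 24.0


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0 via its power series sum ((x/2)^k / k!)^2."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Circular normal (von Mises) density on an angle theta."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def fit_von_mises(angles):
    """Closed-form ML mean direction; approximate concentration kappa."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    mu = math.atan2(s, c) % (2.0 * math.pi)
    r = min(math.hypot(s, c) / len(angles), 1.0 - 1e-12)  # mean resultant length
    kappa = r * (2.0 - r * r) / (1.0 - r * r)              # approximation, not exact
    return mu, kappa


night = [hour_to_angle(h) for h in [22, 23, 0, 1, 2]]
print(fit_von_mises(night))  # mean direction ~0 rad (midnight), possibly shown as ~2*pi
```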

  46. Unknowns: per node and per edge. How do we fit these from the data?
  47. Contribution 3: EM algorithm to fit the unknown parameters from the data
    1. Initialize the state of each node in each cascade randomly: $S_{ic}$ = random(A, S).
    2. Estimate … and d for every pair of nodes using convex optimisation (Rodriguez et al., 2011).
    3. Estimate … and … using closed-form maximum-likelihood estimates.
    4. Reassign new states $S_{ic}$ to the nodes in each cascade.
  48. Contribution 3: EM algorithm to fit the unknown parameters from the data
    4. Reassign new states $S_{ic}$ to the nodes in each cascade.
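Step 4 does not say how the new states are chosen. A hypothetical hard-EM version: for each node $i$ in cascade $c$, pick the state with the highest log prior (a von Mises density over the hour at which the node was infected) plus log-likelihood of the delays to its infected children under that state's exponential rate. Equal state weights, the `(mu, kappa, rate)` parameterisation and the observation format are assumptions for illustration only.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0 via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def state_score(hour, delays, mu, kappa, rate):
    """log von Mises prior of the infection hour + log-likelihood of the delays."""
    theta = 2.0 * math.pi * (hour % 24.0) / 24.0
    log_prior = kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))
    log_lik = sum(math.log(rate) - rate * d for d in delays)
    return log_prior + log_lik


def reassign_states(observations, params):
    """Hard assignment: S_ic = argmax over states of state_score.

    observations: {(node, cascade): (infection_hour, [delays to children])}
    params:       {state: (mu, kappa, rate)}
    """
    return {
        key: max(params, key=lambda s: state_score(hour, delays, *params[s]))
        for key, (hour, delays) in observations.items()
    }


params = {"A": (2.0 * math.pi * 14 / 24, 2.0, 2.0),   # hypothetical: awake mid-afternoon
          "S": (2.0 * math.pi * 3 / 24, 2.0, 0.1)}    # hypothetical: asleep around 03:00
obs = {("u", 0): (14.0, [0.3, 0.5]), ("u", 1): (3.0, [12.0])}
print(reassign_states(obs, params))  # {('u', 0): 'A', ('u', 1): 'S'}
```

A soft E-step would instead keep the normalised state posteriors rather than only the argmax.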
  49. Results

  50. Synthetic data: 1024-node Kronecker core-periphery network (Leskovec, '08), transmission times and root nodes chosen uniformly at random, 1000 cascades.
    Real data: MemeTracker, 1M nodes, >100K cascades.
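The synthetic network on slide 50 is a Kronecker graph with 1024 = $2^{10}$ nodes. A sketch of a stochastic Kronecker generator, assuming a 2×2 core-periphery initiator of [[0.9, 0.5], [0.5, 0.3]] (a common choice in the network-inference literature; the slides do not state the parameters used): the probability of edge $u \to v$ is the product of initiator entries selected by the bits of $u$ and $v$. It loops over all $N^2$ node pairs, which is fine for 1024 nodes but not for large graphs.

```python
import random

# Assumed core-periphery initiator; slide 50 does not give the parameters.
CORE_PERIPHERY = [[0.9, 0.5], [0.5, 0.3]]


def kronecker_edge_prob(u, v, initiator, k):
    """P(u -> v) in the k-th Kronecker power of a 2x2 initiator."""
    p = 1.0
    for _ in range(k):
        p *= initiator[u & 1][v & 1]  # one initiator entry per bit position
        u >>= 1
        v >>= 1
    return p


def stochastic_kronecker_graph(initiator, k, rng):
    """Sample a directed graph on 2**k nodes, dropping self-loops (O(N^2) sketch)."""
    n = 2 ** k
    return [(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < kronecker_edge_prob(u, v, initiator, k)]


edges = stochastic_kronecker_graph(CORE_PERIPHERY, 6, random.Random(0))  # 64 nodes
print(len(edges))
```

Passing k=10 instead of 6 gives the 1024-node size used on the slide; node 0 (all bits zero) sits in the dense core.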
  51. Future

  52. Algorithm: continuous states; remove the stationarity assumption.
    Implementation: parallelism, speed.
    Experiments: real data, new synthetic data.