
PageRank all the things! - Dutch PHP Conference 2019

Joind.in: https://joind.in/talk/fb7e9
Video recording: https://www.youtube.com/watch?v=AeZJnG9lfRs

Most people know PageRank as Google’s algorithm for ranking search results, but its uses extend far beyond that: PageRank has been used for analysing social networks, finding the most important functions in source code, predicting traffic, and deriving a more accurate ranking table of teams in an ongoing sports competition. In this session we will cover the basics of linear algebra, developing an intuitive notion of how matrices and vectors interact, and use it to understand the principles of PageRank. Then we’ll jump straight into real-life applications of PageRank beyond web search and how these can be implemented in PHP using the math-php library.

Arnout Boks

June 08, 2019

Transcript

  1. PageRank all the things! @arnoutboks Arnout Boks #dpc19 08-06-2019

  2. Story time

  3. Story time

    #   Team         Pnt
    1   DEO 2         30
    2   Netwerk 1     24
    3   Punch 6       24
    4   Delta 5       23
    5   Red Stars 1   23
    6   Delta 4       17
    7   Kalinko 6     12
    8   Kratos 7       7
    9   Punch 9        6
    10  Sovicos 4      4

    Next match: Netwerk 1 – Delta 4

  4. Story time (same standings as slide 3)

    Can we account for the fact that some teams have so far played against weaker opposition than others?

  5. To be continued…

    $ git stash
    Saved working directory and index state WIP on master: 5002d47 PageRank all the things!
    HEAD is now at 5002d47 PageRank all the things!
  6. PageRank

  7. Linear Algebra: Some necessary math

  8. Linear Algebra is the discipline that studies vectors and matrices

  9. Vectors: $v = \begin{bmatrix} 8 \\ 3 \\ -4 \\ 1 \end{bmatrix}$

  10. Vectors: $v = \begin{bmatrix} 8 \\ 3 \\ -4 \\ 1 \end{bmatrix}$ has dimension 4 (“4-vector”)

  11. Vectors: $v = \begin{bmatrix} -\pi \\ 4.2 \end{bmatrix}$ has dimension 2 (“2-vector”)
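  12. Vectors as coordinates [figure: 2-vectors drawn in the x-y plane; extracted coordinate labels: 1, 1, 1, -2, -3, -2]

  13. Scalar multiplication: $v = \begin{bmatrix} -1.5 \\ 1 \end{bmatrix}$, so $3v = 3 \begin{bmatrix} -1.5 \\ 1 \end{bmatrix} = \begin{bmatrix} -4.5 \\ 3 \end{bmatrix}$

  14. Scalar multiplication [figure: v and 3v drawn in the x-y plane]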

  15. Matrices: $M = \begin{bmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{bmatrix}$

  16. Matrices: M as above, with 4 rows and 3 columns: dimension 4, 3 (“4x3-matrix”)

  17. Matrix-vector multiplication: M as above and $v = \begin{bmatrix} 8 \\ 3 \\ -4 \end{bmatrix}$

  18. Matrix-vector multiplication: M has 3 columns and v has dimension 3

  19. Matrix-vector multiplication: Mv is to be computed

  20. Matrix-vector multiplication: the entries 8, 3, -4 of v are repeated next to M, so they line up with the entries of each row

  21. Matrix-vector multiplication: products per row of M

    $\begin{bmatrix} 8 \times 8 & 0 \times 3 & 42 \times (-4) \\ 3 \times 8 & 3 \times 3 & 7 \times (-4) \\ 4 \times 8 & -7 \times 3 & 1 \times (-4) \\ 1 \times 8 & 2 \times 3 & 6 \times (-4) \end{bmatrix}$

  22. Matrix-vector multiplication: products worked out

    $\begin{bmatrix} 64 & 0 & -168 \\ 24 & 9 & -28 \\ 32 & -21 & -4 \\ 8 & 6 & -24 \end{bmatrix}$

  23. Matrix-vector multiplication: $Mv = \begin{bmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{bmatrix}$

  24. Matrix-vector multiplication: $Mv = \begin{bmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{bmatrix} = \begin{bmatrix} -104 \\ 5 \\ 7 \\ -10 \end{bmatrix}$

  25. Matrix-vector multiplication: $Mv = \begin{bmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{bmatrix} \begin{bmatrix} 8 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} -104 \\ 5 \\ 7 \\ -10 \end{bmatrix}$

  26. Matrix-vector multiplication: $Mv = \begin{bmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{bmatrix} \begin{bmatrix} 8 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} -104 \\ 5 \\ 7 \\ -10 \end{bmatrix}$

    “4x3-matrix multiplied by a 3-vector yields a 4-vector”
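
The row-by-row procedure on these slides can be sketched in plain PHP. This is only a minimal illustration using nested arrays instead of math-php, so it runs without dependencies; the function name `matVec` is made up for this example.

```php
<?php
/**
 * Multiply a matrix (array of rows) by a vector (flat array).
 * Each output entry is the sum of the products of one row with the vector.
 * Hypothetical helper for illustration, not part of any library.
 */
function matVec(array $matrix, array $vector): array
{
    $result = [];
    foreach ($matrix as $row) {
        if (count($row) !== count($vector)) {
            throw new InvalidArgumentException('Row length must equal the vector dimension');
        }
        $sum = 0;
        foreach ($row as $j => $entry) {
            $sum += $entry * $vector[$j];
        }
        $result[] = $sum;
    }
    return $result;
}

// The 4x3-matrix M and the 3-vector v from the slides
$M = [
    [8,  0, 42],
    [3,  3,  7],
    [4, -7,  1],
    [1,  2,  6],
];
$v = [8, 3, -4];

echo json_encode(matVec($M, $v)), PHP_EOL; // [-104,5,7,-10]
```

The dimension check mirrors the rule on the slides: a matrix with 3 columns can only be multiplied by a 3-vector, and the result has one entry per row.
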
  27. The Matrix: In more detail

  28. Matrix as a transformation: the 4x3-matrix M maps the 3D vector $\begin{bmatrix} 8 \\ 3 \\ -4 \end{bmatrix}$ to the 4D vector $\begin{bmatrix} -104 \\ 5 \\ 7 \\ -10 \end{bmatrix}$

  29. Chaining transformations: 3D → M (4x3-matrix) → 4D → N (2x4-matrix) → 2D

  30. Chaining transformations: 3D → M (4x3-matrix) → 4D → N (2x4-matrix) → 2D

  31. Chaining transformations: 3D → M (4x3-matrix) → 4D → N (2x4-matrix) → 2D

    Is there a 2x3-matrix describing this transformation?
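  32. Matrix multiplication: $NM = \begin{bmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{bmatrix}$

  33. Matrix multiplication: first column of NM is N times the first column of M: $N \begin{bmatrix} 8 \\ 3 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 26 \end{bmatrix}$

  34. Matrix multiplication: second column: $N \begin{bmatrix} 0 \\ 3 \\ -7 \\ 2 \end{bmatrix} = \begin{bmatrix} 16 \\ 1 \end{bmatrix}$

  35. Matrix multiplication: third column: $N \begin{bmatrix} 42 \\ 7 \\ 1 \\ 6 \end{bmatrix} = \begin{bmatrix} 46 \\ 127 \end{bmatrix}$

  36. Matrix multiplication: $NM = \begin{bmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{bmatrix}$

  37. Matrix multiplication: $NM = \begin{bmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{bmatrix}$

    “2x4-matrix multiplied by 4x3-matrix yields a 2x3-matrix”

  38. Matrix multiplication as chained transformation: 3D → M (4x3-matrix) → 4D → N (2x4-matrix) → 2D is the same as 3D → NM (2x3-matrix) → 2D

    $N(Mv) = (NM)v$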
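
To make the identity N(Mv) = (NM)v concrete, here is a small plain-PHP sketch that multiplies the matrices from the slides and compares both sides for the example vector. The helper name `matMul` is hypothetical, and the vector is treated as a 3x1-matrix so one function covers both cases.

```php
<?php
/**
 * Multiply two matrices given as arrays of rows.
 * Entry [i][j] of the product is row i of $a times column j of $b.
 * Hypothetical helper for illustration only.
 */
function matMul(array $a, array $b): array
{
    $inner = count($b);        // rows of $b must equal columns of $a
    $cols = count($b[0]);
    $product = [];
    foreach ($a as $i => $row) {
        if (count($row) !== $inner) {
            throw new InvalidArgumentException('Inner dimensions do not match');
        }
        for ($j = 0; $j < $cols; $j++) {
            $sum = 0;
            for ($k = 0; $k < $inner; $k++) {
                $sum += $row[$k] * $b[$k][$j];
            }
            $product[$i][$j] = $sum;
        }
    }
    return $product;
}

$N = [[1, 0, -2, 1], [3, 1, 0, -1]];                  // 2x4
$M = [[8, 0, 42], [3, 3, 7], [4, -7, 1], [1, 2, 6]];  // 4x3
$v = [[8], [3], [-4]];                                // 3-vector as a 3x1-matrix

$NM = matMul($N, $M);
$chained = matMul($N, matMul($M, $v));   // N(Mv): apply M first, then N
$combined = matMul($NM, $v);             // (NM)v: apply the single 2x3-matrix

echo json_encode($NM), PHP_EOL;       // [[1,16,46],[26,1,127]]
echo json_encode($chained), PHP_EOL;  // [[-128],[-297]]
echo json_encode($combined), PHP_EOL; // [[-128],[-297]]
```
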
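  39. Square matrices: 2D → M (2x2-matrix) → 2D

  40. Square matrices: 2D → M (2x2-matrix) → 2D

  41. Square matrices: $M = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ [figure: v and Mv drawn in the x-y plane]

  42. Square matrices: $M = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ [figure: v and Mv drawn in the x-y plane]

  43. Square matrices: $M = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ [figure: v and Mv drawn in the x-y plane, plus a vector u with $u = Mu$]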
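
A quick way to see what the two 2x2-matrices do is to apply them to a couple of vectors. In the sketch below the first matrix rotates a vector by 90° counter-clockwise and the second mirrors it in the x-axis, so a vector lying on the x-axis is left unchanged (u = Mu). The vectors `$v` and `$u` and the helper `apply` are chosen for illustration and are not taken from the slides.

```php
<?php
/** Apply a square matrix (array of rows) to a vector. Hypothetical helper. */
function apply(array $matrix, array $vector): array
{
    $result = [];
    foreach ($matrix as $row) {
        $sum = 0;
        foreach ($row as $j => $entry) {
            $sum += $entry * $vector[$j];
        }
        $result[] = $sum;
    }
    return $result;
}

$rotate = [[0, -1], [1, 0]];   // first matrix from the slides
$mirror = [[1, 0], [0, -1]];   // second matrix from the slides

$v = [2, 1];   // arbitrary example vector
$u = [3, 0];   // example vector on the x-axis

echo json_encode(apply($rotate, $v)), PHP_EOL; // [-1,2]
echo json_encode(apply($mirror, $v)), PHP_EOL; // [2,-1]
echo json_encode(apply($mirror, $u)), PHP_EOL; // [3,0]
```
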
  44. PageRank: Ranking web search results

  45. Web pages and links [figure: pages A and B]

  46. Web pages and links [figure: pages A and B]

  47. Web pages and links [figure: pages A, B, C and D]

  48. Web pages and links [figure: pages A, B, C, D, E, F and G]

  49. The PageRank of a page depends on the PageRank of the pages linking to it
  50. Chicken and egg problem

  51. Approach

    Let n be the number of web pages
    Let s be an n-vector of scores for these pages
    Let M be an n×n-matrix describing the dependencies of scores

  52. Approach

    Let n be the number of web pages
    Let s be an n-vector of scores for these pages
    Let M be an n×n-matrix describing the dependencies of scores
    $s = Ms$
  53. Creating the matrix M [figure: page A links to page D]

                 A     B     C     D
    A            0     0     0     0
    B            0     0     0     0
    C            0     0     0     0
    D            1     0     0     0
    Outbound     1     0     0     0

  54. Creating the matrix M [figure: pages A, B, C and D with all their links]

                 A     B     C     D
    A            0     1     0     0
    B            1     0     1     0
    C            0     1     0     1
    D            1     1     0     0
    Outbound     2     3     1     1

  55. Creating the matrix M [figure: pages A, B, C and D with all their links]

                 A     B     C     D
    A            0     1/3   0     0
    B            1/2   0     1     0
    C            0     1/3   0     1
    D            1/2   1/3   0     0
    Outbound     2     3     1     1
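
The construction on these slides (one column per source page, each link weighted by 1 divided by the number of outbound links) can be sketched as follows. The link list is read off the table above; the function name `buildLinkMatrix` and the array layout are assumptions made for this example.

```php
<?php
/**
 * Build the link matrix from a map of page => pages it links to.
 * Entry [$to][$from] is 1 / (outbound links of $from) if $from links
 * to $to, and 0 otherwise. Hypothetical helper for illustration.
 */
function buildLinkMatrix(array $links): array
{
    $pages = array_keys($links);
    $matrix = [];
    foreach ($pages as $to) {
        foreach ($pages as $from) {
            $outbound = count($links[$from]);
            $matrix[$to][$from] = in_array($to, $links[$from], true)
                ? 1.0 / $outbound
                : 0.0;
        }
    }
    return $matrix;
}

// Links as in the four-page example: A links to B and D, and so on
$links = [
    'A' => ['B', 'D'],
    'B' => ['A', 'C', 'D'],
    'C' => ['B'],
    'D' => ['C'],
];

$M = buildLinkMatrix($links);
foreach ($M as $page => $row) {
    $cells = array_map(fn ($x) => sprintf('%.3f', $x), $row);
    echo $page, ': ', implode('  ', $cells), PHP_EOL;
}
```

Each column sums to 1, because a page distributes its whole score over the pages it links to. A page without outbound links would produce an all-zero column here; the sketch does not handle that case specially.
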
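  56. Equation s = Ms: $M = \begin{bmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{bmatrix}$ (rows and columns ordered A, B, C, D) and $s = \begin{bmatrix} s_A \\ s_B \\ s_C \\ s_D \end{bmatrix}$

  57. Equation s = Ms: M and s as above, with a second copy of s on the other side of the equation

  58. Equation s = Ms: $\begin{bmatrix} s_A \\ s_B \\ s_C \\ s_D \end{bmatrix} = \begin{bmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{bmatrix} \begin{bmatrix} s_A \\ s_B \\ s_C \\ s_D \end{bmatrix}$

  59. Equation s = Ms: $\begin{bmatrix} s_A \\ s_B \\ s_C \\ s_D \end{bmatrix} = \begin{bmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{bmatrix} \begin{bmatrix} s_A \\ s_B \\ s_C \\ s_D \end{bmatrix}$

    “The higher A scores, the more ‘points’ D gets for being one of the two pages A links to”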
  60. Eigenvalue problem

    Q: Does there exist a vector s such that $s = Ms$ for the given matrix M?

  61. Eigenvalue problem

    Q: Does there exist a vector s and a number λ such that $\lambda s = Ms$ for the given matrix M?

  62. Eigenvalue problem

    Q: Does there exist a vector s and a number λ such that $\lambda s = Ms$ for the given matrix M?
    A: Yes, with λ = 1
    Fine print: under certain technical conditions. Look up “Perron–Frobenius theorem” if you’re interested.
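
For the four-page example matrix M from the earlier slides, such a vector can be written down and checked by hand. The worked equation below is an illustration added here, not part of the talk; the scaling factor 1/16 is just a convenient choice that makes the scores sum to 1.

```latex
s = \frac{1}{16}\begin{bmatrix} 2 \\ 6 \\ 5 \\ 3 \end{bmatrix},
\qquad
Ms = \frac{1}{16}\begin{bmatrix}
\tfrac{1}{3}\cdot 6 \\
\tfrac{1}{2}\cdot 2 + 1\cdot 5 \\
\tfrac{1}{3}\cdot 6 + 1\cdot 3 \\
\tfrac{1}{2}\cdot 2 + \tfrac{1}{3}\cdot 6
\end{bmatrix}
= \frac{1}{16}\begin{bmatrix} 2 \\ 6 \\ 5 \\ 3 \end{bmatrix}
= s
```

So in this example page B scores highest (6/16), followed by C, D and A.
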
  63. Calculating PageRank: Beyond mere existence

  64. PageRank as simulated surfing

    • Someone starts surfing somewhere on the internet
    • On every page, they follow a random link
    • What page is shown when the buzzer goes?
    • Score of a page is the probability of ending up on it
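
The surfing picture can be tried out directly with a small simulation. The sketch below walks the four-page example graph from the earlier slides and counts how often each page is visited; over a long walk those frequencies approximate the scores. The step count, the seed and the starting page are arbitrary choices for this illustration.

```php
<?php
// Outbound links of the four-page example (A links to B and D, and so on)
$links = [
    'A' => ['B', 'D'],
    'B' => ['A', 'C', 'D'],
    'C' => ['B'],
    'D' => ['C'],
];

mt_srand(42);        // fixed seed so repeated runs behave the same
$steps = 200000;     // number of clicks to simulate
$visits = array_fill_keys(array_keys($links), 0);
$page = 'A';         // arbitrary starting page

for ($i = 0; $i < $steps; $i++) {
    // Follow one of the current page's links, each with equal probability
    $targets = $links[$page];
    $page = $targets[mt_rand(0, count($targets) - 1)];
    $visits[$page]++;
}

foreach ($visits as $name => $count) {
    printf("%s: %.3f\n", $name, $count / $steps);
}
```

The printed frequencies should come out roughly in the proportions 2 : 6 : 5 : 3 for A, B, C and D, which is the exact solution of s = Ms for this graph.
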
  65. PageRank as simulated surfing: $\begin{bmatrix} s_A \\ s_B \\ s_C \\ s_D \end{bmatrix} = \begin{bmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{bmatrix} \begin{bmatrix} s_A \\ s_B \\ s_C \\ s_D \end{bmatrix}$

    “When on page A, there’s a ½ chance of going to D”

  66. Power Method: $s^{(n+1)} = M s^{(n)}$

    M: transition probabilities
    $s^{(n)}$: probability per page after n clicks
    $s^{(n+1)}$: probability per page after n+1 clicks

  67. Power Method: $s^{(n+1)} = M s^{(n)}$

    M: transition probabilities
    $s^{(n)}$: probability per page after n clicks
    $s^{(n+1)}$: probability per page after n+1 clicks
    $s^{(0)}$: reasonable initial guess

  68. Power Method is proven to converge to the eigenvector (for a matrix M like we have)
  69. Implementation in PHP: aboks/power-iteration (uses markrogoyski/math-php)

  70. Implementation in PHP

        <?php
        use Aboks\PowerIteration\PowerIteration;
        use MathPHP\LinearAlgebra\Matrix;

        $m = new Matrix([/* ... */]);
        $pi = new PowerIteration();
        $pair = $pi->getDominantEigenpair($m);
        var_dump($pair->getEigenvector());

  71. Behind the scenes

        <?php
        use MathPHP\LinearAlgebra\Matrix;
        use MathPHP\LinearAlgebra\Vector;

        function getEigenvector(Matrix $m): Vector
        {
            // Initial guess: the all-ones vector, one entry per row of $m
            $ones = array_fill(0, $m->getM(), 1);
            $v = new Vector($ones);
            // Apply $m a fixed number of times (no scaling, no tolerance check)
            for ($i = 0; $i < 1000; $i++) {
                $v = $m->vectorMultiply($v);
            }
            return $v;
        }
  72. Implementation concerns

    Stopping criterion
    • Number of iterations
    • Eigenvector tolerance
    Scaling intermediate results
    Preventing reducible matrices
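
The first two concerns can be illustrated with a plain-PHP power iteration that rescales the vector after every step and stops once it no longer changes by more than a tolerance, with a maximum iteration count as a fallback. This is a sketch under assumptions: it uses nested arrays instead of math-php, it is not the code of aboks/power-iteration, the default tolerance and iteration limit are arbitrary, and scaling by the sum of the entries assumes a matrix with non-negative entries such as a link matrix.

```php
<?php
/** Multiply a square matrix (array of rows) by a vector. Hypothetical helper. */
function multiply(array $matrix, array $vector): array
{
    $result = [];
    foreach ($matrix as $row) {
        $sum = 0.0;
        foreach ($row as $j => $entry) {
            $sum += $entry * $vector[$j];
        }
        $result[] = $sum;
    }
    return $result;
}

/** Power iteration with scaling and a tolerance-based stopping criterion. */
function powerIteration(array $matrix, float $tolerance = 1e-10, int $maxIterations = 1000): array
{
    $n = count($matrix);
    $v = array_fill(0, $n, 1 / $n);              // uniform initial guess
    for ($i = 0; $i < $maxIterations; $i++) {
        $next = multiply($matrix, $v);
        $total = array_sum($next);
        foreach ($next as $k => $x) {
            $next[$k] = $x / $total;             // scale so the entries sum to 1
        }
        $change = 0.0;
        foreach ($next as $k => $x) {
            $change = max($change, abs($x - $v[$k]));
        }
        $v = $next;
        if ($change < $tolerance) {
            break;                               // eigenvector tolerance reached
        }
    }
    return $v;
}

// Link matrix of the four-page example (rows and columns ordered A, B, C, D)
$M = [
    [0,     1 / 3, 0, 0],
    [1 / 2, 0,     1, 0],
    [0,     1 / 3, 0, 1],
    [1 / 2, 1 / 3, 0, 0],
];

$s = powerIteration($M);
foreach (['A', 'B', 'C', 'D'] as $k => $page) {
    printf("%s: %.4f\n", $page, $s[$k]);
}
```

For this matrix the scores settle at 0.1250, 0.3750, 0.3125 and 0.1875 for A, B, C and D.
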
  73. Preventing reducible matrices [figure: pages A, B, C, D and E]

  74. Damping factor α: Google uses α ≈ 0.85

  75. Preventing reducible matrices

    When moving on to a new page:
    • α probability of following a link
    • (1 - α) probability of ‘teleporting’ to a random page

  76. Preventing reducible matrices: $M'_{i,j} = \tfrac{1}{n}(1 - \alpha) + \alpha M_{i,j}$

    With rows and columns ordered A, B, C, D:

    $M' = \begin{bmatrix} \tfrac{1}{n}(1-\alpha) & \tfrac{1}{n}(1-\alpha) + \tfrac{1}{3}\alpha & \tfrac{1}{n}(1-\alpha) & \tfrac{1}{n}(1-\alpha) \\ \tfrac{1}{n}(1-\alpha) + \tfrac{1}{2}\alpha & \tfrac{1}{n}(1-\alpha) & \tfrac{1}{n}(1-\alpha) + \alpha & \tfrac{1}{n}(1-\alpha) \\ \tfrac{1}{n}(1-\alpha) & \tfrac{1}{n}(1-\alpha) + \tfrac{1}{3}\alpha & \tfrac{1}{n}(1-\alpha) & \tfrac{1}{n}(1-\alpha) + \alpha \\ \tfrac{1}{n}(1-\alpha) + \tfrac{1}{2}\alpha & \tfrac{1}{n}(1-\alpha) + \tfrac{1}{3}\alpha & \tfrac{1}{n}(1-\alpha) & \tfrac{1}{n}(1-\alpha) \end{bmatrix}$
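
The damping formula translates into a few lines of PHP. The sketch below applies it to the four-page example matrix with α = 0.85; the function name `applyDamping` is hypothetical and the matrix is a plain nested array rather than a math-php object.

```php
<?php
/**
 * Apply the damping factor: every entry becomes
 * (1/n)(1 - alpha) + alpha * M[i][j], with n the number of pages.
 * Hypothetical helper for illustration.
 */
function applyDamping(array $matrix, float $alpha): array
{
    $n = count($matrix);
    $damped = [];
    foreach ($matrix as $i => $row) {
        foreach ($row as $j => $entry) {
            $damped[$i][$j] = (1 - $alpha) / $n + $alpha * $entry;
        }
    }
    return $damped;
}

// Link matrix of the four-page example (rows and columns ordered A, B, C, D)
$M = [
    [0,     1 / 3, 0, 0],
    [1 / 2, 0,     1, 0],
    [0,     1 / 3, 0, 1],
    [1 / 2, 1 / 3, 0, 0],
];

$damped = applyDamping($M, 0.85);

printf("%.4f\n", $damped[0][0]); // 0.0375 (no link from A to A, teleporting only)
printf("%.4f\n", $damped[3][0]); // 0.4625 (link from A to D, plus teleporting)
```

Every entry of the damped matrix is strictly positive, so every page can be reached from every other page in one step, while each column still sums to 1.
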
  77. Applications of PageRank: Beyond web search

  78. PageRank in general

    Analysis of networks/directed graphs:
    • Edge A → B increases score of B
    • The higher the score of A, the higher the score of B

  79. Influence in social networks [figure: nodes A and B with an edge captioned “A follows B”]

  80. Food chains [figure: nodes A and B with an edge captioned “A eats B”]

  81. CodeRank [figure: function A and function B with an edge captioned “A calls B”]

  82. CodeRank [figure: class A and class B with an edge captioned “A depends on B”]

  83. Reverse CodeRank [figure: class A and class B with an edge captioned “B depends on A”]

  84. CodeRank for PHP: pdepend/pdepend calculates metrics including CodeRank and Reverse CodeRank

  85. Package dependencies [figure: nodes A and B with an edge captioned “A depends on B”]

  86. PageRank is not very difficult to implement… …but applications are countless
  87. Back to our story…

    $ git stash pop
    On branch master
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

            new file:   volleyball-story.txt

    Dropped refs/stash@{0} (b180f4)

  88. Ranking (incomplete) competitions [figure: teams A and B with an edge captioned “A lost to B”]

  89. Ranking (incomplete) competitions [figure: teams A and B, captioned “A lost 2-3 to B”, with edge weights 3 and 2]
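
One plausible reading of these two slides is that every match adds two weighted edges: the loser “links” to the winner with the number of sets the winner took, and the other way round. The sketch below builds a column-normalised matrix from a list of match results under that assumption. The match data is invented for illustration (only the 2-3 result between A and B comes from the slide), and `buildResultMatrix` is a hypothetical helper, not the speaker's implementation.

```php
<?php
/**
 * Build a score-dependency matrix from match results.
 * Each match is [teamX, teamY, setsWonByX, setsWonByY].
 * Team X passes weight to team Y for every set Y won, and vice versa;
 * each column is then divided by that team's total outgoing weight.
 */
function buildResultMatrix(array $matches): array
{
    $weights = [];   // $weights[$from][$to] = sets that $to won against $from
    foreach ($matches as [$x, $y, $setsX, $setsY]) {
        $weights[$x][$y] = ($weights[$x][$y] ?? 0) + $setsY;
        $weights[$y][$x] = ($weights[$y][$x] ?? 0) + $setsX;
    }

    $teams = array_keys($weights);
    sort($teams);

    $matrix = [];
    foreach ($teams as $to) {
        foreach ($teams as $from) {
            $total = array_sum($weights[$from]);
            $weight = $weights[$from][$to] ?? 0;
            $matrix[$to][$from] = $total > 0 ? $weight / $total : 0.0;
        }
    }
    return $matrix;
}

// Invented example results; C has not yet won a match
$matches = [
    ['A', 'B', 2, 3],   // A lost 2-3 to B
    ['A', 'C', 3, 0],
    ['B', 'C', 3, 1],
];

$M = buildResultMatrix($matches);
foreach ($M as $team => $row) {
    $cells = array_map(fn ($x) => sprintf('%.3f', $x), $row);
    echo $team, ': ', implode('  ', $cells), PHP_EOL;
}
```

The resulting matrix has the same shape as the link matrix for web pages, so it can be damped and fed to a power iteration in the same way. With only a handful of matches the scores can be dominated by a single result, which is one reason to apply the damping factor before ranking.
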
  90. The results

    #   Team         Pnt
    1   DEO 2         30
    2   Netwerk 1     24
    3   Punch 6       24
    4   Delta 5       23
    5   Red Stars 1   23
    6   Delta 4       17
    7   Kalinko 6     12
    8   Kratos 7       7
    9   Punch 9        6
    10  Sovicos 4      4

  91. The results

    Standings by points:

    #   Team         Pnt
    1   DEO 2         30
    2   Netwerk 1     24
    3   Punch 6       24
    4   Delta 5       23
    5   Red Stars 1   23
    6   Delta 4       17
    7   Kalinko 6     12
    8   Kratos 7       7
    9   Punch 9        6
    10  Sovicos 4      4

    Ranking by PageRank score:

    #   Team             %
    1   Punch 6        25.79
    2   DEO 2          16.86
    3   Red Stars 1    15.31
    4   Delta 5        14.13
    5   Delta 4        11.03
    6   Netwerk 1       8.79
    7   Kalinko 6       3.13
    8   Kratos '08 7    2.49
    9   Punch 9         1.24
    10  Sovicos 4       1.23
  92. The end

  93. Feedback & Questions

    Arnout Boks: @arnoutboks, @aboks
    Please leave your feedback on joind.in: https://joind.in/talk/fb7e9
    We’re hiring!

  94. Image Credits

    • https://www.flickr.com/photos/42283496@N08/4011391702/
    • https://pixabay.com/illustrations/google-search-engine-browser-search-76517/
    • https://unsplash.com/photos/68ZlATaVYIo
    • https://imgur.com/gallery/SLHBV
    • https://www.flickr.com/photos/125329869@N03/14418114781
    • https://unsplash.com/photos/ZiQkhI7417A