Arnout Boks
June 08, 2019

# PageRank all the things! - Dutch PHP Conference 2019

Joind.in: https://joind.in/talk/fb7e9

Most people know PageRank as Google’s algorithm for ranking search results, but its uses extend far beyond that: PageRank has already been used for analysing social networks, finding the most important functions in source code, predicting traffic, and deriving a more accurate ranking table of teams in an ongoing sports competition. In this session we will cover the basics of linear algebra, developing an intuitive notion of how matrices and vectors interact, and use it to understand the principles of PageRank. Then we’ll jump straight into real-life applications of PageRank beyond web search and how these can be implemented in PHP using the math-php library.

## Transcript

3. ### @arnoutboks #dpc19 Story time

| # | Team | Pnt |
|---|------|-----|
| 1 | DEO 2 | 30 |
| 2 | Netwerk 1 | 24 |
| 3 | Punch 6 | 24 |
| 4 | Delta 5 | 23 |
| 5 | Red Stars 1 | 23 |
| 6 | Delta 4 | 17 |
| 7 | Kalinko 6 | 12 |
| 8 | Kratos 7 | 7 |
| 9 | Punch 9 | 6 |
| 10 | Sovicos 4 | 4 |

Next match: Netwerk 1 – Delta 4

4. ### Story time

| # | Team | Pnt |
|---|------|-----|
| 1 | DEO 2 | 30 |
| 2 | Netwerk 1 | 24 |
| 3 | Punch 6 | 24 |
| 4 | Delta 5 | 23 |
| 5 | Red Stars 1 | 23 |
| 6 | Delta 4 | 17 |
| 7 | Kalinko 6 | 12 |
| 8 | Kratos 7 | 7 |
| 9 | Punch 9 | 6 |
| 10 | Sovicos 4 | 4 |

Can we account for the fact that some teams have so far played against weaker opposition than others?

5. ### To be continued…

```
$ git stash
Saved working directory and index state WIP on master: 5002d47 PageRank all the things!
HEAD is now at 5002d47 PageRank all the things!
```

and matrices

10. ### Vectors

$$v = \begin{pmatrix} 8 \\ 3 \\ -4 \\ 1 \end{pmatrix}$$

dimension 4 (“4-vector”)

[Figure: a “2-vector” drawn in the x-y plane]
13. ### Scalar multiplication

$$v = \begin{pmatrix} -1.5 \\ 1 \end{pmatrix}, \qquad 3v = 3 \begin{pmatrix} -1.5 \\ 1 \end{pmatrix} = \begin{pmatrix} -4.5 \\ 3 \end{pmatrix}$$

15. ### Matrices

$$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$$

16. ### Matrices

$$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$$

4 rows, 3 columns: dimension 4, 3 (“4x3-matrix”)

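17. ### Matrix-vector multiplication

$$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}, \qquad v = \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$$

18. ### Matrix-vector multiplication

$$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}, \qquad v = \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$$

M has 3 columns; v has dimension 3.

19. ### Matrix-vector multiplication

$$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$$

20. ### Matrix-vector multiplication

$$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$$

The entries of v (8, 3, -4) are lined up against the columns of M.

21. ### Matrix-vector multiplication

Products, row by row:

$$\begin{array}{ccc} 8 \times 8 & 0 \times 3 & 42 \times (-4) \\ 3 \times 8 & 3 \times 3 & 7 \times (-4) \\ 4 \times 8 & -7 \times 3 & 1 \times (-4) \\ 1 \times 8 & 2 \times 3 & 6 \times (-4) \end{array}$$

22. ### Matrix-vector multiplication

Products, row by row:

$$\begin{array}{ccc} 64 & 0 & -168 \\ 24 & 9 & -28 \\ 32 & -21 & -4 \\ 8 & 6 & -24 \end{array}$$

23. ### Matrix-vector multiplication

$$Mv = \begin{pmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{pmatrix}$$

24. ### Matrix-vector multiplication

$$Mv = \begin{pmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$$

25. ### Matrix-vector multiplication

$$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$$

26. ### Matrix-vector multiplication

$$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$$

“4x3-matrix multiplied by a 3-vector yields a 4-vector”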
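
The row-by-row recipe from these slides can be sketched in a few lines of plain PHP. This is a minimal illustration on nested arrays, not the math-php API; `multiplyMatrixVector` is a hypothetical helper name introduced here.

```php
<?php
/**
 * Multiplies a matrix (a list of rows) by a vector (a list of numbers).
 * Hypothetical helper for illustration; not part of math-php.
 */
function multiplyMatrixVector(array $matrix, array $vector): array
{
    $result = [];
    foreach ($matrix as $row) {
        if (count($row) !== count($vector)) {
            throw new InvalidArgumentException('Column count must equal vector dimension');
        }
        // Multiply entry by entry, then add up: one result entry per matrix row
        $result[] = array_sum(array_map(fn ($a, $b) => $a * $b, $row, $vector));
    }
    return $result;
}

$M = [[8, 0, 42], [3, 3, 7], [4, -7, 1], [1, 2, 6]];
$v = [8, 3, -4];
$Mv = multiplyMatrixVector($M, $v);
print_r($Mv); // the 4-vector [-104, 5, 7, -10]
```

The dimension check mirrors the rule on the slide: a 4x3-matrix only accepts a 3-vector and always yields a 4-vector.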

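28. ### Matrix as a transformation

$$\underbrace{\begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}}_{\text{3D}} \;\xrightarrow{\;M\ (4 \times 3\text{-matrix})\;}\; \underbrace{\begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}}_{\text{4D}}$$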

31. ### Chaining transformations

$$\text{3D} \;\xrightarrow{\;M\ (4 \times 3\text{-matrix})\;}\; \text{4D} \;\xrightarrow{\;N\ (2 \times 4\text{-matrix})\;}\; \text{2D}$$

Is there a 2x3-matrix describing this transformation?

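32. ### Matrix multiplication

$$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$$

33. ### Matrix multiplication

First column of M:

$$N \begin{pmatrix} 8 \\ 3 \\ 4 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 26 \end{pmatrix}$$

34. ### Matrix multiplication

Second column of M:

$$N \begin{pmatrix} 0 \\ 3 \\ -7 \\ 2 \end{pmatrix} = \begin{pmatrix} 16 \\ 1 \end{pmatrix}$$

35. ### Matrix multiplication

Third column of M:

$$N \begin{pmatrix} 42 \\ 7 \\ 1 \\ 6 \end{pmatrix} = \begin{pmatrix} 46 \\ 127 \end{pmatrix}$$

36. ### Matrix multiplication

$$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{pmatrix}$$

37. ### Matrix multiplication

$$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{pmatrix}$$

“2x4-matrix multiplied by 4x3-matrix yields a 2x3-matrix”
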
38. ### Matrix multiplication as chained transformation

- M (4x3-matrix): 3D → 4D
- N (2x4-matrix): 4D → 2D
- NM (2x3-matrix): 3D → 2D

$$N(Mv) = (NM)v$$
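
The identity N(Mv) = (NM)v can be checked numerically with the matrices used in the preceding slides. The sketch below is plain PHP on nested arrays; `matMul` is a hypothetical helper introduced here, and the vector is written as a 3x1 matrix so that one function covers both kinds of product.

```php
<?php
// Hypothetical helper: product of two matrices given as lists of rows.
function matMul(array $a, array $b): array
{
    $product = [];
    foreach ($a as $i => $row) {
        foreach (array_keys($b[0]) as $j) {
            $sum = 0;
            foreach ($row as $k => $value) {
                $sum += $value * $b[$k][$j]; // row i of $a times column j of $b
            }
            $product[$i][$j] = $sum;
        }
    }
    return $product;
}

$N = [[1, 0, -2, 1], [3, 1, 0, -1]];                   // 2x4
$M = [[8, 0, 42], [3, 3, 7], [4, -7, 1], [1, 2, 6]];   // 4x3
$v = [[8], [3], [-4]];                                 // 3-vector as a 3x1 matrix

$NM = matMul($N, $M);                        // 2x3 matrix
$viaTwoSteps = matMul($N, matMul($M, $v));   // N(Mv): transform twice
$viaProduct = matMul($NM, $v);               // (NM)v: transform once

var_dump($viaTwoSteps === $viaProduct); // bool(true)
```

Both routes end at the same 2-vector, which is why the single 2x3-matrix NM can stand in for the two chained transformations.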

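$$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

43. ### Square matrices

[Figure: vectors v and Mv drawn in the x-y plane]

$$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad u = Mu$$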

49. ### The PageRank of a page depends on the PageRank of the pages linking to it

51. ### Approach

- Let n be the number of web pages
- Let s be an n-vector of scores for these pages
- Let M be an n×n-matrix describing the dependencies of scores

52. ### Approach

- Let n be the number of web pages
- Let s be an n-vector of scores for these pages
- Let M be an n×n-matrix describing the dependencies of scores

$$s = Ms$$

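53. ### Creating the matrix M

[Figure: link graph with pages A and D]

|   | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 0 | 0 | 0 |
| B | 0 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 0 |
| D | 1 | 0 | 0 | 0 |
| Outbound | 1 | 0 | 0 | 0 |

54. ### Creating the matrix M

[Figure: link graph with pages A, B, C and D]

|   | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 1 | 0 | 0 |
| B | 1 | 0 | 1 | 0 |
| C | 0 | 1 | 0 | 1 |
| D | 1 | 1 | 0 | 0 |
| Outbound | 2 | 3 | 1 | 1 |

55. ### Creating the matrix M

[Figure: link graph with pages A, B, C and D]

|   | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 1/3 | 0 | 0 |
| B | 1/2 | 0 | 1 | 0 |
| C | 0 | 1/3 | 0 | 1 |
| D | 1/2 | 1/3 | 0 | 0 |
| Outbound | 2 | 3 | 1 | 1 |
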
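
The construction above (one column per linking page, each link weighted by one over that page's outbound count) might be sketched as follows. `buildLinkMatrix` and the shape of the `$links` map are assumptions made for this illustration, and the sketch assumes every page has at least one outbound link.

```php
<?php
// Hypothetical helper: builds M from a map "page => pages it links to".
// $m[$to][$from] is 1 / (outbound links of $from) if $from links to $to, else 0.
// Assumes every page has at least one outbound link.
function buildLinkMatrix(array $links): array
{
    $pages = array_keys($links);
    $m = [];
    foreach ($pages as $to) {
        foreach ($pages as $from) {
            $m[$to][$from] = in_array($to, $links[$from], true)
                ? 1 / count($links[$from])
                : 0;
        }
    }
    return $m;
}

// The four-page example: A links to B and D, B to A, C and D, and so on
$links = [
    'A' => ['B', 'D'],
    'B' => ['A', 'C', 'D'],
    'C' => ['B'],
    'D' => ['C'],
];
$M = buildLinkMatrix($links);
var_dump($M['D']['A']); // float(0.5): D receives half of A's score
```

Reading a row of `$M` answers "who gives points to this page", while reading a column answers "where do this page's points go".
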
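56. ### Equation s = Ms

Rows and columns of M in the order A, B, C, D:

$$M = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix}, \qquad s = \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$$

57. ### Equation s = Ms

$$\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} \qquad \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$$

58. ### Equation s = Ms

$$\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$$

59. ### Equation s = Ms

$$\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$$

“The higher A scores, the more ‘points’ D gets for being one of the two pages A links to”
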
60. ### Eigenvalue problem

Q: Does there exist a vector s such that s = Ms for the given matrix M?

61. ### Eigenvalue problem

Q: Does there exist a vector s and a number λ such that λs = Ms for the given matrix M?

62. ### Eigenvalue problem

Q: Does there exist a vector s and a number λ such that λs = Ms for the given matrix M?

A: Yes, with λ = 1

Fine print: under certain technical conditions. Look up the “Perron–Frobenius theorem” if you’re interested.
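
For the four-page example, the equation s = Ms is small enough to solve by hand, which makes the "yes" above concrete. The derivation below is a worked illustration; the parameter t and the choice to normalise the scores so they sum to 1 are assumptions made here.

$$
\begin{aligned}
s_A &= \tfrac{1}{3} s_B, &
s_B &= \tfrac{1}{2} s_A + s_C, \\
s_C &= \tfrac{1}{3} s_B + s_D, &
s_D &= \tfrac{1}{2} s_A + \tfrac{1}{3} s_B.
\end{aligned}
$$

Setting $s_B = 3t$ gives $s_A = t$, then $s_D = \tfrac{1}{2}t + t = \tfrac{3}{2}t$ and $s_C = t + \tfrac{3}{2}t = \tfrac{5}{2}t$. The remaining equation holds as well: $\tfrac{1}{2}t + \tfrac{5}{2}t = 3t = s_B$. Requiring $s_A + s_B + s_C + s_D = 8t = 1$ yields

$$s = \begin{pmatrix} 1/8 \\ 3/8 \\ 5/16 \\ 3/16 \end{pmatrix} = \begin{pmatrix} 0.125 \\ 0.375 \\ 0.3125 \\ 0.1875 \end{pmatrix}.$$

So B scores highest, which fits the link structure: it is the only page that receives the full score of another page (C) on top of a share of A's.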

64. ### PageRank as simulated surfing

- Someone starts surfing somewhere on the internet
- On every page, they follow a random link
- What page is shown when the buzzer goes?
- Score of a page is the probability of ending up on it

65. ### PageRank as simulated surfing

$$\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$$

“When on page A, there’s a ½ chance of going to D”

66. ### Power Method

$$s^{(n+1)} = M s^{(n)}$$

- $M$: transition probabilities
- $s^{(n)}$: probability per page after n clicks
- $s^{(n+1)}$: probability per page after n+1 clicks

67. ### Power Method

$$s^{(n+1)} = M s^{(n)}$$

- $M$: transition probabilities
- $s^{(n)}$: probability per page after n clicks
- $s^{(n+1)}$: probability per page after n+1 clicks
- $s^{(0)}$: reasonable initial guess

68. ### Power Method is proven to converge to the eigenvector (for a matrix M like we have)

70. ### Implementation in PHP

```php
<?php
use Aboks\PowerIteration\PowerIteration;
use MathPHP\LinearAlgebra\Matrix;

$m = new Matrix([/* ... */]);
$pi = new PowerIteration();
$pair = $pi->getDominantEigenpair($m);
var_dump($pair->getEigenvector());
```

71. ### Behind the scenes

```php
<?php
use MathPHP\LinearAlgebra\Matrix;
use MathPHP\LinearAlgebra\Vector;

function getEigenvector(Matrix $m): Vector
{
    // Initial guess: a vector of ones, one entry per row of $m
    $ones = array_fill(0, $m->getM(), 1);
    $v = new Vector($ones);
    // Repeatedly apply the matrix for a fixed number of iterations
    for ($i = 0; $i < 1000; $i++) {
        $v = $m->vectorMultiply($v);
    }
    return $v;
}
```

72. ### Implementation concerns

- Stopping criterion
  - Number of iterations
  - Eigenvector tolerance
- Scaling intermediate results
- Preventing reducible matrices
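
The first two concerns can be illustrated with a variant of the power method that stops on a tolerance and rescales after every step. This is a sketch on plain PHP arrays rather than the talk's library code; the function name, the default tolerance, and scaling to a sum of 1 are all assumptions made here.

```php
<?php
// Hypothetical sketch: power iteration with a tolerance-based stopping
// criterion and rescaling. The matrix is a list of rows with non-negative entries.
function powerIterationWithTolerance(array $m, float $tolerance = 1e-10, int $maxIterations = 1000): array
{
    $n = count($m);
    $v = array_fill(0, $n, 1 / $n); // initial guess: every page equally likely

    for ($i = 0; $i < $maxIterations; $i++) {
        // One step of the power method: next = M * v
        $next = [];
        foreach ($m as $row) {
            $next[] = array_sum(array_map(fn ($a, $b) => $a * $b, $row, $v));
        }
        // Scaling: keep the entries summing to 1 so they cannot blow up or vanish
        $total = array_sum($next);
        $next = array_map(fn ($x) => $x / $total, $next);

        // Stopping criterion: largest change in any entry since the previous step
        $change = max(array_map(fn ($a, $b) => abs($a - $b), $next, $v));
        $v = $next;
        if ($change < $tolerance) {
            break;
        }
    }
    return $v;
}

// The four-page example matrix (rows and columns in the order A, B, C, D)
$M = [
    [0,     1 / 3, 0, 0],
    [1 / 2, 0,     1, 0],
    [0,     1 / 3, 0, 1],
    [1 / 2, 1 / 3, 0, 0],
];
$scores = powerIterationWithTolerance($M);
print_r($scores); // approaches [0.125, 0.375, 0.3125, 0.1875]
```

The iteration cap stays in place as a safety net: for a matrix where the method does not settle, the tolerance alone would never trigger.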

75. ### Preventing reducible matrices

When moving on to a new page:

- α probability of following a link
- (1 – α) probability of ‘teleporting’ to a random page

76. ### Preventing reducible matrices

|   | A | B | C | D |
|---|---|---|---|---|
| A | (1/n)(1 – α) | (1/n)(1 – α) + (1/3)α | (1/n)(1 – α) | (1/n)(1 – α) |
| B | (1/n)(1 – α) + (1/2)α | (1/n)(1 – α) | (1/n)(1 – α) + α | (1/n)(1 – α) |
| C | (1/n)(1 – α) | (1/n)(1 – α) + (1/3)α | (1/n)(1 – α) | (1/n)(1 – α) + α |
| D | (1/n)(1 – α) + (1/2)α | (1/n)(1 – α) + (1/3)α | (1/n)(1 – α) | (1/n)(1 – α) |

$$M'_{i,j} = \frac{1}{n}(1 - \alpha) + \alpha M_{i,j}$$
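
Applied entry by entry, the formula for M′ takes only a few lines. The helper name `addTeleportation` is hypothetical, and α = 0.85 is a commonly quoted choice rather than a value given on the slides.

```php
<?php
// Hypothetical helper implementing M'[i][j] = (1/n)(1 - alpha) + alpha * M[i][j].
function addTeleportation(array $m, float $alpha): array
{
    $n = count($m);
    $teleport = (1 - $alpha) / $n; // chance of jumping to any one specific page
    return array_map(
        fn (array $row) => array_map(fn ($entry) => $teleport + $alpha * $entry, $row),
        $m
    );
}

// The four-page example matrix (rows and columns in the order A, B, C, D)
$M = [
    [0,     1 / 3, 0, 0],
    [1 / 2, 0,     1, 0],
    [0,     1 / 3, 0, 1],
    [1 / 2, 1 / 3, 0, 0],
];
$damped = addTeleportation($M, 0.85); // alpha = 0.85 is an assumption, not from the slides

// Each column still sums to 1, so the entries remain transition probabilities
foreach (array_keys($damped[0]) as $j) {
    echo array_sum(array_column($damped, $j)), PHP_EOL;
}
```

Because every entry of the result is strictly positive, every page can reach every other page in one step, which is what rules out a reducible matrix.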

78. ### PageRank in general

Analysis of networks/directed graphs:

- Edge A → B increases score of B
- The higher the score of A, the higher the score of B

84. ### CodeRank for PHP

pdepend/pdepend: calculates metrics including CodeRank and Reverse CodeRank

86. ### PageRank is not very difficult to implement… …but applications are countless

87. ### Back to our story…

```
$ git stash pop
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   volleyball-story.txt

Dropped refs/stash@{0} (b180f4)
```

[Figure: match result involving team B, score 3–2]
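90. ### The results

| # | Team | Pnt |
|---|------|-----|
| 1 | DEO 2 | 30 |
| 2 | Netwerk 1 | 24 |
| 3 | Punch 6 | 24 |
| 4 | Delta 5 | 23 |
| 5 | Red Stars 1 | 23 |
| 6 | Delta 4 | 17 |
| 7 | Kalinko 6 | 12 |
| 8 | Kratos 7 | 7 |
| 9 | Punch 9 | 6 |
| 10 | Sovicos 4 | 4 |

91. ### The results

| # | Team | Pnt |
|---|------|-----|
| 1 | DEO 2 | 30 |
| 2 | Netwerk 1 | 24 |
| 3 | Punch 6 | 24 |
| 4 | Delta 5 | 23 |
| 5 | Red Stars 1 | 23 |
| 6 | Delta 4 | 17 |
| 7 | Kalinko 6 | 12 |
| 8 | Kratos 7 | 7 |
| 9 | Punch 9 | 6 |
| 10 | Sovicos 4 | 4 |

| # | Team | % |
|---|------|---|
| 1 | Punch 6 | 25.79 |
| 2 | DEO 2 | 16.86 |
| 3 | Red Stars 1 | 15.31 |
| 4 | Delta 5 | 14.13 |
| 5 | Delta 4 | 11.03 |
| 6 | Netwerk 1 | 8.79 |
| 7 | Kalinko 6 | 3.13 |
| 8 | Kratos '08 7 | 2.49 |
| 9 | Punch 9 | 1.24 |
| 10 | Sovicos 4 | 1.23 |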