Slide 1

Slide 1 text

PageRank all the things! @arnoutboks Arnout Boks #dpc19 08-06-2019

Slide 2

Slide 2 text

Story time

Slide 3

Slide 3 text

Story time

#    Team          Pnt
1    DEO 2         30
2    Netwerk 1     24
3    Punch 6       24
4    Delta 5       23
5    Red Stars 1   23
6    Delta 4       17
7    Kalinko 6     12
8    Kratos 7       7
9    Punch 9        6
10   Sovicos 4      4

Next match: Netwerk 1 – Delta 4

Slide 4

Slide 4 text

Story time

#    Team          Pnt
1    DEO 2         30
2    Netwerk 1     24
3    Punch 6       24
4    Delta 5       23
5    Red Stars 1   23
6    Delta 4       17
7    Kalinko 6     12
8    Kratos 7       7
9    Punch 9        6
10   Sovicos 4      4

Can we account for the fact that some teams have so far played against weaker opposition than others?

Slide 5

Slide 5 text

To be continued…
$ git stash
Saved working directory and index state WIP on master: 5002d47 PageRank all the things!
HEAD is now at 5002d47 PageRank all the things!

Slide 6

Slide 6 text

PageRank

Slide 7

Slide 7 text

Linear Algebra: Some necessary math

Slide 8

Slide 8 text

Linear Algebra is the discipline that studies vectors and matrices

Slide 9

Slide 9 text

Vectors
$v = \begin{pmatrix} 8 \\ 3 \\ -4 \\ 1 \end{pmatrix}$

Slide 10

Slide 10 text

Vectors
$v = \begin{pmatrix} 8 \\ 3 \\ -4 \\ 1 \end{pmatrix}$ has 4 entries: dimension 4 (“4-vector”)

Slide 11

Slide 11 text

Vectors
$v = \begin{pmatrix} -\pi \\ 4.2 \end{pmatrix}$ has 2 entries: dimension 2 (“2-vector”)

Slide 12

Slide 12 text

Vectors as coordinates
[Figure: the vectors (1, 1), (1, -2) and (-3, -2) plotted in the x-y plane]

Slide 13

Slide 13 text

Scalar multiplication
$v = \begin{pmatrix} -1.5 \\ 1 \end{pmatrix}$
$3v = 3 \begin{pmatrix} -1.5 \\ 1 \end{pmatrix} = \begin{pmatrix} -4.5 \\ 3 \end{pmatrix}$

Slide 14

Slide 14 text

Scalar multiplication
[Figure: v and 3v drawn in the x-y plane]

Slide 15

Slide 15 text

Matrices
$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$

Slide 16

Slide 16 text

Matrices
$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$ has 4 rows and 3 columns: dimension 4, 3 (“4x3-matrix”)

Slide 17

Slide 17 text

Matrix-vector multiplication
$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$, $v = \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$

Slide 18

Slide 18 text

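Matrix-vector multiplication
$M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$ (3 columns), $v = \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$ (3 entries)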

Slide 19

Slide 19 text

Matrix-vector multiplication
$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$

Slide 20

Slide 20 text

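Matrix-vector multiplication
$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$
[Animation step: the entries 8, 3, -4 of v are lined up with the rows of M]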

Slide 21

Slide 21 text

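Matrix-vector multiplication
Mv, step 1: multiply every entry in a row of M by the matching entry of v = (8, 3, -4):
$\begin{array}{rrr} 8 \times 8 & 0 \times 3 & 42 \times (-4) \\ 3 \times 8 & 3 \times 3 & 7 \times (-4) \\ 4 \times 8 & -7 \times 3 & 1 \times (-4) \\ 1 \times 8 & 2 \times 3 & 6 \times (-4) \end{array}$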

Slide 22

Slide 22 text

Matrix-vector multiplication
Mv, the products per row:
$\begin{array}{rrr} 64 & 0 & -168 \\ 24 & 9 & -28 \\ 32 & -21 & -4 \\ 8 & 6 & -24 \end{array}$

Slide 23

Slide 23 text

Matrix-vector multiplication
$Mv = \begin{pmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{pmatrix}$

Slide 24

Slide 24 text

Matrix-vector multiplication
$Mv = \begin{pmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$

Slide 25

Slide 25 text

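Matrix-vector multiplication
$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$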

Slide 26

Slide 26 text

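Matrix-vector multiplication
$Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$
“4x3-matrix multiplied by a 3-vector yields a 4-vector”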
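
The row-by-row recipe from the last few slides can be sketched in a few lines. This is an illustration only: it is written in plain Python rather than the talk's PHP so that it needs no libraries, and the helper name `mat_vec` is made up for this sketch.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("every row of the matrix needs as many entries as the vector")
    # Each output entry is the sum of the element-wise products of one row with the vector.
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


M = [
    [8, 0, 42],
    [3, 3, 7],
    [4, -7, 1],
    [1, 2, 6],
]
v = [8, 3, -4]

print(mat_vec(M, v))  # [-104, 5, 7, -10]
```

The length check mirrors the rule on the slides: a matrix with 3 columns can only be multiplied by a 3-vector, and the result has one entry per row.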

Slide 27

Slide 27 text

The Matrix: In more detail

Slide 28

Slide 28 text

Matrix as a transformation
The 4x3-matrix M maps the 3D vector (8, 3, -4) to the 4D vector (-104, 5, 7, -10): 3D → M → 4D

Slide 29

Slide 29 text

Chaining transformations
3D → M (4x3-matrix) → 4D → N (2x4-matrix) → 2D

Slide 31

Slide 31 text

Chaining transformations
3D → M (4x3-matrix) → 4D → N (2x4-matrix) → 2D
Is there a 2x3-matrix describing this transformation?

Slide 32

Slide 32 text

Matrix multiplication
$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$

Slide 33

Slide 33 text

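Matrix multiplication
$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & \cdot & \cdot \\ 26 & \cdot & \cdot \end{pmatrix}$
First column of M, (8, 3, 4, 1): $1 \cdot 8 + 0 \cdot 3 + (-2) \cdot 4 + 1 \cdot 1 = 1$ and $3 \cdot 8 + 1 \cdot 3 + 0 \cdot 4 + (-1) \cdot 1 = 26$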

Slide 34

Slide 34 text

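Matrix multiplication
$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & \cdot \\ 26 & 1 & \cdot \end{pmatrix}$
Second column of M, (0, 3, -7, 2): $1 \cdot 0 + 0 \cdot 3 + (-2) \cdot (-7) + 1 \cdot 2 = 16$ and $3 \cdot 0 + 1 \cdot 3 + 0 \cdot (-7) + (-1) \cdot 2 = 1$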

Slide 35

Slide 35 text

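Matrix multiplication
$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{pmatrix}$
Third column of M, (42, 7, 1, 6): $1 \cdot 42 + 0 \cdot 7 + (-2) \cdot 1 + 1 \cdot 6 = 46$ and $3 \cdot 42 + 1 \cdot 7 + 0 \cdot 1 + (-1) \cdot 6 = 127$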

Slide 36

Slide 36 text

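Matrix multiplication
$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{pmatrix}$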

Slide 37

Slide 37 text

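Matrix multiplication
$NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{pmatrix}$
“2x4-matrix multiplied by 4x3-matrix yields a 2x3-matrix”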

Slide 38

Slide 38 text

Matrix multiplication as chained transformation
3D → M (4x3-matrix) → 4D → N (2x4-matrix) → 2D
3D → NM (2x3-matrix) → 2D
N(Mv) = (NM)v
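
The identity N(Mv) = (NM)v can be checked numerically with the matrices from the preceding slides. The sketch below is plain Python with made-up helper names (`mat_mul`, `mat_vec`); it is meant as a sanity check, not as a reference implementation.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("columns of the left matrix must match rows of the right matrix")
    columns_of_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_b] for row in a]


def mat_vec(matrix, vector):
    """Multiply a matrix by a vector by treating the vector as a one-column matrix."""
    return [row[0] for row in mat_mul(matrix, [[x] for x in vector])]


M = [[8, 0, 42], [3, 3, 7], [4, -7, 1], [1, 2, 6]]  # 4x3-matrix: 3D -> 4D
N = [[1, 0, -2, 1], [3, 1, 0, -1]]                  # 2x4-matrix: 4D -> 2D
v = [8, 3, -4]

two_steps = mat_vec(N, mat_vec(M, v))  # apply M first, then N
one_step = mat_vec(mat_mul(N, M), v)   # apply the combined 2x3-matrix NM once

print(two_steps, one_step, two_steps == one_step)
```

Both routes end at the same 2D vector, which is why a chain of transformations can be collapsed into a single matrix before any vector is known.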

Slide 39

Slide 39 text

Square matrices
2D → M (2x2-matrix) → 2D

Slide 41

Slide 41 text

Square matrices
$M = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$
[Figure: v and Mv in the x-y plane; Mv is v rotated by a quarter turn]

Slide 42

Slide 42 text

Square matrices
$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
[Figure: v and Mv in the x-y plane; Mv is v mirrored in the x-axis]

Slide 43

Slide 43 text

Square matrices
$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
[Figure: v and Mv in the x-y plane, plus a vector u with u = Mu]
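
Which vectors satisfy u = Mu for this matrix can be worked out by hand. The derivation below assumes the matrix entries are read row by row, as in the earlier examples; the names a and b for the entries of u are introduced only for this sketch.

```latex
M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
M \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ -b \end{pmatrix}
\quad\Longrightarrow\quad
Mu = u \iff -b = b \iff b = 0
```

So exactly the vectors on the x-axis, u = (a, 0), are left in place by M. Vectors that a square matrix maps onto themselves are what the PageRank part of the talk goes looking for.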

Slide 44

Slide 44 text

PageRank: Ranking web search results

Slide 45

Slide 45 text

Web pages and links
[Figure: pages A and B]

Slide 47

Slide 47 text

Web pages and links
[Figure: pages A, B, C and D]

Slide 48

Slide 48 text

Web pages and links
[Figure: pages A, B, C, D, E, F and G]

Slide 49

Slide 49 text

The PageRank of a page depends on the PageRank of the pages linking to it

Slide 50

Slide 50 text

Chicken and egg problem

Slide 51

Slide 51 text

Approach
Let n be the number of web pages
Let s be an n-vector of scores for these pages
Let M be an n×n-matrix describing the dependencies of scores

Slide 52

Slide 52 text

Approach
Let n be the number of web pages
Let s be an n-vector of scores for these pages
Let M be an n×n-matrix describing the dependencies of scores
s = Ms

Slide 53

Slide 53 text

Creating the matrix M
Link A → D (column = linking page, row = page linked to):

            A    B    C    D
A           0    0    0    0
B           0    0    0    0
C           0    0    0    0
D           1    0    0    0
Outbound    1    0    0    0

Slide 54

Slide 54 text

Creating the matrix M
[Figure: pages A, B, C and D with all their links]

            A    B    C    D
A           0    1    0    0
B           1    0    1    0
C           0    1    0    1
D           1    1    0    0
Outbound    2    3    1    1

Slide 55

Slide 55 text

Creating the matrix M
[Figure: pages A, B, C and D with all their links]

            A    B    C    D
A           0    1/3  0    0
B           1/2  0    1    0
C           0    1/3  0    1
D           1/2  1/3  0    0
Outbound    2    3    1    1
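
Building M from a list of links takes only a few lines. The sketch below assumes the links read off the 0/1 table (A → B, D; B → A, C, D; C → B; D → C), that every page has at least one outbound link, and that no link is listed twice. The function name `link_matrix` is made up for this example, and exact fractions are used so the entries print as on the slide.

```python
from fractions import Fraction

# Outgoing links per page, read off the example table.
links = {
    "A": ["B", "D"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["C"],
}


def link_matrix(links):
    """Return (pages, M) with M[i][j] = 1/outbound(j) if page j links to page i, else 0."""
    pages = sorted(links)
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[Fraction(0)] * len(pages) for _ in pages]
    for source, targets in links.items():
        for target in targets:
            # Column = linking page, row = page being linked to.
            matrix[index[target]][index[source]] = Fraction(1, len(targets))
    return pages, matrix


pages, M = link_matrix(links)
for page, row in zip(pages, M):
    print(page, [str(entry) for entry in row])
```

Every column sums to 1, because each page divides its single "vote" evenly over the pages it links to.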

Slide 56

Slide 56 text

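Equation s = Ms
$Ms = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$ (rows and columns of M in the order A, B, C, D)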

Slide 57

Slide 57 text

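Equation s = Ms
Right-hand side: $Ms = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$ (rows and columns of M in the order A, B, C, D)
Left-hand side: $s = \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$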

Slide 58

Slide 58 text

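Equation s = Ms
$\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$ (rows and columns of M in the order A, B, C, D)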

Slide 59

Slide 59 text

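Equation s = Ms
$\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$ (rows and columns of M in the order A, B, C, D)
“The higher A scores, the more ‘points’ D gets for being one of the two pages A links to”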
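
Written out row by row, the matrix equation s = Ms for the four example pages becomes four ordinary equations, one per page:

```latex
\begin{aligned}
s_A &= \tfrac{1}{3} s_B \\
s_B &= \tfrac{1}{2} s_A + s_C \\
s_C &= \tfrac{1}{3} s_B + s_D \\
s_D &= \tfrac{1}{2} s_A + \tfrac{1}{3} s_B
\end{aligned}
```

The last line is the quoted statement in formula form: D receives half of A's score (A links to two pages) plus a third of B's score (B links to three).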

Slide 60

Slide 60 text

Eigenvalue problem
Q: Does there exist a vector s such that s = Ms for the given matrix M?

Slide 61

Slide 61 text

Eigenvalue problem
Q: Does there exist a vector s and a number λ such that λs = Ms for the given matrix M?

Slide 62

Slide 62 text

Eigenvalue problem
Q: Does there exist a vector s and a number λ such that λs = Ms for the given matrix M?
A: Yes, with λ = 1
Fine print: under certain technical conditions. Look up the “Perron–Frobenius theorem” if you’re interested.
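
For the four-page example such a vector can be found by hand: substituting the four equations into each other gives s = (1/8, 3/8, 5/16, 3/16) for pages A to D, once the scores are scaled to sum to 1. This worked solution is not on the slides; the sketch below only verifies it with exact fractions.

```python
from fractions import Fraction as F

# Link matrix of the four-page example (rows and columns in the order A, B, C, D).
M = [
    [F(0),    F(1, 3), F(0), F(0)],
    [F(1, 2), F(0),    F(1), F(0)],
    [F(0),    F(1, 3), F(0), F(1)],
    [F(1, 2), F(1, 3), F(0), F(0)],
]
s = [F(1, 8), F(3, 8), F(5, 16), F(3, 16)]  # candidate scores for A, B, C, D

Ms = [sum(m * x for m, x in zip(row, s)) for row in M]

print(Ms == s)      # True: s is an eigenvector of M with eigenvalue 1
print(sum(s) == 1)  # True: the scores are scaled to sum to 1
```

Any multiple of s solves s = Ms as well, so only the ratios between the scores matter. Here B ranks highest and A lowest.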

Slide 63

Slide 63 text

Calculating PageRank: Beyond mere existence

Slide 64

Slide 64 text

PageRank as simulated surfing
• Someone starts surfing somewhere on the internet
• On every page, they follow a random link
• What page is shown when the buzzer goes?
• Score of a page is the probability of ending up on it

Slide 65

Slide 65 text

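PageRank as simulated surfing
$\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$ (rows and columns of M in the order A, B, C, D)
“When on page A, there’s a ½ chance of going to D”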

Slide 66

Slide 66 text

Power Method
$s^{(n+1)} = M s^{(n)}$
• M: transition probabilities
• s(n): probability per page after n clicks
• s(n+1): probability per page after n+1 clicks

Slide 67

Slide 67 text

Power Method
$s^{(n+1)} = M s^{(n)}$
• M: transition probabilities
• s(n): probability per page after n clicks
• s(n+1): probability per page after n+1 clicks
• s(0): reasonable initial guess

Slide 68

Slide 68 text

The Power Method is proven to converge to the eigenvector (for a matrix M like ours)
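
For the four-page example the Power Method fits in a dozen lines. The sketch below is plain Python rather than the talk's PHP; the uniform starting vector and the fixed count of 100 clicks are choices made for this illustration, and `step` is a made-up helper name.

```python
# Link matrix of the four-page example (rows and columns in the order A, B, C, D).
M = [
    [0.0, 1 / 3, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.0, 1 / 3, 0.0, 1.0],
    [0.5, 1 / 3, 0.0, 0.0],
]


def step(matrix, s):
    """One click of the random surfer: s(n+1) = M s(n)."""
    return [sum(m * x for m, x in zip(row, s)) for row in matrix]


s = [0.25, 0.25, 0.25, 0.25]  # s(0): the surfer starts on a uniformly random page
for _ in range(100):
    s = step(M, s)

for page, score in zip("ABCD", s):
    print(page, round(score, 4))
```

The scores settle near 0.125, 0.375, 0.3125 and 0.1875 for A, B, C and D. Because every column of M sums to 1, the entries of s keep summing to 1 without any extra scaling.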

Slide 69

Slide 69 text

Implementation in PHP
aboks/power-iteration (uses markrogoyski/math-php)

Slide 70

Slide 70 text

Implementation in PHP
[…]getDominantEigenpair($m);
var_dump($pair->getEigenvector());

Slide 71

Slide 71 text

Behind the scenes
[…]getM(), 1);
    $v = new Vector($ones);
    for ($i = 0; $i < 1000; $i++) {
        $v = $m->vectorMultiply($v);
    }
    return $v;
}

Slide 72

Slide 72 text

Implementation concerns
Stopping criterion
• Number of iterations
• Eigenvector tolerance
Scaling intermediate results
Preventing reducible matrices
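
The first two concerns can be sketched as follows. This is one possible way to handle them, not the code of aboks/power-iteration: the function name `power_iteration`, the default tolerance and the choice to scale by the sum of the entries are assumptions of this sketch, and it expects a matrix with non-negative entries whose product with the current vector is never all zeros.

```python
def power_iteration(matrix, tolerance=1e-10, max_iterations=1000):
    """Approximate the dominant eigenvector, scaled so that its entries sum to 1."""
    n = len(matrix)
    s = [1.0 / n] * n
    for iteration in range(1, max_iterations + 1):
        nxt = [sum(m * x for m, x in zip(row, s)) for row in matrix]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # scaling: keeps entries from growing or vanishing
        change = max(abs(a - b) for a, b in zip(nxt, s))
        s = nxt
        if change < tolerance:  # stopping criterion 1: eigenvector tolerance
            return s, iteration
    return s, max_iterations    # stopping criterion 2: number of iterations


M = [
    [0.0, 1 / 3, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.0, 1 / 3, 0.0, 1.0],
    [0.5, 1 / 3, 0.0, 0.0],
]
# Multiplying M by 10 keeps the eigenvector but makes the eigenvalue 10,
# so without scaling every entry would grow tenfold per step.
scaled = [[10 * x for x in row] for row in M]

scores, iterations = power_iteration(scaled)
print([round(x, 4) for x in scores], iterations)
```

Stopping on the change between two successive vectors usually ends the loop long before the iteration cap, while the cap protects against matrices for which the iteration never settles.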

Slide 73

Slide 73 text

Preventing reducible matrices
[Figure: example graph with pages A, B, C, D and E]

Slide 74

Slide 74 text

Damping factor α
Google uses α ≈ 0.85

Slide 75

Slide 75 text

Preventing reducible matrices
When moving on to a new page:
• α probability of following a link
• (1 - α) probability of ‘teleporting’ to a random page

Slide 76

Slide 76 text

Preventing reducible matrices

     A                        B                        C                        D
A    (1/n)(1 - α)             (1/n)(1 - α) + (1/3)α    (1/n)(1 - α)             (1/n)(1 - α)
B    (1/n)(1 - α) + (1/2)α    (1/n)(1 - α)             (1/n)(1 - α) + 1α        (1/n)(1 - α)
C    (1/n)(1 - α)             (1/n)(1 - α) + (1/3)α    (1/n)(1 - α)             (1/n)(1 - α) + 1α
D    (1/n)(1 - α) + (1/2)α    (1/n)(1 - α) + (1/3)α    (1/n)(1 - α)             (1/n)(1 - α)

$M'_{i,j} = \frac{1}{n}(1 - \alpha) + \alpha M_{i,j}$
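
The formula for M' translates directly into code. A minimal sketch, assuming every column of M already sums to 1 (every page has at least one outbound link); the function name `damped` is made up for this example.

```python
def damped(matrix, alpha=0.85):
    """Apply M'[i][j] = (1/n)(1 - alpha) + alpha * M[i][j] to every entry."""
    n = len(matrix)
    teleport = (1.0 - alpha) / n
    return [[teleport + alpha * entry for entry in row] for row in matrix]


# Link matrix of the four-page example (rows and columns in the order A, B, C, D).
M = [
    [0.0, 1 / 3, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.0, 1 / 3, 0.0, 1.0],
    [0.5, 1 / 3, 0.0, 0.0],
]

M_prime = damped(M)
column_sums = [sum(row[j] for row in M_prime) for j in range(len(M_prime))]

print(M_prime[3][0])  # entry (D, A): (1/4)(1 - 0.85) + 0.85 * 1/2
print(column_sums)    # every column still sums to 1, up to rounding
```

Every entry of M' is strictly positive, so the surfer can get from any page to any other page in one step.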

Slide 77

Slide 77 text

Applications of PageRank: Beyond web search

Slide 78

Slide 78 text

PageRank in general
Analysis of networks/directed graphs:
• Edge A → B increases score of B
• The higher the score of A, the higher the score of B

Slide 79

Slide 79 text

Influence in social networks
A → B: A follows B

Slide 80

Slide 80 text

Food chains
A → B: A eats B

Slide 81

Slide 81 text

CodeRank
function A → function B: A calls B

Slide 82

Slide 82 text

CodeRank
class A → class B: A depends on B

Slide 83

Slide 83 text

Reverse CodeRank
class A → class B: B depends on A

Slide 84

Slide 84 text

CodeRank for PHP
pdepend/pdepend
Calculates metrics including CodeRank and Reverse CodeRank

Slide 85

Slide 85 text

Package dependencies
A → B: A depends on B

Slide 86

Slide 86 text

PageRank is not very difficult to implement…
…but applications are countless

Slide 87

Slide 87 text

Back to our story…
$ git stash pop
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        new file:   volleyball-story.txt

Dropped refs/stash@{0} (b180f4)

Slide 88

Slide 88 text

Ranking (incomplete) competitions
A → B: A lost to B

Slide 89

Slide 89 text

Ranking (incomplete) competitions
A lost 2-3 to B
[Figure: the two edges between A and B carry the weights 3 and 2]
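
One way to turn match results into a PageRank matrix is to let every team "link" to each opponent with a weight equal to the number of sets it lost to that opponent, and then normalise each column as for web pages. The slides do not spell out the exact weighting used for the talk's results, so this is an assumption of the sketch. The teams and scores below are made up, and every team is assumed to have lost at least one set.

```python
# Hypothetical results: (team, opponent, sets won by team, sets won by opponent).
matches = [
    ("A", "B", 2, 3),
    ("B", "C", 3, 0),
    ("C", "A", 3, 1),
]

teams = sorted({team for match in matches for team in match[:2]})
index = {team: i for i, team in enumerate(teams)}
n = len(teams)

# weights[i][j]: sets team j lost to team i, i.e. how strongly j "links" to i.
weights = [[0.0] * n for _ in range(n)]
for team, opponent, team_sets, opponent_sets in matches:
    weights[index[team]][index[opponent]] += team_sets
    weights[index[opponent]][index[team]] += opponent_sets

# Normalise every column to sum to 1 (the weighted version of dividing by 'Outbound').
column_totals = [sum(weights[i][j] for i in range(n)) for j in range(n)]
M = [[weights[i][j] / column_totals[j] for j in range(n)] for i in range(n)]

# Damped power iteration, exactly as for web pages.
alpha = 0.85
s = [1.0 / n] * n
for _ in range(200):
    s = [(1 - alpha) / n + alpha * sum(M[i][j] * s[j] for j in range(n)) for i in range(n)]

for team, score in sorted(zip(teams, s), key=lambda pair: -pair[1]):
    print(team, round(100 * score, 2))
```

In this made-up example A ends up on top even though it lost to B, because the only sets B ever dropped were against A. A plain points table cannot express that kind of effect.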

Slide 90

Slide 90 text

The results

#    Team          Pnt
1    DEO 2         30
2    Netwerk 1     24
3    Punch 6       24
4    Delta 5       23
5    Red Stars 1   23
6    Delta 4       17
7    Kalinko 6     12
8    Kratos 7       7
9    Punch 9        6
10   Sovicos 4      4

Slide 91

Slide 91 text

The results

#    Team          Pnt
1    DEO 2         30
2    Netwerk 1     24
3    Punch 6       24
4    Delta 5       23
5    Red Stars 1   23
6    Delta 4       17
7    Kalinko 6     12
8    Kratos 7       7
9    Punch 9        6
10   Sovicos 4      4

#    Team             %
1    Punch 6          25.79
2    DEO 2            16.86
3    Red Stars 1      15.31
4    Delta 5          14.13
5    Delta 4          11.03
6    Netwerk 1         8.79
7    Kalinko 6         3.13
8    Kratos '08 7      2.49
9    Punch 9           1.24
10   Sovicos 4         1.23

Slide 92

Slide 92 text

The end

Slide 93

Slide 93 text

Feedback & Questions
Arnout Boks: @arnoutboks, @aboks
Please leave your feedback on joind.in: https://joind.in/talk/fb7e9
We’re hiring!

Slide 94

Slide 94 text

Image Credits
• https://www.flickr.com/photos/42283496@N08/4011391702/
• https://pixabay.com/illustrations/google-search-engine-browser-search-76517/
• https://unsplash.com/photos/68ZlATaVYIo
• https://imgur.com/gallery/SLHBV
• https://www.flickr.com/photos/125329869@N03/14418114781
• https://unsplash.com/photos/ZiQkhI7417A