PageRank all the things! - Dutch PHP Conference 2019

Joind.in: https://joind.in/talk/fb7e9
Video recording: https://www.youtube.com/watch?v=AeZJnG9lfRs

Most people know PageRank as Google’s algorithm for ranking search results, but its uses extend far beyond that: PageRank has been used to analyse social networks, find the most important functions in source code, predict traffic, and derive a more accurate ranking table of teams in an ongoing sports competition. In this session we will cover the basics of linear algebra, develop an intuitive notion of how matrices and vectors interact, and use it to understand the principles of PageRank. Then we’ll jump straight into real-life applications of PageRank beyond web search and how these can be implemented in PHP using the math-php library.

Arnout Boks

June 08, 2019

Transcript

  1. PageRank all the things!
    @arnoutboks
    Arnout Boks
    #dpc19
    08-06-2019

  2. Story time

  3. Story time
    #   Team          Pnt
    1   DEO 2          30
    2   Netwerk 1      24
    3   Punch 6        24
    4   Delta 5        23
    5   Red Stars 1    23
    6   Delta 4        17
    7   Kalinko 6      12
    8   Kratos 7        7
    9   Punch 9         6
    10  Sovicos 4       4
    Next match: Netwerk 1 – Delta 4

  4. Story time
    #   Team          Pnt
    1   DEO 2          30
    2   Netwerk 1      24
    3   Punch 6        24
    4   Delta 5        23
    5   Red Stars 1    23
    6   Delta 4        17
    7   Kalinko 6      12
    8   Kratos 7        7
    9   Punch 9         6
    10  Sovicos 4       4
    Can we account for the fact that some teams have so far played against
    weaker opposition than others?

  5. To be continued…
    $ git stash
    Saved working directory and index state WIP on master: 5002d47 PageRank all the things!
    HEAD is now at 5002d47 PageRank all the things!

  6. PageRank

  7. Linear Algebra
    Some necessary math

  8. Linear Algebra is the discipline that studies vectors and matrices

  9. Vectors
    $v = \begin{pmatrix} 8 \\ 3 \\ -4 \\ 1 \end{pmatrix}$

  10. Vectors
    $v = \begin{pmatrix} 8 \\ 3 \\ -4 \\ 1 \end{pmatrix}$
    4 entries: dimension 4 (“4-vector”)

  11. Vectors
    v =

    4.2
    2 entries: dimension 2 (“2-vector”)

  12. Vectors as coordinates
    (Figure: vectors drawn in the x-y plane; numeric labels 1, 1, 1, -2, -3, -2)

  13. Scalar multiplication
    $v = \begin{pmatrix} -1.5 \\ 1 \end{pmatrix}$
    $3v = 3 \begin{pmatrix} -1.5 \\ 1 \end{pmatrix} = \begin{pmatrix} -4.5 \\ 3 \end{pmatrix}$

  14. Scalar multiplication
    (Figure: v and 3v drawn in the x-y plane)

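  15. Matrices
    $M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$

  16. Matrices
    $M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$
    4 rows, 3 columns: dimension 4, 3 (“4x3-matrix”)

  17. Matrix-vector multiplication
    $M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}, \quad v = \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$

  18. Matrix-vector multiplication
    $M = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}, \quad v = \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$
    M has 3 columns; v has 3 entries

  19. Matrix-vector multiplication
    $Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$

  20. Matrix-vector multiplication
    $Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix}$
    The entries of v (8, 3, -4) are laid across the columns of M

  21. Matrix-vector multiplication
    Each column of M is multiplied by the matching entry of v (8, 3, -4):
    $\begin{pmatrix} 8 \times 8 & 0 \times 3 & 42 \times (-4) \\ 3 \times 8 & 3 \times 3 & 7 \times (-4) \\ 4 \times 8 & -7 \times 3 & 1 \times (-4) \\ 1 \times 8 & 2 \times 3 & 6 \times (-4) \end{pmatrix}$

  22. Matrix-vector multiplication
    The products:
    $\begin{pmatrix} 64 & 0 & -168 \\ 24 & 9 & -28 \\ 32 & -21 & -4 \\ 8 & 6 & -24 \end{pmatrix}$

  23. Matrix-vector multiplication
    $Mv = \begin{pmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{pmatrix}$

  24. Matrix-vector multiplication
    $Mv = \begin{pmatrix} 64 + 0 + (-168) \\ 24 + 9 + (-28) \\ 32 + (-21) + (-4) \\ 8 + 6 + (-24) \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$

  25. Matrix-vector multiplication
    $Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$

  26. Matrix-vector multiplication
    $Mv = \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} \begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix} = \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$
    “4x3-matrix multiplied by a 3-vector yields a 4-vector”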
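
The row-by-row recipe on the slides above can be sketched in plain PHP. This is a minimal illustration on nested arrays; the helper name `matVec` is made up for this example and is not part of math-php.

```php
<?php
// Hypothetical helper (not math-php's API): multiply a matrix, given as an
// array of rows, by a vector given as a flat array.
function matVec(array $m, array $v): array
{
    $result = [];
    foreach ($m as $row) {
        if (count($row) !== count($v)) {
            throw new InvalidArgumentException('Column count must match vector length');
        }
        // Multiply each entry of the row by the matching entry of v, then add up.
        $sum = 0;
        foreach ($row as $j => $entry) {
            $sum += $entry * $v[$j];
        }
        $result[] = $sum;
    }
    return $result;
}

$M = [[8, 0, 42], [3, 3, 7], [4, -7, 1], [1, 2, 6]];
$v = [8, 3, -4];
$Mv = matVec($M, $v);
print_r($Mv); // the 4-vector (-104, 5, 7, -10) from the slides
```

The length check mirrors the rule on the slides: a 4x3-matrix only accepts a 3-vector, and the result has one entry per row.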

  27. The Matrix
    In more detail

  28. Matrix as a transformation
    3D --[ M, 4x3-matrix ]--> 4D
    $\begin{pmatrix} 8 \\ 3 \\ -4 \end{pmatrix} \mapsto \begin{pmatrix} -104 \\ 5 \\ 7 \\ -10 \end{pmatrix}$

  29. Chaining transformations
    3D --[ M, 4x3-matrix ]--> 4D --[ N, 2x4-matrix ]--> 2D

  30. Chaining transformations
    3D --[ M, 4x3-matrix ]--> 4D --[ N, 2x4-matrix ]--> 2D

  31. Chaining transformations
    3D --[ M, 4x3-matrix ]--> 4D --[ N, 2x4-matrix ]--> 2D
    Is there a 2x3-matrix describing this transformation?

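  32. Matrix multiplication
    $NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$

  33. Matrix multiplication
    $NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$
    First column of M (8, 3, 4, 1) gives the first column of NM: (1, 26)

  34. Matrix multiplication
    $NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$
    Second column of M (0, 3, -7, 2) gives the second column of NM: (16, 1)

  35. Matrix multiplication
    $NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix}$
    Third column of M (42, 7, 1, 6) gives the third column of NM: (46, 127)

  36. Matrix multiplication
    $NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{pmatrix}$

  37. Matrix multiplication
    $NM = \begin{pmatrix} 1 & 0 & -2 & 1 \\ 3 & 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 8 & 0 & 42 \\ 3 & 3 & 7 \\ 4 & -7 & 1 \\ 1 & 2 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 16 & 46 \\ 26 & 1 & 127 \end{pmatrix}$
    “2x4-matrix multiplied by 4x3-matrix yields a 2x3-matrix”

  38. Matrix multiplication as chained transformation
    3D --[ M, 4x3-matrix ]--> 4D --[ N, 2x4-matrix ]--> 2D
    3D --[ NM, 2x3-matrix ]--> 2D
    N(Mv) = (NM)v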
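
As a rough check of the claim N(Mv) = (NM)v, the sketch below multiplies the two matrices from the slides and compares both routes. The helpers `matVec` and `matMul` are hypothetical names on plain nested arrays, not math-php functions.

```php
<?php
// Hypothetical helpers on arrays of rows (not math-php's API).
function matVec(array $m, array $v): array
{
    // One output entry per row: the sum of entry-wise products with v.
    return array_map(
        fn(array $row) => array_sum(array_map(fn($a, $b) => $a * $b, $row, $v)),
        $m
    );
}

function matMul(array $n, array $m): array
{
    $product = [];
    foreach ($n as $i => $row) {
        foreach (array_keys($m[0]) as $j) {
            // Entry (i, j) is row i of N times column j of M.
            $column = array_column($m, $j);
            $product[$i][$j] = array_sum(array_map(fn($a, $b) => $a * $b, $row, $column));
        }
    }
    return $product;
}

$M = [[8, 0, 42], [3, 3, 7], [4, -7, 1], [1, 2, 6]];
$N = [[1, 0, -2, 1], [3, 1, 0, -1]];
$v = [8, 3, -4];

$NM = matMul($N, $M);
print_r($NM);                         // 2x3-matrix
print_r(matVec($N, matVec($M, $v)));  // transform in two steps
print_r(matVec($NM, $v));             // transform in one step
```

Both routes print the same 2-vector, which is the point of slide 38: the product matrix describes the chained transformation.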

  39. Square matrices
    2D --[ M, 2x2-matrix ]--> 2D

  40. Square matrices
    2D --[ M, 2x2-matrix ]--> 2D

  41. Square matrices
    $M = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$
    (Figure: v and Mv drawn in the x-y plane)

  42. Square matrices
    $M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
    (Figure: v and Mv drawn in the x-y plane)

  43. Square matrices
    $M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$
    (Figure: v and Mv drawn in the x-y plane)
    u = Mu
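
The condition u = Mu on the last slide can be worked out by hand for the second matrix. The derivation below is an added illustration; the symbols x, y and a are introduced here and do not appear on the slides.

```latex
M \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}
  = \begin{pmatrix} x \\ -y \end{pmatrix},
\qquad\text{so}\qquad
Mu = u \iff y = -y \iff y = 0,
\quad\text{i.e. } u = \begin{pmatrix} a \\ 0 \end{pmatrix}.
```

Under this reading, M flips the sign of the y-coordinate, and exactly the vectors on the x-axis are left unchanged.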

  44. PageRank
    Ranking web search results

  45. Web pages and links
    (Figure: link graph with pages A and B)

  46. Web pages and links
    (Figure: link graph with pages A and B)

  47. Web pages and links
    (Figure: link graph with pages A, B, C and D)

  48. Web pages and links
    (Figure: link graph with pages A, B, C, D, E, F and G)

  49. The PageRank of a page depends on the PageRank of the pages linking to it

  50. Chicken and egg problem

  51. Approach
    Let n be the number of web pages
    Let s be an n-vector of scores for these pages
    Let M be an n×n-matrix describing the dependencies of scores

  52. Approach
    Let n be the number of web pages
    Let s be an n-vector of scores for these pages
    Let M be an n×n-matrix describing the dependencies of scores
    s = Ms

  53. Creating the matrix M
    (Figure: link graph showing pages A and D)
                A     B     C     D
    A           0     0     0     0
    B           0     0     0     0
    C           0     0     0     0
    D           1     0     0     0
    Outbound    1     0     0     0

  54. Creating the matrix M
    (Figure: link graph with pages A, B, C and D)
                A     B     C     D
    A           0     1     0     0
    B           1     0     1     0
    C           0     1     0     1
    D           1     1     0     0
    Outbound    2     3     1     1

  55. Creating the matrix M
    (Figure: link graph with pages A, B, C and D)
                A     B     C     D
    A           0     1/3   0     0
    B           1/2   0     1     0
    C           0     1/3   0     1
    D           1/2   1/3   0     0
    Outbound    2     3     1     1
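
The three steps above (mark the links, count outbound links per page, divide each column by its count) can be sketched as follows. The link list is read off the table on the slides (columns are the linking page, rows the page linked to); the function name `buildLinkMatrix` is a made-up helper for this example.

```php
<?php
// Link graph from the slides: each page maps to the pages it links to.
$links = [
    'A' => ['B', 'D'],
    'B' => ['A', 'C', 'D'],
    'C' => ['B'],
    'D' => ['C'],
];

// Hypothetical helper: entry [target][source] is 1 / (outbound links of source)
// when source links to target, and 0 otherwise.
function buildLinkMatrix(array $links): array
{
    $pages = array_keys($links);
    $matrix = [];
    foreach ($pages as $target) {
        foreach ($pages as $source) {
            $matrix[$target][$source] = in_array($target, $links[$source], true)
                ? 1 / count($links[$source])
                : 0;
        }
    }
    return $matrix;
}

$M = buildLinkMatrix($links);
print_r($M);
```

Each column of the result sums to 1, because every page splits its score evenly over the pages it links to.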

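  56. Equation s = Ms
    Rows and columns of M are ordered A, B, C, D
    $Ms = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$

  57. Equation s = Ms
    Rows and columns of M are ordered A, B, C, D
    $s = \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}, \quad Ms = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$

  58. Equation s = Ms
    Rows and columns of M are ordered A, B, C, D
    $\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$

  59. Equation s = Ms
    Rows and columns of M are ordered A, B, C, D
    $\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$
    “The higher A scores, the more ‘points’ D gets for being one of the two pages A links to”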
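
Written out row by row, the matrix equation s = Ms for this four-page example becomes four ordinary equations. The expansion, and the solution scaled so that the scores sum to 1, are added here as a worked illustration and are not taken from the slides.

```latex
\begin{aligned}
s_A &= \tfrac{1}{3} s_B \\
s_B &= \tfrac{1}{2} s_A + s_C \\
s_C &= \tfrac{1}{3} s_B + s_D \\
s_D &= \tfrac{1}{2} s_A + \tfrac{1}{3} s_B
\end{aligned}
\qquad\Longrightarrow\qquad
(s_A, s_B, s_C, s_D) = \left(\tfrac{1}{8}, \tfrac{3}{8}, \tfrac{5}{16}, \tfrac{3}{16}\right)
```

The fourth equation is the one quoted on the slide: D receives half of A's score because D is one of the two pages A links to.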

  60. Eigenvalue problem
    Q: Does there exist a vector s such that s = Ms for the given matrix M?

  61. Eigenvalue problem
    Q: Do there exist a vector s and a number λ such that λs = Ms for the given matrix M?

  62. Eigenvalue problem
    Q: Do there exist a vector s and a number λ such that λs = Ms for the given matrix M?
    A: Yes, with λ = 1
    Fine print: under certain technical conditions. Look up the “Perron–Frobenius theorem” if you’re interested.

  63. Calculating PageRank
    Beyond mere existence

  64. PageRank as simulated surfing
    • Someone starts surfing somewhere on the internet
    • On every page, they follow a random link
    • What page is shown when the buzzer goes?
    • Score of a page is the probability of ending up on it

  65. PageRank as simulated surfing
    Rows and columns of M are ordered A, B, C, D
    $\begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix} = \begin{pmatrix} 0 & 1/3 & 0 & 0 \\ 1/2 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 1 \\ 1/2 & 1/3 & 0 & 0 \end{pmatrix} \begin{pmatrix} s_A \\ s_B \\ s_C \\ s_D \end{pmatrix}$
    “When on page A, there’s a ½ chance of going to D”
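
The random-surfer picture can be tried out directly. The sketch below is an illustration only: it walks the four-page graph from the slides for a fixed number of clicks and reports how often each page was visited, using that long-run share as a stand-in for "where the surfer is when the buzzer goes". The seed, the number of steps and the starting page are arbitrary choices.

```php
<?php
// Link graph from the slides: page => pages it links to.
$links = [
    'A' => ['B', 'D'],
    'B' => ['A', 'C', 'D'],
    'C' => ['B'],
    'D' => ['C'],
];

mt_srand(2019);      // fixed seed so that runs are repeatable
$steps = 200000;     // arbitrary number of clicks
$visits = array_fill_keys(array_keys($links), 0);
$page = 'A';         // arbitrary starting page

for ($i = 0; $i < $steps; $i++) {
    $outbound = $links[$page];
    // Follow one of the outbound links, each with equal probability.
    $page = $outbound[mt_rand(0, count($outbound) - 1)];
    $visits[$page]++;
}

// Share of clicks that landed on each page.
$share = array_map(fn(int $count): float => $count / $steps, $visits);
print_r($share);
```

Solving s = Ms by hand for this graph and scaling to sum 1 gives (1/8, 3/8, 5/16, 3/16); the simulated shares should land close to those values, although the exact digits depend on the seed and the number of steps.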

  66. Power Method
    $s^{(n+1)} = M s^{(n)}$
    M: transition probabilities
    $s^{(n)}$: probability per page after n clicks
    $s^{(n+1)}$: probability per page after n+1 clicks

  67. Power Method
    $s^{(n+1)} = M s^{(n)}$
    M: transition probabilities
    $s^{(n)}$: probability per page after n clicks
    $s^{(n+1)}$: probability per page after n+1 clicks
    $s^{(0)}$: reasonable initial guess

  68. Power Method is proven to converge to the eigenvector (for a matrix M like we have)

  69. Implementation in PHP
    aboks/power-iteration
    (uses markrogoyski/math-php)

  70. Implementation in PHP
    use Aboks\PowerIteration\PowerIteration;
    use MathPHP\LinearAlgebra\Matrix;
    $m = new Matrix([/* ... */]);
    $pi = new PowerIteration();
    $pair = $pi->getDominantEigenpair($m);
    var_dump($pair->getEigenvector());

  71. Behind the scenes
    use MathPHP\LinearAlgebra\Matrix;
    use MathPHP\LinearAlgebra\Vector;
    function getEigenvector(Matrix $m): Vector
    {
        // Initial guess: a vector of ones, one entry per row of M.
        $ones = array_fill(0, $m->getM(), 1);
        $v = new Vector($ones);
        // Apply M repeatedly; a fixed iteration count stands in for a real stopping criterion.
        for ($i = 0; $i < 1000; $i++) {
            $v = $m->vectorMultiply($v);
        }
        return $v;
    }

  72. Implementation concerns
    Stopping criterion
    • Number of iterations
    • Eigenvector tolerance
    Scaling intermediate results
    Preventing reducible matrices
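
Two of these concerns, the stopping criterion and the scaling of intermediate results, can be sketched on plain arrays. This is an illustrative version under stated assumptions, not the code of aboks/power-iteration: the function name, the default tolerance and the choice to scale the vector so its entries sum to 1 are all choices made for this example.

```php
<?php
// Hypothetical helper on arrays of rows; not the API of aboks/power-iteration.
function powerIteration(array $m, float $tolerance = 1e-10, int $maxIterations = 1000): array
{
    $n = count($m);
    $s = array_fill(0, $n, 1 / $n); // initial guess: uniform scores
    for ($k = 0; $k < $maxIterations; $k++) {
        // One step of the power method: next = M * s.
        $next = [];
        foreach ($m as $row) {
            $sum = 0.0;
            foreach ($row as $j => $entry) {
                $sum += $entry * $s[$j];
            }
            $next[] = $sum;
        }
        // Scale so the entries sum to 1, which keeps intermediate results bounded.
        $total = array_sum($next);
        $next = array_map(fn(float $x): float => $x / $total, $next);
        // Largest change of any single entry since the previous step.
        $change = max(array_map(fn($a, $b) => abs($a - $b), $next, $s));
        $s = $next;
        if ($change < $tolerance) {
            break; // eigenvector tolerance reached before the iteration limit
        }
    }
    return $s;
}

// The 4-page matrix from the slides (rows and columns ordered A, B, C, D).
$M = [
    [0,     1 / 3, 0, 0],
    [1 / 2, 0,     1, 0],
    [0,     1 / 3, 0, 1],
    [1 / 2, 1 / 3, 0, 0],
];
$scores = powerIteration($M);
print_r($scores);
```

The loop stops on whichever comes first: the iteration limit, or a step in which no score moves by more than the tolerance.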

  73. Preventing reducible matrices
    (Figure: link graph with pages A, B, C, D and E)

  74. Damping factor α
    Google uses α ≈ 0.85

  75. Preventing reducible matrices
    When moving on to a new page:
    • α probability of following a link
    • (1 – α) probability of ‘teleporting’ to a random page

  76. Preventing reducible matrices
    $M'_{i,j} = \frac{1}{n}(1 - \alpha) + \alpha M_{i,j}$
         A                      B                      C                    D
    A    (1/n)(1 – α)           (1/n)(1 – α) + α/3     (1/n)(1 – α)         (1/n)(1 – α)
    B    (1/n)(1 – α) + α/2     (1/n)(1 – α)           (1/n)(1 – α) + α     (1/n)(1 – α)
    C    (1/n)(1 – α)           (1/n)(1 – α) + α/3     (1/n)(1 – α)         (1/n)(1 – α) + α
    D    (1/n)(1 – α) + α/2     (1/n)(1 – α) + α/3     (1/n)(1 – α)         (1/n)(1 – α)
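
The damping formula translates almost literally into code. The sketch below applies it entry by entry to the four-page matrix; `applyDamping` is a hypothetical helper name, and the default of 0.85 is the approximate value the slides attribute to Google.

```php
<?php
// Hypothetical helper: apply M'[i][j] = (1/n)(1 - alpha) + alpha * M[i][j].
function applyDamping(array $m, float $alpha = 0.85): array
{
    $n = count($m);
    $teleport = (1 - $alpha) / $n; // chance of 'teleporting' to any one page
    return array_map(
        fn(array $row): array => array_map(fn($entry) => $teleport + $alpha * $entry, $row),
        $m
    );
}

// The 4-page matrix from the slides (rows and columns ordered A, B, C, D).
$M = [
    [0,     1 / 3, 0, 0],
    [1 / 2, 0,     1, 0],
    [0,     1 / 3, 0, 1],
    [1 / 2, 1 / 3, 0, 0],
];
$damped = applyDamping($M);
print_r($damped);
```

After damping, every entry is strictly positive and each column still sums to 1, so every page can be reached from every other page in one step.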

  77. Applications of PageRank
    Beyond web search

  78. PageRank in general
    Analysis of networks/directed graphs:
    • Edge A → B increases score of B
    • The higher the score of A, the higher the score of B

  79. Influence in social networks
    (Figure: nodes A and B joined by an edge labelled “A follows B”)

  80. Food chains
    (Figure: nodes A and B joined by an edge labelled “A eats B”)

  81. CodeRank
    (Figure: function A and function B joined by an edge labelled “A calls B”)

  82. CodeRank
    (Figure: class A and class B joined by an edge labelled “A depends on B”)

  83. Reverse CodeRank
    (Figure: class A and class B joined by an edge labelled “B depends on A”)

  84. CodeRank for PHP
    pdepend/pdepend
    Calculates metrics including CodeRank and Reverse CodeRank

  85. Package dependencies
    (Figure: nodes A and B joined by an edge labelled “A depends on B”)

  86. PageRank is not very difficult to implement…
    …but applications are countless

  87. Back to our story…
    $ git stash pop
    On branch master
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)
        new file:   volleyball-story.txt
    Dropped refs/stash@{0} (b180f4)

  88. Ranking (incomplete) competitions
    (Figure: teams A and B joined by an edge labelled “A lost to B”)

  89. Ranking (incomplete) competitions
    (Figure: teams A and B joined by edges with weights 3 and 2, labelled “A lost 2-3 to B”)
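
One way to read the weighted picture is sketched below. It assumes, as an interpretation of the slide rather than something the slide states, that each team "links" to its opponent with a weight equal to the sets the opponent won, and that weights are then divided by each team's total outbound weight, just like the web-page matrix. The match data is made up for illustration.

```php
<?php
// Made-up results, for illustration only:
// [team, opponent, sets won by team, sets won by opponent].
$matches = [
    ['A', 'B', 2, 3], // A lost 2-3 to B
    ['B', 'C', 3, 0],
    ['C', 'A', 3, 1],
];
$teams = ['A', 'B', 'C'];

// $weights[target][source]: total weight of the edge source -> target.
$weights = [];
foreach ($teams as $team) {
    $weights[$team] = array_fill_keys($teams, 0);
}
foreach ($matches as [$x, $y, $setsX, $setsY]) {
    $weights[$y][$x] += $setsY; // assumed: edge x -> y weighted by the sets y won
    $weights[$x][$y] += $setsX; // assumed: edge y -> x weighted by the sets x won
}

// Divide every column by its total outbound weight.
$matrix = [];
foreach ($teams as $target) {
    foreach ($teams as $source) {
        $outbound = array_sum(array_column($weights, $source));
        $matrix[$target][$source] = $outbound > 0 ? $weights[$target][$source] / $outbound : 0;
    }
}
print_r($matrix);
```

The resulting matrix has the same shape as the link matrix for web pages, so it could be fed to a power-iteration routine to obtain a ranking.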

  90. The results
    #   Team          Pnt
    1   DEO 2          30
    2   Netwerk 1      24
    3   Punch 6        24
    4   Delta 5        23
    5   Red Stars 1    23
    6   Delta 4        17
    7   Kalinko 6      12
    8   Kratos 7        7
    9   Punch 9         6
    10  Sovicos 4       4

  91. The results
    #   Team          Pnt
    1   DEO 2          30
    2   Netwerk 1      24
    3   Punch 6        24
    4   Delta 5        23
    5   Red Stars 1    23
    6   Delta 4        17
    7   Kalinko 6      12
    8   Kratos 7        7
    9   Punch 9         6
    10  Sovicos 4       4

    #   Team              %
    1   Punch 6       25.79
    2   DEO 2         16.86
    3   Red Stars 1   15.31
    4   Delta 5       14.13
    5   Delta 4       11.03
    6   Netwerk 1      8.79
    7   Kalinko 6      3.13
    8   Kratos '08 7   2.49
    9   Punch 9        1.24
    10  Sovicos 4      1.23

  92. The end

  93. Feedback & Questions
    @arnoutboks
    @arnoutboks
    @aboks
    Arnout Boks
    Please leave your feedback on joind.in:
    https://joind.in/talk/fb7e9
    We’re hiring!

  94. Image Credits
    • https://www.flickr.com/photos/42283496@N08/4011391702/
    • https://pixabay.com/illustrations/google-search-engine-browser-search-76517/
    • https://unsplash.com/photos/68ZlATaVYIo
    • https://imgur.com/gallery/SLHBV
    • https://www.flickr.com/photos/125329869@N03/14418114781
    • https://unsplash.com/photos/ZiQkhI7417A