Google PageRank

Beat Signer
December 09, 2008

An explanation of Google's PageRank algorithm with some examples, followed by a discussion of related search engine optimisation (SEO) issues.

Transcript

  1. Vrije Universiteit Brussel, August 25, 2008 Google PageRank Beat Signer

    Global Information Systems Research Group Department of Computer Science ETH Zurich
  2. Overview

    ▪ History of PageRank
    ▪ PageRank algorithm
    ▪ Examples
    ▪ Implications for website development
  3. History of PageRank

    ▪ Developed as part of an academic project at Stanford University
      ▪ research platform to aid understanding of large-scale web data and enable researchers to easily experiment with new search technologies
    ▪ Larry Page and Sergey Brin worked on the project about a new kind of search engine (1995-1998), which finally led to a functional prototype called Google

    [Photos: Larry Page, Sergey Brin]
  4. Web Search Until 1998

    ▪ Find all documents using a query term
      ▪ use information retrieval (IR) solutions
      ▪ ranking based on "on the page factors"
      → problem: poor quality of search results (order)
    ▪ Page and Brin proposed to compute the absolute quality of a page (PageRank)
      ▪ based on the number and quality of pages linking to a page (votes)
  5. PageRank

    ▪ A page has a high PageRank R if
      ▪ there are many pages linking to it
      ▪ or, if there are some pages with a high PageRank linking to it
    ▪ Total score = IR score × PageRank

    [Figure: pages P1 to P8 with PageRanks R1 to R8]
  6. PageRank Algorithm

    $$R(P_i) = \sum_{P_j \in B_i} \frac{R(P_j)}{L_j}$$

    ▪ where
      ▪ $B_i$ is the set of pages that link to page $P_i$
      ▪ $L_j$ is the number of outgoing links for page $P_j$

    [Figure: pages P1, P2 and P3, each starting with rank 1; updated ranks P1 = 1.5, P2 = 1.5, P3 = 0.75]
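The update rule on this slide can be tried out with a few lines of Python. This is only a sketch: the link graph (P1 links to P2, P2 links to P1 and P3, P3 links to P1) is an assumption inferred from the example values, and the names `out_links` and `sweep` are made up for illustration. The pages are updated one after the other, so later pages already see the new rank of earlier ones.

```python
# Assumed link graph, inferred from the example values (not stated in the slides):
# P1 -> P2, P2 -> P1 and P3, P3 -> P1.
out_links = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P1"]}


def sweep(rank, out_links):
    """Apply R(P_i) = sum over P_j in B_i of R(P_j) / L_j to each page in turn.

    The dictionary is updated in place, so a page updated later in the
    sweep uses the already updated ranks of the pages before it.
    """
    for page in rank:
        rank[page] = sum(
            rank[src] / len(targets)          # R(P_j) / L_j
            for src, targets in out_links.items()
            if page in targets                # P_j is in B_i
        )
    return rank


rank = sweep({"P1": 1.0, "P2": 1.0, "P3": 1.0}, out_links)
print(rank)  # {'P1': 1.5, 'P2': 1.5, 'P3': 0.75}
```

Under these assumptions a single sweep gives the ranks 1.5, 1.5 and 0.75 that appear in the figure.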
  7. Matrix Representation

    ▪ Let us define a hyperlink matrix H

    $$H_{ij} = \begin{cases} \dfrac{1}{L_j} & \text{if } P_j \in B_i \\ 0 & \text{otherwise} \end{cases}$$

    $$H = \begin{pmatrix} 0 & \tfrac{1}{2} & 1 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{pmatrix}$$

    $R = [R(P_i)]$ and $R = HR$ → R is an eigenvector of H with eigenvalue 1

    [Figure: pages P1, P2 and P3]
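As a rough illustration of the definition above, the following sketch builds H from a dictionary of outgoing links and checks that a candidate vector is left unchanged by H. The link graph (P1 to P2, P2 to P1 and P3, P3 to P1) is an assumption reconstructed from the example matrix, and `hyperlink_matrix` and `mat_vec` are hypothetical helper names.

```python
def hyperlink_matrix(out_links, pages):
    """Build H with H[i][j] = 1 / L_j if page j links to page i, else 0."""
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    H = [[0.0] * n for _ in range(n)]
    for src, targets in out_links.items():
        for dst in targets:
            H[index[dst]][index[src]] = 1.0 / len(targets)
    return H


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


pages = ["P1", "P2", "P3"]
# Assumed link graph: P1 -> P2, P2 -> P1 and P3, P3 -> P1.
out_links = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P1"]}

H = hyperlink_matrix(out_links, pages)
R = [2.0, 2.0, 1.0]
print(mat_vec(H, R) == R)  # True: R is an eigenvector with eigenvalue 1
```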
  8. Matrix Representation ...

    ▪ We can use the power method to find R

    $$R_{t+1} = H R_t$$

    For our example, with

    $$H = \begin{pmatrix} 0 & \tfrac{1}{2} & 1 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{pmatrix}$$

    this results in

    $$R = \begin{pmatrix} 2 & 2 & 1 \end{pmatrix}^{T} \quad \text{or} \quad \begin{pmatrix} 0.4 & 0.4 & 0.2 \end{pmatrix}^{T}$$
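A minimal sketch of the power method for the three-page example follows. The matrix entries are the reconstructed example matrix, and the uniform start vector and the fixed number of 100 iterations are arbitrary choices for illustration, not something the slides prescribe.

```python
def power_method(H, iterations=100):
    """Repeatedly apply R_{t+1} = H R_t, starting from a uniform vector."""
    n = len(H)
    R = [1.0 / n] * n
    for _ in range(iterations):
        R = [sum(h * r for h, r in zip(row, R)) for row in H]
    return R


# Example hyperlink matrix (columns sum to 1, so the total rank is preserved).
H = [
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
]

R = power_method(H)
print([round(r, 3) for r in R])  # [0.4, 0.4, 0.2]
```

Because the start vector sums to 1 and every column of H sums to 1, the result is the normalised form of the eigenvector, that is 0.4, 0.4, 0.2 rather than 2, 2, 1.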
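  9. Dangling Pages

    ▪ Problem with pages that have no outbound links (P2)

    $$H = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

    $$C = \begin{pmatrix} 0 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{pmatrix} \quad \text{and} \quad S = H + C = \begin{pmatrix} 0 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} \end{pmatrix}$$

    [Figure: pages P1 and P2, with P1 linking to P2]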
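The correction for dangling pages can be sketched as follows: every all-zero column of H (a page without outgoing links) is replaced by a column of 1/n entries, which corresponds to adding the matrix C. The function name `fix_dangling` is hypothetical, and the two-page matrix is the example from this slide.

```python
def fix_dangling(H):
    """Return S = H + C, where C puts 1/n into every all-zero column of H."""
    n = len(H)
    S = [row[:] for row in H]  # copy, so that H itself is left untouched
    for j in range(n):
        if all(H[i][j] == 0 for i in range(n)):  # page j has no outgoing links
            for i in range(n):
                S[i][j] = 1.0 / n
    return S


# P1 links to P2, P2 is a dangling page (second column is all zero).
H = [
    [0.0, 0.0],
    [1.0, 0.0],
]

S = fix_dangling(H)
print(S)  # [[0.0, 0.5], [1.0, 0.5]]
```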
  10. Strongly Connected Pages (Graph)

    ▪ Add new transition probabilities between all pages
      ▪ with probability d we follow the hyperlink structure S
      ▪ with probability 1-d we choose a random page

    $$G = d\,S + (1-d)\,\frac{1}{n}\,\mathbf{1}$$

    $$R = GR$$

    where n is the number of pages and $\mathbf{1}$ is the n × n matrix of ones

    [Figure: pages P1 to P5 with additional transitions labelled 1-d]
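Putting the pieces together, a small sketch of the Google matrix and the resulting PageRank might look like this. The damping value d = 0.85 is an assumption (a commonly quoted value, not taken from the slides), the function names are made up, and S is the two-page matrix from the dangling pages example.

```python
def google_matrix(S, d=0.85):
    """Return G = d * S + (1 - d) * (1/n) * ones, entry by entry."""
    n = len(S)
    return [[d * S[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


def pagerank(G, iterations=100):
    """Power method on G, starting from a uniform rank vector."""
    n = len(G)
    R = [1.0 / n] * n
    for _ in range(iterations):
        R = [sum(g * r for g, r in zip(row, R)) for row in G]
    return R


# S from the dangling pages example: P1 -> P2, P2 patched to link everywhere.
S = [
    [0.0, 0.5],
    [1.0, 0.5],
]

G = google_matrix(S, d=0.85)  # d = 0.85 is an assumed damping value
R = pagerank(G)
print([round(r, 3) for r in R])  # [0.351, 0.649]
```

Since all entries of G are positive, the random surfer can reach every page from every other page, which is what makes the power method settle on a single rank vector regardless of the start vector.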
  11. Examples

    $$G = d\,S + (1-d)\,\frac{1}{n}\,\mathbf{1}$$

    [Figure: link graph with PageRank values A1 = 0.26, A2 = 0.37, A3 = 0.37]
  12. Examples ...

    [Figure: link graph with PageRank values A1 = 0.13, A2 = 0.185, A3 = 0.185, B1 = 0.13, B2 = 0.185, B3 = 0.185; P(A) = 0.5, P(B) = 0.5]
  13. Examples ...

    [Figure: link graph with PageRank values A1 = 0.10, A2 = 0.14, A3 = 0.14, B1 = 0.22, B2 = 0.20, B3 = 0.20; P(A) = 0.38, P(B) = 0.62]
  14. Examples ...

    [Figure: link graph with PageRank values A1 = 0.3, A2 = 0.23, A3 = 0.18, B1 = 0.10, B2 = 0.095, B3 = 0.095; P(A) = 0.71, P(B) = 0.29]
  15. Examples ...

    [Figure: link graph with PageRank values A1 = 0.35, A2 = 0.24, A3 = 0.18, B1 = 0.09, B2 = 0.07, B3 = 0.07; P(A) = 0.77, P(B) = 0.23]
  16. Examples ...

    [Figure: link graph with PageRank values A1 = 0.33, A2 = 0.17, A3 = 0.175, A4 = 0.125, B1 = 0.08, B2 = 0.06, B3 = 0.06; P(A) = 0.86, P(B) = 0.14]
  17. Implications for Website Development

    ▪ First make sure that your page gets indexed
      ▪ "on the page factors"
    ▪ Think about your site's internal link structure
      ▪ create many internal links for important pages
      ▪ be "careful" about where to put outgoing links
    ▪ Increase the number of pages
    ▪ Ensure that webpages are addressed consistently
      ▪ http://www.vub.ac.be ≠ http://www.vub.ac.be/index.php
    ▪ Make sure that you get links from good websites
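One way to address pages consistently is to normalise every internal link to a single canonical form before it is published. The sketch below is only an illustration of that idea: the set of index file names and the function name `canonical_url` are assumptions, and a real site may need different rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed names of directory index files; adjust to the actual site setup.
INDEX_NAMES = {"index.php", "index.html", "index.htm"}


def canonical_url(url):
    """Map different spellings of the same page to one canonical URL."""
    parts = urlsplit(url)
    path = parts.path
    head, _, last = path.rpartition("/")
    if last in INDEX_NAMES:
        path = head + "/"   # drop the index file name, keep the directory
    if path == "":
        path = "/"          # an empty path and "/" address the same page
    # Scheme and host are case-insensitive; the fragment is dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


print(canonical_url("http://www.vub.ac.be"))            # http://www.vub.ac.be/
print(canonical_url("http://www.vub.ac.be/index.php"))  # http://www.vub.ac.be/
```

With both spellings mapped to the same address, incoming links are counted for one page instead of being split across two.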
  18. Other Search Engine Optimisations (SEO)

    ▪ Internet marketing has become big business
      ▪ white hat and black hat optimisations
    ▪ Black hat optimisations
      ▪ link farms
      ▪ spamdexing in guestbooks etc.
        <a rel="nofollow" href="…">…</a>
      ▪ selling/buying links
      ▪ …
    ▪ Is PageRank fair?
  19. Conclusions

    ▪ PageRank algorithm
      ▪ absolute quality of a page based on incoming links
      ▪ random surfer model
      ▪ computed as eigenvector of Google matrix G
    ▪ Implications for website development and SEO
      ▪ PageRank is just one (important) factor
  20. References

    ▪ The PageRank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani and T. Winograd, January 1998
    ▪ The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin and L. Page, Computer Networks and ISDN Systems, 30(1-7), April 1998
  21. References ...

    ▪ PageRank Uncovered, C. Ridings and M. Shishigin, September 2002
    ▪ PageRank Calculator, http://www.webworkshop.net/pagerank_calculator.php