Slide 1

Slide 1 text

Google PageRank
Beat Signer
Global Information Systems Research Group, Department of Computer Science, ETH Zurich
Vrije Universiteit Brussel, August 25, 2008

Slide 2

Slide 2 text

Overview
▪ History of PageRank
▪ PageRank algorithm
▪ Examples
▪ Implications for website development

Slide 3

Slide 3 text

History of PageRank
▪ Developed as part of an academic project at Stanford University
  ▪ research platform to aid understanding of large-scale web data and enable researchers to easily experiment with new search technologies
▪ Larry Page and Sergey Brin worked on the project about a new kind of search engine (1995-1998), which finally led to a functional prototype called Google
[Photos: Larry Page, Sergey Brin]

Slide 4

Slide 4 text

Web Search Until 1998
▪ Find all documents using a query term
  ▪ use information retrieval (IR) solutions
  ▪ ranking based on "on the page factors"
  → problem: poor quality of search results (order)
▪ Page and Brin proposed to compute the absolute quality of a page (PageRank)
  ▪ based on the number and quality of pages linking to a page (votes)

Slide 5

Slide 5 text

PageRank
▪ A page has a high PageRank R if
  ▪ there are many pages linking to it
  ▪ or, if there are some pages with a high PageRank linking to it
▪ Total score = IR score × PageRank
[Figure: link graph of pages P1 to P8 with PageRank values R1 to R8]

Slide 6

Slide 6 text

PageRank Algorithm

$$R(P_i) = \sum_{P_j \in B_i} \frac{R(P_j)}{L_j}$$

▪ where
  ▪ $B_i$ is the set of pages that link to page $P_i$
  ▪ $L_j$ is the number of outgoing links for page $P_j$

[Figure: example graph with pages P1, P2, P3; the values start at P1 = 1, P2 = 1, P3 = 1, become P1 = 1.5, P2 = 1.5, P3 = 0.75, and stay at P1 = 1.5, P2 = 1.5, P3 = 0.75 in the next step]
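The sum formula can be tried out directly on a small graph. The following is a minimal sketch, not the original implementation: the link structure (P1 → P2, P2 → P1 and P3, P3 → P1) is an assumption chosen to be consistent with the hyperlink matrix used later in the deck, and the names `links` and `pagerank_step` are hypothetical.

```python
# Assumed link structure for the three-page example:
# P1 links to P2, P2 links to P1 and P3, P3 links to P1.
links = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P1"]}


def pagerank_step(ranks, links):
    """Apply R(P_i) = sum over P_j in B_i of R(P_j) / L_j once for every page."""
    new_ranks = {page: 0.0 for page in links}
    for source, targets in links.items():
        share = ranks[source] / len(targets)  # R(P_j) / L_j
        for target in targets:
            new_ranks[target] += share  # P_j is in B_i for each of its targets
    return new_ranks


# Start with rank 1 on every page and repeat the update until it settles.
ranks = {page: 1.0 for page in links}
for _ in range(100):
    ranks = pagerank_step(ranks, links)

print(ranks)
```

This sketch updates all pages at once, so the total rank stays at 3 and the values settle in the ratio 2 : 2 : 1, the same ratio as the 1.5, 1.5, 0.75 shown in the figure.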

Slide 7

Slide 7 text

Matrix Representation
▪ Let us define a hyperlink matrix H

$$H_{ij} = \begin{cases} 1/L_j & \text{if } P_j \in B_i \\ 0 & \text{otherwise} \end{cases}$$

[Figure: example graph with pages P1, P2, P3]

$$H = \begin{pmatrix} 0 & \tfrac{1}{2} & 1 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{pmatrix}$$

$R = [R(P_i)]$ and $R = HR$ → R is an eigenvector of H with eigenvalue 1
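As a rough illustration of the definition of H, the sketch below builds the matrix from a link list and checks the eigenvector property exactly with fractions. The link structure (0-based indices for P1, P2, P3) is an assumption read off the matrix above, and `hyperlink_matrix` and `mat_vec` are hypothetical helper names.

```python
from fractions import Fraction

# Assumed links, 0-based: P1 -> P2, P2 -> P1 and P3, P3 -> P1.
links = {0: [1], 1: [0, 2], 2: [0]}


def hyperlink_matrix(links, n):
    """H[i][j] = 1 / L_j if page j links to page i, else 0."""
    H = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            H[i][j] = Fraction(1, len(targets))
    return H


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


H = hyperlink_matrix(links, 3)
R = [Fraction(2), Fraction(2), Fraction(1)]
HR = mat_vec(H, R)

print(HR == R)  # R is unchanged by H, i.e. an eigenvector with eigenvalue 1
```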

Slide 8

Slide 8 text

Matrix Representation ...
▪ We can use the power method to find R

$$R_{t+1} = H R_t$$

$$H = \begin{pmatrix} 0 & \tfrac{1}{2} & 1 \\ 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{pmatrix}$$

For our example this results in

$$R = \begin{pmatrix} 2 & 2 & 1 \end{pmatrix}^T \quad\text{or}\quad \begin{pmatrix} 0.4 & 0.4 & 0.2 \end{pmatrix}^T$$
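The power method can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions: the uniform start vector, the fixed iteration count and the rescaling to sum 1 are choices made here, and `power_method` is a hypothetical name.

```python
# Hyperlink matrix of the three-page example (rows = target, columns = source).
H = [[0.0, 0.5, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.5, 0.0]]


def power_method(M, iterations=100):
    """Repeat R_{t+1} = M R_t, rescaling so that the entries sum to 1."""
    n = len(M)
    r = [1.0 / n] * n  # assumed start: uniform distribution
    for _ in range(iterations):
        r = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
        total = sum(r)
        r = [x / total for x in r]
    return r


R = power_method(H)
print([round(x, 3) for x in R])
```

For this matrix the iteration settles at values proportional to (2, 2, 1), which is (0.4, 0.4, 0.2) when scaled to sum to 1.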

Slide 9

Slide 9 text

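Dangling Pages
▪ Problem with pages that have no outbound links (P2)

[Figure: two-page graph with P1 and P2]

$$H = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad R = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

$$C = \begin{pmatrix} 0 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{pmatrix} \quad\text{and}\quad S = H + C = \begin{pmatrix} 0 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} \end{pmatrix}$$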
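The effect of a dangling page, and of the correction S = H + C, can be reproduced with a short sketch. It assumes that every all-zero column of H is replaced by a uniform column 1/n, as in the two-page example; `fix_dangling` and `mat_vec` are hypothetical names.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def fix_dangling(H):
    """Return S = H + C: all-zero columns (dangling pages) become uniform 1/n."""
    n = len(H)
    S = [row[:] for row in H]
    for j in range(n):
        if all(H[i][j] == 0 for i in range(n)):
            for i in range(n):
                S[i][j] = 1.0 / n
    return S


H = [[0.0, 0.0],
     [1.0, 0.0]]  # P1 links to P2, P2 has no outgoing links

# With H alone the rank drains away through the dangling page.
drained = [0.5, 0.5]
for _ in range(2):
    drained = mat_vec(H, drained)

# With S the total rank is preserved and the iteration settles.
S = fix_dangling(H)
r = [0.5, 0.5]
for _ in range(100):
    r = mat_vec(S, r)

print(drained, S, r)
```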

Slide 10

Slide 10 text

Strongly Connected Pages (Graph)
▪ Add new transition probabilities between all pages
  ▪ with probability d we follow the hyperlink structure S
  ▪ with probability 1-d we choose a random page

[Figure: graph with pages P1 to P5; the added transitions are labelled 1-d]

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

$$R = GR$$

Here $\mathbf{1}$ denotes the $n \times n$ matrix of ones and $n$ is the number of pages.
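A minimal sketch of the Google matrix and the resulting PageRank vector follows. The damping value d = 0.85 is an assumption (the slide does not fix d), S is the two-page matrix from the dangling-pages example, and `google_matrix` and `stationary_vector` are hypothetical names.

```python
def google_matrix(S, d):
    """G = d * S + (1 - d) * (1/n) * ones, computed entry by entry."""
    n = len(S)
    return [[d * S[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


def stationary_vector(G, iterations=200):
    """Power method for R = G R, starting from the uniform distribution."""
    n = len(G)
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = [sum(G[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r


S = [[0.0, 0.5],
     [1.0, 0.5]]   # S = H + C from the two-page example
d = 0.85           # assumed damping factor, not given on the slide

G = google_matrix(S, d)
R = stationary_vector(G)
print(G, R)
```

Every column of G sums to 1 and every entry is positive, so the surfer can reach any page from any other page and the iteration settles on a single vector R.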

Slide 11

Slide 11 text

Examples

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

[Figure: example graph with PageRank values A1 = 0.26, A2 = 0.37, A3 = 0.37]

Slide 12

Slide 12 text

Examples ...

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

[Figure: example graph with PageRank values A1 = 0.13, A2 = 0.185, A3 = 0.185, B1 = 0.13, B2 = 0.185, B3 = 0.185; P(A) = 0.5, P(B) = 0.5]

Slide 13

Slide 13 text

Examples ...

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

[Figure: example graph with PageRank values A1 = 0.10, A2 = 0.14, A3 = 0.14, B1 = 0.22, B2 = 0.20, B3 = 0.20; P(A) = 0.38, P(B) = 0.62]
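The totals P(A) and P(B) on this slide appear to be the sums of the individual page values in each group. That reading is an assumption, but it matches the numbers:

```latex
\begin{align*}
P(A) &= R(A_1) + R(A_2) + R(A_3) = 0.10 + 0.14 + 0.14 = 0.38 \\
P(B) &= R(B_1) + R(B_2) + R(B_3) = 0.22 + 0.20 + 0.20 = 0.62 \\
P(A) + P(B) &= 1
\end{align*}
```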

Slide 14

Slide 14 text

Examples ...

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

[Figure: example graph with PageRank values A1 = 0.3, A2 = 0.23, A3 = 0.18, B1 = 0.10, B2 = 0.095, B3 = 0.095; P(A) = 0.71, P(B) = 0.29]

Slide 15

Slide 15 text

Examples ...

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

[Figure: example graph with PageRank values A1 = 0.35, A2 = 0.24, A3 = 0.18, B1 = 0.09, B2 = 0.07, B3 = 0.07; P(A) = 0.77, P(B) = 0.23]

Slide 16

Slide 16 text

Examples ...

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

[Figure: example graph with PageRank values A1 = 0.33, A2 = 0.17, A3 = 0.175, A4 = 0.125, B1 = 0.08, B2 = 0.06, B3 = 0.06; P(A) = 0.86, P(B) = 0.14]

Slide 17

Slide 17 text

Implications for Website Development
▪ First make sure that your page gets indexed
  ▪ "on the page factors"
▪ Think about your site's internal link structure
  ▪ create many internal links for important pages
  ▪ be "careful" about where to put outgoing links
▪ Increase the number of pages
▪ Ensure that webpages are addressed consistently
  ▪ http://www.vub.ac.be ≠ http://www.vub.ac.be/index.php
▪ Make sure that you get links from good websites

Slide 18

Slide 18 text

Consistent Addressing of Webpages
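One way to keep addresses consistent is to map every variant of a URL to a single canonical form before linking to it. The sketch below is only an illustration under stated assumptions: the list of default documents in `INDEX_FILES` and the function name `canonical_url` are hypothetical, and a real site would more likely handle this with server-side redirects.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed names of default documents that a server returns for a directory URL.
INDEX_FILES = ("index.php", "index.html")


def canonical_url(url):
    """Lower-case scheme and host, use "/" for an empty path, drop index files."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    # The fragment is dropped because it never identifies a separate page.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


print(canonical_url("http://www.vub.ac.be"))
print(canonical_url("http://www.vub.ac.be/index.php"))
```

Both calls yield the same address, so incoming links count towards one page instead of being split across two.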

Slide 19

Slide 19 text

Other Search Engine Optimisations (SEO)
▪ Internet marketing has become big business
  ▪ white hat and black hat optimisations
▪ Black hat optimisations
  ▪ link farms
  ▪ spamdexing in guestbooks etc.
  ▪ selling/buying links
  ▪ …
▪ Is PageRank fair?

Slide 20

Slide 20 text

Conclusions
▪ PageRank algorithm
  ▪ absolute quality of a page based on incoming links
  ▪ random surfer model
  ▪ computed as eigenvector of Google matrix G
▪ Implications for website development and SEO
  ▪ PageRank is just one (important) factor

Slide 21

Slide 21 text

References
▪ The PageRank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani and T. Winograd, January 1998
▪ The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin and L. Page, Computer Networks and ISDN Systems, 30(1-7), April 1998

Slide 22

Slide 22 text

References …
▪ PageRank Uncovered, C. Ridings and M. Shishigin, September 2002
▪ PageRank Calculator, http://www.webworkshop.net/pagerank_calculator.php