COMP551 special topic lecture
We often want to find which nodes in a graph are important.
• Who is the most influential person? (social networks)
• Which is the best paper? (paper citation networks)
• Which is the most popular webpage? (the Web)
Q. How can we compute the importance of nodes in a graph?
A. Link analysis can help us!
Based on graph structure, PageRank evaluates and ranks webpages.
Web graph (hyperlink structure) → Importance ranking:
1. node B 0.40pt
2. node D 0.26pt
3. node A 0.20pt
4. node C 0.11pt
5. node E 0.03pt
voting) If a webpage is linked to by many webpages, it may be important.
In-link counts in the example graph (nodes A to E): B has 3, C has 2, D has 2.
Is this assumption really good?
Consider whether a webpage is linked to by important pages or by unimportant pages.
In-link counts: B has 3; C and D have 2 each.
• D is linked by B, whose in-link count is 3.
• C is linked by E, whose in-link count is 0.
Which is more important, page C or D?
Assumption: if a webpage is linked to by many IMPORTANT pages, the page can be important.
B is more important than E. C and D have 2 in-links each, but D is more important than C because D is linked by a more important node (B) than C is (E).
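The difference between plain voting and importance-weighted voting can be sketched in a few lines. The edge list below is hypothetical: the slides do not give the full link structure, so it was chosen only to match the stated in-link counts (B: 3, C: 2, D: 2, E: 0) and the facts that B links to D and E links to C. The "weighted" score is just one refinement step, not PageRank itself.

```python
from collections import defaultdict

# Hypothetical edge list (source -> target), chosen only so that the in-link
# counts match the example: B has 3, C has 2, D has 2, E has 0.
edges = [
    ("A", "B"), ("C", "B"), ("D", "B"),  # three pages link to B
    ("B", "D"), ("A", "D"),              # D is linked by B (important)
    ("E", "C"), ("A", "C"),              # C is linked by E (unimportant)
    ("B", "A"),
]
nodes = ["A", "B", "C", "D", "E"]

in_links = defaultdict(list)
for src, dst in edges:
    in_links[dst].append(src)

# Plain voting: the score of a page is its number of in-links.
in_degree = {n: len(in_links[n]) for n in nodes}

# One refinement step: each vote is weighted by the voter's own in-link count.
weighted = {n: sum(in_degree[s] for s in in_links[n]) for n in nodes}

print("in-degree:", in_degree)
print("weighted :", weighted)
```

With plain voting C and D tie, while the weighted score separates them in favor of D.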
People are more likely to stay on more important pages.
1. When people are browsing a page, we assume that they randomly select one of its links to decide the next page.
2. They are likely to arrive at a page through links on more important pages.
3. How can we calculate the likelihood of staying on each page?
Random surfer
Q. Suppose that a random surfer is now at A. On each page, he randomly selects one of its links to decide which page to visit next. On which page is he most likely to be after moving repeatedly?
Initial probabilities: A = 1, B = 0, C = 0, D = 0.
A surfer randomly selects a link to move. A has three out-links (to B, C, and D), so each transition probability from A is 1/3.
Q. How large is the chance that he will be on page B after his first transition?
After the first transition: A = 0, B = 1 × (1/3) = 1/3, C = 1 × (1/3) = 1/3, D = 1 × (1/3) = 1/3.
Q. How likely is he to be on page B after his SECOND transition?
Transition probabilities: from A, 1/3 on each of its three links; from B, 1/2 on each of its two links; from C, 1 on its single link; from D, 1/2 on each of its two links (D has no link to A).
• The probability of A→B→A is $\frac{1}{3} \times \frac{1}{2}$.
• The probability of A→C→A is $\frac{1}{3} \times 1$.
• The probability of A→?→A is $\frac{1}{3} \times \frac{1}{2} + \frac{1}{3} \times 1 = \frac{1}{2}$.
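The A→?→A calculation is a sum over every possible intermediate page. Below is a minimal sketch using exact fractions. The links B→A and C→A come from the slides; the remaining targets of B and D (B→D, D→B, D→C) are an assumption, inferred from the probability table rather than stated explicitly.

```python
from fractions import Fraction as F

# Out-link probabilities of each page. B->A (1/2) and C->A (1) are given on
# the slides; the other targets of B and D are assumed.
P = {
    "A": {"B": F(1, 3), "C": F(1, 3), "D": F(1, 3)},
    "B": {"A": F(1, 2), "D": F(1, 2)},
    "C": {"A": F(1)},
    "D": {"B": F(1, 2), "C": F(1, 2)},
}

def two_step(start, end):
    """Probability of going start -> (any page) -> end in exactly two moves."""
    return sum(p1 * P[mid].get(end, F(0)) for mid, p1 in P[start].items())

print(two_step("A", "A"))  # 1/2
```

The same function answers the earlier question about page B after the second transition: under the assumed links, `two_step("A", "B")` evaluates to 1/6.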
Probability change in each iteration:

| Node \ Iter. | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| A | 1 | 0 | 0.5 | 0.25 | 0.375 | 0.313 |
| B | 0 | 0.333 | 0.167 | 0.25 | 0.208 | 0.229 |
| C | 0 | 0.333 | 0.167 | 0.25 | 0.208 | 0.229 |
| D | 0 | 0.333 | 0.167 | 0.25 | 0.208 | 0.229 |
Probability change in each iteration:

| Node \ Iter. | 0 | 5 | 10 | 20 | … | 1000 |
|---|---|---|---|---|---|---|
| A | 1 | 0.313 | 0.334 | 0.333 | … | 0.333 |
| B | 0 | 0.229 | 0.222 | 0.222 | … | 0.222 |
| C | 0 | 0.229 | 0.222 | 0.222 | … | 0.222 |
| D | 0 | 0.229 | 0.222 | 0.222 | … | 0.222 |

When the transition is repeated, each probability converges. The converged probabilities represent how likely people are to visit each page (i.e., PageRank).
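Probability of users being on each node (node order A, B, C, D; initially the surfer is at A):
$$\boldsymbol{r}_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
Transition probability from node to node (column $j$ holds the out-link probabilities of node $j$, so each column sums to 1):
$$\boldsymbol{M} = \begin{pmatrix} 0 & 1/2 & 1 & 0 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1/2 & 0 & 0 \end{pmatrix}$$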
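Repeating the transition amounts to multiplying the probability vector by the transition matrix again and again. The sketch below does this in plain Python. It assumes the link structure from the walk-through (A→B, C, D; B→A, D; C→A; D→B, C) with node order A, B, C, D; the function names `step` and `iterate` are illustrative only.

```python
# Column-stochastic transition matrix (assumed node order: A, B, C, D).
# Entry M[i][j] is the probability of moving from node j to node i.
M = [
    [0,   1/2, 1, 0],
    [1/3, 0,   0, 1/2],
    [1/3, 0,   0, 1/2],
    [1/3, 1/2, 0, 0],
]

def step(M, r):
    """One transition: returns the matrix-vector product M r."""
    return [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]

def iterate(M, r0, n):
    """Apply n transitions starting from the distribution r0."""
    r = list(r0)
    for _ in range(n):
        r = step(M, r)
    return r

r0 = [1.0, 0.0, 0.0, 0.0]  # the surfer starts at A
for n in (1, 2, 3, 4, 5, 1000):
    print(n, [round(x, 3) for x in iterate(M, r0, n)])
```

Under these assumptions the printed values should agree with the probability tables up to rounding, with A settling near 1/3 and the other pages near 2/9.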
Dead end
Some link structures violate the PageRank assumption.
Probability change in each iteration:

| Node \ Iter. | 0 | 1 | 10 | 100 |
|---|---|---|---|---|
| A | 1 | 0 | 0.01 | 0 |
| B | 0 | 0.333 | 0.015 | 0 |
| C | 0 | 0.333 | 0.015 | 0 |
| D | 0 | 0.333 | 0.015 | 0 |
Spider trap
Some link structures violate the PageRank assumption.
Probability change in each iteration:

| Node \ Iter. | 0 | 1 | 10 | 100 |
|---|---|---|---|---|
| A | 1 | 0 | 0.01 | 0 |
| B | 0 | 0.333 | 0.015 | 0 |
| C | 0 | 0.333 | 0.961 | 1 |
| D | 0 | 0.333 | 0.015 | 0 |
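Both failure modes can be seen by simulating the surfer on slightly modified graphs. The sketch below is an illustration under assumptions: the slides do not list the exact links of the dead-end and spider-trap graphs, so here the dead end is modeled as page C having no out-links and the spider trap as page C linking only to itself. The base graph and the name `walk` are also assumptions.

```python
def walk(out_links, start, steps):
    """Propagate probability mass along out-links for a number of steps."""
    r = {n: 0.0 for n in out_links}
    r[start] = 1.0
    for _ in range(steps):
        nxt = {n: 0.0 for n in out_links}
        for src, targets in out_links.items():
            for dst in targets:  # a page without out-links passes nothing on
                nxt[dst] += r[src] / len(targets)
        r = nxt
    return r

base = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}
dead_end = dict(base, C=[])        # assumption: C has no out-links
spider_trap = dict(base, C=["C"])  # assumption: C links only to itself

for name, graph in (("dead end", dead_end), ("spider trap", spider_trap)):
    for n in (1, 10, 100):
        r = walk(graph, "A", n)
        print(name, n, {k: round(v, 3) for k, v in r.items()})
```

In the dead-end case the probability mass leaks out of the graph until every score approaches 0; in the spider-trap case all of the mass accumulates on C.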
1. When people are browsing a page, we assume that they randomly select one of its links for the next transition.
2. Sometimes, people directly visit pages without using hyperlinks (called a random jump).
• Most cases: people use links.
• Sometimes: people jump directly.
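1. Initialize $\boldsymbol{r}_0$ (randomly assign values to $\boldsymbol{r}_0$). Set the transition matrix $\boldsymbol{M}$ and the random surf vector $\boldsymbol{d}$.
2. Starting from $\boldsymbol{r}_0$, update $\boldsymbol{r}_n$ with the formula below:
$$\boldsymbol{r}_n = \alpha \boldsymbol{M} \boldsymbol{r}_{n-1} + (1 - \alpha)\,\boldsymbol{d}$$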
The first term, $\alpha \boldsymbol{M} \boldsymbol{r}_{n-1}$, corresponds to the case where people use links to visit pages.
Transition matrix $\boldsymbol{M}$ (derived from the link structure; node order A, B, C, D; column $j$ holds the out-link probabilities of node $j$):
$$\boldsymbol{M} = \begin{pmatrix} 0 & 1/2 & 1 & 0 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1/2 & 0 & 0 \end{pmatrix}$$
The second term, $(1 - \alpha)\,\boldsymbol{d}$, corresponds to the case where people directly visit pages.
Random surf vector $\boldsymbol{d}$: the probability that people directly visit each page (a uniform distribution):
$$\boldsymbol{d} = \begin{pmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{pmatrix}$$
$\alpha$ and $1 - \alpha$ are the probabilities (parameters) that decide which of the two factors people use: following a link or jumping directly. Empirically, $\alpha$ is set in the range from 0.8 to 0.9.
3. If $\boldsymbol{r}_n$ has converged (it no longer changes), the algorithm finishes. The converged $\boldsymbol{r}_n$ is the PageRank!
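The three steps can be put together in a short sketch of the update $\boldsymbol{r}_n = \alpha \boldsymbol{M} \boldsymbol{r}_{n-1} + (1 - \alpha)\,\boldsymbol{d}$. The choices here are assumptions made for illustration: a uniform starting vector instead of a random one, α = 0.85, convergence measured by the L1 change falling below `tol`, and the 4-node link structure from the walk-through (plus a spider-trap variant where C links only to itself).

```python
def pagerank(M, d, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate r_n = alpha * M r_{n-1} + (1 - alpha) * d until r stops changing."""
    n = len(d)
    r = [1.0 / n] * n  # step 1: initial vector (uniform here, for repeatability)
    for _ in range(max_iter):
        # step 2: follow a link with prob. alpha, jump via d with prob. 1 - alpha
        new = [
            alpha * sum(M[i][j] * r[j] for j in range(n)) + (1 - alpha) * d[i]
            for i in range(n)
        ]
        # step 3: stop once the vector no longer changes (within tol)
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r

# Assumed node order: A, B, C, D. M[i][j] = probability of moving from j to i.
M = [[0, 1/2, 1, 0], [1/3, 0, 0, 1/2], [1/3, 0, 0, 1/2], [1/3, 1/2, 0, 0]]
M_trap = [[0, 1/2, 0, 0], [1/3, 0, 0, 1/2], [1/3, 0, 1, 1/2], [1/3, 1/2, 0, 0]]
d = [1/4, 1/4, 1/4, 1/4]  # uniform random surf vector

print([round(x, 3) for x in pagerank(M, d)])
print([round(x, 3) for x in pagerank(M_trap, d)])
```

Because of the random jump, the spider-trap variant no longer drains every other page to 0: each page keeps at least (1 − α)/4 of the probability.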
of topics each node is related to.
Legend (nodes A to G): ▪ Pages about medicine ▪ Pages about cosmetics
Normal PageRank:
1. Page C 0.282pt
2. A 0.174pt
3. F 0.133pt
4. D 0.132pt
5. B 0.093pt
6. E 0.092pt
7. G 0.092pt
Legend (nodes A to G): ▪ Pages about medicine ▪ Pages about cosmetics
• Many pages link to C, but only one of them is about medicine.
• A is linked to by more pages about medicine than C is.
Assumption: if people often move to a page from important pages about the topic, that page should be important for the topic!
Normal PageRank
• People follow links on pages to visit other pages.
• They sometimes randomly visit pages without using links.
Topic-sensitive PageRank
• People follow links on pages to visit other pages.
• They sometimes randomly visit only one kind of page (pages about the target topic) without using links.
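Starting from $\boldsymbol{r}_0$, update $\boldsymbol{r}_n$ with the formula below:
$$\boldsymbol{r}_n = \alpha \boldsymbol{M} \boldsymbol{r}_{n-1} + (1 - \alpha)\,\boldsymbol{d}$$
Normal PageRank (nodes A to G): $\boldsymbol{d} = (1/7,\ 1/7,\ 1/7,\ 1/7,\ 1/7,\ 1/7,\ 1/7)^{\top}$
Topic-sensitive PageRank: $\boldsymbol{d} = (1/4,\ 1/4,\ 1/4,\ 0,\ 0,\ 0,\ 1/4)^{\top}$, nonzero only for pages about the target topic.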
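Only the random surf vector changes between the two variants, which the following sketch tries to show. The 7-node graph from the slides cannot be reconstructed from the text, so this example reuses the 4-node link structure from the random-surfer walk-through and picks a hypothetical topic set {B, D}; α = 0.85 and a fixed number of iterations are also assumptions.

```python
def pagerank(out_links, d, alpha=0.85, iters=200):
    """Iterate r_n = alpha * M r_{n-1} + (1 - alpha) * d on an adjacency dict."""
    nodes = list(out_links)
    r = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - alpha) * d[n] for n in nodes}  # random jump term
        for src, targets in out_links.items():
            for dst in targets:                       # link-following term
                new[dst] += alpha * r[src] / len(targets)
        r = new
    return r

# Assumed 4-node graph (not the 7-node graph shown on the slides).
graph = {"A": ["B", "C", "D"], "B": ["A", "D"], "C": ["A"], "D": ["B", "C"]}

# Normal PageRank: jump to any page with equal probability.
uniform = {n: 1.0 / len(graph) for n in graph}

# Topic-sensitive PageRank: jump only to pages about the target topic.
topic_pages = {"B", "D"}  # hypothetical topic set
topic = {n: (1.0 / len(topic_pages) if n in topic_pages else 0.0) for n in graph}

normal = pagerank(graph, uniform)
ts = pagerank(graph, topic)
for n in graph:
    print(n, round(normal[n], 3), round(ts[n], 3))
```

In this toy setup the topic pages B and D gain score compared with normal PageRank, while C loses score and A still ranks highly because important pages link to it.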
scores to pages about a target topic
Legend (nodes A to G): ▪ Pages about medicine ▪ Pages about cosmetics

| Rank | Normal PageRank | Topic-sensitive PR |
|---|---|---|
| 1 | C 0.282pt | A 0.266pt |
| 2 | A 0.174pt | C 0.248pt |
| 3 | F 0.133pt | G 0.147pt |
| 4 | D 0.132pt | B 0.121pt |
| 5 | B 0.093pt | D 0.108pt |
| 6 | E 0.092pt | E 0.057pt |
| 7 | G 0.092pt | F 0.055pt |

• Even if a page is not about the target topic, TsPR gives it a high score if it is linked to by important pages.
1. Finding important nodes on a graph for target topics
- For that, give random surf values only to the nodes for the target topics.
2. Finding important nodes for individual users (personalized PageRank)
- If you know the nodes that a user frequently visits, give random surf values only to those nodes.
Example (nodes A to G; ▪ pages which a user likes):
$$\boldsymbol{r}_n = \alpha \boldsymbol{M} \boldsymbol{r}_{n-1} + (1 - \alpha)\,\boldsymbol{d}, \qquad \boldsymbol{d} = (0,\ 1/3,\ 0,\ 0,\ 1/3,\ 0,\ 1/3)^{\top}$$
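Building the random surf vector for one user is a small step of its own. The helper below is a hypothetical sketch: the name `teleport_vector` is made up, and it assumes that the vector is ordered A to G, so that the three nonzero entries correspond to pages B, E, and G.

```python
from fractions import Fraction

def teleport_vector(nodes, preferred):
    """Spread the random-jump probability evenly over the preferred nodes only."""
    preferred = set(preferred)
    if not preferred or not preferred <= set(nodes):
        raise ValueError("preferred must be a non-empty subset of nodes")
    share = Fraction(1, len(preferred))
    return [share if n in preferred else Fraction(0) for n in nodes]

nodes = list("ABCDEFG")
liked = {"B", "E", "G"}  # hypothetical: pages this user is assumed to like
d = teleport_vector(nodes, liked)
print([str(x) for x in d])  # ['0', '1/3', '0', '0', '1/3', '0', '1/3']
```

The resulting vector can be plugged into the update formula in place of the uniform one; nothing else in the iteration has to change.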