How PageRank is calculated


On a simple level, we can tell quite a lot about how PageRank is calculated. This is because when Google was just a university research project, the creators of  PageRank published a paper that detailed a formula for calculating it. This formula is now more than a handful of years old, and we suspect that it has changed somewhat since then. However, for detailing the over-riding principles of how PageRank works, it is as accurate today as the day it was written.


PR(A) = (1-d) + d (PR(T1) /C(T1) + ... + PR(Tn) /C(Tn) )
Where PR(A) is the PageRank of Page A ( the one we want to work out ) .
D is a dampening factor. Nominally this is set to 0.85
PR(T1) is the PageRank of a site point ing to Page A
C(T1) is the number of links off that page
PR(Tn) /C(Tn) means we do that for each page point ing to Page A



Couldn’t be simpler right? Depending on your mathematical competence, this is 



either remarkably easy to understand, or remarkably complex. So here’s the deal  



– this isn’t a one-time calculation. It looks pretty above, but you can’t just do one 



simple calculation and get an answer with the PageRank of a page! If you look at 



it, to calculate the PageRank of Page A, we need to know the PageRank of all 



the pages pointing to it. To calculate the PageRank of those pages we need to 



know the PageRank of all the pages pointing to them (one of which could, and is 



quite likely to be Page A!). If this seems like it could go around in circles forever,










that’s because it nearly does. We have to do a whole lot of little calculations 



many times over before we actually get the answer. What this formula tells us is 






that however you slice it, and however the formula may have changed since it 






was written:












The PageRank given to Page A by a Page B pointing to it, is decreased with
each link to anywhere that exists on Page B. This means that a page’s
PageRank is essentially a measure of its vote; it can split that vote between one
link or two or many more, but its overall voting power will always remain the
same.




This really is important. So to be totally clear, let’s put some numbers to it (these 
numbers are made up for demonstration purposes only, and have no relevancy 






to any particular page). Say Page B has a PageRank value of 5 and has a single 
link on it pointing to Page A. Page A’s PageRank is improved by a proportion of 
Page B’s value of 5 (Page B doesn’t lose anything, but Page A gains). If Page B 
has two links, that PageRank improvement would be split, and Page A would 



only gain half the PageRank that it did before. 








Now put the formula out of your mind for a moment, as it's easier to understand 



how it works using a diagram. Let’s say we have a hypothetical set of pages 
imaginatively titled Page A, Page B, Page C and Page D. They link to each other .





To begin with, in our example at least, we don’t know what the page’s starting 
PageRanks are. There’s nothing special about the number that we pick to start 






with (in fact, if you read forward to the section on convergence – you can see that 



we can start at any number we want). Since in the last version of this paper we 
performed the calculations by setting these values to 1 – we’re going to set them 






to zero this time around in order to prove that it doesn’t matter what the starting 






values are.





Next, we perform the necessary calculation to obtain the PageRank for each 
page. The rules are:






1. We take a 0.85 * a page’s PageRank, and divide it by the number of  
links on the page.





2. We add that amount on to a new total for each page it’s being passed 
to.





3. We add 0.15 to each of those totals.




The first calculation is easy. Because we’ve started at zero – 0 * 0.85 is always 0. 
So each page gets just 0.15 + 0. Meaning each page now has a PageRank of 






0.15. Clearly we’re not done – we want to show the importance of each page 






based upon links, and they’re all the same; so we need to run the calculation 






again.





Page A links to pages B, C and D. Page A’s PageRank is 0.15 so it will add 0.85 



* 0.15 = 0.1275 to the new PageRank scores of the pages it links to. There are 






three of them so they each get 0.0425.





Page B links to page C. Page B’s PageRank is 0.15 so it will add 0.85 * 0.15 = 



0.1275 to the new PageRank score of the pages it links to. Since it only links to 






page C, page C will get it all.






Page C links to Page A, all 0.1275 passes to page A. 
Page D links to Page C. Again all 0.1275 passes to page C.





The new totals for each page them become:




Page A: 0.15 (base) + 0.1275 (from Page C) = 0.2775
Page B: 0.15 (base) + 0.0425 (from Page A) = 0.1925
Page C: 0.15 (base) + 0.0425 (from Page A) + 0.1275 (from Page B) + 0.1275 
(from Page D) = 0.4475






Page D: 0.15 (base) + 0.0425 (from Page A) = 0.1925
So we’ve got:
15
Pretty neat huh? Already we’re begining to see that Page C is probably the most 
important page in the system (but we can’t be sure yet – it could well change).









We carry on doing these calculations until the value for each page no longer 
changes (this is called convergence – there’s more about this in the next 






section). In practice, Google probably doesn't wait for this convergence, but 






instead run a number of iterations of the calculation which is likely to give them 






fairly accurate values (more on this later as well). If we carried out all the 






calculations for the example given, it would take us 143 calculations [ Excel 






Example 1] and we’d reach final values of: 





As suspected, Page C is the most important. If we take a quick look at these raw 
values we can see something about the number of links pointing out from a page. 






Look at Page A, which has a link from a high PageRank page (Page C), which 






has only one outbound link. Then look at Page B and D; both share links from a 






high PageRank page (Page A), with three outbound links. The number of links s
ignificantly alters the way PageRank is distributed.




Comments

Popular Posts