Oversees: Search Technology & Relevancy
Join Date: Jun 2004
Improving PageRank: The Papers
Since its inception, PageRank has suffered from a number of theoretical flaws and problems.
Rather than beat a dead horse, I will list some of these problems, each followed by a research paper that attempts to address it. This approach may benefit anyone interested in researching these subjects.
Here are some papers. If you check the WWW conference proceedings (e.g., 2005, 2004 and earlier), you may find more articles. Other venues publish similar research papers.
1. "Users Never Click Back"
Paper that tries to address this: The Effect of the Back Button in a Random Walk: Application for PageRank
2. Lack of Accuracy with Missing Nodes
Paper that tries to address this: Outlink Estimation For Pagerank Computation Under Missing Data
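This is related to the famous Perron-Frobenius theorem, which is what guarantees PageRank a unique, positive solution once the link matrix has been made stochastic and irreducible.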
3. Link counts and "important" Web pages
Paper that tries to address this: Weighted PageRank Algorithm
4. Link Citation-Literature Citation Analogy (debunked)
Paper that tries to address this: TLA Paper (see also our SEWF thread on TLA).
5. Why PageRank has been biased in favor of old sites
Paper that tries to address this: Web Structure, Dynamics and Page Quality
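To make some of the items above concrete, here is a minimal sketch of classic PageRank computed by power iteration. It encodes the "users never click back" assumption from item 1: the random surfer either follows a random outlink (with probability `damping`, conventionally 0.85) or jumps to a uniformly random page, and never returns to the previous page. Pages with no outlinks have their rank spread uniformly, a common textbook fix that keeps the matrix stochastic so the Perron-Frobenius guarantee applies; it is not the outlink-estimation method of the paper in item 2. The function name and parameters are assumptions for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank for the classic random-surfer model.

    links maps each page to the list of pages it links to. The surfer
    follows a random outlink with probability `damping` and otherwise
    jumps to a uniformly random page; there is no back button.
    """
    nodes = sorted(set(links) | {t for outs in links.values() for t in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Pages with no outlinks (dangling nodes) spread their rank
        # uniformly, which keeps the transition matrix stochastic.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            for t in outs:
                new[t] += damping * rank[v] / len(outs)
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank
```

For example, with `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})`, page `b` receives no links and keeps only the teleport share, while `c`, linked by both other pages, ranks highest.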
Last year, Prof. Ricardo Baeza-Yates kindly emailed me paper #5. This paper is particularly important because:
a. It presents a mathematical model that explains why PageRank tends to favor old sites.
b. It highlights the fractal nature of the Web as it was understood back then: bowtie structures nested within the large Web bowtie (IN, SCC, OUT).
How about new sites?
Incidentally, there is another paper that presents a mathematical model for improving PageRank. In this model, new sites are no longer ignored. The paper is:
On the Temporal Dimension of Search
In this last paper, the authors write:
"Our experimental data show an obvious trend that a new paper is more likely to draw citations than an old paper. Therefore, another parameter called the aging factor, Aging(A) (which is in [0, 1]), is introduced."
They found that this approach boosts the rank of new sites. However, it is not sufficient:
"Although TimedPageRank is able to boost the rank of emerging quality papers, it is not sufficient for all the papers because new papers only have a few or no citation."
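As a rough illustration of the time-weighting idea, here is a minimal sketch of a PageRank-style iteration in which each citation is discounted by an aging factor in [0, 1] based on the age of the citing paper, so recent citations count more. The exponential decay, the `half_life` parameter, and the function names are hypothetical choices for illustration; this is in the spirit of TimedPageRank, not necessarily the formula used in the paper.

```python
def aging_weight(year, now, half_life=5.0):
    """Hypothetical aging factor in [0, 1]: halves every `half_life` years."""
    age = max(0.0, now - year)
    return 0.5 ** (age / half_life)


def timed_pagerank(citations, year, now, damping=0.85, half_life=5.0,
                   tol=1e-10, max_iter=1000):
    """PageRank-style scores in which each citation is discounted by the
    age of the citing paper.

    citations maps a paper to the papers it cites; year maps every paper
    (citing or cited) to its publication year.
    """
    nodes = sorted(year)
    n = len(nodes)
    w = {v: aging_weight(year[v], now, half_life) for v in nodes}
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            cited = citations.get(v, [])
            for t in cited:
                # An old citing paper passes on only a fraction of its score.
                new[t] += damping * w[v] * score[v] / len(cited)
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[v] - score[v]) for v in nodes) < tol:
            return new
        score = new
    return score
```

With the default five-year half-life, a citation from a paper published five years ago counts half as much as one from a paper published this year. As the quoted passage notes, though, no citation weighting of this kind can help a new paper that has no citations yet.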
Overall, this paper points to the obvious: the Web is a dynamical system, as we have known since the early '90s, so it must be treated accordingly. Welcome to nonlinear dynamical systems and fractals.
Thus, Baeza-Yates's paper and this other paper shed some light on issues related to the age of sites, using mathematical models to explain the observed behavior.
Last edited by orion : 02-10-2005 at 03:10 PM.
Join Date: Feb 2005
Another great topic, Orion.
I have been working on papers and writing the odd article all day, so my head is about to fall off. I now have a load of results to go through, so I don't think I have the strength to write the answer your post deserves.
I will say that I feel PR is really quite old now, and that a big change is long overdue. It has drawbacks that are becoming more and more apparent.
Using nonlinear systems and fractals seems about right to me.
A colleague has been working on chaos theory for a little while. It is also interesting.
Forums Editor, SearchEngineWatch
Join Date: Jun 2004
I don't know whether you have posted these in various places before, or maybe we have searched for similar terms, but I have actually read all of these.
Guess I am more immersed in this topic than I thought!!!
Join Date: Oct 2006
Nice roundup of articles.
Last edited by AussieWebmaster : 10-06-2006 at 05:51 PM. Reason: sig link not allowed