Search Engine Watch
Old 02-10-2005   #1
orion
 
Join Date: Jun 2004
Posts: 1,044
Improving PageRank: The Papers

Since its inception, PageRank has suffered from a number of theoretical fallacies and problems.

Rather than beating a dead horse, I will list some of these, each followed by a research paper that attempts to address it. This approach may benefit those interested in researching the corresponding subjects.

Here are some papers. If you check the proceedings of the WWW conferences (e.g. 2005, 2004 and earlier), you might find more articles. Other sources have similar research papers.

1. "Users Never Click Back"
Paper that tries to address this: The Effect of the Back Button in a Random Walk: Application for PageRank

2. Lack of Accuracy with Missing Nodes
Paper that tries to address this: Outlink Estimation For Pagerank Computation Under Missing Data
This is related in some way to the famous Perron-Frobenius theorem.

3. Link counts and "important" Web pages
Paper that tries to address this: Weighted PageRank Algorithm

4. Link Citation-Literature Citation Analogy (debunked)
Paper that tries to address this: TLA Paper (check also our SEWF thread on TLA).

5. Why PageRank has been biased in favor of old sites
Paper that tries to address this: Web Structure, Dynamics and Page Quality


Last year, Prof. Ricardo Baeza-Yates kindly emailed me paper #5. This paper is particularly important because:

a. It presents a mathematical model that explains why PageRank tends to favor old sites.
b. It notes the fractal nature of the Web as it was understood back then: bowtie structures nested within the large Web bowtie (IN, SCC, OUT).

How about new sites?
Incidentally, there is another paper which presents a mathematical model for improving PageRank. In this model, new sites are no longer ignored. The paper is
On the Temporal Dimension of Search

In this last paper, the authors write:

"Our experimental data show an obvious trend that a new paper
is more likely to draw citations than an old paper. Therefore,
another parameter called the aging factor, Aging(A) (which is in
[0, 1]), is introduced."

They found that this approach boosts the rank of new sites. However, this is not sufficient:

"Although TimedPageRank is able to boost the rank of emerging
quality papers, it is not sufficient for all the papers because new
papers only have a few or no citation."

Emphasis added.
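
To make the idea of an aging factor in [0, 1] a bit more concrete, here is a rough sketch. It is not the paper's TimedPageRank formula. It assumes a plain power-iteration PageRank, followed by multiplying each score by a per-page factor in [0, 1]. The function names (pagerank, apply_aging), the damping value of 0.85 and the toy link graph are all made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on a dict {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in nodes}
        for p in nodes:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[p] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


def apply_aging(rank, aging):
    """Scale each score by a factor in [0, 1] (assumption: 1.0 means newest)."""
    for page, factor in aging.items():
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"aging factor for {page!r} must be in [0, 1]")
    # Pages without an explicit factor are left unchanged (factor 1.0).
    return {p: score * aging.get(p, 1.0) for p, score in rank.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: "new" has no inlinks yet.
    links = {"old": ["hub"], "hub": ["old"], "new": ["hub"]}
    rank = pagerank(links)
    aged = apply_aging(rank, {"old": 0.5, "hub": 0.5, "new": 1.0})
    for page in sorted(rank):
        print(page, round(rank[page], 4), round(aged[page], 4))
```

In this sketch, scaling down the older pages raises the relative standing of the new page, but a page with no inlinks still starts from the minimum score, which is the limitation the authors point out in the quote above.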

Overall, this paper points to the obvious: the Web is a dynamical system, as we have known since the early '90s, so it must be treated accordingly. Welcome to nonlinear dynamical systems and fractals.

Thus, the Baeza-Yates paper and this other paper shed some light on issues related to the age of sites, using mathematical models to explain the observables.


Orion

Last edited by orion : 02-10-2005 at 02:10 PM.
Old 02-10-2005   #2
xan
Member
 
Join Date: Feb 2005
Posts: 238
Another great topic Orion.

I have been working on papers all day and writing the odd article, so my head is about to fall off, and I now have a load of results to go through, so I don't think I have the strength to write the answer that your post deserves.

I will say that I feel that PR is really quite old now, and that a big change is long overdue. It has drawbacks that are becoming more and more apparent.

It seems about right to me that nonlinear systems and fractals be used.

A colleague has been working on chaos theory for a little while. It is also interesting.
Old 02-10-2005   #3
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Orion, would the research paper on TrustRank be a part of this set - http://www.vldb.org/conf/2004/RS15P3.PDF ?
Old 02-14-2005   #4
AussieWebmaster
Forums Editor, SearchEngineWatch
 
Join Date: Jun 2004
Location: NYC
Posts: 8,153
I don't know whether you have posted these in various places before, or maybe we have searched for similar terms, but I have actually read all of these.
Guess I am more immersed in this topic than I thought!
Old 10-06-2006   #5
kandk
Newbie
 
Join Date: Oct 2006
Posts: 3
Nice roundup of articles.

Still new to this forum but it's already paying off!

-kak


Last edited by AussieWebmaster : 10-06-2006 at 04:51 PM. Reason: sig link not allowed