Search Engine Watch
11-11-2004   #1
orion
Join Date: Jun 2004
Posts: 1,044
The WLR Algorithm

The Weighted Links Rank (WLR) Algorithm

At the 2004 World Wide Web Conference (WWW2004), Dr. Ricardo Baeza-Yates, one of the most respected IR scientists, and Emilio Davis presented the research paper Web Page Ranking using Link Attributes, which introduced Weighted Links Rank (WLR), an improved PageRank model.


Description of WLR

According to the researchers WLR “considers different Web page attributes to give more weight to some links, improving the precision of the answers.”

In a nutshell, the weight of a link is given by

W(j, i) = L(j, i)(c + T(j, i) + AL(j, i) + RP(j, i))

where given a link from page j to page i:

L(j, i) = 1 if the link exists, or 0 otherwise
T(j, i) = a value that depends on the tag where the link is inserted
AL(j, i) = the length of the anchor text of the link divided by a constant d
RP(j, i) = the relative position of the link in the page weighted by a constant b

The original PageRank is recovered when W(j, i) = L(j, i)
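
To make the formula concrete, here is a minimal Python sketch of the link weight. The constants c, d and b and the per-tag values are placeholders chosen for illustration only; the paper's actual values are not quoted in this thread. The form RP = b * (1 - relative position) is also an assumption, picked so that links near the top of the source score higher.

```python
# Illustrative constants only; the paper's actual values are not quoted here.
C = 1.0   # base weight c
D = 20.0  # d: assumed average anchor text length, in characters
B = 1.0   # b: weight of the relative-position term

# Hypothetical T(j, i) values per enclosing tag: higher for stronger emphasis.
TAG_WEIGHTS = {"h1": 1.0, "h2": 0.8, "h3": 0.6, "strong": 0.4, "b": 0.4}


def link_weight(link_exists, tag, anchor_text, rel_pos):
    """W(j, i) = L(j, i) * (c + T(j, i) + AL(j, i) + RP(j, i)).

    rel_pos is the link's position in the HTML source, from 0.0 (top)
    to 1.0 (bottom). RP = b * (1 - rel_pos) is an assumed form.
    """
    L = 1 if link_exists else 0
    T = TAG_WEIGHTS.get(tag, 0.0)   # links outside listed tags get no bonus
    AL = len(anchor_text) / D       # anchor length relative to d
    RP = B * (1.0 - rel_pos)        # earlier in the source means larger RP
    return L * (C + T + AL + RP)
```

Under these assumed values, a descriptive link inside an h1 at the top of the source gets a larger weight than a bare "here" link at the bottom, and a missing link always gets weight 0.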

The authors state

“The term T(j; i) is a sequence of constants depending on the tag where the link is. For example, if the link is inside a <h1> tag, will have a high T(j; i) value, a little less for <h2>, etc. The same for others emphasis tags like <strong>or <b>.”

This reminds me of some old seo/html “tricks” I use for testing purposes. Here is one:

<hn title="keywords"><a title="keywords" href="http://….">
<strong>keywords or relevant text</strong>
</a></hn>

where n = 1, 2, …, 6

The idea is to check which IR scoring systems rely on the HTML DOM and can be gamed with such tricks. If a system does, one should be able to score even with a copyright footer wrapped in an h5 or h6.

[NOTE: W3C Validations for XHTML and HTML
1. To pass the W3C validations the anchor tag must be inside the header tag and not the other way around.
2. The font tag was deprecated in HTML 4.01. The bold and italic tags were not formally deprecated, but the W3C discourages presentational markup in favor of style sheets, so the strong tag is preferable to the bold tag. Check the W3C site.]

The authors also state

“The term AL(j, i) gives more value to links where the
creator explained in more detail what Web resource is being linked. For example, this gives less weight to links described with home or here.” [Emphasis added]

“Finally, the term RP(j; i) gives more weight to links that are at the beginning of the page rather that at the end of the page (physically in the HTML code, not necessarily in the browser view).”

That is, WLR incorporates practices that are already standard in usability and search engine optimization.


Important Findings

The researchers found that

“Using the judgments of a total of 20 queries, we computed the precision on the first k answers. That is, precision is the number of relevant answers over the number of answers considered, obtaining the results shown in Figure 1. From the graph we can see that the most effective attribute is anchor text length, and that all of them improve upon PageRank which uses uniform link weights.” [Emphasis added]

“One way to compare how better is WLRank with respect to PageRank is using a perfect ranking, which only gives relevant results. Table 1 shows the total error with respect to a perfect ranking for the first k answers up to 10. We can see that WLRank improves PageRank precision a 13% on average per answer for the first 10 answers.”

“Our results show that using weighted links can improve the precision of search engines. The best attribute seems to be anchor text length, but others can be better. On the other hand, relative position was not so effective, indicating that the logical position not always matches the physical position.”
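
As a rough illustration of the "precision on the first k answers" measure quoted above, here is a minimal Python sketch. The document ids and relevance judgments are made up; they are not data from the paper.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the first k answers that were judged relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    return hits / len(top_k)


# Hypothetical judgments for one query: document ids are made up.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d4"}
for k in (1, 3, 5):
    print(k, precision_at_k(ranking, relevant, k))
```

A perfect ranking in the quoted sense, one that only gives relevant results, would score 1.0 at every k, so the gap between 1.0 and the measured precision is one plausible way to read the "error" in Table 1. That reading is an assumption, not something the quoted text confirms.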


Comments? Suggestions?


Orion

Last edited by orion : 11-11-2004 at 03:39 PM. Reason: typo

11-11-2004   #2
Nick W
Member
Join Date: Jun 2004
Posts: 593
>>Comments? Suggestions?

On what, the usefulness of such a thing?

Picking through the math to find the understandable bits I'd say it looked easier to manipulate if anything..

11-11-2004   #3
Chris Boggs
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
In English, please?

me too... can we (non developers/mathematicians) have a translation?

11-11-2004   #4
randfish
Member
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Thanks for the post Orion - timely & excellent as always.

I see that it could be valuable to both the search engines and those who seek to manipulate the results. A powerful SEO has access to hundreds if not thousands of documents to modify for his/her personal gain. By using 'optimal' link weight information, it would be easy to simply create perfect links to their documents.

However (big however), this is nearly impossible to modify on sites where one requests links or for organic links - the values of blog comment spam, guestbook sigs, footer links, etc. will dissipate, which is a positive thing in terms of relevancy.

SEOs have long discussed the 'perfect' link, and WLR is simply taking it from idea form to an algorithmic form. I think they could use a little more ingenuity to make WLR even more valuable.

For example, a SE that could read context should be able to pick out links in the 'content' area of the page. I know MSN had been discussed as using visual scanning technology to denote separate 'areas' of content - ads, navigation, content, headers, etc. - certainly WLR could do likewise.

Additionally, WLR would be wise to avoid placing arbitrary weight on the position of links in H1, H2 tags, etc. as these are so easily manipulated - link farms and link exchanges will simply start requesting links with H tags surrounding them...

In my opinion
T(j, i) should have the highest value when it is inside the content of an article or a paragraph, surrounded by relevant text.

I do like the idea behind it though - all my optimization efforts have been done along the lines that this type of thinking was already being used by the search engines, maybe it will help out my sad efforts

As a last aside, I have to mention that while I'm a big fan of the W3C's standards and goals, I am of the opinion that W3C validation shouldn't affect rankings - it's something that SEOs and programmers can manipulate that the general web populace is unaware of. Just as a library should be built to accommodate very diverse media, so should the web be made to accommodate diverse coding.

11-11-2004   #5
orion
Join Date: Jun 2004
Posts: 1,044

Hi, Nick

Fair and valid reaction. To keep the discussion interesting, let's address the following questions. Which search engines or IR systems do you think

1. can be gamed by manipulating the HTML DOM? Feel free to show evidence, one way or the other.
2. assign importance to the length of the links? Feel free to show evidence, one way or the other.

The paper provides interesting hints (we need to read between the lines). Here is one. Take a close look at Table 1. Subtract one column from the other. Can you interpret the meaning of the differences using the proposed model?

Here is another one, what is a perfect ranking? (expression used in the paper)

And another one: why does the length of the links affect the scoring function? That is, why do weights increase with the length of the links? How far can we go?

Orion

PS. I added the last lines after posting.

Last edited by orion : 11-11-2004 at 05:07 PM. Reason: Refining two lines.

11-11-2004   #6
Nick W
Member
Join Date: Jun 2004
Posts: 593
Randfish, thanks! That makes things a bit more understandable I shall go hit u up for some nice green rep lol!

>>Show evidence, one way or the other

ROFL Orion! That would be cool huh? Spam school at SEW - $10 a ticket! hehe....

Added: Damn! Cant give Rand any rep, someone please do the honors...?

11-11-2004   #7
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by randfish
SEOs have long discussed the 'perfect' link, and WLR is simply taking it from idea form to an algorithmic form..
Actually, they use the perfect ranking as a reference (a set of ranking values, not links) against which to compare PageRank and WLR scores (see the paper's Table 1).

"One way to compare how better is WLRank with respect
to PageRank is using a perfect ranking, which only gives
relevant results. Table 1 shows the total error with respect
to a perfect ranking for the first k answers up to 10. We
can see that WLRank improves PageRank precision a 13%
on average per answer for the first 10 answers."

It is not clear which reference set they used or how this set was obtained. I emailed the research group to see if they can shed some light on it.

Orion

Last edited by orion : 11-11-2004 at 07:38 PM.

11-11-2004   #8
randfish
Member
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Orion, thanks - hopefully they'll get back to you.

I wasn't referring to their concept of 'perfect rank', I just meant the concept of the "best" link or "perfect" link from an SEO standpoint - one that is the most relevant, most important, etc.

Their ideas about how to rank links follow the same vein of thought - that the placement of a link on a page, the anchor text it uses, etc. should influence how 'important' or 'valuable' that link is.

Orion, do you think they've done a thorough job of this? I thought after reading the paper that they could have gone much, much further with this topic and produced something of much greater value.

Maybe they'll leave that up to the search engineers...


p.s. I want to add that I believe any algorithmic implementation that focuses on code 'tricks' or placement is hurtful to relevancy rather than helpful. The future of search relevancy should be placed firmly in preventing 'gaming' of the system, rather than adding new ways to do it. Search engineers might do well to hire some slimy spam-kings of the web to think up ways to outsmart the algo changes BEFORE implementing them.

Last edited by randfish : 11-11-2004 at 08:05 PM.

11-11-2004   #9
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by randfish
Orion, do you think they've done a thorough job of this? I thought after reading the paper that they could have gone much, much further with this topic and produced something of much greater value.
Understandable. Dr. Baeza-Yates is one of the most important IR researchers in the world. I'm confident in his work. On the other hand, keep in mind this is a conference paper, not a journal paper.

Normally conference papers don't tell the whole story because of space limits and very restrictive editorial guidelines. They mentioned in the paper that they could not present more research work because of this. That's why I emailed them.

Quote:
Originally Posted by randfish
p.s. I want to add that I believe any algorithmic implementation that focuses on code 'tricks' or placement is hurtful to relevancy rather than helpful. The future of search relevancy should be placed firmly in preventing 'gaming' of the system, rather than adding new ways to do it. Search engineers might do well to hire some slimy spam-kings of the web to think up ways to outsmart the algo changes BEFORE implementing them.
I agree with you, randfish. However, we can also learn how to exploit faulty algorithms by studying them scientifically. Exploiting them in this way does not necessarily qualify as tricks or spam. I'm providing an example at

http://forums.searchenginewatch.com/...2202#post22202

with MSN beta. These results were obtained without bloggers' help or spam tricks, and without obsessing over link building or PageRank (I have never cared about this metric or these practices).

In a nutshell, knowing how a scoring framework reacts to the HTML DOM, word co-occurrence, and vector theory makes the difference. Good content and cloning an IR system in a controlled environment also help.


Orion

Last edited by orion : 11-11-2004 at 09:17 PM.

11-12-2004   #10
orion
Join Date: Jun 2004
Posts: 1,044
Implications

Baeza-Yates and Davis's WLR model proposes that the most important factor is the length of the anchor text of the link, AL(j, i). This length is normalized by dividing by a constant proportional to the average length of links in a page. I would assume this average is obtained by dividing the total length of the links by the number of links present in the document. The authors write

“AL(j; i) is the length of the anchor text of the link
divided by a constant d that depends that estimates
the average anchor text length in characters,”
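
A small Python sketch of this normalization follows. It takes d to be the mean anchor text length on a single page, which is the guess made above and not necessarily how the paper estimates d; the anchor texts are made up.

```python
def anchor_length_terms(anchor_texts):
    """Return AL = len(anchor) / d for each link on one page.

    d is taken to be the mean anchor text length on the page, in
    characters. That choice of d is an assumption, not the paper's
    stated definition.
    """
    if not anchor_texts:
        return []
    d = sum(len(t) for t in anchor_texts) / len(anchor_texts)
    if d == 0:
        return [0.0 for _ in anchor_texts]
    return [len(t) / d for t in anchor_texts]


# Made-up anchors for illustration.
anchors = ["home", "here", "Web Page Ranking using Link Attributes"]
for text, al in zip(anchors, anchor_length_terms(anchors)):
    print(f"{al:.2f}  {text}")
```

With a per-page average, the AL values on a page always sum to the number of links, so lengthening every anchor on a page changes nothing; only links that are long relative to their neighbours gain weight.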


Content is King

The WLR model assigns more importance to usable content in the anchor text, a fact already known by SEOs and defended by usability experts.

“The term AL(j; i) gives more value to links where the
creator explained in more detail what Web resource is being
linked. For example, this gives less weight to links described
with home or here.”

More weight is assigned to top links. Baeza-Yates and Davis write

“Finally, the term RP(j; i) gives more weight to links that
are at the beginning of the page rather that at the end of
the page (physically in the HTML code, not necessarily in
the browser view).”

Here we are not talking about apparent positioning in the browser view. The anchor tags must be hand-coded to appear at the beginning of the HTML source code. This implies that we can safely use CSS to reposition the links so they appear anywhere in the document.
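
Because the position is measured in the HTML source, it can be estimated without rendering the page. The sketch below uses Python's standard html.parser module and defines the relative position as the character offset of the opening anchor tag divided by the length of the source. That definition is an assumption about how "relative position" is measured, and the sample page is made up.

```python
from html.parser import HTMLParser


class AnchorPositions(HTMLParser):
    """Collect (href, relative source position) for each <a href> tag."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        # Character offset at which each line of the source starts.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            lineno, col = self.getpos()  # 1-based line, 0-based column
            offset = self.line_starts[lineno - 1] + col
            self.links.append((href, offset / max(len(self.source), 1)))


def link_positions(source):
    """Return a list of (href, position in [0, 1)) in source order."""
    parser = AnchorPositions(source)
    parser.feed(source)
    parser.close()
    return parser.links


# Made-up page for illustration.
page = (
    "<html><body>\n"
    "<a href='/top'>Top link</a>\n"
    "<p>Some text</p>\n"
    "<a href='/bottom'>Bottom link</a>\n"
    "</body></html>"
)
for href, rel in link_positions(page):
    print(f"{rel:.2f}  {href}")
```

Any CSS applied to the page has no effect on these numbers, which is exactly why repositioning with CSS would leave a source-order score untouched.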


Implications for Navigation Menus

This has some implications for links placed in menus. Links placed in navigation menus are usually shorter than links placed in the body of a document. Thus, in the WLR model these links should weigh less than long links placed at the beginning of the page.

In addition, links hand-coded to appear in a navigational menu positioned at the corners or bottom of a page should weigh less. To improve these weights we could use the CSS trick described in Andy King’s Speed Up Your Site: Web Site Optimization book, First Edition, Chapter 8, Advanced CSS Optimization under Raising Relevance (page 188). (See also Eric Meyer’s CSS book and site). The menu links are hand-coded to appear at the beginning of the HTML code and then repositioned with CSS to appear where you want them to appear.


Orion

Last edited by orion : 11-12-2004 at 12:07 PM.

11-12-2004   #11
randfish
Member
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Orion,

Would it be your experience or your guess that a search engine company (Google, Yahoo!, MSN, Teoma, etc.) would implement an algorithm addition like this directly as described in the paper?

Personally, I would expect that implementation would take this much further and close the loopholes that you and I can easily spot - like using CSS to place links near the top of the code, but the bottom of the visible document.

I have a bone to pick with the anchor text weight as well. I believe that "here" can be just as valid as "pixie sticks for kids" when considering the relevance of a link. The context and content it's framed in is what counts. With advanced semantics, a search engine should be able to contextually decipher a web page's content (whether that content be a sentence, a paragraph, a list of links, or something else) and place it into a categorical hierarchy from which to determine if a link placed in the text is 'relevant'.

Orion - thanks for presenting this material, I think it makes for a great discussion and a lot of noodle-scratching - something that's hard to come by in the monotony of SEO work.

11-12-2004   #12
orion
Join Date: Jun 2004
Posts: 1,044

Hi, randfish

Quote:
Originally Posted by randfish
I have a bone to pick with the anchor text weight as well. I believe that "here" can be just as valid as "pixie sticks for kids" when considering the relevance of a link. The context and content it's framed in is what counts. With advanced semantics, a search engine should be able to contextually decipher a web page's content (whether that content be a sentence, a paragraph, a list of links, or something else) and place it into a categorical hierarchy from which to determine if a link placed in the text is 'relevant'.
Let's see how we can sort this out.

WLR is a link model, not a semantic model. WLR does not account for the content and semantics of the page the link points to, or of the page where the links are hosted. This is a limitation of most link models, including PageRank. To account for content and semantics one must use other algorithms and similarity functions (term vector cosine similarity, term co-occurrence, topic analysis, latent semantic indexing, etc.).

The idea of the WLR model consists of incorporating anchor attributes into PageRank to check how the new scoring function behaves. These attributes are

The presence and number of links in the page (implicit in PR already)
The tag where the link is inserted (h1, h2, strong, etc.)
The length of the anchor text (in characters)
The position of the link (relative position of the anchor tags in the source code)

Note that the semantic structure of the documents the links point to, or of the documents where the links reside (theme, topic, or whether the documents are relevant to the initial document), is not considered.

WLR is just one of many scoring functions one could use when evaluating an overall score for a document.

True, there is a bit of semantics implicit in the model when one considers the nature and length of the anchor text, but the model is aimed mostly at improving the current PageRank metric.


I hope this helps.


Orion

Last edited by orion : 11-13-2004 at 07:48 PM. Reason: changing 'achieving' for 'evaluating'

11-12-2004   #13
orion
Join Date: Jun 2004
Posts: 1,044

Hi, all.

I had the privilege of receiving email communication from Dr. Ricardo Baeza-Yates, co-author of the WLR Model.

On the concept of perfect ranking he explained to me

“A perfect ranking is just a ranking where all answers are relevants (not
necessarely) in the best order. A REAL perfect ranking is infeasible.”


What this means is that they used a set of documents of known relevancy as a reference set to construct the table shown in the paper (Table 1). All used reference documents (answers) were relevant.

For those who might be interested, Dr. Baeza-Yates is the co-author of the authoritative book

Modern Information Retrieval (ACM Press Books, 1999)

This is a must-have reference and research book used in many graduate schools across the nation. I got my copy back in 2000 and it is still a great source of inspiration and research.

He also sent me a paper he wrote a few years ago on PageRank and the age of sites. I’m digesting it now. A great, interesting piece of research.


Orion

Last edited by orion : 11-13-2004 at 11:57 AM.

11-13-2004   #14
randfish
Member
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Thanks for all your research and efforts. You've helped to make all of us more aware of the engineering side of search.

I know there are many people who read these threads and feel like the information and conversation is over their heads, but these advanced concepts are more simple than we make them out to be.

Search engineers & researchers are always looking for ways to make the results more relevant. For those who'd like to take away something specific from this thread (and haven't already), I would say this:
When thinking about how search engines will consider a specific link, it's good to keep in mind that the components of WLR - the location of a link in the HTML code, the length of the anchor text and the tag that contains the link (H1, H2, etc.) - are factors to consider.
Overall, the more SEOs think about how search engines will advance the relevancy of their results, the more we can help our clients and ourselves to build websites that will continue to be valuable to users and well ranked in the SERPs.

11-14-2004   #15
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
Join Date: Jun 2004
Location: Scotland
Posts: 940
Quote:
Originally Posted by orion

Implications for Navigation Menus

This has some implications for links placed in menus. Links placed in navigation menus are usually shorter than links placed in the body of a document. Thus, in the WLR model these links should weigh less than long links placed at the beginning of the page.

In addition, links hand-coded to appear in a navigational menu positioned at the corners or bottom of a page should weigh less.
It all sounds a little perverse - navigation links are usually a very good description of the content they describe.

Also, any algo that penalises links by length is - in my opinion - a little short-sighted. Long links are not necessarily natural, and if all SEOs have to do is lengthen their anchor text, then how are "natural" links going to compete?

Short links are often proper descriptive links, and the expert documents approach of Hilltop at least recognises that, and tries to put them into a wider link pop context.

At their heart, links are not simply anchor text, but concise descriptions of other pages for navigational purposes. Any system that seeks to undermine that ethic immediately simply sets up a platform for a different level of "misuse" without addressing it in the first place.

Last edited by I, Brian : 11-14-2004 at 05:43 AM.

11-14-2004   #16
orion
Join Date: Jun 2004
Posts: 1,044
On link lengths

Hi, Brian

Before drawing conclusions, let's put link lengths in perspective.

1. Length is a relative term, as are "short" and "long".
2. The authors seem to define link lengths in terms of number of characters.

More importantly, how short or how long a link can go or weigh was not addressed in the WLR paper. So, we don't know if what is considered "long" in the paper is what some consider "short". Again, remember that this is a conference paper, not a journal paper. So not all the details of the research are supposed to be in writing.

Regarding the notion of natural flow of semantics, this flow is as good as the person coding the link and the person reading the link, so this is a subjective area. The web is full of "short" and "long" links with no semantic content or flow.

Whether a search engine (Google, MSN, Yahoo, etc.) plans to incorporate WLR, or has already done so, only time will tell. However, some simple tests should shed some light.


Orion

Last edited by orion : 11-14-2004 at 11:04 AM.

11-14-2004   #17
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
Join Date: Jun 2004
Location: Scotland
Posts: 940
The principle of the WLR weighting towards longer links would suggest that "spammy" anchor text would absolutely triumph - therefore any search engine acting on that principle would be handing SEO a "golden gun" to shoot right through natural linkage.

The length of anchor text itself cannot be used as a practical measure of relevancy - but the context of the link, the semantics of the anchor text, and the semantic relationship between the link page and the target page can be. The suggestion so far seems a rather odd concept to say the least.

Regardless of how much information is detailed in the WLR paper, the actual description so far suggests a very flawed concept indeed.

Perhaps I have profoundly misunderstood something important about the concept - it wouldn't be the first time.

11-14-2004   #18
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by I, Brian
The length of anchor text itself cannot be used as a practical measure of relevancy - but the context of the link, the semantics of the anchor text, and the semantic relationship between the link page and the target page can be. The suggestion so far seems a rather odd concept to say the least.

Regardless of how much information is detailed in the WLR paper, the actual description so far suggests a very flawed concept indeed.
Hi, Brian.

I hope this helps.

As previously explained to randfish (see posts #11 and #12), WLR is a link model, not a semantic model. The context of the link, the semantics of the anchor text, and the semantic relationship between the link page and the target page are addressed through models that account for these. That’s why semantic models such as latent semantic indexing, similarity functions, term vector models, and term occurrence and co-occurrence models, to mention a few, have been developed in support of link models.

The WLR model is an attempt at improving a link model, PageRank, not an attempt at improving the scoring of semantics.

Here is why WLR was proposed back at the WWW2004 conference:

“In all published link ranking algorithms, all links have the
same importance. However, web page developers give more
importance to some links using different HTML tags, because
some Web resources are more important than others.
Hence, a link ranking technique that gives different weights
to links may improve over uniform weight links.”

“In this work we present a variant of PageRank that gives
weights to link based on three attributes: relative position in
the page, tag where the link is contained, and length of the
anchor text. Our results show that our algorithm, WLRank,
improves over PageRank.”
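
One way to picture "a variant of PageRank that gives weights to links" is the Python sketch below. Each page passes its rank to its targets in proportion to the link weights instead of splitting it evenly. The normalization by the sum of a page's outgoing weights, the damping factor of 0.85, and the handling of pages without outlinks are all assumptions for illustration; the quoted abstract does not spell them out.

```python
def weighted_pagerank(weights, damping=0.85, iterations=50):
    """Power iteration where page j passes rank to page i in proportion
    to W(j, i) / sum of W(j, k) over all of j's outlinks.

    weights maps (j, i) -> W(j, i). The normalization and the damping
    factor are assumptions, not details taken from the paper.
    """
    pages = sorted({p for edge in weights for p in edge})
    if not pages:
        return {}
    n = len(pages)
    out_total = {p: 0.0 for p in pages}
    for (j, _i), w in weights.items():
        out_total[j] += w

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly.
        dangling = sum(rank[p] for p in pages if out_total[p] == 0)
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for (j, i), w in weights.items():
            if out_total[j] > 0:
                new_rank[i] += damping * rank[j] * w / out_total[j]
        rank = new_rank
    return rank


# Made-up graph: page "a" links to "b" with a heavier weight than to "c".
example = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "a"): 1.0, ("c", "a"): 1.0}
print(weighted_pagerank(example))
```

If every existing link is given the same weight, each page splits its rank evenly among its outlinks, which matches the uniform-weight PageRank that the authors compare against.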


Orion

Last edited by orion : 11-14-2004 at 06:48 PM. Reason: typos

11-14-2004   #19
randfish
Member
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Orion,

I don't think Brian or I would dispute that a current implementation of WLR would improve the PageRank algo. However, offering a benefit that can be easily manipulated through html code by savvy SEOs is a very dangerous gamble that will probably hurt relevancy in the long run.

I think that both Brian and myself aren't arguing that WLR should include semantic pieces, we're simply suggesting more positive and effective ways to improve relevancy.

That said, the W3C is thinking ahead - the PageRank algo needs some serious work to become a more relevant part of Google's overall strategy. But, I'm loath to concede that a short-term solution like WLRank would be of benefit without some modification.

Brian, you were critical of the length of anchor text, but remember that WLR would do an average of length on each page it checks, so you'd have to use short anchors except for the links you want to be most important - it's still got the potential to be gamed/spammed but it's a little better than I originally thought.

11-14-2004   #20
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by randfish
However, offering a benefit that can be easily manipulated through html code by savvy SEOs is a very dangerous gamble that will probably hurt relevancy in the long run.
I find this argument both very true and very weak.

This argument is true about any link model or semantic model that is publicly described on the Web. Once you describe an algorithm to the public, there will always be someone willing to game the scoring function. Isn’t this what happened with the original PageRank once described in the infamous Anatomy of…? (which, btw, many IR scientists found full of fallacies from the get-go).

I feel that arguing against WLR on the grounds that it can be gamed is a weak argument, as I can argue the same about any and every one of the current publicly described algorithms. As a matter of fact any of the following can be gamed

PageRank (any variant of it that has been publicly described)
TLA, Temporal Link Analysis
Block-Level Link Analysis
Term Counts Models (any variant)
Term Vector Models (any variant)
Local Context Analysis (any variant)


and the list goes on...

In my view, concerns about gaming a scoring function are both true and weak arguments, simply because once you tell the world how an algorithm works there will always be someone willing to take a swing at it, as we can see right now. But in the interest of fairness, I still haven’t found any IR scoring function or system that cannot be gamed.

That is precisely why most models that work so well under controlled lab conditions keep breaking on the commercial Web. I’m one of those who think that the best way to study the beast is in its natural habitat, full of noise and vested commercial interests (including seos/sems).

Quote:
Originally Posted by randfish
I think that both Brian and myself aren't arguing that WLR should include semantic pieces, …
Again, neither the original PageRank nor the current WLR is a semantic model; both are link models. To measure, analyze and score semantics you need to use the corresponding semantic models.


Orion

Last edited by orion : 11-14-2004 at 09:54 PM.