Search Engine Watch
Old 05-04-2006   #1
bragadocchio
Seeking Enlightenment
 
Join Date: Aug 2004
Location: Warrenton, Virginia
Posts: 57
Trustrank and the Link Spam Patent Application

Trustrank is one of those phrases that I've seen spring up in blogs and on forums, and there are more than a couple of threads here that discuss the topic.

I posted a blog entry this morning, In Yahoo We Trust - The Link Spam Patent Application, about a new patent application from Yahoo which uses trustrank in combination with pagerank to filter pages from results. The patent application is here:

Link-based spam detection

And here's the original trustrank paper:

Combating Web Spam with TrustRank

The patent application does a very nice job of showing how trustrank could be used with pagerank. Anyone think that Yahoo is presently doing this?
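
A minimal sketch of how the two scores relate, assuming the usual description of trustrank as pagerank whose random jump lands only on a hand-reviewed seed set. The graph, the seed set, the damping value, and the function name are made up for illustration and are not taken from the patent application.

```python
def biased_pagerank(links, jump, damping=0.85, iterations=50):
    """Power iteration; `jump` is where the random surfer lands on a restart."""
    score = dict(jump)
    for _ in range(iterations):
        nxt = {page: (1 - damping) * jump[page] for page in links}
        for page, outlinks in links.items():
            for target in outlinks:
                nxt[target] += damping * score[page] / len(outlinks)
        score = nxt
    return score


# Made-up graph: three reputable sites plus a small link farm.
# Every page here has at least one outlink, so dangling pages are not handled.
links = {
    "gov": ["edu", "news"],
    "edu": ["gov", "news"],
    "news": ["gov"],
    "spam1": ["spamhub"],
    "spam2": ["spamhub"],
    "spamhub": ["spam1", "spam2"],
}
pages = list(links)
seeds = {"gov"}  # hypothetical seed set chosen by a human reviewer

uniform = {p: 1 / len(pages) for p in pages}
seeded = {p: (1 / len(seeds) if p in seeds else 0.0) for p in pages}

pagerank = biased_pagerank(links, uniform)   # jump can land anywhere
trustrank = biased_pagerank(links, seeded)   # jump lands only on seeds

for p in pages:
    print(f"{p:8s} pagerank={pagerank[p]:.3f} trustrank={trustrank[p]:.3f}")
```

In this toy graph the link farm earns pagerank from its own internal links, but no trust reaches it because nothing reachable from the seed links to it.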
Old 05-04-2006   #2
Marcia
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Aside from the two from Stanford, the third author of the TrustRank paper, Jan Pedersen, is listed as being from Yahoo. I don't doubt that Yahoo goes after link spam with a vengeance; I've looked at a number of sites that were banned and it was apparently for linking practices - and that's just a speck of sand on a big beach.

The question is, how could Yahoo utilize PageRank in a patent or process when it's patented with Stanford as the assignee - unless it's something else, or they'll be licensing the technology from Stanford?
Old 05-04-2006   #3
bragadocchio
Seeking Enlightenment
 
Join Date: Aug 2004
Location: Warrenton, Virginia
Posts: 57
Yep. That's one of those questions that nagged at me a little. We know from the Clever Project and HITS that Google doesn't have a monopoly on using links to assign value to pages.

The patent application includes a list of "definitions" including one for pagerank:

Quote:
PageRank is a family of well known algorithms for assigning numerical weights to hyperlinked documents (or web pages or web sites) indexed by a search engine. PageRank uses link information to assign global importance scores to documents on the web. The PageRank process has been patented and is described in U.S. Pat. No. 6,285,999. The PageRank of a document is a measure of the link-based popularity of a document on the Web.
First time that I've heard pagerank referred to as a "family," but it's not difficult to believe that Yahoo! uses some measure of link popularity based upon hyperlinks that might be a cousin to pagerank.

Yahoo's Pavel Berkhin also has his name on the patent application, and his A Survey on PageRank Computing is one of the most definitive (public) sources of information about the various flavors of pagerank.

There's some overlap between the authors of the Trustrank paper and the patent application. All of them appear among the four authors of the Link Spam Detection Based on Mass Estimation paper: Zoltan Gyongyi, Pavel Berkhin, Hector Garcia-Molina, and Jan Pedersen. Mass estimation is also a concept that shows up in the patent. In the footnotes on the front page of that paper, it's noted that Zoltan Gyongyi was working on this as a summer intern at Yahoo! So the three names on the patent application all belong to people who either work for Yahoo or did at one point.

It is possible to patent a method without using it. Seems like a lot of effort went into this though.
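
The mass estimation idea mentioned above can be sketched roughly like this, assuming the common reading of that paper: compute ordinary pagerank, compute a second pagerank whose random jump lands only on a known-good core, and treat the gap between the two as the share of a page's score that comes from unvetted sources. The graph, the core, and the 0.9 threshold are made up for illustration; this is not the procedure claimed in the patent application.

```python
def pagerank(links, jump, damping=0.85, iterations=50):
    """Plain power iteration with a caller-supplied random-jump vector."""
    score = dict(jump)
    for _ in range(iterations):
        nxt = {page: (1 - damping) * jump[page] for page in links}
        for page, outlinks in links.items():
            for target in outlinks:
                nxt[target] += damping * score[page] / len(outlinks)
        score = nxt
    return score


def relative_spam_mass(links, good_core):
    """Fraction of each page's pagerank that does not trace back to the core."""
    n = len(links)
    uniform = {p: 1.0 / n for p in links}
    # Same per-page jump weight as above, but only for core pages, so the
    # core-based score can never exceed the ordinary score.
    core_only = {p: (1.0 / n if p in good_core else 0.0) for p in links}
    p_all = pagerank(links, uniform)
    p_core = pagerank(links, core_only)
    return {p: (p_all[p] - p_core[p]) / p_all[p] for p in links}


# Made-up graph: three reputable sites plus a small link farm.
links = {
    "gov": ["edu", "news"],
    "edu": ["gov", "news"],
    "news": ["gov"],
    "spam1": ["spamhub"],
    "spam2": ["spamhub"],
    "spamhub": ["spam1", "spam2"],
}
good_core = {"gov", "edu"}  # hypothetical hand-reviewed good pages

mass = relative_spam_mass(links, good_core)
THRESHOLD = 0.9  # arbitrary cut-off chosen for this example
suspects = sorted(p for p, m in mass.items() if m > THRESHOLD)
print(suspects)
```

A real system would presumably also look only at pages with high pagerank, since a high relative mass on an obscure page means very little.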
Old 05-05-2006   #4
Marcia
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Bill, we can see some of the usual suspects' names showing up on different papers and patents over the years that carry a flavor with them, and it's kind of interesting to watch which areas they seem to have an interest in, what they collaborate on, and who with.

One thing about this particular patent app is that it seems to be borrowing concepts from several different sources, including the one on Re-ranking/LocalRank.

A noticeable element running through this one is the emphasis on human judgment and editorial involvement, which needless to say - unlike Google's approach and philosophies - is not at all unexpected for Yahoo.
Old 05-06-2006   #5
bragadocchio
Seeking Enlightenment
 
Join Date: Aug 2004
Location: Warrenton, Virginia
Posts: 57
It is interesting to see those flavors associated with different authors. You can sort of tell if something was written by Junghoo Cho, or Andrei Broder, or Monika Henzinger or Steven Lawrence or Jon Kleinberg without looking at the names on a patent or paper.

I can see some of the ideas from Krishna Bharat's reranking patent in this, and elements from a few others, too.

Yahoo doesn't get too far away from their roots as a directory, with pages chosen by people, do they?
Old 05-07-2006   #6
Marcia
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Quote:
Yahoo doesn't get too far away from their roots as a directory, with pages chosen by people, do they?
Which raises a $64K question: Do/would the sites hand chosen to be in the seed set also get a human-generated boost in the SERPs?
Old 05-07-2006   #7
bragadocchio
Seeking Enlightenment
 
Join Date: Aug 2004
Location: Warrenton, Virginia
Posts: 57
Really good question.

I'm not sure that they would get any boost, but by their nature, and the mechanism involved, they should be less likely to be harmed or filtered out as spam pages because they are setting a standard for the other pages to be measured by when it comes to trust.

As an analogy, in a system like Topic Sensitive Page Rank, a site like the Open Directory is described as being used as a starting point to develop some topics, but it isn't likely that those Directory pages would rank higher just because they were used to come up with topics.

Would there be some mechanism in place to let someone know that a seed site had changed in some manner? Would they be revisited in some manner, and would others replace them? The patent and paper both describe an inverse pagerank method to identify pages that have lots of outlinks, which are then reviewed for quality by an editor. Changes in that inverse pagerank score could trigger a new review by an editor. Other changes could be tracked, and trigger a new review.
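
A rough sketch of the inverse pagerank step, assuming it simply means running pagerank on the graph with every link reversed, so that pages whose outlinks reach a lot of the web float to the top. The graph and the function names are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power iteration with a uniform random jump."""
    n = len(links)
    score = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        nxt = {page: (1 - damping) / n for page in links}
        for page, outlinks in links.items():
            for target in outlinks:
                nxt[target] += damping * score[page] / len(outlinks)
        score = nxt
    return score


def reverse_links(links):
    """Flip every edge: if p links to q, then q 'links' to p."""
    flipped = {page: [] for page in links}
    for page, outlinks in links.items():
        for target in outlinks:
            flipped[target].append(page)
    return flipped


# Made-up graph: one hub page with many outlinks and a few ordinary pages.
links = {
    "directory": ["a", "b", "c", "d"],
    "a": ["b"],
    "b": [],
    "c": ["a"],
    "d": [],
}

inverse_pr = pagerank(reverse_links(links))
# Highest inverse pagerank first: these are the pages an editor would look at.
review_queue = sorted(links, key=lambda p: inverse_pr[p], reverse=True)
print(review_queue)
```

Storing the score from the last review next to each seed would make it easy to flag a seed for re-review when its inverse pagerank moves by more than some chosen amount.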

In the paper, there were some interesting filters that they applied to their list of sites that had high inverse pagerank scores to find seed sites. They eliminated sites that were not listed in any of the major web directories, like the DMOZ, and they further filtered those by choosing sites whose contents were controlled by a clearly identifiable authority:

Quote:
...we only selected sites with a clearly identifiable authority (such as a governmental or educational institution or company) that controlled the contents of the site. The extra filter was added to guarantee the longevity of the good seed set, since the presence of physical authorities decreases the chance that the sites would degrade in the short run.
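
The two filters described above could be sketched like this. The record layout, the site names, and the scores are all hypothetical; the paper describes the criteria, not a data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    site: str
    inverse_pagerank: float
    in_directory: bool        # listed in a major web directory
    authority: Optional[str]  # e.g. "government", "education", "company"


def pick_seed_candidates(candidates, limit):
    """Keep directory-listed sites with an identifiable controlling authority."""
    kept = [c for c in candidates if c.in_directory and c.authority is not None]
    kept.sort(key=lambda c: c.inverse_pagerank, reverse=True)
    return [c.site for c in kept[:limit]]


# Made-up candidates with made-up scores.
candidates = [
    Candidate("linkfarm.example", 0.30, False, None),
    Candidate("hobby-links.example", 0.15, True, None),
    Candidate("example.gov", 0.12, True, "government"),
    Candidate("example.edu", 0.09, True, "education"),
]

seeds = pick_seed_candidates(candidates, limit=10)
print(seeds)
```

Note that the site with the highest inverse pagerank is dropped here: a high score only earns a place in the review queue, not in the seed set.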
The conclusion also considers an approach where seed sites could be reviewed and considered on an ongoing basis:

Quote:
We believe that there are still a number of interesting experiments that need to be carried out. For instance, it would be desirable to further explore the interplay between dampening and splitting for trust propagation. In addition, there are a number of ways to refine our methods. For example, instead of selecting the entire seed set at once, one could think of an iterative process: after the oracle has evaluated some pages, we could reconsider what pages it should evaluate next, based on the previous outcome.
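
The paper leaves that iterative process open, so what follows is only one possible reading, not the authors' method. It assumes that once a page is approved as a seed, everything reachable from it counts as covered, and that the next page sent to the oracle is the unreviewed page reaching the most uncovered pages. The graph, the oracle, and the function names are made up for illustration.

```python
def reachable(links, start):
    """All pages reachable from `start` by following links, including itself."""
    seen, stack = {start}, [start]
    while stack:
        for target in links[stack.pop()]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def pick_seeds_iteratively(links, oracle, budget):
    """Ask the oracle about one page per round, re-ranking after each verdict."""
    covered, reviewed, seeds = set(), set(), []
    for _ in range(budget):
        candidates = [p for p in links if p not in reviewed]
        if not candidates:
            break
        # Prefer the page that would newly cover the most pages; break ties by name.
        best = max(candidates, key=lambda p: (len(reachable(links, p) - covered), p))
        reviewed.add(best)
        if oracle(best):
            seeds.append(best)
            covered |= reachable(links, best)
    return seeds


# Made-up graph: two reputable hubs and one link farm with the most outlinks.
links = {
    "portal": ["a", "b"], "a": ["b"], "b": [],
    "farm": ["x", "y", "z"], "x": [], "y": [], "z": [],
    "uni": ["c"], "c": [],
}
bad_pages = {"farm", "x", "y", "z"}


def oracle(page):
    """Stand-in for the human editor's verdict."""
    return page not in bad_pages


seeds = pick_seeds_iteratively(links, oracle, budget=3)
print(seeds)
```

The first review is spent on the link farm, which the oracle rejects; the remaining two reviews go to the two reputable hubs rather than to pages they already cover.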