#1
Trustrank and the Link Spam Patent Application
Trustrank is one of those phrases that I've seen spring up in blogs and on forums, and there are more than a couple of threads here that discuss the topic.
I posted a blog entry this morning, In Yahoo We Trust - The Link Spam Patent Application, about a new patent application from Yahoo that uses trustrank in combination with pagerank to filter pages from results. The patent application is here: Link-based spam detection. Here's the original trustrank paper: Combating Web Spam with TrustRank. The patent application does a very nice job of showing how trustrank could be used with pagerank. Anyone think that Yahoo is presently doing this?
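For anyone who wants to see the basic idea in code, here is a rough sketch of trust propagation next to ordinary pagerank. It assumes the approach from the TrustRank paper, where the random jump lands only on hand-picked seed pages; the toy graph, the page names, and the function names are made up for illustration, and this is not how the patent application combines the two scores.

```python
def all_pages(links):
    """Every page that appears as a link source or a link target."""
    return set(links) | {t for targets in links.values() for t in targets}

def biased_pagerank(links, jump, damping=0.85, iterations=50):
    """Power iteration; `jump` says where the random surfer restarts."""
    score = dict(jump)
    for _ in range(iterations):
        nxt = {page: (1.0 - damping) * jump[page] for page in jump}
        for page, targets in links.items():
            if targets:  # pages with no outlinks pass nothing on
                share = damping * score[page] / len(targets)
                for target in targets:
                    nxt[target] += share
        score = nxt
    return score

def pagerank(links):
    """Ordinary pagerank: the surfer can restart on any page."""
    pages = all_pages(links)
    return biased_pagerank(links, {p: 1.0 / len(pages) for p in pages})

def trustrank(links, seeds):
    """Trust score: the surfer restarts only on trusted seed pages."""
    pages = all_pages(links)
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    return biased_pagerank(links, jump)

# Hypothetical toy web: two honest pages and a small link farm.
links = {
    "seed": ["good"],
    "good": ["seed"],
    "farm1": ["spam"],
    "farm2": ["spam"],
    "spam": ["farm1", "farm2"],
}
rank = pagerank(links)
trust = trustrank(links, seeds={"seed"})
```

In this toy graph the link farm earns "spam" a higher pagerank than "good", but no trust ever reaches it, because nothing reachable from the seed links to it. A filter could compare the two scores for each page.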
#2
Aside from the two from Stanford, the third author of the TrustRank paper, Jan Pedersen, is listed as being from Yahoo. I don't doubt that Yahoo goes after link spam with a vengeance; I've looked at a number of sites that were banned, apparently for linking practices, and that's just a speck of sand on a big beach.
The question is, how could Yahoo utilize PageRank in a patent or process when the PageRank patent lists Stanford as the assignee, unless it's something else or they'll be licensing the technology from Stanford?
#3
Yep. That's one of those questions that nagged at me a little. We know from the Clever Project and HITS that Google doesn't have a monopoly on using links to assign value to pages.
The patent application includes a list of "definitions" including one for pagerank: Quote:
Yahoo's Pavel Berkhin also has his name on the patent application, and his A Survey on PageRank Computing is one of the most definitive (public) sources of information about the various flavors of pagerank.
There's some overlap between the authors of the Trustrank paper and the names on the patent, and all of them appear among the four authors of the Link Spam Detection Based on Mass Estimation paper: Zoltan Gyongyi, Pavel Berkhin, Hector Garcia-Molina, and Jan Pedersen. Mass estimation is also a concept that shows up in the patent. In the footnotes on the front page of that paper, it's noted that Zoltan Gyongyi was working on this as a summer intern at Yahoo! So the three names on the patent application all belong to people who either work for Yahoo or did at one point.
It is possible to patent a method without using it. Seems like a lot of effort went into this, though.
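To make the mass estimation idea a little more concrete, here is a minimal sketch of relative spam mass as I understand it from the paper: the share of a page's pagerank that cannot be traced back to a known-good core. It is a simplification (the paper also rescales the core jump vector, which is skipped here), and the graph, page names, and function names are assumptions for illustration only.

```python
def pagerank(links, jump, damping=0.85, iterations=100):
    """Power iteration with an arbitrary (possibly unnormalized) jump vector."""
    score = dict(jump)
    for _ in range(iterations):
        nxt = {page: (1.0 - damping) * jump[page] for page in jump}
        for page, targets in links.items():
            if targets:  # pages with no outlinks pass nothing on
                share = damping * score[page] / len(targets)
                for target in targets:
                    nxt[target] += share
        score = nxt
    return score

def relative_spam_mass(links, good_core):
    """(p - p_core) / p for every page, where p_core only counts
    pagerank that originates from random jumps to good-core pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    uniform = {p: 1.0 / n for p in pages}
    # Same 1/n weight, but only on the core; everything else gets zero.
    core_only = {p: (1.0 / n if p in good_core else 0.0) for p in pages}
    p_all = pagerank(links, uniform)
    p_core = pagerank(links, core_only)
    return {p: (p_all[p] - p_core[p]) / p_all[p] for p in pages}

# Hypothetical toy web: an honest pair of pages and a link farm.
links = {
    "hub": ["news"],
    "news": ["hub"],
    "farm1": ["target"],
    "farm2": ["target"],
    "target": ["farm1", "farm2"],
}
mass = relative_spam_mass(links, good_core={"hub"})
```

Here "target" ends up with a relative mass of 1.0 because none of its pagerank comes from the good core, while "hub" and "news" score lower. A page with both a high pagerank and a relative mass near 1 would be the kind of candidate a spam filter might look at.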
#4
Bill, we can see some of the usual suspects' names showing up on different papers and patents over the years that carry a flavor with them, and it's kind of interesting to watch which areas they seem to have an interest in, what they collaborate on, and who with.
One thing about this particular patent app is that it seems to be borrowing concepts from several different sources, including the one on Re-ranking/LocalRank. A noticeable element running through this one is the emphasis on human judgment and editorial involvement, which, needless to say, is not at all unexpected for Yahoo, unlike Google's approach and philosophies.
#5
It is interesting to see those flavors associated with different authors. You can sort of tell if something was written by Junghoo Cho, or Andrei Broder, or Monika Henzinger or Steven Lawrence or Jon Kleinberg without looking at the names on a patent or paper.
I can see some of the ideas from Krishna Bharat's reranking patent in this, and elements from a few others, too. Yahoo doesn't get too far away from their roots as a directory, with pages chosen by people, do they?
#6
Quote:
#7
Really good question.
I'm not sure that they would get any boost, but by their nature, and the mechanism involved, they should be less likely to be harmed or filtered out as spam pages, because they set the standard that other pages are measured against when it comes to trust. As an analogy, in a system like Topic-Sensitive PageRank, a site like the Open Directory is described as being used as a starting point to develop some topics, but it isn't likely that those Directory pages would rank higher just because they were used to come up with topics.
Would there be some mechanism in place to let someone know that a seed site had changed in some manner? Would they be revisited, and would others replace them? The patent and paper both describe an inverse pagerank method to identify pages that have lots of outlinks, which are then reviewed for quality by an editor. Changes in that inverse pagerank score could trigger a new review by an editor. Other changes could be tracked, and trigger a new review.
In the paper, there were some interesting filters that they applied to their list of sites that had high inverse pagerank scores to find seed sites. They eliminated sites that were not listed in any of the major web directories, like DMOZ, and they further filtered those by trying to choose sites whose contents were heavily controlled by some authority: Quote:
Quote:
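The inverse pagerank step mentioned in this post can be sketched roughly as follows: reverse every link, run pagerank on the reversed graph so that pages with lots of outlinks score well, then keep only candidates that are listed in a web directory. The editor review is a human step and is not modeled. The graph, the `listed_in_directory` set, and the function names are hypothetical.

```python
def reverse_graph(links):
    """Flip every link so a page's outlinks become its inlinks."""
    reversed_links = {}
    for page, targets in links.items():
        reversed_links.setdefault(page, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(page)
    return reversed_links

def inverse_pagerank(links, damping=0.85, iterations=50):
    """Uniform pagerank computed on the reversed link graph."""
    rev = reverse_graph(links)
    n = len(rev)
    score = {page: 1.0 / n for page in rev}
    for _ in range(iterations):
        nxt = {page: (1.0 - damping) / n for page in rev}
        for page, sources in rev.items():
            if sources:  # score flows back toward the pages that link here
                share = damping * score[page] / len(sources)
                for source in sources:
                    nxt[source] += share
        score = nxt
    return score

def seed_candidates(links, listed_in_directory, top_k=2):
    """Highest inverse-pagerank pages that also appear in a directory."""
    scores = inverse_pagerank(links)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [page for page in ranked if page in listed_in_directory][:top_k]

# Hypothetical toy web: one page that links out to everything else.
links = {"directory": ["a", "b", "c"], "a": ["b"], "b": [], "c": []}
scores = inverse_pagerank(links)
candidates = seed_candidates(links, listed_in_directory={"directory", "b"}, top_k=1)
```

The page with the most outlinks comes out on top, which fits the idea that good seeds are pages from which trust can spread widely. Re-running the score after a crawl and comparing it to the previous value would be one way to notice when a seed needs a fresh review.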