Search Engine Watch
Old 11-10-2004   #21
orion
 
Join Date: Jun 2004
Posts: 1,044

Hi, randfish

Quote:
Originally Posted by randfish
This would also suggest to me that the "Google prefers older sites" concept is directly at odds with TLA. Orion, Mike, Rusty, et al - hopefully you can tell me what I'm missing.
The missing part is that Google does not seem to be implementing TLA at all, for the reasons given above; otherwise there would be no contradiction.

I hope this helps.

Orion

Last edited by orion : 11-10-2004 at 09:44 PM.
Old 11-16-2004   #22
spocksbeard
 
Posts: n/a
Two further papers on TLA

There are two further papers related to temporal issues in ranking:

Appeared in WWW2004 (freely accessible)
http://www.www2004.org/proceedings/docs/2p448.pdf

Appeared in WAW2004 (not freely accessible)
http://springerlink.metapress.com/op...3243&spage=131

Moreover, Microsoft has integrated a 'temporal parameter' into its new search beta, so you can specify whether recently updated pages are favored:
http://beta.search.msn.com
Old 11-16-2004   #23
orion
 
TimedPageRank

Welcome to this thread, spocksbeard. Please feel at home.

Excellent findings.

Yes, the first quoted link, On the Temporal Dimension of Search, is one of several papers presented at WWW2004 that propose improvements to PageRank. This one introduces TimedPageRank.

The authors of the paper state:

"There are many factors that contribute to the potential importance of a paper, such as the citations it has received, the date of these citations, its authors, and the publication journal. PageRank only includes the first factor, the citations that a paper receives. To integrate the time dimension, we add timing factors in the PageRank and propose the TimedPageRank algorithm."

As in TLA, this paper shows the benefits of incorporating time into link models. The authors reach the same finding as the TLA model: newer sites receive more citations than older sites. This behavior is precisely the opposite of literature citation (another reason why the link-literature citation analogy cannot be sustained).

This has an important implication for authorities: we now need to think in terms of timely authorities. If the TimedPageRank model agrees with the TLA model, then newer authority sites should score higher when time is included than when no time dimension is assigned to them.

On Linear Decay

One finding of this paper that I would need to retest carefully is the notion that the Aging(A) factor decays linearly. Before accepting or rejecting the idea of linear decay in the aging factor, I am inclined to conduct more experiments, since there are many variables to factor in.
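
To make the linear-decay question concrete, here is a minimal Python sketch of what a linearly decaying aging factor could look like next to an exponential alternative. Everything in it is an assumption for illustration: the function names, the one-year horizon, the half-life, and the idea of multiplying a base score by the aging factor are mine, not the exact formulation of the paper.

```python
def linear_aging(age_days, horizon_days=365.0, floor=0.0):
    """Hypothetical linear decay: 1.0 at age 0, falling on a straight
    line to `floor` at `horizon_days`, and staying at `floor` afterwards."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    fraction = min(age_days / horizon_days, 1.0)
    return 1.0 - (1.0 - floor) * fraction


def exponential_aging(age_days, half_life_days=180.0):
    """Hypothetical alternative for comparison: the factor halves
    every `half_life_days` and never reaches zero."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def timed_score(base_score, age_days, aging=linear_aging):
    """Assumed combination: scale a time-independent score by the aging factor."""
    return base_score * aging(age_days)


if __name__ == "__main__":
    for age in (0, 90, 180, 365, 730):
        print(age, round(linear_aging(age), 3), round(exponential_aging(age), 3))
```

Comparing the two columns for a real data set is one simple way to test whether a straight line is a reasonable fit or whether the decay flattens out with age.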


Orion

Last edited by orion : 12-16-2004 at 09:16 PM. Reason: refining some lines
Old 12-12-2004   #24
orion
 
Second Paper on TLA

Here is the second big paper on TLA, titled Trend Detection Through Temporal Link Analysis. The paper presents many improvements and new ideas not discussed in the previous work. In this post I present only some portions, with minimal comment. We can discuss more in future posts. Enjoy it!

LINK CITATION vs. LITERATURE CITATION

First, note that the graphs are no longer referred to as DIP curves. Some aspects of TLA are elegantly presented in a more precise fashion. In general, the paper shows that the link citation-literature citation analogy is unsustainable, as the behaviors of these two citation schemes are completely dissimilar. The authors write:

“Consequently, Web links exhibit a behavior that is the opposite of that found in scientific citations: the more time passes the more citations (links) a page receives”.

TIMESTAMPED LINKS AND THEMES

The basic unit of temporal data that drives the applications proposed in the following section is the timestamped link. A timestamped link to a page p is an ordered pair (u,t), where u is an URL of a page that was last modified at time t and that links to p. We will usually be interested in timestamped links to a theme, where a theme will be identified as a set of pages. Formally, if P is a set of pages on a certain theme or topic, a timestamped link to P is an ordered pair (u,t) where u is a URL of a page, last updated at time t, which links to a page in P.
As Web users, we know that

What Makes This Timestamp So Unique?

In my view, the gist of the model is an attempt at modeling/understanding Leaders, Followers, and Passersby and their behavior in time.

The Timestamped Link Profile

The researchers introduce the TLP of a theme as the “normalized projection of that theme’s timestamped links onto the time axis.” This is a histogram of the ages of the timestamped links, normalized so that the sum of all counts equals 1. TLPs are assembled as follows:

“… a TLP for a theme is assembled by submitting a query (or several queries) describing the theme to a search engine (or several engines). We then take the union of the top-n results returned by the engine(s) for the query(ies) and denote the resulting set of pages by P. Next, we ask for pages that link to each of the pages in P, taking the “last modified” value of those pages. Let Q denote the pages linking to the pages of P, and let L denote the multiset of timestamped links from the pages of Q to the pages of P (L is a multiset because each page in Q may link to multiple pages of P). We also decide on a date range of interest and on the number of (equal-sized) intervals by which the date range should be partitioned. The TLP is then plotted by associating each timestamped link of the multiset L with the interval that contains the last update time of the page containing L.”
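
The binning step described in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the half-open date range [start, end), and the example URLs are hypothetical, and the search-engine queries that would produce the multiset L are left out entirely.

```python
from datetime import date, timedelta


def timestamped_link_profile(links, start, end, n_intervals):
    """Build a normalized histogram (a TLP) from timestamped links.

    links: iterable of (url, last_modified) pairs, one per link into the
           theme P; the same url may appear several times (L is a multiset).
    start, end: date range of interest, treated here as [start, end).
    n_intervals: number of equal-sized intervals partitioning the range.
    Returns a list of fractions that sums to 1, or all zeros when no
    link falls inside the range.
    """
    span_days = (end - start).days
    if span_days <= 0 or n_intervals <= 0:
        raise ValueError("need a positive date range and interval count")
    counts = [0] * n_intervals
    for _url, stamp in links:
        offset = (stamp - start).days
        if offset < 0 or offset >= span_days:
            continue  # timestamp outside the range of interest
        counts[offset * n_intervals // span_days] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_intervals
    return [count / total for count in counts]


if __name__ == "__main__":
    start = date(2004, 1, 1)
    links = [
        ("http://a.example/", date(2004, 1, 15)),
        ("http://b.example/", date(2004, 1, 20)),
        ("http://b.example/", date(2004, 1, 20)),  # b links to two pages of P
        ("http://c.example/", date(2004, 3, 5)),
        ("http://d.example/", date(2003, 12, 1)),  # before the range, ignored
    ]
    print(timestamped_link_profile(links, start, start + timedelta(days=120), 4))
```

A spike in one interval relative to the others is the kind of abnormality the authors go on to relate to real-life events.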

Abnormal TLPs and Their Relation to Real-Life Events

In this section, they show how the TLP for a given theme can be used to discover and analyze abnormal changes in the activities of virtual communities. This has important implications for intelligence monitoring, both backward and forward in time.

The effect of special events on link activity is well demonstrated in the sections:

1. Comparing a Theme to Itself in Two Different Points in Time

2. Tracing Themes Over Time

These sections raise some questions for me. Was link analysis performed by others before or near the tragic events of 9/11? Did they “see” the same red flag in the sky that we can now see from the graphs? Where were the internet intelligence folks back then?

From Authorities to Timely Authorities

In this section they disclose the weighting scheme and how it affects the ranking of timely authorities:

• If (d < 0 or d > 1 year): bonus = 0
• If (d ≥ 0 and d ≤ 1 week): bonus = 1.5
• If (d > 1 week and d ≤ 1 month): bonus = 1.0
• If (d > 1 month and d ≤ 6 months): bonus = 0.5
• If (d > 6 months and d ≤ 1 year): bonus = 0.25
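
As a rough Python sketch, the bonus table can be written as a step function. I am assuming here that d is the age of the link in days and that the ranges are read as: negative or older than one year gets 0, up to one week 1.5, up to one month 1.0, up to six months 0.5, up to one year 0.25. The day counts for "month", "six months" and "year" and the function name are also my own assumptions.

```python
# Assumed day counts for the range boundaries (not given in the text).
WEEK = 7
MONTH = 30
SIX_MONTHS = 182
YEAR = 365


def freshness_bonus(d):
    """Return the time bonus for a link whose page was updated d days ago."""
    if d < 0 or d > YEAR:
        return 0.0   # timestamp in the future, or older than a year
    if d <= WEEK:
        return 1.5
    if d <= MONTH:
        return 1.0
    if d <= SIX_MONTHS:
        return 0.5
    return 0.25      # between six months and one year


if __name__ == "__main__":
    for age in (0, 7, 8, 30, 31, 182, 183, 365, 366):
        print(age, freshness_bonus(age))
```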


POSSIBLE APPLICATIONS

"The strength of the simple TLP approach is that anyone can track his or her temporal reputation using this method without requiring heavyweight processing, logging, or paying a search engine to reveal users’ behavior. For example, a business (with a Web site) that has launched an ad campaign can track the level of new timestamped links daily and determine the effectiveness of its advertisement. When combined with other sources of information such as traffic and query log, TLPs offer a more focused view on the community of content makers and information providers on the Web."

This could be used as another source of revenue ($$$) for search engine marketers. It would be interesting to know how the IBM patent fits into the picture.

Other applications I can see for TLA and new frontiers:

1. Seasonal and cyclical behaviors can be examined with time delay techniques.
2. Nonlinear behaviors can be examined with Poincaré maps and chaos theory.
3. Temporal self-similarity and power law behaviors can be examined through standard fractal geometry and scaling concepts.


Orion

Last edited by orion : 12-16-2004 at 09:00 PM. Reason: typos; off-topic lines
Old 12-13-2004   #25
orion
 
Timely Authorities

FROM AUTHORITIES TO TIMELY AUTHORITIES

In this section of the TLA paper the authors write:

“Having assembled the subgraph to be analyzed, we proceed to assign weights to its hyperlinks. The rankings of many link-analyzing algorithms can be biased toward favorable pages by assigning different weights to links in the collection.”

“In query-specific searches, links referring to pages with matching textual content, or links with anchor text that appear in the query, are often awarded high weights (Bharat & Henzinger, 1998; Chakrabarti et al., 1998). Among the algorithms that are affected by such techniques are HITS (Kleinberg, 1999) and SALSA (Lempel & Moran, 2000).”

They build on these and previous frameworks to produce a new one, essentially by adding the time dimension. This is the only difference between the two frameworks. What is surprising is that this simple difference has non-trivial consequences for the ranking of authority sites. The concept of an authority site is no longer based on linkage alone.

“Specifically, we assign weights to links based on two parameters:

1. Following Chakrabarti et al. (1998), links are weighted according to the similarity between the anchor text which is associated with them and the query.
2. To add the time dimension to the analysis, we further alter the weights associated with the links by adding a bonus as follows. For each link, let d denote the time difference (in days) between the current date and the timestamp associated with the link. In our implementation we adjusted the weights as follows:

• If (d < 0 or d > 1 year): bonus = 0
• If (d ≥ 0 and d ≤ 1 week): bonus = 1.5
• If (d > 1 week and d ≤ 1 month): bonus = 1.0
• If (d > 1 month and d ≤ 6 months): bonus = 0.5
• If (d > 6 months and d ≤ 1 year): bonus = 0.25

Basically, links from fresh pages (pages that were updated recently) were assigned higher weights than links emanating from stale pages. This is the only algorithmic difference between the calculation of “basic” authorities and the calculation of “timely” authorities. We then analyzed the link structure of the subgraph S in the manner described by Aridor et al. (2000), assigning each candidate page with a hub score and an authority score. These scores are computed by summing the scores produced by the HITS and SALSA algorithms.”
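
To see how link weights can shift authority scores, here is a small Python sketch of a weighted HITS iteration. Note the simplification: the authors sum HITS and SALSA scores, while this sketch implements only a weighted HITS and is not their implementation. The function names, the toy graph and the base weight of 1.0 per link are hypothetical.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm == 0:
        return dict(scores)
    return {node: value / norm for node, value in scores.items()}


def weighted_hits(edges, iterations=20):
    """Run HITS on a weighted link graph.

    edges: dict mapping (source, target) -> non-negative link weight.
    Returns (hub_scores, authority_scores) as dicts keyed by node.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of the hubs pointing at each page.
        new_auth = {node: 0.0 for node in nodes}
        for (src, dst), weight in edges.items():
            new_auth[dst] += weight * hub[src]
        # Hub update: weighted sum of the authorities each page points at.
        new_hub = {node: 0.0 for node in nodes}
        for (src, dst), weight in edges.items():
            new_hub[src] += weight * new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    # Same graph twice: one stale hub links to "old", one fresh hub to "new".
    basic = {("h1", "old"): 1.0, ("h2", "new"): 1.0}
    timely = {("h1", "old"): 1.0 + 0.0, ("h2", "new"): 1.0 + 1.5}
    print(weighted_hits(basic)[1])
    print(weighted_hits(timely)[1])
```

With equal weights the two candidate pages tie; once the fresh link carries the 1.5 bonus, the page it points to takes the higher authority score. That is the "basic" versus "timely" contrast in miniature.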

“In what follows, we present experiments of our modification for two queries. In each experiment, two lists of 20 authorities are shown. The list on the left was produced without considering temporal data, while the list on the right took into account the temporal data. Furthermore, the parentheses next to every URL on the right list indicate the rank of that URL in the left column (or “new” when the URL did not appear in the left column). Table 1 summarizes the differences between the two ranked lists in every experiment conducted in 2001 by listing the number of authorities that hold top-n positions in both lists.”


No further explanation is needed. Tables 2 and 3 of the second TLA paper compare authorities and TLA-based authorities. Similar tables and discussion are found in the first TLA paper.

Adding the time dimension to previous and current link models does affect the notion of ranking results, their meaning, and how we need to look at link-based data, community activities, seasonal trends and events in time. They all affect the notion of link importance and web behaviors. What great days we are living in!


Orion


References

Kleinberg, J.M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

Kleinberg, J.M. (2002, July 23–26). Bursty and hierarchical structure in streams. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Finding Authorities and Hubs From Link Structures on the World Wide Web

Last edited by orion : 12-16-2004 at 08:45 PM. Reason: typos; refining lines
Old 12-15-2004   #26
orion
 
TLA And Previous Known Art

PREVIOUS STUDIES

What makes the TLA model different from previous “known art”? Let’s address this issue.

According to this seminar presentation on TLA:

“There have been numerous attempts to make use of time to predict trends on the Web. However all of those studies emphasized the detection of the change itself and not the temporal nature of the data studied.”

That is, previous studies of link variation over time were limited to monitoring changes. Their models were not embedded into IR systems as part of the processes used to weight text, rank documents, or actually retrieve information.

“None of these studies looked into how to incorporate time into the processes that are currently used for ranking web pages, computing link-based measures of site popularity, and link analysis in general. In fact, to the best of our knowledge, the Web Information Retrieval community has never proposed such a temporal approach,” the researchers state.


IMPORTANT CLAIMS

It is clear that TLA is more than a single claim, mere link counting, or monitoring changes in time. The model can be embedded into IR processes that actually assign weights, rank documents, and retrieve information. How is this done?

Let’s revisit some of the 32 claims listed in the TLA patent:

Claim 20. A method for temporally ranking a collection of linked entities, the method comprising: for each link activity record related to a link, assigning a weight to said link according to a temporal criterion applied to said link activity record; performing said assigning step for at least one link to each of a plurality of linked entities; and ranking said linked entities and associated links using said weights.

Claim 21. A method according to claim 20 wherein said assigning step comprises assigning more weight to any of said links having either of more link activity records and more recent link activity records than to any of said links having either of fewer link activity records and fewer recent link activity records.

Claim 30. A system for temporally ranking a collection of linked entities, the system comprising: means for assigning a weight to a link for each link activity record related to said link according to a temporal criterion applied to said link activity record; means for performing said assigning step for at least one link to each of a plurality of linked entities; and means for ranking said linked entities and associated links using said weights.

Claim 31. A system according to claim 30 wherein said means for assigning is operative to assign more weight to any of said links having either of more link activity records and more recent link activity records than to any of said links having either of fewer link activity records and fewer recent link activity records.
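
One way to read Claims 20 and 21 in code is sketched below in Python. This is my own interpretation and not the patented method: the recency function, its 30-day scale, the data layout and all names are hypothetical. Each link activity record contributes a weight that shrinks with age, so a link with more records or more recent records ends up weighing more, and entities are ranked by the total weight of their incoming links.

```python
from collections import defaultdict


def record_weight(age_days, scale_days=30.0):
    """Hypothetical temporal criterion: newer activity records count more."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 1.0 / (1.0 + age_days / scale_days)


def rank_entities(link_records):
    """Rank linked entities by the summed temporal weight of their in-links.

    link_records: dict mapping (source, target) -> list of record ages in
                  days, one entry per link activity record for that link.
    Returns a list of (target, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for (_source, target), ages in link_records.items():
        # More records and more recent records both raise the link weight.
        scores[target] += sum(record_weight(age) for age in ages)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    records = {
        ("a", "x"): [400, 500],  # two old activity records
        ("b", "y"): [3],         # one very recent record
        ("c", "y"): [10, 20],    # two fairly recent records
    }
    for entity, score in rank_entities(records):
        print(entity, round(score, 3))
```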


Orion

Last edited by orion : 12-15-2004 at 10:41 PM.