#21
Hi, randfish
Quote:
I hope this helps.
Orion
Last edited by orion : 11-10-2004 at 09:44 PM.
#22
Two further papers on TLA
There are two further papers related to temporal issues in ranking:
Appeared in WWW2004 (freely accessible): http://www.www2004.org/proceedings/docs/2p448.pdf
Appeared in WAW2004 (not freely accessible): http://springerlink.metapress.com/op...3243&spage=131
Moreover, Microsoft has integrated a 'temporal parameter' into its new search beta (http://beta.search.msn.com), so you can specify whether recently updated pages are favored.
#23
Welcome to this thread, spocksbeard. Please feel at home.
Excellent findings. Yes, the first linked paper, On the Temporal Dimension of Search, is one of several papers presented at WWW2004 that propose improvements to PageRank. This one proposes TimedPageRank. The authors of the paper state: "There are many factors that contribute to the potential importance of a paper, such as the citations it has received, the date of these citations, its authors, and the publication journal. PageRank only includes the first factor, the citations that a paper receives. To integrate the time dimension, we add timing factors in the PageRank and propose the TimedPageRank algorithm."
As in TLA, this paper shows the benefits of incorporating time into link models. The authors report the same finding as the TLA model; that is, newer sites receive more citations than older sites. This behavior is precisely the opposite of literature citation (another reason why the link-literature citation analogy cannot be sustained). This has important implications for authorities, as we now need to think in terms of timely authorities. If the TimedPageRank model agrees with the TLA model, then newer authority sites should score higher when time is included than when no time dimension is assigned to them.
On Linear Decay
One finding of this report that I would need to retest carefully is the notion that the Aging(A) factor decays linearly. Before accepting or rejecting the idea of a linear decay in the aging factor, I'm inclined to conduct more experiments, as there are too many variables to factor in.
Orion
Last edited by orion : 12-16-2004 at 09:16 PM. Reason: refining some lines
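To make the linear-decay question concrete, here is a minimal sketch of a PageRank-style iteration in which each link's vote is scaled by a weight that shrinks with the link's age. This is NOT the TimedPageRank formula from the paper: the two decay curves (linear over a one-year horizon, exponential with a 180-day half-life), the function names, and the choice to spread "lost" rank uniformly are all hypothetical assumptions, chosen so the two decay shapes can be compared side by side.

```python
# Hypothetical sketch: PageRank where each link's vote is scaled by an
# age-based weight in [0, 1]. Not the paper's TimedPageRank formula.

def linear_decay(age_days, horizon_days=365.0):
    """Assumed linear aging: weight falls from 1 to 0 over horizon_days."""
    return max(0.0, 1.0 - age_days / horizon_days)


def exponential_decay(age_days, half_life_days=180.0):
    """Assumed exponential aging: weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def time_weighted_pagerank(links, decay, damping=0.85, iterations=100):
    """links: list of (source, target, age_days); decay must return 0..1."""
    nodes = sorted({v for s, t, _ in links for v in (s, t)})
    n = len(nodes)
    out_degree = {v: 0 for v in nodes}
    for s, _, _ in links:
        out_degree[s] += 1
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: 0.0 for v in nodes}
        for s, t, age in links:
            # Each link passes an equal share of its source's rank,
            # scaled down according to how old the link is.
            new_rank[t] += damping * rank[s] * decay(age) / out_degree[s]
        # Rank not passed along (teleportation, dangling pages, decayed
        # links) is spread uniformly so the scores still sum to 1.
        leftover = 1.0 - sum(new_rank.values())
        for v in nodes:
            new_rank[v] += leftover / n
        rank = new_rank
    return rank


# Two structurally identical pages: x is linked by a 300-day-old link,
# y by a 5-day-old link.
links = [("a", "x", 300), ("b", "y", 5)]
for name, decay in [("linear", linear_decay), ("exponential", exponential_decay)]:
    r = time_weighted_pagerank(links, decay)
    print(name, round(r["x"], 4), round(r["y"], 4))
```

Swapping `linear_decay` for `exponential_decay` (or any other curve) is the kind of experiment the aging question calls for; with a constant weight of 1.0 the iteration reduces to ordinary PageRank and x and y tie.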
#24
Here is the second big paper on TLA, named Trend Detection Through Temporal Link Analysis. The paper presents many improvements and new ideas not discussed in the previous work. In this post I only present some portions, with minimal comment. We can discuss more in future posts. Enjoy!
LINK CITATION vs. LITERATURE CITATION
First, note that the graphs are no longer referred to as DIP curves. Some aspects of TLA are presented more precisely and elegantly. In general, the paper shows that the link citation-literature citation analogy is unsustainable, as the behaviors of these two citation schemes are completely dissimilar. The authors write: “Consequently, Web links exhibit a behavior that is the opposite of that found in scientific citations: the more time passes the more citations (links) a page receives”.
TIMESTAMPED LINKS AND THEMES
The basic unit of temporal data that drives the applications proposed in the following section is the timestamped link. A timestamped link to a page p is an ordered pair (u,t), where u is a URL of a page that was last modified at time t and that links to p. We will usually be interested in timestamped links to a theme, where a theme will be identified as a set of pages. Formally, if P is a set of pages on a certain theme or topic, a timestamped link to P is an ordered pair (u,t) where u is a URL of a page, last updated at time t, which links to a page in P.
As Web users, we know that
What Makes This Timestamp So Unique?
In my view, the gist of the model is an attempt at modeling and understanding Leaders, Followers, and Passersby and their behavior over time.
The Timestamped Link Profile
The researchers introduce the TLP of a theme as the “normalized projection of that theme’s timestamped links onto the time axis.” This is a histogram of the ages of timestamped links, normalized so that the sum of all counts equals 1. TLPs are assembled as follows: “… a TLP for a theme is assembled by submitting a query (or several queries) describing the theme to a search engine (or several engines). We then take the union of the top-n results returned by the engine(s) for the query(ies) and denote the resulting set of pages by P. Next, we ask for pages that link to each of the pages in P, taking the “last modified” value of those pages.
Let Q denote the pages linking to the pages of P, and let L denote the multiset of timestamped links from the pages of Q to the pages of P (L is a multiset because each page in Q may link to multiple pages of P). We also decide on a date range of interest and on the number of (equal-sized) intervals by which the date range should be partitioned. The TLP is then plotted by associating each timestamped link of the multiset L with the interval that contains the last update time of the page containing L.”
Abnormal TLPs and Their Relation to Real-Life Events
In this section, they show how the TLP for a given theme can be used to discover and analyze abnormal changes in the activities of virtual communities. This has important implications for intelligence monitoring, both backward and forward in time. The effect of special events on link activity is well demonstrated in two sections:
1. Comparing a Theme to Itself in Two Different Points in Time
2. Tracing Themes Over Time
These sections raise some questions for me. Were link analyses performed before or near the tragic events of 9/11 by others out there? Did they “see” the same red flag in the sky that we can see now from the graphs? Where were the Internet intelligence folks back then?
From Authorities to Timely Authorities
In this section they disclose the weighting scheme and how it affects the ranking of timely authorities, where d is the age (in days) of a link's timestamp:
• If (d < 0 or d > 1 year): bonus = 0
• If (0 ≤ d ≤ 1 week): bonus = 1.5
• If (1 week < d ≤ 1 month): bonus = 1.0
• If (1 month < d ≤ 6 months): bonus = 0.5
• If (6 months < d ≤ 1 year): bonus = 0.25
POSSIBLE APPLICATIONS
"The strength of the simple TLP approach is that anyone can track his or her temporal reputation using this method without requiring heavyweight processing, logging, or paying a search engine to reveal users’ behavior.
For example, a business (with a Web site) that has launched an ad campaign can track the level of new timestamped links daily and determine the effectiveness of its advertisement. When combined with other sources of information such as traffic and query log, TLPs offer a more focused view on the community of content makers and information providers on the Web."
This could be used as another source of revenue ($$$) for search engine marketers. It would be interesting to know how the IBM patent plays into the picture.
Other applications and new frontiers I can see for TLA:
1. Seasonal and cyclical behaviors can be examined with time delay techniques.
2. Nonlinear behaviors can be examined with Poincaré maps and chaos theory.
3. Temporal self-similarity and power-law behaviors can be examined through standard fractal geometry and scaling concepts.
Orion
Last edited by orion : 12-16-2004 at 09:00 PM. Reason: typos; off-topic lines
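The TLP construction quoted above boils down to binning and normalizing. Here is a minimal sketch, assuming the multiset L of timestamped links has already been collected (the search-engine queries and "last modified" lookups are not shown). The function name and the day-level granularity are hypothetical choices; each link lands in the equal-sized interval containing its page's last-modified date, links outside the date range are dropped, and the counts are normalized to sum to 1.

```python
from datetime import date


def timestamped_link_profile(links, start, end, num_intervals):
    """Hypothetical TLP builder.

    links: the multiset L as a list of (source_url, last_modified_date)
           pairs; a page of Q linking to two pages of P appears twice.
    Returns num_intervals fractions summing to 1 (all zeros if no link
    falls inside [start, end]).
    """
    if end <= start or num_intervals < 1:
        raise ValueError("need end > start and at least one interval")
    span = (end - start).days
    width = span / num_intervals  # equal-sized intervals, in days
    counts = [0] * num_intervals
    for _source_url, last_modified in links:
        offset = (last_modified - start).days
        if offset < 0 or offset > span:
            continue  # outside the date range of interest
        # The end date itself belongs to the last interval.
        index = min(int(offset / width), num_intervals - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * num_intervals
    return [c / total for c in counts]


links = [
    ("http://a.example/", date(2001, 1, 5)),
    ("http://a.example/", date(2001, 1, 5)),  # same page, second link into P
    ("http://b.example/", date(2001, 7, 1)),
    ("http://c.example/", date(2001, 12, 31)),
    ("http://d.example/", date(2000, 6, 1)),  # before the range: ignored
]
print(timestamped_link_profile(links, date(2001, 1, 1), date(2001, 12, 31), 4))
# [0.5, 0.25, 0.0, 0.25]
```

Running the same function on the same theme at two different dates, or on successive date ranges, gives the "compare a theme to itself" and "trace a theme over time" views discussed in the post.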
#25
FROM AUTHORITIES TO TIMELY AUTHORITIES
In this section of the TLA paper the authors write: “Having assembled the subgraph to be analyzed, we proceed to assign weights to its hyperlinks. The rankings of many link-analyzing algorithms can be biased toward favorable pages by assigning different weights to links in the collection.” “In query-specific searches, links referring to pages with matching textual content, or links with anchor text that appear in the query, are often awarded high weights (Bharat & Henzinger, 1998; Chakrabarti et al., 1998). Among the algorithms that are affected by such techniques are HITS (Kleinberg, 1999) and SALSA (Lempel & Moran, 2000).”
They build on these and previous frameworks to produce a new framework, basically by adding the time dimension. This is the main, and only, difference between the two frameworks. What is surprising is that this simple difference has non-trivial consequences for the ranking of authority sites: the concept of an authority site is no longer based on linkage alone.
“Specifically, we assign weights to links based on two parameters:
1. Following Chakrabarti et al. (1998), links are weighted according to the similarity between the anchor text which is associated with them and the query.
2. To add the time dimension to the analysis, we further alter the weights associated with the links by adding a bonus as follows. For each link, let d denote the time difference (in days) between the current date and the timestamp associated with the link. In our implementation we adjusted the weights as follows:
• If (d < 0 or d > 1 year): bonus = 0
• If (0 ≤ d ≤ 1 week): bonus = 1.5
• If (1 week < d ≤ 1 month): bonus = 1.0
• If (1 month < d ≤ 6 months): bonus = 0.5
• If (6 months < d ≤ 1 year): bonus = 0.25
Basically, links from fresh pages (pages that were updated recently) were assigned higher weights than links emanating from stale pages.
This is the only algorithmic difference between the calculation of “basic” authorities and the calculation of “timely” authorities. We then analyzed the link structure of the subgraph S in the manner described by Aridor et al. (2000), assigning each candidate page with a hub score and an authority score. These scores are computed by summing the scores produced by the HITS and SALSA algorithms.”
“In what follows, we present experiments of our modification for two queries. In each experiment, two lists of 20 authorities are shown. The list on the left was produced without considering temporal data, while the list on the right took into account the temporal data. Furthermore, the parentheses next to every URL on the right list indicate the rank of that URL in the left column (or “new” when the URL did not appear in the left column). Table 1 summarizes the differences between the two ranked lists in every experiment conducted in 2001 by listing the number of authorities that hold top-n positions in both lists.”
No further explanation is needed. Tables 2 and 3 of the second TLA paper compare basic authorities and TLA-based (timely) authorities. Similar tables and discussion are found in the first TLA paper. Adding the time dimension to previous and current link models affects how ranking results are produced and what they mean, and how we need to look at link-based data, community activities, seasonal trends, and events in time. All of these affect the notion of link importance and of Web behavior. What great days we are living in!
Orion
References
Kleinberg, J.M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Kleinberg, J.M. (2002, July 23–26). Bursty and hierarchical structure in streams. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Finding Authorities and Hubs From Link Structures on the World Wide Web.
Last edited by orion : 12-16-2004 at 08:45 PM. Reason: typos; refining lines
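The weighting step quoted above (anchor-text similarity plus a freshness bonus, followed by hub/authority scoring) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the bonus table follows the quoted scheme, but "1 month" = 30 days and "6 months" = 182 days are my assumptions, `link_weight` simply adds the bonus to an externally supplied similarity score, and a plain weighted HITS iteration stands in for the paper's sum of HITS and SALSA scores (SALSA is omitted).

```python
import math


def freshness_bonus(d):
    """Bonus for a link whose timestamp is d days old (quoted scheme;
    day counts for a month and six months are assumptions)."""
    if d < 0 or d > 365:
        return 0.0
    if d <= 7:
        return 1.5
    if d <= 30:
        return 1.0
    if d <= 182:
        return 0.5
    return 0.25


def link_weight(similarity, age_days):
    """Hypothetical: anchor-text similarity plus the freshness bonus."""
    return similarity + freshness_bonus(age_days)


def weighted_hits(links, iterations=50):
    """links: list of (source, target, weight). Returns (hubs, authorities)."""
    nodes = sorted({v for s, t, _ in links for v in (s, t)})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority score: weighted sum of the hub scores pointing at a page.
        auth = {v: 0.0 for v in nodes}
        for s, t, w in links:
            auth[t] += w * hub[s]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {v: x / norm for v, x in auth.items()}
        # Hub score: weighted sum of the authority scores a page points at.
        hub = {v: 0.0 for v in nodes}
        for s, t, w in links:
            hub[s] += w * auth[t]
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {v: x / norm for v, x in hub.items()}
    return hub, auth


# Two candidate authorities with identical link structure and anchor-text
# similarity: x is cited by a stale page, y by a page updated 3 days ago.
raw_links = [("h1", "x", 1.0, 400), ("h2", "y", 1.0, 3)]
basic = [(s, t, sim) for s, t, sim, _age in raw_links]
timely = [(s, t, link_weight(sim, age)) for s, t, sim, age in raw_links]
_, basic_auth = weighted_hits(basic)
_, timely_auth = weighted_hits(timely)
print(basic_auth["x"] == basic_auth["y"])   # True: no time dimension
print(timely_auth["y"] > timely_auth["x"])  # True: the fresh link wins
```

As in the quoted experiments, the only thing that changes between the "basic" and "timely" runs is the link weights; the scoring algorithm is untouched.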
#26
PREVIOUS STUDIES
What makes the TLA model different from previous “known art”? Let’s address this issue. According to this seminar presentation on TLA, “There have been numerous attempts to make use of time to predict trends on the Web. However all of those studies emphasized the detection of the change itself and not the temporal nature of the data studied.” That is, previous studies of link variation over time were limited to monitoring changes. These models were not embedded into IR systems to account for the processes used to weight text, rank documents, or actually retrieve information. “None of these studies looked into how to incorporate time into the processes that are currently used for ranking web pages, computing link-based measures of site popularity, and link analysis in general. In fact, to the best of our knowledge, the Web Information Retrieval community has never proposed such a temporal approach,” the researchers state.
IMPORTANT CLAIMS
It is clear that TLA is more than a single claim, mere link counting, or monitoring of changes over time. The model can be embedded into IR processes that actually assign weights, rank documents, and retrieve information. How is this done? Let’s revisit some of the 32 claims listed in the TLA patent.
Claim 20. A method for temporally ranking a collection of linked entities, the method comprising: for each link activity record related to a link, assigning a weight to said link according to a temporal criterion applied to said link activity record; performing said assigning step for at least one link to each of a plurality of linked entities; and ranking said linked entities and associated links using said weights.
Claim 21. A method according to claim 20 wherein said assigning step comprises assigning more weight to any of said links having either of more link activity records and more recent link activity records than to any of said links having either of fewer link activity records and fewer recent link activity records.
Claim 30. A system for temporally ranking a collection of linked entities, the system comprising: means for assigning a weight to a link for each link activity record related to said link according to a temporal criterion applied to said link activity record; means for performing said assigning step for at least one link to each of a plurality of linked entities; and means for ranking said linked entities and associated links using said weights.
Claim 31. A system according to claim 30 wherein said means for assigning is operative to assign more weight to any of said links having either of more link activity records and more recent link activity records than to any of said links having either of fewer link activity records and fewer recent link activity records.
Orion
Last edited by orion : 12-15-2004 at 10:41 PM.
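To see what claims 20 and 21 describe in the simplest possible terms, here is a hypothetical sketch, not the patent's implementation. Assumptions: the "temporal criterion" is an exponential recency weight with a 90-day half-life, a link's weight is the sum over its activity records (so more records, or more recent records, both give more weight, as claim 21 requires), and entities are ranked by the total weight of their inbound links. A real system could instead feed these weights into HITS, SALSA, or PageRank.

```python
from collections import defaultdict


def record_weight(age_days, half_life_days=90.0):
    """Hypothetical temporal criterion: recent activity records count more."""
    return 0.5 ** (age_days / half_life_days)


def temporally_rank(link_records):
    """link_records: dict mapping (source, target) -> list of record ages (days).

    Returns (ranked entities, link weights). Summing per-record weights means
    a link with more records, or with more recent records, weighs more.
    """
    link_weights = {
        link: sum(record_weight(age) for age in ages)
        for link, ages in link_records.items()
    }
    scores = defaultdict(float)
    for (_source, target), weight in link_weights.items():
        scores[target] += weight  # rank entities by weighted in-links
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked, link_weights


records = {
    ("a", "x"): [10],       # one recent record
    ("b", "y"): [10, 20],   # two recent records
    ("c", "z"): [400],      # one stale record
}
ranked, weights = temporally_rank(records)
print([name for name, _score in ranked])  # ['y', 'x', 'z']
```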