|
#1
|
||||
|
||||
|
Temporal Link Analysis
The general analogy that link citation resembles literature citation (e.g., Garfield's Impact Factors) makes no sense. Literature citation is driven by peer review and editorial policies. On the Commercial Web, where anyone can buy or sell links and add or delete links at will, link citation is mostly driven by commercial and vested interests and strategic alliances of all kinds. It can be argued that link-based models or marketing strategies based on this analogy are questionable, and it can be demonstrated that the dynamical nature and value of literature citation and link citation are completely different.
Back in 2002, an Italian group presented a generalized Web Page Scoring System (WPSS) framework in which the dynamical nature of links and the Web was taken into consideration. Back then I wrote a non-technical review of this paper and another paper on the futility of link tools and link-based metrics, and a few SEOs were quick to react without knowing all the facts. The WPSS paper points out several theoretical flaws embedded from the start in link-based models, mostly because of the temporal nature of the Web and the fact that web traffic consists of two components: random (by chance) and deterministic (not by chance). A link model in which a user is modeled as a pure random walker (or a pure deterministic walker) does not match the reality and experience of average web surfers. The same can be said about models in which only one atomic action from the user is taken into consideration (e.g., "users don't click back").
To sum up, while under controlled IR lab conditions links may be a measure of citation importance (votes), on the commercial Web this is most likely not the case. Despite the fact that the Web is a dynamical system, few works have been published on the temporal behavior of links. In 2003, during a presentation at Haifa, IBM researcher Einat Amitay discussed Temporal Link Analysis.
Her presentation was enlightening: "In fact, a journal will be considered more prominent the higher its citation half-life is (i.e., how old in years are most of the papers currently cited in the literature that were previously published in this journal). Combined with another measure called impact-factor (the frequency with which the average article in a given journal has been cited in a particular year), libraries determine the value of a certain journal to their collection. Since the value of journals can change over time, this evaluation is carried out in many libraries on an annual or bi-annual basis. Furthermore, authors learn about the importance of their acceptance to a journal or the citation of their work in a certain journal based on such evaluations. In contrast, when plotting similar measures for citations on the Web, the reverse behaviour is exhibited: the more time passes the more citations a page receives. Furthermore, unlike the publications studied in co-citation analysis, pages on the Web are modified and updated with respect to real world events. There have been numerous attempts to make use of time to predict trends on the Web. However all of those studies emphasised the detection of the change itself and not the temporal nature of the data studied. None of these studies looked into how to incorporate time into the processes that are currently used for ranking web pages, computing link-based measures of site popularity, and link analysis in general. In fact, to the best of our knowledge, the Web Information Retrieval community has never proposed such a temporal approach. In this talk I will discuss several aspects and uses of temporal data in the context of Web IR. The main contribution of this work is first and foremost in raising the issue of utilizing the time dimension in the context of link analysis. I will demonstrate the benefits of this approach by showing how we incorporated this additional dimension into two applications. 
The first application measures the activity within a topical community as a function of time. The second application is an adaptation of link-based ranking schemes that captures timely authorities, the authorities that are on the rise today and should be ranked over the resources of days past." End of the quote.
Let's discuss Temporal Link Analysis in the context of business intelligence and search engine marketing strategies.
Orion
References
Temporal Link Analysis (research paper): http://techunix.technion.ac.il/~uriw...k_analysis.pdf
Temporal Link Analysis of Linked Entities (USPTO patent): http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=20040128273.PGNR.&OS=DN/20040128273&RS=DN/20040128273
A somewhat related paper (from the on-topic standpoint), Knowledge Encapsulation for Focused Searches from Pervasive Devices: http://www10.org/cdrom/papers/436/
Last edited by orion : 10-25-2004 at 12:18 AM. Reason: 1. to change a line 2. more typos |
|
#2
|
||||
|
||||
|
Orion,
I think you have changed the level of what an SEM forum should expect. Outstanding, thought-provoking and enlightening posts. Now, it would be very interesting to see test cases and how much they differ from the current algorithms. In fact, I can see this working well on the Web, as it states, but even better on ancient works, including the Old Testament, Roman philosophy, etc. Not that I am big in those areas. |
|
#3
|
||||
|
||||
|
Quote:
Cases:
1. See References.
2. Time series for links: construct a time series control chart at two-sigma levels. Plot y vs. x, where y = link popularity and x = time (days, weeks, months, years, etc.).
3. Time series for co-occurrence: as above, but using y = c-index calculated for titles, allintitles, anchor links, etc. This should reveal trends.
Orion |
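The control chart in case 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the weekly counts are made-up sample data, the function name is hypothetical, and the two-sigma limits are taken around the mean using the sample standard deviation of the whole series.

```python
from statistics import mean, stdev


def two_sigma_chart(series):
    """Return (center, lower, upper, flagged) for a simple control chart.

    flagged lists the (index, value) points outside center +/- 2 sigma.
    """
    center = mean(series)
    sigma = stdev(series)  # sample standard deviation of the series
    lower, upper = center - 2 * sigma, center + 2 * sigma
    flagged = [(i, y) for i, y in enumerate(series) if y < lower or y > upper]
    return center, lower, upper, flagged


# Hypothetical weekly link popularity counts (illustrative data only).
weekly_links = [100, 102, 98, 101, 99, 100, 103, 97, 100, 160]
center, lower, upper, flagged = two_sigma_chart(weekly_links)
print(flagged)  # points that fall outside the two-sigma band
```

The same function could be fed c-index values instead of link counts for case 3; only the y series changes.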
|
#4
|
|||
|
|||
|
Orion,
Thanks for the heads-up on this thread. I'm a huge fan of Einat Amitay. She went to university in Edinburgh, which is the next major city north of where I live, and she was hugely helpful with the research work for my second edition. In fact, both she and Ellen Spertus (once winner of the sexiest geek alive award, believe it or not!) have carried out remarkable work.
Temporal tracking is something I'm covering in more depth in the next edition. I stumbled upon the problem in web link-bibliography when I spoke to Craig Silverstein at Google some time ago. We were discussing the subject of the most popular movies. At that time, Titanic was the most popular movie ever... But you should get Lord of the Rings for that search now, regardless of the ancient linkage data.
I do believe that, with the use of support vector machines (learning machines), some of the problems I highlighted in a recent article about evolving networks, as they relate to the web, will become less of an issue. But as I'm currently finishing a report for a client... I'll have to come back to this. As ever, superb topic for discussion, by the way. Wish I could hang around longer! Cheers! Mike. |
|
#5
|
||||
|
||||
|
Thanks, Mike for stopping by. I know you are a very busy person.
Quote:
2. The example you give about movies is similar to the one given in Amitay's paper; i.e., "For example, a concept like “Monica Lewinsky” yielded completely different results during and after Bill Clinton’s presidency. During Clinton’s presidency, most of the top 100 results from the major search engines were related to the news item itself and the opinions and buzz it created. After President Clinton left office, most of the top 100 documents returned were (and still are) about the jokes, humour, and folklore the event created within and outside the USA."
Point 2 suggests looking at link bombs before and after certain events or periods (e.g., changes in results for "miserable failure" or "kerry waffles" before and after election day; "who's your daddy" before and after the New York-Boston 2004 seventh game, etc.).
Now a bit more serious: The DIP
According to the paper, a DIP for a concept C is computed by submitting a query describing C to a search engine, which returns a set P of n pages; thus P(n, C). For each of the top n pages, the pages that link to it are determined, together with their "last modified" values. These linking pages and their last modification dates form a set of dated inlinks profiles (DIP) for the concept C. This computation is carried out over time and a DIP-time curve is obtained. The results are normalized (from 0 to 1) to properly detect large and small changes. Thus, according to the paper, the DIP considers:
1. The dates when every page was created and last modified. Every time a page is crawled, the crawler checks its HTTP “last modified” header field (see below). If this information is not available but the engine’s repository detects that the page has changed since the last time it was crawled (for example, by the methods the paper cites), the page’s date of last modification is set to the date of the crawl. In particular, this procedure updates the page’s date of creation when the page is crawled for the first time.
2. The date when a page was detected as deleted. This date is set, for example, when receiving 404 codes for previously seen pages, or when a page cannot be accessed for long periods of time.
3. Dates of creation and deletion of links. In the ideal implementation, the search engine should track the additions and removals of hyperlinks in each page, and tag creation and deletion dates to the links in a similar manner to that described above for the pages.
Questions
1. What is your take on the following: "where search engines trace and store temporal data for each of the pages in their repository"? Do you think it is possible to detect and expose significant events and trends?
2. How do you think a DIP curve could be gamed?
3. What do you think about DIPs as a homeland security tracking tool? Check Figure 4 where, I should say, "dipping" was applied to the query Ussama/Usama/Ossama/Osama Bin Laden.
Similar curves can be obtained with c-index values, in which term co-occurrences are measured over time. A huge spike in on-topic co-occurrences could suggest anomalous activity. In general, any time series function that gives significant spikes is trying to tell the observer something.
Orion
Last edited by orion : 10-22-2004 at 12:46 AM. |
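The date-stamping rule in item 1 (use the HTTP "last modified" header when present, otherwise fall back to the crawl date) can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name and the plain-dict header format are assumptions, and the change-detection step is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def page_timestamp(headers, crawl_time):
    """Pick the date used to stamp a crawled page.

    headers: dict of HTTP response headers (hypothetical crawler output,
             assumed to use the canonical "Last-Modified" key).
    crawl_time: timezone-aware datetime of the crawl, used as the fallback.
    """
    raw = headers.get("Last-Modified")
    if raw:
        try:
            return parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            pass  # malformed header: fall back to the crawl date
    return crawl_time


crawl = datetime(2004, 10, 22, tzinfo=timezone.utc)
print(page_timestamp({"Last-Modified": "Wed, 20 Oct 2004 10:00:00 GMT"}, crawl))
print(page_timestamp({}, crawl))  # no header, so the crawl date is used
```

Because the header is supplied by the server being crawled, this is also the point where the stamps could be faked.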
|
#6
|
||||
|
||||
|
Orion,
Once again you take us to the next level in link analysis and help us keep learning new frontiers. Thank you! Quote:
Quote:
Quote:
Now time for bed for me. I look forward to hearing from you tomorrow. Saludos! |
|
#7
|
|||
|
|||
|
Quote:
The problem with this sort of link analysis is that it costs overhead, both in larger DB sizes and in processing time per page. The question therefore becomes, in my head at least: is the extra time required to determine a page's last modified date (when headers are not offered) the best way to spend that time, or would JavaScript parsing be more useful for most searches? Ditto a better crawler that can parse alternate filetypes, or fill in forms and utilise cookies. Quote:
Similarly, good old-fashioned PR (Public Relations) takes on great importance, as a long-lasting comment in the press, archived forever and a day, gets increasingly better. It also makes the highly charged issue of the sandbox a killer. New sites will struggle big time if this is applied uniformly. IMHO, this sort of analysis will work brilliantly for a subset of searches, and be the worst thing ever for another. The trick will be for SEs to create semantic filters to work out exactly how much of such factors apply at query time. Different queries for different searches, using different combinations of factors weighted differently. |
|
#8
|
||||
|
||||
|
The other problem I see with temporal link analysis and DIP, in comparison with literature citation, is that the time a page was first created and the time it was first crawled could be anywhere from a day to maybe years apart, unless ALL programmers were required to insert a "creation date" into every page as a standard. However, not all programmers follow every coding standard. Therefore, if one didn't add this "creation date", what would search engines do? Would the search engine's crawler have to consider its "creation date" the same as its "first index date"? If so, what happens to all those edits and links taken in and out of those pages? Would there be a way to pass the search engines a log of the entire build history? And YES, I agree with projectphp, database sizes will SUPER size, so what are the potential problems with that?
|
|
#9
|
||||
|
||||
|
Hi, Nacho and projectphp. Thank you for stopping by.
I hope this helps.
Nacho
1. I'm fascinated and at the same time afraid of a search engine technology that can store all pages with a tracking history back in time. Imagine all that goldmine of data in the hands of marketers, spammers, or the gov.!? True, it could be used for monitoring trends. DIP curves are used mostly for looking back in time for trends and spikes of information. But you are right, Nacho. The technology can be used to predict trends. They provide several examples (e.g., the Harry Potter movie release).
2. DIP curves are based on "last modified" stamps. If no stamp is available, the stamp of the crawl is used. These are the points gamers can target: fake the stamps or deceive the crawler.
3. "Dipping Osama". See Figure 4 of the Temporal paper. A lot of research is being conducted in co-words and topic analysis for identifying trends and usage of word patterns. These are used for monitoring sites of high activity (chat rooms, discussion forums, etc.).
4. Date/time stamp concerns. See below.
Projectphp
1. When I use the expression "Commercial Web" I mean documents/sites driven by commercial interests, not every document or site on the WWW. Excluded are online documents that are also normally found through scientific, academic, and gov IR systems. The problem with most link-based models is that they work fine on documents free from commercial noise but fail miserably in the presence of this noise. So, I stand by "mostly" in that context.
2. Overhead. Check Section 3 of the Temporal paper and post #1. To build a DIP curve you only need to collect the top N ranked pages, capture the "last modified" value while crawling the pages or use the default crawling date/time stamp, and then conduct the analysis. This is already done by some crawlers and is even less time consuming than crawling all meta tags, links, and document content. You only need to add a new instruction to the current crawler. Analysis and DIP curves can then be constructed offline.
3. I do agree with the rest of your post.
Challenging Questions
1. How do you think this could impact filesharing applications?
2. How could this be used to track topic-focused online communities?
Orion
Last edited by orion : 10-23-2004 at 12:07 PM. Reason: typos |
|
#10
|
||||
|
||||
|
3. How would this impact allowable content duplication (e.g., press releases)?
|
|
#11
|
|||
|
|||
|
Our site has been doing great over the past 3 1/2 years and has continued to get more links each month. We started a new site about 2 years ago, and about 12 months ago that site started doing very well - even though the original site had more links (and of higher quality). Then about 4-5 months ago we saw a competitor start doing very well (their site is about 14 months old). I have been trying to analyze why this (relatively) new site has done so well. I have had a lot of difficulty in defining reasons as to why they have started to rank better than us. We clearly have many more links (and better quality), and we seem to have similar on-the-page optimization.
One idea that popped into my head when analyzing this was the fact that they had a much higher rate of obtaining new links. I quickly abandoned this idea since I had no idea how Google would implement this in the algo and why it would help produce relevant results. But this thread brought up the idea again and explains why the rate of links could possibly lead to relevant results. I am not proposing that this is already in effect, and it really doesn't matter if our situation is applicable or not (I highly doubt this is the reason for anything going on right now). Regardless, I wanted to see if people thought that the rate of gaining/losing links could have any effect on things? The above posts seem to suggest it could be applied at some point. Mike has mentioned the fact that the longer a site is around, the more links it should obtain naturally. So maybe Google expects an increase of x links per month for sites that have obtained a certain status. |
|
#12
|
||||
|
||||
|
Quote:
The DLA (diffusion-limited aggregation) model incorporates a random walker sticking to a growing, tree-like fractal pattern. The growth rate does play a role. Immediately after the publication of the DLA model, in the 80's and 90's, physicists found that crystal growth and cluster aggregations tend to mimic the effect in which rich growing branches get richer and poor branches die away. Many random processes are governed by this natural "principle". Back then I participated in several conferences on the subject. The rich-get-richer phenomenon is based on probability averages taken from randomly selected samples. Just because a site is old does not mean that it will get rich. Because the phenomenon rests on probabilistic averages, there will be cases in which many old sites will not get richer. Orion Last edited by orion : 10-22-2004 at 11:11 PM. |
|
#13
|
||||
|
||||
|
Quote:
Link A >> http://www.domainA.com/D.html
Link B >> http://www.domainB.com/D.html
Link C >> http://www.domainC.com/D.html
This is different from saying that A, B, and C point to D; i.e., http://www.domainD.com/D.html. Since they construct DIP curves by mapping links to date/time stamps, I'm inclined to think that individual URIs count for each site when someone points to them.
Orion
PS Sorry for too much editing. Lack of sleep.
Last edited by orion : 10-23-2004 at 01:07 PM. Reason: Typos |
|
#14
|
||||
|
||||
|
Orion,
I read the research paper just a few hours ago. I was hoping that you could give me your understanding of section 4.3, "Tracing Concepts Over Time". I got lost both times I read that section. As for the rest of the paper, I think it would really work wonders. Based on the preliminary tests conducted in the study, the results of the "Timely Authorities" seemed, to me, much more relevant. It is very interesting how you can watch a pattern on a specific topic of interest fluctuate over time. Also, don't you feel that storing all the linkage data over time would be very costly? I mean, storing the date of all inlinks found, the past inlink dates, the topics and communities they belong to, etc. Of course they mention many of the challenges with using the header to determine the last updated date, but even so.... |
|
#15
|
||||
|
||||
|
The way I understand TLA is as follows.
Let P be the top n pages retrieved by submitting a query associated with a concept. Let Q be the set of q pages linking to the top n pages. Thus, we have the two sets
P = {p1, p2, p3, ..., pn}
Q = {u1, u2, u3, ..., uq}
Each u is a timestamped url linking to pages in P. One constructs a DIP curve consisting of the points (u, t), where u is the normalized number of timestamped urls falling in time interval t, for a chosen time range and interval size. Consider a range from 1995 to 1998 and intervals of one year. Let's assume that for the query associated with the "hotel" concept we have
Range: from Jan 1995 to Dec 1998 in increments of 1 year
interval Jan 1995-Dec 1995: 10 timestamped links point to pages in P
interval Jan 1996-Dec 1996: 10 timestamped links point to pages in P
interval Jan 1997-Dec 1997: 20 timestamped links point to pages in P
interval Jan 1998-Dec 1998: 60 timestamped links point to pages in P
Total = 100 links
Thus, 10/100, 10/100, 20/100, 60/100 and SUM = 0.1 + 0.1 + 0.2 + 0.6 = 1.0
The DIP curve is given by the normalized (u, t) points (0.1, 1995), (0.1, 1996), (0.2, 1997), (0.6, 1998)
I don't see an overhead issue here since (a) the data is already public or can be collected as specified by a user and (b) computation is done off-line. The disjoint technique looks to me like a method for resolving important portions of the response curve. We could do the same using time decomposition/delay techniques from non-linear dynamics. Adding time as a variable opens the door to the injection of non-linear dynamical tools into link models.
A few days ago, Dr. Einat Amitay, a co-author of the TLA paper, sent me a new work they are working on. This new work is about to be published and shows new temporal trends and developments that changed the concepts first described in their previous paper. I sent back some questions on this new work since honestly there are some aspects I don't understand or that are not clear to me --and I don't want to speculate. I'm waiting for feedback. Since this new work is not yet published, it would not be ethical for me to make any comment without their permission or before the publication date (however, we can discuss things already published). This is an outstanding work!!! and I can see many applications of TLA in Web analytics, intelligence, and marketing.
Orion
Last edited by orion : 10-28-2004 at 10:16 PM. Reason: 1. typos 2. refining, adding punctuation |
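The "hotel" arithmetic above can be reproduced with a few lines of code. This is only a sketch of the normalization step as described in this post, assuming the per-interval counts have already been collected; the function name and dictionary layout are hypothetical.

```python
def normalize_dip(counts_by_interval):
    """Turn per-interval inlink counts into (share, interval) points summing to 1."""
    total = sum(counts_by_interval.values())
    if total == 0:
        raise ValueError("no timestamped links to normalize")
    return [(count / total, interval)
            for interval, count in sorted(counts_by_interval.items())]


# Counts from the "hotel" example: timestamped links per one-year interval.
hotel_counts = {1995: 10, 1996: 10, 1997: 20, 1998: 60}
dip_curve = normalize_dip(hotel_counts)
print(dip_curve)  # [(0.1, 1995), (0.1, 1996), (0.2, 1997), (0.6, 1998)]
```

Since only counts per interval are kept, the step is cheap and can run offline, which fits the point about overhead.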
|
#16
|
||||
|
||||
|
Orion, thank you for clarifying Timely Authorities.
And I am excited to see Dr. Einat Amitay's new work. I was thinking of visiting her this weekend, while in Israel. But she wouldn't want to talk to me. |
|
#17
|
||||
|
||||
|
This is a recap on Temporal Link Analysis. In the TLA model
1. First, a query about a concept C is submitted to a search engine.
2. Next, the set P consisting of the top n ranked pages is collected. So, P = {p1, p2, p3, ..., pn}
3. Now the set Q consisting of the urls across the Web that point to pages in P is collected. So, Q = {u1, u2, u3, ..., uq}
4. Each url is a timestamped link. This timestamp data is readily available from the HTTP headers, the document itself, or the date of the crawls.
5. For a given time range divided into specific time intervals, one constructs a curve of the number of timestamped urls vs. time. The y-axis (# timestamped urls) is normalized to run from 0 to 1.
The resultant curve describes the time evolution of the number of links associated with the queried concept C. This curve does not tell us how relevant the individual urls are. This is important since one can see several sites and forums spreading speculation and conspiracy theories (sandbox, BLOOD, TLD vs. TLA, etc.) about temporal link analysis and about the age of links. To illustrate, the SOCEngine (SeoSurvey) site, in its article The Sandbox, the March Filter & BLOOD vs. TLD, misquotes this thread by stating "This argument is the exact opposite of the theory described by Dr. Garcia (Orion) at SearchEngineWatch in a thread titled - Temporal Link Analysis - which claims that the most relevant links are those that are new, fresh or on pages that are frequently updated." This quote is unfortunate. Not only have such claims never been made in this thread, they are also incorrect and cannot be found in the original TLA paper. For the record, the only public statements on temporal link analysis I have made outside this thread are found in a short note I wrote to Rusty (Barry) at the SeoRoundTable site. I’m reproducing that post below.
--------------
"Temporal Link Analysis I want to expand on the statement "The basic premise is that the more often AND the more recent those citations are, the more important the journal is."
True that the more time passes, the less hard copy citations a paper receives. Unlike hard copy citation, the more time passes the more link citations a Web page receives. This is one of the reasons that make the literature-link citation analogy a fallacy. Of course, there are other reasons that deal directly with the commercial "intention" and "perception" of link citation and reduce the above analogy to a caricature of the reality. Since the inception of PageRank in the Web scene, several seo "experts", sem "discussion" forums, and marketing firms with vested interests used the analogy merely as a point of sales for their products and services. Even some well-known researchers fueled this fallacy. Back in 2002 I exposed this but the usual suspects were quick to react."
--------------
End of the quote.
The line that reads, "Unlike hard copy citation, the more time passes the more link citations a Web page receives." says it all. The emphasis is on the page the timestamped urls link to, not on the age (date) of the timestamped links themselves. As a first and crude approximation, we can think of TLA as a temporal link popularity-like model. However, it is more than this. The idea is to track the time evolution of the number of links pointing to the top ranked pages relevant to a concept. As given by IBM’s current TLA model, this is not concerned with the relevancy of the timestamped urls with respect to the concept that has been queried. Note that the semantic content of the timestamped urls is not taken into consideration. The timestamped urls may not necessarily discuss or have the queried concept C as their main topic. I believe this is an area in which the current TLA could be improved. Still, the actual model provides important information. DIP curves could be used to compare changes in the activity levels of communities discussing related topics.
Effectively, we can track in time or monitor the activities of such communities and conduct interesting intel or even seasonal studies. There is another type of analysis in the TLA paper, which consists of examining the number of timestamped urls a given domain or web document receives. This provides a DIP curve for that particular domain or page. If we incorporate the weight of timestamped urls into a ranking algorithm, we can go from authorities to timely authorities.
In this SeoChat thread it is claimed that the age of a page in Google affects how the page ranks. At the time of writing, there is no scientific or research evidence for such claims or for claims about something called “temporal link devaluation” (TLD). Regardless of the validity of such claims, these concepts are not what timely authorities are about.
There is an upcoming research paper on Temporal Link Analysis from the IBM Research Group. This paper modifies, greatly improves, and provides excellent examples of TLA in the real world. The paper expands on timely authorities, how time could in theory affect a ranking algorithm, and how temporal-based weights are assigned. Again in this new work, the emphasis is on the page receiving the timestamped urls, not on the relevancy of these urls with respect to the queried concept or on a supposed relationship between the age of these urls and how they rank. I received a copy of this new work several weeks ago. Once it is officially published we can discuss it. In the meantime, what is left is what is already in the public domain.
Orion
Last edited by orion : 11-09-2004 at 11:34 PM. |
|
#18
|
|||
|
|||
|
As the author of that article, I apologize. I made a glaring generalization rather than specifically noting what you said. Obviously, I have misinterpreted the specifics, if not the general meaning of the ideas behind TLA.
Orion, certainly you grasp that there are two sides of the field from which people who post and read these boards come - the SEO business side, and the search engine engineers, students & academics. I certainly admit to being from the first, less educated group - however, when reading over the discussion here, I cannot help but be struck again by the purpose of TLA, which I interpret to mean: the analysis of the time-relevance of a particular web page to a particular query, based on the links it receives. Many factors are obviously being taken into account in measuring the links - their authority, source, "timestamp", etc. However, the purpose from a search engine's point of view appears to be as I described - to increase the relevancy of the document returned based on its timeliness.
In writing the sentence you quoted, I clearly made an error. Perhaps you could offer assistance in mending it. Based on a re-read and some thought, albeit probably less than is warranted, I would say: "This argument conflicts with the theory described by Dr. Garcia (Orion) at SearchEngineWatch in a thread titled - Temporal Link Analysis (TLA). TLA purports to help search engines return more relevant results by adding a time analysis component to the value of a link. However, if speculation about the 'sandbox' factor holds true, it would suggest that TLA is not yet being included in Google's algorithm, or that sites suffering from sandboxing are not benefiting from it."
My point in the article was not to suggest that TLA was taking into account the age of sites, but that the idea of 'devaluing recent links' directly conflicts with the idea that new links are more relevant (provided they are from a reputable source). Thanks for pointing out my error and taking the time to comment on it. The last thing I want to do is spread misinformation. Get back to me when you have time - I will amend the article immediately.
Regarding the lack of evidence for Google's current preference for older sites, I would agree that it is largely circumstantial. However, I'm not sure what kind of evidence could be amassed to help confirm or deny the hypothesis. I made a quick sampling at http://socengine.com/seo/guide/age-of-sites.html - perhaps someone could suggest ideas for a larger study that would help make a more complete analysis. Last edited by randfish : 11-10-2004 at 03:45 AM. |
|
#19
|
|||
|
|||
|
Quote:
This would also suggest to me that the "Google prefers older sites" concept is directly at odds with TLA. Orion, Mike, Rusty, et al - hopefully you can tell me what I'm missing. |
|
#20
|
||||
|
||||
|
Hi, Randfish. It is an honor to have you in this thread. Please feel at home.
I salute you. About your explanation, that’s fair enough. I know it wasn’t your intention to misinform members of both forums. Feel free to clarify at the SeoChat forum if you think it is necessary. BTW, some SeoChat users are posting good and interesting observations. Let's see how I can address some of these.
Timestamped Urls
Let's say we query a search engine for Osama and we inspect the top 30 documents. Let's say we find out that 600 urls across the web are linking to these 30 documents. There are two possible treatments.
1. We can sort these 600 urls based on their timestamp data and then group the urls in specific time intervals. Next we normalize the counts by dividing by 600, so counts run from 0 to 1. Now we plot the url-time curve.
2. We proceed as in “1” but we predefine a time range to be monitored, ignoring urls from the 600 collected urls whose timestamps are not within the range.
In any case the resultant curves monitor the linking activity associated with the query “Osama”. Similar url-time curves can be constructed for a given domain. In this case the curves give a record of the linking activity associated with that domain. A significant spike in url-time curves “tells” the user that important link activity took place around a particular date or time interval. I can see many applications for TLA, to mention only two.
1. Intelligence: TLA curves could be correlated with, for example, historical events in time (e.g., September 11, 2001) or used to monitor linkage patterns and trends around a given concept.
2. Marketing: TLA curves could be used to correlate seasonal and fashion trends around a given product or brand.
As it stands, I can also see several areas in which the current TLA model could be improved. Two of these are
1. documents retrieved by the query may not be on-topic; it is assumed that the top ranked documents are relevant to the concept C, which is not necessarily the case, especially in the presence of noise (e.g., bloggers, link bombs, relevancy tricks)
2. timestamped urls may not be on-topic; i.e., the content and relevancy of the timestamped urls with respect to the initial query is not taken into consideration.
Although there are other areas that deserve improvement, TLA is a promising intel model.
Google using TLA?
I can only refer readers to the public information available. The evidence suggests this is unlikely for three reasons.
1. IBM holds a patent on TLA (published just this Summer).
2. Check the TLA paper, Section 4.4 “From Authorities to Timely Authorities”, Tables 2 and 3. When TLA is incorporated into a ranking model, the IBM group found that new, recent, and fresh sites tend to rank higher, not lower as suggested by proponents of Sandbox, TLD, BLOOD and other conspiracy theories.
3. Models that try to explain spatio-temporal behaviors on the Web are far from being fully explored, developed, and implemented. Welcome to the world of Non Linear Dynamics (Chaos) and Fractals.
Orion
Last edited by orion : 02-04-2005 at 12:09 AM. Reason: Fixing first line |
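The two treatments of timestamped urls described in this post could be sketched as follows. The timestamps are made-up sample data and the function name is hypothetical; grouping by month is an arbitrary choice of interval. One loudly stated assumption: under treatment 2 the counts are divided by the number of urls left after filtering, since the post does not say whether the divisor stays at the full count.

```python
from collections import Counter
from datetime import date


def url_time_curve(timestamps, start=None, end=None):
    """Group url timestamps by (year, month) and normalize the counts.

    Treatment 1: leave start/end as None to use every timestamp.
    Treatment 2: pass start and/or end to ignore timestamps outside the range.
    Assumption: counts are divided by the number of urls kept after filtering.
    """
    kept = [t for t in timestamps
            if (start is None or t >= start) and (end is None or t <= end)]
    if not kept:
        return []
    counts = Counter((t.year, t.month) for t in kept)
    return [(month, n / len(kept)) for month, n in sorted(counts.items())]


# Hypothetical timestamps of urls linking to the top-ranked documents.
stamps = [date(2001, 8, 3), date(2001, 9, 12), date(2001, 9, 15),
          date(2001, 9, 20), date(2002, 1, 5)]
curve_all = url_time_curve(stamps)                       # treatment 1
curve_range = url_time_curve(stamps, start=date(2001, 9, 1),
                             end=date(2001, 12, 31))     # treatment 2
print(curve_all)
print(curve_range)
```

A spike shows up as one interval holding a large share of the kept urls, as September 2001 does in this toy data.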