Old 07-05-2004   #1
orion
 
Join Date: Jun 2004
Posts: 1,044
Term Vector Theory and Keyword Weights

OK, let's start this thread.

This thread is about term vector models and how term weights are computed by search engines and IR systems.

All variants of Salton's Term Vector Model (a keystone of information retrieval studies) show that the weight of a term in a document is determined by a combination of global (database-level) and local (document-level) measures.

Most commercial search engines use term vector models in one way or another. Some take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Some combine this score with other metrics (link metrics, in the case of Google and others).

Thus, the notion that keyword weights and keyword density values are equivalent concepts is misleading. See http://www.miislita.com/semantics/c-index-7.html

Let's talk about term vectors and term weights!


Orion

Last edited by orion : 07-06-2004 at 09:18 AM.
orion is offline   Reply With Quote
Old 07-06-2004   #2
orion
 
Join Date: Jun 2004
Posts: 1,044

According to Salton's Vector Model, term weights are assessed with local and global information. In the classic model the weight w(i) of a term i is defined as

w(i) = tf(i)*IDF = tf(i)*log[D/df(i)]

where

tf(i) = term frequency, number of times a term i occurs in a document
IDF = Inverse document frequency = log[D/df(i)]
D = database size or number of documents available
df(i) = number of documents containing term i

Known as the Vector Model (or Salton's Term Weight Model), the equation shows that w(i) increases with tf(i) but decreases as df(i) increases. Thus, terms that appear in too many documents (e.g., stopwords, very frequent terms) receive a low weight, while uncommon terms, which appear in few documents, receive a high weight. This makes sense, since very common terms (e.g., "a", "the", "of") are not very useful for distinguishing a relevant document from a non-relevant one. Neither extreme is desirable in routine retrieval work. Terms of acceptable weight are those that are neither too common nor too uncommon.
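
To see the arithmetic in action, here is a minimal Python sketch of this weight. Base-10 logarithms are assumed, and the collection size and document frequencies are made-up numbers for illustration only:

[code]
import math

def term_weight(tf, df, D):
    """Classic Salton weight: w(i) = tf(i) * log(D / df(i))."""
    return tf * math.log10(D / df)

# Hypothetical collection of 1,000 documents
D = 1000
print(term_weight(tf=5, df=10, D=D))    # rare term: 5 * log10(100) = 10.0
print(term_weight(tf=5, df=1000, D=D))  # term in every document: 5 * log10(1) = 0.0
[/code]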

This is how term weights are computed. Over the years, several modifications to the Vector Model have been proposed. Can you think of one you may want to discuss? Any suggestions?

Orion

Last edited by orion : 07-06-2004 at 09:11 PM.
orion is offline   Reply With Quote
Old 07-07-2004   #3
orion
 
Join Date: Jun 2004
Posts: 1,044

First, things to avoid and some reference material:

Term vector theory (TVT) has been discussed for a very long time and is still a keystone concept in IR and in graduate schools. An old thread at http://www.webmasterworld.com/forum34/241.htm tried to discuss TVT but ended up discussing everything but TVT in action.

Here we will try to understand how the theory works. The goal is to introduce TVT to a wider audience. (Later on we can proceed with advanced TVT variants.)

Any takers? Schools? SEO R&D departments?


BASIC REFERENCES

1. Salton, Gerard. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.

2. Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information Retrieval. Addison-Wesley, 1999.

3. Vector Model Information Retrieval
http://www.hray.com/5264/math.htm

4. Vector Model of Text Retrieval
http://mingo.info-science.uiowa.edu:...0/Vector1.html

5. Term Weighting and Ranking Algorithms
http://www.sims.berkeley.edu/courses...e17/sld001.htm

6. Automatic Hypertext Link Generation based on Similarity Measures between Documents
http://www.fundp.ac.be/~lgoffine/Hyp...tic_links.html


RELATED MATERIAL from the WWW9 conference (still somewhat experimental).

1. The Term Vector Database: fast access to indexing terms for Web pages
http://www9.org/w9cdrom/159/159.html

2. Graph structure in the web
http://www9.org/w9cdrom/160/160.html

3. WTMS: A System for Collecting and Analyzing Topic-Specific Web Information
http://www9.org/w9cdrom/293/293.html

4. On Near-Uniform URL Sampling
http://www9.org/w9cdrom/88/88.html

5. What is this Page Known for? Computing Web Page Reputation
http://www9.org/w9cdrom/368/368.html


Orion
orion is offline   Reply With Quote
Old 07-07-2004   #4
Incubator
Member
 
Join Date: Jun 2004
Location: toronto
Posts: 260

Hi Orion, great post again. I was wondering if you could break something down a little more in layman's terms for some of us.

Landing page, 500 words, a maximum of 3 keyphrases per page, each keyphrase being 2-3 words long across the content of the landing page. How do you recommend implementing those words? I am not speaking about layout, but about how semantic terms will fall across the remaining content.
Cheers and great to see your posts again !!!

Last edited by Incubator : 07-07-2004 at 12:39 PM. Reason: font problem
Incubator is offline   Reply With Quote
Old 07-07-2004   #5
detlev
Member
 
Join Date: Jun 2004
Posts: 48
Term vector and Keyphrase Research

Hello everyone,

For the SEO, Term Vector translates into a competitive term getting less weight on a per-page basis. It was Term Vector that hinted to us to go after terms that are not searched on very heavily (according to WordTracker, anyway). The "competition" for those terms is lower.

In practice, a competitive term gets weighted less under Term Vector theory because of the increased number of documents on the Web that contain it. Therefore, you would have to stuff a given page with a competitive term in order to rank well. Cloaking excels at that; copywriting style suffers.

When you don't cloak, the idea goes: as an alternative approach, you could write a page more naturally with less competitive terms in mind, attract all sorts of genuinely valuable traffic, and build more specialized documents. The Term Vector balance would favor a page with less competitive terms, and you could more easily go after those rankings.

An additional benefit is that a naturally written page does not display on the SERP as pure gibberish like many listings of stuffed pages. Hence, it is not necessarily out of sheer laziness or ineptitude but sometimes good sense to tell clients that highly competitive terms are a waste of effort when the best performing pages on a given query are cloaked and stuffed.

Once Google came into the picture and links became more important, competitive terms became somewhat more reachable again. Term Vector, as I recall, was most obviously applied at AltaVista before Google got famous.

My .02

*cheers*
-detlev
detlev is offline   Reply With Quote
Old 07-07-2004   #6
cjtripnewton
Member
 
Join Date: Jun 2004
Location: Chicago
Posts: 5
Not to argue, detlev, but Orion has it correct when he states "Thus, terms that appear in too many documents (e.g., stopwords, very frequent terms) receive a low weight, while uncommon terms, which appear in few documents, receive a high weight."

Now that has very little to do with the WordTracker score for a term, which samples the number of times a term is queried, not the number of times the term appears in documents.

In the context of TVT, to determine how "common" a term is, you should simply go and do a quick Google search for the term. If you're comparing two terms, the one which appears in the smallest number of documents is the least common or, in Orion's terminology, the most "uncommon". The one which appears in the largest number of documents is the most common, and receives a low weight.

You're right to bring up cloaking versus copywriting. Trying to win for a term which meets some threshold of commonness requires much more effort, and many just resort to cloaking for such terms. It's obviously even worse when the term also has a high WordTracker score, because then we all focus more on winning it, making it even more common. It's a cycle I've seen repeated many times.
cjtripnewton is offline   Reply With Quote
Old 07-07-2004   #7
detlev
Member
 
Join Date: Jun 2004
Posts: 48
WordTracker

Hello everyone,

Well put, Newt. I concede that WordTracker has nothing directly to do with TVT and should not have been brought into the equation. Thank you for clarifying my post without destroying what I was trying to say.

Yes, a simple query at the SE in question for the term is how to determine the number of docs which contain the term. I failed to make this point and instead relied on the assumption that one would do that.

Thanks!
-d
detlev is offline   Reply With Quote
Old 07-07-2004   #8
orion
 
Join Date: Jun 2004
Posts: 1,044

Welcome to this thread Incubator, detlev and cjtripnewton. Please feel at home.

To Incubator:

I am answering your question at the Keyword Co-Occurrence and Semantic Connectivity thread at http://forums.searchenginewatch.com/...=4731#post4731

I feel other readers there may benefit from the discussion. That thread is moving to its second phase, i.e., term co-occurrence at the document level. I hope future posts there will clear up any reservations certain SEOs may have about the benefits of having co-occurrence, semantic connectivity, and term sequencing strategies in their "tool box".

To cjtripnewton:

"Now that has very little to do with the Wordtracker score for a term, which samples the number of times a term is querried, not the number of times the term appears in documents."--Well put.


To detlev:

1. "Once Google came into the picture and links were more important, competitive terms became somewhat more reachable again. Term Vector, as I recall, was most obviously applied at AltaVista before Google got famous.."

Thanks to Andrei Broder, SEOs learned about AltaVista's Term Vector Database project, just one of the many variants of Salton's model out there. Actually, the Term Vector Model had been around and applied before search engines came on the scene. Before AltaVista, Brian Pinkerton implemented TVT in WebCrawler. Check

http://archive.ncsa.uiuc.edu/SDG/IT9...ebCrawler.html,

In this old paper he states

"The WebCrawler's database is comprised of two separate pieces: a full-text index and a representation of the Web as a graph. The database is stored on disk, and is updated as documents are added. To protect the database from system crashes, updates are made under the scope of transactions that are committed every few hundred documents.

The full-text index is currently based on NEXTSTEP's IndexingKit [NeXT]. The index is inverted to make queries fast: looking up a word produces a list of pointers to documents that contain that word. More complex queries are handled by combining the document lists for several words with conventional set operations. The index uses a vector-space model for handling queries [Salton]."

The original, classic model represented by the crude w = tf*log(D/df) equation works well with long queries but not with short queries. For this and other reasons, IR researchers have proposed several variants. Most commercial search engines implement variants of TVT.

Other search engines, like Google, use a combination of link metrics and term vector weights. The trend in recent years has been to integrate link metrics with term vector schemes (with several flavors and variations). Check http://www.e-marketing-news.co.uk/to...stillation.pdf

On Google and Term Vector Theory.

Actually, Google has been using term vector theory since its inception. Check http://www.webpronews.com/ebusiness/...kMarckini.html
In that webpronews article, Fredrick Marckini interviews Google's Craig Silverstein. Craig states,

"The Term Vector Theory

Google's algorithm incorporates the ideas and understanding behind the term vector theory. While the elements of the term vector theory can be quite complex, Craig offered a rather basic definition of how the theory originated. A premise of the term vector theory "says the documents are good if they contain the words in your query and they contain them a lot," explained Silverstein. As search has matured and grown more complex, Google has adapted their algorithm to complement these changes and to account for those who try to cheat and trick the search engines. While the algorithm has adjusted with the times, in essence it still embraces the beliefs behind the term vector theory.

Scoring = PageRank + Term Vector

The term vector factors of the Google ranking algorithm, which will be covered below, concentrate on how relevant a page is to a user's search. This score, combined with the PageRank score that measures the popularity of the page, is how Google derives an overall score or ranking of a Web page. Thus, the Web pages that receive high scores are, in Google's opinion, the Web pages that best meet the user's individual needs."

2. "Yes, a simple query at the SE in question for the term is how to determine the number of docs which contain the term."--Actually, the df(i) in the TVT equation is number of documents in D (all documents available in the system) which contain the term, not exactly number of retrieved results from the database. From this subset, some may not contain the queried term. The assumption that

df(i), # documents containing term i = # documents retrieved by querying term i

introduces an error into the analysis. This is a drawback of keyword-driven searches: a system may return documents that are semantically relevant yet do not contain the queried term. Depending on the system's recall/precision performance, this error may not be critical. To minimize it, one can conduct a regexp search or use an EXACT match mode.

I hope this post has helped in some way to clear up some confusion.

In the next posts, I will explain the drawbacks of Salton's original Term Vector Model for commercial search. Then we can get into several solutions proposed by IR scientists.

Orion

Last edited by orion : 07-07-2004 at 10:13 PM.
orion is offline   Reply With Quote
Old 07-07-2004   #9
Incubator
Member
 
Join Date: Jun 2004
Location: toronto
Posts: 260

Thanks Orion, I appreciate the reply. I am interested in seeing where this takes us from a "cloaking" point of view. If we remove the human element from this topic, it will be very evident that these theories can be broken down into templates that we can deliver... and I think that's what a lot of people here may be thinking but not admitting. I could be wrong... but I welcome this as an open debate.


cheers all

WC

Last edited by Incubator : 07-07-2004 at 10:42 PM. Reason: update
Incubator is offline   Reply With Quote
Old 07-10-2004   #10
orion
 
Join Date: Jun 2004
Posts: 1,044

Material in this and future posts is taken from my new series on Term Vector Theory and Keyword Weights, available at http://www.miislita.com/term-vector/term-vector-1.html


Keyword density values are not good term weight estimators. One must use local and global information to assess term weights.

This can be accomplished with the Vector Space Model. Lee, Chuang and Seamons (http://www.cs.ust.hk/faculty/dlee/Pa...ee-sw-rank.pdf) compare six different term weighting models. Readers interested in term vector source code may want to check the MINERVA implementation (http://www.ifcomputer.co.jp/MINERVA/...r/home_en.html).

According to Jamie Callan of Carnegie Mellon University (http://hartford.lti.cs.cmu.edu/class...ctorSpaceB.pdf), historically weights were computed using tf information without IDF values. Since I like history, let's start the discussion by representing weights with tf values only; i.e., w(i) = tf(i). Keep in mind that, with some modifications, the following procedure also applies to any term weighting scheme.

DEMYSTIFYING TERM VECTORS

The following material is taken from my article available at http://www.miislita.com/term-vector/term-vector-2.html, which includes an in-depth analysis with figures and step-by-step calculations.

Consider an index consisting of the terms "car", "auto" and "insurance". This is my term space. Assume that the database collection consists of only 3 documents. The term counts (tf), or number of times these terms occur in each document, are as follows.

doc 1: auto (3 times), car (1 time), insurance (3 times)
doc 2: auto (1 time), car (2 times), insurance (4 times)
doc 3: auto (2 times), car (3 times), insurance (0 times)

If we query the system for "insurance", then the counts for the query in the term space are 0 for auto, 0 for car and 1 for insurance.

Evidently,

1. In this case the term space consists of three dimensions: auto, car and insurance. The term counts of each document are the coordinates of a point in the term space. The coordinates of these points are then (3, 1, 3), (1, 2, 4) and (2, 3, 0), respectively.
2. If the origin coordinates are (0, 0, 0), then the displacement of each point from the origin can be represented by a vector.
3. The length or magnitude of this vector can be measured with Pythagoras' Theorem. The coordinates associated with the query are (0, 0, 1).

To calculate the magnitude of each vector, we apply Pythagoras' Theorem. For n dimensions we can write |Di| = (a^2 + b^2 + c^2 + ... + n^2)^(1/2). In our example, this gives the following magnitudes

|D1|=4.3589
|D2|=4.5826
|D3|=3.6056
|Q| = 1

CALCULATING THE DOT PRODUCTS, COSINES AND RANKING THE RESULTS

To compute each cosine value we need to know the scalar or dot product of the query vector and document vectors. It can be computed from coordinate values. For document 1 we have

Dot Product, QD1 = 0*3 + 0*1 + 1*3 = 3

Similar calculations give QD2 = 4 and QD3 = 0 for documents 2 and 3.
Now we calculate the corresponding cosine values. Since the dot product is defined as the product of the magnitudes of the vectors times the cosine of the angle between them (e.g., for doc 1 we have QD1 = |Q|*|D1|*cosine), solving for the cosine we obtain in each case

cosine doc 1 = 0.6882
cosine doc 2 = 0.8729
cosine doc 3 = 0

Finally we sort and rank the documents in descending order according to the cosine values

Rank 1: Doc 2=0.8729
Rank 2: Doc 1=0.6882
Rank 3: Doc 3=0

As we can see, document 2 is the most relevant to the query "insurance". Document 1 is less relevant. Document 3 is completely irrelevant. The closer a cosine is to 1, the more relevant a document should be. If the cosine is zero, then the document and query are orthogonal in the term space. In plain English, this means that the document and the query are not related. This is the case of Document 3, at least with our term count vector model. True, we could have arrived at this conclusion by just looking at the term counts table above. It so happens that with this term vector scheme we have shown that the cosine between document vectors and query vectors is a valid similarity measure.
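
For those who want to check the arithmetic, here is a minimal Python sketch of the term count model above. The coordinates are taken straight from the example; nothing beyond the standard library is assumed:

[code]
import math

# Term space: (auto, car, insurance); coordinates are raw term counts
docs = {"doc 1": (3, 1, 3), "doc 2": (1, 2, 4), "doc 3": (2, 3, 0)}
query = (0, 0, 1)  # the query "insurance"

def magnitude(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(q, d):
    dot = sum(qi * di for qi, di in zip(q, d))
    return dot / (magnitude(q) * magnitude(d))

for name, vec in sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True):
    print(name, round(cosine(query, vec), 4))
# prints: doc 2 0.8729, doc 1 0.6882, doc 3 0.0
[/code]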

At this point, one may think: "Wait a second. I can divide tf by the total number of words, calculate a keyword density value, and arrive at a similar conclusion." Not so fast. This is a term count model, historically one of the first variants of the vector model. Most commercial search engines do not use this model, and with good reason: the system can be deceived by simply repeating a given term over and over (keyword spamming). This was the case with the first, poorly written search engines and commercial IR systems. These days, to assess the weight of a term we use local, global and web graph information, not mere term counts or "keyword density" values.

We can do better by multiplying tf values by IDF values. Thus, in the above calculations we just need to replace w(i) = tf(i) with w(i) = tf(i)*IDF. The term vector calculations remain essentially intact.

In the next articles, I will explain how this can be done and what we gain from such modifications. Until then, please feel free to comment on this post.

Orion

Last edited by orion : 07-10-2004 at 11:30 PM.
orion is offline   Reply With Quote
Old 07-18-2004   #11
thememaster
Member
 
Join Date: Jun 2004
Location: USA
Posts: 7

Hello Orion,

I just wanted to chime in and say I think you're doing a great job in this thread explaining TVT. I've been working in this area for quite a while now and much of these principles are part of what's behind my own software. You can definitely put me on the list of people interested in the more advanced discussion of TVT variants when/if you decide to get into that later.

- Mike
thememaster is offline   Reply With Quote
Old 07-20-2004   #12
orion
 
Join Date: Jun 2004
Posts: 1,044

Welcome to this thread, Theme Master. It's an honor to have you here.

I appreciate your kind words a lot. I'll soon be presenting normalized term vector models. Then we can move to the "meat". As you and others probably already know, term vector theory is not that complicated. It just happens that many IR folks like to mask the key concepts with unnecessary nomenclature. While this strategy locks out outsiders, it makes IR concepts look unnecessarily complex. I am of the IR school that believes in simplicity. Perhaps fellow IR people are too protective of "secrets".

I visited your site. Awesome. Feel free to contact me by regular email and we can talk a bit more.


Orion

Last edited by orion : 07-20-2004 at 10:34 PM.
orion is offline   Reply With Quote
Old 07-26-2004   #13
orion
 
Join Date: Jun 2004
Posts: 1,044

Unlike the term count model, Salton's Vector Space Model incorporates local and global information:

Eq 1: Term Weight = tf*IDF = tf*log(D/df)

where

tf = term frequency (term counts) or number of times a term i occurs in a document j.
df = document frequency or number of documents containing term i.
D = number of documents in a database.
IDF is the inverse document frequency defined as log(D/df).


EXAMPLE

Consider the following example, courtesy of Professors David Grossman and Ophir Frieder of the Illinois Institute of Technology. Suppose we query an IR system with the query "gold silver truck". The database collection consists of three documents (D = 3), with the following content:

D1: "Shipment of gold damaged in a fire"
D2: "Delivery of silver arrived in a silver truck"
D3: "Shipment of gold arrived in a truck"

So, the query is Q = "gold silver truck". For step-by-step calculations, graphics, and detailed explanations see http://www.miislita.com/term-vector/term-vector-3.html


PROCEDURE

This is what we normally do:

1. we construct an index of terms from the documents and determine the term counts tf for the query and each document.
2. we compute the document frequency df for each term and the corresponding IDF values. In this case D = 3, so IDF = log(3/df).
3. next, we take the tf*IDF products and compute term weights.
4. Now we treat weights as coordinates in the vector space, effectively representing documents and the query as vectors. This step is the main difference between this model and the model previously discussed. That is, we are including global information in the vector coordinates.

Steps 1 - 4 are pretty straightforward.

To find out which document vector is closest to the query vector, we resort to the similarity analysis introduced before (see previous posts).

First, we compute all vector lengths for the query and each document:

|D1| = 0.7192
|D2| = 1.0955
|D3| = 0.3522
|Q| = 0.5382

Next, we compute all dot products.

QD1 = 0.0310
QD2 = 0.4862
QD3 = 0.0620

Now we calculate the similarity values

Sim(Q,D1) = Cosine D1 = QD1/(|Q|*|D1|) = 0.0801
Sim(Q,D2) = Cosine D2 = QD2/(|Q|*|D2|) = 0.8246
Sim(Q,D3) = Cosine D3 = QD3/(|Q|*|D3|) = 0.3271

Finally we sort and rank the documents in descending order according to the similarity values

Rank 1: Doc 2 = 0.8246
Rank 2: Doc 3 = 0.3271
Rank 3: Doc 1 = 0.0801
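
For readers who want to verify these numbers, here is a minimal Python sketch of the tf*IDF procedure above. It assumes base-10 logarithms and tf*IDF weighting of the query itself, as in the step-by-step article:

[code]
import math

docs = {
    "D1": "shipment of gold damaged in a fire",
    "D2": "delivery of silver arrived in a silver truck",
    "D3": "shipment of gold arrived in a truck",
}
query = "gold silver truck"

vocab = sorted(set(" ".join(docs.values()).split()))
D = len(docs)

# df = number of documents containing each term; IDF = log10(D/df)
df = [sum(1 for text in docs.values() if term in text.split()) for term in vocab]
idf = [math.log10(D / d) for d in df]

def weights(text):
    words = text.split()
    return [words.count(term) * i for term, i in zip(vocab, idf)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

q = weights(query)
for name, text in sorted(docs.items(), key=lambda kv: cosine(q, weights(kv[1])), reverse=True):
    print(name, round(cosine(q, weights(text)), 4))
# prints roughly: D2 0.8248, D3 0.3272, D1 0.0801 -- the tiny differences from
# the hand calculation above come from rounding the intermediate values there.
[/code]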


OBSERVATIONS

This example illustrates several facts. Very frequent terms such as "a", "in", and "of" tend to receive a low weight (a value of zero in this case, since they occur in all three documents and IDF = log(3/3) = 0). Thus, the model correctly predicts that very common terms, occurring in many documents of a collection, are not good discriminators of relevancy. Note that this reasoning is based on global information, i.e., the IDF term. This is precisely why this model is better than the term count model discussed in Part 2.


LIMITATIONS

As a basic model, this term vector scheme has several limitations. First, it is very calculation intensive; from a computational standpoint it is slow and requires a lot of processing time. Second, each time we add a new term to the term space we need to recalculate all vectors. In addition, computing the length of the query vector requires access to every document term, not just the terms specified in the query.


Other inconveniences include

1. Long Documents: Very long documents make similarity measures difficult (vectors with small dot products and high dimensionality)
2. False negative matches: documents with similar content but different vocabularies may result in a poor inner product. This is a limitation of keyword-driven IR systems.
3. False positive matches: improper wording, prefix/suffix removal or parsing can result in spurious hits (falling, fall + ing; therapist, the + rapist, the + rap + ist; Marching, March + ing; GARCIA, GAR + CIA). This is just a pre-processing limitation, not exactly a limitation of the vector model.
4. Semantic content: Systems for handling semantic content may need to use special tags (containers)


MODEL IMPROVEMENTS

We can improve the model by

1. getting a set of keywords that are representative of each document.
2. eliminating all stopwords and very common terms ("a", "in", "of", etc).
3. stemming terms to their roots.
4. limiting the vector space to nouns and few descriptive adjectives and verbs.
5. using small signature files or inverted files that are not too large.
6. using theme mapping techniques.
7. computing subvectors (passage vectors) in long documents
8. not retrieving documents below a defined cosine threshold

The model can also be improved by normalizing term and query frequencies rather than using raw frequencies (a small sketch of one common normalization is shown below). Some vector schemes apply other modifications to the IDF term.
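
As an illustration only (this particular normalization, dividing each raw count by the largest count in the same document, is a common choice but my own assumption here, not necessarily the scheme the upcoming articles use):

[code]
def normalized_tf(tf, max_tf):
    """Scale a raw term count by the largest count in the same document,
    so that 0 <= normalized tf <= 1."""
    return tf / max_tf if max_tf else 0.0

# doc 2 from the earlier car/auto/insurance example: auto (1), car (2), insurance (4)
counts = {"auto": 1, "car": 2, "insurance": 4}
peak = max(counts.values())
print({term: normalized_tf(tf, peak) for term, tf in counts.items()})
# prints: {'auto': 0.25, 'car': 0.5, 'insurance': 1.0}
[/code]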

This is where we are heading.

Orion

Last edited by orion : 07-26-2004 at 08:37 PM.
orion is offline   Reply With Quote
Old 07-26-2004   #14
NFFC
"One wants to have, you know, a little class." DianeV
 
Join Date: Jun 2004
Posts: 468
>It just happens that many IR folks like to mask the key concepts with unnecessary nomenclature.

Well said [I think].
NFFC is offline   Reply With Quote
Old 07-26-2004   #15
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Orion, I would just like to thank you for this thread and the keyword co-occurrence thread. I personally have learnt a lot from your posts.

Thank you.

I will be chiming in later with questions but before I do so, I want to read your posts 3 or 4 times over to make sure I have a clear understanding.
rustybrick is offline   Reply With Quote
Old 07-26-2004   #16
orion
 
Join Date: Jun 2004
Posts: 1,044

Thanks, RustyBrick, for such kind words. I just hope others can see through my typos and grammar horrors.

The end goal and main thesis of all my posts is to make SEOs/SEMs less prone to second-guessing and trial-and-error approaches and more aware of the scientific method. The necessary analytical tools are out there. It is just a matter of finding them. The more SEO/SEM specialists know about them, the better, I think.

Orion
orion is offline   Reply With Quote
Old 07-28-2004   #17
orion
 
Join Date: Jun 2004
Posts: 1,044

I forgot to mention that Dr. David Grossman and Dr. Ophir Frieder, the professors cited above, who kindly gave me permission to use their term vector example, are the authors of the authoritative book

"Information Retrieval: Algorithms and Heuristics" - Kluwer International Series in Engineering and Computer Science, 461.

Originally published in 1997, this material will be available in a new edition this year. I thought dedicated search specialists like Theme Master, Rustybrick and others might be interested in knowing this.

This is must-read literature for graduate students, search engineers and search engine marketers. The book focuses on the real workings behind IR systems and search algorithms. Perhaps it is time for the SEO/SEM industry to pay less attention to SEO speculation disguised as "facts" and become more familiar with the scientific facts behind search technologies. Please take this as my kind two cents.

Orion

Last edited by orion : 07-28-2004 at 11:50 AM.
orion is offline   Reply With Quote
Old 07-28-2004   #18
thememaster
Member
 
Join Date: Jun 2004
Location: USA
Posts: 7
Thanks for the resource Orion. I'll definitely be getting a copy of that.
thememaster is offline   Reply With Quote
Old 07-28-2004   #19
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Orion, do you know where I can get a copy? Amazon seems to have the old edition but they are out of stock. Is there an ISBN number for the new edition? All the sites I pulled up under that title were selling the old edition.

Thanks.
rustybrick is offline   Reply With Quote
Old 07-29-2004   #20
orion
 
Join Date: Jun 2004
Posts: 1,044

Hello Theme Master and Rustybrick

Dr. Grossman mentioned to me that the new book will be shipped to press by late August. He asked me if I could review a draft and I accepted. I have reviewed scientific publications before, and it is a cutthroat task. So I may not be able to say more, except to wait until the book comes out, after August or so, via Amazon. As soon as it comes out I will let you know, most definitely.

Orion
orion is offline   Reply With Quote