Search Engine Watch

Search Engine Watch Forums > General Search Issues > Search Technology & Relevancy
06-02-2004   #1
orion
Join Date: Jun 2004
Posts: 1,044
Keywords Co-occurrence and Semantic Connectivity

Hello. I'll introduce myself as Orion. I'm a formal scientist with a special interest in AI applied to IR technology. Let's start this thread with a brief description of keyword semantic connectivity and what it can do to improve success across search engines. My goal is that SEO/SEM R&D departments, once and for all, start using scientific tools rather than mere rumors and 2nd-guessing disguised as "SEO expert tips". Sorry in advance for my 2 cents. I'd like to present the key concepts; then anyone interested can start commenting.

According to Fuzzy Set Theory ("Modern Information Retrieval"; Baeza-Yates and Ribeiro-Neto, Addison-Wesley, 1999), the degree of term co-occurrence in a database is a measure of semantic connectivity (SC) and can be used to build a thesaurus for that database. Some engines use term co-occurrence in their query expansion algorithms. Understanding how to measure term co-occurrence lets one carefully select keywords that are semantically connected in a given search engine database. As an added benefit, SC makes excessive repetition of keywords (keyword spamming) unnecessary.

Let's start with the simple case of two keywords (k1 and k2). Later on we can expand on other cases (more than 2 keywords, keyword transpositions, entropy relevance, etc.).

Let n1 and n2 be the number of search results containing k1 and k2, respectively, and let n12 be the number of search results containing both terms. (One actually does a search for k1, then for k2, and finally a composite query consisting of k1 and k2.) Using geometric arguments and fuzzy sets, it can be shown that there exists an index, termed the correlation index, c, such that

c = n12/(n1 + n2 - n12)

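Thus c ranges between 0 and 1: it is 0 when the two terms never co-occur and 1 when every result containing one term also contains the other (n12 = n1 = n2). Term correlation increases as c approaches 1. For a given search engine or IR database, this allows us to: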

a. test which pair of keywords from a pool of keywords has the highest semantic connectivity (for that database).
b. build a thesaurus of synonyms targeting that database.
c. build a query expansion or find-similar library.
d. carefully craft titles and descriptions of web pages.
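For readers who prefer code to a calculator, here is a minimal Python sketch of the c-index formula above. The function name `c_index` and the input checks are illustrative assumptions, not something from the original post. The denominator n1 + n2 - n12 counts the results containing k1 or k2 (inclusion-exclusion), which is why c stays between 0 and 1; the checks assume the counts behave like exact document counts.

```python
def c_index(n1: int, n2: int, n12: int) -> float:
    """Correlation index c = n12 / (n1 + n2 - n12).

    n1, n2 -- number of results for k1 and for k2, queried separately
    n12    -- number of results for the composite query (k1 and k2)
    """
    if min(n1, n2, n12) < 0:
        raise ValueError("counts must be non-negative")
    if n12 > min(n1, n2):
        # Results containing both terms are a subset of those containing either one.
        raise ValueError("n12 cannot exceed n1 or n2")
    union = n1 + n2 - n12  # inclusion-exclusion: results containing k1 or k2
    if union == 0:
        raise ValueError("neither keyword returned any results")
    return n12 / union


print(c_index(100, 100, 0))    # terms never co-occur -> 0.0
print(c_index(100, 100, 100))  # terms always co-occur -> 1.0
```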

Enough for now. Anyone interested in commenting? Please excuse any typos.

Regards

Orion

Last edited by orion : 06-02-2004 at 12:31 PM.
06-02-2004   #2
Anthony Parsons
Rubbing the shine of the knobs who think they're better than everyone else...
 
Join Date: Jun 2004
Location: Melbourne Australia
Posts: 478
Quote:
Enough for now. Anyone interested in commenting?
Damn Straight.

Now if I understand this correctly, you're telling us (in a NASA kind of way) about what we already know and use, which I relate to "applied semantics" and "latent semantic indexing". Are you trying to make it more complicated?

If I read this correctly, by placing synonyms and structured thesaurus terms within our page copy, we should rank higher? If this is correct, then any good SEO copywriter should be doing this already.

I know that latent semantic indexing was more a myth than anything, though testing the placement of thesaurus terms within the text, and writing well-structured pages, clearly showed positive effects on rankings, enough to make it believable beyond just a myth.

Am I on the right track here, Orion?
06-02-2004   #3
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by Anthony Parsons
Damn Straight.

Now if I understand this correctly, you're telling us (in a NASA kind of way) about what we already know and use, which I relate to "applied semantics" and "latent semantic indexing". Are you trying to make it more complicated?

If I read this correctly, by placing synonyms and structured thesaurus terms within our page copy, we should rank higher? If this is correct, then any good SEO copywriter should be doing this already.

I know that latent semantic indexing was more a myth than anything, though testing the placement of thesaurus terms within the text, and writing well-structured pages, clearly showed positive effects on rankings, enough to make it believable beyond just a myth.

Am I on the right track here, Orion?

Thank you for responding to the post.

I agree with most of your points. Any use of thesaurus-driven terms must be weighed against a proper copywriting style and against what works well for the targeted product or service.

Fuzzy Set Theory (not an IR theory) and the c-index are merely tools used by IR researchers to build thesauri and query expansion libraries. The central point of the post was how SEOs can use the c-index to identify semantically connected terms from a pool of candidate terms for a given search engine database. (c-values can be computed with a simple calculator.) The expression "for a given search engine" is very important: the c-index for two keywords is not necessarily the same in Google, Yahoo, MSN, etc. It varies from engine to engine.

Certainly SEOs should stick to anything that works well for their clients. I'm simply trying to propose analytical tools well known to IR scientists, most of which predate the first wave of search engines and go back to the 1960s and 70s. Understanding how IR analytical tools work does not hurt.

I'll try to keep my posts very simple and basic.

Orion

Last edited by orion : 06-02-2004 at 11:22 AM.
06-02-2004   #4
Anthony Parsons
Rubbing the shine of the knobs who think they're better than everyone else...
 
Join Date: Jun 2004
Location: Melbourne Australia
Posts: 478
This is very interesting Orion. I love to learn new things, and this is it today. I look forward to reading your further posts.
06-02-2004   #5
Anthony Parsons
Rubbing the shine of the knobs who think they're better than everyone else...
 
Join Date: Jun 2004
Location: Melbourne Australia
Posts: 478
Ok, I am heaps confused, I think, but very interested to learn this one. It sounds similar to what Word Tracker produces, but not quite.

Ok, let's use examples, I love examples.

Search Engine: Google
k1: web design
k2: webdesign

n1: 8,910,000
n2: 10,400,000
n12: 1,300,000

c = n12/(n1 + n2 - n12)

c = 1300000/(8910000 + 10400000 - 1300000)

c = 1300000/18010000

c = 0.0722

Ok, now can you explain those results to me, please, Orion?
06-02-2004   #6
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by Anthony Parsons

Ok, let's use examples, I love examples.

Search Engine: Google
k1: web design
k2: webdesign

Ok, now can you explain those results to me, please, Orion?

I'd love to, Tony.

The beauty of c-indices is that they are easy to compute with a calculator or a simple JavaScript (we have a lot of these for mathematical semantic analyses).

First, the c-index for term co-occurrence should be used for synonyms (equivalent terms found in a dictionary). Second, the k1 and k2 terms should be single terms, as found in a dictionary. However, the concept can be extended to cases like the example above, provided one instructs the engine to treat each phrase as a single term (e.g., by enclosing it in quotes: "web design").

General Procedure

1. Select a pool of, let's say, five candidate terms (candidate k2's). We want to determine which of the five terms co-occurs the most with a preselected term k1 in a given database (e.g., Google). Co-occurrence is a kind of evidence of semantic connectivity.

2. Query Google for k1, for each of the candidate k2's separately, and for each k1 + k2 combination.

3. Compute the c-index for each case, always using the same k1 and different k2's.

4. The optimum combination of k1 and k2 is the one with the highest c-index.

5. Emphasize k1 and k2 in the document as required (e.g., in titles, descriptions, etc.).

Repeat the recipe for other search engines. c-indices may change, which indicates that semantic connectivity differs across databases. (Very important for SEOs!)
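The general procedure can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the tool mentioned in this thread: `count` is a hypothetical callable standing in for "run this query and read off the number of results" (no real search API is called), and the helper names are made up for the example. Multi-word keywords are wrapped in quotes, as suggested above, so the engine treats them as one phrase. For the demo, the counts are the Google figures for car, auto and automobile given in Case 1 of this thread.

```python
from typing import Callable, List, Tuple


def as_query_term(keyword: str) -> str:
    """Quote multi-word keywords so the engine treats them as a single phrase."""
    keyword = keyword.strip()
    return f'"{keyword}"' if " " in keyword else keyword


def c_index(n1: int, n2: int, n12: int) -> float:
    """c = n12 / (n1 + n2 - n12); returns 0.0 if neither term has any results."""
    union = n1 + n2 - n12
    return n12 / union if union > 0 else 0.0


def rank_candidates(k1: str, candidates: List[str],
                    count: Callable[[str], int]) -> List[Tuple[str, float]]:
    """Steps 2-4: query k1, each candidate k2 and each k1 + k2 pair, then sort by c."""
    q1 = as_query_term(k1)
    n1 = count(q1)
    scored = []
    for k2 in candidates:
        q2 = as_query_term(k2)
        scored.append((k2, c_index(n1, count(q2), count(f"{q1} {q2}"))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for a search engine: result counts keyed by query string.
# The figures are the Google counts for car/auto/automobile quoted in this thread.
google_counts = {
    "car": 224_000_000,
    "auto": 124_000_000,
    "automobile": 50_400_000,
    "car auto": 13_000_000,
    "car automobile": 10_500_000,
}

ranking = rank_candidates("car", ["auto", "automobile"], google_counts.__getitem__)
for k2, c in ranking:
    print(f"car + {k2}: c = {c:.4f}")
# car + automobile: c = 0.0398
# car + auto: c = 0.0388
```

Repeating the recipe for another engine would just mean passing a different `count` callable.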

Note.

We haven't discussed term transpositions, yet.
We haven't discussed cases with more than 2 keywords, yet.
We haven't discussed linguistic characteristics (e.g., c-indices for Spanish or French keywords).

Orion

Last edited by orion : 06-02-2004 at 11:51 AM.
06-02-2004   #7
Anthony Parsons
Rubbing the shine of the knobs who think they're better than everyone else...
 
Join Date: Jun 2004
Location: Melbourne Australia
Posts: 478
I actually understand that.
06-02-2004   #8
orion
Join Date: Jun 2004
Posts: 1,044

Excellent, Tony.

Love to talk more about the concept, including the connection with rankings and the time evolution of c-indices (c's change from time to time). For now, I need to get off the forum to attend to routine work and life issues. Till next time.

Orion
06-03-2004   #9
orion
Join Date: Jun 2004
Posts: 1,044

About c-Index Calculations

c-Indices are excellent tools for building thesauri, find-similar libraries, query expansions and clustering algorithms. Before presenting some examples of c-index calculations, I would like to point out that:

a. c-indices run from 0 to 1, and we are comparing small relative quantities. Thus, c-indices can be expressed as % or ppt (parts per thousand).

b. In a strict sense, and due to recall and precision arguments, the number of results in the c-index expression (the n's) is the number of documents in an IR database containing the corresponding queried term. This is not necessarily the same as the number of results retrieved and shown by the IR system. However, assuming good retrieval performance and strict adherence to pattern matching of regular expressions, the n's can be taken as the number of results produced by a search engine.

c. When k1 or k2 consists of more than one term, it should be enclosed in quotes (""). In this way, the IR system will interpret k1, k2, or any k as a single keyword (a phrase keyword).

d. We need to distinguish between c-indices for synonyms and c-indices for query expansion or query refinement. Example: the thesaurus utility of my MS Word shows auto and automobile as synonyms for the term car. But for the term calculator, it produces the following "synonyms": data processor, mainframe, mini computer, PC, multitasking computer, computer, CPU, analytical engine, artificial intelligence. These clearly correspond to query refinement and clustering considerations rather than to synonyms.

e. Interpretation of c-indices is not "black-and-white". One must consider semantic, language and usage characteristics. A single term may occur in other languages, and may have different meanings in different languages, countries or demographics. Carefully crafted c-indices can work as semantic discriminants; wrongly crafted c-indices can produce messy results. Welcome to the art of analytical semantics.


Having said that, let's do some simple calculations.

Case 1: Single terms (synonyms, similar terms)

By querying Google for car, auto and automobile, we obtain:

n1 (k1 = car) = 224,000,000
n2 (k2 = auto) = 124,000,000
n12 (car auto) = 13,000,000
c = 0.0388 or 38.8 ppt

n1 (k1 = car) = 224,000,000
n2 (k2 = automobile) = 50,400,000
n12 (car automobile) = 10,500,000
c = 0.0398 or 39.8 ppt

Results: in Google, k1 = car and k2 = automobile seem to have a slightly greater synonymity association (semantic connectivity) than k1 = car and k2 = auto. Note: the large number of results for k2 = auto is not surprising: (a) auto is a word in other languages (e.g., Spanish), and (b) auto is a root for automobile, automatic and derivative terms.


Case 2: Single terms (query refinement with similar concepts)

n1 (k1 = car) = 224,000,000
n2 (k2 = insurance) = 111,000,000
n12 (car insurance) = 9,000,000
c = 0.0276 or 27.6 ppt

n1 (k1 = auto) = 124,000,000
n2 (k2 = insurance) = 111,000,000
n12 (auto insurance) = 8,660,000
c = 0.0383 or 38.3 ppt

Results: in Google, auto insurance has a greater c-index than car insurance, and thus a greater semantic association (semantic connectivity).

A Final Note.

If we double-quote the combined queries, the c-indices will change, since the results for a quoted phrase are a subset of the results for the same unquoted terms: n12 shrinks while n1 and n2 stay the same. For example, in the above cases we obtain:

“car insurance” = 4,810,000 with c = 0.0146 or 14.6 ppt
“auto insurance” = 4,460,000 with c = 0.0193 or 19.3 ppt

Yet in Google, the results still indicate that "auto insurance" appears to be more connected than "car insurance".
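The Case 2 numbers and the quoted-phrase note can be rechecked with a few lines of Python; this is only a sketch that re-runs the arithmetic on the counts reported above. It also shows why quoting can only lower c here: with n1 and n2 fixed, c = n12 / (n1 + n2 - n12) grows with n12 (n12 sits in the numerator and is subtracted in the denominator), so the smaller quoted n12 gives a smaller c, while the auto-versus-car ordering survives in both cases.

```python
def c_index(n1: int, n2: int, n12: int) -> float:
    """Correlation index c = n12 / (n1 + n2 - n12)."""
    return n12 / (n1 + n2 - n12)


def to_ppt(c: float) -> float:
    """Express c in parts per thousand, rounded to one decimal place."""
    return round(c * 1000, 1)


# Google result counts reported in this thread (June 2004).
N_CAR, N_AUTO, N_INSURANCE = 224_000_000, 124_000_000, 111_000_000

ppt = {
    "car insurance": to_ppt(c_index(N_CAR, N_INSURANCE, 9_000_000)),
    "auto insurance": to_ppt(c_index(N_AUTO, N_INSURANCE, 8_660_000)),
    '"car insurance"': to_ppt(c_index(N_CAR, N_INSURANCE, 4_810_000)),
    '"auto insurance"': to_ppt(c_index(N_AUTO, N_INSURANCE, 4_460_000)),
}
for query, value in ppt.items():
    print(f"{query:<18} {value} ppt")
```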


About Language, Geolocation and Demographic Characteristics

In Mexico and Puerto Rico, auto means car, and it is also a stem of other terms and derivatives. However, the popular term for car there is not auto but coche in Mexico and carro in Puerto Rico. Thus geolocation and demographic interpretations are better confirmed with c-indices extracted from regional directories.

For a review of c-indices, read Baeza-Yates and Ribeiro-Neto's "Modern Information Retrieval" (1999, Addison-Wesley, Chapters 2 and 5). c-index analyzers are excellent analytical tools for semantic connectivity analysis and keyword targeting. They are also easy to build; I have written several applications.


Orion

Last edited by orion : 06-03-2004 at 11:27 AM.
06-03-2004   #10
Anthony Parsons
Rubbing the shine of the knobs who think they're better than everyone else...
 
Join Date: Jun 2004
Location: Melbourne Australia
Posts: 478
No wonder it takes me so long to do my keyword research. I do some of this now in great detail, but not with this exact science. I think this would potentially have the greatest benefit in my first example, the web design field, since it is the most competitive.

Perfecting this area on something so highly competitive could make an extraordinary difference in the way your rankings are achieved. Interesting.

What's even funnier, Orion, is that I understood everything you just said...still. I think that the first few clarifications helped me a lot actually. My understanding is good to continue. I liked those examples you used. That helped heaps.

It really can get to quite an exact science when you get down to the nitty gritty of it. Interesting to know this and much appreciated.
06-03-2004   #11
orion
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by Anthony Parsons

Perfecting this area on something so highly competitive could make an extraordinary difference in the way your rankings are achieved. Interesting.

It really can get to quite an exact science when you get down to the nitty gritty of it. Interesting to know this and much appreciated.

First, I would like to thank SearchEngineWatch.com for giving me the opportunity of introducing AI applied to IR to the dedicated members of the SEO/SEM community like you, Tony.

Second, please excuse me in advance if I make too many typos.

Tony, you are right. SEOs may need to start tapping into c-indices right away. I'm all against rumors and 2nd-guessing arguments disguised as "SEO expert advice". There is no need for 2nd-guessing search engines or using trial-and-error approaches when there are many analytical tools out there, kept in a kind of black box by IR scientists. Many of these tools predate the Internet and search engines and are well known to IR grad students and search engine engineers. SEOs can benefit from them.

Now about c-indices for semantic connectivity research.

The technique is extremely simple and elegant. However, in my years of using c-indices and similar tools for systematically optimizing document relevancy, I have learned that the main risk (or drawback?) lies in interpreting them. But once one understands the nitty gritty, it is just a matter of

(a) computing values for candidate keywords using sound critical thinking
(b) clearly identifying what the client(s) want to target and emphasize (i.e., sell or offer as products or services), and then doing some number crunching.

Over the weekend I will discuss the effect of keyword transpositions on semantic connectivity for a given database (search engine) and why this is important when crafting textual information and doing keyword targeting (e.g., titles, descriptions, meta tags). An introduction to combinatorial theory will be given. Once we understand the basics we can elaborate on complex cases (e.g., c-indices for more than 2 keywords).


Orion

Last edited by orion : 06-03-2004 at 08:43 PM.
06-03-2004   #12
Anthony Parsons
Rubbing the shine of the knobs who think they're better than everyone else...
 
Join Date: Jun 2004
Location: Melbourne Australia
Posts: 478
Mate, I am looking forward to it. Thanks for the learning curve thus far. I love learning new things.....
06-04-2004   #13
cariboo
Member
 
Join Date: Jun 2004
Location: Paris, France
Posts: 33
Thanks a lot for this thread Orion...

I'm trying to build a "smart" search engine for a complex database for one of my sites, and I'm working with semantics to improve relevancy. I'm very interested in what you said because I use a technique similar to the one you described above.

I will have many questions and problems to submit here, because I'm just a beginner in this science.

I agree with you when you say SEOs can really improve their results by using a "scientific" approach like this. Many SEOs work like "craftsmen" and use only their "know-how". But optimization has nothing to do with magic, and a scientific approach will not do any harm to their clients...
06-04-2004   #14
strategicrankings
Seolid.com - Ranking Websites First - since 1999
 
Join Date: Jun 2004
Posts: 216
Great subject

Nice to read threads like these. Thanks Orion, I will make a point of following this thread.

Riley
06-04-2004   #15
rankforsales
Member
 
Join Date: Jun 2004
Location: near Montreal, Quebec, Canada
Posts: 8
This is all very interesting, but personally, before I jump into any of this, I need more proof and more evidence that this may improve rankings in any way.

Also, what's Google's view on this? Are there any Google members in this great new forum that may wish to talk a bit more about this, without giving away the 'Google secret recipe'? ;-) (evil grin)

Serge Thibodeau,
Professional SEO,
Rank for $ales

Last edited by Joseph Morin : 07-30-2004 at 01:13 PM. Reason: No signature links allowed
06-04-2004   #16
Opie1Canopie
Member
 
Join Date: Jun 2004
Posts: 20
Orion - this is all very fascinating, although my head spins thinking about how I would actually find enough time to do this level of research for my keyword lists.

You said you are working on a search engine - how about building us a program that does this analysis? I know I'd be willing to fork over some money if this made my keyword picks more solid.

Opie1Canopie
06-04-2004   #17
NFFC
"One wants to have, you know, a little class." DianeV
 
Join Date: Jun 2004
Posts: 468
counter points

Orion, nice post but as an informal scientist [very informal some would say] I would like to point out some assumptions you have made that IMHO are not entirely accurate.

>My goal is that SEO/SEM R&D departments once for all start using scientific tools rather than mere rumors and 2nd-guessing

Trust me, this is a huge area of activity for SEO's. Many IR/AI "experts" already earn a considerable % of their income from servicing the needs and goals of SEO's. Just because you don't see it posted on a forum don't assume it doesn't exist.

>Understanding how one can measure term co-occurrence could be used to carefully select keywords semantically connected in a given search engine database

As rankforsales points out, it only matters if any of the major engines use it. Any evidence that any do apart from crude stemming?

I think the major false assumption you have made is that SEO is a branch of science, I consider it to be an art. For sure certain tools can be used to help focus the artist on his craft but ultimately it comes down to an individual making decisions, very often based on just a "feel". I see SEO's as closely equated to record producers. You can have all the technology in the world but the difference between a hit record and a stinker comes down to an individual sitting in front of the mixing desk and moving those sliders until it is "just right". You can't do that with an oscilloscope, you need an "ear".

>I'll try to keep my posts very simple and basic.

hehe
06-04-2004   #18
orion
Join Date: Jun 2004
Posts: 1,044

What we have presented so far.

1. Term co-occurrence is a measure of semantic connectivity and easy to compute.
2. Its geometrical nature can be established through Fuzzy Set Theory.
3. A mathematical treatment is provided in "Modern Information Retrieval" (see book).

What we have explained or pointed out so far.

1. Term co-occurrence is a tool for building libraries of synonyms, find-similar, query expansion and clustering algorithms.
2. For dictionary-based terms, c-indices are a measure of synonymity.
3. For routine queries, co-occurrence is a measure of how topically connected terms are in a given database.

What we have explained or mentioned so far.

1. How to calculate c-indices for the particular case of c12; i.e., two terms, only.
2. c-indices can be calculated for IR systems as well as for commercial search engines.
3. c-indices often are different from engine to engine.
4. c-indices can be time-dependent.

What we haven't explained, yet.

1. Time-dependent semantic connectivity (query relevance dynamics).
2. Almost everything else.
3. Special cases in which c-indices are not enough, requiring other tools for analysis.


Assumptions made.

1. The n's are treated as number of retrieved results.
2. The database responds to pattern matching of regular expressions.
3. For now, that c-indices are time-invariant.

Assumptions, statements, or claims I haven't made in this forum.

1. That SEO is a science.
2. That I work for a search engine company (Not even close).
3. That I am building a search engine (however, I have constructed and use an IR system reserved for research and intelligence and for remotely searching databases).
4. That I am the "owner" of the truth or cannot make rational mistakes while presenting my thoughts (indeed I make too many typos).

Clarifications I have made and further refinements

1. An IR system and a commercial search engine are different settings.
2. Many analytical tools used in IR can be used to optimize information contained in commercial online documents (e.g., Web pages)
3. n's are number of results containing queried terms.

Let me expand on point 3.

In a strict IR sense, the n's (above) are number of results containing the queried term(s).

Let

n(i,db) = # retrieved results containing the queried term i in a given database db (Google, Yahoo, MSN, etc)
n(0,db) = # retrieved results not containing the queried term, i.
n(it,db) = # of total results retrieved by querying i in a given db.

n(it, db) = n(i, db) + n(0, db)

Assuming good IR performance in recall and precision (see reference textbook) and strict adherence to pattern matching of regular expressions, n(0, db) should be negligible; thus the main assumption is that

n(it, db) = n(i, db).
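A small sketch of what this assumption buys (an illustration, not part of the original argument): if the non-matching results n(0, db) were the same fraction f of every count, each n would be scaled by (1 - f), the factor would cancel in c = n12 / (n1 + n2 - n12), and the c-index would be unchanged. Only unequal fractions across the three queries would bias it. The fractions f1, f2, f12 below are hypothetical parameters, since a search engine's results page does not report n(0, db); the car/insurance totals are the counts from this thread.

```python
def c_index(n1: float, n2: float, n12: float) -> float:
    """Correlation index c = n12 / (n1 + n2 - n12)."""
    return n12 / (n1 + n2 - n12)


def c_corrected(n1t: float, n2t: float, n12t: float,
                f1: float, f2: float, f12: float) -> float:
    """c computed from n(i, db) = n(it, db) * (1 - f), where f is an assumed
    (hypothetical) fraction of retrieved results that do not contain the term."""
    return c_index(n1t * (1 - f1), n2t * (1 - f2), n12t * (1 - f12))


# car / insurance counts from this thread, treated as totals n(it, db).
raw = c_index(224e6, 111e6, 9e6)
uniform = c_corrected(224e6, 111e6, 9e6, 0.10, 0.10, 0.10)  # same f everywhere
skewed = c_corrected(224e6, 111e6, 9e6, 0.30, 0.05, 0.05)   # unequal f's
print(f"raw {raw:.4f}  uniform {uniform:.4f}  skewed {skewed:.4f}")
```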

As previously mentioned, over the weekend I will elaborate a bit on other cases. I didn't plan to post today, since I had scheduled the day for ongoing research and meetings; however, recent posts convinced me to do so.

Talking about recent posters.

Welcome to this thread, Cariboo, Riley, Serge, Opie, CounterPoint. It is an honor to have you all here, interested in the discussion and eager to share thoughts and ideas.


Cariboo - I agree 100% with your posts, especially the part that says "optimization has nothing to do with magic". Well put.

Riley - "Nice to read threads like these." Thanks and again, welcome Riley. We need more threads dedicated exclusively to scientific issues. SearchEngineWatch.com editors must be praised for such a great decision. More, more, more threads are needed.

Serge - "Are there any Google members in this great new forum ...?" I don't know.

To Opie:

1. "You said you are working on a search engine". I never said that. But now that you mention it, I already have an IR engine (see previous lines, above).
2. "how about building us a program that does this analysis?" I have already done that.
3. "I'd be willing to fork over some money if this made my keyword picks more solid." Now we are talking business. I'm listening and eager to team up with anyone interested in taking optimization to the next level. You're in. Any suggestions?

To NFFC

1. "Many IR/AI "experts" already earn a considerable % of their income from servicing the needs and goals of SEO's." Very true and well put, NFFC.
2. "Just because you don't see it posted on a forum don't assume it doesn't exist." I haven't assumed that anywhere in the forum; not even close. Still, I agree with you that some IR/AI "experts" are in the business.
3. "As rankforsales points out, it only matters if any of the major engines use it." I agree with you both. The main thesis of this thread is the presentation of analytical tools well known to IR researchers and how these tools can be used to eliminate, or at least minimize, trial-and-error and 2nd-guessing approaches. As mentioned to Tony (see previous posts), SEOs should stick to what works well for their clients, and that includes good copywriting styles and any proven SEO technique.
4. "Any evidence that any do apart from crude stemming?" Stemming issues will soon be addressed. Let's take some baby steps and concentrate on the basics first. Not everyone is at the same pace in this forum.
5. "I think the major false assumption you have made is that SEO is a branch of science,..." I never said anywhere in this thread that SEO is a branch of science (not even close).
6. "...I consider it to be an art." I agree with you 100%, NFFC. SEO is an art. In fact, optimization is the art of finding a happy medium or, as many technical dictionaries say, "optimization consists in finding the best possible solution to a problem within a feasible region".


My final thoughts.

This section of the forum, I think, has been reserved for heuristics and the science of search, as determined by the editorial staff of SearchEngineWatch.com and the Editor of this forum.

Accordingly, I feel posts should reflect the spirit and intention of the editors until they decide to change the "rules of engagement" for this thread and the forum in general.

I'm not a writer but a scientist (LOL to myself, as if anybody cares!). I will keep elaborating on the main thesis of this thread and try to be as polite as possible and explain things to the best of my knowledge. I know that in order to write as articulately as many of you or the editors, I'll need editing help. A lot.


Orion

Last edited by orion : 06-04-2004 at 09:25 PM.
06-05-2004   #19
Dodger
Honorary Member
 
 
Join Date: Jun 2004
Location: Central US
Posts: 349
Orion, thank you for this thread it is quite interesting and I for one am following it with interest. Any knowledge concerning the inner workings of a search engine is worth listening to, no matter how minor it may play in the grander scheme of things -- but it goes a long way in a better understanding of the beast as a whole with every little bit that gets stored away in the back of your mind.

Search engines are large databases filled with records, and those records are all accessed quicker than the blink of an eye. My understanding of quick access is the use of index servers, whose sole purpose is the storage of a key value and an index pointing to one record in the database.

Your c-index deals directly with this aspect of database indexing. It is a basic building block. I have noticed some here who are building their own search engine, and this type of information will be invaluable to them -- that I have no doubt.

The name of this forum is Search Engine Watch. That is a broad term, and it is not exclusive to just "achieving great rankings" and I did not take your post to openly state that it will. It is an interesting topic, and I am looking forward to more of the same.
__________________
I am Ronnie
06-05-2004   #20
DanThies
Keyword Research Super Freak
 
Join Date: Jun 2004
Location: Texas, y'all
Posts: 142
Well, I thought that I could safely ignore these new forums for a while... not so. Thanks Orion for the interesting posts.

We've been working our way down a similar road with respect to relevance. It's sort of assumed that folks like Applied Semantics etc. are already using techniques like this for a number of purposes.

Our main effort to apply this particular branch of IR theory has been attempting to develop a click-through model for organic search listings. The relative frequency with which search terms appear on the web gives us a sense of whether the specific search term is broad or narrow.

One thing that occurs to me, in attempting to use Google (or any other search engine), is that we don't know how much co-occurrence is influenced by SEO, and how much is really due to natural patterns.

For example, someone might decide that "car insurance" is a really important search term and construct 10,000 doorway pages in an attempt to influence the search results. Whether or not they succeed in influencing rankings at the top of the search results, they can certainly influence the perceived semantic relationships.

It doesn't make much difference in terms of searching within the database, but you also have a certain amount of skew in language, just based on what sorts of information are being put onto the web vs. what might occur in other forms of communication. Certain topics may lend themselves to larger numbers of documents, or may be more likely to have information published online.