Search Engine Watch
Old 03-30-2005   #1
Nacho
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Inverted Word Barrels

In Mike Grehan and Christine Churchill's latest newsletter they cover some fascinating things. Here is an amazing article by Dr. E. Garcia (Orion), "The Keyword Density of Non-Sense". In the words of Mike Grehan:
Quote:
This issue of e-marketing-news brings an exclusive article from my dear friend, Dr. Edel Garcia, which blows the myth of keyword density analysis into space. And it will likely change the way you do optimisation, forever.
Want to discuss the article?

Last edited by Nacho : 03-30-2005 at 03:14 PM. Reason: fixed link
Old 03-30-2005   #2
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
The article itself is terrific, but it made me very sad. I've been working for a while on a tool to measure term weight, and this article has convinced me that only an extraordinary budget and a very talented programmer could build such a thing.

Unless... Does anyone here know of good page segmentation analysis code for PHP, and code that can perform automatic stemming and comparison for the English language?

Very depressed... Now that we know this, will we ever be able to accurately measure term strength on a page?

BTW - Thanks Orion, it's better to know you're wrong than to wallow in your ignorance - I just wish I had known 3 months ago.
Old 03-30-2005   #3
Nacho
 
Join Date: Jun 2004
Location: La Jolla, CA
Posts: 1,382
Time to move on . . . reality sometimes hurts.
Old 03-31-2005   #4
orion
 
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by randfish
Very depressed... Now that we know this, will we ever be able to accurately measure term strength on a page?
Hi, Rand.


Linearization is a powerful thing we should all be doing before even trying to optimize a document. Use linearization as part of a GAP analysis before (and after) optimizing a site. It reveals a lot of things that we are presenting to search engines and may not be aware of.

I believe the next step would be educating and training the industry on basic IR practices. SEOs would need to evolve, adapt and implement the real things, such as linearization, lexicographic analysis and pattern recognition.

If we are not able (or willing) to evolve and improve, then we can be phased out, I think.

Orion
Old 03-31-2005   #5
randfish
Member
 
Join Date: Sep 2004
Location: Seattle, WA
Posts: 436
Orion -

You know me. I want to do these things and provide the ability for others to do them, too. It's just the development time and expense that make it difficult. The C-Index tool for example, and now the Term Weight analysis system - these are not easy things to make, and my company, as you know, is quite small.

Are there any offerings you know of on the web that knowledgeable SEOs can use to measure and quantify these things? Or is the development of such procedures up to us? If so, I would guess that my personal creation of these tools will be many months out. I don't suppose there are any shortcuts.
Old 03-31-2005   #6
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
I've just read the article, but I can't say that I followed all the maths. However, the thing that struck me most is to make better use of CSS. I use it to position the body text right under the <body> tag, and the other elements anywhere below that. The idea of writing linearized content first (or linearizing existing content first), including highlighted words (headings), and then adding the layout code without altering the text's relative positioning looks to be a lot better.

That's something that can be done immediately, without any programs to calculate weights.

Good stuff!
Old 03-31-2005   #7
orion
 
Join Date: Jun 2004
Posts: 1,044

Quote:
Originally Posted by PhilC
The idea of writing linearized content first (or linearizing existing content first), including highlighted words (headings), and then adding the layout code without altering the text's relative positioning looks to be a lot better.
Well grasped and very much right on target, Phil! Congrats.

This procedure also does wonders:

Step One: Linearize, tokenize, filtrate and stem the client's code before doing any SEO, as part of your standard GAP analysis. We already have this embedded in our on-topic analyzer (the stemming part was recently added, for English-only pages).

Step Two: Do the optimization.

Step Three: Repeat Step One and, if necessary, fix any survival issues.
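
For readers who want to try the three steps without special software, here is a minimal Python sketch. It is not Orion's on-topic analyzer. The function names (`linearize`, `tokenize`, `filter_stopwords`, `crude_stem`, `term_profile`, `profile_diff`) and the tiny stopword list are assumptions made for illustration, and `crude_stem` only strips a plural "s", so it is a placeholder and not the Porter algorithm.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "our", "the", "to", "we"}


class _TextCollector(HTMLParser):
    """Collects text in source order, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def linearize(html):
    """Strip the markup and return the text as one whitespace-normalized line."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.parts).split())


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Placeholder stemmer: strips one plural 's'. NOT the Porter algorithm."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def term_profile(html):
    """Step One: linearize, tokenize, filter and stem, then count the terms."""
    tokens = filter_stopwords(tokenize(linearize(html)))
    return Counter(crude_stem(t) for t in tokens)


def profile_diff(before, after):
    """Step Three: terms whose counts changed, as {term: (before, after)}."""
    terms = sorted(set(before) | set(after))
    return {t: (before[t], after[t]) for t in terms if before[t] != after[t]}


before = term_profile("<h1>Blue Widgets</h1><p>We sell the widgets.</p>")
after = term_profile("<h1>Blue Widgets</h1><p>We sell blue widgets.</p>")
print(profile_diff(before, after))  # {'blue': (1, 2)}
```

Running the profile before and after the optimization step and diffing the two makes it easy to see which terms were gained or lost.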


Orion
Old 03-31-2005   #8
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Quote:
Step One: Linearize, tokenize, filtrate and stem client's code before doing any SEO
Linearizing is easy enough and can be done by anyone right now, but the other 3 stages really need software, or a lot of time. I write my own programs, but most people can't do that. I wonder how long it will be before online services start springing up for this.
Old 03-31-2005   #9
Daria_Goetsch
SEOExplore.com - SEO Research Directory
 
Join Date: Jun 2004
Location: Eureka, California
Posts: 226
Quote:
Originally Posted by philc
I wonder how long it will be before online services start springing up for this
Phil, as I was reading this thread I had exactly the same thought you just mentioned. That would be quite an addition to current SEM tools.

Last edited by Daria_Goetsch : 03-31-2005 at 06:47 PM. Reason: Added quote.
Old 03-31-2005   #10
orion
 
Join Date: Jun 2004
Posts: 1,044

True. Linearization, tokenization, filtration and stemming can all be done by hand by anyone. However, with large sites (1,000+ pages, like the ones we deal with), it is better done automatically, as we do.

I have been approached by some to license my software. Time will tell if we agree to terms.

The good thing about all this is that SEOs/SEMs as an industry are evolving for the better.

Orion

Last edited by orion : 03-31-2005 at 11:46 PM.
Old 03-31-2005   #11
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Well, before you license the software, how about posting, or linking to, a suitable list of stopwords?

And while you're doing that, we could certainly make use of the stemming database/dictionary.
Old 03-31-2005   #12
Frank Kilkelly
Member
 
Join Date: Mar 2005
Location: Malmö, Sweden
Posts: 19
While reading this thread, it strikes me that someone out there must have developed a decent linearization tool at least. Anyone know of any..?
Old 03-31-2005   #13
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
There are some issues with linearization. I think it's obvious that alt text maintains its position in the text, but do you take account of text in title attributes, Orion? Has anyone definitively discovered whether or not title attributes are taken account of in any engines, and, if so, title attributes for which tags?

Linearization isn't 100% straightforward, although the difference that such things as title attributes would make would be very small - I would think.
Old 03-31-2005   #14
nikoska
Search Engine Marketing in Greece
 
Join Date: Jun 2004
Location: Athens / Greece
Posts: 5
Even though I consider myself a newbie on the concepts of SEM and SEO, I must say that the excellent (as always) analysis from Dr. Garcia has tied together some thoughts that had occurred to me while reading some articles on the subjects mentioned.

Maybe this happens because I am still in a "learning mode" and not that much "involved" in the practices already used by SEOs, or because of my background in mathematics.

Since I am trying to start the first SEM agency here in Greece (and kickstart the concepts of SEM and SEO, since we are far behind), my main concern is the way and the degree to which the procedures mentioned, along with other methods, work or don't work for documents written in a language other than English (in my case, Greek).

I have begun making some initial tests and I will share them with you as soon as possible.
Old 03-31-2005   #15
orion
 
Join Date: Jun 2004
Posts: 1,044
Linearization

Quote:
Originally Posted by frank
While reading this thread, it strikes me that someone out there must have developed a decent linearization tool at least. Anyone know of any..?
Frank, I already developed the software that does linearization, tokenization, filtration and stemming. It is embedded in my on-topic analyzer and c-index machine.


Quote:
Originally Posted by PhilC
There are some issues with linearization. I think it's obvious that alt text maintains its position in the text, but do you take account of text in title attributes, Orion? Has anyone definitively discovered whether or not title attributes are taken account of in any engines, and, if so, title attributes for which tags?

Linearization isn't 100% straightforward, although the difference that such things as title attributes would make would be very small - I would think.
Phil, based on my experience doing linearization, I don't see any issue. It is pretty much 100% straightforward. With regard to title attributes, all HTML attributes are ignored during linearization, except the ALT attribute in IMG tags and the summary attribute of the TABLE tag.
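
The attribute rule described here can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the post, not Orion's software; the names `Linearizer` and `linearize` are hypothetical. Since HTML 4 defines `summary` on the `table` element, the sketch reads it from there. Every other attribute, including `title`, is dropped, and the kept text stays at the position where its tag occurs.

```python
from html.parser import HTMLParser


class Linearizer(HTMLParser):
    """Keeps text in source order; the only attributes kept are img alt and table summary."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside script or style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            return
        attr_map = dict(attrs)
        if tag == "img" and attr_map.get("alt"):
            self.parts.append(attr_map["alt"])
        elif tag == "table" and attr_map.get("summary"):
            self.parts.append(attr_map["summary"])
        # All other attributes (title, class, id, ...) are ignored.

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def linearize(html):
    """Return the page text as a single whitespace-normalized line."""
    parser = Linearizer()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


page = (
    '<p title="ignored">Buy <img src="w.gif" alt="red widgets" '
    'title="also ignored"> today</p>'
    '<table summary="Widget prices"><tr><td>$5</td></tr></table>'
)
print(linearize(page))  # Buy red widgets today Widget prices $5
```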


Orion

Last edited by orion : 04-01-2005 at 02:48 PM. Reason: typos
Old 04-02-2005   #16
orion
 
Join Date: Jun 2004
Posts: 1,044
Spanish stemming.

Quote:
Originally Posted by orion
...based on my experience doing linearization, I don't see any issue.
I'll take that back. We do have some issues when we do stemming for Spanish text. The Porter algorithm does not work, since it is for English text.

I'll be working on a Spanish stemmer over the weekend. If I succeed then I could offer a nice bilingual package.
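
To see why an English stemmer falls short on Spanish, here is a small Python sketch of just Step 1a of the Porter algorithm, the step that handles English plural endings (SSES to SS, IES to I, SS unchanged, S removed). The function name `porter_step_1a` is an assumption, and this is only a fragment of the full algorithm, shown for illustration.

```python
def porter_step_1a(word):
    """Step 1a of the Porter stemmer: English plural endings only."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


# English: singular and plural end up with the same stem.
print(porter_step_1a("cats"), porter_step_1a("cat"))           # cat cat

# Spanish: the plural adds "es" and drops the accent, so the forms do not meet.
print(porter_step_1a("canciones"), porter_step_1a("canción"))  # cancione canción
```

The Spanish plural and singular come out as different stems, so their counts would not be merged.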


Orion
Old 04-02-2005   #17
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Do you know where to find a good/useful list of stopwords, orion?

I've got the HTML code dealt with (and I'm taking the Title text out because that's stored in a different index), and I could do with a good stopword list.
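
As a rough illustration of taking the Title text out, the Python sketch below separates it from the rest of the page text. It assumes that "Title text" means the contents of the <title> element; the names `TitleBodySplitter` and `split_title` are hypothetical, and this is not Phil's program.

```python
from html.parser import HTMLParser


class TitleBodySplitter(HTMLParser):
    """Routes text inside <title> to one list and all other text to another."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        target = self.title_parts if self._in_title else self.body_parts
        target.append(data)


def _normalize(parts):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join(" ".join(parts).split())


def split_title(html):
    """Return (title_text, remaining_text) for an HTML page."""
    parser = TitleBodySplitter()
    parser.feed(html)
    parser.close()
    return _normalize(parser.title_parts), _normalize(parser.body_parts)


page = (
    "<html><head><title>Blue Widgets</title></head>"
    "<body><h1>Widgets</h1><p>Cheap blue widgets.</p></body></html>"
)
print(split_title(page))  # ('Blue Widgets', 'Widgets Cheap blue widgets.')
```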

<added>

This is for my own use on my desktop - it's not being written for the web - so no competition.

</added>
Old 04-02-2005   #18
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Cancel that stopwords list request - I found one
Old 04-02-2005   #19
PhilC
Member
 
Join Date: Oct 2004
Location: UK
Posts: 1,657
Quote:
Originally Posted by orion
.... Porter Algo does not work since it is for English text.
That was very handy. I'd never heard of such an algo, but I found a VB version and it works a treat. I'd been thinking that a database would be needed for stemming.


A question, Orion:

In the passage about what you call "burning the trees", am I right in understanding that expression to mean simply that linearization produces no distinct patterns of topics/keywords/phrases? I.e. there are no identifiable chunks of text that are about specific topics/keywords/phrases, and it's all just a mix-up of several topics/keywords/phrases. Is that what is wrong with fig.1's text?

Last edited by PhilC : 04-02-2005 at 03:49 PM.
Old 04-02-2005   #20
Marcia
 
Join Date: Jun 2004
Location: Los Angeles, CA
Posts: 5,476
Google search for porter stemmer