|
#1
Inverted Word Barrels
In Mike Grehan and Christine Churchill's latest newsletter they cover some fascinating things. Among them is an amazing article by Dr. E. Garcia (Orion), "The Keyword Density of Non-Sense", which in the words of Mike Grehan:
Quote:
Last edited by Nacho : 03-30-2005 at 02:14 PM. Reason: fixed link
|
#2
The article itself is terrific, but it made me very sad. I've been working for a while on a tool to measure term weight, and this article has convinced me that only an extraordinary budget and a very talented programmer could build such a thing.
Unless... does anyone here know of good page segmentation analysis code for PHP, and code that can perform automatic stemming and comparison for the English language? Very depressed... Now that we know this, will we ever be able to accurately measure term strength on a page?
BTW - Thanks, Orion. It's better to know you're wrong than to wallow in your ignorance - I just wish I had known 3 months ago.
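For what it's worth, a rough first pass at term weight may not need a big budget. The sketch below assumes the textbook tf*idf weighting (term frequency in the page times the log of collection size over document frequency). That is only one common IR scheme and an assumption here, not the model any particular engine is known to use. It is written in Python rather than PHP, skips stemming and segmentation entirely, and the function name and sample documents are made up for illustration.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    # Naive whitespace tokenization; no stopword removal or stemming here.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # local term frequency within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical three-page "collection".
docs = [
    "cheap widgets cheap prices",
    "widgets for sale",
    "history of the widget trade",
]
for page_weights in term_weights(docs):
    # Show the two heaviest terms per page.
    print(sorted(page_weights.items(), key=lambda kv: -kv[1])[:2])
```

Under this scheme a term that occurs in every page of the collection gets a weight of zero no matter how often it is repeated on one page, which is something a plain density percentage cannot express.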
|
#3
Time to move on . . . reality sometimes hurts.
|
|
#4
Quote:
Linearization is a powerful thing we all should be doing before even trying to optimize a document. Use linearization as part of a GAP analysis before (and after) optimizing a site. It reveals a lot of things we are presenting to search engines and that we may not be aware of.
I believe the next step would be educating and training the industry on basic IR practices. SEOs would need to evolve, adapt and implement the real things, such as linearization, lexicographic analysis and pattern recognition. If we are not able (or willing) to evolve and improve, then we can be phased out, I think.
Orion
|
#5
Orion -
You know me. I want to do these things and provide the ability for others to do them, too. It's just the development time and expense that make it difficult. The C-Index tool, for example, and now the Term Weight analysis system - these are not easy things to make, and my company, as you know, is quite small.
Are there any offerings you know of on the web that knowledgeable SEOs can use to measure and quantify these things? Or is the development of such procedures up to us? If so, I would guess that my personal creation of these tools will be many months out. I don't suppose there are any shortcuts?
|
#6
I've just read the article, but I can't say that I followed all the maths. However, the thing that struck me most is to make better use of CSS. I use it to position the body text right under the <body> tag, and the other elements anywhere below that. The idea of writing linearized content first (or linearizing existing content first), including highlighted words (headings), and then adding the layout code without altering the text's relative positioning looks to be a lot better.
That's something that can be done immediately, without any programs to calculate weights. Good stuff!
|
#7
Quote:
This procedure also does wonders:
Step One: Linearize, tokenize, filter and stem the client's code before doing any SEO, as part of your standard GAP analysis. We already have this embedded in our on-topic analyzer (the stemming part was recently added, for English-only pages).
Step Two: Do the optimization.
Step Three: Repeat Step One and, if necessary, fix any survival issue.
Orion
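Step One above can be sketched roughly as follows. This is a minimal illustration in Python, not Orion's on-topic analyzer: the stopword list is a tiny made-up sample, the stemmer is naive suffix stripping (a stand-in, not the Porter algorithm), and every name in it is hypothetical.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption); a real one is far longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


class Linearizer(HTMLParser):
    """Collect text nodes in source order, skipping script/style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth counter for script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def linearize(html):
    """Strip the markup and return the text in the order it appears in the source."""
    parser = Linearizer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Naive suffix stripping; a placeholder, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(html):
    """Run all four steps: linearize, tokenize, filter, stem."""
    return [crude_stem(t) for t in filter_stopwords(tokenize(linearize(html)))]


# Hypothetical page fragment.
page = "<h1>Optimizing Pages</h1><style>p{color:red}</style><p>The pages of a linked site</p>"
print(analyze(page))
```

Running `analyze` on the same page before and after optimization (Steps One and Three) gives two term lists that can be compared directly to see which terms survived.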
|
#8
Quote:
|
#9
Quote:
Last edited by Daria_Goetsch : 03-31-2005 at 05:47 PM. Reason: Added quote.
|
#10
True. Linearization, tokenization, filtration and stemming can be done by anyone by hand. However, with large sites (1,000+ pages, like the ones we deal with), it is better done automatically, as we do.
I have been approached by some to license my software. Time will tell if we agree to terms. The good thing about all this is that SEOs/SEMs as an industry are evolving for the better.
Orion
Last edited by orion : 03-31-2005 at 10:46 PM.
|
#11
Well, before you license the software, how about posting, or linking to, a suitable list of stopwords?
And while you're doing that, we could certainly make use of the stemming database/dictionary.
|
#12
While reading this thread it strikes me that someone out there must have developed a decent linearization tool at least. Anyone know of any?
|
|
#13
There are some issues with linearization. I think it's obvious that alt text maintains its position in the text, but do you take account of text in title attributes, Orion? Has anyone definitively discovered whether or not title attributes are taken account of in any engines and, if so, title attributes for which tags?
Linearization isn't 100% straightforward, although the difference that such things as title attributes would make would be very small - I would think.
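One way to see how much difference attribute text makes is to linearize the same markup with and without it. The sketch below is a rough Python illustration built on the standard library's `html.parser`; the class name, the `include_alt` / `include_title` switches and the sample snippet are all hypothetical, and whether any engine actually reads title attributes remains the open question above.

```python
from html.parser import HTMLParser


class AttrAwareLinearizer(HTMLParser):
    """Linearize HTML, optionally keeping alt/title text at its source position."""

    def __init__(self, include_alt=True, include_title=False):
        super().__init__()
        self.include_alt = include_alt
        self.include_title = include_title
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # alt text is only taken from <img> tags in this sketch.
        if self.include_alt and tag == "img" and attr_map.get("alt"):
            self.chunks.append(attr_map["alt"].strip())
        # title attributes are taken from any tag that carries one.
        if self.include_title and attr_map.get("title"):
            self.chunks.append(attr_map["title"].strip())

    def handle_data(self, data):
        # Script/style filtering is left out to keep the example short.
        if data.strip():
            self.chunks.append(data.strip())


def linearize(html, include_alt=True, include_title=False):
    """Return the text of `html` in source order, with optional attribute text."""
    parser = AttrAwareLinearizer(include_alt, include_title)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical snippet with both an alt and a title attribute.
snippet = '<p>Buy <a href="/w" title="widget catalog">widgets</a> <img src="w.png" alt="blue widget"> today</p>'
print(linearize(snippet))
print(linearize(snippet, include_title=True))
```

Comparing the two printed lines shows exactly which extra terms, and in which positions, the title attributes would contribute if they were counted.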
|
#14
Even though I consider myself a newbie on the concepts of SEM and SEO, I must say that the excellent (as always) analysis from Dr. Garcia has tied together some thoughts that had occurred to me while reading some articles on the subjects mentioned.
Maybe this happens because I am still in a "learning mode" and not that much "involved" in the practices already used by SEOs, or because of my background in mathematics. Since I am trying to start the first SEM agency here in Greece (and kickstart the concepts of SEM and SEO, since we are far behind), my main concern is the way and the degree to which the procedures mentioned, along with other methods, work or don't work for documents written in a language other than English (in my case, Greek). I have begun making some initial tests and I will share them with you as soon as possible.
|
#15
Quote:
Quote:
Orion
Last edited by orion : 04-01-2005 at 01:48 PM. Reason: typos
|
#16
|
||||
|
||||
|
Quote:
I'll be working on a Spanish stemmer over the weekend. If I succeed, then I could offer a nice bilingual package.
Orion
|
#17
|
|||
|
|||
|
Do you know where to find a good/useful list of stopwords, Orion?
I've got the HTML code dealt with (and I'm taking the Title text out because that's stored in a different index), and I could do with a good stopword list.
<added> This is for my own use on my desktop - it's not being written for the web - so no competition. </added>
|
#18
|
|||
|
|||
|
Cancel that stopwords list request - I found one
|
#19
|
|||
|
|||
|
Quote:
A question, Orion: In the passage about what you call "burning the trees", am I right in understanding that expression to mean simply that linearization produces no distinct patterns of topics/keywords/phrases? I.e. there are no identifiable chunks of text that are about specific topics/keywords/phrases, and it's all just a mix-up of several topics/keywords/phrases. Is that what is wrong with fig.1's text?
Last edited by PhilC : 04-02-2005 at 02:49 PM.
|
#20
Google search for porter stemmer
|