Google's Assessed Features


Anthony Parsons
06-02-2004, 11:00 AM
This has created a lot of interest at my forum, and we are trying to expand the list of Google's assessed features as far as possible. Whether proved or not, we would like to get as much input into this as we can and see how many features, or supposed features, we're dealing with from good old G.

This list so far:

Revised List

***************************************
# PageRank Algorithm
# Inbound Link Text
# Inbound Link URL
# Outbound Link Text
# Outbound Link URL
# Internal Link Text
# Internal Link URL
# Link Quantity Per Page
# Varied Inbound Link Text
# Link Title Tag
***************************************
# Keywords in <title>
# Keyword Density in <title>
# <title> characters
# Keywords in <description>
# Keyword Density in <description>
# <description> characters
# Keywords in <body>
# Keyword Density in <body>
# Keywords in first paragraph
# Keyword font size
# Keyword proximity
# Keyword phrase order
# Keywords in <h1>
# Keywords in <h2>
# Keywords in <h3>
# Keywords in <strong>
# Keywords in <strong> linked
# Keywords in <em>
# Keywords in <em> linked
# Keyword order
# Keyword proximity
# Keywords and stop words or symbols ordering and placement
# Double Tag Use
***************************************
# Linked Images Keywords in "alt" attribute
# Linked Images Keywords "alt" Density
# Keyword stuffing "alt" attribute
# Image size (KB)
# Image size (1x1)
***************************************
# Keywords in Domain Name
# Hyphens in URL
# Underscores in URL
# Keywords in Folder Name
# Keywords in File Name
# Folder / File Access
# https Exclusion
# Domain Name Length
# Domain Extension (.com.au / .edu.uk / etc)
***************************************
# Database query time
# Dynamic URL Query String Length
# Keyword in Query String
# Dynamic URL Session IDs
# Cookies
# Parsing Variables
# Variable Length
# Last-modified headers or HTTP Headers
***************************************
# Applied Semantics (Synonyms)
# Keyword Stemming
# Latent semantic indexing
# Text Position in Relation to Top of Code
# Copywriting
# Duplicate Content
# Content In Full Sentences
# Vertical Indexing
# Keyword Architecture (i.e. relevance to one another)
# Spelling and Intentional Mispelings
# All Caps (Yelling)
***************************************
# Site Navigation Architecture
# Code Validation
# Page Size
# Page Load Time
# Page Freshness
# Page Update Frequency
# Table Depth
# Frames
# CSS
# Scripts
# File Type
# Flash
# Hidden Text
# Tiny Text
# Cloaking
# Redirects
# Legitimate Page (Not Doorpage / Equivalent)
# Must Contain Sufficient Content
# Automated Software Features (Web Position Gold)
# Language Options
# Human Interaction from Google (Penalties/Banned)
# Line Wraps <80
# Age of Site (established)
# Site Uses Adsense
# Page Extension
# History of 404 Errors
# Dead Links
# Inbound Link IP
***************************************

Total=94
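Several items on the list ("Keyword Density in <title>", "Keyword Density in <body>", and so on) rely on keyword density. A minimal sketch of how it is usually measured, assuming the common SEO-tool definition (words taken up by the keyword or phrase, divided by total words, as a percentage). This is not a documented Google formula, and the function name is hypothetical:

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Percentage of the words in `text` taken up by occurrences of `keyword`.

    Assumed definition: (phrase hits * words per phrase) / total words * 100.
    Google has never published how, or whether, it measures this.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = re.findall(r"[a-z0-9']+", keyword.lower())
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    # Slide a window of n words over the text and count exact phrase matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return 100.0 * hits * n / len(words)

print(keyword_density("Blue widgets and red widgets", "widgets"))  # 40.0
```

The same function can be pointed at the text of the <title>, the <body>, or the "alt" attributes to get the per-element densities the list refers to.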

Mikkel deMib Svendsen
06-03-2004, 06:25 AM
Interesting list. However, I think it is important to point out that we will never know for sure what Google will actually be looking at from each site or page and even more important: what they will use to determine ranking. In some searches, for some websites, certain things may be more important than for other searches, sites or vertical markets.

I think the list would be more useful, as a raw data list, if we leave out subjective conclusions like "# Variable Length (3 Max)" and "# Keyword stuffing "alt" attribute (10 Word Limit)". Such things are not facts but opinions :)

bwelford
06-03-2004, 08:38 AM
I think most of the 89 are opinion rather than fact. Indeed, there are very few precise facts to be derived from Google's official publications. Since there are over 100 factors, I would guess that this list must have considerable overlap with whatever Google's list is.

Some elements that were missing are the following, although I'm not sure how best to describe them:
Keyword order
Keyword proximity
Keywords and stop words or symbols ordering and placement.

For example, I was recently looking at a website discussed at another forum, for 'SomePlace Bed & Breakfast'. A search for that name produced a different Google listing from 'SomePlace Bed and Breakfast', even though both '&' and 'and' were treated as stop items and supposedly did not figure in the algorithm.
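To see why that result is surprising, here is a minimal sketch of naive stop-word removal. The stop list is hypothetical (Google's real list is not public): if stop items were simply discarded like this before matching, both queries would reduce to the same terms and should return the same listing. The observed difference suggests Google did not treat them that simply, for example by keeping "&" or matching the exact phrase.

```python
import re

# Hypothetical stop list for illustration only; not Google's actual list.
STOP_WORDS = {"and", "&", "the", "of", "a", "in"}

def normalize_query(query: str) -> list[str]:
    """Lower-case the query, split it into words and '&', and drop stop items."""
    tokens = re.findall(r"\w+|&", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]

a = normalize_query("SomePlace Bed & Breakfast")
b = normalize_query("SomePlace Bed and Breakfast")
print(a == b)  # True
```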

Anthony Parsons
06-03-2004, 10:37 AM
Mikkel...excellent point. I was actually in two minds about whether to include those or not. Some are way out in left field when you add variables like that. We might just change that now.

Barry, excellent additions. Thanks.

As you stated, Barry, we will never actually know. But hell, we can make a good educated guess from what we all do now, and most of us do these things without even thinking of them. I would like to get the list over 100, then maybe sit down, pull it apart and break it into fact and fiction.

Anthony Parsons
06-03-2004, 10:42 AM
Original list updated to this point.

*********************************

bwelford
06-03-2004, 11:08 AM
Of course some of these factors may apply in a combinatorial way to get to the "over 100 factors". In other words, keyword factors may apply in Titles, Headings, and body text. They may even be differently handled in the three situations.
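Barry's point is multiplicative: k keyword signals applied across e page elements gives k × e candidate factors. A minimal sketch with hypothetical signal and element lists (these are illustrative, not Google's breakdown):

```python
from itertools import product

# Hypothetical lists, chosen only to show how quickly combinations add up.
keyword_signals = ["presence", "density", "proximity", "order"]
page_elements = ["title", "headings", "body"]

# Every signal paired with every element: 4 x 3 combinations.
combined = [f"{signal} in {element}" for signal, element in product(keyword_signals, page_elements)]
print(len(combined))  # 12
```

Four signals over three elements already yields 12 "factors", so a list of a few dozen base items could easily account for Google's "over 100".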

The "over 100 factors" may even have started when <description>s were still considered by Google. I assume that currently we can forget <description>s.

Anthony Parsons
06-03-2004, 08:01 PM
Not at all, Barry. I included "double tag use" to cover those, but do you think we should list them individually? How would Google assess them, in your opinion?

<title>
<title>
<description>
<description>
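Whichever way the list is organized, "double tag use" is easy to detect mechanically. A minimal sketch using Python's standard HTML parser, assuming "<description>" here means the `<meta name="description">` tag; the class and function names are hypothetical, and nothing here reflects how Google actually scores duplicates:

```python
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Counts <title> elements and <meta name="description"> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.titles = 0
        self.descriptions = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag == "title":
            self.titles += 1
        elif tag == "meta" and (dict(attrs).get("name") or "").lower() == "description":
            self.descriptions += 1

def double_tags(html: str) -> dict[str, int]:
    """Return how many times each tag appears; a count above 1 is 'double tag use'."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {"title": counter.titles, "description": counter.descriptions}

page = ('<html><head><title>A</title><title>B</title>'
        '<meta name="description" content="x"><meta name="Description" content="y" />'
        '</head><body></body></html>')
print(double_tags(page))  # {'title': 2, 'description': 2}
```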

Do you think it would assess each individually, or just under double tag use? Good question. This is similar to what happened when the list began: things were being listed generically, but in essence they had to be broken down. That was the one I was not sure about.

What do you think? Leave them under the one item, or break them down?

Dodger
06-06-2004, 06:08 PM
How about linked images with dimensions of 1x1 pixels, or the width and height of larger images expressly set to 0x0 (or 1x1). Larger linked images (.gif) that are totally transparent or a solid color. Let's just call them hidden images.
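The declared-size part of this is simple to check from the HTML alone. A minimal sketch that flags images inside links whose `width` or `height` attribute is 0 or 1; the names are hypothetical, and detecting fully transparent or solid-color images would require decoding the pixels, which this does not attempt:

```python
from html.parser import HTMLParser

class HiddenImageFinder(HTMLParser):
    """Flags <img> tags inside <a> links whose declared width or height is 0 or 1."""

    def __init__(self) -> None:
        super().__init__()
        self.link_depth = 0
        self.flagged: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self.link_depth += 1
        elif tag == "img" and self.link_depth > 0:
            dims = [attributes.get("width"), attributes.get("height")]
            # Only plain integer values are checked; "100%" and the like are skipped.
            if any(d is not None and d.strip().isdigit() and int(d) <= 1 for d in dims):
                self.flagged.append(attributes.get("src") or "")

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

def find_hidden_images(html: str) -> list[str]:
    finder = HiddenImageFinder()
    finder.feed(html)
    finder.close()
    return finder.flagged

html = ('<a href="/x"><img src="spacer.gif" width="1" height="1"></a>'
        '<a href="/y"><img src="logo.gif" width="120" height="80"></a>'
        '<img src="free.gif" width="0" height="0">')
print(find_hidden_images(html))  # ['spacer.gif']
```

Only `spacer.gif` is flagged: `logo.gif` has normal dimensions, and `free.gif` is tiny but not linked.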

Anthony Parsons
06-06-2004, 10:51 PM
Now that's something I've never heard or thought of before. I am just wondering, as the SEs can't read images as such: size, yes; image content, no.

Damn interesting point.

added
******************************

bwelford
06-07-2004, 06:07 AM
On the question you asked, Anthony Parsons, I think it's fine to handle it as you have. Undoubtedly many aspects may be involved in each of the text elements of a web page. They may even be treated differently in each case. However the list would become unusably long if you added every aspect for every text element. IMHO.

Dodger
06-07-2004, 06:21 AM
Anthony Parsons wrote:
"Now that's something I've never heard or thought of before. I am just wondering, as the SEs can't read images as such: size, yes; image content, no.

Damn interesting point."

Well, since some SEs (Google in particular) are indexing images, there are certain parts of an image that are readable. The header of a graphic image file is easily readable; it would also contain some metadata, such as image descriptions, that I am sure Google can read very easily. They would almost have to, in order to index images properly in their database.

Comparing the actual image dimensions in the header with the width and height attributes of the IMG tag is an easy thing to do. A 1x1 spacer GIF with its attributes set to 120x80 and used as an image link is very easy to pick up on.
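As a sketch of how cheap that comparison is for GIFs: per the GIF87a/GIF89a format, the first 6 bytes are the signature and the next 4 hold the logical screen width and height as little-endian 16-bit integers. The function names below are hypothetical, and this reads only the logical screen size, not the per-frame image descriptors:

```python
import struct

def gif_dimensions(data: bytes) -> tuple[int, int]:
    """Read the logical screen width and height from a GIF header (bytes 6-9)."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height = struct.unpack("<HH", data[6:10])  # two little-endian uint16s
    return width, height

def is_stretched_spacer(data: bytes, attr_width: int, attr_height: int) -> bool:
    """True when a 1x1 (or smaller) GIF is declared larger in the IMG tag."""
    width, height = gif_dimensions(data)
    return width <= 1 and height <= 1 and (attr_width > width or attr_height > height)

# A minimal 1x1 GIF header (the remaining bytes of a real file are omitted).
spacer = b"GIF89a" + struct.pack("<HH", 1, 1) + bytes(3)
print(is_stretched_spacer(spacer, 120, 80))  # True
```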

Just food for thought.

bwelford
06-07-2004, 06:25 AM
You're right, Dodger. Google does index images in order to provide image search. However I've not heard that the data recorded in the image database necessarily gets into the database of indexed web pages.