Search Engine Watch
SEO News

Go Back   Search Engine Watch Forums > General Search Issues > Search Technology & Relevancy
Old 01-12-2005   #1
Chris Boggs
 
 
Join Date: Aug 2004
Location: Near Cleveland, OH
Posts: 1,722
Does validated HTML count?

I have noticed some sites that are "not valid HTML 4.0 Transitional," according to the W3C Validator. Based on this definition of its purpose, also from the W3C:

In addition to the text, multimedia, and hyperlink features of the previous versions of HTML, HTML 4.0 supports more multimedia options, scripting languages, style sheets, better printing facilities, and documents that are more accessible to users with disabilities. HTML 4.0 also takes great strides towards the internationalization of documents, with the goal of making the Web truly World Wide.

and the opinion of the esteemed developers in here, can Google or any other search engine be using this as a factor in rankings?

Bonus question: Why does MS use tags within its HTML that can only be read by IE? (used as an example to argue that even MS itself doesn't support HTML Standards)
Old 01-12-2005   #2
I, Brian
Whitehat on...Whitehat off...Whitehat on...Whitehat off...
 
Join Date: Jun 2004
Location: Scotland
Posts: 940
I don't believe that Google, at least, cares whether a page is XHTML 1.0 Strict vs. HTML 4.0.

Also, try running www.google.com through the W3C validator - that should tell you how important XHTML 1.0 is.
Old 01-12-2005   #3
St0n3y
The man who thinks he knows something does not yet know as he ought to know.
 
Join Date: Jun 2004
Location: Here. Right HERE.
Posts: 621
I think having valid HTML plays a big role in how search engines view and assign importance to your site. Sure, you can get rankings with poor code, but I think having properly validated code can help significantly.

Which standard of validation helps the most, I don't know.
Old 01-13-2005   #4
dannysullivan
Editor, SearchEngineLand.com (Info, Great Columns & Daily Recap Of Search News!)
 
Join Date: May 2004
Location: Search Engine Land
Posts: 2,085
Tim Mayer from Yahoo in December on an SES panel specifically said Yahoo didn't care if a page was validated. It wasn't going to give you any type of ranking boost.

Pretty sure that will be the case with Google, as well. In fact, I've many times seen pages rank well that had bad HTML or even broken HTML. The search engines seem to make the best of what they can.

Having said this, I still think it's good to have valid HTML as much as you can. There are other good reasons for doing it. But as to which standard you want to follow, well, that's another thing.
Old 01-13-2005   #5
Chris_D
 
 
Join Date: Jun 2004
Location: Sydney Australia
Posts: 1,099
Imagine if nobody at the next United Nations meeting had a name tag - and everybody at that UN meeting just stood up and started talking in their native language - without first identifying themselves and their country.

By not following the W3C standards - that's what your website is effectively doing.

If you choose not to declare a doctype - then the user agent (your browser, Googlebot, etc.) will be forced to parse your webpages in quirks mode, and basically 'guess' what your page says.

You can ignore the W3C - and you can mix HTML 3, 4, and XHTML all in the same document. And you can keep on relying on the user agent to sort out what your documents say. It's a bit like someone having a conversation with you - but they suddenly start mixing French, German, Lithuanian and English mid-sentence. Do you think that you'll understand PRECISELY what they said?

Alternatively, you can select a doctype and code to it. Then you can validate your code, and having correctly declared your doctype, you can find and fix potentially 'fatal' errors before you publish - and rest assured that all user agents will be able to parse your documents.

You won't get any 'bonus' from the search engines for valid code - but your content will be very indexable, and there won't be any poorly coded showstoppers.

Validation is the beginning of the W3C process. And validation leads to semantics, and semantics leads to the separation of content (HTML/XHTML) and presentation (CSS). And that leads to leaner, faster-loading code. And next thing - accessibility starts making sense - for humans and search engine spiders. And that's the concept of the 'universality' of the web - accessible by anyone.


The difference between HTML 4.01 and XHTML 1.0?

Well - you can design for last century's standard - or this century's standard - it's really up to you. But this century's standard caters better for those who use more than traditional desktop PCs - or will in the future.
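A crude version of that 'find and fix before you publish' step can even be scripted. Here is a minimal sketch using Python's standard html.parser module - the names TagBalanceChecker and check are made up for illustration, and the real W3C validator checks far more (attributes, nesting rules, the doctype itself):

```python
from html.parser import HTMLParser

# Elements that are legitimately empty in HTML 4.01, so they never
# get a closing tag.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

class TagBalanceChecker(HTMLParser):
    """Flags end tags that don't match the most recently opened tag."""
    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags
        self.errors = []   # problems found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def check(markup):
    """Return a list of mismatched or unclosed tags in the markup."""
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.stack]
```

This only catches mismatched and unclosed tags - but those are exactly the kind of 'fatal' errors that throw a user agent into guessing mode.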

Last edited by Chris_D : 01-13-2005 at 08:29 AM. Reason: typos & clarification
Old 01-13-2005   #6
Chris Boggs
 
 
many thanks

Thanks for the detailed responses. Perhaps this should have been accompanied by a poll.

Chris D's comment brings out some of the "other reasons" that Danny writes of. However, according to what Tim Mayer publicly said, coupled with the tests I have run on some highly-ranked sites, validation doesn't seem important in order to perform well.

Perhaps some of the more "pure" directories such as DMOZ do take this into consideration?

Also, for the "code-challenged" people in here, is XHTML 1.0 one step up from HTML 4.01? How much of a difference is there between these two? Is the lesser of the two as "last century" as Chris D describes it?
Old 01-13-2005   #7
fathom
Member
 
Join Date: Jun 2004
Location: Nova Scotia, Canada
Posts: 475
Quote:
Originally Posted by Chris Boggs
Perhaps some of the more "pure" directories such as DMOZ do take this into consideration?
Not a factor for DMOZ listing acceptance.
Old 01-13-2005   #8
St0n3y
Google's Webmaster Guidelines

Quote:
Check for broken links and correct HTML.
Now, I think you can make a fair argument that this does not imply that valid code will give you a rankings boost. On the other hand, I think a fair argument can be made for doing whatever you can to "please" the search engines (in this case Google).

Google seems to be a company that looks far beyond simple "search relevance" and has an eye toward professional standards for the results it's showing as well. Natural language analysis lends credence to this. If that is the case, I would think that Google would be likely to give a couple of bonus points to sites that do use valid HTML over sites that don't. I'm sure it wouldn't be enough, by itself, to give a rankings boost, but as we know, every little bit can make a difference.
Old 01-13-2005   #9
rcjordan
There are a lot of truths out there. Just choose one that suits you. -Wes Allison
 
Join Date: Jun 2004
Posts: 279
I'm running bad code without a doctype. It ranks and has ranked (competition level: 2M-4M) for years. Currently top 3 in G, M, & Y.
Old 01-13-2005   #10
fathom
Quote:
Originally Posted by rcjordan
I'm running bad code without a doctype. It ranks and has ranked (competition level: 2M-4M) for years. Currently top 3 in G, M, & Y.
I doubt anyone is saying 'it can't rank' but there is a possibility that 'it could rank better'.

I've unsuccessfully tried many times to observe a difference.

The only thing that keeps me believing there is something in it is the fact that Google indexes the validator's check pages.

Acirehp (bottom)

A site with 100 validated pages has 100 additional links (from the validator's check pages) to the website... whether these are counted, I don't know.

Note: these pages are but a few days old and yet the check page is already indexed.

Last edited by fathom : 01-13-2005 at 12:36 PM.
Old 01-13-2005   #11
shor
aka Lucas Ng. Aussie online marketer.
 
Join Date: Aug 2004
Posts: 161
Does validated code count specifically towards a higher ranking over a non-validated document?
On the basis of general observations and as answered by previous contributors, no.

Will it count towards higher rankings in the future?
No, as search engine technologies are inclined not towards 'studying' code but towards document semantics and a document's relevant domain. In other words, search engines are being engineered to follow more natural human thinking processes and patterns, which I find a little ironic, as SEOs are trying to figure out how to best assimilate themselves closer to SEs (and their algos) while the SEs themselves are trying to evolve towards human user search patterns! Sounds like a Benny Hill chase scene.

So what's the point of validation?
Chris_D's analogy is succinctly appropriate. Should you have validated XHTML to boost rankings? No. Should you have validated XHTML? Damn straight you should - the standardisation that validated XHTML offers is a vital ingredient of professional web development.

The difference between HTML and XHTML?
XHTML is based on XML. It was designed to replace haphazard HTML coding.
The main benefits of XHTML usage are:
- its ability to accommodate non-standard web browsing agents
- a structured adherence to coding standards - e.g. tags must be all lowercase, every tag set needs a closing tag, and empty elements must be terminated
- extensibility (the X in XML/XHTML), as the generation of new elements is easily facilitated by XML

Check out http://www.wdvl.com/Authoring/Langua...XHTML/dif.html for the major differences between XHTML and HTML
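Two of the conventions listed above - lowercase tag names and terminated empty elements - are mechanical enough to sketch in a few lines of Python. This is illustrative only (the function name xhtmlize is made up; a real converter such as HTML Tidy handles far more, e.g. attribute quoting and proper nesting):

```python
import re

def xhtmlize(tag_soup):
    """Nudge old-style markup toward two XHTML conventions:
    lowercase tag names and self-closed empty elements."""
    # <BR> -> <br>, </P> -> </p>  (attribute names are left untouched)
    out = re.sub(r"<(/?)([A-Za-z][A-Za-z0-9]*)",
                 lambda m: "<" + m.group(1) + m.group(2).lower(),
                 tag_soup)
    # Terminate a few common empty elements: <br> -> <br />
    # (the lookbehind skips tags that are already self-closed)
    out = re.sub(r"<((?:br|hr|img)\b[^>]*?)(?<!/)>", r"<\1 />", out)
    return out
```

For example, xhtmlize("<P>Hello<BR>") returns "<p>Hello<br />".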

Quote:
Originally Posted by Chris Boggs
Bonus question: Why does MS use tags within its HTML that can only be read by IE? (used as an example to argue that even MS itself doesn't support HTML Standards)
Because MS is MS.
Every iteration of IE has introduced 'features' that do not conform to W3C standards or behave (infuriatingly) differently to every other browser on the market. That's one good reason why so many of us are using Firefox, Opera, Safari or whatnot.

Last edited by shor : 01-13-2005 at 06:48 PM.
Old 01-14-2005   #12
I, Brian
Quote:
Originally Posted by Chris_D
But this century's standard caters better for those who use more than traditional desktop PCs - or will in the future.
Indeed, and that's the only reason for moving from HTML 4.0 to XHTML 1.0 that I can think of. But desktops still rule - for the moment, at least, while the mobile revolution properly establishes itself.

Ultimately, the browser market has been the only concern in terms of validation - there has been no need to cater to third-party interpretations of dos and don'ts if it works in your target browser market in the first place.

Frankly, a lot of W3C compliance seems espoused by "Design Puritans", who spend too much of their time:

1. insisting that the W3C must be mindlessly obeyed,
2. complaining that browsers can't properly support their CSS tricks, and
3. ignoring the existence of Apples for surfing.

However, it should be pretty clear that even being W3C compliant on XHTML 1.0 standards is simply not enough for designing web pages for mobile devices.
Old 01-14-2005   #13
Chris Boggs
 
onward...

Quote:
Originally Posted by shor
...I find a little ironic as SEOs are trying to figure out how to best assimilate themselves closer to SEs (and their algos) while the SEs themselves are trying to evolve towards human user search patterns! Sounds like a Benny Hill chase scene.
Very good analogy, no pats on the head for you today... (That silly Benny Hill tune is now in my head)

Quote:
Originally Posted by shor
Should you have validated XHTML to boost rankings? No. Should you have validated XHTML? Damn straight you should - the standardisation that validated XHTML offers is a vital ingredient of professional web development.
If it takes considerably longer to validate hundreds of pages, and the consensus so far is that it isn't required, why not devote those developer hours to other tasks? I would venture to bet that there are a lot of sites without validated code out there that look every bit as "professionally developed" as many that are.

Quote:
Originally Posted by shor
Because MS is MS.
Every iteration of IE has introduced 'features' that do not conform to W3C standards or behave (infuriatingly) differently to every other browser on the market. That's one good reason why so many of us are using Firefox, Opera, Safari or whatnot
Thanks for posting what I feel to be the correct answer. You win: positive reputation. (damn must spread some around first...maybe someone else can "hit you" for me)

Last edited by Chris Boggs : 01-14-2005 at 08:28 AM. Reason: comment
Old 01-14-2005   #14
fathom
Quote:
Originally Posted by Chris Boggs
You win: positive reputation. (damn must spread some around first...maybe someone else can "hit you" for me)
OK DONE!
Old 01-14-2005   #15
St0n3y
Perhaps I may play a little devil's advocate here... Obviously, very bad HTML can still render properly in browsers. Why, then, do you suppose Google would go out of its way to say to check for correct HTML?

I mean, I hate to bow at the altar of Google, but if Google takes the effort to mention something like this, you would think it would be for a reason. And I don't believe they simply want to be the clean-code police!
Old 01-14-2005   #16
fathom
The W3C's robots.txt has been edited since the last time I checked:

Quote:
#
# robots.txt for http://www.w3.org/
#
# $Id: robots.txt,v 1.28 2004/09/26 06:20:55 sandro Exp $
#

# Exclude Ontaria until it can handle the load
User-agent: *
Disallow: /2004/ontaria/basic

# For use by search.w3.org
User-agent: W3C-gsa
Disallow: /Out-Of-Date

# W3C Link checker
User-agent: W3C-checklink
Disallow:

# exclude some access-controlled areas
User-agent: *
Disallow: /Team
Disallow: /Project
Disallow: /Systems
Disallow: /Web
Disallow: /History
Disallow: /Out-Of-Date
Disallow: /2002/02/mid
Disallow: /mid/
Disallow: /People/all/
Disallow: /2003/03/Translations/byLanguage
Disallow: /2003/03/Translations/byTechnology
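As an aside, rules like these can be checked mechanically with Python's standard urllib.robotparser. The snippet below condenses the quoted file to two of its groups for illustration:

```python
from urllib import robotparser

# A condensed version of the rules quoted above: one wildcard group
# plus the W3C link-checker group (which is allowed everywhere).
RULES = """\
User-agent: W3C-checklink
Disallow:

User-agent: *
Disallow: /Team
Disallow: /History
Disallow: /Out-Of-Date
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Ordinary crawlers fall under the wildcard group...
print(parser.can_fetch("Googlebot", "http://www.w3.org/Team/"))      # False
print(parser.can_fetch("Googlebot", "http://www.w3.org/MarkUp/"))    # True
# ...while the W3C link checker is exempt.
print(parser.can_fetch("W3C-checklink", "http://www.w3.org/Team/"))  # True
```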
Old 01-14-2005   #17
I, Brian
Quote:
Originally Posted by St0n3y
Perhaps, I may play a little devils advocate here... Obviously very bad HTML can still render in browsers properly. Why then, do you suppose Google would go out of its way to say to check for correct HTML?

I mean, I hate to bow at the alter of Google, but if Google takes the effort to mention something like this you would think it would be for a reason. And I don't believe they simply want to be the clean code police!
Bad HTML can prevent a page from displaying properly. When that happens, Google gets an incorrect view of the page's meaning, which defeats Google's purpose.

Google will index bad HTML - but where the HTML does not render your page elements properly, your search traffic may suffer for it.

Try it by deleting the </title> tag from a page, or the end quotes from a URL in an anchor.
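That experiment is easy to reason about: browsers treat the content of <title> as raw text that runs until the closing tag, so with no </title> the rest of the document gets swallowed into the title. A toy sketch of that behaviour (naive_title is a made-up name for illustration, not a real API):

```python
def naive_title(markup):
    """Mimic how parsers read <title>: its content is raw text that
    runs until the matching </title>. If the close tag is missing,
    everything to end-of-file becomes the 'title'."""
    lower = markup.lower()
    start = lower.find("<title>")
    if start == -1:
        return None
    start += len("<title>")
    end = lower.find("</title>", start)
    return markup[start:] if end == -1 else markup[start:end]

print(naive_title("<title>Widgets Inc</title><h1>Widgets</h1>"))
# -> Widgets Inc
print(naive_title("<title>Widgets Inc<h1>Widgets</h1>"))
# -> Widgets Inc<h1>Widgets</h1>  (the heading text is lost to the title)
```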
Old 01-14-2005   #18
St0n3y
Sure, I understand bad HTML can prevent pages from rendering properly, but in other forms, bad HTML will render just fine. Some errors will be spider-stoppers, some not. But checking for proper HTML does not stop at simply closing all tags. It implies checking for ALL proper HTML usage, not just some.
Old 01-14-2005   #19
hardball
Member
 
Join Date: Oct 2004
Posts: 83
You index documents, parse words and attributes, and otherwise slice and dice the pages before storing them in various indexes. I guess you could assign a validation score to the documents before you start parsing, but I doubt it. I see way too much goofy stuff (some of it mine) ranking well to conclude that validation has any effect anywhere.
Old 01-14-2005   #20
St0n3y
I think the big picture here is being missed. I'm not suggesting that non-validated HTML will hurt your rankings. That's like saying that not having an ALT attribute will hurt, or that if you don't have ALTs and rank well, Google does not analyze them. It certainly can happen. What I'm suggesting is that validating HTML may add some kind of additional importance to your site.

If I were Google and I was analyzing what to consider "authoritative" sites, I think I would consider sites that conform to other authoritative sources, in this case W3C, to be of more significant importance.

Don't get me wrong... I'm not saying I'm right and those who don't agree are wrong... just exploring the potentials here.