How important is HTML validation to the SEs?


Lyndsay
01-13-2006, 03:10 PM
I have spent the last three days validating HTML code, and I've got quite a bit yet to go.

I'm wondering whether I'm wasting my time, if the search engines don't really care about the quality of HTML code.

Or does it help a great deal to have validated code?

I really really really like clean code, but going through it is tedious and using up a lot of my time.

simons1321
01-13-2006, 07:42 PM
I'm not entirely sure whether validated code is favored by search engines, but I do know this:

Usually, having cleaner HTML code, implementing CSS properly, and moving page code out of the page (i.e. JavaScript into external .js files) can dramatically raise your content-to-code ratio.

Now this, IMO, might help you more in the search engines because then more of your spiderable code is content, properly written with the right keywords and phrases. And we all know spiders love content!
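To make the content-to-code ratio concrete, here is a minimal Python sketch using only the standard library's html.parser. It assumes one simple definition of the ratio (characters of visible text divided by total characters of markup, with <script> and <style> contents not counted as text); SEO tools used various definitions, and nothing here says how, or whether, a search engine uses such a number. The two sample pages are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def content_to_code_ratio(html: str) -> float:
    """Characters of visible text divided by total characters of markup."""
    if not html:
        return 0.0
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return len(" ".join(collector.chunks)) / len(html)


# Same visible text; only the menu script moves out of the page.
inline_js = ("<html><head><script>function menu(){var a=1;var b=2;return a+b;}"
             "</script></head><body><p>Casino bonus guide</p></body></html>")
external_js = ('<html><head><script src="menu.js"></script>'
               "</head><body><p>Casino bonus guide</p></body></html>")

print(content_to_code_ratio(inline_js), content_to_code_ratio(external_js))
```

Moving the script into a hypothetical menu.js leaves the visible text unchanged while shrinking the markup, so the ratio goes up.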

In addition, from a web design standpoint, you should always make your site as accessible as possible for people who aren't using the most up-to-date browsers on the most up-to-date hardware, or who require screen readers or some other device. This includes using the alt attribute in image tags. I can't see how Google or any other search engine would punish you for proper web design, as long as you don't abuse it by stuffing elements with keywords!

I also love validated HTML and CSS and try to practice it as often as possible. Most of the time, I can remove as much as 50 to 60% of the HTML code from a page by using proper CSS and HTML techniques, especially by getting rid of layout tables.

So far, none of the sites I work on have been penalized for this, but at the same time, I haven't seen huge jumps either.

rcjordan
01-13-2006, 09:11 PM
Bad code will rank just fine.

mcanerin
01-13-2006, 10:43 PM
From a search engine standpoint, if it renders in text-only mode, it's not bad code. If validation were a ranking issue, 90% of the internet would not rank for anything, and people who focus on HTML instead of good content would tend to rank better. This is not the case, and it should not be the case.

Google's home page, one of the simplest pages around, does not validate and has about 50 errors (http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.com). Most Blogger sites don't validate, and Amazon fails so completely that the validator gives up. And yet, they seem to do pretty well in the SERPs.
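A rough way to see the "text-only mode" point: Python's standard html.parser is tolerant of invalid markup and still hands back the words. This is only an illustration of forgiving parsing in general, not a model of how Googlebot parses pages; the sample markup is made up.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Crude text-only 'browser': keeps words, drops tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.hidden = False
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.hidden = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.hidden = False

    def handle_data(self, data):
        if not self.hidden:
            self.words.extend(data.split())


def text_only(html: str) -> str:
    view = TextOnlyView()
    view.feed(html)
    view.close()
    return " ".join(view.words)


# Fails validation: no doctype or <title>, an unquoted "#" attribute value,
# a <td> outside any table, and elements that are never closed.
sloppy = ("<body bgcolor=#ffffff><font color=red>Cheap flights"
          "<p>Book today<td>London to Paris")
print(text_only(sloppy))
```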

I prefer to validate all my work, but that's because I take pride in being a professional. Additionally, it guarantees that my code is not the problem, so when I go looking for a problem, I don't have to worry about that aspect, and can concentrate on other potential issues.

There are times when I will allow a page to not validate, for one reason or another. For example, I may use a useful 3rd party product that doesn't validate but offers strong value to my visitors. In this type of case, my visitors win my loyalty, not the W3C.

Finally, try to validate a PDF, .doc or .txt file some time, all of which are indexed and read just fine. ;)

Best practice is to make damn sure your template(s) validate, then build your website, rather than doing it the other way around, as you are discovering. That way, even if a particular page does not validate for one picky reason or another, it's usually minor and has no effect on browsers or search engines.

Ian

Chris_D
01-14-2006, 12:44 AM
The short answer is no - there is no ranking benefit for 'valid' code.

Specifically - Matt Cutts said:

Q: "In more general terms, what do you think is the relationship between Google and the W3C? Do you think it would be important for Google to e.g. be concerned about valid HTML?"

A: I like the W3C a lot; if they didn't exist, someone would have to invent them. :) People sometimes ask whether Google should boost (or penalize) for valid (or invalid) HTML. There are plenty of clean, perfectly validating sites, but also lots of good information on sloppy, hand-coded pages that don't validate. Google's home page doesn't validate and that's mostly by design to save precious bytes. Will the world end because Google doesn't put quotes around color attributes? No, and it makes the page load faster. :) Eric Brewer wrote a page while at Inktomi that claimed 40% of HTML pages had syntax errors. We can't throw out 40% of the web on the principle that sites should validate; we have to take the web as it is and try to make it useful to searchers, so Google's index parsing is pretty forgiving.
http://blog.outer-court.com/archive/2005-11-17-n52.html


Validation is just a tool to ensure that your code works according to the standards for the markup language you use. It helps identify issues before you publish your code.

The longer answer is that there are several aspects involved in Web Standards - web standards are about more than *just* validating your code.

For example, correct semantic code is generally rewarded in ranking (i.e. relevance).

Accessible, valid code is generally rewarded by full spidering of the content. Valid code is generally rewarded by SEs indexing your full content (e.g. leave off a </p> and see what gets indexed).

But 'validation' in itself is not rewarded per se.

It's a bit hard to explain in a few paragraphs, but you'll actually find that many web standards supporters are SEOs, because web standards are about more than just code validation.

Advantages include leaner, faster-loading code, due to the separation of content (HTML) and presentation elements (CSS); a higher page position for content (once the page presentation elements are in external files like external CSS and external JavaScript); and correct use of semantic code elements.

As a 'do as I say - not what I do' - Google recommends:
Check for broken links and correct HTML....
http://www.google.com/intl/en/webmasters/guidelines.html

We had quite a detailed thread about this a while back http://forums.searchenginewatch.com/showthread.php?p=30561

Blinky
01-14-2006, 09:24 AM
I think the only things that can help site ranking are adding alternative text (the alt= attribute) in the HTML code, and renaming images :)

ssjothun
01-19-2006, 10:31 PM
There's so much more to it than that. This article at A List Apart gives a fairly good, but brief, description of the importance of proper HTML: http://www.alistapart.com/articles/accessibilityseo

Check your PMs, btw, because I might be able to help you further.

rcjordan
01-19-2006, 11:08 PM
Geez, go away for a few days and the w3-huggers just move in, don't they? Let me put this another way: bad code can rank higher than good code. You just have to know where bad is good.

ssjothun
01-20-2006, 02:31 AM
OK... I work for one of the biggest online gambling sites on the net, and one of my main responsibilities is to ensure that the technical standard is absolutely perfect.

If you have two sites that are optimized identically in every aspect except for the markup, I can guarantee you that the site with the best markup will win.

And I'm not just talking about validated code, I'm talking about proper document structure, proper use of H tags instead of classes for headings, proper image naming convention, proper file and folder naming convention and so on and so forth.

In a competitive environment such as online gambling, you have to utilize every single tool you have to gain the top spots.

But honestly, it's not just about that from my point of view, it's about making your website accessible to as many potential customers as possible. Validating your code is just good business.

And, from my personal experience, it is a much faster and more cost-effective way of developing sites. The question should not be "why should you validate and make your site accessible?" but rather "why on earth shouldn't you?"

Yep...I'm passionate about this stuff ;)

Conservative
01-27-2006, 06:13 AM
ssjothun, while I don't disagree with anything you are saying, how on earth are you going to measure "when all things are equal" on two casino sites?

Even if they had EXACTLY the same (duplicate) content and structure except for one aspect, one would be indexed before the other, which the engines would know and could detect. If they don't have duplicate content, how would you know that it's not content differences that cause the different rankings?

Similarly I have seen a couple of pages in your industry which use what I have posted about and what I term "semantically incorrect emphasis" (http://forums.searchenginewatch.com/showthread.php?t=9801)
For me, bolding two phrases next to each other would break validation rules, which govern language and logic. Of course, WYSIWYG editors are not a great help; they have put a lot of silly bolded content out there and marked it up in a stupid way. I wonder how the search engines are making sense of this mess.

I hope that in a future W3C spec, documents won't validate if you've got two bolded phrases like this next to each other. It would be interesting to know how many documents are wrongly marked up in this manner, and whether the search engines filter out the tags in the centre and read the two phrases as one before applying their indexing.
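What "filtering out the centre tags" might mean in practice can be sketched as a normalization pass that merges adjacent bold runs. This is purely hypothetical: nothing in the thread says any search engine does this, and a regular expression is a crude tool for HTML (it ignores tags carrying attributes, such as <b class=...>, and any nesting).

```python
import re

# A closing </b> or </strong>, optional whitespace, then the same opening tag.
# IGNORECASE lets the backreference match mixed-case pairs like </B> <b> too.
ADJACENT_EMPHASIS = re.compile(r"</(b|strong)>(\s*)<\1>", re.IGNORECASE)


def merge_adjacent_emphasis(html: str) -> str:
    """Collapse <b>x</b> <b>y</b> into <b>x y</b>, keeping the whitespace between."""
    return ADJACENT_EMPHASIS.sub(r"\2", html)


snippet = "Try <b>online casino</b> <b>bonus codes</b> today"
print(merge_adjacent_emphasis(snippet))
```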

Sorry, I'm passionate about this ;)

Chris_D
01-27-2006, 09:03 AM
Hey rcjordan

I still think the best comment about web standards was made by one of the W3C guys here in Sydney a few years back.

"Most people have said that they would RTFM if there was a FM to FR" :)

mcanerin
01-27-2006, 11:16 AM
Most people have said that they would RTFM if there was a FM to FR

ROFL! I love that!

By the way, and as an aside, the W3C is NOT a standards organization. They admit that themselves. Thinking of them as a standards organization is a good way to get confused. For example, standards organizations typically don't have multiple competing incompatible but current standards. Actually, standards organizations are intended to stop that kind of thing!

One way to think of them is a compatibility specification organization. Think compatibility, not standards. If you claim your browser is HTML 4.01 compatible, then it has to render HTML 4.01 verified pages properly, or your browser is not compatible with the specification.

This designation allows a browser to be compatible with many specifications, not just one, and to add functionality that is new, thus advancing technology.

Likewise, when you validate your page as XHTML 1.1, you are making sure that a browser or other user agent that is XHTML 1.1 compatible will be able to properly read it.

Googlebot, as a text only agent, makes no attempt to render tables, CSS etc. But it can certainly read the various flavors of validated W3C HTML. It can also read a lot of other types of pages the W3C wouldn't touch with a 10-foot pole, like really poorly written code, and things that are outside of W3C mandate, like text, doc and PDF files. Speaking of text, ASCII is an actual standard that affects browsers. But it won't validate using any W3C tool.

Naturally, the more compatible you are, the fewer issues you will have. But that's not standards. Standards are things like how you handle DNS queries and network protocols. These are set by the IETF, the IEEE and other standards organizations, not the W3C. If you don't follow a standard, you may find yourself completely unable to do anything (mommy, why can't I surf the net using NetBEUI instead of TCP/IP?)

I know it sounds like I'm being picky about semantics, but there really is an important difference between the concepts, IMO.

Ian

sebastian
01-27-2006, 01:08 PM
The cleaner the code, the easier it is for search spiders to index your content and index it properly and completely.

Following the DOM is very, very important and if this is not glaringly obvious to you, especially with Google - you have some homework to do.

In fact, there is no valid reason for NOT having clean code.

rcjordan
01-27-2006, 01:20 PM
Pick a page, any page, in G's recent analysis of page coding (http://code.google.com/webstats/2005-12/pageheaders.html) (I'll point to page headers) and you'll see G merrily ignoring bad code by the ton.

ssjothun
01-27-2006, 02:15 PM
ssjothun, while I don't disagree with anything you are saying, how on earth are you going to measure "when all things are equal" on two casino sites?

I obviously meant theoretically.

As for my industry, there is unfortunately a lot of crap, completely infested with black hat main sites, landing pages, micro and mini pages and whatnot. Competitors, some worse than others, just push everything to the limit, and beyond. Some are efficient, others just plain pathetic.

Certain companies use free BBS boards, spam them, and do JavaScript redirects. Really low-level stuff, but scarily enough, it works.

b2.boards2go.com and lol.to are two boards that are exploited this way. Last week, a network of casino-related pages on b2.boards2go.com was on page one in Google for virtually any casino-related term you can think of... it's really bad...

sebastian
01-27-2006, 03:03 PM
pick a page, any page, in G's recent analysis of page-coding (I'll point to page headers) and you'll see G merrily ignoring bad code by the ton.

It's quotes like this that make me want to pull my hair out.

If you are a true, dedicated and respected Internet Marketing professional and you come up with the above statement, I wonder how you become employed in the first place.

I'm sorry - but the comment (quoted above) is 100% useless to the discussion.

While it's difficult to definitively state that proper coding techniques will offer better ranking than poorly coded pages, the following is simply a fact:

The search engine spiders are BOTS and rudimentary ones at that. The cleaner and more valid the code, the easier the bot travels and indexes your site.

Why anyone would argue this or try to discount it I just don't know. Just means they are bored or grossly inexperienced.

rcjordan
01-27-2006, 03:12 PM
>If you are a true, dedicated and respected Internet Marketing professional and you come up with the above statement, I wonder how you become employed in the first place.

me, too.

but, back to spiders, they could give a rat's derriere about bad code.

mcanerin
01-27-2006, 04:15 PM
I agree that professionals work to create great code. It's one of the things that make them professionals.

But Google doesn't really care about how professional a web page is - it cares about the content. As long as that content is accessible to the bot, that's what it cares about.

Real web professionals separate content from code as much as possible. So does Google.

The code is not the content. It's the content that drives rankings and results.

Yes, your code should be clean. Your graphics should also be professional, and your scripts should function. Your Flash should be well programmed, your sound files clean and clear, and you should make good use of white space.

But Google doesn't care about any of that. It doesn't mean it's not important, since Google is hardly the final arbiter of proper design. It's a search engine, not a judge. It's not even a user.

My opinion,

Ian

ssjothun
01-27-2006, 05:38 PM
hmmm...but what it does care about is coherence, clustering and theming of terms on a page. A great coder is able to lay a site out in a way that makes it easier for the SE to make sense of your site and to see the theme clearer.

Using XHTML and CSS will greatly improve the readability of your code for search engine spiders and help maintain a positive content-to-code ratio, and when creating a menu, JavaScript can in most cases easily be replaced with CSS.

With CSS, you can do many tricks to make the important parts of your site, namely the page content, more prominent. If you read the Hilltop document, you may have noticed how it mentions FullnessFactor, or what is more commonly called keyword density, a rating many obsess over.

I do believe, however, that there are other aspects to this, and Dr. E. Garcia has an excellent, but rather technical and long-winded, description of and solution to the problem in his article “The Keyword Density of Non-Sense”.
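For reference, one common way keyword density is computed is: occurrences of the phrase × words in the phrase ÷ total words × 100. The sketch below implements that formulation only; other tools count it differently, and Garcia's article is precisely an argument that this number is a poor measure of relevance. The sample text is invented.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Percent of all words in `text` taken up by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's word sequence starts.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits * n / len(words)


page_text = "Online casino bonus: the best online casino bonus offers"
print(round(keyword_density(page_text, "online casino"), 1))
```

Here the 9-word text contains "online casino" twice, so 2 × 2 ÷ 9 × 100 comes to roughly 44%.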

The search engine reads the page from top to bottom. And even though this is not directly related to validation, it says a lot about the importance of effective code. Typically, a nested tables site has the logo first, then menu, then content, and then maybe a second menu. In this particular order in the code.

Now, with CSS, you are able to put the content div first, so that the theme, the prominence of what's really important, comes first.
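The source-order point can be checked mechanically: a parser reading top to bottom meets text in source order, whatever the stylesheet later does with it visually. In this minimal sketch the two hypothetical layouts carry the same words; the CSS that would position #menu and #logo above the content on screen is omitted, and alt text is not counted since only text nodes are collected. It says nothing about how much weight any engine gives to position.

```python
from html.parser import HTMLParser


class WordStream(HTMLParser):
    """Records text words in the order they appear in the source."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())


def word_position(html: str, word: str) -> int:
    """How many words of text precede the first occurrence of `word`."""
    stream = WordStream()
    stream.feed(html)
    stream.close()
    return stream.words.index(word)


table_layout = (
    "<table><tr><td><img src=logo.gif alt='Acme Casino'></td></tr>"
    "<tr><td>Home Games Bonuses Support About</td>"
    "<td><h1>Blackjack rules</h1><p>How to play...</p></td></tr></table>"
)
content_first = (
    "<div id=content><h1>Blackjack rules</h1><p>How to play...</p></div>"
    "<div id=menu>Home Games Bonuses Support About</div>"
    "<div id=logo><img src=logo.gif alt='Acme Casino'></div>"
)

print(word_position(table_layout, "Blackjack"), word_position(content_first, "Blackjack"))
```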

In the same way as you use H tags to construct proper document structure, you should also use XHTML and CSS to construct an efficient and logical overall document structure.

The closer you can get the important on-page items together -- Title, Heading, Content, In-Content Links -- the more efficiently you are able to define the over-all theme.

In addition, with tables layout:
• The browser doesn’t cache the layout, which makes it slower.
• The search engine has way more code to go through to get to the actual content
• Frames prevent the spider from finding the content, and when it does find it, only the content frame is displayed, not the whole site

With CSS:
• CSS by its nature gives you cleaner and faster code
• Using CSS classes for frequently used objects reduces code further
• Cleaner and less code makes it easy for the search engines to spider your page
• CSS Positioning makes it possible to put content first

What this amounts to is something I’ve said for years now: HTML tables should be used for presenting data, not for designing websites.

pixel1
02-06-2006, 11:21 PM
:) Many reports online make the statement that search engine crawlers only search down so far in HTML, so if this is the case it is probably better to have nice clean code

mcanerin
02-07-2006, 01:29 AM
html tables should be used for presenting data, not for designing websites.

I agree, and yet a website using tables for design will validate just fine. The fact that it validates doesn't mean it's good design. It could be an ugly monstrosity of garish colors and misspelled words with little semantic meaning, and could still validate perfectly.

Many reports online make the statement that search engine crawlers only search down so far in HTML, so if this is the case it is probably better to have nice clean code

I certainly am not arguing against clean code or validation, but since crawlers index a minimum of 100K (that's of code/content, not of pictures, etc) it's not likely to be a serious issue. I'm very long winded (you might be able to tell...) and the largest page on my site (a scientific article) is 37k. Most are between 10-20k. I'm not denying it could be an issue, but it would also be an issue with validated pages that just happen to be really long, too.
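The size question can be sketched as a check of whether a phrase falls inside the first N bytes of a page. The 100K figure is the poster's estimate from 2006; whether that means 100 × 1024 or 100,000 bytes, and whether a given crawler truncates this way at all, are assumptions, so the limit is just a parameter here. The padded page is artificial.

```python
def fits_in_first_bytes(html: str, phrase: str, limit_bytes: int = 100 * 1024) -> bool:
    """True if `phrase` lies entirely within the first `limit_bytes` bytes of the page (UTF-8)."""
    data = html.encode("utf-8")
    needle = phrase.encode("utf-8")
    pos = data.find(needle)
    return pos != -1 and pos + len(needle) <= limit_bytes


padding = "<!-- filler -->" * 8000  # 120,000 bytes of markup with no text
page = "<p>intro</p>" + padding + "<p>footer keywords</p>"

print(len(page.encode("utf-8")) // 1024, "KB")
print(fits_in_first_bytes(page, "intro"), fits_in_first_bytes(page, "footer keywords"))
```

As the post says, a page has to be unusually heavy before this matters, and a long validated page would hit the same limit.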

Bottom line: although it would be nice, in a perfect world, for good design to get a direct bonus in search engines, it's not going to happen. For one thing, there is that separation of code and content issue.

But here is something to think about.

Are people more or less likely to link to well-coded, highly usable sites? I would argue that, all things being equal, a highly usable, well-coded site is more likely to get links than one with similar information but poorer coding. For one thing, using CSS, etc. makes pages load faster, which makes people more likely to trust the site.

Now, this is different from validation, since you could have minor validation errors and the visitor would not care. But in general, I would suggest that well designed, fast sites are more likely to get links, and therefore rank better, than poorly coded ones, everything else being fairly equal.

Good design counts for people. And people are the ones doing the linking.

Just an opinion,

Ian