Does validated HTML count?
Chris Boggs
01-12-2005, 12:35 PM
I have noticed that some sites are "not valid HTML 4.0 Transitional," according to the W3C Validator (http://validator.w3.org/). Given this description of HTML 4.0's purpose, also from the W3C:
In addition to the text, multimedia, and hyperlink features of the previous versions of HTML, HTML 4.0 supports more multimedia options, scripting languages, style sheets, better printing facilities, and documents that are more accessible to users with disabilities. HTML 4.0 also takes great strides towards the internationalization of documents, with the goal of making the Web truly World Wide.
and given the opinions of the esteemed developers in here, could Google or any other search engine be using validity as a factor in rankings?
Bonus question: Why does MS use tags within its HTML that can only be read by IE? (used as an example to argue that even MS itself doesn't support HTML Standards)
I, Brian
01-12-2005, 07:38 PM
I don't believe that Google, at least, cares whether a page is XHTML 1.0 Strict or HTML 4.0.
Also, try running www.google.com through the W3C validator - that should tell you how important XHTML 1.0 is. :)
St0n3y
01-12-2005, 07:42 PM
I think having valid HTML plays a big role in how search engines view and assign importance to your site. Sure, you can get rankings with poor code, but I think having properly validated code can help significantly.
Which standard of validation helps the most, I don't know.
dannysullivan
01-13-2005, 08:03 AM
Tim Mayer from Yahoo in December on an SES panel specifically said Yahoo didn't care if a page was validated. It wasn't going to give you any type of ranking boost.
Pretty sure that will be the case with Google, as well. In fact, I've many times seen pages rank well that had bad HTML or even broken HTML. The search engines seem to make the best of what they can.
Having said this, I still think it's good to have valid HTML as much as you can. There are other good reasons for doing it. But as to which standard you want to follow, well that's another thing :)
Chris_D
01-13-2005, 09:17 AM
Imagine if nobody at the next United Nations meeting had a name tag - and everybody at that UN meeting just stood up and started talking in their native language, without first identifying themselves and their country.
By not following the W3C standards - that's what your website is effectively doing.
If you choose not to declare a doctype, then the user agent (your browser, Googlebot, etc.) will be forced to parse your webpages in quirks mode, and basically 'guess' what your page says.
You can ignore the W3C - and you can mix HTML 3, 4, and XHTML all in the same document. And you can keep on relying on the user agent to sort out what your documents say. It's a bit like someone having a conversation with you - but they suddenly start mixing French, German, Lithuanian and English mid-sentence. Do you think that you'll understand PRECISELY what they said?
Alternatively, you can select a doctype and code to it. Then you can validate your code and, having correctly declared your doctype, find and fix potentially 'fatal' errors before you publish - and rest assured that all user agents will be able to parse your documents...
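For anyone wondering whether their own pages declare a doctype at all, here is a minimal sketch using Python's standard `html.parser` module. `declared_doctype` is a hypothetical helper written for this thread: it only reports the declaration and its public identifier, and it does not reproduce the full rules browsers use to choose quirks or standards mode (those also depend on the exact public and system identifiers).

```python
import re
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Records the first <!DOCTYPE ...> declaration found in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # html.parser passes the text between "<!" and ">" to this hook.
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def declared_doctype(html):
    """Return (declaration, public identifier), or (None, None) if absent."""
    sniffer = DoctypeSniffer()
    sniffer.feed(html)
    sniffer.close()
    if sniffer.doctype is None:
        return None, None
    match = re.search(r'PUBLIC\s+"([^"]+)"', sniffer.doctype, re.IGNORECASE)
    public_id = match.group(1) if match else None
    return sniffer.doctype, public_id


page = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
        '"http://www.w3.org/TR/html4/loose.dtd">\n<html><body>Hi</body></html>')
print(declared_doctype(page)[1])                        # -//W3C//DTD HTML 4.01 Transitional//EN
print(declared_doctype("<html><body>Hi</body></html>"))  # (None, None)
```

A page that comes back as `(None, None)` is one that browsers will render in quirks mode.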
You won't get any 'bonus' from the search engines for valid code - but your content will be very indexable, and there won't be any poorly coded showstoppers.
Validation is the beginning of the W3C process. And validation leads to semantics, and semantics leads to the separation of content (HTML/XHTML) and presentation (CSS). And that leads to leaner, faster-loading code. And next thing you know, accessibility starts making sense - for humans and search engine spiders. And that's the concept of the 'universality' of the web - accessible by anyone.
The difference between HTML 4.01 and XHTML 1.0?
Well - you can design for last century's standard - or this century's standard - it's really up to you. But this century's standard caters better for those who use more than traditional desktop PCs - or will in the future.
Chris Boggs
01-13-2005, 10:25 AM
Thanks for the detailed responses. Perhaps this should have been accompanied by a poll.
Chris D's comment brings out some of the "other reasons" that Danny writes of. However, according to what Tim Mayer publicly said, coupled with the tests I have run of some highly-ranked sites, validation doesn't seem important in order to perform.
Perhaps some of the more "pure" directories such as DMOZ do take this into consideration?
Also, for the "code-challenged" people in here, is XHTML 1.0 one step up from HTML 4.01? How much of a difference is there between these two? Is the lesser of the two as "last century OMG" as Chris D describes it?
fathom
01-13-2005, 11:31 AM
Perhaps some of the more "pure" directories such as DMOZ do take this into consideration?
Not a factor for DMOZ listing acceptance.
St0n3y
01-13-2005, 11:45 AM
Google's Web master Guidelines (http://www.google.com/webmasters/guidelines.html)
Check for broken links and correct HTML.
Now, I think you can make a fair argument that this does not imply that valid code will give you a rankings boost. On the other hand, I think a fair argument can be made for doing whatever you can to "please" the search engines (in this case Google).
Google seems to be a company that looks far beyond simple "search relevance" and has an eye toward the professional standards of the results it's showing as well. Natural language analysis lends credence to this. If that is the case, I would think that Google would be likely to give a couple of bonus points to sites that do use valid HTML over sites that don't. I'm sure it wouldn't be enough, by itself, to give a rankings boost, but as we know, every little bit can make a difference.
rcjordan
01-13-2005, 01:19 PM
I'm running bad code without a doctype. It ranks and has ranked (competition level: 2M-4M) for years. Currently top 3 in G, M, & Y.
fathom
01-13-2005, 01:33 PM
I'm running bad code without a doctype. It ranks and has ranked (competition level: 2M-4M) for years. Currently top 3 in G, M, & Y.
I doubt anyone is saying 'it can't rank' but there is a possibility that 'it could rank better'.
I've unsuccessfully tried many times to observe a difference.
The only thing that keeps me believing there is something in it is the fact that Google indexes the W3C validator's result ('check') pages.
Acirehp (http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLD,GGLD:2004-50,GGLD:en&q=acirehp) (bottom)
A site with 100 validated pages has 100 additional links to the website... whether these are counted [don't know].
Note: these pages are but a few days old and yet the check page is already indexed.
Does validated code count specifically towards a higher ranking over a non-validated document?
On the basis of general observations and as answered by previous contributors, no.
Will it count towards higher rankings in the future?
No, as search engine technologies are inclined not towards 'studying' code but towards document semantics and a document's relevant domain. In other words, search engines are being engineered to follow more natural human thinking processes and patterns, which I find a little ironic, as SEOs are trying to figure out how best to assimilate themselves closer to SEs (and their algos) while the SEs themselves are trying to evolve towards human user search patterns! Sounds like a Benny Hill chase scene.
So what's the point of validation?
Chris_D's analogy is succinctly appropriate. Should you have validated XHTML to boost rankings? No. Should you have validated XHTML? Damn straight you should - the standardisation that validated XHTML offers is a vital ingredient of professional web development.
The difference between HTML and XHTML?
XHTML is based on XML. It was designed to replace the haphazard HTML coding.
The main benefits of XHTML usage are:
- its ability to accommodate non-standard web browsing agents
- a structured adherence to coding standards - eg. XHTML must be all lowercased, all tag sets have closing tags, termination of empty elements
- ability to be extensible (the X in XML/XHTML), as the generation of new elements is easily facilitated by XML
Check out http://www.wdvl.com/Authoring/Languages/XML/XHTML/dif.html for the major differences between XHTML and HTML
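Because XHTML is XML, the first thing it has to be is well-formed XML. A quick way to see the lowercase, closing-tag and empty-element rules in action is to feed fragments to Python's built-in XML parser. This is only a sketch: well-formedness is necessary but not sufficient, since a real validator also checks the markup against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True if the fragment parses as XML (case-sensitive, every tag closed)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


xhtml_style = '<p>Line one<br />Line two</p>'  # lowercase, closed, empty element terminated
html4_style = '<P>Line one<BR>Line two'         # fine as HTML 4, not well-formed XML
mixed_case = '<P>Line one</p>'                  # XML tag names are case-sensitive

for name, fragment in [("xhtml", xhtml_style), ("html4", html4_style), ("mixed", mixed_case)]:
    print(name, is_well_formed(fragment))
# xhtml True
# html4 False
# mixed False
```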
Bonus question: Why does MS use tags within its HTML that can only be read by IE? (used as an example to argue that even MS itself doesn't support HTML Standards)
Because MS is MS.
Every iteration of IE has introduced 'features' that do not conform to W3C standards or behave (infuriatingly) differently to every other browser on the market. That's one good reason why so many of us are using Firefox, Opera, Safari or whatnot ;)
I, Brian
01-14-2005, 08:49 AM
But this century's standard caters better for those who use more than traditional desktop PCs - or will in the future.
Indeed, and that's the only reason for moving from HTML 4.0 to XHTML 1.0 that I can think of. But desktops still rule - for the moment at least while the mobile revolution properly establishes itself.
Ultimately, the browser market has been the only concern in terms of validation - there has been no need to cater to third-party interpretations of "dos and don'ts" if it works in your target browser market in the first place.
Frankly, a lot of W3C compliance seems espoused by "Design Puritans", who spend too much of their time:
1. dictating that the W3C must be mindlessly obeyed,
2. complaining that browsers can't properly support their CSS tricks, and
3. attempting to ignore the existence of Apples for surfing.
However, it should be pretty clear that even being W3C compliant on XHTML 1.0 standards is simply not enough for designing web pages for mobile devices.
Chris Boggs
01-14-2005, 09:26 AM
...I find a little ironic as SEOs are trying to figure out how to best assimilate themselves closer to SEs (and their algos) while the SEs themselves are trying to evolve towards human user search patterns! Sounds like a Benny Hill chase scene.
Very good analogy, no pats on the head for you today... :D (That silly Benny Hill tune is now in my head)
Should you have validated XHTML to boost rankings? No. Should you have validated XHTML? Damn straight you should - the standardisation that validated XHTML offers is a vital ingredient of professional web development.
If it takes considerably longer to validate hundreds of pages, and it is not by consensus so far required, why not devote those developer hours towards other tasks? I would venture to bet that there are a lot of sites without validated code out there that look every bit as "professionally developed" as many that are.
Because MS is MS.
Every iteration of IE has introduced 'features' that do not conform to W3C standards or behave (infuriatingly) differently to every other browser on the market. That's one good reason why so many of us are using Firefox, Opera, Safari or whatnot ;)
Thanks for posting what I feel to be the correct answer. You win: positive reputation. ;) (damn must spread some around first...maybe someone else can "hit you" for me)
fathom
01-14-2005, 10:29 AM
You win: positive reputation. ;) (damn must spread some around first...maybe someone else can "hit you" for me)
OK DONE! ;)
St0n3y
01-14-2005, 10:55 AM
Perhaps I may play devil's advocate a little here... Obviously very bad HTML can still render properly in browsers. Why then, do you suppose, would Google go out of its way to say to check for correct HTML?
I mean, I hate to bow at the altar of Google, but if Google takes the effort to mention something like this, you would think it would be for a reason. And I don't believe they simply want to be the clean-code police!
fathom
01-14-2005, 11:14 AM
Has the W3C robots.txt been edited since the last time I checked?
#
# robots.txt for http://www.w3.org/
#
# $Id: robots.txt,v 1.28 2004/09/26 06:20:55 sandro Exp $
#
# Exclude Ontaria until it can handle the load
User-agent: *
Disallow: /2004/ontaria/basic
# For use by search.w3.org
User-agent: W3C-gsa
Disallow: /Out-Of-Date
# W3C Link checker
User-agent: W3C-checklink
Disallow:
# exclude some access-controlled areas
User-agent: *
Disallow: /Team
Disallow: /Project
Disallow: /Systems
Disallow: /Web
Disallow: /History
Disallow: /Out-Of-Date
Disallow: /2002/02/mid
Disallow: /mid/
Disallow: /People/all/
Disallow: /2003/03/Translations/byLanguage
Disallow: /2003/03/Translations/byTechnology
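To see what rules like these mean for a particular crawler, Python's standard `urllib.robotparser` can answer "may this user-agent fetch this URL?". The sketch below uses an excerpt of the file above. As far as I can tell from its source, Python's parser keeps only the first `User-agent: *` group it sees, so the two `*` groups are merged into one here; `Googlebot` is just an example agent name.

```python
import urllib.robotparser

# Excerpt of the W3C robots.txt quoted above, with the two "User-agent: *"
# groups merged because Python's parser only honours the first "*" group.
ROBOTS_TXT = """\
User-agent: W3C-gsa
Disallow: /Out-Of-Date

User-agent: W3C-checklink
Disallow:

User-agent: *
Disallow: /2004/ontaria/basic
Disallow: /Team
Disallow: /Out-Of-Date
"""

robots = urllib.robotparser.RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

checks = [
    ("W3C-checklink", "http://www.w3.org/Team/"),    # empty Disallow: allowed everywhere
    ("W3C-gsa", "http://www.w3.org/Out-Of-Date/x"),  # blocked by its own group
    ("W3C-gsa", "http://www.w3.org/Team/"),          # its own group does not list /Team
    ("Googlebot", "http://www.w3.org/Team/"),        # no named group, falls back to "*"
    ("Googlebot", "http://www.w3.org/"),
]
for agent, url in checks:
    print(agent, url, robots.can_fetch(agent, url))
```

A crawler that matches a named group (here W3C-gsa) follows only that group, which is why /Team is not blocked for it in this reading.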
I, Brian
01-14-2005, 11:39 AM
Perhaps I may play devil's advocate a little here... Obviously very bad HTML can still render properly in browsers. Why then, do you suppose, would Google go out of its way to say to check for correct HTML?
I mean, I hate to bow at the altar of Google, but if Google takes the effort to mention something like this, you would think it would be for a reason. And I don't believe they simply want to be the clean-code police!
Bad HTML can prevent a page from displaying properly. When that happens, Google gets an incorrect view of the page's meaning, which defeats Google's purpose.
Google will index bad HTML - but where the HTML does not render your page elements properly, your search traffic may suffer for it.
Try it by deleting the </title> tag from a page, or the end quotes from a URL in an anchor.
St0n3y
01-14-2005, 11:57 AM
Sure, I understand bad HTML can prevent pages from rendering properly, but in other forms, bad HTML will render just fine. Some will be spider stoppers, some not. But checking for proper HTML does not stop at simply closing all tags. That, by itself, implies checking for ALL proper HTML usage, not just some.
hardball
01-14-2005, 02:15 PM
You index documents, parse words and attributes, and otherwise slice and dice the pages before storing them in various indexes. I guess you could assign a validation score to the documents before you start parsing, but I doubt it. I see way too much goofy stuff (some of it mine) ranking well to conclude that validation has any effect anywhere.
St0n3y
01-14-2005, 02:28 PM
I think the big picture here is being missed. I'm not suggesting that non-validated HTML will hurt your rankings. That's like saying that not having an ALT attribute will hurt, or that if you don't have ALTs and rank well, Google does not analyze them. It certainly can happen. What I'm suggesting is that validating HTML will add some kind of additional importance to your site.
If I were Google and I was analyzing what to consider "authoritative" sites, I think I would consider sites that conform to other authoritative sources, in this case W3C, to be of more significant importance.
Don't get me wrong... I'm not saying I'm right and those who don't agree are wrong... just exploring the potentials here.
ThouShaltSeo
01-14-2005, 03:35 PM
Pick any "major" website like BBC, Google.com, MSN, Microsoft and all will fail validation. http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.com%2F&charset=%28detect+automatically%29&doctype=%28detect+automatically%29
If they went by it, instead of 8 billion pages, they'd be indexing 80000 pages right now ;)
Tim Mayer from Yahoo in December on an SES panel specifically said Yahoo didn't care if a page was validated. It wasn't going to give you any type of ranking boost.
Pretty sure that will be the case with Google, as well. In fact, I've many times seen pages rank well that had bad HTML or even broken HTML. The search engines seem to make the best of what they can.
Having said this, I still think it's good to have valid HTML as much as you can. There are other good reasons for doing it. But as to which standard you want to follow, well that's another thing :)
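For reference, the validator link above is just the `check` script with three query parameters. Here is a small sketch that rebuilds it with Python's `urllib.parse.urlencode`; the parameter names are copied from the 2005 link, and the validator's current interface may well differ. `validator_link` is a hypothetical helper.

```python
from urllib.parse import urlencode

VALIDATOR = "http://validator.w3.org/check"


def validator_link(page_url):
    """Build a W3C validator 'check' link like the one quoted above."""
    params = {
        "uri": page_url,
        "charset": "(detect automatically)",
        "doctype": "(detect automatically)",
    }
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return VALIDATOR + "?" + urlencode(params)


print(validator_link("http://www.google.com/"))
```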
fathom
01-14-2005, 03:52 PM
Interestingly enough - there's a site that the owner thought was affected by the sandbox since March.
When they added their tracking script to page bottom they accidentally deleted:
</body>
</html>
on all 200+ pages.
Will this affect ranks?
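A rough way to find pages like these across a site is to check tag balance. The sketch below is a hypothetical, naive checker built on Python's `html.parser`; it applies XHTML-style rules (every non-empty element must be closed). Under HTML 4.01 the `</body>` and `</html>` end tags are actually optional, so an HTML 4.01 validator would not complain about this particular mistake, while an XHTML one would.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    """Naive checker: reports start tags never closed, and stray end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, line) for each currently open element
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in (t for t, _ in self.stack):
            self.problems.append(f"line {self.getpos()[0]}: stray </{tag}>")
            return
        # Pop until the matching start tag; anything popped on the way was left open.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {line}: <{open_tag}> never closed")

    def close(self):
        super().close()
        for open_tag, line in self.stack:
            self.problems.append(f"line {line}: <{open_tag}> never closed")
        self.stack.clear()


page = """<html>
<head><title>Widgets</title></head>
<body>
<p>Welcome</p>
<script>track();</script>
"""
checker = TagBalanceChecker()
checker.feed(page)
checker.close()
print(checker.problems)
# ['line 1: <html> never closed', 'line 3: <body> never closed']
```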
ThouShaltSeo
01-14-2005, 04:13 PM
This is weird. I tried that on an HTML page and it still loads in IE.
I would check how SE spider simulators "see" the page:
http://www.google.com/search?num=100&hl=en&lr=&c2coff=1&q=simulator+spider
Does he have a Google cache that's older than "delete" day?
St0n3y
01-14-2005, 04:18 PM
Pick any "major" website like BBC, Google.com, MSN, Microsoft and all will fail validation. http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.com%2F&charset=%28detect+automatically%29&doctype=%28detect+automatically%29
If they went by it, instead of 8 billion pages, they'd be indexing 80000 pages right now ;)
You've got to be kidding me, right? Nobody here is talking about getting thrown out of Google if your code doesn't validate. Let's expand the mindset a bit and get past "not validated equals (or does not equal)" any of the following: indexing, inclusion, top rankings. We already know, and it has already been stated here, that non-validated pages can perform. What about the subtle nuances that Google considers and gives or takes away points for? Couldn't this be one? Couldn't it have an effect, not on ranking well for less competitive sites, but on competing against authoritative sites?
I do find it interesting that Google does not validate, but then we also know that Google does not follow its own rules and recommendations. (i.e. "Google" is not a descriptive title and no relevant content is on the page... both things they suggest for proper sites.)
When they added their tracking script to page bottom they accidentally deleted:
</body>
</html>
on all 200+ pages.
Will this affect ranks?
I honestly don't think it would, but I would still suggest getting that fixed.
ThouShaltSeo
01-14-2005, 04:24 PM
The reason I use the main pages as an example is because they do very well in the SERPs, in my opinion. Unless you have identical pages (other than the code) with the same off-page factors etc., it would be very hard to test that. It would be interesting though. I still would bet that as long as the bot sees the page, it doesn't matter. Knowing how to code perfectly, or being willing to spend extra time to do it, shouldn't be a criterion, and I doubt G uses it.
Chris_D
01-15-2005, 12:45 AM
There are literally thousands of coding errors that can cause a page not to validate. Some of those 'errors' only cause problems for certain types of users - eg no alts means a screen reader can't give its blind user a description of your images.
Some errors are 'more fatal' than others. Generally, if a browser can't 'understand' something on the page - because the HTML is malformed - then it will ignore it.
Go to one of your live pages & try changing a few <p> tags to <p
Now try and find that paragraph on your page. That could affect your rankings if that paragraph was important.
Similar issues will also arise with badly nested tables too.
At the most basic level, validation is important because it sets a discipline of reviewing, checking and correcting your code.
You can check your code and choose not to 'fix' certain aspects that prevent the page validating. I still cater for NN4 users on a client's site by using some deprecated elements (the client's logs indicate there are still some NN4 customers!). Use of these deprecated elements prevents those pages from validating - but at least I know that's the only reason WHY the page won't validate. I can just 'ignore' those errors - I know exactly what caused them.
At the most basic level - that's why validation is important - it's a tool for reviewing your work, and it alerts you to check and correct any errors in your code.
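In the same spirit, here is a small sketch that lists the elements HTML 4.01 marks as deprecated, so that known workarounds like the NN4 ones can be separated from genuine mistakes. `DeprecatedElementFinder` is hypothetical, the `DEPRECATED` set is my transcription of the HTML 4.01 list, and it ignores deprecated attributes such as `align` or `bgcolor`.

```python
from html.parser import HTMLParser

# Elements marked deprecated in the HTML 4.01 specification.
DEPRECATED = {"applet", "basefont", "center", "dir", "font",
              "isindex", "menu", "s", "strike", "u"}


class DeprecatedElementFinder(HTMLParser):
    """Records (tag, line) for every deprecated start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in DEPRECATED:
            self.found.append((tag, self.getpos()[0]))


finder = DeprecatedElementFinder()
finder.feed('<center><font color="red">Sale</font></center>\n<p>Normal text</p>')
finder.close()
print(finder.found)  # [('center', 1), ('font', 1)]
```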
And malformed <p tags WILL affect your rankings!
:)
Dave Hawley
01-15-2005, 07:29 AM
IMO, SEs could not care less about "validation" or "good code". Their business is relevance and to mix the 2 would be a HUGE mistake.
Does a user care if some tags aren't closed etc.? No, of course not, just so long as they can see the page.
Chris_D
01-15-2005, 08:17 AM
Dave Hawley wrote:
Does a user care if some tags aren't closed etc.? No, of course not, just so long as they can see the page.
Dave - they can 'see' the page - but maybe they can't 'see' all the content. I thought the example I gave would provide a good demonstration. Did you test it Dave?
Go to one of your live pages & try changing a few <p> tags to <p
Now try and find that paragraph on your page.
Dave Hawley
01-15-2005, 08:40 AM
I see what you are saying, but this would likely be a problem seen by the Webmaster before any SE. Also, while the user cannot see it, I'm not so sure the spider can't and hence have any adverse ranking effect. However, I agree that some "bad coding" could 'hold you back'.
If 'valid' HTML does have any neutralizing effect, I don't believe it would be a conscious effect created by the SEs.
strategicrankings
01-15-2005, 11:12 AM
Along with the SEO plans I propose to prospects comes HTML validation and cleanup.
I can add more questions to the initial one, such as "does the loading time of a page influence the spider's behaviour on a site?" The answer to this question (and fortunately not only this one) will help us decide whether we should go for valid HTML (which parses more quickly) or not. And if, as a professional, I cannot provide valid HTML pages to my clients as a deliverable, then my ability to deliver well-optimized on-page content (which depends on the HTML too), as well as my ability to get the pages to rank well, may be challenged on the valid HTML issue.
In other industries there are standards which professionals use as a reference to validate their work, and IMHO I don't think it would be bad if we as SEOs could be a reference for our clients as far as standards are concerned.
mcanerin
01-15-2005, 03:02 PM
Why does MS use tags within its HTML that can only be read by IE?
For what it's worth, the reason MS writes all that extra code in its Word HTML documents is to support what they call round-tripping - the ability to save a doc as HTML and then bring it back into Word and use it as a Word document. http://support.microsoft.com/default.aspx?scid=kb;en-us;219214
As for compliant HTML - I recommend all my clients get as close as practical to it. Note I'm not saying 100% - though I recommend it where possible.
Why? Because I believe in Practical SEO - it's not the theory, it's what actually works that matters.
Lets put it this way - I have never seen a website that was re-written in compliant code perform worse in the SERPS. Never.
I have seen many sites perform better, however. The worst case scenario was no change.
I don't think the reason is bonus points, though (personal opinion). I believe the reason is - and I know this from the process of making my own website compliant - that when you are making your website compliant you are totally focused on the website structure: you actually use things like header tags instead of just making a font bigger and bolder, you no longer have multiple nested tables, and you actually remember to use unique titles and metatag information.
You use Alt tags - not in a spammy way, but the right way. Your code gets cleaner and faster because you use CSS and get rid of bulky, useless stuff. Because you use CSS, you focus on how your page is organized, rather than using a WYSIWYG interface to drag things around and drop them in. Your pages load and display faster. There are no broken paragraph or other critical tags.
All of these things help your website present itself to a search engine as best it can. They eliminate dumb mistakes and sloppy code.
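As an illustration of the structure point, here is a minimal sketch that pulls the heading outline out of a page with Python's `html.parser`. It is not how any search engine actually works, and `HeadingOutline` is hypothetical; it just shows that text styled with `<font>` to look like a heading is invisible to anything that looks for real heading elements.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {f"h{n}": n for n in range(1, 7)}


class HeadingOutline(HTMLParser):
    """Collects (level, text) for every h1-h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self._level = HEADING_LEVELS[tag]
            self._text = []

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)


page = ('<h1>Baseball cards</h1>'
        '<font size="6"><b>Rookie cards</b></font>'  # looks like a heading, is not one
        '<h2>Pricing</h2>')
outline_parser = HeadingOutline()
outline_parser.feed(page)
outline_parser.close()
print(outline_parser.outline)  # [(1, 'Baseball cards'), (2, 'Pricing')]
```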
But the point is that a talented handcoder would probably do all that anyway without validation. The first sites I ever made were completely compliant and I had never even heard of the W3C at the time. Heck, I didn't even know you could manipulate search engine rankings at the time.
But I cared deeply about doing the job right, and since my main browser was lynx run through a SunOS telnet session, I was very interested in proper coding technique.
Once WYSIWYG came along, people (and software) got sloppy. I think that making sure you have compliant code is an excellent counterpoint to using WYSIWYG technology (which I love, since it's faster and I never have enough time anymore).
Do I think you get a rankings boost because your site is compliant? No.
Do I think you get a rankings boost because your site is fast, well thought out, tightly coded and uses proper document structure? Yes. Not because it's validated, but because there is no guesswork required on the part of the SE, and no impediments to its indexing and organizing.
You don't need to be compliant to do that, but it's a pretty darn good way to accomplish it.
Ian
DianeV
01-16-2005, 10:41 AM
Let's look at it this way: compliant code is actually correct code, which then ensures not only that browsers don't have to guess at how to display it but that they don't have to overcome errors in displaying it as intended.
I'm not sure that we can entirely assume that because browsers are built to overlook HTML errors, search engine spiders are too. I recall in the old days (1998-ish) that we were cautioned to ensure that code was correct: all TD, TR and table tags closed, no strings of &nbsp; (which we were told would "choke" a spider), etc.
Nowadays, whether coding with or without tables, we can pretty much ensure that HTML is much lighter. Ensuring that it is also correct is important. You never know whether quirks mode will go away. <grin>
As I read the W3C pages they are recommendations only, not the law ;)
Is a page which is missing a DTD but which has great content for baseball trading cards a more relevant page for the searcher for baseball cards than the same page with the DTD?
Certainly not so far as the searcher is concerned, he won't know whether there is a DTD there or not, so why should Google complicate its algorithm to introduce a factor which does not help the relevancy as seen by the searcher?
I agree that it is good common sense to validate your pages to insure that you are not missing something that will make it difficult for the search engines to parse your page, but I simply refuse to add a dozen empty alt texts to spacer gifs on every page in order to be "compliant".
Google, for instance, has built into its spiders the ability to parse noncompliant code, and its primary objective is relevancy, not compliance with recommendations.
St0n3y
01-16-2005, 08:05 PM
I think to assume that Google is only concerned with relevance is a mistake. If relevance was the only goal, then Google wouldn't necessarily be trying too hard to rid their results of spammy stuff. Much spam: cloaking, redirecting, hidden text, etc. is still highly relevant to any given query.
Another example is giving better rankings to "authoritative" sites. Just because a site is an authority, does not necessarily make it more relevant. For me, the corner hardware store is more relevant than the "authoritative" Home Depot for a whole variety of reasons.
I think Google wants all their results to be relevant, but that doesn't mean that they are the MOST relevant. This opens up the door to analyzing a whole host of on-the-page factors that don't necessarily have anything to do with a site's relevance to a query.
Taking into consideration that there is probably little difference in relevance between any of the sites listed on the first page or so, looking at other factors as a means to order those sites makes sense.
DianeV
01-16-2005, 08:05 PM
insure that you are not missing something that will make it difficult for the search engines to parse your page
That's about as far as I take it for the purposes of this discussion: just make sure SEs can read your code.
Whether anyone wants to write table-less semantic validated XHTML is not what I meant. While I personally like the idea, I have not yet seen that it, in and of itself, makes anything rank higher. It may be that the Web Standards-type designers found that their sites ranked higher, but then we'd have to see how they were designing beforehand, wouldn't we? :)
mcanerin
01-16-2005, 09:33 PM
And here is another thought - isn't the goal of HTML and current thinking on the direction of the web to separate content from structure?
Wouldn't focusing on structure to the detriment of content go against the idea of relevancy by definition?
Being compliant is a really good idea. I strongly recommend it. It minimizes mistakes and usually makes your site better and easier to use. It also shows professionalism.
But as long as a search engine can read it with no problems, I don't think compliance is the issue. Just make sure the SE can read it with no problems.
Ian
St0n3y
01-16-2005, 10:37 PM
Again: I think to assume that Google is only concerned with relevance is a mistake.
Is it safe to say that relevance and importance are two different things? I say they are, and that Google tries to apply both.
PageRank Technology: PageRank performs an objective measurement of the importance of web pages by solving an equation of more than 500 million variables and 2 billion terms.
Google tries to extract the relevance of a site by reading the link text pointing to it, but links alone, as stated by Google themselves, measure a site's importance.
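For anyone curious what "solving an equation" means here, the normalised form of the textbook PageRank formula, PR(p) = (1 - d)/N + d * sum of PR(q)/outlinks(q) over the pages q linking to p, can be computed by power iteration, as in the sketch below. This is the published formulation, not whatever Google actually runs; the damping factor 0.85 is the value the original Brin and Page paper suggests, and the three-page link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for PR(p) = (1 - d)/N + d * sum(PR(q) / outdegree(q))."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links[page]
            if targets:
                # Each page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
# c 0.486
# a 0.464
# b 0.05
```

The page with the most inbound links (c) ends up most "important", regardless of what any of the pages say, which is the distinction between importance and relevance being drawn here.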
On the validation issue, I don't think that sites must validate in order to rank; I'm simply suggesting that it's very plausible for this to matter somewhat.
Wouldn't focusing on structure to the detriment of content go against the idea of relevancy by definition?
You may be 100% correct, but to quote from Google themselves:
# Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link.
# Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages.
# Check for broken links and correct HTML.
# If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small.
# Keep the links on a given page to a reasonable number (fewer than 100).
Each of those applies to structure, NOT relevance.
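Two of those guidelines are easy to check mechanically. Here is a minimal sketch, assuming you only want to count `<a href>` links on a single page and spot "dynamic" URLs containing `?`; `audit_links` and `LinkCollector` are hypothetical helpers, and checking for broken links would additionally require fetching each URL.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def audit_links(html, max_links=100):
    """Count links and list 'dynamic' ones (URLs containing '?')."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return {
        "links": len(collector.hrefs),
        "too_many": len(collector.hrefs) >= max_links,  # guideline: fewer than 100
        "dynamic": [h for h in collector.hrefs if "?" in h],
    }


page = ('<a href="/">Home</a> <a href="/cards.html">Cards</a> '
        '<a href="/search?q=rookie&amp;page=2">Search</a>')
report = audit_links(page)
print(report)
# {'links': 3, 'too_many': False, 'dynamic': ['/search?q=rookie&page=2']}
```

Note that `html.parser` unescapes attribute values, so `&amp;` in the markup comes back as a plain `&`.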
fathom
01-16-2005, 10:46 PM
Why shouldn't you develop your own and your clients' [SEO clients] websites to be compliant?
I have never heard a good reason not to.
People 'love' those little banners on their websites that says "I comply with" - you know... BBB, TRUSTe, SEO Code of Ethics, SEMPO, Safe Shopping, WebAssured, Family Friendly, Industry specific credibilities, integrities, and authorities, etc.
W3C is a recognized authority that you can use as a USP - it most definitely counts.
What do you tell a potential client who asks about this [because they are comparing SEO firms] when "I" specifically tell them that with 'ME' their website 'will be compliant'...
In your everyday life - would you be more likely to buy from or outsource to a company that tells you 'we comply with standards' [proposed or otherwise], or go with one that says "trust me - standards are a waste of time and do nothing for you"?
Dave Hawley
01-16-2005, 11:21 PM
W3C is a recognized authority that you can use as a USP - it most definitively counts
But does it count toward Google ranking the page better? I say not.
Also, as far as I can tell, the W3C image that you say "is a recognized authority" is not so IMO. I say this because many sites that display it simply do not comply. One only needs to click the image to see this in many cases.
However, it has ensured they get 1000's of links to their site, which seems to be their driving factor as opposed to quality.
fathom
01-17-2005, 12:12 AM
No direct ranking help whatsoever - I completely understand and fully agree.
That however isn't a very good reason to not be compliant... or is it?
mcanerin
01-17-2005, 12:40 AM
I would argue that a professional should try to put out the best product or service they can under the circumstances given.
Ian
Dave Hawley
01-17-2005, 01:18 AM
I fully agree that a job worth doing is worth doing well. No argument there at all.
However, I do not believe for a moment that displaying a W3C compliant logo means a single thing toward quality. They are there for all/anyone to grab and hence mean zip IMO. Like I say, it's more a marketing ploy by W3C than any indication of quality.
Chris_D
01-17-2005, 02:44 AM
Mel said:
As I read the W3C pages they are recommendations only, not the law
That actually depends upon the country Mel. Here in Australia, under the Disability Discrimination Act 1992, Website accessibility is the 'law'. The Sydney Olympic Organising Committee discovered it was the 'law' to have an accessible website back in 2000 in a landmark legal action. Conformance to W3C 'guidelines' was a benchmark in the process.
An article I wrote providing an overview of the accessibility issues we face in Australia, with much more info is here http://www.cogentis.com.au/website-accessibility-issues.html
Conformance to accessibility legislation is generally accepted to be based upon conformance to the WCAG guidelines. http://www.w3.org/TR/WAI-WEBCONTENT/
Part of that compliance is addressed in WCAG Appendix A, which recommends that you validate both syntax (e.g., HTML, XML, XHTML) and style sheets (e.g., CSS). It also suggests:
Begin using validation methods at the earliest stages of development. Accessibility issues identified early are easier to correct and avoid.
Accessibility and validation are not just good business practice - it's good business. It's not just about whether Google cares or whether it improves your rank. If you can't 'see' your site using a text-based browser like Lynx, odds are Google can't see it either.
That's why Google recommends that you test your site using Lynx.
http://www.google.com/intl/en/webmasters/guidelines.html
As I demonstrated earlier in this thread with the <p example, malformed HTML can affect your rankings. On large sites, small errors can go undetected in the dev process. W3C validation is a free online tool for testing and detecting errors in your code syntax.
Check it by hand - or use a free automated tool to check it. My point is check it.
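For a sense of what one small part of an automated check looks like, here is a rough sketch of a tag-balance checker built on Python's standard-library `html.parser`. It is NOT the W3C validator and does not check against any DTD; it only flags start tags that are never closed and closing tags with no matching start. The void-element list is an assumption, and HTML 4 legitimately allows some end tags (such as `</p>` and `</li>`) to be omitted, so its warnings are hints, not errors.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    """Rough check for unclosed or stray tags; NOT a full validator."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, innermost last
        self.problems = []   # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag in self.stack:
            # Anything opened after this tag was never closed.
            while self.stack[-1] != tag:
                self.problems.append(f"unclosed <{self.stack.pop()}>")
            self.stack.pop()
        else:
            self.problems.append(f"stray </{tag}>")

    def report(self):
        # Whatever is still open at the end of the document was never closed.
        return self.problems + [f"unclosed <{t}>" for t in reversed(self.stack)]


def check_tags(markup):
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.report()


print(check_tags("<html><body><p>Frogs<b>bold</p></body></html>"))
```

Running it on the sample prints `['unclosed <b>']`. A real validator does much more (attributes, nesting rules, character encoding), which is why the free W3C tool is still the better check.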
Dave Hawley
01-17-2005, 03:06 AM
Chris, on the page you have linked to: http://www.cogentis.com.au/website-accessibility-issues.html you choose to display the W3C CSS logo. Yet upon clicking it, errors are shown. Don't you find it odd that 'standards' which are supposed to stand for "quality" allow anyone anywhere to use and show their logo? IMO, it makes a joke of the whole thing.
Anthony Parsons
01-17-2005, 10:52 AM
My two cents. Validated code makes no difference towards rankings IMO. What it does do is ensure that if you convert that visitor from the SERP, you may possibly keep them on your site or get a sale, as it renders correctly in all recent browsers (go with the odds) because it is web compliant, and thus will look the same in IE, FF, NS, etc.
Whether your code is HTML or XHTML has nothing to do with ranking either, IMO and from my previous experiments with this type of thing. That is purely a web-related matter of which best suits your situation and what you want to do with your site.
Me? Web standard or not? Definitely web standard every time, for the above reasons. We see it on a daily basis: what we can get away with in coding for IE, you just can't get away with in Firefox or NS, so your site looks like crap and the visitor immediately turns away if they are using another browser, and let's face it, more and more people are moving to Firefox. It's not the new IE just yet, but it's promising, so web standards are only going to help in this instance.
Also, something that always sticks in my mind from something GG said once before, "the less you make the SE's think, the better chances you have", or words to that effect anyway. Why make them think about coding errors?
dstew
01-17-2005, 05:57 PM
There have been a few posts where Stoney has said something to the effect that it would be a mistake to assume that Google only is concerned with relevance. What else is there that matters to a searcher, but relevance? That said, there are many things that go into what makes a page relevant, as we know.
Having validated HTML only comes into play if it trips up the spider. Are we to assume that even though browsers have made strong strides in compensating for poor or irrelevant code, that search engines have not been able to accomplish the same thing? Hogwash. I would even say that if I.E. renders the page properly, then Google, Yahoo and MSN can read the page's content properly.
People get too caught up in trying to make SEO difficult. They think that if they aren't ranking well, that there must be some incredibly complicated secret that they're missing. I made that mistake back in 1997. I tried everything I could think of, and made it way too difficult. I'm sorry, but this is not brain surgery. It's simply knowledge, experience, and the ability to apply those.
St0n3y
01-17-2005, 06:34 PM
There have been a few posts where Stoney has said something to the effect that it would be a mistake to assume that Google only is concerned with relevance. What else is there that matters to a searcher, but relevance?
dstew, I appreciate the response; however, quotes pulled directly from Google still remain unanswered by any who think that I'm coming out of left field here. You are right that to the searcher, it is relevance that matters, but it is clear that Google looks at more than simple relevance. All relevance being equal, Google must use something else to determine ranking order, unless relevance can be measured down to the nth degree. But we have Google themselves saying that PR is a measure of importance, not relevance.
Validation aside, I don't think that the search spiders are as forgiving as browsers. I've seen pages that render well in a browser but look totally different in a Lynx viewer.
SEO isn't brain surgery, but anyone would admit that it's a lot different today than it was several years ago. With each change in search algorithms, SEO does, in fact, get more and more complex. I wouldn't suggest that simply validating code is going to magically make your rankings better. It probably won't do anything at all, but based on what Google has published, I believe it is a small piece of a very large puzzle. Probably more important than a Keyword Meta tag, but many SEOs still use those.
You said it best here: It's simply knowledge, experience, and the ability to apply those.
dstew
01-17-2005, 06:47 PM
I would submit that what most refer to as PageRank (PR is already outdated and shouldn't be relied upon), is part of what Google considers makes a page relevant. A page's relevance is not limited to on-page elements.
mcanerin
01-17-2005, 07:15 PM
Actually, this is how I've been telling people how to look at it.
There are 2 major parts to the presentation of a SERP based on a query:
1. First, the SE finds the sites that are relevant. If the search is for "frog" then ALL pages that mention frog or are linked to with frog in the text or who are part of the measurable frog universe are relevant. In Yahoo, this also includes sites with frog in the keyword metatag but nowhere else.
2. Next, those relevant sites are sorted by importance (authority). In this way, that mention of frog in the keyword metatag I referred to earlier is relegated to the bottom of the pile. This is accomplished by analyzing content and links, typically.
It's kind of like grabbing all the books in the library about your subject, then going through them and deciding which ones you are going to bother checking out and carrying home for your report. Two steps.
It can be relevant but not authoritative - meaning: it's on topic but not useful.
It can be authoritative but not relevant - IBM is an authority site, but not for frogs...
Accordingly, when you SEO a site you first make it relevant (content and link text, typically) and then make it authoritative (links). This will get you first into the SERP, then to the top of the SERP (hopefully).
I'm not sure if this has any bearing on W3C compliance, but that's the difference between relevance and authority, IMO.
My 2 cents,
Ian
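The two steps described in this post can be expressed as a filter followed by a sort. The sketch below is a toy under obvious assumptions: relevance is a plain substring match on page text and inbound anchor text, and `authority` is a hypothetical precomputed score standing in for link analysis. No real engine is this simple, and the example URLs are made up.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str          # on-page content
    anchor_text: str   # text of links pointing to the page
    authority: float   # hypothetical link-based importance score


def search(pages, query):
    term = query.lower()
    # Step 1 (relevance): keep only pages that mention the term on-page
    # or are linked to with the term in the link text.
    relevant = [p for p in pages
                if term in p.text.lower() or term in p.anchor_text.lower()]
    # Step 2 (authority): order the relevant pages by importance.
    return sorted(relevant, key=lambda p: p.authority, reverse=True)


pages = [
    Page("frogs.example.org", "All about frogs", "frog facts", 0.4),
    Page("ibm.example.com", "Enterprise computing", "IBM", 0.9),
    Page("pond.example.net", "Frog ponds and newts", "pond life", 0.7),
]
print([p.url for p in search(pages, "frog")])
```

The IBM page has the highest authority score but never makes it past step 1 for "frog", which is the "authoritative but not relevant" case; the two frog pages are then ordered purely by authority.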
St0n3y
01-17-2005, 09:10 PM
mcanerin, I believe you have summed up pretty much what I've been saying all along. I would take it one step further, however, and state that valid HTML is likely a factor that goes to a site's importance, but definitely not its relevance.
I believe this because Google has gone out of its way to state that good HTML, along with the other factors quoted above, is important enough to consider when putting together your site. It may not be worth much, maybe nothing at all, but I don't think Google's guidelines should be ignored if ranking on Google is a goal. Yes, yes, rankings can be achieved without it, and so can they without many, many other factors that SEOs swear by. Again, it's all just pieces of the puzzle.
Dave Hawley
01-17-2005, 09:26 PM
I would not be at all surprised if Google, Yahoo, MSN etc. have a small army of super geeks ensuring their spiders can read and render even very sloppy coding. After all, it's in their best interest, so why wouldn't they?
Google's mission is to "organize the world's information and make it universally accessible and useful." That pretty much answers the question IMO.
St0n3y
01-17-2005, 09:48 PM
I would not be at all surprised if Google, Yahoo, MSN etc. have a small army of super geeks ensuring their spiders can read and render even very sloppy coding. After all, it's in their best interest, so why wouldn't they?
Google's mission is to "organize the world's information and make it universally accessible and useful."
Nothing here about how it's organized. Retrieving information is one thing, how it's organized is another.
Anthony Parsons
01-17-2005, 09:51 PM
Yahoo is renowned for having issues spidering sloppy code. If everything is closed, or there is only a basic issue, it gets through, but present it with something a little less favourable and it tends to stop following links and so forth, as it can't get past certain aspects due to so many tags not being closed.
Anthony Parsons
01-17-2005, 09:53 PM
When people present to me a site that Yahoo will only index the first page, or limited pages, the code is generally the first place I look. Begin tidying it up a little, closing tags and so forth, and Yahoo tends to start indexing more of its pages.
Dave Hawley
01-17-2005, 09:59 PM
Nothing here about how it's organized. Retrieving information is one thing, how it's organized is another.
"Organized" by definition means that it's in some logical order, otherwise it's called disorganized.
It's patently clear and common sense, IMO, that this information will be organized based on relevancy stemming from the search query. That's their job and exactly what a SE does.
St0n3y
01-17-2005, 11:11 PM
"Organized" by definition means that it's in some logical order, otherwise it's called disorganized.
Nobody is arguing that. Your assumption, however, is that organization of search results is what YOU want it to be. As pointed out by myself and others in this thread (mcanerin), Google does not organize by relevance alone. It may be common sense to do so, but that is not what happens, at least in Google's case.
Dave Hawley
01-18-2005, 12:02 AM
Google does not organize by relevance alone
I disagree. There might be thousands of other factors for other areas, but come time to 'dish up' the SERPs, it is all based on what Google's algo perceives as relevant to the search term. There is importance (authority), links etc. that come into play, but these too come under the main heading of relevance.
Google's mission is to "organize the world's information and make it universally accessible and useful." With that statement in mind, do you honestly think they are going to rely on these billions of pages coded by millions of different people to be 'perfect', or even close to it? Of course not; they will have to adapt to all sorts of coding errors to come even close to their mission. They are even taking on the task of paper books, and I'm quite sure those are not written in anything close to HTML.
fathom
01-18-2005, 12:24 AM
I fully agree that a job worth doing is worth doing well. No argument there at all.
However, I do not believe for a moment that displaying a W3C compliant logo means a single thing toward quality. They are there for all/anyone to grab and hence mean zip IMO. Like I say, it's more a marketing ploy by W3C than any indication of quality.
Which isn't any different to BBB, TRUSTe and a raft load of others... there are 'terms of use' in each case.
In the case of the validation icon, it is a mechanism to 're-validate'.
I would also agree 'it is marketing' [call it ploy if you will]... "just like BBB".
"Ploy" however isn't "scam".
Dave Hawley
01-18-2005, 12:59 AM
"Ploy" however isn't "scam".
Who are you quoting as saying it's a "scam"? Or have I misunderstood?
fathom
01-18-2005, 01:43 AM
Who are you quoting as saying it's a "scam"? Or have I misunderstood?
Marketing is often a ploy... [I said] to your statement that W3C compliancy is a marketing ploy for W3C. "I agree with you".
I also said - this doesn't make 'the ploy' a scam.
Dave Hawley
01-18-2005, 01:47 AM
I also said - this doesn't make 'the ploy' a scam.
But you said "scam" with quotations; I was just wondering who it was you were quoting? It seems like you are under the impression I said a "ploy" was a "scam".
mcanerin
01-18-2005, 02:14 AM
I don't think the W3C's intentions behind their validation tool and logo are a deciding factor in whether or not your rankings with a search engine are influenced by having compliant code, and so far no one has made that claim, which means we are getting off topic.
Another thread can always be started if necessary to discuss side issues. <gentle nudge>
Ian
fathom
01-18-2005, 02:15 AM
But you said "scam" with quotations; I was just wondering who it was you were quoting? It seems like you are under the impression I said a "ploy" was a "scam".
Sorry - didn't mean to imply "you said" scam.
Chris Boggs
01-18-2005, 09:32 AM
There are 2 major parts to the presentation of a SERP based on a query:
1. First, the SE finds the sites that are relevant. If the search is for "frog" then ALL pages that mention frog or are linked to with frog in the text or who are part of the measurable frog universe are relevant. In Yahoo, this also includes sites with frog in the keyword metatag but nowhere else.
2. Next, those relevant sites are sorted by importance (authority). In this way, that mention of frog in the keyword metatag I referred to earlier is relegated to the bottom of the pile. This is accomplished by analyzing content and links, typically.
Ian
Thanks mcanerin for breaking it down like that to the most basic steps. I would definitely agree that step 2 is where the majority of ranking takes place; however, isn't it possible that bad code could cause a site with "frog" somewhere deep within to not be counted in the first step? This would disagree with Stoney's idea that importance outweighs relevance...
I would take it one step further, however, and state that valid HTML is likely a factor that goes to a site's importance, but definitely not its relevance.
St0n3y
01-18-2005, 11:10 AM
This would disagree with Stoney's idea that importance outweighs relevance...
Hold on now, I don't believe I ever said that importance outweighs relevance. I'm simply suggesting that importance is a separate factor outside of relevance and that valid HTML can be a part of the "importance" factor.
But good question!
mcanerin
01-18-2005, 03:28 PM
Agreed - if bad coding prevents you from getting into the relevance stage, then all the SEO in the world will not help you in the authority stage.
On the other hand, so far the spiders seem to index sites that do not comply strictly pretty well - I expect there are a LOT of sites that do not validate but are still perfectly accessible to spiders.
Validation makes sure of that, but does not cause it. For example, I did not change my company website over to HTML 4 Strict + CSS because of my rankings (which were and are fine), but rather because it was the professional and responsible thing to do.
There is a difference between "valid code" and "validation by the W3C"
But I still recommend that anyone calling themselves a professional try to follow best practices wherever possible. Regardless of their field or industry.
So I would say that HTML validation is a strong recommendation, not a requirement, but workable and readable code is a requirement.
Ian
Dave Hawley
01-19-2005, 01:33 AM
There are 2 major parts to the presentation of a SERP based on a query:
1. First, the SE finds the sites that are relevant. If the search is for "frog" then ALL pages that mention frog or are linked to with frog in the text or who are part of the measurable frog universe are relevant. In Yahoo, this also includes sites with frog in the keyword metatag but nowhere else.
2. Next, those relevant sites are sorted by importance (authority). In this way, that mention of frog in the keyword metatag I referred to earlier is relegated to the bottom of the pile. This is accomplished by analyzing content and links, typically.
To me that suggests the "relevant pages" are sorted by importance only. That cannot be the case. Or have I missed something?
if bad coding prevents you from getting into the relevance stage
And that is the question, i.e. "if". I still hold that a 50 billion $ SE company whose mission is to "organize the world's data and make it universally accessible" is not going to allow coding errors (unless they are extreme) to get in their way.
Chris
Thanks mcanerin for breaking it down like that to the most basic steps. I would definitely agree that step 2 is where the majority of ranking takes place; however, isn't it possible that bad code could cause a site with "frog" somewhere deep within to not be counted in the first step? This would disagree with Stoney's idea that importance outweighs relevance...
I really don't think this is quite accurate, since no search engine that I know of includes every page which somehow contains the word into the ranking pool.
As Google was basically organized (and I doubt that the basic mechanics have changed much) only the top N pages in the relevant inverted barrels are included in the initial ranking pool.
This implies that the pages within the inverted barrels are already sorted in some order and thus a very important part of the ranking is done in the first step, since if you do not make it into the top N pages in the inverted barrels you are never going to rank for that search term.
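The "top N pages in the inverted barrels" point can be illustrated with a toy inverted index whose posting lists are pre-sorted by a query-independent score. This is a simplification under stated assumptions: the per-document `static_score`, the cutoff `n` and the sample documents are all hypothetical, and it is not how Google's barrels are actually laid out. It only shows the general idea that a document cut off at this stage never reaches the later ranking stage.

```python
from collections import defaultdict


def build_inverted_index(docs, static_score):
    """Map each word to the ids of docs containing it, best static score first."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    # Pre-sort each posting list so the "best" documents come first
    # (ties broken by id to keep the order deterministic).
    return {
        word: sorted(ids, key=lambda d: (-static_score[d], d))
        for word, ids in postings.items()
    }


def candidate_pool(index, term, n):
    """Only the first n postings for the term go on to the ranking stage."""
    return index.get(term.lower(), [])[:n]


# Hypothetical documents and static (query-independent) scores.
DOCS = {1: "frog pond", 2: "green frog", 3: "frog legs recipe", 4: "tree frog"}
STATIC_SCORE = {1: 0.2, 2: 0.9, 3: 0.5, 4: 0.7}

INDEX = build_inverted_index(DOCS, STATIC_SCORE)
print(candidate_pool(INDEX, "frog", 2))  # doc 1 and doc 3 never reach ranking
```

With a cutoff of 2 this prints `[2, 4]`: all four documents contain "frog", but documents 1 and 3 are excluded before any query-specific ranking happens.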
Google says about parsing code:
Parsing -- Any parser which is designed to run on the entire Web must handle a huge array of possible errors. These range from typos in HTML tags to kilobytes of zeros in the middle of a tag, non-ASCII characters, HTML tags nested hundreds deep, and a great variety of other errors that challenge anyone's imagination to come up with equally creative ones
From which it seems to me that Google has the ability built into the system to deal with a large number of code "errors".
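Python's standard-library `html.parser` is a convenient way to see this kind of tolerance in action: it reports tags and text as it encounters them and does not insist on proper nesting or closing. This is only an analogy for the Google parsing passage quoted above, not Google's parser; the sample markup is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


# Unclosed <b> and <i>, a stray <td>, no </html>: the text still comes out.
BROKEN = "<html><body><p>Frogs <b>are <i>amphibians</p><p>They live near ponds<td></body>"
print(extract_text(BROKEN))
```

This prints `Frogs are amphibians They live near ponds` despite the broken markup. Whether a given search engine's parser is equally forgiving of any particular error is exactly what this thread is debating, so treat this as a demonstration of the general technique only.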
Chris Boggs
01-19-2005, 10:38 AM
As Google was basically organized (and I doubt that the basic mechanics have changed much) only the top N pages in the relevant inverted barrels are included in the initial ranking pool.
This implies that the pages within the inverted barrels are already sorted in some order and thus a very important part of the ranking is done in the first step, since if you do not make it into the top N pages in the inverted barrels you are never going to rank for that search term.
I agree, but I think that the point of contention is whether HTML could possibly have a factor in the site making it into the barrel. That is certainly the "first cut," but separate completely IMO from the actual ranking that occurs in Ian's "Step 2." This is what I meant by agreeing with Ian in my quote...sorry to be unclear.
From which it seems to me that Google has the ability built into the system to deal with a large number of code "errors".
Yet Google still takes pride in being a Ferrari; does bad HTML pose a rocky road potentially avoided by some drivers/spiders?
dstew
01-19-2005, 11:57 AM
... I think that the point of contention is whether HTML could possibly have a factor in the site making it into the barrel...
I think the point of contention is either whether a search engine has the ability to cope with poor HTML, or whether a search engine will somehow see HTML and give some sort of penalty for poor HTML in the ranking algo. I think that both of these lines of thinking are wrong. I think that search engines have the ability to understand virtually anything that IE can parse, and that it just doesn't make any sense to penalize a page for poor HTML.
I haven't laid this out, but here's my thinking...the goal of a search engine is to attract the most traffic, so it can sell the most ads/bids/paid results. If a site is going to render well in 97% of the browsers, and it's relevant, why weed it out? It doesn't make any sense.
Yet Google still takes pride in being a Ferrari; does bad HTML pose a rocky road potentially avoided by some drivers/spiders?
Back to the reasoning: does Ferrari care what road you drive your car on after you've bought it? Nope.
fathom
01-19-2005, 12:48 PM
Back to the reasoning: does Ferrari care what road you drive your car on after you've bought it? Nope.
Ya - But you do... 'after you get the mechanic's bill'! :eek:
Chris Boggs
01-19-2005, 12:57 PM
I agree that it is good common sense to validate your pages to ensure that you are not missing something that will make it difficult for the search engines to parse your page, but I simply refuse to add a dozen empty alt texts to spacer gifs on every page in order to be "compliant".
Funny, I just analyzed a site with what I thought was hidden text that turned out to be these "spacer gifs."
Probably a whole other thread we'll see depending on the responses to this question but are "spacer gifs" a black-hat way of getting more hidden content???
St0n3y
01-19-2005, 02:22 PM
I still hold that a 50 billion $ SE company whose mission is to "organize the world's data and make it universally accessible" is not going to allow coding errors (unless they are extreme) to get in their way.
I don't think it's a matter of getting in the way. See it as a measure of quality. If a site is full of forgivable but still needless errors, what does that say about the site itself? Maybe the content is relevant, but if the site owner doesn't care enough about his site to fix simple coding problems, the SE may not care enough about the site to give it a good measure of importance (as opposed to relevance) over another site that does take care of these issues.
I think the point of contention is either whether a search engine has the ability to cope with poor HTML, or whether a search engine will somehow see HTML and give some sort of penalty for poor HTML in the ranking algo.
Nobody here is talking about being penalized for non-validating code. I don't think that has once been asserted. Penalties against are completely different from additional points for. Google can give additional points for validated code (if it chooses) without levying any type of penalty against those that don't have it. The point of contention in this thread is whether or not Google gives additional points of importance to sites that do have validated code. I don't know the definitive answer to that, but it would certainly make sense to do so.
Relevance AND importance of a site are both factors in how they rank. I don't think you can rank well on importance alone (while you can on relevance alone) but in competitive industries I think you need the advantage of both. Could valid html be a piece of the importance measure? Sure. If a site owner cares enough to follow the guidelines set by the authorities (in this case W3C) that can tell the engine a great deal about the site and the lengths to which the business will go to have a more professional presentation. That, in my mind, is important.
dstew
01-19-2005, 02:37 PM
Penalty was a poor choice of words. However, I can't see where it makes sense to give points for every single page with perfect HTML. And, if that's the case, isn't that a penalty (of sorts) by default for pages without perfect HTML?
A search engine cares about what the searcher sees when they click on a link, and very few searchers know HTML, let alone would go a step further and look at the HTML for a broken tag. Sooooo, if it isn't going to make any difference to the searcher, why would it make any difference to the one providing those results?
St0n3y
01-19-2005, 04:09 PM
Penalty was a poor choice of words. However, I can't see where it makes sense to give points for every single page with perfect HTML. And, if that's the case, isn't that a penalty (of sorts) by default for pages without perfect HTML?
No, not at all. If one of my kids helps me with some difficult chores and I reward her with ice cream, am I penalizing the other kids who didn't help? No. This is pretty much the system that algorithms work from. Points are assigned when pages do or don't do certain things, and points are taken away (penalized) for pages that do or don't do certain other things. If I get extra points because I have a better title tag than someone else, is that someone else getting penalized? Not at all.
A search engine cares about what the searcher sees when they click on a link
How many incoming external links does the visitor see when they visit a site? None. Does this mean that incoming external links are not a factor? Again, no. The number of incoming external links makes little or no difference to the visitor, but is a big part of how sites are analyzed. There is a lot more that goes on than meets the eye, especially where search engine algorithms are concerned.
dstew
01-19-2005, 04:25 PM
Of course there is more than what meets the eye...yes, links matter...no the searcher doesn't care about the links. The search engine uses links, and many other things to help determine ranking.
All I'm saying is that a broken tag that doesn't keep a page from rendering properly is not going to matter to the reader, therefore it's not going to matter to the search engine. Links, and other methods of determining importance, authority, relevance, etc. DO make a difference to the searcher's experience (not that they would know it), and therefore matter to the search engine. If it doesn't affect the searcher's experience, either in the positive or negative, it's not going to matter a lick to a search engine.
St0n3y
01-19-2005, 04:33 PM
If it doesn't affect the searcher's experience, either in the positive or negative, it's not going to matter a lick to a search engine.
That's where I would disagree. Your own argument is self-defeating because links DO NOT matter to the searcher's experience. Links only matter because the search engine makes them matter, and if the content is relevant, it makes no difference to the searcher if a site has 10 links or 1000 links.
Dave Hawley
01-19-2005, 08:59 PM
I don't think it's a matter of getting in the way. See it as a measure of quality. If a site is full of forgivable but still needless errors, what does that say about the site itself? Maybe the content is relevant, but if the site owner doesn't care enough about his site to fix simple coding problems, the SE may not care enough about the site to give it a good measure of importance (as opposed to relevance) over another site that does take care of these issues.
But it's the end user that Google is focused on. If the page is seen, it gives up its information and that is ALL the user cares about. One could say the same about a lime green page with pink dots, that "the owner doesn't care". I will concede that in the event of a tie (although I think that unlikely) the code quality may come into it.
Does a phone book care if your place of business meets local building guidelines? Does it care if the front door is hanging off its hinges? Does it care if you work from home, or have top floor office space with sweeping views? No. Why? Because the potential customer should always retain the right to decide for themselves.
IMO the holy grail of search engines is relevancy and while search engines may use many ways of determining relevancy, the bottom line is still relevancy to the searcher. Even importance is only used in order to better judge relevancy.
If two pages have more or less identical content and inbound links, but one has not added quoted empty alt text to its spacer gifs and the other has so it can be compliant, it simply makes no sense at all that the second should be more relevant according to the algorithm because it has added some useless code that the viewer will never know or care about.
I can see no common sense reasons or benefits to the search engines or the users to award points for having compliant code, but just suppose that they did, how would they determine that a page was compliant or not?
Run 8 billion pages every two weeks through a validation checker? I really doubt that any possible benefits would outweigh the cost of doing so.
Use the W3C logo as a check? Not likely as that is easily put on any page that anyone wants.
It may be a nice sounding idea, but unless and until someone can show that its actually being done, I'm not buying it.
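Taking the figures from this post as assumptions (8 billion pages, with a full pass through a validation checker every two weeks), the required throughput works out roughly as:

```latex
\frac{8 \times 10^{9}\ \text{pages}}{14 \times 86\,400\ \text{s}}
  = \frac{8 \times 10^{9}}{1\,209\,600}\ \frac{\text{pages}}{\text{s}}
  \approx 6\,614\ \frac{\text{pages}}{\text{s}}
```

That is roughly 6,600 full validations every second, around the clock, on top of the crawling and ranking work the engine already does.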
dstew
01-20-2005, 11:20 AM
What Mel said
St0n3y
01-20-2005, 12:02 PM
Does a phone book care if your place of business meets local building guidelines? Does it care if the front door is hanging off its hinges? Does it care if you work from home, or have top floor office space with sweeping views?
No, but let's also keep in mind that the phone book does not care if your business is in some remote region or on a main street. They don't care how many other people think your ad is relevant or "link" to you. The phone book will put in anybody who pays. I understand what you're getting at, but that's not a workable analogy.
I think you and Mel raised some good points, but reading back through the thread there has been no real answer to why Google would go out of its way to provide certain guidelines if, as you contend, Google simply doesn't care about such things. In reading Google's own information, it would seem that they do.
IMO it may be a big mistake to take the Google guidelines as webmaster gospel. A few examples:
Google says no hidden text, but how much of it do you find that is ranking well?
Google says don't cloak, but some of the highest ranking sites on the web cloak.
Google says don't use automated ranking checkers and even mentions WPG, but WPG has sold millions of copies of that program. What are they using them for if not to check rankings?
I read Google's guidelines not as content put there to help webmasters and SEOs decipher Google's algo, or to help them rank better, but as public relations releases, well crafted to create a certain mindset in the target audience.
This being the case, I really don't think that you can broadly interpret the public information on Google's pages to support the concept that they do this or that in their ranking algo.
Dave Hawley
01-20-2005, 10:49 PM
there has been no real answer to why Google would go out of its way to provide certain guidelines...
I don't think they have "gone out of their way". It's only some text on a page that has always been there.
Google is just too smart to expect the world to read that small piece of text and comply. In fact, they simply could not afford to rely on millions of different webmasters to ensure that billions of pages are 'conforming' to their statement. They would have to solve the issue themselves, just as IE and many other browsers have and continue to do.
Google used to have problems with different types of links, dynamic pages, etc. They didn't (and haven't) relied on the world to change; they took it upon themselves to adapt, as would any business that wants to stay in business. Now they have the largest database of free information in the world and have to continue to show users the most relevant pages for their search query.
Chris Boggs
01-21-2005, 10:09 AM
It looks like "the nays have it." Other than a few valiant souls standing up from the "purity and sanctity of the Internet" camp, it seems that valid HTML is not, and will not soon be, a factor in ranking relevancy.
IMO, this is unfortunate, because I am of the opinion that without order there could be anarchy. Isn't verification the only "order" system there is??
DianeV
01-21-2005, 11:11 AM
Well, let's break this down into pieces: correct/incorrect HTML, and what search engines do about it.
Sure, many/most browsers are built to display incorrectly coded HTML, particularly in quirks mode as dictated by the DTD or lack thereof. Does this mean that broken HTML always displays correctly or even reasonably? I think the answer here can only be no.
When browsers attempt to display mis-coded HTML in a reasonable fashion, they are guessing ... and programs can only do what they've been programmed to do. So we cannot even take it as fact that a page that displays as intended has correctly formed HTML.
In fact, given the divergence, small or otherwise, in displays of various browsers, the only real way to control how browsers read and display code is to code correctly, and then perhaps to add hacks for various browser weirdnesses.
The easiest way to determine whether the code is correct is to use a validator. Whether you do is not the issue; I'm just saying that it's the easiest way without reviewing source code of every page of a website (and relying on a program to correct errors it allowed in the first place does not equate to having ensured correct coding). Whether you choose to implement all recommendations of the validator is your choice; some of the recs are pretty picky, IMO.
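A real validator such as the W3C one checks a page against its DTD. As a much cruder illustration of one class of error it reports, here is a sketch built on Python's standard `html.parser` that only flags mismatched and unclosed tags. It is not a validator: the `VOID` set is a partial list chosen for illustration, and because HTML 4 allows some end tags (such as `</p>` and `</li>`) to be omitted, this check is stricter than HTML 4 itself, closer to an XHTML-style nesting check.

```python
from html.parser import HTMLParser

# Partial list of HTML 4 elements whose end tag is forbidden (illustrative subset).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    """Reports end tags that don't match the innermost open element and
    elements still open at the end of the document. Not a DTD validator."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line where it was opened)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"line {self.getpos()[0]}: unexpected </{tag}>")

    def close(self):
        super().close()
        for tag, line in self.stack:
            self.problems.append(f"line {line}: <{tag}> never closed")


def check(html):
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


print(check("<p><b>bold</p>"))
```

Note that the parser itself happily reads the broken example; like a browser, it only reports what it found rather than refusing the page.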
Now, with respect to search engines, the fact that browsers make allowance for errors, and that it appears that search engines also make allowance for errors, does not equate to all search engines making allowance for all errors.
Lastly, whether correct coding, or even semantic coding, implies relevancy is another question altogether.
Bottom line for all three: it's important not to confuse what "makes sense" with what actually exists. Or the degree to which it exists.
St0n3y
01-21-2005, 01:15 PM
In fact they simply could not afford to rely on millions of different Webmasters to ensure that billions of pages are 'conforming' to their statement. They would have to solve the issue themselves just as IE and many other Browsers have and continue to do.
Again, this is a substantial overstatement of the argument. I'm not suggesting that Google is relying on anybody to validate their code. The code doesn't need to validate for Google or other engines to know what's there (so long as it still functions properly). I've just been arguing that validation is possibly a small measure in the total Google algorithm.
It looks like "the nays have it." They certainly do, by the numbers, but I still stand by my arguments. ;)
DianeV makes a good point here: "When browsers attempt to display mis-coded HTML in a reasonable fashion, they are guessing."
Perhaps there are no points FOR validated code but points (gasp! penalty) against making the search engine guess. Not a strong argument, I know, but just a thought.
Great conversation, everybody. It's been very educational. If I got one new talking point out of this, it's that search engines measure both relevancy AND importance, and there are different factors for each. I've already used that when explaining links to a potential client!
DianeV
01-21-2005, 01:29 PM
I'm not sure that any search engine gives a penalty for making them guess, or credits for not. Frankly, I'd never thought of it. :-)
My thought is that a page either uses completely readable standard coding, is reasonably readable based on guessing, or is not readable at all. Not providing code that search engines can read is problematic. How are they then to interpret what's on the page?
IMO the difference between code that validates and code that spiders can understand is vast.
A great many coding errors have to do with visual elements and this is something that the search engines do not bother reading even if it is perfect.
Chris_D
01-22-2005, 08:51 AM
Authors are supposed to communicate their intentions using the Web standards. Otherwise, finding out the intentions of each particular author would require psychic abilities which can’t be implemented in software. Even in cases where a human could deduce the intention, doing so in software would be very slow, bug-inducing, difficult and complicated.
http://www.mozilla.org/docs/web-developer/faq.html
Yes but here we are talking about spiders not browsers.
Chris_D
01-22-2005, 09:23 AM
No Mel. User Agents. Software.
Web user agents range from web browsers to search engine crawlers ("spiders"), as well as screen readers and braille browsers used by people with disabilities.
fathom
01-22-2005, 09:47 AM
No Mel. User Agents. Software.
Ya, a browser could be turned into a spider with some simple mods (and vice versa).
No Mel. User Agents. Software.
Well Gosh you could have fooled me. And here I thought this was about:
....and the opinion of the esteemed developers in here, can Google or any other search engine be using this as a factor in rankings?
:)
Fathom, I think the differences in the purpose and construction of spiders and browsers are enough to put them into different classes of software, but I would really like to see someone convert Firefox into a spider that could furnish results to Google at something like a thousand pages a second???
mcanerin
01-22-2005, 01:58 PM
On that note, you might want to check out lynx (http://www.delorie.com/web/lynxview.html) ;)
Speaking of which, in practice, I personally "validate" for search engines with lynx. If it doesn't work there, I fix it - every single time, no exceptions.
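For readers without lynx handy, here is a rough approximation of the same idea in Python: pull out only the visible text, image alt text and link targets, skipping script and style contents. This is a hypothetical helper built on the standard `html.parser`, not a reimplementation of lynx, and it is only an assumption that this resembles what a given search engine extracts.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects what a text-only client would roughly show: visible text,
    non-empty image alt text, and link targets. Script/style are skipped."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img" and attrs.get("alt"):
            self.words.append(attrs["alt"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.words.append(data.strip())


def text_only(html):
    view = TextOnlyView()
    view.feed(html)
    view.close()
    return " ".join(view.words), view.links


text, links = text_only(
    '<p>Hello <a href="/about">About us</a></p>'
    "<script>var x=1;</script>"
    '<img src="logo.gif" alt="Logo">'
)
print(text)   # Hello About us Logo
print(links)  # ['/about']
```

If navigation only exists inside script or Flash, the `links` list comes back empty, which is the same kind of failure a lynx check would expose.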
If it does, then W3C validation is icing on the cake - I try to make it validate to W3C, but don't stress out if it complains about a form or flash element or the fact that I mixed a valid (but deprecated) HTML3 element into an otherwise valid HTML4 document in order to solve a problem, for example.
In addition, I also try to validate using Bobby. Once again, I try to do my best and hit as close as possible, but validation to accessibility standards has a lot more "art" than "science" in it. Rather than saying "you have chosen the wrong two colors", in some cases they say "make sure these colors are acceptable to people with poor eyesight", which makes it difficult to actually say you are completely compliant (it's opinion rather than fact), but you can get pretty close. And a spider is the equivalent of a disabled visitor.
My opinion,
Ian
It seems to me that this discussion has drifted into a discussion of spiders and browsers, neither of which IMO have much, if any, bearing on the way that search engines parse html.
Browsers clearly do not parse code the same way that search engines do, as their job is to display the code on a user device, usually a visual device, while search engines split the code up into various files or databases so that it can be processed later by the ranking algorithm.
A spider does not parse code at all; it is a multi-threaded device designed to go to a particular address, access whatever it finds there (pictures, Word documents, PDF, HTML, etc.) and upload that to the search engine. To the best of my knowledge it does not display or attempt to understand what is presented to it, and is optimized to retrieve and upload the maximum number of pages per second, period. The parsing of code is done by the search engine; in the case of Google, it is done by a program called the indexer.
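The division of labour described above can be sketched as two separate stages. This is a toy model under stated assumptions, not how Google or any real engine is built: `fetch` is a hypothetical caller-supplied function standing in for HTTP retrieval, the tag-stripping regex is deliberately crude, and the inverted index is just a dictionary.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, fetch, workers=8):
    """Spider stage: fetch raw bytes for each URL in parallel and store them
    untouched. No parsing happens here. `fetch` is a hypothetical
    caller-supplied function mapping a URL to bytes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))


TAG = re.compile(r"<[^>]*>")
WORD = re.compile(r"[a-z0-9]+")


def index(repository):
    """Indexer stage: run later over the stored documents, crudely strip
    markup, and build an inverted index mapping word -> set of URLs."""
    inverted = defaultdict(set)
    for url, raw in repository.items():
        text = TAG.sub(" ", raw.decode("utf-8", errors="replace")).lower()
        for word in WORD.findall(text):
            inverted[word].add(url)
    return inverted


# In-memory stand-in for the web: one well-formed page, one with unclosed tags.
pages = {"a": b"<p>Valid <b>HTML</b></p>", "b": b"<p>broken <b>html"}
repo = crawl(list(pages), pages.__getitem__)
print(sorted(index(repo)["html"]))  # ['a', 'b']
```

In this toy version the page with unclosed tags is indexed exactly like the valid one, because the indexer only cares about the words it can pull out, not whether the markup validates.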
If my understanding is correct it seems to me that there is little, if any, understanding to be gained by attempting to compare what browsers do to what search engines do.
Dave Hawley
01-23-2005, 12:51 AM
I've just been arguing that validation is possibly a small measure in the total Google algorithm
Well, I still hold that Google ain't going to let coding errors (unless HUGE) play any part in their ranking algo, simply because it's not in their best interest and it would be counter-productive.
If I read something that makes me think that they could, then I will join back in, but until then bye!