noindex,follow in meta


newreality
08-15-2006, 07:11 PM
<meta name="robots" content="noindex,follow">

Is intended to direct the robot to not index the given page, and yet follow links on that page.
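To make "don't index, but do follow links" concrete, here is a minimal Python sketch of how a crawler *might* read that tag. It assumes a crawler that splits the `content` attribute on commas, trims whitespace, and ignores case. That is an assumption for illustration, not a description of any particular engine. Under that assumption, `noindex,follow` and `noindex, follow` mean exactly the same thing. The helper names (`robots_directives`, `may_index`, `may_follow`) are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        content = attrs.get("content") or ""
        # Assumption: comma-separated tokens, surrounding whitespace ignored, case-insensitive.
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML fragment."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(directives):
    return "noindex" not in directives


def may_follow(directives):
    return "nofollow" not in directives


if __name__ == "__main__":
    for html in ('<meta name="robots" content="noindex,follow">',
                 '<meta name="robots" content="noindex, follow">'):
        d = robots_directives(html)
        print(sorted(d), may_index(d), may_follow(d))  # prints ['follow', 'noindex'] False True
```

Nothing in this sketch touches any page other than the one being parsed. Whether other pages get indexed depends on the directives found on *those* pages.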

I've heard that this could stop other pages on the site from being indexed. Is that just a rumor?

I have a single page I definitely do not want indexed
(custom search results that duplicate important pages that are already indexed).

Brian M
08-15-2006, 10:09 PM
<meta name="robots" content="noindex,follow">
That this could stop the indexing of other site pages -- is this just a rumor?
I have a single page I definitely do not want indexed.
I don't know where you heard that rumor, but I see nothing in that META statement that would stop a robot from indexing other pages if it wants to. As for the page itself, the tag might keep some search engines from indexing it today, but there is no guarantee that it will stop all of the engines tomorrow...

If you really have a page that you do not want indexed, I would recommend having the server send a 404 (Page Not Found) status header for that page.

Even this may result in the page appearing in the SEs (with no description, and with the anchor text of links pointing to it used as the title). But there have been many reports of the engines ignoring the robots.txt file and META instructions, so your safest bet is to 404 the page at the server level.

Brian M

P.S. I know this is being anal (some would say "stupid"), but most specs show a space after the comma, so for "best practices" it is safest to include it in all cases, IMHO.

newreality
08-15-2006, 11:11 PM
P.S. I know this is being anal, (some would say, "stupid"), but most specs show a space after the comma, so for "best practices" it is safest to include it in all cases, IMHO

Not really (first part).

You know, I pulled that directly from here (http://www.robotstxt.org/wc/meta-user.html) and questioned it myself when I saw it. I did a View Source and there is no space. So I just want to know whether there should be a space or not.

Are you saying, then, that <meta name="robots" content="noindex,nofollow"> (with whichever spacing is right) would not be reliable either?

The page this post refers to is a job search results page.
So it needs to be viewable by visitors, yet not indexed, since other pages have the same content.

Brian M
08-17-2006, 06:22 PM
Hi newreality,

<meta name="robots" content="noindex,nofollow"> should, in theory, keep a robot from indexing the page and from following any links on the page. Unfortunately, some robots ignore all directives, while other robots occasionally are "broken" and inadvertently ignore these directives until somebody notices and complains. In either case, it can be a very unpleasant surprise to see your "hidden" pages appear all over the web.

That's why I recommend a 404 server header on the page that you want to keep hidden. This is a server-level response saying that the requested page cannot be found, and it has always worked for me in the past, whereas the robot directives have failed.
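For the job-search case above, where the page must stay viewable, it helps to know that a 404 is only a status code. The server can still send the full page body with it, and common browsers will render that body. Here is a minimal sketch using Python's standard `http.server`. The path `/jobs/search` and the helper `status_for` are hypothetical stand-ins for the real results page. This is an illustration of the idea, not a production setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical path of the search-results page that should stay out of the index.
HIDDEN_PATHS = {"/jobs/search"}


def status_for(raw_path):
    """Return the HTTP status to send for a request path (query string ignored)."""
    path = urlsplit(raw_path).path
    return 404 if path in HIDDEN_PATHS else 200


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        # The body is sent with either status, so a browser still shows the results.
        body = b"<html><body><p>Job search results would be rendered here.</p></body></html>"
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve on localhost until interrupted (Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Call `run()` and request `http://127.0.0.1:8000/jobs/search?q=java`. The page renders, but the response status is 404, so a crawler that respects status codes treats it as missing.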

P.S. There is often an exception to any rule, and in this case there should probably NOT be a space (I always defer to the spec when in doubt), and I'm sorry if I misled you. I had the keywords META tag in mind when I made that comment. Many years ago, I theorized that I could squeeze an extra keyword or two into that tag if I eliminated the space after each comma. I tried out this theory in spite of the W3C giving examples that clearly showed the space after the comma (including in their own source code). The result was a disaster in one of the engines, but when I put the spaces back in, the pages magically reappeared in their previous positions. It took almost 2 months for them to regain those positions, and that painful process has stuck in my mind.

newreality
08-17-2006, 10:57 PM
<meta name="robots" content="noindex, nofollow">

Another risk here is that if this page is crawled early, the rest of the site won't get crawled at all, unless I'm mistaken.
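One way to reason about that worry is a toy breadth-first crawl over a hypothetical five-page site, honoring only the two directives discussed in this thread. This is a simplified model and says nothing about how any real engine schedules its crawl. It shows that `noindex` only keeps the page itself out of the index. `nofollow` only stops the crawler from following links *on that page*, so other pages remain reachable through any other links that point to them.

```python
from collections import deque

# Hypothetical site: page -> (robots meta content, outgoing links).
SITE = {
    "/": ("index, follow", ["/jobs/search", "/about"]),
    "/jobs/search": ("noindex, follow", ["/jobs/1", "/jobs/2"]),
    "/about": ("index, follow", []),
    "/jobs/1": ("index, follow", []),
    "/jobs/2": ("index, follow", []),
}


def parse(content):
    """Split a robots meta content value into lowercase directives."""
    return {t.strip().lower() for t in content.split(",") if t.strip()}


def crawl(site, start="/"):
    """Breadth-first crawl honoring noindex/nofollow; returns (visited, indexed)."""
    visited, indexed = set(), set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in visited or page not in site:
            continue
        visited.add(page)
        content, links = site[page]
        directives = parse(content)
        if "noindex" not in directives:
            indexed.add(page)
        if "nofollow" not in directives:
            queue.extend(links)
    return visited, indexed


if __name__ == "__main__":
    visited, indexed = crawl(SITE)
    print(sorted(visited))
    print(sorted(indexed))
```

With `noindex, follow` on the search page, every page is visited and all but `/jobs/search` are indexed. Changing it to `noindex, nofollow` still leaves `/` and `/about` crawled and indexed. Only `/jobs/1` and `/jobs/2` drop out, because in this toy site the search page is the only page linking to them.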