Search Engine Watch
Old 12-22-2004   #1
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Google Not Obeying the NoIndex NoFollow Meta Tag

Take a look at http://www.google.com/search?q=site:...le.com&num=100

See those pages listed that look like http://www.seroundtable.com/mt-comme...de=red&id=1879

Those all have <meta name="robots" content="noindex, nofollow"> in the header.

It is the built-in redirect function meant to discourage comment spammers.

Anyway, why is Google not listening to it?

Did I do something wrong that I am missing?
Old 12-22-2004   #2
massa
Member
 
Join Date: Jun 2004
Location: home
Posts: 160
This relates to the unique class c thread but I've been trying very hard for over two years now to convince my clients and employees that this is a problem. I have a LOT of examples of no follow tags being followed anyway.

Oops. What an unfortunate series of events. Those darn glitches.
Old 12-22-2004   #3
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Mr. Massa,

Do you think you can expand a bit on that for me?

I am sorry, crazy morning...
Old 12-22-2004   #4
ThouShaltSeo
Member
 
Join Date: Dec 2004
Posts: 206
Many on another board are saying that not even robots.txt is being obeyed. The last time that happened, G announced that they had 8 billion pages... the same day MSN announced their beta. Maybe now they need to boost it to 10 billion. When is MSN releasing their SE?

Quote:
Originally Posted by rustybrick
Take a look at http://www.google.com/search?q=site:...le.com&num=100

See those pages listed that look like http://www.seroundtable.com/mt-comme...de=red&id=1879

Those all have <meta name="robots" content="noindex, nofollow"> in the header.

It is the built-in redirect function meant to discourage comment spammers.

Anyway, why is Google not listening to it?

Did I do something wrong that I am missing?
Old 12-22-2004   #5
powerofeyes
Member
 
Join Date: Jun 2004
Location: IN
Posts: 110
Quote:
Anyway, why is Google not listening to it?
Where is Google not listening to it? I have not seen any of those pages indexed; those are URL-only listings.

That has happened because you have the noindex, nofollow tag. What happens is that Google picks up the URL, checks the tag, and doesn't index the page. It just stays as a URL-only listing. This is very common with Google and was pointed out long ago in various forums.

A typical example:

http://www.google.com/search?q=inurl...start=300&sa=N
__________________
Search Engine Optimization - Website Promotion Services from Search Engine Genie
Old 12-22-2004   #6
fathom
Member
 
Join Date: Jun 2004
Location: Nova Scotia, Canada
Posts: 475
I would think that the Googlebot specific tag would work better?
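
For anyone who wants to compare the two variants, here is a minimal Python sketch that reads both the generic robots meta tag from the first post and a Googlebot-specific one, i.e. <meta name="googlebot" content="noindex, nofollow">. The function and class names are made up for illustration, and merging the two tags by simple union is an assumption of this sketch, not a statement of how Google resolves them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and one bot-specific tag."""

    def __init__(self, bot="googlebot"):
        super().__init__()
        self.names = ("robots", bot.lower())
        self.directives = {}  # e.g. {"robots": {"noindex", "nofollow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name not in self.names:
            return
        # Split "noindex, nofollow" into individual lowercase directives.
        values = {v.strip().lower() for v in attr.get("content", "").split(",")}
        values.discard("")
        self.directives.setdefault(name, set()).update(values)


def directives_for(html, bot="googlebot"):
    """Return the generic directives plus any aimed at `bot` (simple union)."""
    parser = RobotsMetaParser(bot)
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(bot.lower(), set())
    return generic | specific
```

Feeding it the header from the first post should report both noindex and nofollow, which only confirms that the tag is well formed; it says nothing about whether a given crawler has actually fetched the page.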
Old 12-22-2004   #7
Jeff Martin
 
Join Date: Jun 2004
Location: Dallas, Texas
Posts: 364
Quote:
Originally Posted by powerofeyes
Where is Google not listening to it? I have not seen any of those pages indexed; those are URL-only listings.
Powerofeyes is right on. G is not indexing those pages. The presence of a URL in the SERPs just means G knows that a page at that URL exists.
__________________
Jeff Martin - SEW Moderator
Vericlix
Old 12-22-2004   #8
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Quote:
Originally Posted by fathom
I would think that the Googlebot specific tag would work better?
Well yeah, but it doesn't matter.

I know of examples where pages are disallowed in robots.txt but are still being indexed.
Old 12-22-2004   #9
rustybrick
 
Join Date: Jun 2004
Location: New York, USA
Posts: 2,810
Quote:
Originally Posted by Jeff Martin
Powerofeyes is right on. G is not indexing those pages. The presence of a URL in the SERPs just means G knows that a page at that URL exists.
I thought that when a page is found in the Google SERPs, as a URL-only listing or otherwise, it is also in the Google index.

I am obviously oversimplifying.
Old 12-23-2004   #10
zamolxes
Member
 
Join Date: Dec 2004
Posts: 23
Unfortunately, many cases confirm that Google shows at least some first-level pages excluded by robots.txt as URL-only listings.

I know of a few pages on different sites that prove it, and anyone can check. At first I thought that might have happened because some of them were excluded in robots.txt (or via the robots meta tag) only after being indexed.
However, that is not the case. I know of 2 pages that have been excluded in robots.txt from the very beginning, when the site was first uploaded, and they still appear in the Google index as URL-only listings.

I know that URL-only listings are not fully indexed pages, but it still seems to me that Google should not show pages excluded by robots.txt at all. I have seen plenty of URL-only pages come up in "link:" and "site:" searches, and sometimes even in normal searches!
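
A small Python sketch may help separate the two mechanisms being mixed up here. It uses the standard library's urllib.robotparser to show what a Disallow rule actually decides, namely whether a well-behaved crawler may fetch a URL. The robots.txt content and the example.com URLs are hypothetical. Note the consequence: a crawler that honours the rule never downloads the page, so it can never see a noindex meta tag inside it, while the URL itself can still be learned from links on other sites.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under these robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/private/page.html",
                "http://www.example.com/index.html"):
        print(url, can_crawl(ROBOTS_TXT, "Googlebot", url))
```

The check only answers "may I fetch this?". Whether a search engine then lists the bare URL is a separate decision that robots.txt does not address.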

Last edited by zamolxes : 12-23-2004 at 05:59 AM.
Old 12-23-2004   #11
fathom
Member
 
Join Date: Jun 2004
Location: Nova Scotia, Canada
Posts: 475
Quote:
Originally Posted by Jeff Martin
Powerofeyes is right on. G is not indexing those pages. The presence of a URL in the SERPs just means G knows that a page at that URL exists.
This makes perfect sense. G would need to keep a record.

An example of why: a link that it crawls from elsewhere, before it has read the page code or robots.txt [reading robots.txt being the second thing it does on entering a domain, but only 'once per day'].
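
To make the "record" idea concrete, here is a toy Python model of the behaviour described in this thread. It is purely illustrative: the class names, states, and logic are assumptions for the sake of the example and are not a description of Google's actual crawler. The point it shows is that a URL can be known, and therefore listable, without any page content ever being stored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UrlRecord:
    url: str
    state: str = "discovered"       # discovered -> blocked | noindex | indexed
    content: Optional[str] = None   # page text, kept only for indexed pages


@dataclass
class ToyCrawler:
    records: dict = field(default_factory=dict)

    def discover(self, url):
        """A link to `url` was seen on some other page."""
        self.records.setdefault(url, UrlRecord(url))

    def visit(self, url, allowed_by_robots, page_text, noindex):
        """Apply the crawl outcome; the record itself is never deleted."""
        record = self.records.setdefault(url, UrlRecord(url))
        if not allowed_by_robots:
            record.state, record.content = "blocked", None
        elif noindex:
            record.state, record.content = "noindex", None
        else:
            record.state, record.content = "indexed", page_text

    def listing(self, url):
        """A known URL with no stored content shows up as URL-only."""
        record = self.records[url]
        return "full" if record.content is not None else "url-only"
```

In this model a page that is merely discovered, one blocked by robots.txt, and one carrying noindex all end up with the same "url-only" listing, which matches what people are reporting above.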
Old 12-23-2004   #12
zamolxes
Member
 
Join Date: Dec 2004
Posts: 23
I think sometimes it is a matter of opinion what exactly URL-only listings are.
As I wrote above, I have seen them showing up occasionally in various types of searches, including ordinary searches.

Last edited by zamolxes : 12-23-2004 at 06:07 AM.
Old 12-23-2004   #13
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
GoogleGuy has said that pages that show only a URL in the search results are pages that Google knows about but has not spidered for some reason.

Rusty, if you search for specific passages of text or the exact page title, do these pages show up in the search results?
__________________
Mel Nelson
Expert SEO Dont settle for average SEO
Singapore Search Engine Optimization and web design
Old 12-23-2004   #14
zamolxes
Member
 
Join Date: Dec 2004
Posts: 23
What about the following google results (it took me 3 minutes to find some)?

http://www.google.com/search?q=joke&...&start=40&sa=N
http://www.google.com/search?q=politics
http://www.google.com/search?q=cazare

They all include URL-only results. GoogleGuy doesn't always explain everything! Why would those pages show up if they were not fully spidered?
My point is that we don't really know what exactly URL-only listings are, or why they appear.

Also, looking for text on the page is not always a great way to check whether that page was indexed. Many Google results show pages without any of the searched words on the page (hence the power of anchor text). Also, a page excluded in robots.txt will probably have few or no links, will not be optimized, etc. Some will not have much text at all (or not really unique text).

P.S. Google results keep changing! (I'm getting tired of having to edit the post because of it!)

Last edited by zamolxes : 12-23-2004 at 06:54 AM.
Old 12-23-2004   #15
powerofeyes
Member
 
Join Date: Jun 2004
Location: IN
Posts: 110
Quote:

I think sometimes it is a matter of opinion what exactly URL-only listings are.
As I wrote above, I have seen them showing up occasionally in various types of searches, including ordinary searches.
Hello zamolxes,

There are various reasons for a URL-only listing in Google; blocking Googlebot is not the only one.

These are the reasons I know of:

1. Duplicate content across different pages of the site. This is a kind of penalty where the page link is followed and the duplicate page is dropped and listed URL-only.

2. Duplicate sites. A whole site sometimes becomes URL-only if there are several duplicate sites with substantially duplicate content and a duplicate template. This is very rare, since the sites have to be about 96 to 100% identical.

3. Blocking Googlebot from a specific page which has links from other pages. This applies to blocking in robots.txt, via .htaccess, and through meta tags.

4. Blocking Googlebot accidentally in robots.txt or through any other method.

5. A potential automated penalty for excessive outbound links to bad neighbourhoods, or for any sort of search engine spam detected by an automated algorithm.

6. Some server error when Googlebot tried crawling the site.

7. The server is very slow in responding to Googlebot requests for dynamic URLs or normal pages.

8. The server was down when Googlebot tried to crawl the pages of a site.

9. A 302 redirect picked up from some go.php or out.php site, where the target page is crawled under the source URL (discussed a lot as page hijacking and in other duplicate-content issues).

10. The site owner deletes the domain or lets it expire. In that case the pages also become URL-only, since they no longer exist on the target site. This is temporary, though: when the domain has been expired for a long time the pages will eventually drop out.

We know from research that the above reasons cause a URL-only listing in Google. If I learn of more reasons I'll post them.
Old 12-23-2004   #16
zamolxes
Member
 
Join Date: Dec 2004
Posts: 23
Quote:
There are various reasons for a URL-only listing in Google; blocking Googlebot is not the only one...
I know that. What I mean is exactly that: there are many reasons for it, and what GoogleGuy said doesn't cover all the possibilities.

The point is: why would Google show excluded pages at all? There is no proof that URL-only listings are not indexed; on the contrary.

Last edited by zamolxes : 12-23-2004 at 07:20 AM.
Old 12-23-2004   #17
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
If they are indexed they should have a cache. Try to find a cache for any page which shows in Google as a URL-only listing.

Why should they keep that data? In order for them to know about that page, there has to be at least one link in the database which points to the unindexed page, and that data has to be preserved if PR is going to be calculated correctly.
Old 12-23-2004   #18
zamolxes
Member
 
Join Date: Dec 2004
Posts: 23
Quote:
If they are indexed they should have a cache. Try to find a cache for any page which shows in Google as a URL-only listing.
Not necessarily. What about pages using the noarchive meta tag?
Also, why would URL-only pages then appear in search results (since according to what you're saying they are not indexed)? How could they be in the Google search results without being in the Google index?
Old 12-23-2004   #19
Mel
Just the facts ma'm
 
Join Date: Jun 2004
Location: Malaysia
Posts: 793
First off, if pages have a noarchive meta tag they won't be cached, but what does this have to do with the URL-only links not having a cache?

It's easy enough for pages to rank on terms as a result of anchor text links pointing at them, even though the page does not contain the term either on the visible page or in the header. While the page data may not be indexed, the link pointing to that page from an indexed page is cached and credited to the page it points to.
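
The anchor text argument can be illustrated with a toy inverted index in Python. This is only a sketch under stated assumptions: the function names and the example URLs are invented, and a real search engine's index is far more involved. It shows how a target URL can match a query purely from the text of links pointing at it, with none of the target's own content stored.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each anchor-text word to the set of target URLs it points at.

    `links` is an iterable of (source_url, anchor_text, target_url) tuples
    taken from pages that were crawled and indexed normally.
    """
    index = defaultdict(set)
    for _source_url, anchor_text, target_url in links:
        for word in anchor_text.lower().split():
            index[word].add(target_url)
    return index


def search(index, query):
    """Return target URLs whose inbound anchor text contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A target that was never fetched can still come back from search() here, and with no title or snippet available for it, a bare URL is all there is to display.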
Old 12-23-2004   #20
zamolxes
Member
 
Join Date: Dec 2004
Posts: 23
What I meant is that the following is not correct: pages without a cache can be in the index.

Quote:
If they are indexed they should have a cache...
I am aware of the power of anchor text! That doesn't mean links could "put" a page not indexed by Google into Google's search results! They could do so only after Google has indexed it.

Last edited by zamolxes : 12-23-2004 at 08:35 AM.