#1
Google Not Obeying the NoIndex NoFollow Meta Tag
Take a look at http://www.google.com/search?q=site:...le.com&num=100
See the pages listed there that look like http://www.seroundtable.com/mt-comme...de=red&id=1879? Those all have <meta name="robots" content="noindex, nofollow"> in the header. It is the built-in redirect function meant to discourage comment spammers. Anyway, why is Google not obeying it? Did I do something wrong that I'm missing?
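As a rough illustration of what that tag asks for, here is a minimal Python sketch of how a crawler that has already fetched a page might read its robots meta directives. The names RobotsMetaParser and robots_directives are hypothetical, and treating a name="googlebot" meta tag the same way as name="robots" is an assumption for illustration. This is not Google's code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; a value may be None.
        attr_map = {k.lower(): (v or "") for k, v in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            # content="noindex, nofollow" -> {"noindex", "nofollow"}
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the lowercase robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

Note that a crawler can only see this tag after it has fetched the page, so by then it already knows the URL. The tag controls what happens to the page afterwards, not whether the URL was discovered.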
#2
This relates to the unique Class C thread, but I've been trying very hard for over two years now to convince my clients and employees that this is a problem. I have a LOT of examples of nofollow tags being followed anyway.
Oops. What an unfortunate series of events. Those darn glitches.
#3
Mr. Massa,
Could you expand on that a bit for me? Sorry, crazy morning...
#4
Many on another board are saying that not even robots.txt is being obeyed. The last time that happened, G announced that they had 8 billion pages... the same day MSN announced their beta. Maybe now they need to boost it to 10 billion. When is MSN releasing their SE?
#5
That has happened because you have the noindex,nofollow tag. What happens is that Google picks up the URL, checks the tag, and doesn't index the page, so it just stays as a URL only listing. This is very common with Google and was pointed out long ago in various forums. A typical example: http://www.google.com/search?q=inurl...start=300&sa=N
#6
I would think that the Googlebot-specific tag would work better?
#7
#8
I know of examples of pages disallowed in robots.txt that are still being indexed.
#9
I am obviously oversimplifying.
#10
Unfortunately, many cases confirm that Google shows at least some first-level pages excluded by robots.txt as URL only listings.
I know of a few pages on different sites that prove it, and anyone can check. At first I thought this might have happened because some of them were excluded in robots.txt (or by the robots meta tag) after being indexed. However, that is not the case: I know of 2 pages that have been excluded in robots.txt from the very beginning, when the site was first uploaded, and they still appear in the Google index as URL only listings. I know that URL only listings are not fully indexed pages, but it still seems to me that Google should not show pages excluded by robots.txt at all. I have seen plenty of URL only pages come up in "link:" and "site:" searches, and sometimes even in normal searches!
Last edited by zamolxes : 12-23-2004 at 06:59 AM.
#11
An example of why... a link that it crawls from elsewhere before reading the page code or robots.txt [the latter being the second thing it does on entering a domain, but only 'once per day'].
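A minimal Python sketch of that point: robots.txt controls whether a URL is fetched, not whether the crawler learns that the URL exists. It uses Python's standard urllib.robotparser as a stand-in for whatever Googlebot actually does, and the example.com URLs and robots.txt rules are hypothetical. It does not model the 'once per day' caching of robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs discovered from links on *other* sites.
discovered = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/page.html",
]

known_urls = set(discovered)  # every linked URL is known to the crawler
fetched = {u for u in discovered if rp.can_fetch("Googlebot", u)}  # only allowed URLs are fetched
url_only = known_urls - fetched  # known but never fetched: a candidate "URL only" listing

print(sorted(url_only))  # ['http://www.example.com/private/page.html']
```

The disallowed page is never fetched, yet its URL is still in the set of known URLs because another site linked to it.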
#12
I think what exactly URL only listings are is sometimes a matter of opinion.
As I wrote above, I have seen them show up occasionally in various types of searches, including ordinary searches.
Last edited by zamolxes : 12-23-2004 at 07:07 AM.
#13
GoogleGuy has said that pages shown as URL only in the search results are pages that Google knows about but has not spidered for some reason.
Rusty, if you search for specific passages of text or the exact page title, do these pages show up in the search results?
#14
What about the following Google results (it took me 3 minutes to find some)?
http://www.google.com/search?q=joke&...&start=40&sa=N
http://www.google.com/search?q=politics
http://www.google.com/search?q=cazare
They all include URL only results. GoogleGuy doesn't always explain everything! Why would those pages show up if they were not fully spidered? My point is that we don't really know exactly what URL only listings are or why they appear.
Also, searching for text on the page is not always a good way to check whether that page was indexed. Many Google results show pages that contain none of the searched words (hence the power of anchor text). Also, a page excluded in robots.txt will probably have few or no links, will not be optimized, etc. Some will not have much text at all (or none that is really unique).
P.S. Google results keep changing! (I'm getting tired of having to edit the post because of it!)
Last edited by zamolxes : 12-23-2004 at 07:54 AM.
#15
There are various reasons for a URL only listing in Google; blocking Googlebot is not the only one. These are the reasons I know of:
1. Duplicate content across different pages of the site. This is a kind of penalty where the page link is followed, the duplicate page is dropped, and it is listed as URL only.
2. Duplicate sites. A whole site sometimes becomes URL only if there are various duplicate sites with substantial duplicate content and a duplicate template. This is very rare, since the sites have to be 96 to 100% identical.
3. Blocking Googlebot from a specific page that has links from other pages. This applies to blocking via robots.txt, via .htaccess, and via meta tags.
4. Blocking Googlebot accidentally through robots.txt or any other method.
5. A potential automated penalty for excessive outbound links to bad neighbourhoods, or for any sort of search engine spam detected by an automated algorithm.
6. A server error when Googlebot tried to crawl the site.
7. The server is very slow to respond to Googlebot's requests for dynamic URLs or normal pages.
8. The server was down when Googlebot tried to crawl the pages of the site.
9. A 302 redirect picked up from some go.php or out.php site, so the target page is crawled under the source URL (discussed a lot as page hijacking and in other duplicate content issues).
10. The site owner deletes the domain or lets it expire. In this case, too, the pages become URL only, since they no longer exist on the target site. This is temporary, though: once the domain has been expired for a long time, the pages eventually drop out.
We know from research that the above reasons cause URL only listings in Google. If I learn of more reasons, I'll post them.
#16
The point is: why would Google show excluded pages at all? There is no proof that URL only listings are not indexed; on the contrary.
Last edited by zamolxes : 12-23-2004 at 08:20 AM.
#17
If they are indexed they should have a cache; try to find a cache for any page that shows in Google as URL only.
Why should they have that data? For Google to know about that page at all, there has to be at least one link in the database pointing to the unindexed page, and that data has to be preserved if PR is going to be calculated correctly.
#18
Also, why would URL only pages appear in search results at all, if, according to what you're saying, they are not indexed? How could they be in the Google search results without being in the Google index?
#19
First off, if pages have a no-cache meta tag they won't be cached, but what does this have to do with the URL only listings not having a cache???
It's easy enough for pages to rank for terms as a result of anchor text links pointing at them, even though the page does not contain the term either on the visible page or in the header. While the page data may not be indexed, the link pointing to that page from an indexed page is cached and credited to the page it points to.
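A toy Python sketch (not Google's actual indexing) of the mechanism just described: anchor text stored with an indexed page is credited to the URL it links to. That URL can then match a query even though its own content was never fetched, so there is no cached copy to show. All URLs, page text and names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical crawl data: only this page was fetched and indexed.
indexed_pages = {
    "http://a.example/": {
        "text": "a page about cheap hotels",
        "links": [("http://b.example/deals", "hotel deals")],
    },
}

# Term -> set of URLs, built from each page's own text AND from anchor text.
index = defaultdict(set)
for url, page in indexed_pages.items():
    for term in page["text"].split():
        index[term].add(url)
    for target, anchor in page["links"]:
        # The target page itself was never fetched; its terms come only from
        # the anchor text stored with the linking (indexed) page.
        for term in anchor.split():
            index[term].add(target)

# Only fetched pages have a cached copy.
cache = {url: page["text"] for url, page in indexed_pages.items()}


def search(term):
    """Return matching URLs, labelling those with no cached copy as URL only."""
    results = []
    for url in sorted(index.get(term, set())):
        label = "cached" if url in cache else "URL only"
        results.append((url, label))
    return results


print(search("deals"))  # [('http://b.example/deals', 'URL only')]
print(search("cheap"))  # [('http://a.example/', 'cached')]
```

In this model the unfetched page is "in the index" only through the link data held on another page, which is one way a result can appear without a cache.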
#20
What I meant is that the following is not correct: pages without a cache are in the index.
Last edited by zamolxes : 12-23-2004 at 09:35 AM. |