Google has a BIG problem !!


zaphod
12-28-2005, 02:46 PM
Greetings!

I am posting in frustration in the hope that someone from Google will see this:

There are actually two BIG problems: One is that if you have lots of old pages in Google's index it is apparently impossible to remove them; the second is that if you choose a new domain name you risk being sandboxed or at least suffering a long-term drop in rankings.

Clearly Google has a huge problem both with allowing a site's domain name to be changed without affecting its ranking and with ensuring the data in its indexes is current.

Right now I have 772 pages in the Google index that do not exist and have not for over 6 months, and every single page is giving a 404 !

This is just one site (and after "Googling" this problem I know there are many people with this problem). Imagine how badly this is affecting Google's relevancy; I don't think any surfer would be happy with a 404 instead of info on the term they searched for.

...come on Google this is a serious problem for yourself, your customers/users and webmasters... how about fixing this, you have geniuses to spare from what I hear...this can't be too difficult to resolve!..can it ?

How about a simple web page with a space for a url and button that says "re-index me" !..a webmaster enters their domain and hits "re-index" and all existing pages are deleted and the site is re-spidered based on the "google map" ?

...I am sure I am over simplifying, but there has to be a solution to this problem

Anyone else want to chime in and see if we can nudge Google into taking action ?

Any other comments?...or better still has anyone found any solutions ?

Regards to all :-)

David Wallace
12-28-2005, 03:01 PM
Have you tried Google's Automatic URL Removal System? More info can be found here (http://www.google.com/intl/en/webmasters/remove.html#outdated).

promarkweb
12-28-2005, 03:05 PM
Interesting post, I have a similar problem, that I'm attacking with 301 redirect posts. Only issue is finding a way to redirect specific .htm files using the 301 redirect function. I'm doing this for some old content that has been updated with a new look. The new files are in google, as well as the old ones, so now I'm simply trying to redirect to the new areas.

I guess I did my job too well with the old pages as they are still getting a lot of traffic. I haven't tried simply deleting the old content, as I am concerned about losing the traffic.

Other issue is that we have used "meta redirects" to point the current files to new pages, but I'm concerned about the spam factor within the search engines. I'd love to hear lots of input as well.

promarkweb
12-28-2005, 03:08 PM
David,

Thanks for the link, do you have a similar page for Yahoo?

AussieWebmaster
12-28-2005, 04:13 PM
If you do your redirects page by page and have them in your htaccess file you will pass the links to the new pages as well as all Google goodwill.
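
For anyone with a long list of old pages, here is a minimal Python sketch that generates the page-by-page lines rather than typing them by hand. It assumes Apache with mod_alias and its `Redirect 301` directive; the function name `redirect_lines`, the page names and `www.example.com` are made up for illustration and are not from anyone's site in this thread.

```python
from urllib.parse import quote


def redirect_lines(mapping, new_base):
    """Build one mod_alias 'Redirect 301' line per old page.

    mapping:  {old URL-path: new URL-path} (hypothetical pages)
    new_base: scheme and host of the new location, e.g. "http://www.example.com"
    """
    lines = []
    for old_path, new_path in sorted(mapping.items()):
        # The old path goes in double quotes so a path containing spaces
        # still counts as a single argument in the config file.
        # The target is written as an absolute, percent-encoded URL.
        target = new_base.rstrip("/") + quote(new_path, safe="/")
        lines.append(f'Redirect 301 "{old_path}" {target}')
    return lines


if __name__ == "__main__":
    # Hypothetical old -> new page mapping
    pages = {
        "/old-widgets.htm": "/widgets/",
        "/old-contact.htm": "/contact/",
    }
    for line in redirect_lines(pages, "http://www.example.com"):
        print(line)
```

The printed lines would be pasted into the .htaccess file at the site root. This is only a sketch of the idea; test a couple of URLs by hand before relying on it.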

promarkweb
12-28-2005, 04:20 PM
Okay, so here is a stupid question. .htaccess file on Windows Server 2003 with IIS? How do I manage that one? Everything I can read says that .htaccess won't work on a Windows server running IIS.

zaphod
12-28-2005, 04:21 PM
Sorry David I should have included the steps I have already taken to remove the invalid URLs.

Yes I have used the URL removal facility for several different URLs to see if it would work. None of the links were removed after a month and it would take me a long time to remove hundreds and hundreds of URLs this way anyway, so I gave up on it and hoped the pages would eventually be removed due to the 404s.

BTW I just revisited the facility to make sure what you suggested (David) WAS the same page I had used (it was) and found that all the urls I had requested be deleted were now displayed as follows:

"2005-09-22 23:41:19 GMT :
removal of http://workingtraffic.com/main.cgi/Tube%20Fittings%20&%20Connectors/Asymmetrical%20Y%20Connectors/
request denied"

"Request Denied" being the important part of this message.

So, the plot thickens, it seems I am not allowed to delete my own pages? Maybe I was not throwing a 404 when I requested deletion?, but at the end of the day this should not matter, if I don't know when my own pages need to be deleted then I really wonder who does ??

I believe there is also a way to use robots.txt to remove/hide pages but apparently all the pages come back again after 180 days, so I can't see much point in that.

I have also used 301's for several months which also failed.

I have also deleted the entire site which also failed.

...next step is to submerge the entire server in a vat of hydrochloric acid, maybe that will work ?

I am starting to think the only way I can get Google to remove my pages is to replace them with pics of Larry or Sergey in compromising circumstances...maybe that would work :rolleyes: ...only joking :D ....don't have any pics anyway :-/

As you can see I am sadly starting to behave irrationally after many months of frustration.

Thanks again for your more reasonable suggestion though David.

zaphod
12-28-2005, 04:30 PM
Hi promarkweb,

I did a search for "htaccess +IIS +alternative" and after a very brief look the following page looked promising:

http://support.microsoft.com/?kbid=324064

promarkweb
12-28-2005, 04:30 PM
I think the "180 days" removal using robots.txt only applies if you don't have access to the root robots.txt file. The only time they do the temporary removal is when you place the robots.txt file in a directory other than the root and then submit that location to Google.

At least, that is what I read into their note on robots.txt files. I'm using a combination of the robots.txt file and 301 redirects and hoping this will work. We just redid our site in the last 2 months and are still getting quite a bit of traffic to our old content.

promarkweb
12-28-2005, 04:32 PM
Zaphod, I'm going to try that. We'll see how it works. I think a 34 pronged approach will be needed to remove some of these pages.

I do think the conversation is a little crazy, when we're trying to get pages out, instead of pages into the search engines.

But at least I'll never lack for something to do with the domains I manage and the search engines.

zaphod
12-28-2005, 04:38 PM
promarkweb...yes the irony of the situation has not eluded me

...good luck with your own pages, I believe I have been trying to get OUT of the index for 6 months now :-/

zaphod
12-28-2005, 04:53 PM
I just double checked Google's instructions for the URL removal facility again and it says:


Note: If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. We'll accept your removal request only if the page returns a true 404 error via the http headers. Please ensure that you return a true 404 error even if you choose to display a more user-friendly body of the HTML page for your visitors. It won't help to return a page that says "File Not Found" if the http headers still return a status code of 200, or normal.

So 404 must be thrown for it to work, as I said not sure if that was the case when I did it, I will try just one again to make sure.

...BUT the first sentence specifically states that google should remove 404's when it re-crawls...this clearly is not happening.
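
To make the "true 404" point concrete, here is a rough Python sketch of how one might check what a page really returns in its headers, as opposed to what the page says on screen. The names `fetch_status` and `classify` are made up for this example, and treating a 200 page whose text contains "not found" as a "soft 404" is my own simplifying assumption, not Google's actual test.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Return (status_code, body_text) for a GET request.

    urlopen raises HTTPError for 4xx/5xx, so the code is read from the
    exception in that case. Redirects are followed automatically, so the
    status returned is that of the final URL.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def classify(status, body):
    """Label a response using the status code first, the body text second."""
    if status == 404:
        # A friendly HTML body is fine as long as the header says 404.
        return "true 404"
    if status == 200 and "not found" in body.lower():
        # Page *says* it is missing but the header claims success.
        return "soft 404"
    return "other"


if __name__ == "__main__":
    # Header/body pairs only; no network access needed for this demo.
    print(classify(404, "<h1>Sorry, that page has moved on</h1>"))
    print(classify(200, "<h1>File Not Found</h1>"))
```

To check a live page you would call something like `classify(*fetch_status(url))` with the exact URL as it appears in the index.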

zaphod
12-28-2005, 06:36 PM
OK I have configured the server (via htaccess) so that any request for my site results in a 410 being thrown, which means "deliberately permanently deleted", and if you hit the URL the result is this:


Gone
The requested resource
/
is no longer available on this server and there is no forwarding address. Please remove all references to this resource.
Apache/1.3.34 Server at www.workingtraffic.com Port 80

I have also deleted the entire site and used the URL removal facility in google to remove the entire domain (hope i can get the site re-indexed after this :-(

...so we will see if I am FINALLY successful with this.

AussieWebmaster
12-28-2005, 06:45 PM
I seem to be a bit slow today... I hope you are not just jettisoning the site without forwarding the potential traffic through links that others have to it and the pages that the search engines have from it....

zaphod
12-28-2005, 07:07 PM
G'day Aussiewebmaster,

The site is really just in development and the only traffic it gets is from Google so there is nothing to lose really. Visitors from the SERPS will find the site just as useless now as they did before ;-)

Thanks for the thought anyway.

BTW I am an Aussie (albeit expat) webmaster too...hope the weather is kinder to you in OZ than it is in NY at this time of year....brrrrr

calebw
12-28-2005, 07:18 PM
I would be cautious of using a 410 HTTP Status code. I don't know if Google handles such status codes properly - does anyone else?

It sounds to me like your server wasn't actually returning a 404 status code. I've tested this hypothesis using the URL you gave in a previous post and this handy-dandy HTTP status code checker (http://www.searchenginepromotionhelp.com/m/http-server-response/code-checker.php). I entered the url:

http://workingtraffic.com/main.cgi/Tube Fittings & Connectors/Asymmetrical Y Connectors/

Note the spaces in your URL. Browsers will convert these spaces to %20 (the percent-encoded form of a space), but if you check the status code of the URL without that encoding, take a look at what you get:

Server Response Code: 400
Additional Headers:
Date Wed, 28 Dec 2005 23:15:26 GMT
Server Apache/1.3.34 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.11 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a
Connection close
Content-Type text/html; charset=iso-8859-1

Check your server settings for status code responses for these URLs. Or, better yet, as AussieWebmaster proposed, 301 the pages to another site if you have one. That will accomplish removing the old URLs from the Google index and gain you the traffic from any listings in the meantime.
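
As a small illustration of the encoding point, here is a Python sketch using the standard library's `urllib.parse.quote` to turn the path with raw spaces into the percent-encoded form a browser would send. Keeping `&` unencoded (via `safe="/&"`) is a choice made here only so the result matches the URL shown in the removal log earlier in the thread; a raw space in the request line is not valid HTTP, which is likely why the checker got a 400 rather than a 404.

```python
from urllib.parse import quote, unquote

# Path exactly as typed into the checker, with raw spaces
raw_path = "/main.cgi/Tube Fittings & Connectors/Asymmetrical Y Connectors/"

# Percent-encode everything except "/" and "&"
encoded = quote(raw_path, safe="/&")
print(encoded)

# Decoding gives back the original path, so nothing is lost
assert unquote(encoded) == raw_path
```

When testing status codes, paste the encoded form into the checker so the server sees the same request a browser or crawler would make.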

calebw
12-28-2005, 07:20 PM
I have also deleted the entire site and used the URL removal facility in google to remove the entire domain (hope i can get the site re-indexed after this :-(

Did you really do this already? If you haven't pulled the trigger on this yet I'd say don't do it! Removing domains will make gaining future rankings difficult for at least several months.

All the best.

AussieWebmaster
12-28-2005, 11:13 PM
G'day Aussiewebmaster,

The site is really just in development and the only traffic it gets is from Google so there is nothing to lose really. Visitors from the SERPS will find the site just as useless now as they did before ;-)

Thanks for the thought anyway.

BTW I am an Aussie (albeit expat) webmaster too...hope the weather is kinder to you in OZ than it is in NY at this time of year....brrrrr

I am in NY as well... work downtown and live in Bay Ridge

projectphp
12-29-2005, 12:41 AM
I just double checked Google's instructions for the URL removal facility again and it says:
Why not just put up a robots.txt that excludes all these URLs? Then submit the robots.txt to Google, voila, URLs removed.

http://services.google.com:8882/urlconsole/controller?cmd=reload&lastcmd=login is the complete URL removal tool. Quite frankly it is pretty comprehensive and I have NEVER EVER had a problem either using it or having the URLs removed within days of the request.
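
Before submitting a robots.txt, it may be worth checking that it really excludes the URLs you mean it to. Below is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the helper name `is_blocked` are assumptions for illustration; blocking everything under `/main.cgi/` is a guess at what would suit the site discussed above. It only checks the rules locally and says nothing about how the removal tool itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes everything under /main.cgi/
robots_txt = """\
User-agent: *
Disallow: /main.cgi/
"""

parser = RobotFileParser()
# parse() takes the file's lines; nothing is fetched from the network
parser.parse(robots_txt.splitlines())


def is_blocked(url, agent="Googlebot"):
    """True if the parsed rules forbid this agent from fetching the URL."""
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://workingtraffic.com/main.cgi/Tube%20Fittings%20&%20Connectors/",
        "http://workingtraffic.com/index.html",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```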

promarkweb
12-29-2005, 12:35 PM
zaphod

thanks for the link to the microsoft page, using IIS all these years and never needed that capability till now, that's a huge help

regarding the robots.txt removal, hadn't tried that, but will give it a shot, hadn't seen the "link removal tool" before, so another resource added to my "bookmarks".

so, as of today, all of the old links are redirecting to new files, and our robots.txt file is updated, and we can move back to working on content again!

zaphod
01-08-2006, 02:18 PM
My apologies, I thought this thread had dried up as I received no more emails informing me of ongoing comments and just checked back now by chance.

I will respond to previous posts in chronological order:

calebw (thanks for your thoughts)

410 did not work anyway, and I have now put them back to 404's. I had the site redirecting via 301 for about 6 weeks, that was the first thing I did when I noticed the problem. I did not want to leave it redirecting for longer as workingtraffic is my business site and if a prospective client checked it I did not really want them seeing an unrelated site :-/

All my URL's have spaces in them but they are URL encoded, if you check this Google SERP:

http://www.google.com/search?num=20&hl=en&lr=lang_en&safe=off&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&q=site%3Aworkingtraffic.com+%2BAsymmetrical&btnG=Search

You can see the URL that G has indexed and if you click on it (or run it through the "HTTP Server Response Code Checker") you should get a 404 ?

Aussiewebmaster

..is Bay Ridge on Long Island ?

projectphp

I think you are correct, that the URL removal works and that the URLs were not removed because I was not throwing a 404 at the exact time when I requested removal, which made me lose faith in its reliability.

I still think Google has a problem removing pages that have a 301 redirect, as it was like this for a long time and the pages were never removed, and I am not convinced it removes URLs that throw a 404 for an extended period of time either.

Anyway...I just checked the site again and all the incorrect pages are still in the index :-/ but I believe that is because I asked Google to remove everything to do with http://workingtraffic.com, which once again (arghhh) was not giving a 404 (more haste less speed). So now I have requested that http://workingtraffic.com/main.cgi/ be removed, which definitely throws a 404 ...so third time lucky I hope..I will wait and see.

...anyway I think I have muddied the waters so much at this point that it is now difficult for me to draw any reliable conclusions.

I will post back when/if my bad URLs are finally dropped from the index.

Thanks for all help and suggestions.

projectphp
01-08-2006, 07:42 PM
I still think Google has a problem removing pages that have a 301 redirect as it was like this for a long time and the pages were never removed and..
But what does it matter? I mean, if you 301 redirect, the two URLs are now semantically the same, so it doesn't matter one bit if they aren't "removed", as all referrals redirect anyway.

Or am I missing something?

zaphod
01-08-2006, 07:59 PM
I neglected to mention that the content indexed under the domain workingtraffic.com is for a totally different site that I built for a client, so I don't want traffic for unrelated content arriving at my site and wondering why on earth they were directed there by the SE's.

I just posted the [client's] content to workingtraffic as I had not used that domain at that time and it was convenient for the client to review there during development... and of course as there were no incoming links I did not think it would be indexed...but it was! (no idea how) so now I have 470 pages of a totally unrelated site indexed under my domain name.

If I leave the redirect on nobody ever sees the workingtraffic site.

Hope that clears it up.