Big mistake with robots.txt


nza2k
03-31-2005, 11:45 AM
Hello,

I think I just made a big mistake with Google and I hope someone can help me.

The facts :

- For many years, "http://mywebsite.com" served a duplicate copy of the content at "http://www.mywebsite.com".

- A few months ago, I set up 301 redirects from "http://mywebsite.com/directoryX" to "http://www.mywebsite.com/directoryX" to remove the duplicate pages from the Google index.

- Since "http://mywebsite.com" URLs were still present in the Google index (and since my site fell into the Sandbox after Allegra), it seemed necessary to me to clean up these duplicate URLs ASAP.

- I created the file "http://mywebsite.com/robots.txt" (different from "http://www.mywebsite.com/robots.txt") to remove all the URLs of this domain from the Google index.

- I submitted this robots.txt this morning through Google's urgent form.

- Big mistake: even the URLs of the domain "http://www.mywebsite.com" were removed.
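
To make the intent of the two separate robots.txt files concrete, here is a minimal Python sketch using the standard `urllib.robotparser` module. The file contents are assumptions (the real files are not shown in this thread), and `ROBOTS_BY_HOST`, `make_parser` and `is_allowed` are hypothetical names used only for illustration. It models how a crawler that treats each host separately would be expected to read the rules; it does not explain what Google's removal form actually did.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Assumed contents: "disallow all" on the bare host, "allow all" on www.
ROBOTS_BY_HOST = {
    "mywebsite.com": make_parser("User-agent: *\nDisallow: /\n"),
    "www.mywebsite.com": make_parser("User-agent: *\nDisallow:\n"),
}


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Apply the robots.txt rules of the URL's own host only."""
    host = urlsplit(url).netloc.lower()
    parser = ROBOTS_BY_HOST.get(host)
    if parser is None:
        return True  # no robots.txt known for this host
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://mywebsite.com/directoryX/page.html",
                "http://www.mywebsite.com/directoryX/page.html"):
        print(url, is_allowed(url))
```

Under these assumed rules, only the URLs on the bare host are blocked, which is what I was trying to achieve.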

=> Has anyone experienced such a tragedy?
=> What happened in the following days?
=> Is there a good way to handle it?

Thank you and sorry for my "bad English"

semanticist
03-31-2005, 01:12 PM
My gut instinct is that you've corrected the problem too well.

Did you do a sitewide 301? That is, does EVERY page at http://mywebsite.com redirect to its corresponding page at http://www.mywebsite.com? If so, then Google probably isn't even seeing http://mywebsite.com/robots.txt; it's probably being redirected to http://www.mywebsite.com/robots.txt.

I say delete http://mywebsite.com/robots.txt and let the 301s do their work. It can take quite a while for all those pages to leave the index, so be patient.
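
To illustrate what I mean by a sitewide 301, here is a rough Python sketch of the mapping such a rule applies. The host names come from this thread; the function name `redirect_target` and the constants are made up for the example, and a real server would do this in its own redirect configuration rather than in Python.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "mywebsite.com"           # host being retired
CANONICAL_HOST = "www.mywebsite.com"  # host that should stay indexed


def redirect_target(url: str) -> Optional[str]:
    """Return where a sitewide 301 would send this URL, or None if it is not redirected."""
    parts = urlsplit(url)
    if parts.netloc.lower() != BARE_HOST:
        return None
    # Same path and query, only the host changes.
    return urlunsplit((parts.scheme, CANONICAL_HOST,
                       parts.path or "/", parts.query, parts.fragment))


if __name__ == "__main__":
    print(redirect_target("http://mywebsite.com/robots.txt"))
    print(redirect_target("http://mywebsite.com/directoryX/page.html"))
```

Note that with a rule like this, a request for /robots.txt on the bare host is redirected like any other path, so the crawler would end up reading the www file instead.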

nza2k
03-31-2005, 01:35 PM
Thanks Semanticist,

To submit "http://mywebsite.com/robots.txt", I used the Google form available here:

"http://services.google.com:8882/urlconsole/controller?cmd=reload&lastcmd=login"


I suspended the 301 redirects from "http://mywebsite.com/directoryX" to "http://www.mywebsite.com/directoryX" before submitting "http://mywebsite.com/robots.txt".

To check that the redirects were suspended, I simply typed the address "http://mywebsite.com/robots.txt" into my browser, and I can confirm there was no redirect to "http://www.mywebsite.com/robots.txt".
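
Since a browser follows redirects silently, a more direct check is to look at the raw status code. Below is a small Python sketch with the standard `http.client` module, which does not follow redirects by itself. The helper names `head_status` and `describe` are hypothetical, and the sketch assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client
from typing import Optional, Tuple


def head_status(host: str, path: str, port: int = 80,
                timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header) without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status: int, location: Optional[str]) -> str:
    """Summarize whether the response is a redirect or a direct answer."""
    if status in (301, 302, 303, 307, 308) and location:
        return f"redirects ({status}) to {location}"
    return f"served directly with status {status}"


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(describe(*head_status("mywebsite.com", "/robots.txt")))
```

A 200 here would confirm that the file is served from the bare host itself, while a 301 with a Location header would mean the redirect is still active.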

Anyway, since 15:00 (GMT+1), the command "site:www.mywebsite.com" returns no results, whereas it returned ~400,000 pages a few hours earlier.

In fact, my question is this. Do you think:

- Google will now consider "www.mywebsite.com" as a new domain (with a 3-6 month sandbox)?
- Google will now index my pages as it used to, without the "new domain sandbox" (but still with an "Allegra-old-domain-Sandbox")?

semanticist
03-31-2005, 02:50 PM
Darn. I thought I had it figured out. :)

Well, without knowing what exactly caused Google to drop your site, I think I would still hold to the same strategy.

I think I would:

1. Keep the 301s in place but change the robots.txt file so that your "disallow all" command is gone.
2. Resubmit the url of your robots.txt file to the Google URL you used before.

Are you sure there aren't any other on-page factors that could have led to this? Typically, "sandboxed" sites aren't characterized by a lack of indexed pages. Instead, they're generally fully indexed, but just ranking very poorly. If your site is more than a year or two old, as you say, it doesn't seem like a sandbox issue to me.

p.s. your English is not bad at all.

nza2k
04-01-2005, 03:56 AM
"Sandbox" isn't the right word I guess (my site is 4-5 years old).

What I meant is that in one day, I lost more than 60% of my Google traffic (in early February).

Since duplicate content between my pages might be responsible for this, I started an aggressive duplicate content hunt...

And now, I'm completely out of the Google index.

Do you think I lost my PR when all my pages were removed via robots.txt?

nza2k
04-04-2005, 01:41 PM
Hi,


I updated my robots.txt on March 31st.

Since then, no pages have come back into the Google index.

Do you think this situation will last long?

:(