Quickest way to remove pages from index
Dantek
09-12-2008, 12:38 AM
What is the quickest way to remove pages/URLs from Google's index (without using Google's URL removal tool)?
1. Using robots.txt
2. Adding a noindex meta tag
3. Redirecting the page to a subdirectory that's blocked via robots.txt
If a page is indexed and then never gets spidered again, will it eventually disappear from Google's index? If so, how long before it expires?
JohnW
09-12-2008, 08:24 AM
1. Using robots.txt will not necessarily remove a page, or even keep a new page out of the index. There are many examples of pages that are indexed (but with no cache or description) even though they are blocked in robots.txt.
2. Noindex will work better; however, until G crawls the page again and sees the tag, the tag will not take effect. A 410 status code may be better, depending on your situation, but it too must wait for G to come back and see it.
3. A 301 redirect is very similar to a 410 in this context, but it may not be the best plan for quick removal. Like both examples above, it will have no effect until G comes back.
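To make point 1 concrete: robots.txt only tells a crawler whether it may fetch a URL; it does not say whether a URL the engine already knows about stays listed. The minimal sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `/search/` path, the example.com URLs, and the helper name `crawl_allowed` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block crawling of a search-results directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded


def crawl_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules permit `agent` to fetch `url`.

    This answers "may I crawl it?", not "will it stay out of the index?".
    """
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/search/?q=widgets",
                "http://www.example.com/about.html"):
        print(url, "fetch allowed:", crawl_allowed(url))
```

A disallowed URL is simply never fetched, so the engine never sees anything on it (including a noindex tag), which is consistent with the "indexed but no cache or description" listings described above.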
> If a page is indexed and then never gets spidered again, will it eventually disappear from Google's index?
Pages disappear all the time, even when you don't want them to ;-) But I think the short answer is that there is no set expiration date.
If you want to remove a page with certainty, use the removal tool and back it up with a 410, since removals expire after 6 months.
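Backing a removal up with a 410 just means the server answers "410 Gone" for those URLs from then on. A minimal sketch as a WSGI app using only the standard library; the `REMOVED_PATHS` set, the port, and the page bodies are hypothetical placeholders, and on a real site this would normally be configured in the web server or application framework instead.

```python
from wsgiref.simple_server import make_server

# Hypothetical set of paths that have been permanently removed.
REMOVED_PATHS = {"/old-page.html", "/search/results.html"}


def app(environ, start_response):
    """Answer 410 Gone for removed paths and serve a normal page otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED_PATHS:
        body = b"This page has been permanently removed."
        start_response("410 Gone", [("Content-Type", "text/plain"),
                                    ("Content-Length", str(len(body)))])
        return [body]
    body = b"<html><body>Live page</body></html>"
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    # Serve on a hypothetical local port for testing.
    with make_server("", 8000, app) as server:
        server.serve_forever()
```

As noted above, the 410 only has an effect once Googlebot requests the URL again and sees the status code.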
thurmax
09-12-2008, 10:54 AM
I think that when a page no longer exists, it can still remain indexed for a while; it will take some time before it disappears from the search engine, especially if it is not optimized.
bikeman
09-13-2008, 07:16 AM
Another web developer recently took over a client of mine and rebuilt the site with different page addresses.
I noticed that after deleting the old site they put a
<meta name="description" content="" />
on every new page.
Within a few days, all new pages with the same addresses as the old site were updated without descriptions, and any old pages were removed from the index.
They then put in new description tags on the new pages, and voilà, Google listed the entire site with new page titles/descriptions.
It seemed pretty effective at cleaning things up for the new site, with no messing about with 301s on old pages.
Dantek
09-15-2008, 02:28 AM
Thank you, John!
So I guess the best option would be to go with noindex (or 410).
I had tried robots.txt for a week, but that didn't seem to help. The noindex has now been running for a week, and it seems to be working (very slowly).
My problem here is that I cannot use the URL removal tool (limitations), and the pages I want removed are not discoverable. I would have to make a sitemap of 50,000 URLs to get them re-spidered (yikes).
Is there anything else I can do to get these URLs removed ASAP?
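For what it's worth, a 50,000-URL sitemap is mostly a scripting job: the sitemaps.org protocol allows up to 50,000 URLs per sitemap file, and a sitemap index file can list several sitemaps. The sketch below is a minimal standard-library version; the function names and example URLs are hypothetical, and listing URLs in a sitemap is only a hint to crawl them, not a guarantee of when Googlebot will revisit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemaps(urls):
    """Split `urls` into sitemap XML strings of at most 50,000 <url> entries each."""
    docs = []
    for start in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for loc in urls[start:start + MAX_URLS_PER_SITEMAP]:
            url_el = ET.SubElement(urlset, "url")
            ET.SubElement(url_el, "loc").text = loc
        docs.append(XML_DECL + ET.tostring(urlset, encoding="unicode"))
    return docs


def build_index(sitemap_urls):
    """Build a sitemap index XML string pointing at each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sm = ET.SubElement(index, "sitemap")
        ET.SubElement(sm, "loc").text = loc
    return XML_DECL + ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URLs standing in for the pages to be re-spidered.
    urls = [f"http://www.example.com/search/?q=term{i}" for i in range(50_000)]
    print(len(build_sitemaps(urls)), "sitemap file(s)")
```

Exactly 50,000 URLs fit in a single file; anything beyond that spills into a second file, which is where the index comes in.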
JohnW
09-15-2008, 09:36 AM
I took a quick look at another thread
http://www.highrankings.com/forum/index.php?showtopic=37185
where you give much more detail. I would say that you are on track, and I agree with Jill that slipping from #1 to #2 is not likely related to this issue (check your allin results to see where you should focus). Still, I think you should correct it quickly before it does become a problem. Blocking the search form is a good place to start. The removal tool may take some effort, and I'm not sure it's worth it in this case, but it is the only option that will work before Googlebot comes back and sees the 404/noindex/disallow/410/301/whatever.
Dantek
09-16-2008, 12:56 AM
Going forward is not a problem because it is currently blocked.
Unfortunately, using the removal tool would mean submitting one URL at a time, up to 100, waiting for Google to process them, and then a few days later processing another 100. I'm estimating I would be done by 2013.
Can you elaborate on this: "(check your allin results to see where you should focus)"?
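The back-of-the-envelope estimate works out roughly like this: 50,000 URLs at 100 per batch is 500 batches, and if "a few days" between batches is taken as 3 days (an assumption), that is 1,500 days. A minimal sketch of the arithmetic; the function name and the 3-day gap are hypothetical.

```python
import math
from datetime import date, timedelta


def removal_schedule(total_urls, batch_size, days_between_batches, start):
    """Return (number_of_batches, estimated_finish_date) for batched removal requests.

    Assumes each batch occupies `days_between_batches` days before the next one starts.
    """
    batches = math.ceil(total_urls / batch_size)
    finish = start + timedelta(days=batches * days_between_batches)
    return batches, finish


if __name__ == "__main__":
    # 50,000 URLs, 100 per batch, an assumed 3 days per batch, starting on the post date.
    batches, finish = removal_schedule(50_000, 100, 3, date(2008, 9, 16))
    print(batches, finish)  # 500 2012-10-25
```

With a 3-day gap the finish lands in late 2012, in the same ballpark as the 2013 estimate; any longer gap pushes it further out.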
A site: search yields 60,000 pages. Adding allinurl: shows 11,000 (normal). Adding allintitle: shows 25,000. Adding allintext: shows 45,000. Now I'm confused. The allinurl: results show the good pages and seem to confirm that the bad pages are all in the supplemental index (where they belong). The allintitle: and allintext: results show about 65% of the crap. Does this make sense? If so, please explain it to me :-) Does allinurl: avoid displaying supplemental results? Do allintitle: and allintext: show the supplementals, but only 2/3 of them? The site: query, by contrast, is meant to show everything. This must be by design from Google... weird.
JohnW
09-16-2008, 08:00 AM
Sorry, I guess that was a bit of slang and not very complete. What I mean is: search for allinanchor:keyword
This can give you an idea of how your off-page optimization stacks up, and it may show that rather than losing position because of a problem, perhaps a site has simply risen above you with better linking. Or not, but it will be good to see.
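If you want to run these operator checks repeatedly (for example, one allinanchor: query per target keyword), a small helper can build the search URLs. This is a minimal sketch; the helper name and the example keyword are hypothetical, and which operators Google still honors has changed since this thread was written.

```python
from urllib.parse import urlencode


def google_query_url(query: str) -> str:
    """Build a Google search URL for an operator query such as allinanchor:keyword."""
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical keyword; substitute the phrase you are tracking.
    print(google_query_url("allinanchor:blue widgets"))
```

Comparing the allinanchor: ranking with the normal ranking for the same keyword is the off-page comparison described above.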
j0nyDzine
09-16-2008, 12:02 PM
I know Matt C has actually addressed this outright, and as of his most recent post on it, he says that robots.txt will not actually keep a page from getting indexed, and that the meta noindex tag or removal through Google Webmaster Tools is the only way to keep a page out of the index...
I end up having this discussion way too often, and there's always some guy out there saying that robots.txt will keep it from happening, but there are tons of examples out there showing this isn't true, and if ol' MC says it, it MUST be true! =)
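When checking which pages actually carry the tag, note that a noindex is a robots meta tag, which is different from the empty description tag mentioned earlier in the thread. The crawler also has to be allowed to fetch the page to see it. A minimal checker using the standard-library `html.parser`; the class and function names are hypothetical, and it only looks at `<meta name="robots">`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for token in values.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```

Usage: `has_noindex('<meta name="robots" content="noindex, follow" />')` returns `True`, while a page that only has `<meta name="description" content="" />` returns `False`.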