How does Yahoo clear dead links?


promarkweb
01-17-2006, 12:39 PM
So,

I'm trying out this Yahoo Site Explorer page and randomly running some domains I've worked on and whose file structure I know.

Interesting thing is that "Site Explorer" lists pages that were deleted over a year ago and return a standard error message.

Any ideas on how to remove old links from Yahoo's results?

PMWeb

Brian M
01-31-2006, 06:55 PM
“Any ideas on how to remove old links from Yahoo's results?”

I am replying since nobody else seems to have an answer, probably because Yahoo does not provide a URL removal tool, so pages tend to stay in their index forever.

The only way to remove a page completely is to delete the page and make certain that the robots can crawl in and see a 404 server header message.
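One way to confirm what a robot would see is to request the deleted URL and look at the status code instead of the page text. What follows is only a minimal sketch in Python; `fetch_status`, `describe_status`, and the example.com URL are illustrative names chosen here, not anything Yahoo provides.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the final HTTP status code for `url`.

    urllib follows redirects on its own, so this reports the status of
    the last hop, not necessarily of the original URL.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the code is on the error.
        return err.code


def describe_status(code):
    """Say whether a status code tells a crawler that the page is gone."""
    if code == 404:
        return "true 404: crawlers are told the page does not exist"
    if 200 <= code < 300:
        return "success: an error page served this way still looks live"
    return "other: check the response manually"
```

Calling `describe_status(fetch_status("http://www.example.com/old-page.html"))` with one of your own deleted URLs should report a true 404. A "success" result for a page that displays an error message would suggest the server is sending the error text with a 200 header. Some servers do not answer HEAD requests properly, in which case a normal GET may be needed.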

Do not block the robots with the robots.txt file because the old page will remain in the index for an incredibly long time, even if you have deleted the page. I can see that your "/pdf/" folder is currently “Disallowed” but many of the files in the “site:” command are listed as being in that folder. Those files will remain in the index almost forever because the robots can no longer come in and update their cache information.
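To see which deleted URLs are still hidden from the robots, the robots.txt rules can be checked offline with Python's standard `urllib.robotparser`. This is a rough sketch: the helper name `blocked_urls`, the sample rules, and the example.com URLs are assumptions for illustration, and the parser follows the generic robots.txt conventions, which may not match Slurp's behaviour in every detail.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Slurp"):
    """Return the URLs that `user_agent` may not crawl under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules resembling the "/pdf/" case described above.
sample_rules = "User-agent: *\nDisallow: /pdf/\n"
sample_urls = [
    "http://www.example.com/pdf/old-brochure.pdf",
    "http://www.example.com/index.html",
]
still_hidden = blocked_urls(sample_rules, sample_urls)
```

Any URL that comes back in `still_hidden` is one the robot cannot revisit, so it never gets the chance to see the 404.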

Also, be careful with 301 and 302 re-directs, because those are handled differently, depending upon the location of the original “source” page in relation to the new “target” page. If you are not careful, you may tell the robot to keep the old “source” page and ignore the re-direct altogether, and both pages can appear in the index as a result. Yahoo explains Slurp’s handling of re-directs here: http://help.yahoo.com/help/us/ysearch/slurp/slurp-11.html
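Before comparing a redirect against Yahoo's rules, it helps to write down exactly what the redirect says: its type, and where the target sits relative to the source. The Python sketch below only collects those facts. It does not reproduce Slurp's decision logic, which is described on the Yahoo page linked above. The function name `describe_redirect` and the example URLs are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def describe_redirect(source_url, status, location):
    """Summarise one redirect hop from its status code and Location header."""
    # A Location header may be relative, so resolve it against the source URL.
    target = urljoin(source_url, location)
    kind = {301: "permanent", 302: "temporary"}.get(status, "other")
    same_host = urlsplit(source_url).hostname == urlsplit(target).hostname
    return {"kind": kind, "target": target, "same_host": same_host}


# Hypothetical hop: an old page permanently redirected within the same site.
hop = describe_redirect(
    "http://www.example.com/old/page.html", 301, "/new/page.html"
)
```

Here `hop["target"]` is the absolute URL the robot is being sent to, and `hop["same_host"]` shows whether the redirect stays on the same host, which is the kind of source/target relationship that matters when reading the redirect rules.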

Once a true 404 server header message is delivered (that the robot can see) it can take many months before the old page is removed.

However, it can take even longer if there is another page out on the web that links to the deleted page. This prolongs the agony because the robots keep finding the page via an external link. Use Yahoo’s Site Explorer http://siteexplorer.search.yahoo.com/ to look for links to a page that you have deleted. If you do find external links, your only recourse is to contact the site owner and ask that the link be removed.

Brian M

promarkweb
01-31-2006, 07:06 PM
Okay, thank you very much. I've moved all the old content into other directories (items we've been waiting to see "removed" for a while) and cleaned out my robots.txt file so that the fine search engines of old can go get the new and current information.

We'll see how much fun we have getting everything updated.

Such is life, do this, don't do that, no wait, do that and don't do this. I love being a designer.

Brian M
01-31-2006, 10:01 PM
And, you need incredible patience to become a master of SEO...

If you have moved things around, you may want to create "dummy" files that have recognizable page titles so you can easily spot the updates in each search engine cache. Once you see those page titles appear in the SERPs, then it is safe to delete the files and let the robots do their removal job (which will happen a few months from now).
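The "dummy" files can be generated with a few lines of Python. This is just a sketch of one possible approach; the function name `marker_page` and the "MOVED-CHECK" label are invented here, and any distinctive title you can search for would do.

```python
import html


def marker_page(old_path, label="MOVED-CHECK"):
    """Build a tiny HTML page whose title is easy to spot in a search cache."""
    title = html.escape(f"{label} {old_path}")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><p>{title}</p></body></html>\n"
    )


# Hypothetical old path; write the result to that location on the server.
page = marker_page("/pdf/old-report.html")
```

Saving `page` at the old location gives the robots something with a recognizable title to pick up, and searching for the label later shows which engines have refreshed their copy.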

Also, just when you think you are finished, an "update" will bring those old pages back to haunt you...

Brian M

promarkweb
01-31-2006, 10:37 PM
Yeah...I know...nothing is easy...I've been working on this redesign of this site, and had to leave some of this for "legacy" concerns. Now that our current site is building traffic and we have the proper redirects in place, I can clear out that file and allow the engines to do their job.

I know that Google allows you to "request" removal and it happens pretty quickly. Yahoo has been a "guessing game" for me. But we're getting there. All of the sites I've done for this company have been redesigned over the last 6 months.

We're now moving forward, doing things right and I'm always asking questions that people might know a new answer to. Like Yahoo adding a removal tool and so on.

promarkweb
02-03-2006, 12:05 PM
So...as a follow-up to this thread, I was reading in Yahoo's help section that "We do not crawl or index any of the content found in 'disallowed' pages." And those pages are supposed to be covered by my robots.txt file. But hopefully they'll get around to properly updating the directory.

Life is fun, isn't it?

Then we die.