Another Lost Yahoo SERPS
pm3500
11-27-2005, 05:36 PM
I'm another sorry soul who lost all SERPS in the latest Yahoo update.
I think I've finally figured out the reason: duplicate content. However, I cannot figure out the reason behind the reason.
There were two problematic pages on my site:
clip art and growing celery.
The Yahoo search for growing celery on my site shows 735 results:
http://search.yahoo.com/search?p=site%3Agreennature.com+growing+celery&ei=UTF-8&rls=org.mozilla%3Aen-US%3Aofficial&pstart=1&fr=moz2&dups=1
The Yahoo search for clip art shows 1,270 results:
http://search.yahoo.com/search?p=site%3Agreennature.com+categores+clip+art&prssweb=Search&ei=UTF-8&fl=0&xargs=0&pstart=1&fr=moz2&dups=1
I've since overhauled the site (new software, etc.), and the growing celery page now automatically redirects to a 404. Hopefully Yahoo will eventually get rid of all of those duplicate pages.
The clip art page is different.
http://greennature.com/clipart.php
It is a static page, not part of the CMS, just using the CMS wrapper.
For some reason, Slurp picked up all of those wild folders and files and kept spidering the same page under different URLs.
Even after the site change, those URLs still returned a 200 OK status, all leading back to the main page, clipart.php.
The only way I was able to get them to 404 was with a 301 redirect in .htaccess:
redirect 301 /clipart.php/ http://greennature.com/clipart.php
The problem is that Slurp now requests the page and gets a 301, then requests the redirect target and gets the 404. It's a two-step process.
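A likely explanation for the two steps, assuming the directive is handled by Apache's mod_alias: `Redirect` does prefix matching, and any path information after the matched prefix is appended to the target URL. With a trailing slash on the matched path but none on the target, the remainder is glued straight onto "clipart.php", producing a URL for a file that does not exist. The request path below is only an illustration built from the "categores" search.

```apache
# mod_alias Redirect: prefix match, and the rest of the request path
# is appended to the target URL. With the directive from this post:
Redirect 301 /clipart.php/ http://greennature.com/clipart.php
# a request for /clipart.php/categores/clip/art (illustrative path)
# is answered with
#   301 -> http://greennature.com/clipart.phpcategores/clip/art
# and that target does not exist, so the follow-up request gets a 404.
```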
My first question: how do I make all those pages return a 404 directly, without a two-step 301 redirect?
My second question: why on earth did Slurp do this in the first place? I've never seen this before, and my site was always spidered and indexed correctly by Yahoo.
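One plausible answer to both questions, assuming Apache 2.0 or later with PHP running as a script handler and `AllowOverride FileInfo` permitted for the site: script handlers typically accept extra "path info" after a real file name, so a URL like /clipart.php/anything is served by clipart.php with 200 OK. If that page uses relative links, each such URL yields new, deeper URLs for a crawler to follow, which would fit the endless duplicates Slurp found. Apache's `AcceptPathInfo` directive can switch this off so those requests get a 404 in a single step:

```apache
# Sketch, assuming Apache 2.0+ and AllowOverride FileInfo for this directory.
# Requests with trailing path info after a real file, e.g.
# /clipart.php/categores/clip/art, get 404 Not Found directly
# instead of being served by clipart.php with 200 OK.
AcceptPathInfo Off
```

This applies to every script the .htaccess covers, so any CMS URLs that rely on path info (a hypothetical example would be /index.php/some-article) would also start returning 404; check the CMS's URL scheme before enabling it site-wide.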
P.S. Hello to Chris Sherman, my old Mining Co./About colleague
pm3500
11-29-2005, 02:52 AM
This is getting more curious day by day...
I've been working on the problem for two weeks now. The problem has been around for about a month.
One day after I posted the URL here, all the duplicate clip art pages were removed from the directory.
I checked by clicking the link I provided above. One minute this evening, while I was working on the problem with a RedirectMatch 301 directive (which, by the way, worked), Yahoo was showing 1,270 pages.
I checked on something else for two minutes, and when I returned to the problem, Yahoo was showing 0 pages and "no information found."
I don't know if that is good or bad. Is Yahoo dropping all of my indexed pages now and banning me?
BTW: could someone with admin permissions please delete the links I posted in the above post? The issue is now moot and I did not come here to get a link to my site. I have thousands of organic links to my site and I don't want any floating around here.
Marcia
11-29-2005, 03:41 AM
pm3500, welcome to the forums!
It's a good thing you posted the links, because while the search including "categores" (a typo for categories) and clip art returned no pages, there are 35 returned without that word, using just the URL and clip art:
http://search.yahoo.com/search?p=site%3Agreennature.com+clip+art&prssweb=Search&ei=UTF-8&fr=moz2&fl=0&x=wrt
pm3500
11-29-2005, 01:23 PM
There must have been a database switch overnight.
This morning the 1,270 pages were back on the "categores" search. The only reason I found the duplicates was that misspelling, which was corrected a few months back.
The link you posted has 3,850 results, not the 35 you mentioned. I never found the pages with a plain "clip art" search, because that phrase is part of my navigation scheme and appears on every page.
I believe we saw a future Yahoo database, and my site is slowly withering: from 8,500 pages on the current site:domain.com search to the 1,400 pages that site:domain.com returned last night.
Anyway, which is better for ensuring those bad URLs are nixed forever and there is no more duplicate content?
the redirect 301 that uses the two steps I spoke about in the previous post
or
RedirectMatch 301 /page\.php/(.*) http://domain.com/page.php
Last night I switched to the RedirectMatch, and all the URLs now point to the original page. (Note that I changed the page name and URL in the RedirectMatch 301 to a generic form for the forum; in my .htaccess they are specific to my site.)
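A minimal variant of the generic RedirectMatch above, keeping the same placeholder names page.php and domain.com: mod_alias tests the regular expression against the URL-path, and an unanchored pattern can match anywhere in it, so `/page\.php/(.*)` would also catch a path like /old/page.php/x. Anchoring with `^` restricts it to paths under /page.php/. Since the target has no `$1` backreference, the `(.*)` capture does nothing and can be dropped.

```apache
# Any URL-path that begins with /page.php/ goes to the one real page.
# The target has no backreference, so nothing from the request path is
# carried over (unlike the prefix-based Redirect, which appends it).
# The required trailing slash means /page.php itself never matches,
# so the redirect cannot loop.
RedirectMatch 301 ^/page\.php/ http://domain.com/page.php
```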
Also, thanks for changing the link for me:)
Update: I realized that Yahoo Slurp created those 1,200 imaginary files and could possibly create more with different directory/directory/file or directory/directory/directory/file combinations, all of which would get endlessly caught in a 301 redirect.
So, I finally decided to
RedirectMatch gone (returns a 410)
the whole lot of them.
That way Slurp will discover that these files, and any other files it might want to make up, are permanently trashed, done, discarded, finito, removed, kaput, etc.
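A sketch of the 410 approach, using the clip art URLs from the first post as the example (the author's actual pattern was generalized above, so this one is an assumption): mod_alias accepts the status keyword `gone`, which returns 410 Gone, and with that status the target URL is omitted. The numeric form `410` behaves the same way.

```apache
# Every fabricated URL below /clipart.php/ answers 410 Gone.
# The real /clipart.php (no trailing slash) does not match.
RedirectMatch gone ^/clipart\.php/
```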
Now it's onto the next issue I just discovered.
pm3500
12-13-2005, 01:52 PM
Sorry to be such a scootch, but I think the URLs in the Yahoo links I posted in the first post of this topic are being picked up and spidered by the bots. Two days ago I had some 1,500 attempts from the MSN bot to spider those URLs.
It is something I definitely do not want.
Could someone with admin permissions please remove those two links also.
pm3500
02-18-2006, 11:46 PM
It's been about two months since the last post here and I thought I'd update it to give others a glimpse into Yahoo SERPS and updates.
Currently my site has 8,870 pages listed.
The previous problem of two different search terms listing some 834 and 1,200 pages of duplicate content has been somewhat resolved.
Currently each phrase has only 34 pages listed. While that is still duplicate content, it shows that the Yahoo database has deleted almost 2,000 pages of duplicate content from my site.
How did I do this? Well, I overhauled my site with new software. I created a custom 404 page with a noindex, nofollow robots meta tag, and I checked that it DOES return a 404 code in the headers. I also checked my logs for a couple of days and saw that Slurp registered a 404 as well when it requested the pages in question.
After overhauling my site, I also changed robots.txt to allow the spiders to crawl previously blocked folders and files. Those no longer exist, and the spiders are picking up the 404 code in the headers.
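A sketch of how such a custom 404 page is typically wired up in Apache, assuming a hypothetical error page at /404.php: pointing `ErrorDocument` at a local path keeps the 404 status on the response, whereas a full http:// URL makes Apache send the client a redirect instead, so a crawler would never see the 404. The noindex, nofollow robots meta tag itself goes in that page's HTML head.

```apache
# Local path: the content of /404.php is served with status 404 Not Found.
ErrorDocument 404 /404.php

# Avoid a full URL here: Apache would redirect to it, and the
# client would not receive the original 404 status.
# ErrorDocument 404 http://domain.com/404.php
```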
The new problem with Yahoo (and I'm not sure whether it's a problem or not, because I've seen so little written about it) is as follows.
As I mentioned, Yahoo has 8,800 pages listed for my site and it is almost a complete listing. However, 95% of the listings are now:
title only
URL only, or
some strange combination of title and URL, where they just make up a title.
For example, one of the listings says:
a page about the military concern
domain.com/article651.html
The title "a page about the military concern" is not the page's title, and the words do not even appear anywhere on the page.
How about this one:
més informació
domain.com/article1312.html
(Please note I changed the domain name to a generic "domain" so as not to create spam links to my site.)
A Catalan title (it means "more information")? Huh? Again, it is nowhere to be found on the page or the entire site. There is nothing in Spanish or Catalan anywhere on my site.
Other titles are badly truncated. For example, a page with the title
"Iowa Backpacking, Camping and Hiking Guide"
shows up as "iowa" (note the lowercase). There are a good many listings like that, i.e., truncated titles without proper capitalization.
Conclusions....
Over the course of a couple of months, you can get Yahoo to delete non-existent pages from their index by carefully crafting your robots.txt and 404 error pages.
Yahoo reindexing of sites is very problematic.
I know it is not my new software causing the indexing problem because I have another site with the same software and Yahoo has absolutely no problem indexing that site properly.
Does anyone have comments about the
title only
URL only
made-up titles
listings in Yahoo?
Does that mean Slurp has found the pages and has not indexed them completely, but will do so in the future?
Does that mean Slurp decided it would do whatever it wants with my site and titles?
What does it mean?
Thanks