Slurp not deep crawling site www.cogz.com
I have a site, www.cogz.com, that has only one page listed in Yahoo. The server logs show that Slurp visits the site each day but does not deep crawl it. The Slurp log entries are below.
BTW, the robots.txt file was just added to try to get a deeper crawl.
Any suggestions are appreciated.
66.196.90.212 - - [28/Dec/2004:07:33:23 -0500] "GET /robots.txt HTTP/1.0" 404 1870 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
66.196.91.41 - - [28/Dec/2004:07:33:30 -0500] "GET / HTTP/1.0" 200 15118 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
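The two entries above already show the pattern being described. Slurp fetches /robots.txt, which returned a 404 at that time and is generally treated by well-behaved crawlers as "no restrictions", and then fetches only the home page. To see which paths the crawler actually requested over a longer log, here is a minimal sketch. It assumes the log uses Apache's "combined" format, which these entries appear to follow, and that matching the substring "Slurp" in the user-agent is good enough. The function name crawler_hits is hypothetical, and user-agent strings can be spoofed, so a match alone does not prove the request came from Yahoo.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, bot="Slurp"):
    """Count (path, status) pairs requested by clients whose user-agent contains `bot`."""
    hits = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and bot in m.group("agent"):
            hits[(m.group("path"), m.group("status"))] += 1
    return hits


if __name__ == "__main__":
    sample = [
        '66.196.90.212 - - [28/Dec/2004:07:33:23 -0500] "GET /robots.txt HTTP/1.0" 404 1870 "-" '
        '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
        '66.196.91.41 - - [28/Dec/2004:07:33:30 -0500] "GET / HTTP/1.0" 200 15118 "-" '
        '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    ]
    # One line per distinct (status, path) the crawler requested.
    for (path, status), n in sorted(crawler_hits(sample).items()):
        print(f"{status} {path} x{n}")
```

If a week of logs shows only "/" and "/robots.txt", the crawler really is stopping at the home page and the cause lies elsewhere. Inner pages that show up with 200s point instead to an indexing or filtering issue.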
Marcia
12-29-2004, 01:56 AM
What's this for?
http://www.cogz.com/robots.txt
User-agent: *
Disallow:
What are you telling them to do with that?
Dave Hawley
12-29-2004, 02:51 AM
I don't know the reason, but I doubt the robots.txt file is the cause.
Marcia
12-29-2004, 04:03 AM
I don't know the reason, but I doubt the robots.txt file is the cause.
Please substantiate that, as we are all anxious for verity and eager to look at empirical evidence when it's presented.
In a nutshell, when a Robot visits a Web site, say http://www.foobar.com/, it first checks for http://www.foobar.com/robots.txt. If it can find this document, it will analyse its contents for records like:
User-agent: *
Disallow: /
http://www.robotstxt.org/wc/exclusion.html
So what happens when a bot that respects the robots.txt exclusion encounters that? If there is another reason, fine, but we need to deal with that first off. Right or not?
So what is that telling bots to do with the site?
Dave Hawley
12-29-2004, 05:05 AM
It's telling the bots they have complete access. To stop bots from gaining any access, it would have to be "Disallow: /", NOT "Disallow:".
Marcia, it's all explained on the link you yourself have provided.
Marcia
12-29-2004, 07:32 AM
Right
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
Or create an empty "/robots.txt" file.
Which is at the site level, and a blank file has never been a problem. But it may not be the worst idea in the world to check with the host to see what they're doing at their level, just in case; it's been known to happen.
This is the second case I've seen in the last day or two with the same problem, so it can't hurt to double-check.
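The difference between "Disallow:" and "Disallow: /" can also be checked empirically. Python's standard-library urllib.robotparser implements the exclusion rules quoted from robotstxt.org. The following is a minimal sketch: the helper name allowed and the inner page URL are hypothetical, and Python's parser is just one implementation of the standard, not Slurp itself.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="Slurp"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    rp = RobotFileParser()
    rp.parse(robots_lines)  # parse() takes an iterable of robots.txt lines
    return rp.can_fetch(agent, url)


# Hypothetical inner page, to illustrate whether a deep crawl is permitted.
url = "http://www.cogz.com/some-inner-page.htm"

# Empty Disallow value: nothing is disallowed, so everything may be crawled.
print(allowed(["User-agent: *", "Disallow:"], url))    # True
# "Disallow: /" matches every path, so nothing may be crawled.
print(allowed(["User-agent: *", "Disallow: /"], url))  # False
# An empty robots.txt file: no rules at all, so everything may be crawled.
print(allowed([], url))                                # True
```

Under this parser, the file on www.cogz.com behaves the same as an empty file, which matches the robotstxt.org examples.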
Dave Hawley
12-29-2004, 10:58 PM
Agreed; however, I would just delete the robots.txt file to find that out, and in the interim look elsewhere.
powerofeyes
01-03-2005, 01:51 PM
Did you check for a possible penalty from Yahoo? Check on that. We know of one client site that had only its home page listed in Yahoo.
We emailed Yahoo asking about it, and they replied that the site had some sort of penalty.
Marcia
01-03-2005, 02:03 PM
Any chance of duplicates of pages being on other sites?
A while back we had duplicate sites used for PPC advertising, so we knew where the clicks were coming from. They no longer exist!
powerofeyes
01-03-2005, 02:59 PM
A while back we had duplicate sites used for PPC advertizing so we knew where the clicks were comming from. They no longer exist!
OK, I think we are there now. This is one valid reason for the Yahoo bot not crawling your site. Try emailing Yahoo and ask them what the issue might be; if it is some sort of penalty, you can ask for reinclusion.
Nacho
01-03-2005, 03:05 PM
Yahoo! is getting much smarter when crawling and analyzing pages. At SES Chicago 2004 last December, Jon Glick mentioned that if pages look very similar, with only minimal variations, they would be filtered out and not indexed. He did not mention it being a penalty or anything similar. Perhaps it is just a way of forcing webmasters and site owners to do the job right.
If this is your case, then I would strongly suggest you go back to all your pages and make them VERY unique. Place special focus on content, internal linking and external linking.
Marcia
01-03-2005, 03:23 PM
First things to check with a site: robots.txt, that the site has uniform, crawlable navigation, and duplicate content.
I'd check how close some of those pages out there are, percentage-wise:
http://search.yahoo.com/search?fr=slv1-&p=cogz.com
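Marcia's "percentage-wise" check and Nacho's point about near-identical pages can be roughly approximated offline. This is a minimal sketch that assumes you already have the visible text of two pages. It uses Python's difflib.SequenceMatcher on word lists. It is only a rough measure: how Yahoo! actually scores similarity is not described in this thread, and the function name and sample texts are hypothetical.

```python
import difflib
import re


def similarity_percent(text_a, text_b):
    """Rough word-level similarity of two page texts, as a percentage (0-100)."""
    words_a = re.findall(r"\w+", text_a.lower())
    words_b = re.findall(r"\w+", text_b.lower())
    # autojunk=False stops difflib from ignoring very common words on long pages.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    # ratio() is 2*M/T: M = matching words, T = total words in both pages.
    return round(100.0 * matcher.ratio(), 1)


# Hypothetical doorway-style pages that differ only by a city name.
page_a = "Cheap widgets for sale in Boston. Order widgets online today."
page_b = "Cheap widgets for sale in Denver. Order widgets online today."
print(similarity_percent(page_a, page_b))
```

Pages that score very high against each other, or against copies on other domains, are the ones worth rewriting first.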
Buddha
01-13-2005, 01:24 AM
Cogz, what do you mean only 1 page is listed? When I do site:cogz.com, I see 55 pages indexed.
Did you get your pages re-indexed?
The pages at www.cogz.com are starting to come back.
Thanks.
Buddha
01-13-2005, 10:57 PM
Did you do anything to get your pages back? E.g., email Yahoo, remove link partners, etc. What was the problem? I have a site in a similar situation.
Thanks.
Did you do anything to get your pages back? ie. email yahoo, remove link partners, etc. What was the problem? I have a site in a similar situation.
Thanks.
I am not completely sure what the problem was. I suggest that you follow Yahoo's guidelines completely. Then you can contact Yahoo via email.