Yahoo Crawler


PeteBKay
04-14-2006, 08:04 PM
Has anyone else experienced a surge of traffic since March 1 from the Yahoo crawler?

In March, fully 1/3 of the traffic to some of my clients' sites came from Slurp. This is wreaking havoc with the numbers.

If anyone else knows why this is happening or has had a similar experience, I would be most grateful to hear about it.

Regards

Marcia
04-14-2006, 08:49 PM
Their crawler can be very active and aggressive at times. Have you also noticed any Yahoo crawling identified as being from Japan/China?

PeteBKay
04-18-2006, 09:43 AM
I haven't noticed a lot of Yahoo traffic from China or Japan...total traffic from those areas is less than 1% of the whole.

refer2me
04-26-2006, 03:42 PM
We've been getting clobbered by the Yahoo spiders -- as many as 10,000 visits per day, and it's skewing our site metrics program. Traffic peaked in particular on April 4, 12 and 19. We get so little traffic from Yahoo (less than 5% of search engine referrals) that we're considering blocking the IP.
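
Short of blocking, one option is to keep crawler hits out of the metrics by filtering on the user-agent string. Below is a minimal sketch in Python, assuming you can pull the user-agent of each hit out of your logs. The helper names and the marker list are hypothetical, and the only assumption about Slurp is that its user-agent string contains "Slurp".

```python
# Hypothetical helpers: separate crawler hits from other hits by
# user-agent substring. The marker list is an assumption; extend it
# to match whatever shows up in your own logs.
CRAWLER_MARKERS = ("slurp",)


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def split_hits(user_agents):
    """Split an iterable of user-agent strings into (crawler, other) lists."""
    crawler, other = [], []
    for ua in user_agents:
        (crawler if is_crawler(ua) else other).append(ua)
    return crawler, other
```

Reporting only on the second list leaves the spider free to crawl while keeping the visitor numbers usable.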

Discovery
04-28-2006, 01:13 PM
Could a crawler like this get past JS-based error checking on forms and submit them?

We have a ton of blank forms coming from YSM each day.
The blank forms start heavy at midnight, come in pairs, and continue throughout the day. We get about 40-50 blank forms every day.
Each IP address is used twice, one submission right after the other, but the collection of IPs as a whole seems to be random.

We turned off keywords - that helped for a day.
Turned off campaigns - that helped for a few days.
Now we have simply turned off the whole account.

We get a lot of traffic from Japan, China and Korea from Yahoo.

Submitted investigation requests 4 times, starting in Jan 06 - No response from Y.

Discovery

PeteBKay
04-28-2006, 01:53 PM
Discovery, I think there were some viral bots roaming around the Net in the latter part of the year that could be responsible.

There are some ways you can thwart these bots; one is to add conditional statements to the code that handles the POST so that the data is not submitted unless certain conditions are met.

For example, one thing that happened to us was tons of form submissions with our own domain name in every field and nothing else. We set a conditional statement to detect that and discard the form data whenever that condition was met.
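
A rough sketch of that kind of server-side check, written in Python purely for illustration (our actual form handler is not shown here). The function name `should_reject` and the field layout are hypothetical. It covers the two cases mentioned in this thread: completely blank forms and forms with the site's own domain in every field.

```python
def should_reject(form: dict, own_domain: str) -> bool:
    """Return True if a submitted form looks like bot junk.

    `form` maps field names to submitted values; `own_domain` is the
    site's own domain name. Both names are illustrative assumptions.
    """
    values = [str(v).strip() for v in form.values()]

    # Blank submission: no fields at all, or every field empty.
    if not any(values):
        return True

    # Every field holds nothing but our own domain name.
    if all(v.lower() == own_domain.lower() for v in values):
        return True

    return False
```

Because this runs on the server, it still applies when a bot skips the JavaScript checks and posts straight to the form handler.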

Anyways...my point is that it sounds like a malicious bot, not a real SE spider.

PeteBKay
04-28-2006, 01:56 PM
A further note...I added Yahoo's suggested code to my robots.txt file:

User-agent: Slurp
Crawl-delay: 10

Supposedly the "10" is the minimum delay, in seconds, between successive crawler requests. So far it hasn't worked.
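
To rule out a syntax problem in the file, the rule can be parsed locally. This is a minimal sketch using Python's standard `urllib.robotparser`; the `rules` string simply repeats the two lines above. It only confirms that the directive is well-formed and applies to the Slurp user agent, not that the crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as in the robots.txt snippet above.
rules = """\
User-agent: Slurp
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns the numeric value of the Crawl-delay
# directive that applies to the given user agent.
delay = parser.crawl_delay("Slurp")
print(delay)
```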

Discovery
04-28-2006, 02:12 PM
We are making that modification as well as a few others to help stop the submissions. Now if I can determine if Y is charging me for these click throughs...

PS: I know they are ad-based, as all of our tracking code is passed to us from the ad.

Discovery